💡 Quick Summary
Ident and Sim accepts two or more pre-aligned protein or DNA sequences (FASTA or GDE format) and calculates the percent identity and percent similarity for every pair. Identity counts exact matches; similarity additionally counts positions where both residues belong to the same user-defined biochemical group.
📋 How to Use
- Paste two or more aligned sequences in FASTA or GDE format into the top textarea. All sequences must be the same length (gaps included). Input limit is 20,000,000 characters.
- Review the similarity groups field. The default groups (GAVLI, FYW, CM, ST, KRH, DENQ, P) work well for protein comparisons. Clear the field entirely when comparing DNA sequences.
- Click Run. Results for every sequence pair are listed with alignment length, identical residues, similar residues, percent identity, and percent similarity.
- Use the Copy button to copy all results to your clipboard.
- Click Load Example to try the tool with three aligned C. elegans FEM-2 protein sequences.
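The input requirements above (at least two sequences, all the same length including gaps) can be checked before running. The following is a minimal Python sketch that handles FASTA input only; GDE parsing is not shown. `parse_fasta` and `check_aligned` are hypothetical helper names and are not part of the tool.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (title, sequence) pairs."""
    records = []
    title = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:  # close the previous record
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            # Sequence text before the first '>' is ignored in this sketch.
            chunks.append("".join(line.split()))  # drop internal whitespace
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def check_aligned(records: list[tuple[str, str]]) -> int:
    """Return the shared length, or raise ValueError if the input is not aligned."""
    if len(records) < 2:
        raise ValueError("at least two sequences are required")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()
```

For example, `check_aligned(parse_fasta(">seq1\nMK-LA\n>seq2\nMKILA\n"))` returns 5, while sequences of different lengths raise a `ValueError`.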
🧮 Formulas & Logic
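The two percentages can be written out from the definitions in the summary and under Result Interpretation. This is a reconstruction, not the tool's published formula. It assumes percent similarity counts identical and similar positions together over the same effective length, and it does not specify how the tool rounds. Here N is the number of alignment columns, G is the number of columns with a gap in both sequences, I is the number of identical positions, and S is the number of similar (non-identical, same-group) positions:

$$
L = N - G, \qquad
\%\,\text{identity} = \frac{I}{L} \times 100, \qquad
\%\,\text{similarity} = \frac{I + S}{L} \times 100
$$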
📊 Result Interpretation
- Identical residues: positions where both sequences have the same non-gap character.
- Similar residues: positions (excluding identical ones) where both characters belong to the same similarity group.
- Alignment length: the effective alignment length, i.e. the number of positions left after removing those where both sequences have a gap.
- Percent identity: the most stringent measure; only exact residue matches count.
- Percent similarity: a broader measure that also counts positions with biochemically equivalent substitutions.
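The per-pair counts described above can be sketched in Python as follows. This assumes that '-' and '.' are the gap characters and that comparisons are case-insensitive; this page states neither. `ident_and_sim` is a hypothetical name and is not the tool's own code. Passing an empty group list mirrors clearing the groups field for DNA.

```python
from typing import Sequence

GAP_CHARS = {"-", "."}  # assumption: characters treated as alignment gaps
DEFAULT_GROUPS = ("GAVLI", "FYW", "CM", "ST", "KRH", "DENQ", "P")


def ident_and_sim(a: str, b: str, groups: Sequence[str] = DEFAULT_GROUPS) -> dict:
    """Column-by-column identity and similarity for two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    # Map each residue letter to the index of its similarity group.
    group_of = {res: i for i, grp in enumerate(groups) for res in grp.upper()}
    length = identical = similar = 0
    for x, y in zip(a.upper(), b.upper()):
        x_gap, y_gap = x in GAP_CHARS, y in GAP_CHARS
        if x_gap and y_gap:
            continue  # gap in both: excluded from the alignment length
        length += 1
        if x_gap or y_gap:
            continue  # gap against a residue: counted in length, never a match
        if x == y:
            identical += 1
        elif x in group_of and group_of[x] == group_of.get(y):
            similar += 1

    def pct(n: int) -> float:
        return 100.0 * n / length if length else 0.0

    return {
        "length": length,
        "identical": identical,
        "similar": similar,
        "percent_identity": pct(identical),
        "percent_similarity": pct(identical + similar),  # identical + similar
    }
```

For example, `ident_and_sim("MKV-LSTDA", "MRI-LTT-W")` gives length 8, 3 identical and 3 similar residues (K/R, V/I, S/T), so 37.5% identity and 75.0% similarity. With `groups=[]`, the similar count is always 0 and the two percentages are equal.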
🔬 Applications
- Assessing how conserved a protein is across species after multiple sequence alignment
- Comparing closely related DNA sequences (e.g. alleles, paralogs) to quantify divergence
- Checking that a protein of interest shares sufficient identity with a well-characterised homologue to justify functional inference
- Generating a pairwise identity matrix for a set of sequences prior to phylogenetic analysis
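Building a pairwise identity matrix means running the identity calculation over every unordered pair of aligned sequences. The following is a minimal Python sketch. It assumes '-' and '.' are gaps and that comparisons are case-insensitive, and it returns a dict keyed by name pairs rather than a printed square table. `percent_identity` and `identity_matrix` are hypothetical helper names.

```python
from itertools import combinations

GAP_CHARS = {"-", "."}  # assumption: characters treated as alignment gaps


def percent_identity(a: str, b: str) -> float:
    """Identical non-gap columns divided by columns not gapped in both, times 100."""
    cols = [(x, y) for x, y in zip(a.upper(), b.upper())
            if not (x in GAP_CHARS and y in GAP_CHARS)]
    if not cols:
        return 0.0
    same = sum(1 for x, y in cols if x == y and x not in GAP_CHARS)
    return 100.0 * same / len(cols)


def identity_matrix(records: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every unordered pair of named, aligned sequences."""
    return {(n1, n2): percent_identity(s1, s2)
            for (n1, s1), (n2, s2) in combinations(records.items(), 2)}
```

For three sequences `{"a": "MKV-L", "b": "MRV-L", "c": "MKI-L"}`, this yields 75.0 for (a, b), 75.0 for (a, c) and 50.0 for (b, c).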
⚠️ Common Mistakes & Warnings
This tool does not perform alignment; it only compares the sequences column by column. Paste sequences that have already been aligned (all the same length, including gap characters).
The similarity group calculation is designed for amino-acid comparisons. When comparing DNA sequences, clear the groups field so only identity is calculated.
Only the first 20 characters of each FASTA title line are used in the results table to keep the output readable.