Aligned Sequences

💡 Quick Summary

Ident and Sim accepts two or more pre-aligned protein or DNA sequences (FASTA or GDE format) and calculates the percent identity and percent similarity for every pair. Identity counts exact matches; similarity additionally counts positions where both residues belong to the same user-defined biochemical group.

📋 How to Use

Paste two or more aligned sequences in FASTA or GDE format into the top textarea. All sequences must be the same length (gaps included). Input limit is 20,000,000 characters.
Review the similarity groups field. The default groups (GAVLI, FYW, CM, ST, KRH, DENQ, P) work well for protein comparisons. Clear the field entirely when comparing DNA sequences.
Click Run. Results for every sequence pair are listed with alignment length, identical residues, similar residues, percent identity, and percent similarity.
Use the Copy button to copy all results to your clipboard.
Click Load Example to try the tool with three aligned C. elegans FEM-2 protein sequences.
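The "every sequence pair" output can be pictured as a loop over all unordered pairs, so n sequences give n(n-1)/2 result rows. The following is a minimal Python sketch, not the tool's own code: the function names `percent_identity` and `all_pairs`, the `(title, sequence)` record layout, and the case-sensitive comparison are assumptions made for illustration. It reports identity only and shortens titles to 20 characters.

```python
from itertools import combinations

GAPS = "-."  # gap characters, as described in the formulas section


def percent_identity(seq_a, seq_b):
    """Percent identity for one pair, ignoring gap-vs-gap columns."""
    # Keep every column except those where both sequences have a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b)
               if not (a in GAPS and b in GAPS)]
    if not columns:
        return 0.0  # assumption: report 0 when nothing is left to compare
    matches = sum(1 for a, b in columns if a == b and a not in GAPS)
    return 100.0 * matches / len(columns)


def all_pairs(records, title_width=20):
    """Return one (title_a, title_b, percent_identity) row per pair."""
    rows = []
    for (title_a, seq_a), (title_b, seq_b) in combinations(records, 2):
        rows.append((title_a[:title_width],
                     title_b[:title_width],
                     percent_identity(seq_a, seq_b)))
    return rows


if __name__ == "__main__":
    example = [("one", "ACGT"), ("two", "ACGA"), ("three", "AC-T")]
    for row in all_pairs(example):
        print(row)
```

Three input sequences produce three rows, four produce six, and so on, which is why large inputs generate long result lists.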

🧮 Formulas & Logic

Alignment length

Total positions minus gap-vs-gap columns (positions where both sequences have "-" or ".")

Percent identity

identical_positions / alignment_length × 100

Percent similarity

(identical_positions + similar_positions) / alignment_length × 100
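The three formulas above can be sketched in a few lines of Python. This is an illustrative reading of the description rather than the tool's actual implementation: the names `pair_stats` and `DEFAULT_GROUPS`, the case-insensitive comparison, and the choice to return 0 for an empty alignment are all assumptions.

```python
GAP_CHARS = {"-", "."}
DEFAULT_GROUPS = ("GAVLI", "FYW", "CM", "ST", "KRH", "DENQ", "P")


def pair_stats(seq_a, seq_b, groups=DEFAULT_GROUPS):
    """Compute alignment length, identity and similarity for one pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    length = identical = similar = 0
    # Assumption: residues are compared case-insensitively.
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        a_gap, b_gap = a in GAP_CHARS, b in GAP_CHARS
        if a_gap and b_gap:
            continue  # gap-vs-gap column: excluded from the alignment length
        length += 1
        if a_gap or b_gap:
            continue  # gap against a residue: counts toward the length only
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in groups):
            similar += 1
    # Assumption: an alignment with no comparable columns reports 0 percent.
    pct_identity = 100.0 * identical / length if length else 0.0
    pct_similarity = 100.0 * (identical + similar) / length if length else 0.0
    return {
        "alignment_length": length,
        "identical": identical,
        "similar": similar,
        "percent_identity": pct_identity,
        "percent_similarity": pct_similarity,
    }


if __name__ == "__main__":
    print(pair_stats("MKV-LA.", "MRI-LAG"))
```

For the pair `MKV-LA.` and `MRI-LAG`, one gap-vs-gap column is dropped, leaving an alignment length of 6 with 3 identical and 2 similar positions, so identity is 50% and similarity is about 83.3%.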

📊 Result Interpretation

Identical residues

Positions where both sequences have the same non-gap character.

Similar residues

Positions (excluding identical ones) where both characters belong to the same similarity group.

Alignment length

Effective alignment length after removing positions that are gaps in both sequences.

Percent identity

The most stringent measure — only exact residue matches count.

Percent similarity

A broader measure — includes positions with biochemically equivalent substitutions.

🔬 Applications

Assessing how conserved a protein is across species after multiple-sequence alignment
Comparing closely related DNA sequences (e.g. alleles, paralogs) to quantify divergence
Checking that a protein of interest shares sufficient identity with a well-characterised homologue to justify functional inference
Generating a pairwise identity matrix for a set of sequences prior to phylogenetic analysis

⚠️ Common Mistakes & Warnings

Sequences must be pre-aligned and equal length

This tool does not perform alignment — it only compares positions column by column. Paste sequences that have already been aligned (same length including gap characters).
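Before pasting, it can help to confirm that every sequence really has the same length. The check below is a small Python sketch under stated assumptions: the function name `check_aligned` and the `(title, sequence)` record layout are hypothetical and not part of the tool.

```python
def check_aligned(records):
    """Return the shared alignment length, or raise ValueError.

    records is a list of (title, sequence) tuples; gap characters
    count toward the length just like residues.
    """
    if len(records) < 2:
        raise ValueError("at least two sequences are required")
    expected = len(records[0][1])
    mismatched = [(title, len(seq)) for title, seq in records
                  if len(seq) != expected]
    if mismatched:
        details = ", ".join(f"{title} ({n})" for title, n in mismatched)
        raise ValueError(
            f"expected length {expected}, but these differ: {details}"
        )
    return expected


if __name__ == "__main__":
    print(check_aligned([("seq1", "AC-GT"), ("seq2", "ACTGT")]))
```

A length mismatch usually means the sequences were copied before alignment, or that gap characters were stripped somewhere along the way.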

Leave similarity groups empty for DNA

The similarity group calculation is designed for amino-acid comparisons. When comparing DNA sequences, clear the groups field so only identity is calculated.

Titles are truncated to 20 characters in the output

Only the first 20 characters of each FASTA title line are used in the results table to keep the output readable.

❓ Frequently Asked Questions

What do the default similarity groups mean?

The groups represent biochemically similar amino acids: GAVLI (small/aliphatic), FYW (aromatic), CM (sulphur-containing), ST (hydroxyl), KRH (basic/positively charged), DENQ (acidic/amide), P (proline). A substitution is counted as "similar" if both residues appear in the same group.
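One way to picture the group check is as a lookup table from residue to group. The Python sketch below assumes the groups field is a comma-separated string and that letters are compared case-insensitively. The names `build_group_lookup` and `is_similar` are hypothetical, and the handling of a residue listed in two groups (the first group wins) is an assumption that does not arise with the default groups.

```python
def build_group_lookup(groups_text):
    """Map each residue letter to the index of its similarity group."""
    lookup = {}
    for index, group in enumerate(groups_text.split(",")):
        for residue in group.strip().upper():
            # Assumption: if a residue is listed twice, the first group wins.
            lookup.setdefault(residue, index)
    return lookup


def is_similar(a, b, lookup):
    """True when a and b differ but belong to the same group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are counted separately
    return a in lookup and b in lookup and lookup[a] == lookup[b]


if __name__ == "__main__":
    defaults = build_group_lookup("GAVLI, FYW, CM, ST, KRH, DENQ, P")
    print(is_similar("K", "R", defaults))  # both in KRH
    print(is_similar("D", "K", defaults))  # DENQ vs KRH
```

With an empty groups field the lookup is empty, so no substitution is ever counted as similar. That is the behaviour recommended for DNA comparisons.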

How is the alignment length calculated?

The tool starts with the full alignment column count and subtracts any column where both sequences contain a gap character ("-" or "."). This avoids inflating the denominator with uninformative gap-vs-gap positions.

What is GDE format?

GDE (Genetic Data Environment) is an older alignment format. In its flat-file form, each sequence title line starts with "%" (protein) or "#" (nucleotide) instead of ">". This tool accepts both FASTA and GDE format.
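Because the two formats differ only in the character that opens a title line, a single reader can handle both. The following Python sketch is illustrative only: the function name `parse_records` is hypothetical, and dropping blank lines and spaces inside sequence lines is an assumption about reasonable input cleaning, not documented tool behaviour.

```python
TITLE_MARKERS = (">", "%", "#")  # FASTA uses ">", GDE uses "%" or "#"


def parse_records(text):
    """Parse FASTA or GDE text into a list of (title, sequence) tuples."""
    records = []
    title, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(TITLE_MARKERS):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            # Sequence data may span several lines; join them together.
            chunks.append(line.replace(" ", ""))
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = ">seq1\nMK-V\nLA\n%seq2\nMR-I\nLA\n"
    print(parse_records(sample))
```

Gap characters are kept as part of each sequence, since the column-by-column comparison depends on them.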

Can I compare DNA sequences?

Yes. Clear the similarity groups field completely. The tool will then report only identity values: with no groups defined, no position can be counted as similar.