Pairwise Align Codons
Global codon-level alignment of two coding DNA sequences

Sequence One: raw sequence or FASTA format. Length must be divisible by 3. Input limit: 6,000,000 characters.

Sequence Two: raw sequence or FASTA format. Length must be divisible by 3. Input limit: 6,000,000 characters.

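Three gap parameters specify how alignments are scored: the value for gaps preceding a sequence, the value for internal gaps, and the value for gaps following a sequence.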

💡 Quick Summary

Pairwise Align Codons accepts two coding sequences and determines the optimal global alignment. The scoring matrix used to calculate the alignment is described in Schneider et al. (2005). Use Pairwise Align Codons to look for conserved coding sequence regions. Only the bases A, C, G, T and U are used in the alignment.

📋 How to Use
  1. Paste the first coding DNA sequence (raw or FASTA) into Sequence One. Input limit is 6,000,000 characters.
  2. Paste the second coding DNA sequence (raw or FASTA) into Sequence Two. Input limit is 6,000,000 characters.
  3. Both sequences must contain only A, C, G, T (or U) bases, and the length of each must be divisible by 3.
  4. Set the Value for gaps preceding a sequence, Value for internal gaps, and Value for gaps following a sequence. Positive display values add to the score; negative display values subtract from it.
  5. Click Submit. The aligned sequences are shown in FASTA format with the alignment score.
  6. Click Load Example to align the two sample coding sequences from the original Sequence Manipulation Suite (SMS).
  7. Use the Copy button to copy the output to your clipboard.
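
The input rules in the steps above can be sketched in Python as follows. This is a minimal illustration, not the tool's own code: the function name `parse_coding_sequence` is hypothetical, and it assumes that FASTA header lines start with `>` and that U is kept as typed rather than converted to T.

```python
import re
from typing import List


def parse_coding_sequence(text: str) -> List[str]:
    """Strip an optional FASTA header, keep A/C/G/T/U, and split into codons."""
    lines = text.strip().splitlines()
    # Assumption: any line starting with '>' is a FASTA header and is dropped.
    body = "".join(line for line in lines if not line.startswith(">"))
    # Remove everything that is not one of the five accepted bases.
    bases = re.sub(r"[^ACGTU]", "", body.upper())
    if len(bases) % 3 != 0:
        raise ValueError(f"length {len(bases)} is not divisible by 3")
    # Cut the cleaned sequence into consecutive triplets.
    return [bases[i:i + 3] for i in range(0, len(bases), 3)]
```

For example, `parse_coding_sequence(">seq1\nATGaaa TTT\n")` yields three codons, while an input of four bases raises a `ValueError`.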
🧮 Formulas & Logic
Alignment algorithm
Hirschberg divide-and-conquer (linear space, O(n) memory) with Needleman–Wunsch DP for sub-problems
Scoring matrix
64×64 codon substitution matrix from Schneider et al. (2005)
Gap penalty
The selected value is added directly to the running score: a positive display value is a reward, a negative display value is a penalty
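
To make the scoring logic above concrete, here is a small quadratic-space Needleman–Wunsch sketch over codons with separate values for preceding, internal, and following gaps. It is an illustration under stated assumptions, not the tool's implementation: `toy_score` is a hypothetical stand-in for the 64×64 matrix (it does not contain the Schneider et al. values), the names `align_codons` and `gap_value` are invented here, and the rule used to classify a gap as preceding or following is one plausible reading of the parameters.

```python
from typing import Callable, List, Tuple


def toy_score(x: str, y: str) -> int:
    # Hypothetical stand-in for the codon matrix, NOT the published values.
    return 2 if x == y else -1


def align_codons(
    a: List[str],
    b: List[str],
    score: Callable[[str, str], int] = toy_score,
    lead: int = 0,
    internal: int = -2,
    trail: int = 0,
    gap: str = ".--",
) -> Tuple[int, List[str], List[str]]:
    n, m = len(a), len(b)

    def gap_value(other_pos: int, other_len: int) -> int:
        # A gap is "preceding" if the gapped sequence has not started yet,
        # "following" if it is already fully consumed, otherwise internal.
        if other_pos == 0:
            return lead
        if other_pos == other_len:
            return trail
        return internal

    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + lead
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + lead
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # codon vs codon
                dp[i - 1][j] + gap_value(j, m),                # gap in b
                dp[i][j - 1] + gap_value(i, n),                # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a: List[str] = []
    row_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap_value(j, m):
            row_a.append(a[i - 1]); row_b.append(gap); i -= 1
        else:
            row_a.append(gap); row_b.append(b[j - 1]); j -= 1
    return dp[n][m], row_a[::-1], row_b[::-1]
```

With the toy scores and the default gap values, aligning `["ATG", "AAA", "TTT"]` against `["AAA", "TTT"]` places a free preceding gap opposite `ATG` and matches the remaining two codons.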
📊 Result Interpretation
Aligned output

Each codon in the output is 3 characters; gaps are represented as ".--" (codon-width placeholder)

Alignment score

Sum of codon substitution scores plus gap values; higher is better

Seq 1 codons

Number of codons (triplets) in the first input sequence

Seq 2 codons

Number of codons (triplets) in the second input sequence
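
As a cross-check of the alignment score described above, the score can be recomputed from the two aligned rows. The sketch below is illustrative only: `alignment_score` and `toy_score` are hypothetical names, the toy scores are not the Schneider et al. values, and it assumes each row contains at least one real codon and that no column has a gap in both rows.

```python
from typing import Callable, List, Tuple


def toy_score(x: str, y: str) -> int:
    # Hypothetical stand-in for the codon matrix, NOT the published values.
    return 2 if x == y else -1


def alignment_score(
    row_a: List[str],
    row_b: List[str],
    score: Callable[[str, str], int] = toy_score,
    lead: int = 0,
    internal: int = -2,
    trail: int = 0,
    gap: str = ".--",
) -> int:
    def first_last(row: List[str]) -> Tuple[int, int]:
        # Columns of the first and last real codon in an aligned row.
        idx = [k for k, c in enumerate(row) if c != gap]
        return idx[0], idx[-1]

    fa, la = first_last(row_a)
    fb, lb = first_last(row_b)
    total = 0
    for k, (x, y) in enumerate(zip(row_a, row_b)):
        if x != gap and y != gap:
            total += score(x, y)  # codon substitution score
        else:
            # Classify the gap relative to the row that contains it.
            first, last = (fa, la) if x == gap else (fb, lb)
            total += lead if k < first else trail if k > last else internal
    return total
```

Here a preceding gap contributes the `lead` value, so an alignment with one free preceding gap and two identical codons scores the same as the two matches alone.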

🔬 Applications
  • Comparing orthologous coding sequences across species at the codon level
  • Identifying conserved and divergent codons between two homologous genes
  • Aligning two CDS variants before performing dN/dS (Ka/Ks) analysis
  • Finding conserved coding sequence regions between distantly related organisms
⚠️ Common Mistakes & Warnings
Sequences must be in-frame coding DNA

The length of each sequence must be divisible by 3. Non-A/C/G/T/U characters are removed before alignment.

Long sequences may be slow

The Hirschberg algorithm runs in O(nm) time but needs only O(n) space, where n and m are the codon counts of the two sequences. Very long sequences may take several seconds in the browser.

Gap values are per codon, not per nucleotide

Each gap unit represents one missing codon (3 nucleotides). A display value of -2 means 2 points are subtracted per gap codon.
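
The time and space behaviour noted under "Long sequences may be slow" comes from keeping only one previous row of the dynamic-programming table at a time. The sketch below shows that building block, which Hirschberg's method runs forward and backward to find a split point. It is a simplified illustration: `last_dp_row` and `toy_score` are hypothetical names, the toy scores are not the Schneider et al. values, and a single uniform gap value is assumed instead of the three separate gap parameters.

```python
from typing import Callable, List


def toy_score(x: str, y: str) -> int:
    # Hypothetical stand-in for the codon matrix, NOT the published values.
    return 2 if x == y else -1


def last_dp_row(
    a: List[str],
    b: List[str],
    score: Callable[[str, str], int] = toy_score,
    gap: int = -2,
) -> List[int]:
    """Return the final Needleman-Wunsch row using only two rows of memory."""
    # Row 0: aligning nothing from a against b[:j] costs j gap codons.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * gap] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            cur[j] = max(
                prev[j - 1] + score(x, y),  # codon vs codon
                prev[j] + gap,              # gap in b
                cur[j - 1] + gap,           # gap in a
            )
        prev = cur  # the older row is discarded, so memory stays linear in len(b)
    return prev
```

The loop still visits every pair of codons, which is why the running time remains proportional to the product of the two codon counts.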

❓ Frequently Asked Questions

What does ".--" mean in the output?
".--" is the codon-width gap placeholder — it occupies the same 3-character width as a real codon so the alignment columns line up.
What gap values should I use?
The SMS defaults are: gaps preceding = 0 (free), internal gaps = -2 (penalty of 2), gaps following = 0 (free). This allows one sequence to extend beyond the other without penalty.
What is the Schneider et al. (2005) codon matrix?
A 64×64 empirical substitution matrix that scores every codon pair, derived from codon substitutions observed in alignments of vertebrate coding sequences. Synonymous substitutions typically score higher than non-synonymous ones; stop codons receive large negative scores.
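
To illustrate the default gap values from the FAQ above, the following sketch adds up the gap contribution for a given number of gap codons of each kind. The names `DEFAULT_GAP_VALUES` and `gap_contribution` are hypothetical, and the values simply mirror the defaults quoted above.

```python
from typing import Dict

# Defaults as quoted in the FAQ: end gaps are free, internal gaps cost 2.
DEFAULT_GAP_VALUES: Dict[str, int] = {"preceding": 0, "internal": -2, "following": 0}


def gap_contribution(counts: Dict[str, int], values: Dict[str, int] = DEFAULT_GAP_VALUES) -> int:
    """Sum the per-codon gap values over the number of gap codons of each kind."""
    return sum(values[kind] * counts.get(kind, 0) for kind in values)
```

With the defaults, four preceding, one internal, and three following gap codons contribute only the single internal penalty. If all three values were set to -2 instead, the same eight gap codons would subtract 16 points.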