Pairwise Align Codons - TheBiologyBro

💡 Quick Summary

Pairwise Align Codons accepts two coding sequences and determines the optimal global alignment using the Schneider et al. (2005) 64×64 codon substitution matrix with affine gap penalties. Use Pairwise Align Codons to look for conserved coding sequence regions. Only the bases A, C, G, T and U are used in the alignment.

📋 How to Use

Paste the first coding DNA sequence (raw or FASTA) into Sequence One. Input limit is 6,000,000 characters.
Paste the second coding DNA sequence (raw or FASTA) into Sequence Two. Input limit is 6,000,000 characters.
Both sequences must contain only A, C, G, T (or U) bases, and the length of each sequence must be divisible by 3.
Set the Gap opening penalty — cost paid once when a codon gap is started.
Set the Gap extension penalty — cost paid for each additional codon gap.
Click Submit. The aligned sequences are shown in FASTA format with the alignment score.
Click Load Example to align the two sample coding sequences from the original Sequence Manipulation Suite (SMS).
Use the Copy button to copy the output to your clipboard.

🧮 Formulas & Logic

Alignment algorithm

Hirschberg divide-and-conquer (linear space, O(n) memory) with affine-gap Needleman–Wunsch DP for sub-problems
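
As a rough illustration of the affine-gap dynamic programming step, the sketch below computes only the optimal global alignment score over codons, keeping one row of each DP table in memory at a time. It is not the tool's implementation: it omits the Hirschberg divide-and-conquer traceback, and `toy_score` is a hypothetical stand-in for the Schneider et al. (2005) matrix, which is not reproduced here.

```python
NEG = float("-inf")

def toy_score(a, b):
    # Hypothetical placeholder, NOT the Schneider et al. (2005) matrix.
    return 2 if a == b else -1

def codons(seq):
    # Split an in-frame sequence into 3-character codons.
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def global_affine_score(s1, s2, gap_open, gap_extend, score=toy_score):
    a, b = codons(s1), codons(s2)
    m = len(b)
    # M: column ends in a codon pair
    # X: column ends in a codon of s1 opposite a gap
    # Y: column ends in a gap opposite a codon of s2
    M = [0] + [NEG] * m
    X = [NEG] * (m + 1)
    Y = [NEG] + [-(gap_open + (j - 1) * gap_extend) for j in range(1, m + 1)]
    for i in range(1, len(a) + 1):
        new_m = [NEG] * (m + 1)
        new_x = [NEG] * (m + 1)
        new_y = [NEG] * (m + 1)
        new_x[0] = -(gap_open + (i - 1) * gap_extend)
        for j in range(1, m + 1):
            new_m[j] = max(M[j - 1], X[j - 1], Y[j - 1]) + score(a[i - 1], b[j - 1])
            # Starting a gap costs gap_open; continuing one costs gap_extend.
            new_x[j] = max(M[j] - gap_open, X[j] - gap_extend, Y[j] - gap_open)
            new_y[j] = max(new_m[j - 1] - gap_open,
                           new_y[j - 1] - gap_extend,
                           new_x[j - 1] - gap_open)
        M, X, Y = new_m, new_x, new_y
    return max(M[m], X[m], Y[m])

print(global_affine_score("ATGAAATTT", "ATGTTT", 40, 10))  # -36
```

With the toy scores, the best alignment pairs ATG with ATG and TTT with TTT (2 + 2) and opens one codon gap (40), giving −36.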

Scoring matrix

64×64 codon substitution matrix from Schneider et al. (2005)

Affine gap cost

Gap of k codons costs: gap_open + (k − 1) × gap_extend
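
The formula can be checked with a few lines of code. This is a minimal sketch; the function name `affine_gap_cost` is chosen for illustration, and the values 40 and 10 are the defaults mentioned in the FAQ.

```python
def affine_gap_cost(k: int, gap_open: int, gap_extend: int) -> int:
    """Cost of one contiguous gap spanning k codons (k >= 1)."""
    if k < 1:
        raise ValueError("a gap must span at least one codon")
    return gap_open + (k - 1) * gap_extend

print(affine_gap_cost(1, 40, 10))  # 40
print(affine_gap_cost(3, 40, 10))  # 60
```

A single three-codon gap (60) is therefore cheaper than three separate one-codon gaps (120), which is why affine penalties favor fewer, longer gaps.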

📊 Result Interpretation

Aligned output

Each codon in the output is 3 characters; gaps are represented as ".--" (codon-width placeholder)

Alignment score

Sum of codon substitution scores minus gap penalties; higher is better
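
To make the score definition concrete, the sketch below recomputes a score from two aligned rows by walking them one codon column at a time. It assumes the ".--" gap placeholder described above, and `toy_score` is a hypothetical substitute for the real codon matrix, so the numbers will not match the tool's output.

```python
GAP = ".--"

def toy_score(a, b):
    # Hypothetical placeholder, NOT the Schneider et al. (2005) matrix.
    return 2 if a == b else -1

def alignment_score(row1, row2, gap_open, gap_extend, score=toy_score):
    if len(row1) != len(row2) or len(row1) % 3 != 0:
        raise ValueError("aligned rows must have equal, codon-multiple lengths")
    total = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for i in range(0, len(row1), 3):
        c1, c2 = row1[i:i + 3], row2[i:i + 3]
        if c1 == GAP and c2 == GAP:
            raise ValueError("a column cannot be a gap in both rows")
        if c1 == GAP or c2 == GAP:
            gap_row = 1 if c1 == GAP else 2
            # First codon of a gap pays gap_open, later ones pay gap_extend.
            total -= gap_extend if prev_gap_row == gap_row else gap_open
            prev_gap_row = gap_row
        else:
            total += score(c1, c2)
            prev_gap_row = None
    return total

print(alignment_score("ATGAAATTT", "ATG.--TTT", 40, 10))  # -36
```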

Seq 1 codons

Number of codons (triplets) in the first input sequence

Seq 2 codons

Number of codons (triplets) in the second input sequence

🔬 Applications

Comparing orthologous coding sequences across species at the codon level
Identifying conserved and divergent codons between two homologous genes
Aligning two CDS variants before performing dN/dS (Ka/Ks) analysis
Finding conserved coding sequence regions between distantly related organisms

⚠️ Common Mistakes & Warnings

Sequences must be in-frame coding DNA

The length of each sequence must be divisible by 3. Non-A/C/G/T/U characters are removed before alignment.
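
The cleaning and frame check described here can be approximated as follows. This is a sketch under assumptions: the helper name `clean_coding_sequence` is hypothetical, input is upper-cased before filtering, FASTA header lines are those starting with ">", and the length check runs after filtering. The page does not state how the tool handles any of these details.

```python
import re

def clean_coding_sequence(text: str) -> str:
    # Drop FASTA header lines (assumed to start with ">").
    lines = [ln for ln in text.splitlines() if not ln.startswith(">")]
    # Keep only A, C, G, T and U; upper-casing first is an assumption.
    seq = re.sub(r"[^ACGTU]", "", "".join(lines).upper())
    if len(seq) % 3 != 0:
        raise ValueError(f"length {len(seq)} is not divisible by 3")
    return seq

print(clean_coding_sequence(">seq1\natg aaa\nTTT"))  # ATGAAATTT
```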

Long sequences may be slow

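The Hirschberg algorithm runs in O(nm) time but needs only O(n) space, where n and m are the lengths of the two sequences. Because running time grows with the product of the two lengths, very long sequences may take several seconds or more in the browser.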
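
A quick way to gauge the workload is to count the cells of the codon-level DP grid, which is the product of the two codon counts. The helper below is an illustrative estimate only; it says nothing about actual running time, and Hirschberg's recursion revisits cells, so the true work is a constant factor higher.

```python
def dp_cells(len1_nt: int, len2_nt: int) -> int:
    """Number of codon-by-codon cells for two in-frame sequences."""
    return (len1_nt // 3) * (len2_nt // 3)

print(dp_cells(3_000, 3_000))          # 1000000
print(dp_cells(6_000_000, 6_000_000))  # 4000000000000
```

Two 3,000-nucleotide sequences give a million cells, while two inputs at the 6,000,000-character limit would give four trillion.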

Gap values are per codon, not per nucleotide

Each gap unit represents one missing codon (3 nucleotides). Affine penalties apply at the codon level.

❓ Frequently Asked Questions

What does ".--" mean in the output?

".--" is the codon-width gap placeholder — it occupies the same 3-character width as a real codon so the alignment columns line up.

What gap values should I use?

A gap-open penalty of 40 and a gap-extend penalty of 10 are the defaults for codon alignment. The codon matrix uses larger score values than nucleotide matrices, so gap penalties need to be proportionally higher.

What is the Schneider et al. (2005) codon matrix?

A 64×64 substitution matrix that scores every codon pair based on codon substitutions observed in aligned coding sequences. Synonymous substitutions typically score higher than non-synonymous ones; stop codons receive large negative scores.