Split Codons
Divide a coding sequence into its three codon-position subsequences

Paste one or more FASTA-formatted coding sequences. Gap characters (- and .) are preserved. Input limit: 500,000,000 characters.

💡 Quick Summary

Split Codons divides a coding sequence into three new sequences, each consisting of the bases from one of the three codon positions. For example, "ATGATGATG" is converted to "AAA" (1st positions), "TTT" (2nd positions), and "GGG" (3rd positions). This is useful when you want to analyse codon positions separately in phylogenetic or substitution-rate analyses.
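The core operation amounts to taking every third character starting at offset 0, 1 or 2. The Python sketch below shows this with extended slicing. It is an illustration, not the tool's own code; the function name `split_positions` is hypothetical, and this minimal version does not trim a trailing partial codon.

```python
def split_positions(seq: str) -> tuple[str, str, str]:
    """Return the 1st-, 2nd- and 3rd-position bases of a coding sequence."""
    # seq[k::3] starts at index k and steps by 3, i.e. one base per codon
    return seq[0::3], seq[1::3], seq[2::3]


print(split_positions("ATGATGATG"))  # ('AAA', 'TTT', 'GGG')
```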

📋 How to Use
  1. Paste one or more FASTA-formatted coding sequences into the input area. Input limit is 500,000,000 characters.
  2. Click Run. Each input sequence produces three output FASTA entries, one per codon position, with the codon position and sequence length recorded in each FASTA title.
  3. If a sequence length is not a multiple of 3, the trailing partial codon is removed automatically and a warning is shown in the Warnings panel.
  4. The Summary panel shows how many sequences were processed and the total number of complete codons extracted.
  5. Use the Copy button to copy all output to your clipboard.
  6. Click Load Example to try the tool on a sample coding sequence that contains alignment gaps.
  7. Click Clear to reset.
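The steps above can be sketched end to end in Python: parse the FASTA input, strip digits and whitespace, trim any partial codon with a warning, split, and tally the Summary counts. This is a minimal sketch under stated assumptions, not the tool's implementation. The helper names (`parse_fasta`, `split_codons`), the warning text and the output title format are all hypothetical, and lines before the first ">" are simply ignored.

```python
import re


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (title, raw body) pairs; lines before the first '>' are ignored."""
    records: list[tuple[str, str]] = []
    title, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def split_codons(text: str) -> tuple[str, list[str], int, int]:
    """Return (FASTA output, warnings, sequences processed, total complete codons)."""
    records = parse_fasta(text)
    out: list[str] = []
    warnings: list[str] = []
    total_codons = 0
    for title, body in records:
        # Strip digits and whitespace; gap characters such as '-' and '.' are kept
        seq = re.sub(r"[\d\s]+", "", body)
        extra = len(seq) % 3
        if extra:
            warnings.append(f"{title}: trimmed {extra} trailing base(s)")
            seq = seq[: len(seq) - extra]
        total_codons += len(seq) // 3
        for pos in range(3):
            part = seq[pos::3]
            # Hypothetical title format; the tool's actual wording may differ
            out.append(f">{title} codon position {pos + 1} (length {len(part)})\n{part}")
    return "\n".join(out), warnings, len(records), total_codons


fasta, warns, n, codons = split_codons(">s1\nATGATGATGA\n>s2\nATG-CC\n")
print(n, codons)  # 2 5
print(warns)      # ['s1: trimmed 1 trailing base(s)']
```

Here "s1" has 10 bases, so one trailing base is dropped and it contributes 3 codons; "s2" (with a gap) contributes 2.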
🧮 Formulas & Logic
Position 1 bases
Every 1st base of each triplet: seq[0], seq[3], seq[6], …
Position 2 bases
Every 2nd base of each triplet: seq[1], seq[4], seq[7], …
Position 3 bases
Every 3rd base of each triplet: seq[2], seq[5], seq[8], …
Partial codon
If length mod 3 ≠ 0, the trailing 1 or 2 bases are discarded before splitting
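The partial-codon rule is plain modular arithmetic: a sequence of length n keeps its first n − (n mod 3) characters. A minimal Python sketch (the name `trim_partial_codon` is hypothetical):

```python
def trim_partial_codon(seq: str) -> tuple[str, int]:
    """Drop the trailing 1 or 2 bases of an incomplete codon.

    Returns the trimmed sequence and the number of bases removed (0, 1 or 2).
    """
    removed = len(seq) % 3
    return seq[: len(seq) - removed], removed


print(trim_partial_codon("ATGATGAT"))  # ('ATGATG', 2)
```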
📊 Result Interpretation
Sequences Processed

Number of FASTA entries successfully split.

Total Codons

Sum of complete (non-partial) codons across all input sequences. Each input sequence contributes floor(length / 3) codons.
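As a worked example of the Total Codons figure, sequences of length 9, 10 and 11 each contribute floor(length / 3) = 3 codons, for a total of 9. The sketch below assumes that "length" means the cleaned length, with gap characters counted (they are kept in the sequence); the function name is hypothetical.

```python
def total_codons(lengths: list[int]) -> int:
    """Sum floor(length / 3) over the cleaned lengths of all input sequences."""
    return sum(n // 3 for n in lengths)


print(total_codons([9, 10, 11]))  # 9
```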

🔬 Applications
  • Separating third-codon-position (largely synonymous) sites from first and second positions for substitution-rate analysis
  • Creating codon-position-partitioned alignments for phylogenetic inference (e.g. MrBayes, IQ-TREE partition files)
  • Comparing substitution rates among codon positions as a rough proxy for synonymous vs. non-synonymous change (formal dS and dN estimation works on whole codons)
  • Checking compositional bias at wobble (3rd) positions vs. constrained (1st/2nd) positions
  • Splitting an aligned CDS to feed each position class into a separate nucleotide model
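For the compositional-bias application, a common summary is GC content at each codon position (often written GC1, GC2 and GC3). The Python sketch below computes it from a single coding sequence. Counting only A, C, G and T, so that gaps and ambiguity codes are ignored, is an assumption of this sketch, as is the function name.

```python
def gc_by_position(seq: str) -> list[float]:
    """GC fraction at codon positions 1, 2 and 3 (GC1, GC2, GC3).

    Only A, C, G and T are counted, so gaps and ambiguity codes are ignored.
    Returns NaN for a position with no countable bases.
    """
    seq = seq.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    result = []
    for pos in range(3):
        bases = [b for b in seq[pos::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        result.append(gc / len(bases) if bases else float("nan"))
    return result


print(gc_by_position("ATGGCCAAG"))  # GC1 = 1/3, GC2 = 1/3, GC3 = 1.0
```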
⚠️ Common Mistakes & Warnings
Sequences not divisible by 3

If a coding sequence length is not a multiple of 3, the trailing 1 or 2 bases form an incomplete codon. They are removed before splitting, and a warning is shown listing the affected sequence and how many bases were trimmed.

Digit and whitespace characters are stripped

The tool removes all digits and whitespace from the sequence body before splitting. This strips GenBank-style line numbering if present. Gap characters such as dashes (-) and dots (.) are kept, allowing aligned sequences to be split position-by-position.
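The cleaning step can be approximated with a single regular expression that deletes digits and whitespace and leaves every other character, including "-" and ".", in place. This is a sketch of the described behaviour, not the tool's code; the function name and the GenBank-style sample text are made up for illustration, and letter case is left unchanged.

```python
import re


def clean_sequence(body: str) -> str:
    """Remove digits and whitespace; keep letters and gap characters such as '-' and '.'."""
    return re.sub(r"[\d\s]+", "", body)


genbank_style = """
        1 atgaaa-ccg ttt.ga
       17 gctag
"""
print(clean_sequence(genbank_style))  # atgaaa-ccgttt.gagctag
```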

❓ Frequently Asked Questions

What does "codon position" mean?
In a coding sequence read in triplets (codons), each base occupies one of three positions within its codon. Position 1 is the first base of each codon, position 2 the second, and position 3 (the "wobble" position) the third. Because of the degeneracy of the genetic code, most synonymous substitutions occur at position 3 (a smaller number occur at position 1, e.g. in leucine and arginine codons), which makes it useful to analyse the positions separately.
Can I use aligned (gap-containing) sequences?
Yes. Gap characters such as "-" are treated as ordinary characters and are preserved at their respective codon positions. This lets you split a codon-aligned multiple sequence alignment while keeping alignment columns intact.
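Why alignment columns stay intact: if every row of a codon alignment has the same length, column j of the position-k output comes from original column 3j + k in every row, so the three outputs are themselves alignments. The Python sketch below illustrates this; the equal-length check and the function name are assumptions added for the example, not documented tool behaviour.

```python
def split_alignment(rows: dict[str, str]) -> list[dict[str, str]]:
    """Split equal-length, codon-aligned rows into three position-class alignments."""
    if len({len(s) for s in rows.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    # Column j of position class k is original column 3*j + k
    return [{name: s[k::3] for name, s in rows.items()} for k in range(3)]


aln = {
    "seqA": "ATGAAACCG",
    "seqB": "ATG---CCG",
}
pos1, pos2, pos3 = split_alignment(aln)
# The gapped codon "---" leaves one gap in each of the three outputs
print(pos1)  # {'seqA': 'AAC', 'seqB': 'A-C'}
```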
What happens if my sequence length is not a multiple of 3?
The trailing 1 or 2 bases that form an incomplete codon are automatically removed before splitting. A warning message is shown in the Warnings panel indicating which sequence was affected and how many bases were trimmed.
Can I process multiple sequences at once?
Yes. Paste any number of FASTA-formatted sequences (each starting with ">title") and all will be split in a single run, producing three output entries per input sequence.
How do I use the output in a phylogenetic analysis?
Copy the output and save it as a FASTA file. You can then import the three position-class sequences into phylogenetic software (e.g. MEGA, IQ-TREE, MrBayes) as separate partitions and assign different substitution models to each.
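One way to set up partitions, assuming you want a single concatenated matrix, is to join each taxon's three position-class sequences end to end and record the column range of each block. The sketch below writes the ranges as RAxML-style partition lines ("DNA, name = start-end"), a format IQ-TREE can also read; MrBayes instead expects charset definitions in a NEXUS file. The function name and the block names pos1 to pos3 are hypothetical.

```python
def build_partitions(
    pos_seqs: dict[str, tuple[str, str, str]],
) -> tuple[dict[str, str], str]:
    """Concatenate each taxon's (position 1, position 2, position 3) strings.

    Returns the concatenated rows and RAxML-style partition lines.
    """
    first = next(iter(pos_seqs.values()))
    lengths = [len(part) for part in first]
    for name, parts in pos_seqs.items():
        if [len(p) for p in parts] != lengths:
            raise ValueError(f"{name}: position-class lengths differ from other taxa")
    concatenated = {name: "".join(parts) for name, parts in pos_seqs.items()}
    lines, start = [], 1  # partition coordinates are 1-based and inclusive
    for k, n in enumerate(lengths, start=1):
        lines.append(f"DNA, pos{k} = {start}-{start + n - 1}")
        start += n
    return concatenated, "\n".join(lines)


rows, partitions = build_partitions({
    "seqA": ("AAC", "TAC", "GAG"),
    "seqB": ("A-C", "T-C", "G-G"),
})
print(partitions)
# DNA, pos1 = 1-3
# DNA, pos2 = 4-6
# DNA, pos3 = 7-9
```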