💡 Quick Summary
Split Codons divides a coding sequence into three new sequences, each consisting of the bases from one of the three codon positions. For example, "ATGATGATG" is converted to "AAA" (1st positions), "TTT" (2nd positions), and "GGG" (3rd positions). This is useful when you want to analyse codon positions separately in phylogenetic or substitution-rate analyses.
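The core operation can be sketched in a few lines of Python. This is only an illustration, not the tool's own code: the function name `split_codons` is hypothetical, and the input is assumed to be an already cleaned sequence string.

```python
def split_codons(seq: str) -> tuple[str, str, str]:
    """Return the bases at codon positions 1, 2, and 3 of seq."""
    # Ignore a trailing partial codon so all three outputs have equal length.
    usable = len(seq) - len(seq) % 3
    seq = seq[:usable]
    # Slicing with a step of 3 picks every third base from each start offset.
    return seq[0::3], seq[1::3], seq[2::3]


print(split_codons("ATGATGATG"))  # ('AAA', 'TTT', 'GGG')
```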
📋 How to Use
- Paste one or more FASTA-formatted coding sequences into the input area. The input limit is 500,000,000 characters.
- Click Run. Each input sequence produces three output FASTA entries — one per codon position — with the codon position and lengths recorded in the FASTA title.
- If a sequence does not end on a codon boundary, the trailing partial codon is removed automatically and a warning is shown in the Warnings panel.
- The Summary panel shows how many sequences were processed and the total number of complete codons extracted.
- Use the Copy button to copy all output to your clipboard.
- Click Load Example to try the tool with a sample coding sequence that contains alignment gaps.
- Click Clear to reset.
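For readers who want to reproduce the workflow above outside the browser, the following Python sketch shows one possible way to turn FASTA input into three entries per record. The helper names `parse_fasta` and `split_fasta` are hypothetical, and the output title layout is an assumption; the tool's actual title wording may differ. Digit and whitespace cleanup inside the sequence body is omitted here for brevity.

```python
def parse_fasta(text):
    """Yield (title, sequence) pairs from FASTA-formatted text."""
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def split_fasta(text):
    """Return FASTA text with one entry per codon position per record."""
    out = []
    for title, seq in parse_fasta(text):
        # Drop a trailing partial codon, if any.
        seq = seq[: len(seq) - len(seq) % 3]
        for pos in range(3):
            bases = seq[pos::3]
            # Assumed title layout, for illustration only.
            out.append(f">{title} codon position {pos + 1}, length {len(bases)}")
            out.append(bases)
    return "\n".join(out)


example = ">demo\nATGGCC\nAAA-TT\n"
print(split_fasta(example))
```

The gap character in the example record is carried through like any other symbol, so the three outputs stay aligned with each other.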
🧮 Formulas & Logic
📊 Result Interpretation
- Sequences processed: the number of FASTA entries successfully split.
- Total complete codons: the sum of complete (non-partial) codons across all input sequences. Each input sequence contributes floor(length / 3) codons.
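The two summary values can be checked by hand with a short Python sketch. The function name `summarize` is hypothetical, and the sequences are assumed to be cleaned strings; integer division by 3 plays the role of floor(length / 3).

```python
def summarize(sequences):
    """Return (number of sequences, total number of complete codons)."""
    # len(seq) // 3 discards any trailing partial codon from the count.
    total_codons = sum(len(seq) // 3 for seq in sequences)
    return len(sequences), total_codons


# Lengths 9, 6, and 8 contribute 3, 2, and 2 complete codons.
print(summarize(["ATGATGATG", "ATGCCA", "ATGAAATT"]))  # (3, 7)
```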
🔬 Applications
- Separating third-codon-position sites (where most substitutions are synonymous) from first and second positions for substitution-rate analysis
- Creating codon-position-partitioned alignments for phylogenetic inference (e.g. MrBayes, IQ-TREE partition files)
- Comparing substitution rates across codon positions as a rough proxy for synonymous (dS) versus non-synonymous (dN) change
- Checking compositional bias at wobble (3rd) positions vs. constrained (1st/2nd) positions
- Splitting an aligned CDS to feed each position class into a separate nucleotide model
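As one illustration of the compositional-bias use case, the sketch below computes GC content separately for each codon position. The function name `gc_by_position` is hypothetical, and the choice to skip gaps and ambiguity codes when counting is an assumption made for this example, not a behaviour of the tool.

```python
def gc_by_position(seq):
    """Return the GC fraction at codon positions 1, 2, and 3."""
    seq = seq.upper()
    seq = seq[: len(seq) - len(seq) % 3]  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        # Count only unambiguous bases; gaps and codes such as N are skipped.
        bases = [b for b in seq[pos::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else float("nan"))
    return fractions


print(gc_by_position("ATGGCCGCG"))
```

In this toy sequence every third-position base is G or C, so the last value is 1.0 while the first two positions are lower.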
⚠️ Common Mistakes & Warnings
If a coding sequence length is not a multiple of 3, the trailing 1 or 2 bases form an incomplete codon. They are removed before splitting, and a warning is shown listing the affected sequence and how many bases were trimmed.
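The trimming rule can be sketched as follows. The function name `trim_partial_codon` and the warning wording are hypothetical and only meant to show the logic.

```python
def trim_partial_codon(name, seq):
    """Return (trimmed sequence, warning message or None)."""
    extra = len(seq) % 3  # 0, 1, or 2 leftover bases
    if extra == 0:
        return seq, None
    # Illustrative warning text; the tool's message may be worded differently.
    warning = f"{name}: removed {extra} trailing base(s) that did not form a complete codon"
    return seq[:-extra], warning


print(trim_partial_codon("seq1", "ATGATGAT"))
```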
The tool removes all digits and whitespace from the sequence body before splitting. This strips GenBank-style line numbering if present. Gap characters such as dashes (-) and dots (.) are kept, allowing aligned sequences to be split position-by-position.
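A minimal Python sketch of this cleanup step, assuming a regular expression is an acceptable stand-in for whatever the tool does internally (the function name `clean_sequence` is hypothetical):

```python
import re


def clean_sequence(body):
    """Remove digits and whitespace; keep bases, dashes, and dots."""
    return re.sub(r"[\d\s]+", "", body)


# A GenBank-style line with a leading position number and spaced blocks.
raw = "        1 atggcc---a aa.tttggg\n"
print(clean_sequence(raw))  # atggcc---aaa.tttggg
```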