Reverse Translate - TheBiologyBro

💡 Quick Summary

Reverse Translate accepts a protein sequence and uses a codon usage table to generate a DNA sequence representing the most likely non-degenerate coding sequence. A consensus sequence derived from all the possible codons for each amino acid is also returned. Use Reverse Translate when designing PCR primers to anneal to an unsequenced coding sequence from a related species.

📋 How to Use

Paste a raw protein sequence or one or more FASTA sequences into the top textarea. Valid single-letter amino acid codes are accepted. Stop codons (*) are supported. Input limit: 20,000,000 characters.
Review or replace the codon usage table. The default is the standard E. coli GCG-format table. You can paste any GCG-format table from the Codon Usage Database.
Click Run. The tool outputs: (1) most likely codons, (2) consensus (IUPAC degenerate) codons, and (3) a base probability graph for each sequence.
Use the Copy buttons to copy any output section to your clipboard.
Click Load Example to try the tool with a sample 20-amino-acid sequence.
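Here is a rough Python sketch of how raw and FASTA input could be split into separate sequences. It assumes that a line starting with ">" begins a new record. It also assumes that any character outside the 20 standard one-letter codes, X and * is silently dropped. The function name parse_protein_input and the placeholder title "sequence" are hypothetical and not part of the tool.

```python
# Accepted residue symbols: the 20 standard one-letter codes, X (unknown) and * (stop).
# Dropping every other character is an assumption about the tool's behaviour.
VALID = set("ACDEFGHIKLMNPQRSTVWYX*")


def parse_protein_input(text: str) -> list[tuple[str, str]]:
    """Split raw or FASTA-formatted text into (title, sequence) pairs."""
    records = []
    title, chunks = None, []

    def flush():
        # Save the current record if it has a header or any sequence text.
        if title is not None or any(chunks):
            seq = "".join(chunks).upper()
            records.append((title or "sequence", "".join(c for c in seq if c in VALID)))

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            title, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    flush()
    return records


print(parse_protein_input(">p1 test\nMKV\nLA*\n>p2\nmgx"))
```

Raw input with no header comes back as a single record titled "sequence". Lowercase letters are upper-cased before filtering.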

🧮 Formulas & Logic

Most likely codon

For each amino acid, the codon with the highest fraction value in the codon usage table is selected
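Here is a minimal Python sketch of this selection rule. It assumes the usage table has already been reduced to a dictionary of codon fractions for each one-letter amino acid code. The fractions in TABLE are illustrative placeholders, not values copied from the Codon Usage Database, and the function names are hypothetical.

```python
# Illustrative fractions only (not real E. coli data): amino acid -> {codon: fraction}.
TABLE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13, "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
}


def most_likely_codon(aa: str, table: dict) -> str:
    """Return the codon with the highest fraction for one amino acid."""
    codons = table[aa]
    # max() keeps the first codon it meets when two fractions tie.
    return max(codons, key=codons.get)


def most_likely_sequence(protein: str, table: dict) -> str:
    """Concatenate the most likely codon for every residue."""
    return "".join(most_likely_codon(aa, table) for aa in protein)


print(most_likely_sequence("MKL", TABLE))  # ATGAAACTG
```

When two codons tie, the sketch keeps whichever one appears first in the dictionary.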

Consensus codon

IUPAC degenerate code derived from all bases with non-zero frequency at each codon position
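The IUPAC lookup can be sketched in Python as follows. At each of the three positions, it collects the bases of every codon with a non-zero fraction and looks up the matching ambiguity code. The input format (a codon-to-fraction dictionary for one amino acid) and the function name are assumptions for illustration.

```python
# IUPAC nucleotide ambiguity codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus_codon(codon_fractions: dict) -> str:
    """Degenerate codon covering every codon with a non-zero fraction."""
    used = [codon for codon, fraction in codon_fractions.items() if fraction > 0]
    return "".join(IUPAC[frozenset(codon[i] for codon in used)] for i in range(3))


leucine = {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13, "CTT": 0.12, "CTC": 0.10, "CTA": 0.04}
print(consensus_codon(leucine))  # YTN
```

Codons with a fraction of 0 are ignored, so they do not widen the ambiguity code.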

Base frequency

Sum of the fractions of all codons that carry a given base at a given position (fractions are recalculated from the /1000 column to correct entries that the database incorrectly lists as 0)
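Here is a sketch of this summation. It assumes the fractions have already been recalculated and are supplied as a codon-to-fraction dictionary for one amino acid. The function name is hypothetical.

```python
def base_frequencies(codon_fractions: dict) -> list[dict]:
    """Return one {base: frequency} dict for each codon position (1st, 2nd, 3rd)."""
    freqs = [{base: 0.0 for base in "GATC"} for _ in range(3)]
    for codon, fraction in codon_fractions.items():
        # Each codon adds its fraction to the base it carries at every position.
        for position, base in enumerate(codon):
            freqs[position][base] += fraction
    return freqs


lysine = {"AAA": 0.75, "AAG": 0.25}
for position, freq in enumerate(base_frequencies(lysine), start=1):
    print(position, freq)
```

For this lysine example, the first two positions get A = 1.00. The third position gets A = 0.75 and G = 0.25.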

Bar scale

Bar length = round(base_frequency × 98) characters, out of a maximum of 98
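One bar could be drawn as text under this rule as shown below. The "=" fill character and the label layout are assumptions. Only the 98-character scale and the two-decimal frequency come from this page.

```python
BAR_MAX = 98  # maximum bar length in characters


def render_bar(label: str, frequency: float) -> str:
    """Draw one bar: label, round(frequency * 98) fill characters, then the frequency."""
    length = round(frequency * BAR_MAX)
    return f"{label} {'=' * length} {frequency:.2f}"


print(render_bar("g", 0.5))  # 49 fill characters followed by 0.50
```

Python's round() sends exact halves to the nearest even integer (24.5 becomes 24). A tool that rounds halves up could therefore draw such a bar one character longer.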

📊 Result Interpretation

Most likely codons

A non-degenerate DNA sequence that picks the statistically most probable codon for each amino acid based on the selected organism's codon usage.

Consensus codons

A degenerate DNA sequence that uses IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N) to represent, at each codon position, every base found in any codon for that amino acid. Useful for designing degenerate PCR primers.

Base probability graph

For each amino acid, three sections (first, second and third codon position) each show four bars (G, A, T, C). A single long bar means one base dominates that position. This gives low degeneracy, which is ideal for primer design.

Bar labels

g = guanine (lowercase), a = adenine (lowercase), T = thymine (uppercase), C = cytosine (uppercase). Each bar is followed by the frequency (0.00–1.00).

🔬 Applications

Designing degenerate PCR primers to clone an unsequenced ortholog from a related species
Generating the most likely DNA coding sequence for a known protein from a specific organism
Identifying codon positions with minimal degeneracy for primer annealing
Producing a codon-usage-optimised synthetic gene design starting from a protein sequence

⚠️ Common Mistakes & Warnings

Codon table must be in GCG format with a ".." header marker

The tool strips everything before the ".." marker and then parses the AmAcid, Codon, Number, /1000 and Fraction columns. Tables from the Codon Usage Database (kazusa.or.jp) are in this format.
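Here is a rough Python parser that follows the same two steps. It discards everything up to the first ".." and then reads five whitespace-separated columns per line. The function name and the choice to skip lines without exactly five fields are assumptions, not the tool's documented behaviour.

```python
def parse_gcg_table(text: str) -> list[tuple]:
    """Parse a GCG-format codon usage table into
    (amino_acid, codon, number, per_thousand, fraction) rows."""
    marker = text.find("..")
    if marker == -1:
        raise ValueError('codon table must contain a ".." header marker')
    rows = []
    # Everything after the marker is data; the rest of the header line is empty.
    for line in text[marker + 2:].splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        aa, codon, number, per_thousand, fraction = fields
        rows.append((aa, codon, float(number), float(per_thousand), float(fraction)))
    return rows
```

Each row keeps the three-letter amino acid name exactly as it appears in the table (for example "Lys", or "End" for stop codons).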

Fractions are recalculated from /1000 values

Some entries in the Codon Usage Database incorrectly list a fraction of 0. The tool automatically recalculates fractions from the /1000 column to correct this.
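This sketch assumes the recalculation divides each codon's /1000 value by the total /1000 for its amino acid. That is a plausible reading of the description, not one it confirms. The step could look like this (the function name is hypothetical):

```python
from collections import defaultdict


def recalc_fractions(rows: list[tuple]) -> dict:
    """rows: list of (amino_acid, codon, per_thousand).
    Returns {codon: per_thousand / total per_thousand for that amino acid}."""
    totals = defaultdict(float)
    for aa, _, per_thousand in rows:
        totals[aa] += per_thousand
    return {
        # Guard against amino acids whose codons all have a /1000 value of 0.
        codon: (per_thousand / totals[aa] if totals[aa] > 0 else 0.0)
        for aa, codon, per_thousand in rows
    }


print(recalc_fractions([("Lys", "AAA", 30.0), ("Lys", "AAG", 10.0), ("Met", "ATG", 25.0)]))
```

The function iterates over rows twice, so it expects a list rather than a one-shot iterator.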

Unknown amino acids (X) get equal codon frequencies

Residues coded as X are treated as unknown, with every codon equally likely. All four bases receive a frequency of 0.25 at each codon position, producing N at every position of the consensus codon.
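The 0.25 figure follows from equal codon frequencies. If all 64 codons are weighted 1/64, each base occurs in 16 codons at every position, and 16 × 1/64 = 0.25. Here is a short Python check of that arithmetic. The function name is hypothetical, and including all 64 codons (stops among them) is an assumption.

```python
from itertools import product


def uniform_position_frequencies() -> list[dict]:
    """Base frequencies per position when all 64 codons are equally likely."""
    codons = ["".join(p) for p in product("GATC", repeat=3)]
    freqs = [{base: 0.0 for base in "GATC"} for _ in range(3)]
    for codon in codons:
        for position, base in enumerate(codon):
            freqs[position][base] += 1 / len(codons)
    return freqs


print(uniform_position_frequencies()[0])  # {'G': 0.25, 'A': 0.25, 'T': 0.25, 'C': 0.25}
```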

❓ Frequently Asked Questions

What is the difference between "most likely codons" and "consensus codons"?

Most likely codons picks the single most-used codon for each amino acid in the chosen organism, producing an unambiguous DNA sequence. Consensus codons uses IUPAC ambiguity codes to represent all bases that occur with non-zero frequency at each codon position. This produces a degenerate sequence that covers a wider range of possible coding sequences, which makes it useful for PCR primer design.

Which codon usage table should I use?

Use the table for the organism whose coding sequences you want to amplify or mimic. The Codon Usage Database at kazusa.or.jp provides GCG-format tables for thousands of organisms. The default E. coli table is appropriate when the target gene is expressed in E. coli or when the organism is unknown.

How do I design degenerate PCR primers from the output?

Use the consensus codons sequence. Select an 18–24 base region (6–8 amino acids) around a conserved stretch of your protein. Regions where the bar graph shows one dominant base per codon position will produce primers with low degeneracy and high specificity.
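One way to quantify degeneracy is the number of distinct sequences a primer encodes. This is the product of the number of bases behind each IUPAC code. The Python below computes that value and scans codon-aligned windows of a consensus sequence for the lowest one. The function names, the default 18-base window and the codon-aligned step are illustrative choices, and the input is assumed to be a plain string of IUPAC letters.

```python
# Number of bases each IUPAC nucleotide code stands for.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_SIZE[base]
    return total


def lowest_degeneracy_window(consensus: str, length: int = 18) -> tuple[int, int]:
    """(start, degeneracy) of the codon-aligned window with the fewest variants.
    Raises ValueError if the consensus is shorter than the window."""
    starts = range(0, len(consensus) - length + 1, 3)
    return min(((s, primer_degeneracy(consensus[s:s + length])) for s in starts),
               key=lambda item: item[1])


print(primer_degeneracy("ATGAARYTN"))                   # 16
print(lowest_degeneracy_window("ATGAARYTN", length=6))  # (0, 2)
```

When two windows tie, min() returns the earlier one.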

What does the base probability graph show?

For each amino acid in your sequence, the graph displays three sections (first, second, third codon position). Within each section, four bars represent the probability (0.00–1.00) that G, A, T, or C appears at that position across all codons for that amino acid in the usage table. A bar spanning the full width means that base is used exclusively at that position.