Codon Usage - TheBiologyBro

💡 Quick Summary

Codon Usage accepts one or more DNA sequences and returns the number and frequency of each codon type. Since it also compares the frequencies of codons that encode the same amino acid (synonymous codons), you can use it to assess whether a sequence shows a preference for particular synonymous codons — a property known as codon usage bias.

📋 How to Use

Paste a raw DNA sequence or one or more FASTA sequences into the input area. Input limit is 500,000,000 characters.
Choose a Genetic code from the dropdown. The standard code is selected by default.
Click Run. Each input sequence produces its own codon usage table in GCG format.
The table columns are: AmAcid (amino acid name), Codon (triplet), Number (raw count), /1000 (frequency per 1,000 codons), Fraction (fraction of synonymous codons).
Use the Copy button to copy the result to your clipboard. The output format is compatible with the Codon Usage Database at kazusa.or.jp.
Click Load Example to try with two sample coding sequences using the standard genetic code.

🧮 Formulas & Logic

/1000

codon_count × 1000 / total_codons_in_sequence

Fraction

codon_count / total_count_of_synonymous_codons_for_that_amino_acid
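
The two formulas can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's own code: the function name codon_usage is hypothetical, only the standard genetic code is built in, and skipping codons that contain ambiguity letters is an assumption.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_usage(seq):
    """Return {codon: (amino_acid, number, per_thousand, fraction)}."""
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Assumption: codons containing ambiguity letters such as N are skipped.
    counts = Counter(c for c in codons if c in CODON_TABLE)
    total = sum(counts.values())

    # Total count of all codons that encode the same amino acid.
    synonymous_total = Counter()
    for codon, n in counts.items():
        synonymous_total[CODON_TABLE[codon]] += n

    table = {}
    for codon, aa in CODON_TABLE.items():
        n = counts[codon]
        per_thousand = n * 1000 / total if total else 0.0
        fraction = n / synonymous_total[aa] if synonymous_total[aa] else 0.0
        table[codon] = (aa, n, per_thousand, fraction)
    return table


usage = codon_usage("ATGGCTGCTGCCTAA")  # ATG GCT GCT GCC TAA
print(usage["GCT"])  # GCT is 2 of 5 codons overall and 2 of 3 alanine codons
```

In this example GCT has a /1000 value of 400 (2 × 1000 / 5) and a Fraction of about 0.67 (2 / 3), because the two denominators differ: all codons in the sequence for /1000, only alanine codons for Fraction.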

📊 Result Interpretation

Sequences Processed

Number of input sequences successfully processed.

Total Codons

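Sum of all codons counted across all sequences. This is a summary figure only: the /1000 and Fraction values in each table are calculated from that sequence's own counts.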

Fraction near 1.0

This codon dominates usage for its amino acid — strong codon preference.

Fraction near 0

This codon is rarely or never used for its amino acid in the input.

High /1000

The codon appears frequently relative to total codon count.
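
When reading Fraction values, keep in mind that an amino acid with a single codon in the chosen code (Met and Trp in the standard code) always shows a Fraction of 1.0 when it occurs, which says nothing about preference. The sketch below labels codons accordingly. The function name classify_fractions and the 0.9 and 0.1 cut-offs are illustrative assumptions, not thresholds used by the tool.

```python
def classify_fractions(rows, high=0.9, low=0.1):
    """Label each codon from (amino_acid, codon, fraction) rows.

    The `high` and `low` cut-offs are arbitrary illustrative values.
    """
    # Count how many codons each amino acid has in the supplied table.
    family_size = {}
    for amino_acid, _codon, _fraction in rows:
        family_size[amino_acid] = family_size.get(amino_acid, 0) + 1

    labels = {}
    for amino_acid, codon, fraction in rows:
        if family_size[amino_acid] == 1:
            labels[codon] = "only codon"  # Fraction is uninformative here
        elif fraction >= high:
            labels[codon] = "dominant"
        elif fraction <= low:
            labels[codon] = "rare"
        else:
            labels[codon] = "intermediate"
    return labels


rows = [
    ("Met", "ATG", 1.0),
    ("Ala", "GCT", 0.95),
    ("Ala", "GCC", 0.05),
    ("Ala", "GCA", 0.0),
    ("Ala", "GCG", 0.0),
]
print(classify_fractions(rows))
```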

🔬 Applications

Assessing codon usage bias in a gene before expressing it in a heterologous host
Comparing the codon usage of a synthetic gene against a host organism's preference
Generating a GCG-format codon usage table that can be compared with tables from the Codon Usage Database
Identifying over- or under-represented codons in a set of genes
Checking whether stop codon usage is consistent with the expected genetic code

⚠️ Common Mistakes & Warnings

Sequence length not a multiple of 3

If the cleaned sequence length is not divisible by 3, the last 1 or 2 bases are ignored. A warning is shown for each affected sequence.
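
The trimming rule can be illustrated as follows. This is a sketch of the described behaviour only: the function name split_codons and the wording of the warning are assumptions.

```python
def split_codons(seq):
    """Split a cleaned sequence into codons, ignoring 1 or 2 trailing bases."""
    leftover = len(seq) % 3
    usable = seq[:len(seq) - leftover]
    codons = [usable[i:i + 3] for i in range(0, len(usable), 3)]
    warning = None
    if leftover:
        warning = (
            f"length {len(seq)} is not a multiple of 3; "
            f"last {leftover} base(s) ignored"
        )
    return codons, warning


print(split_codons("ATGGCTA"))  # the trailing A is dropped and a warning is returned
print(split_codons("ATGGCT"))   # no warning
```

If a warning appears for a sequence you expected to be a complete coding sequence, check that the input starts at the first base of the start codon, since codons are read from the first base onward and an offset changes every codon.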

Non-DNA characters are stripped

Any character that is not a valid IUPAC DNA/RNA letter is removed before counting. RNA sequences (with U) are treated as DNA (U → T).
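
A cleaning step of this kind might look like the sketch below. The exact set of letters the tool keeps is not documented here, so the IUPAC nucleotide alphabet used in the code, the case-insensitive handling, and the function name clean_sequence are all assumptions. The function is meant for sequence lines only, not FASTA header lines.

```python
import re

# Assumption: the full IUPAC nucleotide alphabet (bases plus ambiguity codes).
IUPAC_LETTERS = "ACGTURYSWKMBDHVN"


def clean_sequence(raw):
    """Uppercase, drop every non-IUPAC character, then convert U to T."""
    upper = raw.upper()
    kept = re.sub(f"[^{IUPAC_LETTERS}]", "", upper)
    return kept.replace("U", "T")


print(clean_sequence("aug gcu-123\nNNx"))  # spaces, digits, "-" and "x" are removed
```

Because stripping happens before codons are counted, stray characters in the middle of a sequence shift nothing, but missing bases still do, so it is worth checking the cleaned length against the expected length.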

Each sequence is analysed independently

Counts are not pooled across sequences. Each FASTA record produces its own separate codon usage table.
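
The per-record behaviour can be sketched as below, with one counter per FASTA record and no shared totals. The helper names read_fasta and count_per_record, and the fallback name "sequence" for input without a header, are hypothetical.

```python
from collections import Counter


def read_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA or raw text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if name is not None or chunks:
                records.append((name or "sequence", "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None or chunks:
        records.append((name or "sequence", "".join(chunks)))
    return records


def count_per_record(text):
    """Count codons separately for each record; nothing is pooled."""
    tables = []
    for name, seq in read_fasta(text):
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3
        counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
        tables.append((name, counts))
    return tables


for name, counts in count_per_record(">a\nATGGCT\nGCT\n>b\nATGATG\n"):
    print(name, dict(counts))
```

To obtain pooled usage for a gene set, one option under this model is to concatenate in-frame coding sequences into a single record before running the analysis.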

❓ Frequently Asked Questions

What is codon usage bias?

Most amino acids are encoded by several synonymous codons, but organisms do not use all of them equally. This unequal usage, known as codon usage bias, often correlates with the abundance of the corresponding tRNAs. A gene rich in codons that are rare in the expression host (low Fraction and low /1000 in the host's codon usage table) may be translated slowly or inaccurately in that host.

Which genetic code should I choose?

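Choose the code that matches both the organism and the genome compartment your sequence comes from. Nuclear genes from animals, plants, and fungi typically use the standard code (1). Mitochondrial genes often use lineage-specific codes, such as the vertebrate mitochondrial code (2), while plastid genes generally use the bacterial, archaeal and plant plastid code (11). Ciliates and some other protists use alternative nuclear codes.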
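
The choice matters because the genetic code decides which codons count as synonymous, and therefore which counts go into each Fraction denominator. The sketch below compares the standard code with the vertebrate mitochondrial code (NCBI translation table 2), which differs at four codons. The variable and function names are illustrative, and it is an assumption that the dropdown follows NCBI table numbering.

```python
from itertools import product

# Standard genetic code (NCBI table 1), built in TCAG order; "*" is a stop.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
standard = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}

# Vertebrate mitochondrial code (NCBI table 2) differs at four codons.
VERT_MITO_CHANGES = {"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}
vert_mito = {**standard, **VERT_MITO_CHANGES}


def synonymous_codons(table, amino_acid):
    """List the codons that a genetic code assigns to one amino acid or stop."""
    return sorted(codon for codon, aa in table.items() if aa == amino_acid)


print(synonymous_codons(standard, "W"))   # tryptophan codons, standard code
print(synonymous_codons(vert_mito, "W"))  # TGA joins TGG in the mitochondrial code
print(synonymous_codons(vert_mito, "*"))  # stop codons, mitochondrial code
```

Analysing a vertebrate mitochondrial gene with the standard code would count TGA among the stop codons instead of grouping it with TGG, so the Fraction values for both tryptophan and the stop codons would be misleading.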

How is the output format structured?

The output is in GCG format: a header line followed by one row per codon grouped by amino acid. Columns are: amino acid name (AmAcid), codon triplet (Codon), raw count (Number), frequency per 1,000 codons (/1000), and fraction of synonymous codons (Fraction). This format is used by the Codon Usage Database at kazusa.or.jp.
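
If you want to reuse a copied table in your own scripts, a tolerant parser along the following lines may be enough. It assumes only what is described above, namely five whitespace-separated columns per data row. The function name parse_usage_table, the dictionary keys, and the sample text are illustrative, and the exact header wording and column spacing of the real output may differ.

```python
def parse_usage_table(text):
    """Parse rows of: AmAcid Codon Number /1000 Fraction."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # blank lines and anything that is not a five-column row
        amino_acid, codon, number, per_thousand, fraction = fields
        try:
            rows.append({
                "amino_acid": amino_acid,
                "codon": codon,
                "number": float(number),
                "per_thousand": float(per_thousand),
                "fraction": float(fraction),
            })
        except ValueError:
            continue  # header line: its last three fields are not numbers
    return rows


sample = """AmAcid  Codon  Number  /1000   Fraction

Ala     GCT    2.00    400.00  0.67
Ala     GCC    1.00    200.00  0.33
"""
for row in parse_usage_table(sample):
    print(row["codon"], row["number"], row["fraction"])
```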

Can I use the output with other tools?

Yes. The GCG-format output is compatible with the Codon Plot tool — you can paste it directly into the Codon Plot codon table field to visualise usage patterns as a bar chart.

Can I process multiple sequences at once?

Yes. Paste any number of FASTA-formatted sequences. Each produces its own codon usage table. Counts are not pooled — each table reflects only that sequence.