GenBank Trans Extractor
Extract annotated protein translations from GenBank records as FASTA sequences

Paste one or more GenBank records. Each must begin with a "LOCUS" line, contain a FEATURES section with /translation qualifiers, and end with "//". Input limit: 200,000,000 characters.

💡 Quick Summary

GenBank Trans Extractor reads one or more GenBank records and returns every protein translation annotated in the feature table as a separate FASTA entry. It scans for features that carry a /translation qualifier (typically CDS features), so you can pull predicted protein sequences out of a GenBank file without writing any code.

📋 How to Use
  1. Paste the contents of one or more GenBank files into the input area. Each record must begin with a LOCUS line, contain a FEATURES section, and end with "//".
  2. Click Extract Translations. Each CDS (or other feature) that carries a /translation qualifier is output as a FASTA entry. When the input holds more than one record, each record's entries are grouped under an "=== record title ===" header line.
  3. The Extraction Summary panel shows how many records were processed and how many protein sequences were found.
  4. The Processing Warnings panel appears if any records had no /translation qualifiers or could not be fully parsed.
  5. Use the Copy button to copy all protein sequences to your clipboard.
  6. Click Load Example to try the tool with the Strongylocentrotus purpuratus fascin (FSCN1) record — a real GenBank entry with a complete CDS translation.
  7. Click Clear to reset.
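
The record boundaries described in step 1 (a LOCUS line through the closing "//") can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: the function name split_records is hypothetical, and the choice to discard a record that never reaches its "//" terminator is a guess at reasonable behaviour.

```python
def split_records(text: str) -> list[str]:
    """Return every complete LOCUS ... // block found in the pasted text."""
    records = []
    current = None  # lines of the record being collected, or None between records
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            current = [line]          # a new record starts here
        elif current is not None:
            current.append(line)
            if line.strip() == "//":  # the terminator closes the record
                records.append("\n".join(current))
                current = None
    return records
```

Text outside any LOCUS ... // block is ignored, and an unterminated record at the end of the input is dropped rather than returned.
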
🧮 Formulas & Logic
Translation source
/translation qualifier value, extracted verbatim from the GenBank feature table — no in-browser translation of the DNA is performed
Multi-line handling
Continuation lines within a /translation value are concatenated, and non-amino-acid characters such as whitespace and asterisks are stripped before output
Output wrapping
Protein sequence wrapped at 60 characters per line
FASTA header
>featureKey /firstQualifier (e.g. >CDS /gene="FSCN1")
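
The four rules above can be combined into one short Python sketch. It assumes the standard GenBank flat-file layout (feature keys indented 5 spaces, qualifiers indented 21 spaces) and treats any non-letter as a non-amino-acid character. The names extract_translations, FEATURE_KEY, QUALIFIER and LINE_WIDTH are hypothetical and chosen for illustration only.

```python
import re

FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+\S")        # e.g. "     CDS             1..30"
QUALIFIER = re.compile(r"^ {21}/(\w+)(?:=(.*))?$")  # e.g. 21 spaces then /gene="FSCN1"
LINE_WIDTH = 60                                     # output wrapping width


def extract_translations(record: str) -> list[str]:
    """Return one FASTA entry per /translation qualifier in a single record."""
    entries = []
    key = None         # current feature key, e.g. "CDS"
    first_qual = None  # first qualifier of the current feature, used in the header
    fragments = None   # pieces of the /translation value being collected
    in_features = False

    def flush():
        nonlocal fragments
        if fragments is None:
            return
        # Join continuation lines, then drop quotes, whitespace and asterisks.
        seq = re.sub(r"[^A-Za-z]", "", "".join(fragments))
        wrapped = "\n".join(seq[i:i + LINE_WIDTH]
                            for i in range(0, len(seq), LINE_WIDTH))
        entries.append(f">{key} {first_qual}\n{wrapped}")
        fragments = None

    for line in record.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
        elif in_features and line and not line.startswith(" "):
            break  # ORIGIN, CONTIG or // ends the feature table
        elif in_features:
            m_key = FEATURE_KEY.match(line)
            m_qual = QUALIFIER.match(line)
            if m_key:
                flush()
                key, first_qual = m_key.group(1), None
            elif m_qual:
                flush()
                if first_qual is None:
                    first_qual = line.strip()
                if m_qual.group(1) == "translation":
                    fragments = [m_qual.group(2) or ""]
            elif fragments is not None:
                fragments.append(line)  # continuation of the /translation value
    flush()
    return entries
```

No codon table appears anywhere in the sketch, which matches the rule that the protein is copied from the qualifier rather than computed from the DNA.
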
📊 Result Interpretation
Records Processed

Number of GenBank records (LOCUS … //) found in the input.

Translations Extracted

Total number of FASTA protein sequences returned across all records. One entry per /translation qualifier found.

🔬 Applications
  • Retrieving predicted protein sequences from NCBI GenBank downloads without local bioinformatics software
  • Checking all annotated CDS translations in a multi-record GenBank file in a single pass
  • Feeding extracted protein sequences directly into BLAST or multiple-sequence alignment tools
  • Comparing the annotated translation against a de-novo translation to spot codon-table mismatches
  • Collecting protein sequences for proteomics or homology modelling from annotated genome regions
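
For the codon-table check mentioned in the list above, a minimal Python sketch might look like this. It translates with the standard genetic code (NCBI table 1) and reports positions where the annotated protein disagrees. The function names are hypothetical, and the sketch assumes cds is the already spliced, in-frame coding sequence, which this tool does not produce for you.

```python
from itertools import product, zip_longest

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def find_mismatches(annotated: str, cds: str) -> list[tuple[int, str, str]]:
    """Return (position, annotated residue, de-novo residue) for each difference."""
    de_novo = translate(cds)
    if de_novo.endswith("*"):
        de_novo = de_novo[:-1]  # drop only the terminal stop codon
    return [(pos, a, b)
            for pos, (a, b) in enumerate(zip_longest(annotated, de_novo,
                                                     fillvalue="-"), start=1)
            if a != b]
```

An internal "*" in the de-novo column next to a real residue in the annotated column is the typical sign that the record was annotated with a different codon table, for example one that reads TGA as tryptophan.
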
⚠️ Common Mistakes & Warnings
Only /translation qualifiers are extracted — no in-browser translation

This tool reads the protein sequence that the database curator already annotated in the /translation field. It does not translate the DNA sequence itself. If a CDS feature lacks a /translation qualifier, it will not appear in the output.

Only the DEFINITION line is used as the record title

Multi-line DEFINITION fields are truncated to the text before the ACCESSION keyword. If the definition spans many lines, only the first portion appears in the output header.
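
One possible reading of this rule, sketched in Python: capture everything between the DEFINITION keyword and the ACCESSION keyword and collapse it onto one line. The function name record_title and the "Untitled record" fallback are assumptions, and the tool itself may shorten the title further than this sketch does.

```python
import re


def record_title(record: str) -> str:
    """Return the DEFINITION text that precedes the ACCESSION keyword."""
    match = re.search(r"^DEFINITION\s+(.*?)(?=^ACCESSION)", record,
                      flags=re.MULTILINE | re.DOTALL)
    if match is None:
        return "Untitled record"  # hypothetical fallback, not from the tool
    # Collapse line breaks and indentation from wrapped DEFINITION lines.
    return " ".join(match.group(1).split())
```
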

Records without a FEATURES section are skipped

If a GenBank record has no FEATURES section, it is skipped and a warning is shown in the Processing Warnings panel.
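
The skip-and-warn behaviour, together with the warning for records that have no /translation qualifiers, could be sketched in Python like this. The function name and the warning wording are invented for illustration; the real panel text may differ.

```python
def record_warning(record: str, index: int):
    """Return a warning string for a record, or None if it looks usable."""
    lines = record.splitlines()
    if not any(line.startswith("FEATURES") for line in lines):
        return f"Record {index}: no FEATURES section, record skipped"
    if not any(line.strip().startswith("/translation=") for line in lines):
        return f"Record {index}: no /translation qualifiers found"
    return None
```
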

Stop-codon asterisks are removed

The trailing "*" that some GenBank records include in /translation values is stripped so the output contains only standard single-letter amino acid codes.

❓ Frequently Asked Questions

Why does a CDS feature not appear in the output?
The tool only outputs features that have a /translation qualifier. Some CDS entries in older or partial records omit this qualifier — in that case the DNA sequence is present but no pre-computed protein is stored. Use a dedicated translation tool on the extracted DNA if you need the protein sequence.
Can I paste multiple GenBank records at once?
Yes. Paste any number of complete records (each beginning with "LOCUS" and ending with "//"). Each record is processed independently; results are grouped under a "=== record title ===" header.
What is a /translation qualifier?
In the GenBank feature table, the /translation qualifier stores the pre-computed protein sequence for a CDS feature as a string of single-letter amino acid codes. It is added by database curators or NCBI submission tools using the annotated reading frame and codon table.
Where do I get GenBank-format files?
GenBank flat files are available from NCBI at www.ncbi.nlm.nih.gov. Search for an accession number and use "Send to → File → Format: GenBank (full)" to download.
Does this tool translate the DNA sequence?
No. It reads the /translation qualifier already stored in the GenBank feature table. No codon table or reading-frame arithmetic is applied. Apart from the removal of whitespace and any stop-codon asterisk, the output is the sequence that the original database submitter or NCBI pipeline calculated.
What if the protein sequence spans multiple lines in the GenBank file?
GenBank files wrap long /translation values across several indented continuation lines. The tool automatically concatenates those fragments and strips the whitespace before outputting the complete sequence.