💡 Quick Summary
GenBank Trans Extractor reads one or more GenBank records and returns every protein translation annotated in the feature table as a separate FASTA entry. It scans for features that carry a /translation qualifier (typically CDS features), which makes it a quick way to pull annotated protein sequences out of a GenBank file without writing a single line of code.
📋 How to Use
- Paste the contents of one or more GenBank files into the input area. Each record must begin with a LOCUS line, contain a FEATURES section, and end with "//".
- Click Extract Translations. Each CDS (or other feature) that carries a /translation qualifier is output as a FASTA entry. Multiple records are separated by an "=== record title ===" header line.
- The Extraction Summary panel shows how many records were processed and how many protein sequences were found.
- The Processing Warnings panel appears if any records had no /translation qualifiers or could not be fully parsed.
- Use the Copy button to copy all protein sequences to your clipboard.
- Click Load Example to try the tool with the Strongylocentrotus purpuratus fascin (FSCN1) record — a real GenBank entry with a complete CDS translation.
- Click Clear to reset.
🧮 Formulas & Logic
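The tool's own implementation is not shown on this page. As a rough sketch of the logic described in the steps above, extraction could look like the following in Python. The function names, the regular expression, and the FASTA header scheme are all assumptions made for illustration, not the tool's actual code.

```python
import re

# Matches /translation="..." where the quoted value may wrap over several lines.
# Assumption: the text /translation=" does not occur inside other qualifier values.
TRANSLATION_RE = re.compile(r'/translation="([^"]*)"')

def split_records(text):
    """Split pasted GenBank text into records; each record ends with a '//' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if current:
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Text after the last '//' is an unterminated record and is ignored here.
    return records

def extract_translations(record):
    """Return every /translation value in a record with wrapping whitespace removed."""
    return ["".join(value.split()) for value in TRANSLATION_RE.findall(record)]

def to_fasta(title, proteins, width=60):
    """Format proteins as FASTA entries; the header naming scheme is hypothetical."""
    lines = []
    for i, seq in enumerate(proteins, start=1):
        lines.append(f">{title}_protein_{i}")
        lines.extend(seq[j:j + width] for j in range(0, len(seq), width))
    return "\n".join(lines)
```

Scanning for the qualifier text rather than parsing the whole feature table keeps the sketch short, at the cost of ignoring feature locations and other qualifiers.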
📊 Result Interpretation
Records processed: the number of GenBank records (LOCUS … //) found in the input.
Protein sequences found: the total number of FASTA protein sequences returned across all records, one entry per /translation qualifier found.
🔬 Applications
- Retrieving predicted protein sequences from NCBI GenBank downloads without local bioinformatics software
- Checking all annotated CDS translations in a multi-record GenBank file in a single pass
- Feeding extracted protein sequences directly into BLAST or multiple-sequence alignment tools
- Comparing the annotated translation against a de novo translation to spot codon-table mismatches
- Collecting protein sequences for proteomics or homology modelling from annotated genome regions
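For the codon-table comparison mentioned in the list above, a minimal sketch in Python might translate the coding sequence with the standard genetic code (NCBI translation table 1) and report where the result differs from the annotated /translation. It assumes the coding sequence has already been extracted in the correct orientation and starts at the first codon position; `translate` and `find_mismatches` are hypothetical helper names.

```python
BASES = "TCAG"
# Amino acids of the standard genetic code (NCBI translation table 1),
# listed in TCAG order for the first, second, and third codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(cds):
    """Translate a coding sequence with the standard code; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))

def find_mismatches(annotated, cds):
    """Return (position, annotated_residue, de_novo_residue) tuples, 1-based."""
    de_novo = translate(cds)
    if de_novo.endswith("*"):
        de_novo = de_novo[:-1]  # drop the terminal stop codon before comparing
    diffs = [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(annotated, de_novo), start=1)
        if a != b
    ]
    if len(annotated) != len(de_novo):
        diffs.append(("length", len(annotated), len(de_novo)))
    return diffs
```

A mismatch in which the de novo residue is `*` while the annotation shows an amino acid is a typical sign that the record was annotated with a different translation table.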
⚠️ Common Mistakes & Warnings
This tool reads the protein sequence that the database curator already annotated in the /translation field. It does not translate the DNA sequence itself. If a CDS feature lacks a /translation qualifier, it will not appear in the output.
Multi-line DEFINITION fields are truncated to the text before the ACCESSION keyword. If the definition spans many lines, only the first portion appears in the output header.
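To illustrate the "text before the ACCESSION keyword" rule, here is a small Python sketch that pulls a record title out of the DEFINITION field. It is an assumption about how such a rule could be implemented; the tool's handling of very long definitions may differ, and `record_title` is a hypothetical name.

```python
import re

def record_title(record):
    """Return the DEFINITION text that precedes the ACCESSION line, on one line."""
    match = re.search(
        r"^DEFINITION\s+(.*?)(?=^ACCESSION\b)",
        record,
        re.MULTILINE | re.DOTALL,
    )
    if match is None:
        return ""  # no DEFINITION/ACCESSION pair found
    # Collapse continuation-line indentation into single spaces.
    return " ".join(match.group(1).split())
```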
If a GenBank record has no FEATURES section, it is skipped and a warning is shown in the Processing Warnings panel.
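The structural checks described here and in the usage steps (a LOCUS line, a FEATURES section, at least one /translation qualifier) can be sketched in Python as follows. The function name and the warning messages are hypothetical and are not the text the tool displays.

```python
def check_record(record):
    """Return warning strings for one GenBank record (the text before its '//')."""
    warnings = []
    lines = record.strip().splitlines()
    if not lines or not lines[0].startswith("LOCUS"):
        warnings.append("record does not begin with a LOCUS line")
    if not any(line.startswith("FEATURES") for line in lines):
        warnings.append("no FEATURES section; record skipped")
    elif "/translation=" not in record:
        warnings.append("no /translation qualifiers found")
    return warnings
```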
The trailing "*" that some GenBank records include in /translation values is stripped so the output contains only standard single-letter amino acid codes.
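A minimal Python sketch of this clean-up step, assuming the raw qualifier value has already been taken from between the quotes (`clean_translation` is a hypothetical name):

```python
def clean_translation(raw):
    """Remove line-wrapping whitespace and a single trailing '*' stop marker."""
    seq = "".join(raw.split())
    return seq[:-1] if seq.endswith("*") else seq
```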
❓ Frequently Asked Questions
Why does a CDS feature not appear in the output?
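The feature most likely lacks a /translation qualifier. Some CDS entries in older or partial records omit this qualifier; in that case the DNA sequence is present but no pre-computed protein is stored. Use a dedicated translation tool on the extracted DNA if you need the protein sequence.
Can I paste multiple GenBank records at once?
Yes. Paste the records one after another; each must begin with a LOCUS line and end with "//". In the output, records are separated by an "=== record title ===" header line.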
What is a /translation qualifier?
The /translation qualifier stores the pre-computed protein sequence for a CDS feature as a string of single-letter amino acid codes. It is added by database curators or NCBI submission tools using the annotated reading frame and codon table.