💡 Quick Summary
EMBL Trans Extractor reads one or more EMBL records and returns every protein translation annotated in the feature table as a separate FASTA entry. It scans for features that carry a /translation qualifier (typically CDS features), which makes it a quick way to pull predicted protein sequences out of an EMBL file without writing any code.
📋 How to Use
- Paste the contents of one or more EMBL files into the input area.
- Click Extract Translations. Each CDS (or other feature) that carries a /translation qualifier is output as a FASTA entry. Multiple records are separated by an "=== record title ===" header line.
- The Extraction Summary panel shows how many records were processed and how many protein sequences were found.
- The Processing Warnings panel appears if any records had no /translation qualifiers, or if the feature table could not be located.
- Use the Copy button to copy all protein sequences to your clipboard.
- Click Load Example to try the tool with a synthetic two-CDS record demonstrating both a simple and a complement-strand feature.
- Click Clear to reset.
🧮 Formulas & Logic
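The extraction step can be pictured as a scan over the FT lines of a record that collects each /translation value, joins values that wrap over several lines, and strips a trailing "*". The Python sketch below is only an illustration under those assumptions, not the tool's actual implementation; the function names and the FASTA header format are hypothetical.

```python
def extract_translations(record: str) -> list[str]:
    """Return every /translation value found in the FT lines of one EMBL record."""
    prefix = '/translation="'
    proteins = []
    parts = None  # pieces of the value being collected, or None when outside one
    for line in record.splitlines():
        if not line.startswith("FT"):
            continue
        body = line[2:].strip()
        if parts is None:
            if not body.startswith(prefix):
                continue  # feature key line or some other qualifier
            body = body[len(prefix):]
            parts = []
        if body.endswith('"'):
            # Closing quote reached: join the pieces and drop a trailing "*".
            parts.append(body[:-1])
            proteins.append("".join(parts).replace(" ", "").rstrip("*"))
            parts = None
        else:
            parts.append(body)
    return proteins


def to_fasta(title: str, proteins: list[str], width: int = 60) -> str:
    """Format proteins as FASTA; the header layout here is a made-up convention."""
    entries = []
    for n, seq in enumerate(proteins, start=1):
        wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        entries.append(f">{title}_protein_{n}\n{wrapped}")
    return "\n".join(entries)


# Synthetic record with one forward and one complement-strand CDS.
example = '''ID   X00001; SV 1; linear; genomic DNA; STD; SYN; 30 BP.
DE   Hypothetical example record
FH   Key             Location/Qualifiers
FT   CDS             1..12
FT                   /translation="MKVL
FT                   AG*"
FT   CDS             complement(13..30)
FT                   /product="demo"
FT                   /translation="MSTQ"
//
'''

print(extract_translations(example))  # ['MKVLAG', 'MSTQ']
print(to_fasta("X00001", extract_translations(example)))
```

Because the stored translation is read as text, the strand of the feature makes no difference here: the complement-strand CDS is handled exactly like the forward one.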
📊 Result Interpretation
- Records processed: number of EMBL records (ID … //) found in the input.
- Protein sequences found: total number of FASTA protein sequences returned across all records, with one entry per /translation qualifier found.
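The two counts and the warnings can be reproduced with a few lines of Python. This is a rough sketch under stated assumptions: a record is taken to run from an ID line to a "//" line, a record without FT lines still counts as processed, and the function names and warning texts are hypothetical rather than taken from the tool.

```python
def split_records(text: str) -> list[str]:
    """Split pasted text into EMBL records, each running from an ID line to '//'."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "//":
            if any(l.startswith("ID") for l in current):
                records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records  # text after the last '//' is ignored in this sketch


def summarize(text: str) -> dict:
    """Count records and /translation qualifiers, collecting warnings on the way."""
    records = split_records(text)
    total = 0
    warnings = []
    for number, record in enumerate(records, start=1):
        ft_bodies = [l[2:].strip() for l in record.splitlines() if l.startswith("FT")]
        if not ft_bodies:
            warnings.append(f"Record {number}: no feature table found")
            continue
        found = sum(body.startswith("/translation=") for body in ft_bodies)
        if found == 0:
            warnings.append(f"Record {number}: no /translation qualifiers")
        total += found
    return {"records": len(records), "proteins": total, "warnings": warnings}


pasted = """ID   A; SV 1; linear; genomic DNA; STD; SYN; 12 BP.
FT   CDS             1..12
FT                   /translation="MKVL"
//
ID   B; SV 1; linear; genomic DNA; STD; SYN; 12 BP.
SQ   Sequence 12 BP;
     atgaaagtgc tg                                                            12
//
"""

print(summarize(pasted))
# {'records': 2, 'proteins': 1, 'warnings': ['Record 2: no feature table found']}
```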
🔬 Applications
- Retrieving predicted protein sequences from ENA downloads without local bioinformatics software
- Checking all annotated CDS translations in a multi-record EMBL file in a single pass
- Feeding extracted protein sequences directly into BLAST or multiple-sequence alignment tools
- Comparing the annotated translation against a de-novo translation to spot codon-table mismatches
- Collecting protein sequences for proteomics or homology modelling from annotated genome regions
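For the codon-table check mentioned above, the de-novo translation has to be produced outside this tool, since the tool only reads the stored /translation text. A minimal Python sketch follows, assuming the CDS nucleotides are already extracted in coding orientation, start at codon position 1, and are translated with the standard genetic code; `translate` and `first_mismatch` are hypothetical helper names.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a coding sequence with the standard code, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_mismatch(annotated: str, de_novo: str):
    """Return the 0-based index of the first difference, or None if identical."""
    for i, (a, b) in enumerate(zip(annotated, de_novo)):
        if a != b:
            return i
    if len(annotated) != len(de_novo):
        return min(len(annotated), len(de_novo))
    return None


cds = "ATGAAAGTGCTGTGA"
annotated = "MKVLW"  # as it would be annotated under a code where TGA encodes Trp

print(translate(cds))                             # MKVL
print(first_mismatch(annotated, translate(cds)))  # 4
```

A mismatch that appears exactly where the standard code places a stop codon, as in this example, suggests the record was annotated with a different translation table (TGA encodes tryptophan in translation table 4, for instance).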
⚠️ Common Mistakes & Warnings
This tool reads the protein sequence that the database curator already annotated in the /translation field. It does not translate the DNA sequence itself. If a CDS feature lacks a /translation qualifier, it will not appear in the output.
Multi-line DE (description) fields are truncated to the first line. Subsequent DE lines are not appended to the FASTA header.
If an EMBL record has no FH / FT block (e.g. a pure sequence submission), it is skipped and a warning is shown.
The trailing "*" that some EMBL records include in /translation values is stripped so the output contains only standard single-letter amino acid codes.
❓ Frequently Asked Questions
Why does a CDS feature not appear in the output?
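The feature most likely has no /translation qualifier. Some CDS entries in older or partial records omit this qualifier; in that case the DNA sequence is present but no pre-computed protein is stored. Use a dedicated translation tool on the extracted DNA if you need the protein sequence.
Can I paste multiple EMBL records at once?
Yes. The input area accepts the contents of one or more EMBL files, and the output for multiple records is separated by an "=== record title ===" header line.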
What is a /translation qualifier?
The /translation qualifier stores the pre-computed protein sequence for a CDS feature as a string of single-letter amino acid codes. It is added by database curators (or submission tools) using the annotated reading frame and codon table.