EMBL Trans Extractor
Extract annotated protein translations from EMBL records as FASTA sequences

Paste one or more EMBL records. Each record must begin with an "ID " line and end with "//"; only features in its FT block that carry a /translation qualifier produce output. Input limit: 200,000,000 characters.

💡 Quick Summary

EMBL Trans Extractor reads one or more EMBL records and returns every protein translation annotated in the feature table as a separate FASTA entry. It scans for features that carry a /translation qualifier (typically CDS features), so you can pull the annotated protein sequences out of an EMBL file without writing any code.

📋 How to Use
  1. Paste the contents of one or more EMBL files into the input area.
  2. Click Extract Translations. Each CDS (or other feature) that carries a /translation qualifier is output as a FASTA entry. Multiple records are separated by an "=== record title ===" header line.
  3. The Extraction Summary panel shows how many records were processed and how many protein sequences were found.
  4. The Processing Warnings panel appears if any records had no /translation qualifiers, or if the feature table could not be located.
  5. Use the Copy button to copy all protein sequences to your clipboard.
  6. Click Load Example to try the tool with a synthetic two-CDS record demonstrating both a simple and a complement-strand feature.
  7. Click Clear to reset.
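
The output layout from steps 2 and 3 can be sketched in Python as below. This is a minimal illustration, assuming each record has already been reduced to a title and a list of FASTA strings; the function name `format_output` and the blank line between record groups are assumptions, not the tool's documented behavior.

```python
def format_output(records: list[tuple[str, list[str]]]) -> str:
    """Join per-record FASTA entries, each group preceded by '=== title ==='."""
    blocks = []
    for title, fasta_entries in records:
        # Separator line first, then every FASTA entry found in that record.
        blocks.append("\n".join([f"=== {title} ==="] + list(fasta_entries)))
    # Hypothetical spacing: one blank line between record groups.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = [("Example record", ['>CDS /gene="abc"\nMKV'])]
    print(format_output(demo))
```
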
🧮 Formulas & Logic
Translation source
/translation qualifier value, extracted verbatim from the EMBL feature table — no in-browser translation of the DNA is performed
Multi-line handling
Continuation lines (FT + 19 spaces) within a /translation value are concatenated before non-amino-acid characters are stripped
Output wrapping
Protein sequence wrapped at 60 characters per line
FASTA header
>featureKey /firstQualifier (e.g. >CDS /gene="fem-2")
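
A rough Python sketch of these four rules follows. It assumes the standard EMBL column layout (feature key starting in column 6, qualifiers and continuation lines starting in column 22) and that the input has already been filtered to FT lines; `parse_features` and `translations_to_fasta` are hypothetical names, and a qualifier value whose continuation line itself begins with "/" is not handled specially.

```python
import re


def parse_features(ft_lines: list[str]) -> list[tuple[str, list[str]]]:
    """Group FT lines into (feature_key, [qualifier strings]) pairs."""
    features: list[tuple[str, list[str]]] = []
    for line in ft_lines:
        key = line[5:21].strip()      # feature key sits in columns 6-21
        content = line[21:].rstrip()  # qualifiers start after "FT" + 19 spaces
        if key:
            features.append((key, []))            # a new feature begins
        elif not features:
            continue                              # stray line before any feature
        elif content.startswith("/"):
            features[-1][1].append(content)       # a new qualifier
        elif features[-1][1]:
            features[-1][1][-1] += " " + content  # continuation of the last qualifier
        # otherwise: continuation of a multi-line location, ignored here
    return features


def translations_to_fasta(ft_lines: list[str], width: int = 60) -> list[str]:
    """Return one FASTA entry per feature that carries a /translation qualifier."""
    entries = []
    for key, quals in parse_features(ft_lines):
        for qual in quals:
            if qual.startswith("/translation="):
                # Keep letters only: drops quotes, joining spaces and any "*".
                raw = qual[len("/translation="):]
                seq = re.sub(r"[^A-Za-z]", "", raw).upper()
                header = f">{key} {quals[0]}"  # feature key + first qualifier
                wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
                entries.append("\n".join([header] + wrapped))
                break
    return entries
```

Because every non-letter is removed after the fragments are concatenated, the space inserted between continuation lines, the enclosing quotes and any trailing "*" all disappear from the final sequence.
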
📊 Result Interpretation
Records Processed

Number of EMBL records (ID … //) found in the input.

Translations Extracted

Total number of FASTA protein sequences returned across all records. One entry per /translation qualifier found.
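
Both counts can be reproduced with a short Python sketch that splits the raw input on "ID " and "//" lines. The names `split_records` and `summarize` are hypothetical, and this sketch silently drops a final record that lacks its closing "//"; the tool itself may handle that case differently.

```python
def split_records(text: str) -> list[list[str]]:
    """Split raw input into records running from an "ID " line to a "//" line."""
    records: list[list[str]] = []
    current = None
    for line in text.splitlines():
        if line.startswith("ID "):
            current = [line]             # a new record starts
        elif current is not None:
            current.append(line)
            if line.rstrip() == "//":
                records.append(current)  # record terminator reached
                current = None
    return records


def summarize(text: str) -> tuple[int, int]:
    """Return (records processed, translations extracted)."""
    records = split_records(text)
    n_translations = sum(
        1
        for record in records
        for line in record
        # A /translation qualifier starts in column 22 of an FT line.
        if line.startswith("FT") and line[21:].startswith("/translation=")
    )
    return len(records), n_translations
```
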

🔬 Applications
  • Retrieving predicted protein sequences from ENA downloads without local bioinformatics software
  • Checking all annotated CDS translations in a multi-record EMBL file in a single pass
  • Feeding extracted protein sequences directly into BLAST or multiple-sequence alignment tools
  • Comparing the annotated translation against a de novo translation to spot codon-table mismatches
  • Collecting protein sequences for proteomics or homology modelling from annotated genome regions
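
For the de novo comparison listed above, a minimal Python sketch follows. It assumes the CDS nucleotide sequence has already been assembled (joined and reverse-complemented where needed), that /codon_start is 1, and that the standard genetic code (transl_table=1) applies; `translate` and `mismatch_positions` are illustrative names, not part of the tool, which performs no translation itself.

```python
from itertools import product

# Standard genetic code (NCBI transl_table=1); codons enumerated in TCAG order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 with the standard code; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    protein = "".join(
        STANDARD_CODE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
    return protein.rstrip("*")  # drop the terminal stop, as the tool's output does


def mismatch_positions(annotated: str, dna: str) -> list[int]:
    """Return 1-based positions where the annotated and de novo proteins differ."""
    de_novo = translate(dna)
    positions = [i + 1 for i, (a, b) in enumerate(zip(annotated, de_novo)) if a != b]
    # Residues present in only one of the two sequences also count as mismatches.
    shorter, longer = sorted((len(annotated), len(de_novo)))
    positions.extend(range(shorter + 1, longer + 1))
    return positions


if __name__ == "__main__":
    print(mismatch_positions("MKV", "ATGAAAGTTTAA"))  # []
    print(mismatch_positions("MKV", "GTGAAAGTTTAA"))  # [1]: GTG start annotated as M
```

A mismatch at position 1 is often an alternative start codon such as GTG, which the annotated /translation records as M, rather than a genuine codon-table error.
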
⚠️ Common Mistakes & Warnings
Only /translation qualifiers are extracted — no in-browser translation

This tool reads the protein sequence that the database curator already annotated in the /translation field. It does not translate the DNA sequence itself. If a CDS feature lacks a /translation qualifier, it will not appear in the output.

Only the first DE line is used as the record title

Multi-line DE (description) fields are truncated to the first line. Subsequent DE lines are not appended to the "=== record title ===" separator.
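
The title rule can be expressed in a few lines of Python. `record_title` is a hypothetical helper, and returning an empty string when a record has no DE line is an assumption about a case this page does not cover.

```python
def record_title(record_lines: list[str]) -> str:
    """Return the text of the first DE line; any further DE lines are ignored."""
    for line in record_lines:
        if line.startswith("DE   "):
            return line[5:].strip()  # drop the "DE" line code and its 3-space gap
    return ""  # hypothetical fallback when no DE line exists


if __name__ == "__main__":
    example = ["DE   Example protein gene,", "DE   complete cds."]
    print(record_title(example))  # Example protein gene,
```
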

Records without a feature table are skipped

If an EMBL record has no FH / FT block (e.g. a pure sequence submission), it is skipped and a warning is shown.
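
The two warning conditions (no feature table, and a feature table without any /translation qualifier) can be sketched as follows. The helper name `record_warnings` and the warning wording are hypothetical; only the conditions themselves come from this page, and in this sketch FH header lines alone do not count as a feature table.

```python
def record_warnings(record_lines: list[str]) -> list[str]:
    """Return the warnings a record would trigger (hypothetical wording)."""
    ft_lines = [line for line in record_lines if line.startswith("FT")]
    if not ft_lines:
        # No FT block at all, e.g. a sequence-only record: skip it.
        return ["feature table not found; record skipped"]
    if not any(line[21:].startswith("/translation=") for line in ft_lines):
        # Features exist, but none carries an annotated protein.
        return ["no /translation qualifiers found"]
    return []
```
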

Stop-codon asterisks are removed

The trailing "*" that some EMBL records include in /translation values is stripped so the output contains only standard single-letter amino acid codes.

❓ Frequently Asked Questions

Why does a CDS feature not appear in the output?
The tool only outputs features that have a /translation qualifier. Some CDS entries in older or partial records omit this qualifier — in that case the DNA sequence is present but no pre-computed protein is stored. Use a dedicated translation tool on the extracted DNA if you need the protein sequence.
Can I paste multiple EMBL records at once?
Yes. Paste any number of complete records (each beginning with "ID " and ending with "//"). Each record is processed independently; results are grouped under a "=== record title ===" header.
What is a /translation qualifier?
In the EMBL feature table, the /translation qualifier stores the pre-computed protein sequence for a CDS feature as a string of single-letter amino acid codes. It is added by database curators (or submission tools) using the annotated reading frame and codon table.
Where do I get EMBL-format files?
EMBL flat files are available from the European Nucleotide Archive (ENA) at www.ebi.ac.uk/ena. Search for an accession number and choose "EMBL" as the download format. GenBank records can be converted using Biopython's SeqIO or EMBOSS seqret.
Does this tool translate the DNA sequence?
No. It reads the /translation qualifier that is already stored in the EMBL feature table. No codon table or reading-frame arithmetic is applied, so the output is the sequence the original database submitter or ENA pipeline recorded, apart from the removal of stop-codon asterisks and other non-amino-acid characters.
What if the protein sequence spans multiple lines in the EMBL file?
EMBL files wrap long /translation values across several FT continuation lines (each prefixed with "FT" and 19 spaces). The tool automatically strips those prefixes and concatenates the fragments before outputting the complete sequence.