💡 Quick Summary
EMBL Feature Extractor reads the feature table (FT lines) of one or more EMBL records and returns each annotated feature (CDS, mRNA, gene, exon, and other feature keys) as a separate FASTA entry. A feature-type breakdown panel shows exactly what was found. Two output modes are available: Separated (the isolated feature sequence) and Uppercased in context (the full sequence with the feature capitalised).
📋 How to Use
- Paste the contents of one or more EMBL files into the input area.
- Choose an Output mode: Separated returns only the nucleotides within the feature coordinates — ideal for downstream analysis. Uppercased in context returns the full genomic sequence in lowercase with the feature region in uppercase — useful for visually locating the feature.
- Click Extract Features. Each annotated feature is output as a FASTA entry; records are separated by a "=== title ===" header line.
- The Feature Types Found panel shows every distinct feature key (CDS, mRNA, source, etc.) with its count across all records.
- Use the Copy button to copy all extracted sequences to your clipboard.
- Click Load Example to try the tool with a compact synthetic record that demonstrates source, gene, CDS (join coordinates), and a complement-strand feature.
- Click Clear to reset.
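The difference between the two output modes can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: the function name `extract_feature` and the `mode` strings are hypothetical, and it assumes a single simple range with 1-based inclusive coordinates, as EMBL locations use.

```python
def extract_feature(sequence: str, start: int, end: int, mode: str = "separated") -> str:
    """Return a feature given 1-based, inclusive EMBL-style coordinates.

    `mode` is a hypothetical parameter name used only for this sketch.
    """
    seq = sequence.lower()
    feature = seq[start - 1:end]  # convert to Python's 0-based, end-exclusive slice
    if mode == "separated":
        # Only the nucleotides inside the feature coordinates.
        return feature
    if mode == "context":
        # Full sequence in lowercase, feature region in uppercase.
        return seq[:start - 1] + feature.upper() + seq[end:]
    raise ValueError(f"unknown mode: {mode}")


print(extract_feature("acgtacgtac", 3, 6, "separated"))  # gtac
print(extract_feature("acgtacgtac", 3, 6, "context"))    # acGTACgtac
```

The Separated result is the one to pass on to downstream analysis; the in-context result keeps the original length, so feature positions can be read directly off the string.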
🧮 Formulas & Logic
📊 Result Interpretation
Number of EMBL records (ID … //) successfully found and parsed.
Total number of feature table entries converted to FASTA sequences. Features with unsupported position formats are excluded and listed in the Warnings panel.
Count of distinct feature keys (e.g. CDS, mRNA, gene) across all processed records.
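As a rough illustration of how these three figures relate to the input, the sketch below counts complete records (an ID line followed later by //) and tallies feature keys from FT lines. It is an assumption-laden sketch rather than the tool's implementation: `summarise` is a hypothetical name, a feature key is assumed to start in column 6 of an FT line, and every key is counted, including features whose locations would later be skipped.

```python
from collections import Counter


def summarise(text: str) -> tuple[int, Counter]:
    """Count complete EMBL records and feature keys in flat-file text."""
    records = 0
    keys: Counter = Counter()
    in_record = False
    for line in text.splitlines():
        if line.startswith("ID"):
            in_record = True
        elif line.startswith("//"):
            if in_record:
                records += 1  # a record is complete only when // closes it
            in_record = False
        elif in_record and line.startswith("FT") and len(line) > 5 and line[5] != " ":
            # A non-blank column 6 marks a feature key line; qualifier
            # lines (/gene=..., /product=...) leave that column blank.
            keys[line[5:].split()[0]] += 1
    return records, keys


example = "\n".join([
    "ID   TEST1; SV 1; linear; genomic DNA; STD; SYN; 20 BP.",
    "FT   source          1..20",
    'FT                   /organism="synthetic construct"',
    "FT   CDS             join(1..5,10..15)",
    "FT   gene            complement(3..8)",
    "SQ   Sequence 20 BP;",
    "     acgtacgtac gtacgtacgt        20",
    "//",
    "ID   TEST2; SV 1; linear; genomic DNA; STD; SYN; 9 BP.",
    "FT   CDS             1..9",
    "SQ   Sequence 9 BP;",
    "     acgtacgta         9",
    "//",
])

records, keys = summarise(example)
print(records, sum(keys.values()), len(keys))  # 2 4 3
```

Here the two records contain four feature entries spread over three distinct keys (source, CDS, gene).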
🔬 Applications
- Extracting CDS sequences from EMBL records for codon-usage analysis or translation
- Reconstructing spliced mRNA from multi-exon join() coordinates
- Deriving the exact nucleotide sequence for a single annotated feature before BLAST or primer design
- Visually inspecting where a feature sits within its genomic context using Uppercased in context mode
- Processing multiple EMBL records in a single pass to collect all features of a given type
⚠️ Common Mistakes & Warnings
Positions using one-of(), order(), bond(), or other advanced EMBL location descriptors cannot be resolved to a single nucleotide sequence. Features with such positions are skipped and listed in the Warnings panel. Simple positions (e.g. 1..100) and join() coordinates are fully supported.
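To make the supported and unsupported cases concrete, here is a minimal location-resolver sketch. The names `parse_location` and `extract` are hypothetical, and two behaviours are assumptions rather than documented facts about the tool: an outer complement() is returned as the reverse complement, and anything that is not a plain range or a join() of plain ranges yields `None`, which a caller would turn into a warning.

```python
import re

_RANGE = re.compile(r"^(\d+)(?:\.\.(\d+))?$")  # "42" or "1..100"
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def parse_location(loc: str):
    """Return (segments, is_complement), or None for unsupported descriptors."""
    loc = loc.replace(" ", "")
    complement = False
    if loc.startswith("complement(") and loc.endswith(")"):
        complement = True
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]
    segments = []
    for part in loc.split(","):
        match = _RANGE.match(part)
        if match is None:  # order(), one-of(), bond(), nested operators, ...
            return None
        start = int(match.group(1))
        end = int(match.group(2) or start)  # a single base has no ".." part
        segments.append((start, end))
    return segments, complement


def extract(sequence: str, loc: str):
    """Resolve a location against a sequence; None means 'skip and warn'."""
    parsed = parse_location(loc)
    if parsed is None:
        return None
    segments, complement = parsed
    # Concatenate segments in the order given (1-based, inclusive coordinates).
    feature = "".join(sequence[s - 1:e] for s, e in segments)
    if complement:
        feature = feature.translate(_COMPLEMENT)[::-1]  # reverse complement
    return feature


genome = "acgtacgtacgtacgtacgt"
print(extract(genome, "join(1..5,10..15)"))   # acgtacgtacg
print(extract(genome, "complement(3..8)"))    # acgtac
print(extract(genome, "order(1..5,10..15)"))  # None
```

Concatenating the join() segments in order is what reconstructs a spliced sequence from multi-exon coordinates.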
The "<" and ">" markers (indicating a partial or fuzzy position boundary) are removed before extraction. The extracted sequence covers only the stated coordinates, so it may be shorter than the full feature the annotation implies.
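Removing the partial-boundary markers amounts to a plain string cleanup before the coordinates are read. A minimal sketch, with `strip_partial_markers` as a hypothetical helper name:

```python
def strip_partial_markers(location: str) -> str:
    """Drop the '<' and '>' partial-boundary markers from an EMBL location."""
    return location.replace("<", "").replace(">", "")


print(strip_partial_markers("<1..>100"))             # 1..100
print(strip_partial_markers("join(<5..20,30..>45)")) # join(5..20,30..45)
```

After cleanup, `<1..>100` is handled exactly like `1..100`: any part of the feature lying outside the record is simply not available to extract.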
Multi-line DE (description) fields are truncated to the first line. If the description continues on subsequent DE lines, those are not appended.
The SQ block is treated as a DNA sequence. EMBL records where the SQ section contains amino acids (rare) will produce incorrect output.
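If you want to check a record before pasting it in, the sketch below reads the SQ block and flags letters that are not IUPAC nucleotide codes. Both function names are hypothetical and the check is only a heuristic offered for illustration; it is not something the tool is stated to perform.

```python
import re

# IUPAC nucleotide codes, including ambiguity symbols.
_NUCLEOTIDE_CODES = set("acgturykmswbdhvn")


def read_sq_block(record: str) -> str:
    """Return the sequence between the SQ header line and the closing //."""
    lines = []
    in_sq = False
    for line in record.splitlines():
        if line.startswith("SQ"):
            in_sq = True          # the header line itself carries no sequence
        elif line.startswith("//"):
            break
        elif in_sq:
            lines.append(line)
    # Drop spaces and the running base counts at the end of each line.
    return re.sub(r"[^A-Za-z]", "", "".join(lines)).lower()


def looks_like_protein(seq: str) -> bool:
    """Heuristic: True if any letter is not a nucleotide code."""
    return bool(set(seq.lower()) - _NUCLEOTIDE_CODES)


record = "\n".join([
    "ID   TEST1; SV 1; linear; genomic DNA; STD; SYN; 20 BP.",
    "SQ   Sequence 20 BP;",
    "     acgtacgtac gtacgtacgt        20",
    "//",
])

seq = read_sq_block(record)
print(seq)                             # acgtacgtacgtacgtacgt
print(looks_like_protein(seq))         # False
print(looks_like_protein("mkvlaeqf"))  # True
```

A short peptide made up only of letters such as A, C, G and T would pass this check, so a `False` result is not proof that the block is DNA.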