Range Extractor Protein
Extract residues at specified positions or ranges from a protein sequence

Paste a raw sequence or one or more FASTA sequences. Input limit: 500,000,000 characters.

Use ".." for a span (10..12), commas to separate entries. Keywords: start, end, center, length. Arithmetic allowed: (end-10)..end

💡 Quick Summary

Range Extractor Protein accepts a protein sequence along with a set of positions or ranges, and returns the matching residues in your choice of four formats: merged into a new sequence, as separate FASTA records, uppercased within the full sequence, or lowercased within the full sequence. Ranges support numeric positions, spans (10..12), the keywords start/end/center/length, and arithmetic expressions such as (end-10)..end.

📋 How to Use
  1. Paste a raw protein sequence or one or more FASTA sequences into the top input area.
  2. Enter positions or ranges in the Ranges field, separated by commas. Use x..y for a span. You can use the keywords start, end, center, and length in place of numbers, and arithmetic expressions such as (end-10)..end.
  3. Choose a "Return as" mode. "New sequence" joins all ranges into one FASTA entry. "Separate FASTA records" gives each range its own header. "Uppercased in context" and "Lowercased in context" return the full sequence with only the selected ranges changed in case.
  4. Click Extract. Multiple FASTA input sequences are each processed independently.
  5. Use Copy to copy the result to your clipboard.
  6. Click Load Example to try with a sample protein using the default ranges 1, 5, 10..12.
  7. Click Clear to reset.
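
Step 4 notes that each FASTA record is processed independently. The following is a minimal JavaScript sketch of how pasted input could be split into records before extraction. The function name splitFastaRecords and the record shape are hypothetical, and the sketch assumes header lines begin with ">" and that input without a header is one bare sequence; it is not the tool's actual code.

```javascript
// Hypothetical helper: split pasted text into { title, sequence } records.
// Assumption: FASTA headers start with ">"; text before any header is one bare sequence.
function splitFastaRecords(text) {
  const records = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // ignore blank lines
    if (trimmed.startsWith(">")) {
      // A header line opens a new record.
      current = { title: trimmed.slice(1).trim(), sequence: "" };
      records.push(current);
    } else {
      if (current === null) {
        // Sequence text with no header: treat it as a single untitled record.
        current = { title: "", sequence: "" };
        records.push(current);
      }
      current.sequence += trimmed;
    }
  }
  return records;
}

const exampleRecords = splitFastaRecords(">a\nMKT\nLLV\n>b\nGGS");
console.log(exampleRecords.length);      // 2
console.log(exampleRecords[0].sequence); // MKTLLV
```

Each record returned this way can then be run through the same range list on its own.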
🧮 Formulas & Logic
Single position
sequence[ position − 1 ] (1-based → 0-based)
Range
sequence.substring( start − 1, stop ) (start and stop are 1-based and both inclusive; substring excludes its end index, so stop is passed unchanged)
Keyword: start
1
Keyword: end
sequence length
Keyword: center
Math.round( sequence.length / 2 ) (a half rounds up, so a length of 11 gives 6)
Keyword: length
sequence length
Arithmetic
Simple expressions allowed, e.g. (end − 10)..end
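
The rules above can be combined into a short JavaScript sketch. It resolves the keywords, evaluates a simple expression, and extracts a single position or span. The names keywordValues, evaluatePosition, and extractEntry are hypothetical, and the sketch assumes only integers, keywords, +, -, and parentheses appear in an expression; the tool's own parser may accept more. No range validation is done here.

```javascript
// Keyword values for a sequence of the given length (taken from the list above).
function keywordValues(length) {
  return new Map([
    ["start", 1],
    ["end", length],
    ["center", Math.round(length / 2)],
    ["length", length],
  ]);
}

// Evaluate an expression such as "(end-10)" or "center+5".
// Assumption: only integers, keywords, +, -, and parentheses are supported.
function evaluatePosition(expression, length) {
  const keywords = keywordValues(length);
  const compact = expression.replace(/\s+/g, "").toLowerCase();
  const tokens = compact.match(/\d+|[a-z]+|[()+-]/g) || [];
  if (tokens.join("") !== compact) throw new Error("Unsupported character in: " + expression);
  let index = 0;

  function parseTerm() {
    const token = tokens[index++];
    if (token === "(") {
      const value = parseSum();
      if (tokens[index++] !== ")") throw new Error("Missing closing parenthesis");
      return value;
    }
    if (token === "-") return -parseTerm(); // unary minus
    if (token !== undefined && /^\d+$/.test(token)) return Number(token);
    if (keywords.has(token)) return keywords.get(token);
    throw new Error("Unexpected token: " + token);
  }

  function parseSum() {
    let value = parseTerm();
    while (tokens[index] === "+" || tokens[index] === "-") {
      const operator = tokens[index++];
      const right = parseTerm();
      value = operator === "+" ? value + right : value - right;
    }
    return value;
  }

  const result = parseSum();
  if (index !== tokens.length) throw new Error("Unexpected trailing input");
  return result;
}

// Extract one entry: "5" (single position) or "10..12" (span, both ends inclusive).
function extractEntry(sequence, entry) {
  const parts = entry.split("..");
  const first = evaluatePosition(parts[0], sequence.length);
  const last = parts.length > 1 ? evaluatePosition(parts[1], sequence.length) : first;
  return sequence.substring(first - 1, last);
}

const demoSequence = "MKTAYIAKQRQISFVKSHFSRQ"; // 22 residues
console.log(extractEntry(demoSequence, "10..12"));        // RQI
console.log(extractEntry(demoSequence, "(end-10)..end")); // ISFVKSHFSRQ
console.log(extractEntry(demoSequence, "center"));        // Q
```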
📊 Result Interpretation
Sequences Processed

Number of FASTA records (or bare sequences) found in the input.

Ranges Parsed

Number of valid position or range entries extracted from the Ranges field.

🔬 Applications
  • Extracting a specific domain, signal peptide, or motif by coordinate from a protein sequence
  • Obtaining the last N residues of a sequence using the expression (end-N+1)..end with N replaced by a number, e.g. (end-9)..end for the last 10 residues
  • Pulling multiple non-contiguous regions and joining them as a synthetic subsequence
  • Highlighting the position of a feature within its protein context using Uppercased in context mode
  • Verifying that annotated domain boundaries match the expected residues
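
The "last N residues" expression in the list above works because a span a..b covers b - a + 1 residues. By the same count, (end-10)..end returns 11 residues, not 10. The JavaScript sketch below resolves (end-N+1)..end by hand; lastResidues is a hypothetical name and the sketch assumes 1 <= n <= sequence length.

```javascript
// Resolve (end-n+1)..end by hand for one sequence.
// Assumption: 1 <= n <= sequence.length (no validation in this sketch).
function lastResidues(sequence, n) {
  const end = sequence.length; // keyword "end"
  const first = end - n + 1;   // 1-based start of the span
  return sequence.substring(first - 1, end);
}

console.log(lastResidues("MKTAYIAKQRQISFVKSHFSRQ", 5)); // HFSRQ
```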
⚠️ Common Mistakes & Warnings
Positions are 1-based

All coordinates follow the biological convention: position 1 is the first residue. Positions are converted to 0-based string indices internally.

Out-of-range positions are skipped

If a position, or either endpoint of a range, is less than 1 or greater than the sequence length, or if a range's start is greater than its stop, that entry is skipped and a warning is shown.
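
The skip rule can be sketched in JavaScript as follows. The names isValidRange and filterRanges and the warning wording are hypothetical; only the conditions themselves come from the description above. A single position p is treated as the range p..p.

```javascript
// True when a resolved 1-based range [first, last] lies inside 1..length and first <= last.
function isValidRange(first, last, length) {
  return Number.isInteger(first) && Number.isInteger(last) &&
    first >= 1 && last <= length && first <= last;
}

// Keep valid ranges and collect one warning per skipped entry.
// The warning text is illustrative, not the tool's actual message.
function filterRanges(ranges, length) {
  const kept = [];
  const warnings = [];
  for (const [first, last] of ranges) {
    if (isValidRange(first, last, length)) {
      kept.push([first, last]);
    } else {
      warnings.push(`Skipped ${first}..${last}: outside 1..${length} or start > stop`);
    }
  }
  return { kept, warnings };
}

const checked = filterRanges([[1, 5], [0, 3], [12, 10], [20, 30]], 22);
console.log(checked.kept.length);     // 1
console.log(checked.warnings.length); // 3
```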

Non-protein characters are stripped from the sequence

Before extraction, the input sequence is cleaned to retain only valid amino acid characters. Range positions refer to the cleaned sequence, so stripped characters such as spaces or digits are not counted.
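
A rough JavaScript sketch of this cleaning step follows. Which characters the tool treats as valid is not stated here, so the sketch assumes the 20 standard one-letter amino acid codes and preserves letter case; cleanProteinSequence is a hypothetical name. The tool may also keep ambiguity codes or other symbols.

```javascript
// Assumption: only the 20 standard one-letter codes are kept (case-insensitive).
// The real tool may accept additional characters.
function cleanProteinSequence(raw) {
  return raw.replace(/[^ACDEFGHIKLMNPQRSTVWY]/gi, "");
}

const rawInput = "MK T-A1Y";
const cleaned = cleanProteinSequence(rawInput);
console.log(cleaned);    // MKTAY
console.log(cleaned[3]); // A  (position 4 of the cleaned sequence)
```

In the raw input the fourth character is "T", which shows why positions should be counted on the cleaned sequence.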

❓ Frequently Asked Questions

Can I extract multiple ranges at once?
Yes. Enter ranges separated by commas, e.g. 1..30, 100..150, end. In "New sequence" mode all ranges are concatenated in order. In "Separate FASTA records" mode each range becomes its own FASTA entry.
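
As a rough illustration of these two modes, the JavaScript sketch below takes ranges that are already resolved to 1-based inclusive [first, last] pairs. The names joinRanges and separateRecords are hypothetical, and the ">residue X..Y" header text is an assumption about the exact output format.

```javascript
// "New sequence" mode: concatenate all ranges in the order given.
function joinRanges(sequence, ranges) {
  return ranges
    .map(([first, last]) => sequence.substring(first - 1, last))
    .join("");
}

// "Separate FASTA records" mode: one record per range.
// Assumption: the header reads ">residue X..Y"; the tool's exact header may differ.
function separateRecords(sequence, ranges) {
  return ranges
    .map(([first, last]) => `>residue ${first}..${last}\n${sequence.substring(first - 1, last)}`)
    .join("\n");
}

const protein = "MKTAYIAKQRQISFVKSHFSRQ";
const selected = [[1, 3], [10, 12], [22, 22]];
console.log(joinRanges(protein, selected)); // MKTRQIQ
console.log(separateRecords(protein, selected));
```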
What do the keywords start, end, center, and length mean?
start (also accepted as begin) equals 1. end (also accepted as stop) equals the sequence length. center equals Math.round(length / 2). length also equals the sequence length, so it gives the same value as end. All of these can be used in arithmetic expressions, e.g. (center-10)..(center+10).
What is the difference between the four output modes?
"New sequence" concatenates all extracted residues into a single FASTA entry — useful for joining domains or building a synthetic construct. "Separate FASTA records" gives each range its own >residue X..Y header. "Uppercased in context" returns the full sequence in lowercase with the specified ranges in uppercase, making it easy to see where the ranges fall. "Lowercased in context" is the reverse — sequence in uppercase with ranges lowercased.
Can I use this for subsequence extraction from UniProt or NCBI entries?
Yes. Paste any FASTA protein sequence directly from UniProt, NCBI, or any other source. Multiple sequences in a single paste are each processed independently.
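
For the two in-context output modes described in the FAQ above, a minimal JavaScript sketch is given below. It assumes ranges are already resolved to valid 1-based inclusive [first, last] pairs; changeCaseInContext, uppercaseInContext, and lowercaseInContext are hypothetical names, not the tool's own functions.

```javascript
// Shared helper: set a base case for the whole sequence, then flip the case inside each range.
function changeCaseInContext(sequence, ranges, upperInside) {
  const base = upperInside ? sequence.toLowerCase() : sequence.toUpperCase();
  const chars = base.split("");
  for (const [first, last] of ranges) {
    for (let i = first - 1; i < last; i++) {
      chars[i] = upperInside ? chars[i].toUpperCase() : chars[i].toLowerCase();
    }
  }
  return chars.join("");
}

// Full sequence in lowercase, selected ranges in uppercase.
function uppercaseInContext(sequence, ranges) {
  return changeCaseInContext(sequence, ranges, true);
}

// Full sequence in uppercase, selected ranges in lowercase.
function lowercaseInContext(sequence, ranges) {
  return changeCaseInContext(sequence, ranges, false);
}

console.log(uppercaseInContext("MKTAYIAKQR", [[2, 4]])); // mKTAyiakqr
console.log(lowercaseInContext("MKTAYIAKQR", [[2, 4]])); // MktaYIAKQR
```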