Range Extractor Protein - TheBiologyBro

💡 Quick Summary

Range Extractor Protein accepts a protein sequence along with a set of positions or ranges, and returns the matching residues in your choice of four formats: merged into a new sequence, as separate FASTA records, uppercased within the full sequence, or lowercased within the full sequence. Ranges support numeric positions, spans (10..12), the keywords start/end/center/length, and arithmetic expressions such as (end-10)..end.

📋 How to Use

Paste a raw protein sequence or one or more FASTA sequences into the top input area.
Enter positions or ranges in the Ranges field, separated by commas. Use x..y for a span. You can use the keywords start, end, center, and length in place of numbers, and arithmetic expressions such as (end-10)..end.
Choose a "Return as" mode: "New sequence" joins all ranges into one FASTA entry; "Separate FASTA records" gives each range its own header; "Uppercased in context" and "Lowercased in context" return the full sequence with the selected ranges changed in case.
Click Extract. Multiple FASTA input sequences are each processed independently.
Use Copy to copy the result to your clipboard.
Click Load Example to try with a sample protein using the default ranges 1, 5, 10..12.
Click Clear to reset.
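
The two in-context modes from the "Return as" step can be sketched as follows. This is a minimal illustration, not the tool's source: the function name caseInContext, the mode strings "upper" and "lower", and the [start, stop] pair format are all assumptions made for the example.

```javascript
// Hypothetical helper: `ranges` is a list of 1-based, inclusive [start, stop] pairs.
// mode "upper": whole sequence lowercased, ranges uppercased.
// mode "lower": whole sequence uppercased, ranges lowercased.
function caseInContext(sequence, ranges, mode) {
  const upper = (s) => s.toUpperCase();
  const lower = (s) => s.toLowerCase();
  const inside = mode === "upper" ? upper : lower;
  const outside = mode === "upper" ? lower : upper;

  const chars = outside(sequence).split("");
  for (const [start, stop] of ranges) {
    // Convert 1-based inclusive coordinates to 0-based indices.
    for (let i = start - 1; i < stop; i++) {
      chars[i] = inside(chars[i]);
    }
  }
  return chars.join("");
}

// The default example ranges 1, 5, 10..12 as [start, stop] pairs.
const exampleRanges = [[1, 1], [5, 5], [10, 12]];
console.log(caseInContext("MKTAYIAKQRQISFVK", exampleRanges, "upper")); // "MktaYiakqRQIsfvk"
console.log(caseInContext("MKTAYIAKQRQISFVK", exampleRanges, "lower")); // "mKTAyIAKQrqiSFVK"
```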

🧮 Formulas & Logic

Single position

sequence[ position − 1 ] (1-based → 0-based)

Range

sequence.substring( start − 1, stop ) (both ends inclusive)
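
As a small worked example of the two formulas above, the sketch below wraps them in helper functions. The names residueAt and extractRange are illustrative only and are not taken from the tool.

```javascript
// Hypothetical helpers wrapping the two formulas above.
function residueAt(sequence, position) {
  // 1-based position -> 0-based index
  return sequence[position - 1];
}

function extractRange(sequence, start, stop) {
  // substring's end index is exclusive, so passing `stop` unchanged
  // keeps the 1-based stop position inside the result.
  return sequence.substring(start - 1, stop);
}

const demoSequence = "MKTAYIAKQRQISFVK";
console.log(residueAt(demoSequence, 5));         // "Y"
console.log(extractRange(demoSequence, 10, 12)); // "RQI"
```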

Keyword: start

1 (the first residue)

Keyword: end

sequence length

Keyword: center

Math.round( sequence.length / 2 )

Keyword: length

sequence length

Arithmetic

Simple expressions allowed, e.g. (end − 10)..end
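
One way the keywords and arithmetic could be resolved is sketched below. It assumes that only +, -, parentheses, integers, and the keywords are allowed; the tool's actual grammar is not documented here, and the function names (evalExpr, resolveRange, parseRanges) are hypothetical.

```javascript
// Keyword values as described in this section.
function keywordValues(length) {
  return new Map([
    ["start", 1], ["begin", 1],
    ["end", length], ["stop", length], ["length", length],
    ["center", Math.round(length / 2)],
  ]);
}

// Evaluate an expression such as "(end-10)" or "center+5".
// Assumption: only +, -, parentheses, integers and keywords are supported.
function evalExpr(text, length) {
  const keywords = keywordValues(length);
  const tokens = text.match(/\d+|[a-z]+|[()+-]/gi) || [];
  let i = 0;

  function primary() {
    const tok = tokens[i++];
    if (tok === "(") {
      const value = sum();
      i++; // skip the closing ")"
      return value;
    }
    if (/^\d+$/.test(tok)) return Number(tok);
    const key = String(tok).toLowerCase();
    if (!keywords.has(key)) throw new Error("Unknown token: " + tok);
    return keywords.get(key);
  }

  function sum() {
    let value = primary();
    while (tokens[i] === "+" || tokens[i] === "-") {
      const op = tokens[i++];
      const rhs = primary();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  return sum();
}

// Resolve "a..b" or a single position into a [start, stop] pair.
function resolveRange(entry, length) {
  const [a, b] = entry.split("..");
  const start = evalExpr(a, length);
  return [start, b === undefined ? start : evalExpr(b, length)];
}

// Split the comma-separated Ranges field into resolved pairs.
function parseRanges(field, length) {
  return field
    .split(",")
    .filter((entry) => entry.trim() !== "")
    .map((entry) => resolveRange(entry, length));
}

console.log(resolveRange("(end-10)..end", 50));            // [40, 50]
console.log(resolveRange("(center-10)..(center+10)", 50)); // [15, 35]
console.log(parseRanges("1, 5, 10..12", 50));              // [[1, 1], [5, 5], [10, 12]]
```

Note that Math.round rounds halves up, so under this sketch a 15-residue sequence has center 8.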

📊 Result Interpretation

Sequences Processed

Number of FASTA records (or bare sequences) found in the input.

Ranges Parsed

Number of valid position or range entries extracted from the Ranges field.

🔬 Applications

Extracting a specific domain, signal peptide, or motif by coordinate from a protein sequence
Obtaining the last N residues of a sequence using the expression (end-N+1)..end
Pulling multiple non-contiguous regions and joining them as a synthetic subsequence
Highlighting the position of a feature within its protein context using Uppercased in context mode
Verifying that annotated domain boundaries match the expected residues
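
The last-N expression (end-N+1)..end from the list above works out as follows for a 16-residue sequence with N = 4: the start is 16 - 4 + 1 = 13, so the range is 13..16. The helper name lastResidues is made up for this illustration.

```javascript
// Hypothetical helper: the arithmetic behind (end-N+1)..end.
function lastResidues(sequence, n) {
  const end = sequence.length;  // keyword "end"
  const start = end - n + 1;    // (end-N+1), 1-based
  return sequence.substring(start - 1, end);
}

console.log(lastResidues("MKTAYIAKQRQISFVK", 4)); // "SFVK"
```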

⚠️ Common Mistakes & Warnings

Positions are 1-based

All coordinates follow the biological convention: position 1 is the first residue. Coordinates are converted to 0-based indices internally.

Out-of-range positions are skipped

If a position, or either endpoint of a range, is less than 1 or greater than the sequence length, or if the start is greater than the stop, that range entry is skipped and a warning is shown.
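
The skip rule can be expressed as a simple bounds check. The sketch below is an illustration under assumptions: the function name filterRanges and the warning wording are invented for the example, and the tool's real messages may differ.

```javascript
// Hypothetical validator: keeps in-bounds ranges, collects a warning for each skipped one.
// `ranges` is a list of 1-based, inclusive [start, stop] pairs.
function filterRanges(ranges, length) {
  const kept = [];
  const warnings = [];
  for (const [start, stop] of ranges) {
    const inBounds =
      start >= 1 && stop >= 1 && start <= length && stop <= length;
    if (inBounds && start <= stop) {
      kept.push([start, stop]);
    } else {
      // Warning text is an assumption, not the tool's wording.
      warnings.push(`Skipped ${start}..${stop} (sequence length ${length})`);
    }
  }
  return { kept, warnings };
}

const checked = filterRanges([[1, 5], [0, 3], [12, 30], [9, 4]], 20);
console.log(checked.kept);            // [[1, 5]]
console.log(checked.warnings.length); // 3
```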

Non-protein characters are stripped from the sequence

Before extraction, the input sequence is cleaned to retain only valid amino acid characters. The range positions apply to the cleaned sequence.
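
The example below shows why this matters for coordinates. It assumes that "valid" means the 20 standard one-letter amino acid codes; the tool may accept additional codes (for example X or ambiguity letters), so treat the character set and the name cleanProtein as assumptions.

```javascript
// Hypothetical cleaner: keeps only the 20 standard amino acid letters
// (assumed character set), in either case.
function cleanProtein(raw) {
  return raw.replace(/[^ACDEFGHIKLMNPQRSTVWY]/gi, "");
}

const rawInput = "MK1T AY-IAK\n";
const cleanedInput = cleanProtein(rawInput);
console.log(cleanedInput);        // "MKTAYIAK"
// Position 3 refers to the cleaned sequence, not the raw text:
console.log(rawInput[3 - 1]);     // "1"
console.log(cleanedInput[3 - 1]); // "T"
```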

❓ Frequently Asked Questions

Can I extract multiple ranges at once?

Yes. Enter ranges separated by commas, e.g. 1..30, 100..150, end. In "New sequence" mode all ranges are concatenated in order. In "Separate FASTA records" mode each range becomes its own FASTA entry.
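
A rough sketch of the difference between the two modes follows. The function names are hypothetical, and the header text follows the >residue X..Y pattern mentioned in this FAQ; the tool's exact header formatting may differ.

```javascript
// `ranges` is a list of 1-based, inclusive [start, stop] pairs.

// "New sequence": residues from every range, concatenated in the order given.
function joinRanges(sequence, ranges) {
  return ranges
    .map(([start, stop]) => sequence.substring(start - 1, stop))
    .join("");
}

// "Separate FASTA records": one record per range.
// Header format is assumed from the ">residue X..Y" description.
function separateRecords(sequence, ranges) {
  return ranges
    .map(([start, stop]) =>
      `>residue ${start}..${stop}\n${sequence.substring(start - 1, stop)}`)
    .join("\n");
}

const pairs = [[1, 3], [10, 12]];
console.log(joinRanges("MKTAYIAKQRQISFVK", pairs)); // "MKTRQI"
console.log(separateRecords("MKTAYIAKQRQISFVK", pairs));
// >residue 1..3
// MKT
// >residue 10..12
// RQI
```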

What do the keywords start, end, center, and length mean?

start and begin equal 1. end and stop equal the sequence length. center equals Math.round(length / 2). length also equals the sequence length. These can be used in arithmetic expressions, e.g. (center-10)..(center+10).

What is the difference between the four output modes?

"New sequence" concatenates all extracted residues into a single FASTA entry — useful for joining domains or building a synthetic construct. "Separate FASTA records" gives each range its own >residue X..Y header. "Uppercased in context" returns the full sequence in lowercase with the specified ranges in uppercase, making it easy to see where the ranges fall. "Lowercased in context" is the reverse — sequence in uppercase with ranges lowercased.

Can I use this for subsequence extraction from UniProt or NCBI entries?

Yes. Paste any FASTA protein sequence directly from UniProt, NCBI, or any other source. Multiple sequences in a single paste are each processed independently.
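
For readers who want to reproduce the per-record behavior in their own scripts, a minimal FASTA splitter might look like the sketch below. It is not the tool's parser: the name parseFasta and the record shape { header, sequence } are assumptions, and a bare sequence with no header line is treated as a single record with an empty header.

```javascript
// Hypothetical FASTA splitter: returns one { header, sequence } object per record.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    if (trimmed.startsWith(">")) {
      // A header line starts a new record.
      current = { header: trimmed.slice(1), sequence: "" };
      records.push(current);
    } else {
      if (current === null) {
        // Bare sequence without a header line.
        current = { header: "", sequence: "" };
        records.push(current);
      }
      current.sequence += trimmed;
    }
  }
  return records;
}

const parsed = parseFasta(">first\nMKT\nAYI\n>second\nQRQ");
console.log(parsed.length);      // 2
console.log(parsed[0].sequence); // "MKTAYI"
console.log(parsed[1].header);   // "second"
```

Each record's sequence can then be processed on its own, so coordinates such as end or center are resolved against that record's length.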