💡 Quick Summary
Sample Protein builds a new sequence of the desired length by randomly selecting residues from a guide sequence, with replacement. Because each residue is drawn in proportion to its frequency in the guide, the output sequences reproduce the guide's amino acid composition on average. Sampled sequences serve as composition-matched controls for evaluating sequence analysis results.
📋 How to Use
- Enter the desired output length in residues (default: 100; maximum: 100,000,000).
- Paste a raw or FASTA guide sequence into the textarea. The tool samples from the residues in this sequence. Input limit: 100,000,000 characters.
- Choose how many sequences to generate (1, 10, 50, or 100).
- Click Run. Each output sequence is sampled independently from the guide and written as a FASTA record. Use Copy to copy the plain-text result.
🧮 Formulas & Logic
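A minimal sketch of the sampling model, assuming (as this page states) that each output position is an independent, uniform draw from the positions of the guide. The symbols are introduced here for illustration and are not part of the tool: $N$ is the guide length, $n_a$ is the number of times residue $a$ occurs in the guide, $L$ is the requested output length, and $X_a$ is the count of residue $a$ in one output sequence.

```latex
P(\text{residue} = a) = \frac{n_a}{N}, \qquad
X_a \sim \mathrm{Binomial}\!\left(L, \frac{n_a}{N}\right), \qquad
\mathbb{E}[X_a] = L\,\frac{n_a}{N}, \qquad
\mathrm{Var}\!\left(\frac{X_a}{L}\right) = \frac{1}{L}\,\frac{n_a}{N}\left(1 - \frac{n_a}{N}\right)
```

Under this model, a guide that is 20% leucine and an output length of 100 give an expected 20 leucines with a standard deviation of 4 residues, so a single output will often land a few percentage points away from 20%.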
📊 Result Interpretation
Each residue is drawn independently and uniformly from all positions in the guide. The guide is not consumed — the same position can be selected multiple times.
If the guide is 20% leucine, the output sequences will also average about 20% leucine. The exact composition of any one sequence varies due to random sampling, and the variation is larger for shorter outputs.
The output length can be shorter or longer than the guide. The guide only defines the sampling pool, not the output size.
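The behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual source: the function names `sample_protein` and `composition` and the example guide are hypothetical, and the fixed seed is there only to make the sketch reproducible.

```python
import random
from collections import Counter


def sample_protein(guide, length, rng):
    """Draw `length` residues from `guide` uniformly and with replacement."""
    if not guide:
        raise ValueError("guide sequence is empty")
    # choices() samples with replacement, so the guide is never consumed
    # and the same position can be picked more than once.
    return "".join(rng.choices(guide, k=length))


def composition(seq):
    """Return the fraction of each residue in `seq`."""
    counts = Counter(seq)
    return {res: n / len(seq) for res, n in counts.items()}


rng = random.Random(0)  # fixed seed, only for reproducibility of the sketch
guide = "LLAGKVDESTLMPQRNAGLV"  # hypothetical guide: 4 of 20 residues are leucine

# The output can be much longer than the guide; the guide only defines the pool.
sampled = sample_protein(guide, 10_000, rng)
print(len(sampled), round(composition(sampled)["L"], 3))
```

With a long output like this, the leucine fraction should sit close to the guide's 20%. Requesting a short output (for example 20 residues) shows much larger swings from run to run.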
🔬 Applications
- Generating composition-matched null sequences for statistical testing of motif enrichment
- Producing background sequences that reflect the residue bias of a specific protein family or domain
- Creating synthetic sequences with a defined amino acid composition for benchmarking tools
- Testing analysis pipelines with sequences that share compositional properties with real proteins
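As one way to use such controls, the sketch below estimates an empirical p-value for a motif count by comparing a sequence against composition-matched samples of the same length. Everything named here is hypothetical: the helper functions, the example protein, and the `KK` motif are chosen only for illustration, and sampling is reimplemented locally rather than taken from the tool.

```python
import random
import re


def count_motif(seq, pattern):
    """Count non-overlapping regex matches of `pattern` in `seq`."""
    return len(re.findall(pattern, seq))


def empirical_p_value(observed_seq, pattern, n_null, rng):
    """Estimate how often composition-matched nulls match the motif at least
    as many times as the observed sequence does."""
    observed = count_motif(observed_seq, pattern)
    hits = 0
    for _ in range(n_null):
        # Null sequence: same length, residues sampled with replacement
        # from the observed sequence itself (the guide).
        null_seq = "".join(rng.choices(observed_seq, k=len(observed_seq)))
        if count_motif(null_seq, pattern) >= observed:
            hits += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (hits + 1) / (n_null + 1)


rng = random.Random(42)  # fixed seed, only for reproducibility of the sketch
protein = "MKKLLAGKKVDESTKKLMPQRNAGLVKK"  # hypothetical sequence
p = empirical_p_value(protein, "KK", 100, rng)
print(p)
```

A small value suggests the motif occurs more often than residue composition alone would explain. With only 100 null sequences the smallest reportable value is about 0.01, so finer resolution needs more samples.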
⚠️ Common Mistakes & Warnings
Digits, spaces, and non-IUPAC amino acid characters are removed from the guide sequence before sampling. Only valid residues remain in the sampling pool.
If multiple FASTA records are pasted as the guide, only the first sequence is used as the sampling source.
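The two preprocessing rules above can be approximated as follows. This is a sketch, not the tool's code: the function name `first_guide_sequence` is hypothetical, the accepted alphabet is assumed to be the 20 standard amino acids (the tool may also keep other IUPAC codes such as B, Z, or X), and converting input to uppercase is likewise an assumption.

```python
import re

# Assumption: only the 20 standard amino acid letters are kept.
VALID_RESIDUES = "ACDEFGHIKLMNPQRSTVWY"


def first_guide_sequence(text):
    """Return the cleaned residues of the first record in raw or FASTA input."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if kept:
                break  # a second record begins here; ignore everything after it
            continue  # header of the first record
        kept.append(line)
    raw = "".join(kept).upper()  # assumption: lowercase input is accepted
    # Drop digits, spaces, and any character outside the assumed alphabet.
    return re.sub(f"[^{VALID_RESIDUES}]", "", raw)


pasted = ">sp1 demo\nMKT 1 LLV\nAG*x\n>sp2 ignored\nWWWW"
print(first_guide_sequence(pasted))
```

Checking the cleaned guide before sampling is worthwhile: if stray characters make up much of the pasted input, the effective sampling pool is smaller than it looks.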