Shuffle Protein - TheBiologyBro

💡 Quick Summary

Shuffle Protein randomly shuffles the residues of one or more protein sequences using a Fisher-Yates permutation. Because shuffling is a permutation rather than random sampling, the output sequences have exactly the same residue composition as the inputs — only the order of residues changes. Shuffled sequences serve as composition- and length-matched controls for evaluating sequence analysis results.

📋 How to Use

Paste one or more raw or FASTA sequences into the textarea. Multiple FASTA records are each shuffled independently. Input limit: 300,000,000 characters.
Click Run. Each sequence is independently shuffled and output as a FASTA record. Use Copy to copy the plain-text result.

🧮 Formulas & Logic

Fisher-Yates shuffle

For an array of N residues, iterate from index N−1 down to 1; at each step i, pick an index j uniformly at random from [0, i] (both ends inclusive) and swap positions i and j. Provided each j is drawn uniformly, every permutation of the sequence is equally likely.
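The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own source code; the function name fisher_yates_shuffle and the optional rng parameter are assumptions made for the example.

```python
import random

def fisher_yates_shuffle(sequence, rng=None):
    """Return a random permutation of `sequence` as a new string."""
    if rng is None:
        rng = random.Random()
    residues = list(sequence)
    # Walk from the last index down to 1.
    for i in range(len(residues) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends: j in [0, i]
        residues[i], residues[j] = residues[j], residues[i]
    return "".join(residues)

# Seeding the generator makes the example repeatable.
shuffled = fisher_yates_shuffle("MKTAYIAKQRQISFVKSHFSRQ", random.Random(42))
print(shuffled)
```

Including i itself in the range for j matters: restricting j to [0, i−1] would give a different algorithm that can only produce cyclic permutations.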

📊 Result Interpretation

Exact composition preserved

Unlike sampling-based tools, shuffling does not change the count of any residue. If the input is 20% Leucine, the shuffled output is also exactly 20% Leucine.

Order randomised

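All positional information (motif positions, domain boundaries, local compositional biases) is destroyed. Any similarity that remains between the shuffled sequence and the original is due to chance alone; short or low-complexity sequences can still resemble the original after shuffling.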

Multi-FASTA support

If you paste multiple FASTA records, each sequence is shuffled independently. The number of output records equals the number of valid input records.
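As a rough sketch of per-record shuffling, the Python below parses FASTA text and shuffles each sequence on its own. The helper names parse_fasta and shuffle_records are hypothetical, and two behaviours are assumptions rather than documented facts about the tool: original headers are kept unchanged, and records with an empty sequence are treated as invalid and skipped. Raw (headerless) input is not handled here.

```python
import random

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def shuffle_records(text, rng):
    """Shuffle every record independently; returns a list of (header, sequence)."""
    shuffled = []
    for header, sequence in parse_fasta(text):
        if not sequence:
            continue  # assumption: an empty record is not a valid record
        residues = list(sequence)
        rng.shuffle(residues)  # each record gets its own permutation
        shuffled.append((header, "".join(residues)))
    return shuffled

records = shuffle_records(">seq1\nMKTAYI\nAKQRQ\n>seq2\nGGSSLLPP\n", random.Random(1))
for header, sequence in records:
    print(f">{header}\n{sequence}")
```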

Ambiguous and non-standard residues preserved

The ambiguity codes B, J, and X and the non-standard residues U (selenocysteine) and O (pyrrolysine) are retained in the shuffle pool alongside the 20 standard residues.

🔬 Applications

Generating composition- and length-matched null sequences for statistical testing of motif enrichment
Producing background sequences that share exactly the same amino acid composition as a query protein
Testing analysis pipelines with sequences that have identical composition but no sequence similarity to real proteins
Evaluating whether an analysis result depends on sequence order rather than just residue composition
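One way to use shuffled sequences as a null model is an empirical p-value: count a motif in the real sequence, then see how often shuffled versions contain at least as many matches. The sketch below is illustrative only. The function names, the default of 1000 shuffles, and the example motif pattern are assumptions, and Python's random.shuffle stands in for repeated runs of the tool.

```python
import random
import re

def count_motif(sequence, pattern):
    """Count motif matches, including overlapping ones, via a lookahead."""
    return len(re.findall(f"(?={pattern})", sequence))

def empirical_p_value(sequence, pattern, n_shuffles=1000, seed=0):
    """Return (observed count, fraction of shuffles with >= observed matches)."""
    rng = random.Random(seed)
    observed = count_motif(sequence, pattern)
    residues = list(sequence)
    at_least_as_many = 0
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        if count_motif("".join(residues), pattern) >= observed:
            at_least_as_many += 1
    # Adding 1 to both counts keeps the estimate away from exactly zero.
    return observed, (at_least_as_many + 1) / (n_shuffles + 1)

# Example motif: an N-glycosylation-like sequon, N, not P, S or T, not P.
observed, p_value = empirical_p_value("MNGSAKNLTQPLNASGWNKTE", r"N[^P][ST][^P]")
print(observed, p_value)
```

A small p-value suggests the motif count depends on residue order and not just on composition; a value near 1 suggests composition alone explains it.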

⚠️ Common Mistakes & Warnings

Non-protein characters are stripped before shuffling

Digits, whitespace, and any other characters that are not protein residue codes are removed from each sequence before shuffling; the stop symbol (*) is kept. If invalid characters were present, the output is shorter than the input by the number of characters removed.
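The cleaning step might look like the Python sketch below. The exact set of accepted characters is an assumption assembled from the codes mentioned on this page (the 20 standard residues, B, J, O, U, X, and *), and converting lowercase input to uppercase is also assumed; the tool's real filter may differ.

```python
# Assumed alphabet: 20 standard residues, the extra codes named on this page, and "*".
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" + "BJOUX" + "*")

def clean_protein(raw):
    """Drop every character that is not in the assumed protein alphabet."""
    return "".join(ch for ch in raw.upper() if ch in ALLOWED)

raw = "1 MKT AYI 10\n11 akq-rq 20"
cleaned = clean_protein(raw)
print(cleaned, len(raw), len(cleaned))
```

Comparing the cleaned length with the raw length before shuffling is a quick way to notice that characters were stripped.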

Shuffling is not the same as sampling

Shuffle Protein produces a single permutation of the input. To generate many independent randomised sequences from a template, use Sample Protein instead.

❓ Frequently Asked Questions

How does Shuffle Protein differ from Sample Protein?

Shuffle Protein is a permutation — every residue in the input appears exactly once in the output, so composition is preserved exactly. Sample Protein draws residues with replacement from a guide, so the output composition is only approximately equal to the guide and the output length is independent of the guide length.
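The difference can be illustrated with Python's standard library, where random.sample draws without replacement and random.choices draws with replacement. This is only an analogy for the two tools, not their actual code; the guide sequence and the output length of 30 are arbitrary choices for the example.

```python
import random
from collections import Counter

rng = random.Random(7)
guide = "MKTAYIAKQRQISFVKSHFSRQ"

# Permutation: every residue is used exactly once, so length and composition match.
shuffled = "".join(rng.sample(list(guide), k=len(guide)))

# Sampling with replacement: any output length, composition only approximates the guide.
sampled = "".join(rng.choices(list(guide), k=30))

print(Counter(shuffled) == Counter(guide))
print(len(sampled), Counter(sampled).most_common(3))
```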

Can I shuffle multiple sequences at once?

Yes — paste multiple FASTA records into the input textarea. Each record is shuffled independently and output as a separate FASTA entry.

Are stop symbols (*) preserved in the shuffle?

Yes. The stop symbol (*) is treated as a regular character and included in the shuffle pool, so it appears in the output exactly as many times as it occurred in the input. Because positions are randomised, a terminal * can land anywhere in the shuffled sequence, not only at the end.