💡 Quick Summary
Sample DNA builds a new sequence of the desired length by randomly selecting bases, with replacement, from a guide sequence. Because each base's chance of selection is proportional to its frequency in the guide, output sequences match the guide's nucleotide frequencies on average. Sampled sequences can serve as composition-matched controls when evaluating sequence analysis results.
📋 How to Use
- Enter the desired output length in bases (default: 100; maximum: 10,000,000).
- Paste a guide sequence, raw or in FASTA format, into the text area. The tool samples bases from this sequence. Input limit: 10,000,000 characters.
- Choose how many sequences to generate (1, 10, 50, or 100).
- Click Run. Each output sequence is sampled independently from the guide and returned as a FASTA record. Use Copy to copy the plain-text result.
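The steps above can be approximated offline with a short script. This is a minimal sketch, not the tool's own code: the function name `sample_dna_fasta`, the `seed` parameter, and the FASTA header format are assumptions made for illustration, and the guide is assumed to be already cleaned.

```python
import random


def sample_dna_fasta(guide, length=100, count=1, seed=None):
    """Return `count` FASTA records, each sampled with replacement from `guide`."""
    if not guide:
        raise ValueError("guide sequence is empty")
    rng = random.Random(seed)  # seed is optional; pass one for repeatable output
    records = []
    for i in range(1, count + 1):
        # Every base is an independent draw from the positions of the guide.
        seq = "".join(rng.choices(guide, k=length))
        # Hypothetical header format; the tool's actual record titles may differ.
        records.append(f">sample_{i} length={length}\n{seq}")
    return "\n".join(records)


print(sample_dna_fasta("GGGCCCATGC", length=20, count=3, seed=1))
```

Passing a fixed `seed` makes a run repeatable, which helps when the sampled controls need to be regenerated later.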
🧮 Formulas & Logic
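The sampling described in this guide can be written as a simple probability model. The block below is a sketch that assumes every output base is an independent, uniform draw from the guide's positions; the symbols N, n_b, p_b, L, and X_b are introduced here for illustration and are not part of the tool.

```latex
% Guide of N bases containing n_b copies of base b; output of length L.
% X_b is the number of times base b appears in one output sequence.
p_b = \frac{n_b}{N}, \qquad
X_b \sim \mathrm{Binomial}(L,\, p_b), \qquad
\mathbb{E}[X_b] = L\,p_b, \qquad
\mathrm{Var}\!\left(\frac{X_b}{L}\right) = \frac{p_b\,(1 - p_b)}{L}
```

The same expressions apply to a class of bases such as G+C. For a guide that is 60% G+C and an output length of 100, the standard deviation of the output G+C fraction is sqrt(0.6 × 0.4 / 100) ≈ 0.049, or roughly five percentage points.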
📊 Result Interpretation
Each base is drawn independently and uniformly at random from all positions in the guide. The guide is not consumed, so the same position can be selected more than once.
If the guide is 60% G+C, the output sequences will also average about 60% G+C. The composition of any single output varies around that value because of random sampling, and the variation shrinks as the output length grows.
The output length can be shorter or longer than the guide. The guide only defines the sampling pool, not the output size.
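The behaviour described above can be checked empirically. The following sketch assumes a toy 10-base guide with 60% G+C and draws outputs much longer than the guide; the helper `gc_fraction`, the guide, and the sample sizes are illustrative choices, not values taken from the tool.

```python
import random
from collections import Counter


def gc_fraction(seq):
    """Fraction of positions in `seq` that are G or C."""
    counts = Counter(seq.upper())
    return (counts["G"] + counts["C"]) / len(seq)


rng = random.Random(42)
guide = "GCGCGCATAT"  # toy guide: 6 of 10 bases are G or C

# Outputs are 1000 bases long, far longer than the 10-base guide,
# because the guide only defines the sampling pool.
samples = ["".join(rng.choices(guide, k=1000)) for _ in range(200)]
fractions = [gc_fraction(s) for s in samples]
mean_gc = sum(fractions) / len(fractions)

print(f"guide GC:        {gc_fraction(guide):.2f}")
print(f"mean sampled GC: {mean_gc:.3f}")
print(f"sampled range:   {min(fractions):.3f} to {max(fractions):.3f}")
```

Individual sequences scatter around the guide's value, while the mean across many sequences stays close to it.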
🔬 Applications
- Generating composition-matched null sequences for statistical testing of motif enrichment
- Producing background sequences that reflect the GC content of a specific organism or genomic region
- Creating synthetic sequences with a defined base composition for benchmarking tools
- Testing analysis pipelines with sequences that share properties with real data
⚠️ Common Mistakes & Warnings
Digits, spaces, and non-IUPAC characters are removed from the guide sequence before sampling. Only characters from the IUPAC nucleotide alphabet remain in the sampling pool.
If multiple FASTA records are pasted as the guide, only the first sequence is used as the sampling source.
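The two rules above can be sketched as a preprocessing step. This is an approximation under stated assumptions: the function name `first_record_sequence` is hypothetical, the allowed alphabet is assumed to be the IUPAC DNA codes without U, and converting to uppercase is a choice made here that the tool may not share.

```python
import re

# Assumed alphabet: the four bases plus the IUPAC ambiguity codes (no U).
IUPAC_DNA = "ACGTRYSWKMBDHVN"


def first_record_sequence(text):
    """Return the cleaned sequence of the first FASTA record, or of raw input."""
    seq_lines = []
    seen_header = False
    for line in text.strip().splitlines():
        if line.startswith(">"):
            if seen_header or seq_lines:
                break  # a second record starts here; ignore everything after it
            seen_header = True
            continue
        seq_lines.append(line)
    raw = "".join(seq_lines).upper()  # uppercase conversion is an assumption
    # Drop digits, whitespace, and any character outside the assumed alphabet.
    return re.sub(f"[^{IUPAC_DNA}]", "", raw)


pasted = ">first\nACGT 123\nnn-xx\n>second\nTTTT"
print(first_record_sequence(pasted))
```

Checking the cleaned guide before sampling makes it easier to spot a pasted multi-record file where only the first record contributes to the pool.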