💡 Quick Summary
Fuzzy Search DNA finds sites in a target DNA sequence that are identical or similar to a short query sequence. Scoring is controlled by match, mismatch, and gap parameters. Use it to locate sites that could be converted into a restriction site with a few point mutations, or to find approximate occurrences of a motif.
📋 How to Use
- Paste the target sequence (raw or FASTA format) into the top textarea. Input limit is 2,000,000 characters.
- Type the query sequence into the query field (max 30 characters). This is the short motif you want to find.
- Adjust the scoring parameters if needed: Match value (reward for identical bases), Mismatch value (score for non-identical bases, normally a penalty), Gap value (cost of an insertion or deletion), and the maximum number of hits to report.
- Click Run. Each hit shows the aligned query and target segments together with the alignment score.
- Click Load Example to try searching for cccggg (an SmaI restriction site) in a sample sequence.
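To see how the Match and Mismatch values combine into a score, here is a minimal sketch that scores a gap-free, base-for-base alignment of two equal-length strings. The function name `ungapped_score` and the values match = 2 and mismatch = -1 are hypothetical choices for illustration, not the tool's defaults.

```python
def ungapped_score(query: str, target: str, match: int = 2, mismatch: int = -1) -> int:
    """Score two equal-length sequences aligned base-for-base (no gaps).

    match and mismatch are hypothetical example values, not the tool's defaults.
    """
    if len(query) != len(target):
        raise ValueError("sequences must have equal length for an ungapped score")
    # Add the match reward for identical bases and the mismatch score otherwise
    return sum(match if q == t else mismatch
               for q, t in zip(query.upper(), target.upper()))


# cccggg vs cccagg: five identical bases and one mismatch -> 5*2 + 1*(-1) = 9
print(ungapped_score("cccggg", "cccagg"))
```

With a Mismatch value of -1, a site one point mutation away from cccggg still scores 9 out of a possible 12, which is why near-matches appear among the hits.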
🧮 Formulas & Logic
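The help text does not spell out the recurrence, but the local alignments, the alignment matrix, and the O(n × m) memory described elsewhere on this page are consistent with a Smith-Waterman-style local alignment. Assuming a linear gap cost, with query q of length m, target t of length n, Match value M, Mismatch value X, and Gap value g added as a negative number, the matrix would be filled as:

$$
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max\bigl\{\,0,\; H_{i-1,j-1} + s(q_i, t_j),\; H_{i-1,j} + g,\; H_{i,j-1} + g \,\bigr\},
\qquad
s(q_i, t_j) = \begin{cases} M & \text{if } q_i = t_j \\ X & \text{otherwise} \end{cases}
$$

for i = 1..m and j = 1..n. Under this assumption, the score of a hit is the value of the cell where its alignment ends, and the aligned segments are recovered by tracing back from that cell until a cell with value 0 is reached.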
📊 Result Interpretation
- The number of distinct local alignments returned (up to the requested maximum).
- The 1-based start and end positions of the aligned portion within the query.
- The 1-based start and end positions of the aligned portion within the target.
- The cumulative alignment score; higher scores indicate closer matches.
- A dash (-) in an aligned sequence marks a gap: the opposite sequence has an inserted base at that position.
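The fields above can be illustrated with a minimal Smith-Waterman sketch that returns the single best local alignment with 1-based, inclusive start and end positions and dashes for gaps. This assumes a linear gap cost; the function name `local_align` and the default scores (match = 2, mismatch = -1, gap = -2) are hypothetical. The tool's method for reporting several distinct hits is not described, so this sketch reports only the top one.

```python
def local_align(query: str, target: str, match: int = 2, mismatch: int = -1,
                gap: int = -2):
    """Best local alignment as (score, q_start, q_end, t_start, t_end, aligned_q, aligned_t).

    Positions are 1-based and inclusive. Returns None if no positive-scoring
    alignment exists. Scores are hypothetical example values.
    """
    q, t = query.upper(), target.upper()
    m, n = len(q), len(t)
    # H[i][j]: best score of a local alignment ending at q[i-1] and t[j-1]
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if q[i - 1] == t[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # pair q[i-1] with t[j-1]
                          H[i - 1][j] + gap,    # q[i-1] opposite a gap in the target
                          H[i][j - 1] + gap)    # t[j-1] opposite a gap in the query
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    if best == 0:
        return None
    # Trace back from the best cell until a zero-score cell is reached
    i, j = bi, bj
    aq, at = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if q[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aq.append(q[i - 1]); at.append(t[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            aq.append(q[i - 1]); at.append("-"); i -= 1
        else:
            aq.append("-"); at.append(t[j - 1]); j -= 1
    # After the loop, (i, j) is the cell just before the alignment starts
    return (best, i + 1, bi, j + 1, bj,
            "".join(reversed(aq)), "".join(reversed(at)))


print(local_align("cccggg", "tttcccagggttt"))
# (10, 1, 6, 4, 10, 'CCC-GGG', 'CCCAGGG')
```

In this example the dash in the aligned query marks the extra A in the target: six matches (6 × 2) minus one gap (2) gives a score of 10.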
🔬 Applications
- Finding near-matches to a restriction enzyme recognition site so you can plan a silent mutation to introduce the site
- Locating degenerate primer binding sites in a template sequence
- Identifying approximate occurrences of a short regulatory motif across a genomic region
- Checking whether a synthesised oligo will bind off-target sites with only a few mismatches
⚠️ Common Mistakes & Warnings
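The algorithm uses O(n × m) memory, where n and m are the lengths of the target and query. Keeping the query short keeps the alignment matrix manageable: at the input limits (a 30-base query and a 2,000,000-base target) the matrix has roughly 60 million cells.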
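The arithmetic behind that limit is easy to check. This sketch assumes the common convention of one extra boundary row and column, and a hypothetical 4 bytes per cell; the tool's actual storage per cell is not documented.

```python
def matrix_cells(query_len: int, target_len: int) -> int:
    """Cells in a full (query_len + 1) x (target_len + 1) alignment matrix."""
    return (query_len + 1) * (target_len + 1)


# Worst case allowed by the input limits: 30-base query, 2,000,000-base target
cells = matrix_cells(30, 2_000_000)
print(cells)                   # 62000031
print(round(cells * 4 / 1e6))  # 248 (MB, assuming a hypothetical 4 bytes per cell)
```

Doubling the query length roughly doubles the matrix, which is why the query, not the much longer target, is the field with the tight character limit.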
Any character that is not a valid IUPAC nucleotide code (A, T, G, C, N, U, R, Y, S, W, K, M, B, D, H, V), such as digits, spaces, and line breaks, is removed from both the target and the query before the search.
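A minimal sketch of that filtering step, using the character set listed in the note. The name `clean_sequence` is hypothetical, and keeping the original letter case is an assumption; the tool may convert case differently.

```python
# Nucleotide codes listed in the note; every other character is dropped
VALID_BASES = set("ATGCNURYSWKMBDHV")


def clean_sequence(raw: str) -> str:
    """Keep only valid nucleotide codes (case-insensitive check, case preserved)."""
    return "".join(ch for ch in raw if ch.upper() in VALID_BASES)


# Spaces, digits and line breaks (e.g. GenBank-style numbering) are removed
print(clean_sequence("1 cccggg aattx\n61 GG"))  # cccgggaattGG
```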
If you paste a multi-FASTA file, only the first sequence is used as the target. Use a single-sequence FASTA file or a raw sequence.
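The first-record behaviour can be sketched as follows, assuming that any line beginning with ">" is a header and that input with no header is a raw sequence. The function name `first_fasta_sequence` is hypothetical.

```python
def first_fasta_sequence(text: str) -> str:
    """Return the sequence of the first record in (multi-)FASTA text.

    Input without any '>' header line is treated as a raw sequence.
    """
    lines = text.splitlines()
    if not any(line.startswith(">") for line in lines):
        return "".join(line.strip() for line in lines)
    seq, in_first = [], False
    for line in lines:
        if line.startswith(">"):
            if in_first:
                break  # reached the header of the second record
            in_first = True
            continue
        if in_first:
            seq.append(line.strip())
    return "".join(seq)


print(first_fasta_sequence(">seq1\nACGT\nGG\n>seq2\nTTTT"))  # ACGTGG
```

Everything after the second header is silently ignored, so a sequence you expected to search can be missing from the results without any error message.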