💡 Quick Summary
DNA Pattern Find accepts one or more DNA sequences along with a search pattern and returns the number and positions of all matching sites. Both the direct strand and the reverse complement strand are searched, so you can locate binding sites, motifs, or other features that may occur on either strand.
📋 How to Use
- Paste a raw sequence or one or more FASTA sequences into the input area. The input limit is 500,000,000 characters.
- Enter a search pattern in the pattern field. The pattern is a JavaScript regular expression (e.g. ctt[ca] matches "cttc" and "ctta").
- Click Run. Each match is reported with its match number, pattern, start and end positions, strand, and the matched sequence.
- Positions are reported in 1-based coordinates relative to the direct (input) strand for both direct and reverse strand matches.
- Use the Copy button to copy all results to your clipboard.
- Click Load Example to try three sample sequences with the default pattern ctt[ca].
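The steps above can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not the tool's actual source: the helper names parseRecords and findDirectMatches are hypothetical, the input is assumed to be already cleaned, and only the direct strand is searched.

```javascript
// Hypothetical helper: split pasted input into { title, sequence } records.
// Raw input without a ">" header becomes a single untitled record.
function parseRecords(input) {
  const records = [];
  let current = null;
  for (const rawLine of input.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    if (line.startsWith(">")) {
      current = { title: line.slice(1).trim(), sequence: "" };
      records.push(current);
    } else {
      if (current === null) {
        current = { title: "untitled", sequence: "" };
        records.push(current);
      }
      current.sequence += line;
    }
  }
  return records;
}

// Hypothetical helper: list direct-strand matches with 1-based, inclusive coordinates.
function findDirectMatches(sequence, pattern) {
  const regex = new RegExp(pattern, "gi"); // g: iterate over matches, i: ignore case
  const hits = [];
  let m;
  while ((m = regex.exec(sequence)) !== null) {
    if (m[0].length > 0) {
      hits.push({
        number: hits.length + 1,
        pattern,
        start: m.index + 1,            // 0-based index converted to 1-based
        end: m.index + m[0].length,    // inclusive end position
        strand: "direct",
        match: m[0],
      });
    }
    regex.lastIndex = m.index + 1; // step one base so overlapping sites are kept
  }
  return hits;
}

const records = parseRecords(">seq1\nacttcgg\n>seq2\nttctta\n");
for (const record of records) {
  console.log(record.title, findDirectMatches(record.sequence, "ctt[ca]"));
}
```

In this example, seq1 yields one hit at positions 2 to 5 ("cttc") and seq2 yields one hit at positions 3 to 6 ("ctta").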
🧮 Formulas & Logic
📊 Result Interpretation
Number of FASTA records successfully searched.
Sum of all hits across all sequences and both strands.
1-based positions on the direct (input) strand. For reverse strand matches these coordinates indicate where the motif lies on the direct strand.
Match was found in the 5'→3' direction of the input sequence.
Match was found in the reverse complement of the input sequence (i.e. the 3'→5' strand).
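One way to obtain direct-strand coordinates for a reverse strand match is to search the reverse complement and then mirror the match position back onto the input. The sketch below assumes this approach; the names reverseComplement and findReverseMatches are hypothetical, and reporting the match text as read on the reverse strand is an assumption about the output format.

```javascript
const COMPLEMENT = { a: "t", c: "g", g: "c", t: "a" };

// Reverse the sequence and complement each base; other symbols (e.g. n) pass through.
function reverseComplement(sequence) {
  return sequence
    .toLowerCase()
    .split("")
    .reverse()
    .map((base) => COMPLEMENT[base] || base)
    .join("");
}

// Search the reverse complement, then convert each hit to 1-based
// coordinates on the direct (input) strand.
function findReverseMatches(sequence, pattern) {
  const n = sequence.length;
  const reverse = reverseComplement(sequence);
  const regex = new RegExp(pattern, "gi");
  const hits = [];
  let m;
  while ((m = regex.exec(reverse)) !== null) {
    const len = m[0].length;
    if (len > 0) {
      hits.push({
        start: n - (m.index + len) + 1, // mirrored start on the direct strand
        end: n - m.index,               // mirrored end on the direct strand
        strand: "reverse",
        match: m[0],                    // as read on the reverse strand (assumption)
      });
    }
    regex.lastIndex = m.index + 1; // keep overlapping sites
  }
  return hits;
}

console.log(findReverseMatches("aatgaagcc", "ctt[ca]"));
```

Here the reverse complement of "aatgaagcc" is "ggcttcatt", which contains "cttc". The hit is reported at direct-strand positions 4 to 7, where the input reads "gaag".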
🔬 Applications
- Finding transcription factor binding sites or promoter motifs in genomic sequences
- Locating restriction enzyme recognition sites (simple patterns without degeneracy)
- Identifying repeated sequence elements or tandem repeats
- Searching for primer annealing sites on both strands of a template
- Locating open reading frame start codons (pattern: atg) or stop codons (pattern: taa|tag|tga)
- Finding TATA boxes, splice sites, or other functional sequence signals
⚠️ Common Mistakes & Warnings
The search pattern uses JavaScript regex syntax. Literal brackets, dots, stars, and other special characters must be escaped with a backslash if you want them treated as literals (e.g. use \. to match a literal dot). An invalid pattern will display an error.
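A sketch of how a pattern might be validated and how literal text can be escaped before it is used as a pattern. The function names compilePattern and escapeLiteral are hypothetical and the tool's own error handling may differ; the escaping expression is the commonly used character set for JavaScript regex metacharacters.

```javascript
// Try to compile the pattern; return the error message instead of throwing.
function compilePattern(pattern) {
  try {
    return { ok: true, regex: new RegExp(pattern, "gi") };
  } catch (err) {
    return { ok: false, error: err.message };
  }
}

// Prefix every regex metacharacter with a backslash so it matches literally.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(compilePattern("ctt[ca]").ok); // valid character class
console.log(compilePattern("ctt[ca").ok);  // unterminated class, so compilation fails
console.log(escapeLiteral("gg.cc"));       // the dot is now escaped
```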
Whitespace, digits, and non-IUPAC characters are removed from the input sequence before searching. The pattern is applied to the stripped sequence, so position numbers reflect the cleaned sequence.
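The effect of cleaning on reported positions can be illustrated as follows. This is a sketch only: cleanSequence is a hypothetical name, the set of retained letters (the IUPAC nucleotide codes plus u) is an assumption about what the tool keeps, and the input is assumed to be the sequence body without a FASTA header line.

```javascript
// Keep only IUPAC nucleotide letters; drop whitespace, digits, and anything else.
function cleanSequence(raw) {
  return raw.replace(/[^acgturyswkmbdhvn]/gi, "");
}

const raw = "10 acgt acttc";
const cleaned = cleanSequence(raw); // "acgtacttc"
const hit = /ctt[ca]/i.exec(cleaned);

// The 1-based start is counted in the cleaned sequence, not in the pasted text.
console.log(cleaned, hit.index + 1);
```

In the pasted text the match begins at character 10, but the reported start is 6 because the digits and spaces were removed first.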
The search advances one base at a time after each match, so overlapping occurrences of the same pattern are each listed separately.
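The difference between stepping one base after each match and the default regex iteration can be seen in a short sketch. The function names are hypothetical, and resetting lastIndex is one common way to get overlapping matches in JavaScript, not necessarily how the tool does it.

```javascript
// Overlapping search: restart one base after the start of each match.
function overlappingStarts(sequence, pattern) {
  const regex = new RegExp(pattern, "gi");
  const starts = [];
  let m;
  while ((m = regex.exec(sequence)) !== null) {
    if (m[0].length > 0) starts.push(m.index + 1); // 1-based start
    regex.lastIndex = m.index + 1;
  }
  return starts;
}

// Default iteration: the next search resumes after the end of the previous match.
function nonOverlappingStarts(sequence, pattern) {
  return Array.from(sequence.matchAll(new RegExp(pattern, "gi")), (m) => m.index + 1);
}

console.log(overlappingStarts("atatata", "ata"));    // starts 1, 3, 5
console.log(nonOverlappingStarts("atatata", "ata")); // starts 1, 5
```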
❓ Frequently Asked Questions
What pattern syntax is supported?
atg (literal match), ctt[ca] (character class), at{3} (A followed by exactly three T's, i.e. "attt"), gc+ (G followed by one or more C's), taa|tag|tga (stop codons), gg.cc (any single base between GG and CC). The search is case-insensitive.
How are reverse strand positions calculated?
Why does a match appear at both strands?
Can I search for IUPAC degenerate codes?
R → [ag], Y → [ct], N → [acgt], etc.
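Substitutions of this kind can be applied mechanically to a plain motif before it is entered as a pattern. The sketch below is an illustration, not a feature of the tool: expandIupac is a hypothetical helper, and it should only be applied to a bare motif string, because letters inside existing regex syntax (for example \d or \b) would be rewritten as well.

```javascript
// Standard IUPAC degenerate nucleotide codes mapped to regex character classes.
const IUPAC_CLASSES = {
  r: "[ag]", y: "[ct]", s: "[cg]", w: "[at]", k: "[gt]", m: "[ac]",
  b: "[cgt]", d: "[agt]", h: "[act]", v: "[acg]", n: "[acgt]",
};

// Replace each degenerate code in a plain motif with its character class.
function expandIupac(motif) {
  return motif
    .toLowerCase()
    .replace(/[ryswkmbdhvn]/g, (code) => IUPAC_CLASSES[code]);
}

console.log(expandIupac("ttyr"));  // tt[ct][ag]
console.log(expandIupac("GANTC")); // ga[acgt]tc
```

The expanded string can then be pasted into the pattern field like any other regular expression.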