DNA Pattern Find - TheBiologyBro


💡 Quick Summary

DNA Pattern Find accepts one or more DNA sequences along with a search pattern and returns the number and positions of all matching sites. Both the direct strand and the reverse complement strand are searched, so you can locate binding sites, motifs, or other features that may occur on either strand.

📋 How to Use

Paste a raw sequence or one or more FASTA sequences into the input area. The input limit is 500,000,000 characters.
Enter a search pattern in the pattern field. The pattern is a JavaScript regular expression (e.g. ctt[ca] matches "cttc" and "ctta").
Click Run. Each match is reported with its match number, pattern, start and end positions, strand, and the matched sequence.
Positions are reported in 1-based coordinates relative to the direct (input) strand for both direct and reverse strand matches.
Use the Copy button to copy all results to your clipboard.
Click Load Example to try the tool with three sample sequences and the default pattern ctt[ca].

🧮 Formulas & Logic

Direct strand position

start = match_end − match_length + 1 (1-based)

Reverse strand position

start = sequence_length − match_end_on_revcomp + 1; end = sequence_length − match_start_on_revcomp + 1 (maps back to direct strand coordinates)
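
The coordinate mapping above can be sketched in JavaScript. This is a minimal illustration, not the tool's actual source: the helper names reverseComplement and toDirectCoords are hypothetical, and positions are assumed to be 1-based and inclusive on both strands.

```javascript
// Hypothetical helpers for illustration; not taken from the tool's source.
const COMPLEMENT = { a: "t", c: "g", g: "c", t: "a" };

function reverseComplement(seq) {
  return seq
    .toLowerCase()
    .split("")
    .reverse()
    .map((base) => COMPLEMENT[base] || base) // leave other symbols unchanged
    .join("");
}

// Convert a 1-based, inclusive interval found on the reverse complement
// into 1-based, inclusive coordinates on the direct strand.
function toDirectCoords(seqLength, revStart, revEnd) {
  return {
    start: seqLength - revEnd + 1,
    end: seqLength - revStart + 1,
  };
}

const seq = "aaagaagtt";
const rc = reverseComplement(seq); // "aacttcttt"
// "cttc" occupies positions 3..6 of rc, which maps to 4..7 on the direct strand.
const mapped = toDirectCoords(seq.length, 3, 6);
console.log(rc, mapped.start, mapped.end);
```

Positions 4 to 7 of the direct strand read "gaag", the reverse complement of the matched "cttc".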

Overlapping matches

Each match advances the search by 1 bp so overlapping occurrences are all reported
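
A minimal sketch of this overlapping scan, assuming the search is driven by RegExp.prototype.exec with the global flag and that lastIndex is reset to one base past each match start. The function name findOverlapping is hypothetical.

```javascript
// Report every occurrence of a pattern, including overlapping ones.
// Positions are 1-based and inclusive, as in the formulas above.
function findOverlapping(seq, patternSource) {
  const re = new RegExp(patternSource, "gi");
  const hits = [];
  let m;
  while ((m = re.exec(seq)) !== null) {
    if (m[0].length > 0) {
      const end = m.index + m[0].length;   // 0-based exclusive end equals 1-based inclusive end
      const start = end - m[0].length + 1; // start = match_end - match_length + 1
      hits.push({ start, end, match: m[0] });
    }
    // Resume one base after the start of this match instead of after its end.
    re.lastIndex = m.index + 1;
  }
  return hits;
}

const overlaps = findOverlapping("aaaa", "aa");
console.log(overlaps.map((h) => h.start + "-" + h.end).join(" ")); // 1-2 2-3 3-4
```

A plain global match would report only 1-2 and 3-4 here, because the engine normally resumes after the end of each match.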

📊 Result Interpretation

Sequences Processed

Number of FASTA records successfully searched.

Total Matches

Sum of all hits across all sequences and both strands.

start / end

1-based positions on the direct (input) strand. For reverse strand matches these coordinates indicate where the motif lies on the direct strand.

direct strand

Match was found in the 5'→3' direction of the input sequence.

reverse strand

Match was found in the reverse complement of the input sequence, i.e. on the complementary strand read in its own 5'→3' direction.

🔬 Applications

Finding transcription factor binding sites or promoter motifs in genomic sequences
Locating restriction enzyme recognition sites (degenerate sites must be written as character classes rather than IUPAC codes)
Identifying repeated sequence elements or tandem repeats
Searching for primer annealing sites on both strands of a template
Locating open reading frame start codons (pattern: atg) or stop codons (pattern: taa|tag|tga)
Finding TATA boxes, splice sites, or other functional sequence signals

⚠️ Common Mistakes & Warnings

Pattern is a JavaScript regular expression

The search pattern uses JavaScript regex syntax. Literal brackets, dots, stars, and other special characters must be escaped with a backslash if you want them treated as literals (e.g. use \. to match a literal dot). An invalid pattern will display an error.
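
As an illustration of both points, the sketch below compiles a pattern inside try/catch and escapes a string so that it is matched literally. The names compilePattern and escapeLiteral are hypothetical, and the tool's own error handling may differ.

```javascript
// Try to compile a user-supplied pattern; report the error instead of throwing.
function compilePattern(source) {
  try {
    return { ok: true, regex: new RegExp(source, "gi") };
  } catch (err) {
    return { ok: false, message: err.message };
  }
}

// Backslash-escape every character that has a special meaning in a regex.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

console.log(compilePattern("ctt[ca]").ok); // true
console.log(compilePattern("ctt[ca").ok);  // false (unterminated character class)
console.log(escapeLiteral("a.c"));         // a\.c
```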

Non-DNA characters are stripped from the sequence

Whitespace, digits, and non-IUPAC characters are removed from the input sequence before searching. The pattern is applied to the stripped sequence, so position numbers reflect the cleaned sequence.
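
A rough sketch of this cleaning step, assuming that the retained characters are the IUPAC nucleotide letters (A, C, G, T, U, R, Y, S, W, K, M, B, D, H, V, N) in either case. The exact set kept by the tool is not stated here, and cleanSequence is a hypothetical name.

```javascript
// Remove everything that is not an IUPAC nucleotide letter (assumed character set).
function cleanSequence(raw) {
  return raw.replace(/[^acgturyswkmbdhvn]/gi, "");
}

const rawInput = "1 acgt acgt 9\n";
const cleaned = cleanSequence(rawInput);
console.log(cleaned); // acgtacgt

// The second "acgt" starts at character 8 of the raw text
// but at position 5 of the cleaned sequence, which is what gets reported.
console.log(rawInput.indexOf("acgt", 3) + 1, cleaned.indexOf("acgt", 1) + 1); // 8 5
```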

Overlapping matches are all reported

The search advances one base at a time after each match, so overlapping occurrences of the same pattern are each listed separately.

❓ Frequently Asked Questions

What pattern syntax is supported?

Patterns use standard JavaScript regular expression syntax. Common examples: atg (literal match), ctt[ca] (character class), at{3} (A followed by exactly three T's, i.e. "attt"), gc+ (G followed by one or more C's), taa|tag|tga (stop codons), gg.cc (any single base between GG and CC). The search is case-insensitive.
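
To see what each of these patterns matches, they can be tried in any JavaScript console. The sample sequences below are made up for illustration, and listMatches is a hypothetical helper. Note that String.prototype.match returns non-overlapping matches only.

```javascript
// Each pair is [pattern, sample sequence]; the samples are invented for illustration.
const samples = [
  ["atg", "CATGC"],
  ["ctt[ca]", "acttcgcttag"],
  ["at{3}", "catttg"],
  ["gc+", "agcccta"],
  ["taa|tag|tga", "cctagtgaa"],
  ["gg.cc", "ttggacctt"],
];

// Case-insensitive, global match; returns an empty array when nothing matches.
function listMatches(patternSource, seq) {
  return seq.match(new RegExp(patternSource, "gi")) || [];
}

for (const [pattern, seq] of samples) {
  console.log(pattern, "in", seq, "->", listMatches(pattern, seq).join(", "));
}
```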

How are reverse strand positions calculated?

The reverse complement of the entire sequence is searched. Match positions on the reverse complement are then converted back to direct-strand coordinates: start = sequence_length − revcomp_match_end + 1. This means start and end positions always refer to the same strand as the input sequence.

Why does a match appear on both strands?

A palindromic site (one that equals its own reverse complement) matches on both strands at the same position, and near-palindromic sequences can match at the same or nearby positions. This is biologically meaningful: many restriction sites and some transcription factor motifs are palindromic.
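
The sketch below shows this with the palindromic site gaattc. It is an assumption-laden illustration rather than the tool's code: the function names are hypothetical, and the matched text reported for reverse hits is taken from the reverse complement, which may differ from what the tool displays.

```javascript
function reverseComplement(seq) {
  const comp = { a: "t", c: "g", g: "c", t: "a" };
  return seq.toLowerCase().split("").reverse().map((b) => comp[b] || b).join("");
}

// Search the input and its reverse complement; report direct-strand coordinates.
function searchBothStrands(seq, patternSource) {
  const hits = [];
  const strands = [["direct", seq.toLowerCase()], ["reverse", reverseComplement(seq)]];
  for (const [strand, text] of strands) {
    const re = new RegExp(patternSource, "gi");
    let m;
    while ((m = re.exec(text)) !== null) {
      if (m[0].length > 0) {
        let start = m.index + 1;
        let end = m.index + m[0].length;
        if (strand === "reverse") {
          // Map the interval back onto the direct strand.
          [start, end] = [seq.length - end + 1, seq.length - start + 1];
        }
        hits.push({ strand, start, end, match: m[0] });
      }
      re.lastIndex = m.index + 1; // allow overlapping matches
    }
  }
  return hits;
}

const bothHits = searchBothStrands("ccgaattcaa", "gaattc");
for (const h of bothHits) console.log(h.strand, h.start, h.end, h.match);
```

Both hits cover positions 3 to 8, one labelled direct and one labelled reverse.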

Can I search for IUPAC degenerate codes?

Not directly, because degenerate codes (R, Y, N, etc.) are treated as literals by the regex engine. To search for a degenerate site, convert it to a character class: R → [ag], Y → [ct], N → [acgt], etc.
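
The conversion can be automated with a small lookup table. This is a sketch using the standard IUPAC nucleotide codes; iupacToRegex is a hypothetical helper and is not part of the tool.

```javascript
// Standard IUPAC nucleotide codes mapped to regex character classes.
const IUPAC_CLASSES = {
  a: "a", c: "c", g: "g", t: "t",
  r: "[ag]", y: "[ct]", s: "[cg]", w: "[at]", k: "[gt]", m: "[ac]",
  b: "[cgt]", d: "[agt]", h: "[act]", v: "[acg]", n: "[acgt]",
};

// Translate a degenerate site into a regex source string.
function iupacToRegex(site) {
  return site
    .toLowerCase()
    .split("")
    .map((code) => {
      if (!(code in IUPAC_CLASSES)) throw new Error("Not an IUPAC code: " + code);
      return IUPAC_CLASSES[code];
    })
    .join("");
}

console.log(iupacToRegex("GTYRAC")); // gt[ct][ag]ac
```

The resulting string can be pasted into the pattern field as is.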

Can I process multiple sequences at once?

Yes. Paste any number of FASTA-formatted sequences. Each record is searched independently and results are listed separately.
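
For readers who want to reproduce this outside the browser, a minimal FASTA splitter might look like the sketch below. The function name parseFasta, the record shape, and the "untitled" fallback for raw input are assumptions, not the tool's documented behaviour.

```javascript
// Split FASTA text into records of the form { title, sequence }.
// Input without a ">" header line is treated as a single untitled record.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith(">")) {
      current = { title: line.slice(1).trim(), sequence: "" };
      records.push(current);
    } else if (line.trim() !== "") {
      if (current === null) {
        current = { title: "untitled", sequence: "" };
        records.push(current);
      }
      current.sequence += line.trim();
    }
  }
  return records;
}

const fastaRecords = parseFasta(">s1\nacgt\nac\n>s2\nttt\n");
for (const rec of fastaRecords) console.log(rec.title, rec.sequence.length);
```

Each record's sequence can then be searched independently, with positions counted from the start of that record.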