Protein Pattern Find
Find all occurrences of a regex pattern in one or more protein sequences

Raw sequence or multi-FASTA format. Input limit: 500,000 characters.

JavaScript regular expression syntax. The default pattern finds two serine residues separated by at most 5 non-serine residues. The search is case-insensitive.

💡 Quick Summary

Protein Pattern Find accepts one or more protein sequences along with a regular expression search pattern, and returns the number and positions of all sites that match the pattern. Use it to locate sequence regions matching a consensus sequence of interest, such as phosphorylation motifs, binding sites, or conserved domains.

📋 How to Use
  1. Paste one or more protein sequences (raw or FASTA) into the text area.
  2. Enter a search pattern in the pattern field. The default pattern S[^S]{0,5}S finds pairs of serine residues separated by at most 5 non-serine residues.
  3. Standard JavaScript regular expression syntax is supported: use [ACE] for character classes, {n,m} for repeat counts, . for any residue, and [^X] for "not X".
  4. Click Submit. Each matching region is reported with its position (1-based) within the sequence.
  5. Overlapping matches are reported. For example, searching SSS for SS reports two matches, starting at positions 1 and 2.
  6. Use Copy All to copy the full results report to your clipboard.
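
As an illustration of step 1, the sketch below shows one way pasted text could be split into individual records before searching. It is not the tool's own code: the helper name parseInput, the record shape, and the "Untitled" placeholder title are assumptions made for this example, as is the rule that input without any ">" line is treated as a single raw sequence.

```javascript
// Sketch of splitting pasted input into records; parseInput is a hypothetical helper.
// Assumption: a line starting with ">" begins a new FASTA record, and input
// without any ">" line is treated as one raw sequence.
function parseInput(text) {
  const records = [];
  let current = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.startsWith(">")) {
      // Header line: start a new record and keep the text after ">" as its title.
      current = { title: line.slice(1).trim(), sequence: "" };
      records.push(current);
    } else if (line.trim() !== "") {
      if (current === null) {
        // Sequence text before any header: treat it as an untitled raw sequence.
        current = { title: "Untitled", sequence: "" };
        records.push(current);
      }
      current.sequence += line.trim();
    }
  }
  return records;
}

const records = parseInput(">seq1 example\nMKSAS\nSSG\n>seq2\nASSA");
console.log(records.map((r) => `${r.title}: ${r.sequence}`));
```

Separating header lines first matters because header text consists mostly of letters that would otherwise be mistaken for residues.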
🧮 Formulas & Logic
Pattern matching
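The pattern is compiled as a JavaScript RegExp with the global (g) and case-insensitive (i) flags. After each hit, lastIndex is decremented by (match length − 1), so the next search begins one residue after the start of the current hit and overlapping occurrences are detected.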
Position numbering
1-based: start = lastIndex − matchLength + 1, end = lastIndex, where lastIndex is the value immediately after the hit (before it is decremented).
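
The matching and numbering rules above can be sketched as follows. This is a minimal illustration under the stated rules, not the tool's actual source; the function name findOverlapping and the shape of the returned objects are assumptions made for the example.

```javascript
// Sketch of an overlapping regex search with 1-based positions.
// findOverlapping is a hypothetical helper, not the tool's own code.
function findOverlapping(sequence, patternSource) {
  // "g" makes exec() resume from lastIndex; "i" makes the search case-insensitive.
  const re = new RegExp(patternSource, "gi");
  const hits = [];
  let m;
  while ((m = re.exec(sequence)) !== null) {
    const matchLength = m[0].length;
    // Right after a hit, lastIndex is the 0-based index just past the match,
    // which equals the 1-based position of the last matched residue.
    const end = re.lastIndex;
    const start = end - matchLength + 1;
    hits.push({ start, end, match: m[0] });
    // Step back so the next search begins one residue after this hit's start.
    // For a zero-length match this adds 1 instead, so the loop still terminates.
    re.lastIndex -= matchLength - 1;
  }
  return hits;
}

console.log(findOverlapping("SSS", "SS"));
// [ { start: 1, end: 2, match: 'SS' }, { start: 2, end: 3, match: 'SS' } ]
```

Without the lastIndex adjustment, a global search would resume after the end of the first SS and report only one match in SSS.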
📊 Result Interpretation
start / end

Residue positions are 1-based and count from the first residue of the cleaned sequence (after non-protein characters are stripped).

Overlapping matches

If your pattern can match overlapping regions (e.g. SS in SSS), all overlapping occurrences are reported.

"no matches found"

The pattern did not match anywhere in that sequence. Check the pattern syntax. Residue case is not a factor, because the search is case-insensitive.

🔬 Applications
  • Locating known phosphorylation, glycosylation, or cleavage motifs across a sequence set
  • Finding occurrences of a consensus binding site or recognition sequence
  • Identifying conserved short linear motifs (SLiMs) in protein families
  • Verifying the presence or absence of a tag, linker, or signal sequence after cloning
  • Scanning translated sequences for protease cleavage sites before digest experiments
⚠️ Common Mistakes & Warnings
Non-standard characters stripped before search

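Any character that is not one of the 20 standard amino acid letters (including B, Z, X, *, gap characters, digits, and whitespace) is removed from the sequence before the pattern is applied. Reported positions refer to the cleaned sequence, and a pattern that requires one of the removed letters cannot match.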
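
A minimal sketch of such a cleaning step is shown below. The function name cleanProtein is hypothetical, and the example assumes that lower-case residue letters are kept as they are, since the search itself is case-insensitive.

```javascript
// Sketch of the cleaning step; cleanProtein is a hypothetical name.
// Assumption: upper- and lower-case letters are both kept.
function cleanProtein(raw) {
  // Remove every character that is not one of the 20 standard one-letter codes.
  return raw.replace(/[^ACDEFGHIKLMNPQRSTVWY]/gi, "");
}

const raw = "MKX-SA 10 SZ*";
const cleaned = cleanProtein(raw);
console.log(cleaned); // MKSAS
```

In this example the first S is the 5th character of the pasted text but residue 3 of the cleaned sequence, and residue 3 is the number a report based on the cleaned sequence would show.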

Pattern must be a valid regular expression

Any valid JavaScript regular expression is accepted. Unbalanced brackets, invalid quantifiers, or other syntax errors will be caught and reported before the search runs.
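
One way such a check could work is to compile the pattern inside a try/catch, as in the sketch below. The helper name compilePattern and its return shape are assumptions for this example, not the tool's documented behavior.

```javascript
// Sketch of validating a pattern before searching; compilePattern is a hypothetical helper.
function compilePattern(patternSource) {
  try {
    // new RegExp throws a SyntaxError when the pattern is not valid.
    return { ok: true, regex: new RegExp(patternSource, "gi") };
  } catch (err) {
    if (err instanceof SyntaxError) {
      return { ok: false, message: err.message };
    }
    throw err; // anything else is unexpected, so let it propagate
  }
}

console.log(compilePattern("S[^S]{0,5}S").ok); // true
console.log(compilePattern("S[^S{0,5}S").ok);  // false (character class never closed)
console.log(compilePattern("S{5,2}").ok);      // false (quantifier bounds out of order)
```

Note that JavaScript is lenient about some stray characters when the Unicode flag is not set: a lone ] or { is treated as a literal character rather than a syntax error, so a pattern can compile and still not mean what was intended.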

Very broad patterns may produce many matches

Patterns such as .* or .{0,} will match very large regions. Narrow the pattern with specific residue requirements or tighter repeat bounds.

❓ Frequently Asked Questions

What pattern syntax is supported?
Standard JavaScript regular expression syntax. Common constructs: [ACDE] matches any one of A, C, D, or E; [^P] matches any residue except P; . matches any residue; {2,4} means 2–4 repeats of the preceding element; ^ and $ anchor to the start and end of the sequence. The search is always case-insensitive.
What does the default pattern S[^S]{0,5}S mean?
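It finds two serine (S) residues separated by 0–5 non-serine residues. This is useful for locating serine-rich clusters, such as regions that may contain several phosphorylation sites. Note that the minimal CK2 consensus is usually written as [ST]..[DE], so use that pattern if CK2 sites specifically are of interest.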
Why do the positions differ from those in a database annotation?
Database annotations are usually numbered against the full canonical sequence, which may include signal peptides, propeptides, or other regions that are absent from the sequence you pasted. This tool reports positions within the exact sequence provided (after cleaning), not within the full canonical protein.
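
When the pasted sequence is a contiguous fragment of the full-length protein and no characters were stripped from it, the two numbering schemes differ by a constant offset. The sketch below illustrates that arithmetic; toFullLength and nTerminalOffset are hypothetical names, and the 22-residue signal peptide is an invented example value.

```javascript
// Sketch: map a position in the pasted fragment to full-length numbering.
// toFullLength and nTerminalOffset are hypothetical names for this example.
function toFullLength(position, nTerminalOffset) {
  // nTerminalOffset = number of residues that precede the pasted fragment
  // in the full-length sequence (for example, a removed signal peptide).
  return position + nTerminalOffset;
}

// A match starting at residue 5 of a fragment lacking a 22-residue signal peptide:
console.log(toFullLength(5, 22)); // 27
```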
Can I search for a PROSITE-style pattern?
PROSITE patterns use a different notation (e.g. x(2,4) instead of .{2,4}). You will need to convert them to regular expression syntax manually before searching here.
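
For simple cases the conversion can be automated. The sketch below handles only the common PROSITE notation (x, [..], {..}, (n), (n,m), and the < and > anchors); the function name prositeToRegex is hypothetical, and rarer forms such as an anchor inside square brackets are not handled.

```javascript
// Sketch of a PROSITE-to-regex converter; prositeToRegex is a hypothetical helper.
// Covers the common notation only: x, [..], {..}, (n), (n,m), < and >.
function prositeToRegex(prosite) {
  return prosite
    .trim()
    .replace(/\.$/, "")                 // drop the period that ends a PROSITE pattern
    .replace(/-/g, "")                  // element separators have no regex equivalent
    .replace(/\{([A-Z]+)\}/g, "[^$1]")  // {AM} = any residue except A or M
    .replace(/\((\d+)(?:,(\d+))?\)/g,   // (n) or (n,m) = repeat count
      (_, n, m) => (m === undefined ? `{${n}}` : `{${n},${m}}`))
    .replace(/x/g, ".")                 // lower-case x = any residue
    .replace(/^</, "^")                 // < anchors to the N-terminus
    .replace(/>$/, () => "$");          // > anchors to the C-terminus
}

console.log(prositeToRegex("N-{P}-[ST]-{P}")); // N[^P][ST][^P]
console.log(prositeToRegex("C-x(2,4)-C."));    // C.{2,4}C
```

The exclusion braces are converted before the repeat counts so that the braces produced for {n,m} are not mistaken for PROSITE exclusions. Check the converted pattern by eye before relying on the results.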