💡 Quick Summary
Fuzzy Search Protein finds sites in a target protein sequence that are identical or similar to a short query sequence. Similarity is scored using a selected amino-acid substitution matrix (PAM30, PAM70, BLOSUM80, BLOSUM62, or BLOSUM45) plus a configurable gap penalty.
📋 How to Use
- Paste the target protein sequence (raw or FASTA format) into the top textarea. Input limit is 2,000 characters.
- Type the query sequence into the query field (max 30 amino acids). This is the short motif you want to find.
- Choose a scoring matrix. BLOSUM62 is the standard choice for most protein homology searches.
- Adjust the gap value and the number of hits to report if needed, then click Run.
- Each hit shows the aligned query and target segment with the alignment score.
- Click Load Example to try searching for GAD in a sample protein sequence.
🧮 Formulas & Logic
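The page names local alignments, an O(n × m) scoring matrix, a substitution matrix, and a gap value, but it does not state the recurrence. A minimal sketch, assuming a Smith-Waterman-style local alignment with a linear gap penalty: here $q$ is the query of length $m$, $t$ is the target of length $n$, $S(a, b)$ is the selected substitution-matrix score, and $g \ge 0$ is the gap value. The symbols and the sign convention for $g$ are assumptions made for illustration.

```latex
\begin{aligned}
H_{i,0} &= 0, \qquad H_{0,j} = 0, \\
H_{i,j} &= \max
\begin{cases}
0 \\
H_{i-1,j-1} + S(q_i, t_j) \\
H_{i-1,j} - g \\
H_{i,j-1} - g
\end{cases}
\qquad 1 \le i \le m,\; 1 \le j \le n .
\end{aligned}
```

Under this assumption, a hit's score is the value of $H_{i,j}$ at the cell where the alignment ends, and the aligned segment is recovered by tracing back from that cell until a zero is reached.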
📊 Result Interpretation
- The hit count is the number of distinct local alignments returned (up to the requested maximum).
- The query positions are the 1-based start and end of the aligned portion within the query.
- The target positions are the 1-based start and end of the aligned portion within the target.
- The score is the cumulative substitution-matrix alignment score. Higher scores indicate greater similarity.
- A dash in an aligned sequence marks a gap: the opposite sequence has an extra residue (an insertion) at that position.
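The fields described above can be illustrated with a short sketch. It assumes a Smith-Waterman-style local alignment with a linear gap penalty and returns only the single best hit, whereas the tool reports several. The names `toy_score` and `local_align`, the dictionary keys, and the score values are hypothetical; the scores are not taken from any PAM or BLOSUM matrix.

```python
def toy_score(a, b):
    # Hypothetical stand-in for a substitution-matrix lookup:
    # identical residues score 5, the similar pair D/E scores 2, all else -3.
    if a == b:
        return 5
    if {a, b} == {"D", "E"}:
        return 2
    return -3


def local_align(query, target, score=toy_score, gap=2):
    """Return the best local alignment as a dict, or None if nothing scores above 0."""
    m, n = len(query), len(target)
    # H[i][j] is the best score of a local alignment ending at query[i-1], target[j-1].
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, end = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + score(query[i - 1], target[j - 1]),  # pair residues
                H[i - 1][j] - gap,  # query residue against a gap in the target
                H[i][j - 1] - gap,  # target residue against a gap in the query
            )
            if H[i][j] > best:
                best, end = H[i][j], (i, j)
    if best == 0:
        return None

    # Trace back from the best cell until the score drops to zero.
    i, j = end
    q_aln, t_aln = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(query[i - 1], target[j - 1]):
            q_aln.append(query[i - 1]); t_aln.append(target[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            q_aln.append(query[i - 1]); t_aln.append("-")
            i -= 1
        else:
            q_aln.append("-"); t_aln.append(target[j - 1])
            j -= 1
    return {
        "score": best,
        "query_start": i + 1, "query_end": end[0],    # 1-based, inclusive
        "target_start": j + 1, "target_end": end[1],  # 1-based, inclusive
        "query_aligned": "".join(reversed(q_aln)),
        "target_aligned": "".join(reversed(t_aln)),
    }


hit = local_align("GAD", "MKTGAEWGADLL")
print(hit["score"], hit["target_start"], hit["target_end"])  # 15 8 10
```

In this toy run the exact match GAD at target positions 8 to 10 scores 15, while the similar site GAE at positions 4 to 6 would score 12 and rank second.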
🔬 Applications
- Finding approximate occurrences of a short functional motif (e.g. an active-site residue pattern) within a full-length protein
- Identifying regions of a protein that could be mutated to match a desired epitope
- Comparing how well a peptide query matches different regions of a target protein
- Detecting conserved short motifs across distantly related sequences where exact matches are unlikely
⚠️ Common Mistakes & Warnings
The algorithm allocates an O(n × m) scoring matrix, where n is the target length and m is the query length. Scoring with a full substitution matrix is more expensive than a nucleotide identity check, so the target is limited to 2,000 characters to keep the calculation time reasonable.
The query is limited to 30 amino acids. This keeps the scoring matrix manageable and ensures near-real-time results even for longer target sequences.
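To put the limits in numbers, here is a small sketch of the matrix size at the maximum input lengths given under How to Use (2,000 target characters, 30 query residues). The extra border row and column are an assumption that holds for a typical dynamic-programming layout; `matrix_cells` is a hypothetical helper.

```python
def matrix_cells(target_len, query_len):
    # One extra row and one extra column hold the zero-initialised border
    # (assumed layout; the tool's internal layout is not documented here).
    return (target_len + 1) * (query_len + 1)


print(matrix_cells(2000, 30))  # 62031
```

Each cell needs a constant number of matrix lookups and comparisons, so the work grows with the product of the two lengths.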
Any character that is not a standard single-letter amino acid code is removed from both the target and query before the search.
If you paste a multi-FASTA file, only the first sequence is used as the target.
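The two input rules above can be sketched as follows. This is an illustration, not the tool's own code: it assumes the accepted alphabet is the 20 standard amino acids and that input is upper-cased before filtering, neither of which the page confirms. `first_fasta_record` and `clean_protein` are hypothetical names.

```python
import re

# Assumed alphabet: the 20 standard single-letter amino acid codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def first_fasta_record(text):
    """Return the sequence lines of the first FASTA record, or all lines if raw."""
    seq_lines = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or seq_lines:
                break  # a second record starts here; ignore the rest
            seen_header = True
            continue
        seq_lines.append(line)
    return "".join(seq_lines)


def clean_protein(text):
    """Keep only standard amino acid letters from the first record."""
    seq = first_fasta_record(text).upper()  # upper-casing is an assumption
    return re.sub(f"[^{AMINO_ACIDS}]", "", seq)


print(clean_protein(">sp1\nMKT GA*D\n123\n>sp2\nAAAA"))  # MKTGAD
```

Spaces, digits, and the stop symbol are dropped, and the second record is never read.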