Pairwise Align Protein - TheBiologyBro

💡 Quick Summary

Pairwise Align Protein accepts two protein sequences and determines the optimal global alignment. Choose from BLOSUM45, BLOSUM62, BLOSUM80, PAM30, or PAM70 substitution matrices with affine gap penalties. Use this tool to look for conserved sequence regions between two proteins.

📋 How to Use

Paste the first protein sequence (raw or FASTA) into Sequence One. Input limit is 20,000 characters.
Paste the second protein sequence (raw or FASTA) into Sequence Two. Input limit is 20,000 characters.
Select a Scoring Matrix. BLOSUM62 is the default and works well for most comparisons.
Set the Gap opening penalty — cost paid once when a gap is started.
Set the Gap extension penalty — cost paid for each additional position in a gap.
Click Submit. The aligned sequences are shown in FASTA format with the alignment score.
Click Load Example to align the human and Xenopus p53 proteins from the original Sequence Manipulation Suite (SMS).
Use the Copy button to copy the output to your clipboard.

🧮 Formulas & Logic

Alignment algorithm

Hirschberg divide-and-conquer (linear space, O(n) memory), with affine-gap Needleman–Wunsch dynamic programming (DP) for the sub-problems
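The score-only pass that a linear-space method runs on each sub-problem can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's actual code: `toy_score` is a hypothetical stand-in for a real substitution matrix, and the penalties of 12 and 2 are the values quoted in the FAQ. Only one row of the table is kept at a time, which is where the linear memory use comes from; recovering the alignment itself would need the divide-and-conquer step on top of this.

```python
NEG_INF = float("-inf")


def toy_score(x, y):
    """Hypothetical substitution score: +5 for identical residues, -4 otherwise."""
    return 5 if x == y else -4


def global_score(a, b, score, gap_open=12, gap_extend=2):
    """Best global alignment score of a and b with affine gaps, one DP row at a time."""
    n, m = len(a), len(b)
    # Row 0: a prefix of b aligned against nothing is one gap of length j.
    h = [0] + [-(gap_open + (j - 1) * gap_extend) for j in range(1, m + 1)]
    f = [NEG_INF] * (m + 1)  # best score ending with a residue of a opposite a gap
    for i in range(1, n + 1):
        diag = h[0]  # value of the previous row at column 0
        h[0] = -(gap_open + (i - 1) * gap_extend)
        e = NEG_INF  # best score ending with a residue of b opposite a gap
        for j in range(1, m + 1):
            # h[j - 1] already belongs to the current row; h[j] is still the previous row.
            e = max(h[j - 1] - gap_open, e - gap_extend)
            f[j] = max(h[j] - gap_open, f[j] - gap_extend)
            best = max(diag + score(a[i - 1], b[j - 1]), e, f[j])
            diag = h[j]
            h[j] = best
    return h[m]


print(global_score("AAA", "A", toy_score))  # -9: one match (+5) and one gap of length 2 (-14)
```

Memory here is proportional to the length of `b`, while the running time is still proportional to the product of the two lengths.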

Scoring matrices

BLOSUM45, BLOSUM62, BLOSUM80 (Henikoff & Henikoff, 1992); PAM30, PAM70 (Dayhoff et al., 1978)

Affine gap cost

Gap of length k costs: gap_open + (k − 1) × gap_extend
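As a worked example of the formula above, here is a small Python sketch. The function name `affine_gap_cost` is hypothetical, and the default penalties of 12 and 2 are the values quoted in the FAQ rather than anything fixed by the formula.

```python
def affine_gap_cost(k, gap_open=12, gap_extend=2):
    """Cost of one gap of length k: gap_open + (k - 1) * gap_extend."""
    if k < 0:
        raise ValueError("gap length cannot be negative")
    if k == 0:
        return 0  # no gap, no cost
    return gap_open + (k - 1) * gap_extend


print(affine_gap_cost(1))  # 12
print(affine_gap_cost(4))  # 18
print(2 * affine_gap_cost(2))  # 28
```

With these values, a single gap of four positions (18) costs less than two separate gaps of two positions each (28), which is why affine penalties tend to group missing residues into fewer, longer gaps.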

📊 Result Interpretation

Aligned output

Gaps in the alignment are represented as "-"

Alignment score

Sum of substitution scores minus gap penalties; higher is better
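To make the score definition concrete, the following Python sketch recomputes a score from two aligned rows. It is only an illustration: `toy_score` is a hypothetical stand-in for the selected substitution matrix, the function name `alignment_score` is made up for this example, and the penalties of 12 and 2 are the values quoted in the FAQ.

```python
def toy_score(x, y):
    """Hypothetical substitution score: +5 for identical residues, -4 otherwise."""
    return 5 if x == y else -4


def alignment_score(row1, row2, score, gap_open=12, gap_extend=2):
    """Sum of substitution scores minus affine gap penalties for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    prev_gap = None  # row that held "-" in the previous column: 1, 2, or None
    for x, y in zip(row1, row2):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            gap_row = 1 if x == "-" else 2
            # Continuing a gap in the same row costs gap_extend; otherwise a new gap opens.
            total -= gap_extend if prev_gap == gap_row else gap_open
            prev_gap = gap_row
        else:
            total += score(x, y)
            prev_gap = None
    return total


print(alignment_score("AAA", "A--", toy_score))  # -9 = 5 - 12 - 2
```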

Seq 1 length

Number of amino acid residues in the first input sequence

Seq 2 length

Number of amino acid residues in the second input sequence

Matrix used

The substitution matrix selected for scoring

🔬 Applications

Comparing orthologous proteins across different species
Identifying conserved functional domains between two protein sequences
Assessing overall sequence similarity before a BLAST search
Aligning two protein variants (e.g. wild-type vs. mutant) to locate changes

⚠️ Common Mistakes & Warnings

Non-amino acid characters are removed

Characters that are not valid IUPAC amino acid symbols are stripped before alignment.
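A clean-up step of this kind might look like the Python sketch below. The exact alphabet the tool accepts is not stated here, so the set used (the 20 standard residues plus B, Z, and X) is an assumption, as are the upper-casing, the handling of FASTA header lines, and the function name `clean_protein`.

```python
import re

# Assumed alphabet: 20 standard amino acids plus the ambiguity codes B, Z, and X.
VALID_RESIDUES = "ACDEFGHIKLMNPQRSTVWYBZX"


def clean_protein(text):
    """Drop FASTA header lines, then remove every character outside VALID_RESIDUES."""
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith(">")]
    sequence = "".join(lines).upper()
    return re.sub(f"[^{VALID_RESIDUES}]", "", sequence)


print(clean_protein(">example header\nmeepq SDPSV\n12*"))  # MEEPQSDPSV
```

Because stripped characters disappear silently, comparing the reported sequence lengths with the expected ones is a quick way to spot input that was altered.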

Long sequences may be slow

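The Hirschberg algorithm needs O(nm) time but only O(n) space, where n and m are the lengths of the two sequences. Two sequences near the 20,000 character limit mean on the order of 20,000 × 20,000 = 4 × 10^8 DP cells, so the alignment may take several seconds in the browser.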

Matrix choice affects results

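BLOSUM62 is best for sequences with ~40–60% identity. Use BLOSUM80 for highly similar sequences and BLOSUM45 for more distantly related ones. PAM30 and PAM70 are low-numbered PAM matrices, built for short evolutionary distances, so they suit very similar or short sequences rather than distant ones.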

❓ Frequently Asked Questions

Which scoring matrix should I use?

BLOSUM62 is the standard choice for most pairwise alignments. Use BLOSUM80 for closely related sequences (>60% identity), PAM30 or PAM70 for very similar or short sequences, and BLOSUM45 for distantly related sequences.

What gap penalties should I use?

A gap-open penalty of 12 and a gap-extend penalty of 2 are the defaults for protein alignment. Higher gap-open values favour fewer, longer gaps.

What does "-" mean in the output?

A "-" character represents a gap introduced into one sequence to maximise the global alignment score.