Pairwise Align DNA - TheBiologyBro

💡 Quick Summary

Pairwise Align DNA accepts two DNA sequences and determines the optimal global alignment using the EDNAFULL substitution matrix and affine gap penalties. Use Pairwise Align DNA to look for conserved sequence regions between two DNA sequences.

📋 How to Use

Paste the first DNA sequence (raw or FASTA) into Sequence One. Input limit is 20,000 characters.
Paste the second DNA sequence (raw or FASTA) into Sequence Two. Input limit is 20,000 characters.
Set the Gap opening penalty — cost paid once when a gap is started.
Set the Gap extension penalty — cost paid for each additional position in a gap.
Click Submit. The aligned sequences are shown in FASTA format with the alignment score.
Click Load Example to align the two sample sequences from the original Sequence Manipulation Suite (SMS).
Use the Copy button to copy the output to your clipboard.

🧮 Formulas & Logic

Alignment algorithm

Hirschberg divide-and-conquer (linear space, O(n) memory) with affine-gap Needleman–Wunsch DP for sub-problems
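The affine-gap Needleman–Wunsch step is commonly written with three score layers (the formulation usually credited to Gotoh): one for a base paired with a base, and one for each kind of gap. The Python sketch below is a minimal full-matrix illustration, not this tool's code. It uses quadratic memory, so it shows the recurrence and traceback rather than the Hirschberg recursion. The function names, the +5/-4 scoring (EDNAFULL's values for A, C, G and T only) and the 16/4 defaults are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def simple_score(x: str, y: str) -> int:
    """Stand-in for EDNAFULL restricted to A/C/G/T: +5 match, -4 mismatch."""
    return 5 if x == y else -4


def global_align_affine(a, b, gap_open=16, gap_extend=4, score=simple_score):
    """Global alignment with affine gaps (hypothetical helper, full matrices).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Returns (aligned_a, aligned_b, best_score).
    """
    n, m = len(a), len(b)
    # Layer 0 (M): a[i-1] paired with b[j-1]
    # Layer 1 (X): a[i-1] paired with a gap
    # Layer 2 (Y): b[j-1] paired with a gap
    S = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    P = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]  # layer of previous cell
    S[0][0][0] = 0
    for i in range(1, n + 1):  # leading gap in b
        S[1][i][0] = -(gap_open + (i - 1) * gap_extend)
        P[1][i][0] = 1 if i > 1 else 0
    for j in range(1, m + 1):  # leading gap in a
        S[2][0][j] = -(gap_open + (j - 1) * gap_extend)
        P[2][0][j] = 2 if j > 1 else 0

    def best(cands):
        # Highest candidate and the layer it came from (first wins on ties).
        k = max(range(3), key=lambda t: cands[t])
        return cands[k], k

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Diagonal step: pair a[i-1] with b[j-1], coming from any layer.
            v, k = best([S[t][i - 1][j - 1] for t in range(3)])
            S[0][i][j], P[0][i][j] = v + score(a[i - 1], b[j - 1]), k
            # Vertical step: open a gap from M or Y, or extend the gap in X.
            S[1][i][j], P[1][i][j] = best([S[0][i - 1][j] - gap_open,
                                           S[1][i - 1][j] - gap_extend,
                                           S[2][i - 1][j] - gap_open])
            # Horizontal step: open a gap from M or X, or extend the gap in Y.
            S[2][i][j], P[2][i][j] = best([S[0][i][j - 1] - gap_open,
                                           S[1][i][j - 1] - gap_open,
                                           S[2][i][j - 1] - gap_extend])

    # Trace back from the best layer in the bottom-right cell.
    final, layer = best([S[t][n][m] for t in range(3)])
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        prev_layer = P[layer][i][j]
        if layer == 0:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif layer == 1:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
        layer = prev_layer
    return "".join(reversed(out_a)), "".join(reversed(out_b)), final
```

For example, `global_align_affine("ACGTACGT", "ACGACGT")` returns `ACGTACGT` over `ACG-ACGT` with a score of 19: seven matches at +5 and one single-base gap at -16.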

Scoring matrix

EDNAFULL (15×15 IUPAC DNA substitution matrix)

Affine gap cost

Gap of length k costs: gap_open + (k − 1) × gap_extend
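A quick worked check of the formula, as a small Python sketch. The helper name `gap_cost` is hypothetical, and the defaults of 16 and 4 are taken from the gap-penalty FAQ below.

```python
def gap_cost(k: int, gap_open: int = 16, gap_extend: int = 4) -> int:
    """Affine cost of one contiguous gap of length k: open once, extend k - 1 times."""
    if k < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (k - 1) * gap_extend


one_long_gap = gap_cost(3)           # 16 + 2 * 4 = 24
three_short_gaps = 3 * gap_cost(1)   # 3 * 16 = 48
```

With these settings, three missing bases cost half as much when they form a single gap as when they are scattered as three separate gaps.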

📊 Result Interpretation

Aligned output

Gaps in the alignment are represented as "-"

Alignment score

Sum of EDNAFULL substitution scores minus gap penalties; higher is better
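To show how such a score is assembled, here is a hypothetical Python helper that rescores two aligned rows. It assumes the +5/-4 values EDNAFULL uses for A, C, G and T (ambiguity codes are not handled), treats consecutive "-" in the same row as one affine gap, and uses 16/4 as illustrative defaults.

```python
def simple_score(x: str, y: str) -> int:
    """Stand-in for EDNAFULL restricted to A/C/G/T: +5 match, -4 mismatch."""
    return 5 if x == y else -4


def alignment_score(row1, row2, gap_open=16, gap_extend=4, score=simple_score):
    """Sum substitution scores and subtract affine gap penalties column by column."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    total = 0
    gap_row = None  # which row (1 or 2) had a gap in the previous column, if any
    for x, y in zip(row1, row2):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if x == "-":
            # Extend if the previous column was also a gap in row 1, else open.
            total -= gap_extend if gap_row == 1 else gap_open
            gap_row = 1
        elif y == "-":
            total -= gap_extend if gap_row == 2 else gap_open
            gap_row = 2
        else:
            total += score(x, y)
            gap_row = None
    return total
```

For example, `alignment_score("AC--GT", "ACTTGT")` gives four matches (+20) minus one two-base gap (16 + 4), for a total of 0.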

Seq 1 length

Number of valid DNA bases in the first input sequence

Seq 2 length

Number of valid DNA bases in the second input sequence

🔬 Applications

Comparing homologous gene sequences from different species
Identifying conserved regulatory elements between two genomic regions
Aligning two alleles or splice variants of the same gene
Quick global comparison before running a BLAST search

⚠️ Common Mistakes & Warnings

Non-DNA characters are removed

Characters that are not valid IUPAC DNA/RNA symbols are stripped before alignment. U is accepted and treated as T.
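A minimal Python sketch of this kind of cleaning step, which also yields the "Seq length" counts. The helper name is hypothetical, and three details are assumptions about the tool's parser rather than documented behaviour: lines starting with ">" are treated as FASTA headers and dropped, lowercase letters are upper-cased, and U is converted to T before filtering.

```python
IUPAC_DNA = set("ATGCSWRYKMBVHDN")  # the 15 symbols EDNAFULL scores


def clean_sequence(text: str) -> str:
    """Drop FASTA header lines, upper-case, map U to T, keep only IUPAC DNA symbols."""
    # Assumption: any line whose first non-space character is '>' is a header.
    body = "".join(line for line in text.splitlines() if not line.lstrip().startswith(">"))
    seq = body.upper().replace("U", "T")
    return "".join(ch for ch in seq if ch in IUPAC_DNA)
```

Under these assumptions, `clean_sequence(">seq1 test\nacgu nn\n12 ACGT")` returns `ACGTNNACGT`, a length of 10; the digits and spaces are stripped, and the header is never scanned for stray IUPAC letters.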

Long sequences may be slow

The Hirschberg algorithm runs in O(nm) time, where n and m are the two sequence lengths, but needs only O(n) memory. Sequences near the 20,000-character limit may take several seconds in the browser.
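The memory saving comes from computing scores one row at a time. The Python sketch below (hypothetical names, +5/-4 scoring for A, C, G and T only, 16/4 defaults) returns the best score of sequence one against every prefix of sequence two while keeping just the previous and current rows of each score layer. In Hirschberg's scheme such a pass runs forward over one half of the first sequence and backward (on reversed strings) over the other half, the split column is chosen where the two halves sum highest, and the procedure recurses. With affine gaps, a gap that crosses the split needs extra bookkeeping (the Myers–Miller refinement), which this sketch omits. For two 20,000-base inputs, a single full pass already fills 20,000 × 20,000 = 4 × 10⁸ cells, and the recursion does roughly twice that work in total.

```python
NEG_INF = float("-inf")


def simple_score(x: str, y: str) -> int:
    """Stand-in for EDNAFULL restricted to A/C/G/T: +5 match, -4 mismatch."""
    return 5 if x == y else -4


def last_row_scores(a, b, gap_open=16, gap_extend=4, score=simple_score):
    """Best affine-gap global score of all of a against each prefix b[:j].

    Keeps only two rows per layer, so memory is O(len(b)).
    """
    m = len(b)
    # Row i = 0: the empty prefix of a, reachable only through a gap in a.
    M = [NEG_INF] * (m + 1)
    X = [NEG_INF] * (m + 1)
    Y = [NEG_INF] * (m + 1)
    M[0] = 0
    for j in range(1, m + 1):
        Y[j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, len(a) + 1):
        new_m = [NEG_INF] * (m + 1)
        new_x = [NEG_INF] * (m + 1)
        new_y = [NEG_INF] * (m + 1)
        new_x[0] = -(gap_open + (i - 1) * gap_extend)  # a[:i] against an empty b
        for j in range(1, m + 1):
            new_m[j] = score(a[i - 1], b[j - 1]) + max(M[j - 1], X[j - 1], Y[j - 1])
            new_x[j] = max(M[j] - gap_open, X[j] - gap_extend, Y[j] - gap_open)
            new_y[j] = max(new_m[j - 1] - gap_open,
                           new_x[j - 1] - gap_open,
                           new_y[j - 1] - gap_extend)
        M, X, Y = new_m, new_x, new_y  # discard the older row
    return [max(v) for v in zip(M, X, Y)]
```

The last entry of the returned list is the global alignment score; for `last_row_scores("ACGTACGT", "ACGACGT")` it is 19.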

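Affine gaps favour fewer, longer gaps

With affine penalties, opening a new gap costs more than extending an existing one (as long as the opening penalty exceeds the extension penalty), so the aligner prefers one long gap over several short ones. This reflects biology: a single insertion/deletion event often spans multiple bases.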

❓ Frequently Asked Questions

What is the EDNAFULL matrix?

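EDNAFULL is a 15×15 DNA substitution scoring matrix that covers the four bases and all IUPAC ambiguity codes (A, T, G, C, S, W, R, Y, K, M, B, V, H, D, N). A match between identical unambiguous bases (A, C, G, T) scores +5 and a mismatch between two different ones scores −4; pairs involving ambiguity codes receive intermediate scores that depend on whether the bases they represent overlap (for example, N against A, C, G or T scores −2).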

What gap penalties should I use?

A gap-opening penalty of 16 with a gap-extension penalty of 4 is the default for DNA alignment using the EDNAFULL matrix. Lower gap-opening values allow more gaps; higher values favour fewer, longer gaps.

What does "-" mean in the output?

A "-" character represents a gap introduced into one sequence to maximise the global alignment score.