Restriction Digest - TheBiologyBro

💡 Quick Summary

Restriction Digest performs a virtual restriction digest of one or more DNA sequences using up to three restriction enzymes simultaneously. You can digest linear or circular molecules, including mixtures of sequences in multi-FASTA format. The resulting fragments are sorted by size and each is reported in FASTA format with its length, position in the original sequence, and the enzyme sites that produced it.

📋 How to Use

Paste one or more DNA sequences (raw or FASTA) into the text area.
Select the molecule topology: linear (default) or circular.
Select up to three restriction enzymes. Enzyme 1 is required; enzymes 2 and 3 default to "nothing" (no digestion).
Click Submit. Fragments are displayed sorted from largest to smallest, each with its source sequence, position range, and the enzyme pair that produced it.
Click Load Example to digest a sample pBR322 sequence with AluI.
Use Copy All to copy the FASTA fragment report to your clipboard.

🧮 Formulas & Logic

Cut position (linear)

cut position = matchEnd − cutDistance, where matchEnd is the regex lastIndex after each match (the index just past the matched site) and cutDistance is the number of bases from the end of the recognition site back to the cut.
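The formula can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual code: the function name findLinearCuts is hypothetical, and skipping cuts that fall exactly on a sequence end is an assumption.

```javascript
// Hypothetical helper: top-strand cut positions for one enzyme on a linear sequence.
// A returned value p means p bases lie to the left of the cut.
function findLinearCuts(sequence, pattern, cutDistance) {
  const re = new RegExp(pattern, "gi");
  const cuts = [];
  while (re.exec(sequence) !== null) {
    const matchEnd = re.lastIndex; // index just past the matched site
    const cut = matchEnd - cutDistance;
    // Assumption: a cut at the very start or end yields no fragment, so skip it.
    if (cut > 0 && cut < sequence.length) cuts.push(cut);
  }
  return cuts;
}

// BamHI (g|gatcc, cut distance 5) on a 20-base test sequence with two sites.
const cuts = findLinearCuts("aaggatccttttggatccaa", "ggatcc", 5);
console.log(cuts.join(", ")); // 3, 13
```

The first site ends at index 8, so the cut is at 8 − 5 = 3, between "aag" and "gatcc...".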

Cut position (circular)

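Same as linear, but the search runs on an extended sequence with the last 50 bp prepended and the first 50 bp appended, so that sites spanning the origin are found. Cut positions that map outside the original sequence length are discarded.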
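A rough sketch of the wraparound search, under stated assumptions: findCircularCuts is a hypothetical name, the wrap length is capped at the sequence length for very short inputs, and a cut exactly at the origin is kept as position 0.

```javascript
// Hypothetical helper: cut positions on a circular sequence, in original coordinates.
function findCircularCuts(sequence, pattern, cutDistance, wrap = 50) {
  const n = sequence.length;
  const w = Math.min(wrap, n); // assumption: never wrap more than one full copy
  // Prepend the last w bases and append the first w bases.
  const extended = sequence.slice(n - w) + sequence + sequence.slice(0, w);
  const re = new RegExp(pattern, "gi");
  const cuts = [];
  while (re.exec(extended) !== null) {
    // Shift back by w to return to the original coordinate system.
    const cut = re.lastIndex - cutDistance - w;
    // Discard positions outside the original sequence.
    if (cut >= 0 && cut < n) cuts.push(cut);
  }
  return cuts.sort((a, b) => a - b);
}

// The BamHI site here spans the origin: "gga" at the end, "tcc" at the start.
console.log(findCircularCuts("tccaaaaaaaagga", "ggatcc", 5).join(", ")); // 12
```

A plain linear search of the same sequence would find no site, which is why the extension is needed.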

Fragment sizes

Fragments are sorted descending by sequence length, then ascending by start position for equal-length fragments.
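The sort order described above maps onto a two-key comparator. The fragment object shape (start, sequence) is an assumption made for illustration.

```javascript
// Hypothetical fragment records; only start and sequence matter for sorting.
const fragments = [
  { start: 15, sequence: "ggcc" },
  { start: 1, sequence: "acgtacgtac" },
  { start: 11, sequence: "acgt" },
];

// Longest first; ties broken by the smaller start position.
function compareFragments(a, b) {
  return b.sequence.length - a.sequence.length || a.start - b.start;
}

fragments.sort(compareFragments);
console.log(fragments.map((f) => f.start).join(", ")); // 1, 11, 15
```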

📊 Result Interpretation

Linear fragments

Each fragment entry reports: SIZE bp linear fragment from TOPOLOGY parent TITLE, base START to base STOP (ENZYME1 - ENZYME2).
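As an illustration of that template, the sketch below builds one title line. The function name, the field names, and the leading ">" (assumed because fragments are reported in FASTA format) are not confirmed by the tool.

```javascript
// Hypothetical formatter for a linear fragment's title line.
function fragmentHeader(f) {
  return (
    `>${f.sequence.length} bp linear fragment from ${f.topology} parent ${f.title}, ` +
    `base ${f.start} to base ${f.stop} (${f.leftLabel} - ${f.rightLabel})`
  );
}

console.log(
  fragmentHeader({
    sequence: "aag",
    topology: "linear",
    title: "demo",
    start: 1,
    stop: 3,
    leftLabel: "sequence start",
    rightLabel: "BamHI",
  })
);
// >3 bp linear fragment from linear parent demo, base 1 to base 3 (sequence start - BamHI)
```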

Circular molecules uncut

If no enzyme site is found in a circular molecule, it is reported as: SIZE bp circular molecule from circular parent TITLE.

Enzyme pair labels

"sequence start" and "sequence end" mark fragment ends that are the natural ends of a linear molecule rather than enzyme cuts. All other labels are enzyme names.
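One way to picture how the labels arise: treat the two natural ends as pseudo-cuts and pair up neighbouring boundaries. This is a sketch only; linearFragments and the record shapes are hypothetical.

```javascript
// Hypothetical helper: split a linear sequence at labelled cut positions.
// cuts: array of { position, enzyme }, where position = bases left of the cut.
function linearFragments(sequence, cuts) {
  const sorted = [...cuts].sort((a, b) => a.position - b.position);
  const bounds = [
    { position: 0, enzyme: "sequence start" },
    ...sorted,
    { position: sequence.length, enzyme: "sequence end" },
  ];
  const fragments = [];
  for (let i = 0; i < bounds.length - 1; i++) {
    const left = bounds[i];
    const right = bounds[i + 1];
    if (left.position === right.position) continue; // skip empty fragments
    fragments.push({
      start: left.position + 1, // 1-based, inclusive
      stop: right.position,
      labels: `${left.enzyme} - ${right.enzyme}`,
      sequence: sequence.slice(left.position, right.position),
    });
  }
  return fragments;
}

const parts = linearFragments("aaggatccttttggatccaa", [
  { position: 3, enzyme: "BamHI" },
  { position: 13, enzyme: "BamHI" },
]);
parts.forEach((p) => console.log(`${p.start}-${p.stop} (${p.labels}) ${p.sequence}`));
// 1-3 (sequence start - BamHI) aag
// 4-13 (BamHI - BamHI) gatccttttg
// 14-20 (BamHI - sequence end) gatccaa
```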

Multiple sequences

All fragments from all input sequences are pooled and sorted together by size, which simulates running a mixed digest on a gel.

🔬 Applications

Predicting fragment sizes before running a restriction digest gel
Verifying a cloning strategy by checking expected fragment patterns
Identifying diagnostic restriction sites that distinguish two sequences
Designing Southern blot probes by finding appropriate fragment sizes
Confirming the orientation of an insert by asymmetric digest pattern

⚠️ Common Mistakes & Warnings

Up to three enzymes only

This tool supports digestion with one, two, or three enzymes in a single run. For additional enzymes, perform successive digests using the previous run's output.

IUPAC degenerate bases in enzyme patterns

Some enzymes (e.g. HinfI g|antc, HincII gty|rac) use IUPAC degenerate codes in their recognition sequences. The sequence you enter should contain only standard bases (A, T, G, C) for reliable results; degenerate input bases are retained by removeNonDna() but may affect site detection.
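To see why degenerate bases in the input matter, it helps to expand a degenerate site into a regular expression. The sketch below is an assumption about how such patterns could be built, not a description of the tool's internals; siteToRegex and IUPAC_CLASSES are hypothetical names.

```javascript
// Standard IUPAC nucleotide codes expanded to regex character classes.
const IUPAC_CLASSES = {
  a: "a", c: "c", g: "g", t: "t",
  r: "[ag]", y: "[ct]", s: "[cg]", w: "[at]", k: "[gt]", m: "[ac]",
  b: "[cgt]", d: "[agt]", h: "[act]", v: "[acg]", n: "[acgt]",
};

// Hypothetical helper: turn a site such as "g|antc" into a case-insensitive regex.
function siteToRegex(site) {
  const body = site
    .toLowerCase()
    .replace(/\|/g, "") // drop the cut marker
    .split("")
    .map((code) => {
      if (!(code in IUPAC_CLASSES)) throw new Error(`Unknown IUPAC code: ${code}`);
      return IUPAC_CLASSES[code];
    })
    .join("");
  return new RegExp(body, "i");
}

const hinfI = siteToRegex("g|antc");
console.log(hinfI.source);        // ga[acgt]tc
console.log(hinfI.test("gagtc")); // true
console.log(hinfI.test("gantc")); // false: an "n" in the input is not a, c, g, or t
```

With a pattern built this way, a degenerate base in the input sequence never matches, so a real site hidden behind it would be missed.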

Dam/Dcm methylation not modelled

This is a purely sequence-based virtual digest. Methylation-sensitive enzymes (e.g. ClaI, which is blocked when its ATCGAT site overlaps a Dam-methylated GATC) may show more sites in silico than in a real digest.

❓ Frequently Asked Questions

How is a circular molecule handled?

For a circular molecule, the sequence is virtually extended with 50 bp from the end prepended and 50 bp from the start appended before searching. Cut sites that fall within the original sequence length are recorded. If one or more cuts are found, the circular molecule is linearised at the first cut site, and the fragment spanning the origin (last cut to end + start to first cut) is joined into a single linear fragment.
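The joining step can be sketched as follows. This is illustrative only: circularFragments is a hypothetical name, and the way start and stop are reported for the origin-spanning fragment is an assumption.

```javascript
// Hypothetical helper: split a circular sequence at cut positions (0 <= cut < length).
function circularFragments(sequence, cuts) {
  const n = sequence.length;
  // No cuts: the molecule stays circular.
  if (cuts.length === 0) return [{ start: 1, stop: n, sequence, circular: true }];
  const sorted = [...cuts].sort((a, b) => a - b);
  const fragments = [];
  // Fragments between consecutive cuts.
  for (let i = 0; i < sorted.length - 1; i++) {
    fragments.push({
      start: sorted[i] + 1,
      stop: sorted[i + 1],
      sequence: sequence.slice(sorted[i], sorted[i + 1]),
      circular: false,
    });
  }
  // Origin-spanning fragment: last cut to end, joined to start up to first cut.
  const first = sorted[0];
  const last = sorted[sorted.length - 1];
  fragments.push({
    start: last + 1,
    stop: first === 0 ? n : first,
    sequence: sequence.slice(last) + sequence.slice(0, first),
    circular: false,
  });
  return fragments;
}

const pieces = circularFragments("acgtacgtac", [3, 7]);
pieces.forEach((p) => console.log(`${p.start}-${p.stop} ${p.sequence}`));
// 4-7 tacg
// 8-3 tacacg
```

With a single cut, the loop adds nothing and the origin-spanning fragment is the whole molecule, linearised at that cut.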

Why do I see more fragments than expected?

Some enzyme entries share the same recognition sequence (e.g. SacI and SstI both recognise GAGCTC), and others overlap (every SalI site, GTCGAC, is also an AccI site, GTMKAC). If you select two enzymes with identical recognition sequences, you will effectively get twice as many cuts. Also, enzymes with very short recognition sequences (4-cutters like AluI and TaqI) cut frequently in random sequence.
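The effect of site length is easy to estimate. Assuming random sequence with equal base frequencies and a non-degenerate site (both simplifications), a site of length n occurs about once every 4^n bases. The function name expectedSites is hypothetical.

```javascript
// Rough estimate of how many sites to expect in random sequence.
// Assumes equal base frequencies and a non-degenerate recognition site.
function expectedSites(sequenceLength, siteLength) {
  return sequenceLength / 4 ** siteLength;
}

console.log(expectedSites(10000, 4)); // 39.0625 (4-cutter: one site per 256 bp)
console.log(expectedSites(10000, 6)); // about 2.44 (6-cutter: one site per 4096 bp)
```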

What does the cut distance number mean?

In the enzyme option value string (e.g. /ggatcc/ (BamHI g|gatcc)5), the trailing number is the distance in bases from the end of the match back to the cut site on the top strand. For a 6-bp site with the number 5, the cut falls 6 − 5 = 1 base into the site, i.e. after the 1st base. For a blunt cutter (number = half the recognition sequence length), both strands are cut at the same position.
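A small parser makes the relationship concrete. It assumes every option value follows the exact layout of the BamHI example; parseEnzymeOption is a hypothetical name, and the tool may parse its values differently.

```javascript
// Hypothetical parser for values shaped like "/ggatcc/ (BamHI g|gatcc)5".
function parseEnzymeOption(value) {
  const m = /^\/(.+)\/\s*\((\S+)\s+([^)]+)\)(\d+)$/.exec(value.trim());
  if (!m) throw new Error(`Unrecognised enzyme option: ${value}`);
  return {
    pattern: m[1],             // regex body, e.g. "ggatcc"
    name: m[2],                // enzyme name, e.g. "BamHI"
    site: m[3],                // site with cut marker, e.g. "g|gatcc"
    cutDistance: Number(m[4]), // bases from match end back to the cut
  };
}

const bamHI = parseEnzymeOption("/ggatcc/ (BamHI g|gatcc)5");
// For a non-degenerate pattern, site length minus cut distance gives the
// number of bases before the cut, which matches the "|" marker position.
console.log(bamHI.pattern.length - bamHI.cutDistance); // 1
console.log(bamHI.site.indexOf("|"));                  // 1
```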