💡 Quick Summary
CpG Islands scans a DNA sequence for regions where CpG dinucleotides are not depleted relative to the frequency expected from the local C and G content, and where the GC content is elevated. These regions, called CpG islands, are often found near the promoters of vertebrate genes and can indicate transcription start sites.
📋 How to Use
- Paste a raw DNA sequence or one or more FASTA sequences into the input area. The input limit is 100,000,000 characters.
- Optionally adjust the Window size (default 200 bp) and Obs/Exp cutoff (default 0.6).
- Click Run. Each window that qualifies as a CpG island is reported with its start and end positions, Obs/Exp ratio, and %GC.
- Use the Copy button to copy the results to your clipboard.
- Click Load Example to try with a sample genomic sequence.
🧮 Formulas & Logic
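The Gardiner-Garden & Frommer criteria are usually computed per window as Obs/Exp = (number of CpG × N) / (number of C × number of G), where N is the window length, and %GC = 100 × (number of C + number of G) / N. The sketch below shows that calculation for a single window. The function name `cpg_window_stats` is hypothetical, and the handling of windows with no C or no G (returning 0.0) is an assumption, not a statement of how this tool behaves.

```python
def cpg_window_stats(window):
    """Return (obs_exp, gc_percent) for one window of DNA.

    Hypothetical helper: illustrates the published formulas only.
    """
    seq = window.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    cpg = seq.count("CG")
    gc_percent = 100.0 * (c + g) / n
    # Assumption: Obs/Exp is reported as 0.0 when C or G is absent,
    # because the expected count would otherwise divide by zero.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return obs_exp, gc_percent


if __name__ == "__main__":
    print(cpg_window_stats("ACGT"))  # (4.0, 50.0)
```

With the default cutoff of 0.6, a window passes the Obs/Exp test even when it contains fewer CpGs than expected, as long as the deficit is not severe.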
📊 Result Interpretation
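- The summary shows the number of FASTA records successfully scanned.
- It also shows the total number of windows that met both the Obs/Exp and %GC thresholds.
- An Obs/Exp ratio that passes the cutoff (default 0.6) meets the standard threshold indicating that CpG dinucleotides are not strongly suppressed.
- A %GC that passes its threshold means the region is GC-rich, a hallmark of CpG islands in vertebrate genomes.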
🔬 Applications
- Identifying potential gene promoter regions in vertebrate genomic sequences
- Locating transcription start sites upstream of known or predicted genes
- Selecting candidate regions for methylation analysis: CpG islands in promoters are often unmethylated in expressed genes
- Annotating novel genomic sequences from sequencing projects
- Comparing CpG island density between organisms or genomic regions
⚠️ Common Mistakes & Warnings
If the input sequence is shorter than the window size (default 200 bp), no analysis can be performed. Use a longer sequence or reduce the window size.
Any character that is not a valid IUPAC DNA letter is removed before analysis. FASTA header lines are ignored automatically.
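The cleaning step described above could look roughly like the following sketch. It assumes the accepted alphabet is A, C, G, T plus the IUPAC ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N), that input is upper-cased first, and that gap symbols and U are dropped. The tool's actual alphabet is not stated here, and the names `IUPAC_DNA`, `clean_sequence`, and `parse_records` are hypothetical.

```python
# Assumed alphabet: the four bases plus IUPAC ambiguity codes.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def clean_sequence(raw):
    """Upper-case the text and drop every character outside IUPAC_DNA."""
    return "".join(ch for ch in raw.upper() if ch in IUPAC_DNA)


def parse_records(text):
    """Split raw or FASTA input into (name, cleaned_sequence) pairs.

    Header lines (starting with '>') only supply the record name;
    they never contribute to the sequence.
    """
    records = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Close the previous record, if any, before starting a new one.
            if name is not None or parts:
                records.append((name or "", clean_sequence("".join(parts))))
            name, parts = line[1:].strip(), []
        elif line:
            parts.append(line)
    if name is not None or parts:
        records.append((name or "", clean_sequence("".join(parts))))
    return records


if __name__ == "__main__":
    example = ">seq1 test\nACGT 12\nnnXG-\n>seq2\nTT*AA"
    print(parse_records(example))
```

Because removed characters shift everything after them, reported positions refer to the cleaned sequence, not to the text as pasted.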
Adjacent windows that all meet the thresholds are listed separately, not merged into a single island region. This matches the original Gardiner-Garden & Frommer method.
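If a single region per island is preferred, qualifying windows can be merged after the fact. The sketch below is one way to do that under several assumptions: windows are non-overlapping by default (step equals window size), thresholds are inclusive, the %GC threshold is 50 (the published Gardiner-Garden & Frommer criterion, not confirmed for this tool), and positions are 1-based and inclusive. `scan_windows` and `merge_windows` are hypothetical names, not part of the tool.

```python
def scan_windows(seq, window=200, step=None, min_obs_exp=0.6, min_gc=50.0):
    """Return (start, end, obs_exp, gc_percent) for each qualifying window."""
    seq = seq.upper()
    step = step or window  # assumption: non-overlapping windows by default
    hits = []
    # A sequence shorter than the window yields an empty range: no results.
    for i in range(0, len(seq) - window + 1, step):
        w = seq[i:i + window]
        c, g, cpg = w.count("C"), w.count("G"), w.count("CG")
        if c == 0 or g == 0:
            continue  # Obs/Exp is undefined without both C and G
        obs_exp = cpg * window / (c * g)
        gc_percent = 100.0 * (c + g) / window
        if obs_exp >= min_obs_exp and gc_percent >= min_gc:
            hits.append((i + 1, i + window, obs_exp, gc_percent))
    return hits


def merge_windows(hits):
    """Merge overlapping or directly adjacent windows into (start, end) regions."""
    merged = []
    for start, end, *_ in hits:
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]


if __name__ == "__main__":
    demo = "CG" * 200 + "AT" * 100
    windows = scan_windows(demo)
    print(windows)                 # two separate 200 bp windows
    print(merge_windows(windows))  # [(1, 400)]
```

Merging changes only how results are grouped; the per-window Obs/Exp and %GC values are unchanged and would need to be recomputed over the merged span if a single value per island is wanted.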