Group DNA - TheBiologyBro

💡 Quick Summary

Group DNA adjusts the spacing of DNA sequences and adds numbering. Specify the group size (bases per group) and the number of bases per line. The output serves as a convenient annotated reference — the numbering and spacing let you quickly locate specific bases.

📋 How to Use

Paste a raw DNA sequence or one or more FASTA sequences into the textarea. Input limit: 100,000,000 characters.
Choose the group size: the number of bases in each space-separated block (3, 5, or 10).
Set bases per line: how many bases to display on each line (30–100; default 80).
Choose whether to show the complement strand (the opposite strand, read 3′→5′ from left to right) below each forward strand line.
Choose the numbering position: Left places the line start position before each line; Above places a ruler above the sequence block; Right places the line end position after each line.
Set the starting number (default 1) to offset the position counter — useful when displaying a subsequence of a longer molecule.
Click Run. Use Copy to copy the plain-text output.
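The grouping and line-wrapping steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the function name `group_lines` and its parameters are hypothetical, and the example uses 9 bases per line only to keep the output short (the tool itself accepts 30–100).

```python
def group_lines(seq, group_size=10, bases_per_line=80):
    """Split seq into lines, each written as space-separated groups."""
    lines = []
    for offset in range(0, len(seq), bases_per_line):
        # Take one line's worth of bases, then cut it into groups.
        chunk = seq[offset:offset + bases_per_line]
        groups = [chunk[i:i + group_size] for i in range(0, len(chunk), group_size)]
        lines.append(" ".join(groups))
    return lines


# Codon-style grouping (group size 3), 9 bases per line.
print("\n".join(group_lines("ATGGCCAAGTTTGGGCCC", group_size=3, bases_per_line=9)))
# ATG GCC AAG
# TTT GGG CCC
```

The last group on a line is simply shorter when the line length is not a multiple of the group size.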

🧮 Formulas & Logic

Position (left/right)

Line start (left) or end (right) = sequence offset + adjusted start + 1, counting only non-gap bases
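One way to read this formula, assuming "adjusted start" means the starting number minus 1 and that the sequence has already been cleaned, is sketched below. The function name `line_positions` is hypothetical, and the sketch counts straight through zero for negative starting numbers, which may differ from the tool's behavior.

```python
def line_positions(seq_length, bases_per_line=80, start=1):
    """Yield (first, last) position numbers for each output line."""
    for offset in range(0, seq_length, bases_per_line):
        line_length = min(bases_per_line, seq_length - offset)
        first = start + offset           # shown with Left numbering
        last = first + line_length - 1   # shown with Right numbering
        yield first, last


# A 101-base subsequence numbered from 500, 50 bases per line.
print(list(line_positions(101, bases_per_line=50, start=500)))
# [(500, 549), (550, 599), (600, 600)]
```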

Ruler (above)

Position numbers placed right-justified at every 10th base, spaced to match the grouped sequence
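The ruler layout can be sketched as follows, assuming that "every 10th base" means every 10th base of the line and that each number ends in the column directly above that base. The function name `ruler_line` and its parameters are hypothetical; the tool's exact layout may differ.

```python
def ruler_line(line_length, group_size=10, first=1, step=10):
    """Build a ruler whose numbers end above every `step`-th base of a grouped line."""
    # A grouped line has one extra space after each completed group.
    width = line_length + (line_length - 1) // group_size
    ruler = [" "] * width
    for i in range(step - 1, line_length, step):
        column = i + i // group_size      # column of base i in the grouped line
        label = str(first + i)            # position number of base i
        begin = column - len(label) + 1   # right-justify the label at `column`
        if begin < 0:
            continue                      # label too wide to fit; skip it
        ruler[begin:column + 1] = list(label)
    return "".join(ruler)


print(ruler_line(30, group_size=10, first=1))
print(" ".join(["ACGTACGTAC"] * 3))
```

Printing the ruler directly above the grouped line shows the last digit of 10, 20, and 30 sitting over the 10th, 20th, and 30th base.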

Complement strand

Each base is replaced with its Watson–Crick complement (A↔T, G↔C; IUPAC ambiguity codes are replaced with their complementary codes, e.g. R↔Y); displayed antiparallel under the forward strand
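A minimal sketch of a case-preserving complement, using the standard IUPAC pairs (A/T, G/C, R/Y, K/M, B/V, D/H, with S, W, and N unchanged). The names `COMPLEMENT` and `complement` are hypothetical and this is not the tool's own code.

```python
# Each letter in _UPPER maps to the letter at the same index in _COMP.
_UPPER = "ATGCRYKMBVDHSWN"
_COMP = "TACGYRMKVBHDSWN"
COMPLEMENT = str.maketrans(_UPPER + _UPPER.lower(), _COMP + _COMP.lower())


def complement(seq):
    """Return the complement of seq without reversing it; case is preserved."""
    return seq.translate(COMPLEMENT)


print(complement("ATGCatgcRYN"))  # TACGtacgYRN
```

Because the result is not reversed, writing it under the forward strand gives a line that reads 3′→5′ from left to right.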

📊 Result Interpretation

Group spacing

Spaces separate each block of bases, making it easier to count to a specific position within a line.

Position numbers

Show the cumulative base number at the start or end of each line (or as a ruler above), starting from the configured starting number.

Uppercase / lowercase

Case is preserved. Mixed-case input (e.g. lowercase introns, uppercase exons from a GenBank Feature Extractor) is displayed as-is.

Complement strand

Shown directly below the forward strand, antiparallel (3′→5′, left to right). Useful for locating restriction sites or primer binding sites on the opposite strand.

🔬 Applications

Creating a numbered reference sequence for a cloning project or PCR design
Displaying a gene sequence with codon grouping (group size 3) for reading frame analysis
Annotating a plasmid region for a publication figure
Quickly locating specific positions within a long sequence
Generating formatted sequence output that matches a lab notebook entry

⚠️ Common Mistakes & Warnings

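Non-sequence characters are removed

You do not need to pre-clean the sequence: numbers, spaces, and other non-letter characters are stripped automatically. Only IUPAC nucleotide letters (A, T, G, C, R, Y, K, M, S, W, B, D, H, V, N) are retained, so any other letters are dropped as well and do not count toward position numbers.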
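The filtering rule can be sketched with a regular expression. This sketch assumes a raw sequence without FASTA header lines, since how the tool treats headers is not described here; the name `clean_sequence` is hypothetical.

```python
import re

# Anything that is not one of the listed IUPAC nucleotide letters (either case).
_NON_IUPAC = re.compile(r"[^ATGCRYKMSWBDHVNatgcrykmswbdhvn]")


def clean_sequence(raw):
    """Drop digits, whitespace, punctuation, and non-IUPAC letters; keep case."""
    return _NON_IUPAC.sub("", raw)


print(clean_sequence("1 atgc-NNx 20\nACGT"))  # atgcNNACGT
```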

Negative starting numbers are supported

You can enter a negative starting number (e.g. −10) to display positions before position 1. This is useful for showing sequence context upstream of a feature.

❓ Frequently Asked Questions

Which group size should I use?

Use 3 to group by codons (helpful for reading frame analysis), 5 for a compact five-base spacing common in Sanger sequencing outputs, or 10 (default) for easy positional counting.

What does the starting number do?

It sets the position number assigned to the first base in your input. If you are displaying a subsequence extracted from positions 500–600 of a chromosome, enter 500 so the numbering reflects the original coordinates.

Why does my sequence look the same as the input?

The tool preserves case and sequence content; it only adds spacing and numbering. If characters are missing from the output, they were characters other than IUPAC nucleotide letters (digits, dashes, spaces), which are stripped before formatting.

How is the complement strand determined?

Each base is complemented according to the IUPAC Watson–Crick rules (A↔T, G↔C, R↔Y, K↔M, B↔V, D↔H, S→S, W→W, N→N). Case is preserved. The complement is shown reading left to right, which corresponds to the 3′→5′ direction of the complementary strand.