💡 Quick Summary
Filter DNA removes or replaces unwanted characters from a DNA sequence using a choice of preset patterns, then optionally converts the remaining text to uppercase or lowercase. It is the fastest way to strip line numbers, spaces, digits, or non-IUPAC characters from copied sequence text so it is ready for downstream tools.
📋 How to Use
- Paste any DNA text — numbered sequences, FASTA bodies, or raw text — into the input area.
- Choose what to Remove: pick the pattern that matches the characters you want to discard. The most common choice is "Remove non-GATCN characters", which strips everything except valid DNA bases and N.
- Choose what to Replace with: by default the matched characters are deleted. You can instead substitute them with a placeholder such as N, a gap character (-), or any other single character.
- Choose a Case conversion: leave case unchanged, convert all bases to uppercase, or convert all to lowercase.
- Click Filter. The output is a FASTA entry whose header line states the final sequence length.
- Use Copy to copy the result to your clipboard.
- Click Load Example to try the tool with a numbered DNA sequence — a format frequently encountered when copying from databases or textbooks.
- Click Clear to reset.
🧮 Formulas & Logic
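The steps listed under How to Use can be sketched in Python as below. This is a minimal illustration, not the tool's actual source: the preset names and regular expressions in `PRESETS`, the function names `filter_dna` and `to_fasta`, the header wording, and the 60-character line width are all assumptions made for the example.

```python
import re

# Hypothetical presets; the tool's real option list may differ.
PRESETS = {
    "non_gatcn": r"[^GATCNgatcn]",       # everything except G, A, T, C, N
    "digits_and_whitespace": r"[\d\s]",  # line numbers, spaces, newlines
    "whitespace": r"\s",
}

def filter_dna(text, pattern, replacement="", case="unchanged"):
    """Return (filtered_sequence, matched_count).

    Every character matching `pattern` is replaced by `replacement`
    (an empty string deletes it), then the case conversion is applied.
    """
    # A function is used as the replacement so that it is inserted literally,
    # without backslash or group-reference interpretation.
    filtered, matched = re.subn(pattern, lambda _m: replacement, text)
    if case == "upper":
        filtered = filtered.upper()
    elif case == "lower":
        filtered = filtered.lower()
    return filtered, matched

def to_fasta(sequence, width=60):
    """Wrap the sequence as a FASTA entry whose header states its length."""
    header = f">filtered sequence, length {len(sequence)}"  # assumed wording
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines])

raw = "1 gatc nnGA\n11 tcxx"
sequence, matched = filter_dna(raw, PRESETS["non_gatcn"], "", "upper")
print(to_fasta(sequence))
print("matched characters:", matched)
```

With an empty replacement, the output length equals the input length minus the matched count. With a single-character replacement, the output length equals the input length, because each matched character is swapped one for one.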
📊 Result Interpretation
The sequence length stated in the output header is the number of characters in the output sequence after all replacements and case conversion.
The matched-character count is the number of input characters that matched the selected removal pattern. If "replace with nothing" was chosen, these characters were deleted; otherwise each was swapped for the chosen placeholder.
🔬 Applications
- Removing line numbers and whitespace from numbered sequences copied from databases or textbooks
- Stripping all non-IUPAC characters from mixed text before BLAST or alignment
- Converting a DNA sequence to uppercase for tools that require it
- Replacing T with U to prepare a sequence for RNA-focused tools that expect U
- Masking ambiguous positions by replacing non-GATCN characters with N
⚠️ Common Mistakes & Warnings
Replacing T with U, or stripping all non-lowercase letters, produces a new character string, but the tool performs no biological check on the result. Review the output before using it in an analysis.
If your input includes a FASTA ">" header line, the characters in that line are also subject to the selected filter. Strip the header before filtering if you want to preserve it, or use the output header generated by the tool.
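One way to keep an existing header is to split it off before filtering, as in the sketch below. This is a hypothetical helper written for illustration, not part of the tool: the name `filter_preserving_header` and the default non-GATCN pattern are assumptions, and only a single leading header line is handled.

```python
import re

def filter_preserving_header(fasta_text, pattern=r"[^GATCNgatcn]"):
    """Filter only the sequence body; return (header, cleaned_body).

    If the text starts with '>', the first line is treated as the header
    and left untouched. Otherwise the header is an empty string.
    """
    header, body = "", fasta_text
    if fasta_text.startswith(">"):
        # partition splits at the first newline: header before, body after
        header, _, body = fasta_text.partition("\n")
    cleaned = re.sub(pattern, "", body)
    return header, cleaned

record = ">seq1 Gallus gallus\n1 gatc\n5 nnga"
header, body = filter_preserving_header(record)
print(header)
print(body)
```

Without this split, letters in the header that happen to be valid bases (here the G and a characters in "Gallus gallus") would survive the filter and be prepended to the sequence.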
The input size limit matches the original SMS limit and is intentionally larger than the limit in the EMBL tools, since raw sequence text files can be very large.