Combine FASTA
Merge multiple FASTA records into one combined sequence

Paste one or more FASTA records. For multiple sequences, each must begin with a ">" header line. A single sequence can be pasted with or without a header. Input limit: 500,000,000 characters.

💡 Quick Summary

Combine FASTA merges multiple FASTA sequence records into a single concatenated sequence and reports key statistics: record count, total and average length, shortest and longest sequence, overall GC content, and detected sequence type (DNA / Protein).
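The page does not say how the DNA / Protein type is detected. One common heuristic, assumed here and not necessarily the tool's rule, calls a sequence nucleotide when nearly all of its letters are A, C, G, T, U or N. The function name `guess_sequence_type` and the 0.9 threshold are hypothetical.

```python
def guess_sequence_type(seq: str, threshold: float = 0.9) -> str:
    """Heuristic guess: 'DNA' if most letters are nucleotide codes, else 'Protein'.

    Assumption: the A/C/G/T/U/N alphabet and the 0.9 threshold are
    illustrative choices, not the tool's documented rule.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return "Unknown"  # nothing to classify
    nucleotide = sum(1 for c in letters if c in "ACGTUN")
    return "DNA" if nucleotide / len(letters) >= threshold else "Protein"

print(guess_sequence_type("ATGGCGTACGTTAG"))    # DNA
print(guess_sequence_type("MKTAYIAKQRQISFVK"))  # Protein
```

A ratio threshold rather than an exact match tolerates a few ambiguity codes in real nucleotide data.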

📋 How to Use
  1. Paste one or more FASTA-formatted sequences into the input area. Each record starts with a ">" title line followed by sequence data.
  2. Click Analyze. All title lines are stripped, non-letter characters removed, and sequences concatenated in order.
  3. The Analysis Summary panel shows sequence statistics immediately above the combined output.
  4. Use the Copy button to copy the full FASTA output to your clipboard.
  5. Click Load Example to try the tool with two sample records.
  6. Click Clear to reset and start again.
🧮 Formulas & Logic
Combination
combined = seq₁ + seq₂ + … + seqN (non-letter characters stripped from each record before joining)
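The combination rule above can be sketched as a strip-then-join over the records in input order. This is a minimal illustration, assuming the records are already split apart; keeping the original letter case is also an assumption, since the page does not say whether output is uppercased. The function name `combine` is hypothetical.

```python
import re

def combine(sequences: list[str]) -> str:
    """Strip every non-letter character from each record, then join in input order.

    Only ASCII letters A-Z / a-z survive; case is kept as-is (an assumption).
    """
    return "".join(re.sub(r"[^A-Za-z]", "", s) for s in sequences)

print(combine(["ATG CCG-TA\n", "gg*c 12 TT"]))  # ATGCCGTAggcTT
```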
GC content
GC% = (G + C residues) ÷ total length × 100
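A direct translation of the GC formula, as a sketch: G and C are counted case-insensitively, and the denominator is the full combined length as the formula states, so ambiguity codes such as N lower the percentage. Returning 0.0 for an empty sequence is an assumption; `gc_percent` is a hypothetical name.

```python
def gc_percent(seq: str) -> float:
    """GC% = (G + C residues) / total length * 100, counting case-insensitively."""
    if not seq:
        return 0.0  # assumption: empty input reports 0% rather than an error
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) * 100

print(round(gc_percent("ATGCGC"), 2))  # 66.67
```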
Average length
avg = total length ÷ number of records
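The length statistics in the summary panel can be computed together. This sketch assumes each sequence has already had non-letter characters stripped; raising an error on empty input is an illustrative choice, and `length_stats` is a hypothetical name.

```python
def length_stats(sequences: list[str]) -> dict:
    """Record count, total, average, shortest and longest length.

    Assumes non-letter characters were already removed from each sequence.
    """
    if not sequences:
        raise ValueError("at least one record is required")
    lengths = [len(s) for s in sequences]
    total = sum(lengths)
    return {
        "records": len(lengths),
        "total": total,
        "average": total / len(lengths),  # avg = total length / number of records
        "shortest": min(lengths),
        "longest": max(lengths),
    }

print(length_stats(["ATGC", "GG", "ATGCAT"]))
# {'records': 3, 'total': 12, 'average': 4.0, 'shortest': 2, 'longest': 6}
```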
Output header
>results for [total] residue sequence made from [N] records, starting "[first_10_chars]"
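The header template can be filled in with a single format string. In this sketch, a combined sequence shorter than 10 residues simply shows the whole sequence (Python slicing never fails past the end); whether the tool behaves the same way is an assumption. `output_header` is a hypothetical name.

```python
def output_header(combined: str, n_records: int) -> str:
    """Build the generated summary header from the combined sequence."""
    return (
        f">results for {len(combined)} residue sequence made from "
        f'{n_records} records, starting "{combined[:10]}"'
    )

print(output_header("ATGCCGTAGGCTT", 2))
# >results for 13 residue sequence made from 2 records, starting "ATGCCGTAGG"
```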
📊 Result Interpretation
Total Sequences

The number of individual FASTA records found. A record begins with a line starting with ">". Input with no ">" lines is treated as a single bare-sequence record.
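The record-counting rule can be sketched as a small parser that splits on lines beginning with ">" and falls back to one bare record when no header is present. Two behaviours here are assumptions, not documented by the tool: text before the first header line is ignored, and blank input yields no records. `parse_records` is a hypothetical name.

```python
def parse_records(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    Input with no '>' line becomes one bare-sequence record with an empty header.
    Assumption: text before the first '>' line is ignored.
    """
    if not text.strip():
        return []  # assumption: blank input has no records
    lines = text.splitlines()
    if not any(line.startswith(">") for line in lines):
        # No header at all: the whole input is a single bare-sequence record.
        return [("", "".join(line.strip() for line in lines))]
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in lines:
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))  # close previous record
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    records.append((header, "".join(chunks)))  # close the last record
    return records

text = ">seq1 first\nATGC\nGG\n>seq2\nTTAA\n"
print(len(parse_records(text)))     # 2
print(parse_records("ATGC\nGGTT"))  # [('', 'ATGCGGTT')]
```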

GC Content

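Percentage of G and C residues in the combined sequence. The value is only meaningful for nucleotide input. Many genomes fall roughly in the 40–60% range, but values well outside it are common; for example, the Plasmodium falciparum genome is under 20% GC.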

🔬 Applications
  • Calculating codon usage across a set of coding sequences when a single-sequence input is required
  • Concatenating several loci from the same organism into one sequence for multi-locus (concatenated-gene) phylogenetic analysis
  • Merging exon sequences before ORF finding or translation
  • Combining replicate reads or assembled contigs into a single entry for length statistics
  • Pre-processing multi-FASTA files for tools that accept only one sequence at a time
⚠️ Common Mistakes & Warnings
Record order is preserved

Sequences are concatenated in the exact order they appear in the input. Rearrange records before analyzing if order matters for your downstream application.
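If records need a specific order, they can be rearranged before pasting. This sketch assumes the records are already parsed into (header, sequence) pairs and that the desired order is given as a list of IDs, taken here to be the first word of each header. The helpers `reorder` and `to_fasta` are hypothetical.

```python
def reorder(records: list[tuple[str, str]], order: list[str]) -> list[tuple[str, str]]:
    """Sort records to follow `order`, matched on the first word of each header.

    Records whose ID is not listed go last, keeping their relative order
    (Python's sort is stable).
    """
    rank = {rid: i for i, rid in enumerate(order)}

    def key(rec: tuple[str, str]) -> int:
        words = rec[0].split()
        return rank.get(words[0] if words else "", len(order))

    return sorted(records, key=key)

def to_fasta(records: list[tuple[str, str]]) -> str:
    """Write records back out as FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)

records = [("exon2 chr1", "GGCC"), ("exon1 chr1", "ATGA"), ("misc", "TT")]
print(to_fasta(reorder(records, ["exon1", "exon2"])), end="")
```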

All header lines are discarded

Title lines (starting with ">") are stripped entirely. The output carries only a generated summary header. Note down original identifiers beforehand if they are needed later.
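Because identifiers are lost, it can help to record where each original record lands in the combined sequence before running the tool. This sketch assumes parsed (header, sequence) pairs and reports 1-based, inclusive coordinates (a convention chosen here, not one the tool defines); lengths are measured after stripping non-letters, matching what is concatenated. `coordinate_map` is a hypothetical name.

```python
import re

def coordinate_map(records: list[tuple[str, str]]) -> list[tuple[str, int, int]]:
    """Return (header, start, end) for each record in the combined sequence.

    Coordinates are 1-based and inclusive; lengths count letters only,
    since non-letter characters are removed before joining.
    """
    result = []
    pos = 0
    for header, seq in records:
        n = len(re.sub(r"[^A-Za-z]", "", seq))
        result.append((header, pos + 1, pos + n))
        pos += n
    return result

print(coordinate_map([("geneA", "ATG-CC"), ("geneB", "TTA*")]))
# [('geneA', 1, 5), ('geneB', 6, 8)]
```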

Non-letter characters are silently removed

Digits, spaces, hyphens, asterisks, and all other non-alphabetic characters are removed before concatenation. This includes position numbering added by sequence editors. Check your input format if the result length is unexpectedly short.
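To diagnose an unexpectedly short result, one option is to count exactly which characters will be discarded. This sketch skips header lines and tallies every non-letter character on sequence lines (line breaks are not counted); spaces are usually harmless, while many digits or hyphens point to editor numbering or alignment gaps. `removed_characters` is a hypothetical name.

```python
import re
from collections import Counter

def removed_characters(text: str) -> Counter:
    """Count non-letter characters on sequence lines (header lines skipped)."""
    counts: Counter = Counter()
    for line in text.splitlines():
        if line.startswith(">"):
            continue  # header lines are dropped wholesale, not character by character
        counts.update(re.findall(r"[^A-Za-z]", line))
    return counts

print(removed_characters(">s1\n  1 ATGC-AT*\n 11 GG\n"))
```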

❓ Frequently Asked Questions

What is FASTA format?
FASTA format is a text-based representation of nucleotide or protein sequences. Each record begins with a description line starting with ">", followed by one or more lines of sequence characters. Multiple records can appear in the same text block.
Do I need a ">" header line?
No. If the input contains no ">" lines, the entire text is treated as a single bare sequence. The ">" line is only required to separate multiple records from each other.
Are gaps and stop-codon symbols (*) removed?
Yes. The tool strips all non-letter characters, including gaps (-), asterisks (*), digits, and whitespace, and retains only the 26 alphabetic characters (A–Z), matching the behaviour of the original Sequence Manipulation Suite (SMS) tool.
Is there an input size limit?
The tool accepts up to 500,000,000 characters of input text, matching the original SMS limit.