💡 Quick Summary

Combine FASTA merges multiple FASTA sequence records into a single concatenated sequence, and reports key statistics: record count, total and average length, shortest and longest sequence, overall GC content, and detected sequence type (DNA / Protein).

📋 How to Use

Paste one or more FASTA-formatted sequences into the input area. Each record starts with a >title line followed by sequence data.
Click Analyze. All title lines are stripped, non-letter characters removed, and sequences concatenated in order.
The Analysis Summary panel shows sequence statistics immediately above the combined output.
Use the Copy button to copy the full FASTA output to your clipboard.
Click Load Example to try the tool with two sample records.
Click Clear to reset and start again.

🧮 Formulas & Logic

Combination

combined = seq₁ + seq₂ + … + seqN (non-letter characters stripped from each record before joining)
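
The combination rule above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual source: the function name combine_records is hypothetical, and preserving letter case is an assumption.

```python
import re

def combine_records(sequences):
    """Strip non-letter characters from each sequence, then join in input order.

    Hypothetical helper: letter case is left untouched (an assumption).
    """
    cleaned = [re.sub(r"[^A-Za-z]", "", seq) for seq in sequences]
    return "".join(cleaned)

# Two raw records containing a gap, a space, digits, an asterisk and a newline
records = ["ATG-CGT 10", "ggc*\nTTA"]
print(combine_records(records))  # ATGCGTggcTTA
```

Cleaning each record before joining means stray digits or gaps at a record boundary never end up in the combined sequence.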

GC content

GC% = (G + C residues) ÷ total length × 100
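
As a minimal sketch of the GC formula, the function below counts G and C and divides by the total length. The name gc_percent is hypothetical. Counting case-insensitively and returning 0.0 for an empty sequence are assumptions, not confirmed behaviour of the tool.

```python
def gc_percent(sequence):
    """Return GC% = (G + C residues) / total length * 100.

    Assumptions: counting ignores letter case, and an empty
    sequence yields 0.0 rather than a division error.
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    gc = upper.count("G") + upper.count("C")
    return gc / len(sequence) * 100

print(gc_percent("ATGCATGC"))  # 50.0
print(gc_percent("ggcc"))      # 100.0
```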

Average length

avg = total length ÷ number of records
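
The length statistics in the summary panel (record count, total, average, shortest, longest) could be computed along these lines. This is a sketch under stated assumptions: the function name and dictionary keys are hypothetical, and the sequences are taken to be already stripped of non-letter characters.

```python
def length_stats(sequences):
    """Compute record count, total, average, shortest and longest length.

    Hypothetical helper; expects sequences that are already cleaned.
    An empty list gives zeros instead of raising (an assumption).
    """
    lengths = [len(seq) for seq in sequences]
    total = sum(lengths)
    return {
        "records": len(lengths),
        "total": total,
        "average": total / len(lengths) if lengths else 0.0,
        "shortest": min(lengths, default=0),
        "longest": max(lengths, default=0),
    }

stats = length_stats(["ATGCGT", "GGCTTAAC", "AT"])
print(stats["records"], stats["total"], stats["shortest"], stats["longest"])
# 3 16 2 8
```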

Output header

>results for [total] residue sequence made from [N] records, starting "[first_10_chars]"
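
For illustration, the header template above can be filled in as follows. The function name build_header is hypothetical, and the sketch assumes that [first_10_chars] means the first ten characters of the combined sequence.

```python
def build_header(combined, record_count):
    """Fill in the summary header template for a combined sequence.

    Assumption: [first_10_chars] is the first ten characters of the
    combined sequence (fewer if the sequence is shorter).
    """
    return (
        f">results for {len(combined)} residue sequence "
        f'made from {record_count} records, starting "{combined[:10]}"'
    )

print(build_header("ATGCGTGGCTTAAC", 2))
# >results for 14 residue sequence made from 2 records, starting "ATGCGTGGCT"
```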

📊 Result Interpretation

Total Sequences

The number of individual FASTA records found. A record begins with a line starting ">". Input with no ">" lines is treated as a single bare-sequence record.
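
The record-counting rule can be sketched as below. The function name split_records is hypothetical. Ignoring any text that appears before the first ">" line is an assumption made for this sketch; the tool may treat such text differently.

```python
def split_records(text):
    """Split input text into (title, raw_sequence) pairs.

    Input with no ">" lines is returned as one bare-sequence record.
    Assumption: text before the first ">" line is ignored.
    """
    lines = text.splitlines()
    if not any(line.startswith(">") for line in lines):
        return [("", text)]  # single bare-sequence record

    records = []
    for line in lines:
        if line.startswith(">"):
            records.append((line[1:].strip(), []))  # start a new record
        elif records:
            records[-1][1].append(line)  # sequence line of current record
    return [(title, "".join(body)) for title, body in records]

print(len(split_records(">a\nATG\nCC\n>b\nGG")))  # 2
print(len(split_records("ATG\nCC")))              # 1
```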

GC Content

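Percentage of G and C residues in the combined sequence. Many genomes fall in roughly the 40–60% range, but GC content varies widely between organisms, so a value outside that range is not necessarily an error. The figure is meaningful only for nucleotide input; in a protein sequence, G and C stand for glycine and cysteine.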

🔬 Applications

Calculating codon usage across a set of coding sequences when a single-sequence input is required
Concatenating orthologous loci from multiple organisms for multi-locus phylogenetic analysis
Merging exon sequences before ORF finding or translation
Combining replicate reads or assembled contigs into a single entry for length statistics
Pre-processing multi-FASTA files for tools that accept only one sequence at a time

⚠️ Common Mistakes & Warnings

Record order is preserved

Sequences are concatenated in the exact order they appear in the input. Rearrange records before analyzing if order matters for your downstream application.

All header lines are discarded

Title lines (starting with ">") are stripped entirely. The output carries only a generated summary header. Note down original identifiers beforehand if they are needed later.

Non-letter characters are silently removed

Digits, spaces, hyphens, asterisks, and all non-alphabetic characters are removed before concatenation. This includes numbering added by sequence editors. Check your input format if the result length is unexpectedly short.
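
If the result is shorter than expected, a quick check like the one below can show what stripping would remove from a raw record. This is a diagnostic sketch and not part of the tool: the function name removed_characters is hypothetical, and "letter" is taken to mean ASCII A–Z in either case.

```python
from collections import Counter

def removed_characters(raw):
    """Count each character that non-letter stripping would remove.

    Assumption: only ASCII letters (A-Z, a-z) are kept.
    """
    return Counter(ch for ch in raw if not (ch.isascii() and ch.isalpha()))

raw = "1 ATGCGT-ACG* 10"
removed = removed_characters(raw)
print(len(raw), sum(removed.values()))  # 16 7
```

Here 7 of the 16 input characters are position numbers, spaces, a gap, and an asterisk, leaving 9 residues.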

❓ Frequently Asked Questions

What is FASTA format?

FASTA format is a text-based representation of nucleotide or protein sequences. Each record begins with a description line starting with ">", followed by one or more lines of sequence characters. Multiple records can appear in the same text block.

Do I need a ">" header line?

No. If the input contains no ">" lines, the entire text is treated as a single bare sequence. The ">" line is only required to separate multiple records from each other.

Are gaps and stop-codon symbols (*) removed?

Yes. The tool strips all non-letter characters, including gap symbols (-), asterisks (*), digits, and whitespace, and retains only the 26 letters (A–Z), matching the original SMS behaviour.

Is there an input size limit?

The tool accepts up to 500,000,000 characters of input text, matching the original SMS limit.