Protein Stats
Residue counts and percentages, including eight biologically relevant groups

Raw sequence or multi-FASTA format. Degenerate codes B, X, and Z are counted. Input limit: 500,000 characters.

💡 Quick Summary

Protein Stats returns the count and percentage of each amino acid residue in the sequence you enter. Percentage totals are also given for eight biologically meaningful residue groups — aliphatic, aromatic, sulphur-containing, basic, acidic, aliphatic hydroxyl, and the two tRNA synthetase classes — allowing you to quickly compare the composition of different sequences.

📋 How to Use
  1. Paste one or more protein sequences (raw or FASTA) into the text area.
  2. Click Submit. A residue composition table is displayed for each sequence.
  3. The upper section of each table shows individual residue counts (A–Z); the lower section shows group totals.
  4. Click Load Example to analyse two sample sequences.
  5. Use Copy All to copy the full text report to your clipboard.
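
Because step 1 accepts either raw sequence or multi-FASTA, the input has to be split into individual records before a table can be produced for each one. The following is a minimal Python sketch of that splitting, not the tool's actual code. The function name `split_fasta` and the fallback title "Untitled" are assumptions made for illustration.

```python
def split_fasta(text):
    """Split raw or multi-FASTA text into a list of (title, sequence) pairs."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A header line starts a new record; flush the previous one first.
            if title is not None or chunks:
                records.append((title or "Untitled", "".join(chunks)))
            title, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    # Flush the final record (or the only one, for raw input without a header).
    if title is not None or chunks:
        records.append((title or "Untitled", "".join(chunks)))
    return records


records = split_fasta(">seq1 demo\nMKV\nLLA\n\n>seq2\nGGH\n")
for name, seq in records:
    print(name, seq)
```

Raw input without a ">" header is treated as a single record with the placeholder title.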
🧮 Formulas & Logic
Count
Number of occurrences of the residue or group pattern in the cleaned sequence.
Percentage
(Count / length of the cleaned sequence) × 100, reported to two decimal places.
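
The two definitions above can be sketched in a few lines of Python. This is an illustrative sketch that assumes the sequence has already been cleaned. The function name `residue_stats` is hypothetical and is not taken from the tool.

```python
from collections import Counter


def residue_stats(sequence):
    """Return {residue: (count, percentage)} for an already-cleaned sequence."""
    length = len(sequence)
    if length == 0:
        return {}  # nothing to report for an empty sequence
    counts = Counter(sequence)
    # Percentage = (count / sequence length) * 100, rounded to two decimals.
    return {res: (n, round(n / length * 100, 2)) for res, n in sorted(counts.items())}


stats = residue_stats("MKVLAAGK")
for res, (n, pct) in stats.items():
    print(f"{res}\t{n}\t{pct:.2f}")
```

For the eight-residue example, A and K each occur twice (25.00%) and every other residue once (12.50%).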
📊 Result Interpretation
Aliphatic (G,A,V,L,I)

Non-polar residues that typically pack into the hydrophobic core of globular proteins.

Aromatic (F,W,Y)

Bulky hydrophobic residues; often buried or involved in stacking interactions.

Sulphur (C,M)

Cysteine can form disulfide bonds; methionine is rarely involved in cross-links.

Basic (K,R,H)

K and R are positively charged at physiological pH; H (side-chain pKa near 6) is only partly protonated. Basic residues are often enriched in DNA-binding proteins.

Acidic (B,D,E,N,Q,Z)

Negatively charged residues and their amides. D and E are negatively charged at pH 7; N and Q are uncharged amides; B and Z are the ambiguity codes for Asp/Asn and Glu/Gln.

Aliphatic hydroxyl (S,T)

Common phosphorylation targets; also involved in hydrogen bonding.

tRNA synthetase class I (Z,E,Q,R,C,M,V,I,L,Y,W)

Charged by class I aminoacyl-tRNA synthetases.

tRNA synthetase class II (B,G,A,P,S,T,H,D,N,K,F)

Charged by class II aminoacyl-tRNA synthetases.
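
The group totals can be reproduced by counting every residue that belongs to a group's member set. The sketch below copies the member sets from the group headings above. The names `GROUPS` and `group_stats` are assumptions made for illustration, and the tool itself may implement the matching differently.

```python
# Group membership as listed in the Result Interpretation headings.
GROUPS = {
    "Aliphatic": "GAVLI",
    "Aromatic": "FWY",
    "Sulphur": "CM",
    "Basic": "KRH",
    "Acidic": "BDENQZ",
    "Aliphatic hydroxyl": "ST",
    "tRNA synthetase class I": "ZEQRCMVILYW",
    "tRNA synthetase class II": "BGAPSTHDNKF",
}


def group_stats(sequence):
    """Return {group: (count, percentage)} for an already-cleaned sequence."""
    length = len(sequence)
    result = {}
    for name, members in GROUPS.items():
        count = sum(1 for res in sequence if res in members)
        pct = round(count / length * 100, 2) if length else 0.0
        result[name] = (count, pct)
    return result


for name, (count, pct) in group_stats("MKVLAAGK").items():
    print(f"{name}\t{count}\t{pct:.2f}")
```

Because a residue can belong to more than one group (M is in both Sulphur and class I, for instance), the group percentages are not expected to sum to 100.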

🔬 Applications
  • Comparing amino acid composition between protein families or organisms
  • Estimating hydrophobicity and charge properties before wet-lab work
  • Checking for unusual amino acid distributions in designed or synthetic sequences
  • Identifying proteins likely to be membrane-associated (high aliphatic/aromatic content)
  • Verifying that a translated sequence has the expected composition (stop codons are stripped before counting, so they cannot be checked here)
⚠️ Common Mistakes & Warnings
Degenerate codes are counted

B (Asp or Asn), X (any residue), and Z (Glu or Gln) are retained and counted separately. B and Z are also included in the acidic group (B, D, E, N, Q, Z) and in tRNA synthetase class II and class I, respectively. X belongs to no group but still counts toward the sequence length, so X residues from automated pipelines lower every other percentage.

Non-standard characters stripped before counting

Digits, whitespace, gap characters, stop codons (*), and any letter not in the standard 20 + B/X/Z set are removed before statistics are calculated. The reported length reflects the cleaned sequence.
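
The cleaning step described above can be approximated with a single regular expression. This is a sketch, not the tool's code. The name `clean_sequence` is hypothetical, and converting lowercase input to uppercase before filtering is an assumption that the description above does not confirm.

```python
import re

# The 20 standard residues plus the degenerate codes B, X, and Z.
ALLOWED = "ACDEFGHIKLMNPQRSTVWY" + "BXZ"
_NOT_ALLOWED = re.compile(f"[^{ALLOWED}]")


def clean_sequence(raw):
    """Remove every character that is not one of the 23 retained codes."""
    # Assumption: lowercase letters are upper-cased rather than discarded.
    return _NOT_ALLOWED.sub("", raw.upper())


cleaned = clean_sequence("mkv 12 LL-A*\nJOUX")
print(cleaned, len(cleaned))
```

Here the digits, spaces, gap, stop symbol, newline, and the letters J, O, and U are dropped, leaving a cleaned sequence of length 7 that is used as the denominator for all percentages.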

❓ Frequently Asked Questions

Why are B, X, and Z included?
These are standard IUPAC degenerate amino acid codes that appear in sequences downloaded from databases. B represents Asp or Asn, Z represents Glu or Gln, and X represents any amino acid. They are counted separately so you can see how many ambiguous residues your sequence contains.
Do the group percentages add up to 100%?
No. Group categories are not mutually exclusive and are not designed to sum to 100%. For example, Trp (W) appears in both the Aromatic group and tRNA synthetase class I. Individual residue percentages, including B, X, and Z when present, do sum to 100%, apart from small rounding differences.
What is the tRNA synthetase classification based on?
Aminoacyl-tRNA synthetases (aaRS) are divided into two classes based on their active-site architecture. Class I enzymes have a Rossmann fold and typically aminoacylate the 2′-OH of the terminal adenosine; class II enzymes have a seven-stranded antiparallel β-sheet and typically aminoacylate the 3′-OH (phenylalanyl-tRNA synthetase, which uses the 2′-OH, is a known exception).