What is FASTA format?

FASTA format is a text format for representing protein (peptide) or nucleotide sequences, in which amino acids or nucleotides are represented by single-letter codes. Sequence names and comments may precede the sequences. The format originates from the FASTA software package and is now regarded as a standard in the field of bioinformatics.
  • A sequence in FASTA format consists of:
    • One line starting with a ">" sign, followed by a sequence identification code.
      It may optionally be followed by a textual description of the sequence. Since the description is not part of the official definition of the format, software may ignore it when it is present.
    • One or more lines containing the sequence itself.
  • A file in FASTA format may comprise more than one sequence.

  • The FASTA format is sometimes also referred to as the "Pearson" format (after the author of the FASTA program and of the format itself).
  • Example:
    >sp|Q9LW07|PGLR3_ARATH Probable polygalacturonase At3g15720 OS=Arabidopsis thaliana GN=At3g15720 PE=1 SV=1
    MKKKTWFLNFSLFFLQIFTSSNALDVTQFGAVGDGVTDDSQAFLKAWEAVCSGTGDGQFV
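The structure described above (a ">" header line with an identification code and an optional description, followed by one or more sequence lines, possibly repeated for several sequences) can be read with a few lines of Python. This is only a minimal sketch: the function name parse_fasta is hypothetical, and the code assumes that the identification code ends at the first whitespace in the header line and that blank lines can be skipped.

```python
import io


def parse_fasta(handle):
    """Yield (identifier, description, sequence) for each record in a FASTA file.

    Assumption: the identifier is the first whitespace-delimited token after
    ">"; everything after it on the header line is the optional description.
    """
    identifier = None
    description = ""
    chunks = []
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if identifier is not None:
                yield identifier, description, "".join(chunks)
            parts = line[1:].strip().split(None, 1)
            identifier = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif identifier is not None:
            # Sequence data may span several lines; join them later.
            chunks.append(line)
    if identifier is not None:
        yield identifier, description, "".join(chunks)


example = """>sp|Q9LW07|PGLR3_ARATH Probable polygalacturonase At3g15720
MKKKTWFLNFSLFFLQIFTSSNALDVTQFGAVGDGVTDDSQAFLKAWEAVCSGTGDGQFV
"""

records = list(parse_fasta(io.StringIO(example)))
for identifier, description, sequence in records:
    print(identifier)
    print(description)
    print(sequence)
```

Because the description is optional, a header such as ">seq1" simply yields an empty description string.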

What is UniProtKB Accession Number format?

When an entry is integrated into UniProtKB, it is assigned a unique, stable identifier, called the "primary (citable) accession number", which may be used to cite the entry. An entry may have more than one accession number because entries can be merged or split.

UniProtKB accession numbers consist of 6 alphanumerical characters in the format:

Position:  1          2      3          4          5          6
Pattern 1: [A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]
Pattern 2: [O,P,Q]    [0-9]  [A-Z,0-9]  [A-Z,0-9]  [A-Z,0-9]  [0-9]


Example: Q9LW07
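
The two six-character patterns listed above translate directly into a regular expression. The following is a minimal sketch using Python's standard re module; the names ACCESSION_RE and is_six_char_accession are hypothetical, and the check covers only the format as described here (uppercase letters, exactly 6 characters).

```python
import re

# One alternative per pattern row: the first character decides which
# character classes are allowed in positions 3 to 5.
ACCESSION_RE = re.compile(
    r"(?:[A-NR-Z][0-9][A-Z][A-Z0-9][A-Z0-9][0-9]"
    r"|[OPQ][0-9][A-Z0-9][A-Z0-9][A-Z0-9][0-9])"
)


def is_six_char_accession(text):
    """Return True if text matches one of the two 6-character patterns."""
    return ACCESSION_RE.fullmatch(text) is not None


print(is_six_char_accession("Q9LW07"))  # the example accession above
print(is_six_char_accession("Q9LW0"))   # too short
```

Using fullmatch rather than search ensures that longer strings which merely contain a valid-looking accession are rejected.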