Here you can download the full dataset of manually curated sequences used in the classifier.

The files are in FASTA format. The header of each sequence contains four fields separated by a vertical bar (pipe) character (|), in the following order:

  • species name | NCBI taxon ID | sequence accession code | manual classification
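
As an illustration of the header layout described above, here is a minimal Python sketch that reads a FASTA file and splits each header into its four fields. The names `Record`, `parse_header`, and `read_fasta` are hypothetical helpers, not part of the dataset or the classifier, and the example record is invented. The sketch assumes that whitespace around each pipe is not significant and that no field itself contains a pipe character.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Record(NamedTuple):
    """One FASTA entry; field names are illustrative, not official."""
    species: str
    taxon_id: str
    accession: str
    classification: str
    sequence: str


def parse_header(line: str) -> list:
    """Split a header line into its four pipe-separated fields."""
    # Drop the leading '>' and trim whitespace around every field.
    fields = [field.strip() for field in line.lstrip(">").split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return fields


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield one Record per entry, joining multi-line sequences."""
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield Record(*parse_header(header), "".join(chunks))
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:  # emit the last entry in the file
        yield Record(*parse_header(header), "".join(chunks))


# Invented example record, only to show the expected shape of the input.
example = ">Homo sapiens | 9606 | XX000000 | example_class\nACGT\nACGT\n"
records = list(read_fasta(io.StringIO(example)))
print(records[0].species, records[0].taxon_id, records[0].classification)
```

To read a downloaded file, pass an open file handle instead of the `io.StringIO` object, for example `open(path)` where `path` is wherever the file was saved.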