Dataset for GOASVM (Eukaryote)

This dataset contains 608 single-label eukaryotic protein sequences that were added to Swiss-Prot between 08-Mar-2011 and 18-Apr-2012. The proteins are divided into 14 subcellular locations (note that no new proteins were located in "Centriole" or "Cyanelle" during this period). Both the accession numbers and the sequences are given. The sequence similarity within this dataset is no more than 25%. Please refer to the paper for further explanation.

Download the dataset here.
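
Once downloaded, the accession numbers and sequences can be loaded into a lookup table. The following is a minimal sketch only: it assumes the file is FASTA-style (a ">" header line whose first token is the accession number, followed by one or more sequence lines), which this page does not confirm. The function name read_fasta and the accession numbers and sequences in the example are hypothetical.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each accession number to its sequence (assumes FASTA-style input)."""
    records: Dict[str, str] = {}
    accession: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the accession number is the first token of the header.
            parts = line[1:].split()
            accession = parts[0] if parts else ""
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[accession] += line
    return records


# Made-up accession numbers and sequences, for illustration only.
example = """>ACC0001
MKTAYIAKQR
QISFVKSHFS
>ACC0002
MSDNELQ
""".splitlines()

records = read_fasta(example)
```

If the downloaded file does follow this layout, passing an open file handle to read_fasta should yield 608 entries, one per protein described above.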