RefSeq accession numbers
Posted: 2018-06-11
When working with biological sequences, it is important to know what each accession prefix code means. This is not a tutorial or anything Linux-related, just a note for my future reference.
So, for RefSeq at NCBI, here is a comprehensive list:
| Prefix | Molecule type | Comment |
|--------|---------------|---------|
| AC_ | Genomic | Complete genomic molecule, usually alternate assembly |
| NC_ | Genomic | Complete genomic molecule, usually reference assembly |
| NG_ | Genomic | Incomplete genomic region |
| NT_ | Genomic | Contig or scaffold, clone-based or WGS (a) |
| NW_ | Genomic | Contig or scaffold, primarily WGS (a) |
| NZ_ (b) | Genomic | Complete genomes and unfinished WGS data |
| NM_ | mRNA | Protein-coding transcripts (usually curated) |
| NR_ | RNA | Non-protein-coding transcripts |
| XM_ (c) | mRNA | Predicted model protein-coding transcript |
| XR_ (c) | RNA | Predicted model non-protein-coding transcript |
| AP_ | Protein | Annotated on AC_ alternate assembly |
| NP_ | Protein | Associated with an NM_ or NC_ accession |
| YP_ (c) | Protein | Annotated on genomic molecules without an instantiated transcript record |
| XP_ (c) | Protein | Predicted model, associated with an XM_ accession |
| WP_ | Protein | Non-redundant across multiple strains and species |
a: Whole Genome Shotgun sequence data.
b: An ordered collection of WGS sequence for a genome.
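
For future me, here is a minimal Python sketch of how the prefix table could be used as a lookup. The names `REFSEQ_PREFIXES` and `describe_accession` are my own and hypothetical, not part of any NCBI tool, and the regular expression assumes an accession looks like two uppercase letters, an underscore, and then the rest. The accession strings in the usage note only illustrate the format.

```python
import re

# Prefix -> (molecule type, comment), copied from the table above.
REFSEQ_PREFIXES = {
    "AC": ("Genomic", "Complete genomic molecule, usually alternate assembly"),
    "NC": ("Genomic", "Complete genomic molecule, usually reference assembly"),
    "NG": ("Genomic", "Incomplete genomic region"),
    "NT": ("Genomic", "Contig or scaffold, clone-based or WGS"),
    "NW": ("Genomic", "Contig or scaffold, primarily WGS"),
    "NZ": ("Genomic", "Complete genomes and unfinished WGS data"),
    "NM": ("mRNA", "Protein-coding transcripts (usually curated)"),
    "NR": ("RNA", "Non-protein-coding transcripts"),
    "XM": ("mRNA", "Predicted model protein-coding transcript"),
    "XR": ("RNA", "Predicted model non-protein-coding transcript"),
    "AP": ("Protein", "Annotated on AC_ alternate assembly"),
    "NP": ("Protein", "Associated with an NM_ or NC_ accession"),
    "YP": ("Protein", "Annotated on genomic molecules without an "
                      "instantiated transcript record"),
    "XP": ("Protein", "Predicted model, associated with an XM_ accession"),
    "WP": ("Protein", "Non-redundant across multiple strains and species"),
}

# Assumption: two uppercase letters, an underscore, then a non-empty remainder.
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_\S+$")


def describe_accession(accession):
    """Return (molecule type, comment) for a RefSeq-style accession.

    Returns None if the string does not have the assumed shape or the
    prefix is not in the table.
    """
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    return REFSEQ_PREFIXES.get(match.group(1))
```

Calling `describe_accession("NM_000546.6")` returns the `NM_` row as a tuple, while a string without a two-letter prefix and underscore, such as `"P04637"`, returns `None`.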