RefSeq accession numbers

When working with biological sequences, it helps to know what each accession prefix means. This is not a tutorial or anything Linux-related, just a note for my future reference.

So, for RefSeq at NCBI, here is the comprehensive list:

Prefix	Molecule type	Comment
AC_	Genomic	Complete genomic molecule, usually alternate assembly
NC_	Genomic	Complete genomic molecule, usually reference assembly
NG_	Genomic	Incomplete genomic region
NT_	Genomic	Contig or scaffold, clone-based or WGS [a]
NW_	Genomic	Contig or scaffold, primarily WGS [a]
NZ_ [b]	Genomic	Complete genomes and unfinished WGS data
NM_	mRNA	Protein-coding transcripts (usually curated)
NR_	RNA	Non-protein-coding transcripts
XM_ [c]	mRNA	Predicted model protein-coding transcript
XR_ [c]	RNA	Predicted model non-protein-coding transcript
AP_	Protein	Annotated on AC_ alternate assembly
NP_	Protein	Associated with an NM_ or NC_ accession
YP_ [c]	Protein	Annotated on genomic molecules without an instantiated transcript record
XP_ [c]	Protein	Predicted model, associated with an XM_ accession
WP_	Protein	Non-redundant across multiple strains and species

Where:
a: Whole Genome Shotgun sequence data.
b: An ordered collection of WGS sequence for a genome.
c: Computed.
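
For quick scripting, the prefixes in the table above can be put into a small lookup. This is only a minimal sketch: the function name `refseq_molecule_type` and the regular expression are my own assumptions (two uppercase letters, an underscore, an alphanumeric body, and an optional `.version` suffix), not an official NCBI parser.

```python
import re

# Prefix -> molecule type, copied from the table above.
REFSEQ_PREFIXES = {
    "AC": "Genomic",
    "NC": "Genomic",
    "NG": "Genomic",
    "NT": "Genomic",
    "NW": "Genomic",
    "NZ": "Genomic",
    "NM": "mRNA",
    "NR": "RNA",
    "XM": "mRNA",
    "XR": "RNA",
    "AP": "Protein",
    "NP": "Protein",
    "YP": "Protein",
    "XP": "Protein",
    "WP": "Protein",
}

# Assumed accession shape (hypothetical pattern, not an NCBI specification):
# two uppercase letters, "_", alphanumeric body, optional ".version".
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_[A-Z0-9]+(?:\.\d+)?$")


def refseq_molecule_type(accession):
    """Return the molecule type for a RefSeq-style accession, or None."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None  # does not look like a RefSeq accession
    # Unknown two-letter prefixes also yield None.
    return REFSEQ_PREFIXES.get(match.group(1))


if __name__ == "__main__":
    for acc in ("NM_000546.6", "NC_000001.11", "WP_000000001.1", "foo"):
        print(acc, refseq_molecule_type(acc))
```

Anything that does not match the assumed pattern, or whose prefix is not in the table, simply returns `None`.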

The table is from here. More info here.