Cleaning up with sed 2

I have a list of nucleotide sequences in FASTA format, obtained from NCBI. I want to clean up the sequence names, leaving only the accession numbers. Additionally, I want to extract the accession numbers to a separate list. I am not that familiar with sed, since I use it mainly in my SlackBuilds to fix small things. Therefore, for my future reference, I decided to sum things up.
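As a rough sketch of the two tasks, the commands below assume that each header line has the form ">ACCESSION.VERSION description", so that the accession is the first token after the ">". Older NCBI headers with "gi|...|" fields would need a different pattern. The file names and the sample records are made up for illustration.

```shell
# Hypothetical sample input; the accessions and sequences are invented.
# Assumption: headers look like ">ACCESSION.VERSION description".
cat > sequences.fasta <<'EOF'
>AB123456.1 Example organism gene A, partial cds
ATGCATGCATGC
GGCTTAAC
>XY987654.2 Example organism gene B, complete cds
TTGACCGTA
EOF

# Task 1: clean the names. On header lines, keep ">" plus the first
# whitespace-delimited token and drop the rest; sequence lines do not
# start with ">" and pass through unchanged.
sed 's/^\(>[^[:space:]]*\).*/\1/' sequences.fasta > cleaned.fasta

# Task 2: extract the accessions. -n suppresses default output and the
# trailing "p" prints only lines where the substitution matched,
# i.e. the header lines, now without the leading ">".
sed -n 's/^>\([^[:space:]]*\).*/\1/p' sequences.fasta > accessions.txt
```

Both commands use only basic regular expressions, so they should behave the same with GNU sed and other POSIX sed implementations. Writing to new files rather than editing in place keeps the original download untouched.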


FASTA syntax highlight

FASTA is among the most common text formats for nucleotide and amino acid sequences. I use medit, and a feature I have long wanted is syntax highlighting for FASTA files: a colour code for nucleotides and amino acids, similar to how the editor displays scripting or programming languages. After a long search, all I found was the fasta.lang at GNOME’s wiki and the fastalang at git. Both projects are for Gedit and support only nucleotide sequences. Neither did the job for me, although I very much appreciate both projects, so I decided to do things myself.