Cleaning up with sed 2

I have a list of nucleotide sequences in FASTA format, obtained from NCBI. I want to clean up the sequence names, leaving only the accession numbers. Additionally, I want to extract the accession numbers into a separate list. I am not that familiar with sed, since I use it mainly in my SlackBuilds to fix small things. Therefore, for my future reference, I decided to sum things up.
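As a rough sketch of the two tasks described above, the commands below assume that each header line has the form `>ACCESSION.VERSION description`, so the accession is the first token after `>`. That layout, the file names and the two sample records are hypothetical and only serve as an illustration; headers in other NCBI layouts would need a different pattern.

```shell
# Hypothetical sample input; real NCBI headers may be laid out differently.
cat > sample.fasta <<'EOF'
>AB123456.1 Homo sapiens example gene, partial cds
ATGCGTACGT
>XY987654.2 Mus musculus example mRNA
GGCTAAGCTT
EOF

# On header lines only, delete everything from the first whitespace onward,
# so that just '>' plus the accession remains. Sequence lines are untouched.
sed '/^>/ s/[[:space:]].*$//' sample.fasta > cleaned.fasta

# Print only the accession (without the leading '>') from each header line
# and collect the results in a separate list.
sed -n 's/^>\([^[:space:]]*\).*/\1/p' sample.fasta > accessions.txt
```

Both commands write to new files rather than editing in place, so the original download stays intact while the pattern is being checked.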
Read the rest of this entry »


FASTA syntax highlight

FASTA is among the most common text formats for nucleotide or amino acid sequences. I use medit, and a feature I have long been missing is syntax highlighting for FASTA files. I want colour coding for nucleotides and amino acids, similar to how the editor displays scripting or programming languages. After a long search, all I found was the fasta.lang at GNOME’s wiki and the fastalang at git. Both projects are for Gedit and support only nucleotide sequences. Neither of them did the job for me, although I very much appreciate both projects, so I decided to do things myself.
Read the rest of this entry »


Converting sequences by squizz

Sometimes I need to have a DNA or amino acid sequence (or an alignment) in several different formats. A neat little program that can convert between different sequence formats is squizz. Its primary function is to serve as a sequence file format checker, but it can also do some conversions.
Read the rest of this entry »


Geneconv: detect gene conversion

Geneconv is a program for detecting gene conversion between aligned DNA sequences. It can also search for gene conversion fragments from outside the alignment. The results are ranked by P-value and presented in a spreadsheet-like layout. The data are valuable for bioinformatics studies and papers that deal with evolution.
Read the rest of this entry »


ALINE: colouring, numbering and annotation

Aline is an extensible WYSIWYG protein sequence alignment editor for publication quality figures. It can read common sequence alignment formats which the user can then alter, embellish, markup etc. to produce the kind of sequence figure commonly found in biochemical articles.
Read the rest of this entry »