Cleaning up with sed 2
Posted: 2017-07-08 Filed under: academic | Tags: clean, fasta, grep, sed, sequence
I have a list of nucleotide sequences in FASTA format, obtained from NCBI. I want to clean up the sequence names, leaving only the accession numbers. Additionally, I want to extract the accession numbers to a separate list. I am not that familiar with sed, since I use it mainly in my SlackBuilds to fix small things. Therefore, for my future reference, I decided to sum things up.
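A minimal sketch of those two steps, not necessarily the approach the full entry takes. It assumes each header looks like `>ACCESSION.VERSION description`, so the accession is everything between `>` and the first whitespace. The file names and the two sample records are invented placeholders for illustration.

```shell
# Hypothetical sample input; accessions and descriptions are made up.
workdir=$(mktemp -d)
cat > "$workdir/sequences.fasta" <<'EOF'
>KY123456.1 Homo sapiens example gene, partial cds
ATGCATGC
>AB000001.2 Mus musculus example mRNA
GGCCTTAA
EOF

# On header lines only (those starting with ">"), delete everything
# from the first whitespace character to the end of the line.
sed '/^>/ s/[[:space:]].*$//' "$workdir/sequences.fasta" > "$workdir/clean.fasta"

# Pull out the header lines and strip the leading ">" to get a plain
# list of accession numbers, one per line.
grep '^>' "$workdir/clean.fasta" | sed 's/^>//' > "$workdir/accessions.txt"
```

Restricting the substitution with the `/^>/` address leaves the sequence lines untouched. Headers in a different layout (for example the older pipe-delimited style) would need a different pattern.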
FASTA syntax highlight
Posted: 2017-06-25 Filed under: academic | Tags: amino acid, color code, DNA, fasta, gedit, highlight, medit, nucleotide, protein, sequence, syntax
FASTA is among the most common text formats for nucleotide or amino acid sequences. I use medit, and a feature I have been missing is syntax highlighting for FASTA files. I want colour coding for nucleotides and amino acids, similar to how the editor displays scripting or programming languages. I have been looking for this feature for a long time, and all I found was the fasta.lang at GNOME’s wiki and the fastalang at git. Both projects are for Gedit and support only nucleotide sequences. Neither of them quite did the job for me, though I very much appreciate both projects. I decided to do things myself.
Converting sequences by squizz
Posted: 2016-07-06 Filed under: academic | Tags: amino acid, convert, DNA, msa, protein, sequence, squizz
Sometimes I need to have a DNA or amino acid sequence (or an alignment) in several different formats. A neat little program that can convert between sequence formats is squizz. Its primary function is to check sequence file formats, but it can also do some conversions.
Geneconv: detect gene conversion
Posted: 2016-07-04 Filed under: academic | Tags: conversion, DNA, gene, GENECONV, msa, sequence
Geneconv is a program for detecting gene conversion between aligned DNA sequences. It can also search for gene conversion fragments that originate from outside the alignment. The results are ranked by P-value and presented in a spreadsheet-like layout. The data are valuable for bioinformatics studies and papers that deal with evolution.