Geneconv: detect gene conversion
Posted: 2016-07-04
Geneconv is a program for detecting gene conversion between aligned DNA sequences. It can also search for gene conversion fragments that originate from outside the alignment. The results are ranked by P-value and written in a spreadsheet-like layout. The data are valuable for bioinformatics studies and papers that deal with evolution.
Geneconv is available at SBo and at the moment its version is 1.81a. The SlackBuild bundles into the package some example materials and extensive documentation, placed respectively in
/usr/doc/geneconv-1.81a. This post is by no means intended to be a hand-holding tutorial; it does not even cover the basics that well! It is merely meant as a quick reference for my future use.
Running the program takes the following format:
geneconv input-alignment.fasta output.frags /fl -nolog
In the example above:
input-alignment.fasta is the file containing the multiple sequence alignment (MSA). It should be in a supported format: NEXUS, Pearson/FASTA, NBRF/PIR, CLUSTAL, ASF or PHYLIP interleaved.
output.frags is the output file. If you do not specify a name, the output file will be named as the input-alignment file, but with the file extension
/fl (or -flags in case the long syntax is used) lists any specified program settings and options for writing the output file. The list of available flags is long, so see the section below for a few examples.
-nolog is needed if you get a “Segmentation fault” when the program writes to its log file (e.g. input-alignment.sum).
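For repeated runs, the invocation described above can be wrapped in a small shell function. This is only a sketch: run_geneconv and the GENECONV variable are names made up for this post and are not part of the program. The function does nothing more than assemble the same command line as above, passing any extra arguments through as flags and always appending -nolog.

```shell
# Hypothetical helper, not part of geneconv: assembles the command line shown above.
# Assumption: GENECONV optionally holds the path to the binary; defaults to "geneconv" in PATH.
run_geneconv() {
    if [ "$#" -lt 2 ]; then
        echo "usage: run_geneconv ALIGNMENT OUTPUT [FLAGS...]" >&2
        return 2
    fi
    aln=$1
    out=$2
    shift 2
    if [ ! -r "$aln" ]; then
        echo "cannot read alignment: $aln" >&2
        return 1
    fi
    # Remaining arguments are passed through untouched as geneconv flags;
    # -nolog is always appended to sidestep the log-file crash mentioned above.
    "${GENECONV:-geneconv}" "$aln" "$out" "$@" -nolog
}
```

It would then be called as, for example:
run_geneconv input-alignment.fasta output.frags /w123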
Some possible flags to use are:
/w123 will start the random number generator of geneconv at that value. This guarantees that the output is the same regardless of the computer or the time at which the program is run.
/lp will tell geneconv to list pairwise significant fragments in addition to global lists.
/sp will write polymorphisms and their offsets to the output file.
/sb will tell the program to write the formulas for the BLAST-like weighted global scores to the output file.
/g1 allows mismatches within fragments with a mismatch penalty of
/r tells geneconv that the alignment used is from a DNA coding region. The program will therefore use codon polymorphisms instead of site polymorphisms. The alignment should actually be a codon alignment, generated for example by PAL2NAL.
/f makes a fancier output.
As an example, let’s take a look at the E. coli 6-phosphogluconate dehydrogenase coding region DNA. The MSA is found here: /usr/geneconv/examples/gnd7.asf. Therefore, to run the program with the options from above:
geneconv gnd7.asf gnd7-output.frags /w123 /lp /sp /sb /g1 /r /f -nolog
Check the contents of gnd7-output.frags locally in a text editor. The program finds 3 global inner (GI) fragments and 5 pairwise inner (PI) fragments. You can compare the output with the examples /usr/share/geneconv/gnd7.frags and especially
The documentation that comes with the program is highly recommended reading.
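The GI and PI fragment counts reported for gnd7-output.frags can also be checked from the shell instead of a text editor. A rough sketch, assuming that each fragment line in the .frags file starts with its type label (GI or PI) followed by whitespace; count_frags is a name made up for this post, so verify the assumption against your own output file before trusting the numbers.

```shell
# Hypothetical helper: count the lines of a .frags file that start with a given label.
# Assumption: fragment lines begin with the label (e.g. GI or PI) followed by whitespace.
count_frags() {
    label=$1
    file=$2
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, hence the "|| true".
    grep -c "^${label}[[:space:]]" "$file" || true
}
```

Used like this:
count_frags GI gnd7-output.frags
count_frags PI gnd7-output.frags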