Cleaning up with sed
Posted: 2017-03-31 Filed under: academic | Tags: clean, phylogenetic tree, phylogeny, SeaView, sed, TimeTree, txt
I needed to clean some plain-text files of data and characters I did not want.
To be more precise, I have a phylogenetic tree in Newick format, obtained from the TimeTree website. The species in it are mainly birds, plus several reptiles and a fish. The tree includes branch length information and node labels, which I wanted to remove.
This is the tree text file:
(Latimeria_chalumnae:412.98864059,(Anolis_carolinensis:279.65697667,(((Meleagris_gallopavo:79.95541635,Anas_platyrhynchos:79.95541635)'14':18.08745294,(Cuculus_canorus:85.20000000,((Melopsittacus_undulatus:82.45491770,((Ficedula_albicollis:43.70000000,(Geospiza_fortis:38.00000000,Taeniopygia_guttata:38.00000000)'13':5.70000000)'11':0.30000000,Corvus_brachyrhynchos:44.00000000)'10':38.45491770)'19':0.65432695,(Opisthocomus_hoazin:82.10000000,Falco_peregrinus:82.10000000)'9':1.00924465)'22':2.09075535)'8':12.84286929)'6':155.69133889,(Chrysemys_picta:100.65271923,Chelonia_mydas:100.65271923)'30':153.08148895)'29':25.92276848)'27':133.33166392);
And it looks like this when presented graphically (thanks SeaView!):
To remove the branch lengths and node labels, I used sed to delete certain characters (', : and .) as well as all digits.
To remove the ', : and . characters:
sed -i "s|[':.]||g" species.tree
To remove the numbers:
sed -i "s|[0-9]||g" species.tree
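To see what each step does without touching the file, the same two expressions can be tried on a short excerpt of the tree, dropping -i so that sed only prints the result. This is a minimal sketch; the excerpt is the Geospiza/Taeniopygia clade copied from the tree above, and the variable name is just for illustration.

```shell
# Excerpt of the tree above: two species, the node label '13' and its branch length
excerpt="(Geospiza_fortis:38.00000000,Taeniopygia_guttata:38.00000000)'13':5.70000000"

# Step 1 only: quotes, colons and dots are gone, but the digits remain glued to the names
echo "$excerpt" | sed "s|[':.]||g"
# prints (Geospiza_fortis3800000000,Taeniopygia_guttata3800000000)13570000000

# Step 1 followed by step 2: the leftover digits are removed too
echo "$excerpt" | sed "s|[':.]||g" | sed "s|[0-9]||g"
# prints (Geospiza_fortis,Taeniopygia_guttata)
```

Both steps are needed, but their order does not matter. The approach relies on none of the species names containing digits, dots, colons or apostrophes, which holds for this tree.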
I did it in two steps, but they could surely be combined into one. Now the tree looks like this:
(Latimeria_chalumnae,(Anolis_carolinensis,(((Meleagris_gallopavo,Anas_platyrhynchos),(Cuculus_canorus,((Melopsittacus_undulatus,((Ficedula_albicollis,(Geospiza_fortis,Taeniopygia_guttata)),Corvus_brachyrhynchos)),(Opisthocomus_hoazin,Falco_peregrinus)))),(Chrysemys_picta,Chelonia_mydas))));
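Since the two substitutions differ only in the characters they match, they can be merged into a single character class. The sketch below runs on a short excerpt (the Opisthocomus/Falco clade) rather than on species.tree, and also shows a more targeted alternative of my own, not the method used above, which deletes only quoted node labels and ":<number>" branch lengths. That alternative assumes the labels are quoted integers and the lengths are plain decimals, as in this TimeTree file.

```shell
# Short excerpt of the tree above, used instead of the full species.tree file
tree="(Opisthocomus_hoazin:82.10000000,Falco_peregrinus:82.10000000)'9':1.00924465"

# Both steps as one character class: apostrophe, colon, dot and every digit
echo "$tree" | sed "s|[':.0-9]||g"
# prints (Opisthocomus_hoazin,Falco_peregrinus)

# Alternative (assumption: labels look like '9', lengths like :1.00924465):
# delete only those patterns, so digits inside a species name would survive.
# -E switches sed to extended regular expressions, needed for '+'.
echo "$tree" | sed -E "s/'[0-9]+'//g; s/:[0-9.]+//g"
# prints (Opisthocomus_hoazin,Falco_peregrinus)
```

To edit the file in place as in the original commands, pass species.tree with -i instead of piping from echo. Note that -i with no argument is GNU sed behaviour; BSD/macOS sed expects -i ''.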
That’s it!