Cleaning up with sed

I needed to strip some unwanted data and characters from a plain text file.

To be more precise, I have a phylogenetic tree obtained from the TimeTree website. The species in it are mainly birds, plus several reptiles and a fish. The tree carries branch lengths and node labels, which I wanted to remove.

This is the tree text file:

(Latimeria_chalumnae:412.98864059,(Anolis_carolinensis:279.65697667,(((Meleagris_gallopavo:79.95541635,Anas_platyrhynchos:79.95541635)'14':18.08745294,(Cuculus_canorus:85.20000000,((Melopsittacus_undulatus:82.45491770,((Ficedula_albicollis:43.70000000,(Geospiza_fortis:38.00000000,Taeniopygia_guttata:38.00000000)'13':5.70000000)'11':0.30000000,Corvus_brachyrhynchos:44.00000000)'10':38.45491770)'19':0.65432695,(Opisthocomus_hoazin:82.10000000,Falco_peregrinus:82.10000000)'9':1.00924465)'22':2.09075535)'8':12.84286929)'6':155.69133889,(Chrysemys_picta:100.65271923,Chelonia_mydas:100.65271923)'30':153.08148895)'29':25.92276848)'27':133.33166392);

And it looks like this when presented graphically (thanks SeaView!): [image of the rendered tree not included]

To remove the branch lengths and node labels, I used sed to delete certain characters: ', : and ., as well as all digits. This works here because none of the species names contain any of those characters.
To remove ', : and .:

sed -i "s|[':.]||g" species.tree

To remove the numbers:

sed -i "s|[0-9]||g" species.tree

I did it in two steps, but they could surely be combined into one. Now the tree looks like this:

(Latimeria_chalumnae,(Anolis_carolinensis,(((Meleagris_gallopavo,Anas_platyrhynchos),(Cuculus_canorus,((Melopsittacus_undulatus,((Ficedula_albicollis,(Geospiza_fortis,Taeniopygia_guttata)),Corvus_brachyrhynchos)),(Opisthocomus_hoazin,Falco_peregrinus)))),(Chrysemys_picta,Chelonia_mydas))));
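For the record, here is a rough sketch of how the two steps might be combined. It runs on a tiny made-up tree (not the real data) and reads from a pipe rather than editing a file in place. The first variant simply merges the two character classes used above. The second is a slightly more careful alternative of my own, not what I actually ran: it deletes whole quoted node labels and whole ":number" branch lengths, so digits or dots inside a species name would survive. It assumes branch lengths are plain decimals, as in the TimeTree output above, and would not handle scientific notation.

```shell
# Hypothetical sample in the same format as species.tree (not the real data).
tree="(A_b:1.5,(C_d:2.25,E_f:2.25)'3':0.75);"

# Variant 1: the two character classes from above merged into one.
merged=$(printf '%s\n' "$tree" | sed "s|[':.0-9]||g")

# Variant 2: delete whole tokens instead of single characters:
# first the quoted node labels, then each ":<number>" branch length.
tokens=$(printf '%s\n' "$tree" | sed -e "s|'[^']*'||g" -e 's|:[0-9.]*||g')

printf '%s\n%s\n' "$merged" "$tokens"
```

Both variants print (A_b,(C_d,E_f)); for this sample. To apply either one to the real file, the same expressions can be passed to sed -i with species.tree as the last argument, as in the commands above. Note that the bare -i form is GNU sed syntax; the BSD sed shipped with macOS expects a backup suffix argument after -i (an empty '' works).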

That’s it!
