Cleaning up with sed

I needed to clean some plain text files of data and characters I did not want.

To be more precise, I have a phylogenetic tree obtained from the TimeTree website. The species in it are mainly birds, plus several reptiles and a fish. The tree includes branch-length information and node labels, which I wanted to remove.

This is the tree text file:


And it looks like this when presented graphically (thanks SeaView!):

To remove the branch lengths and node labels, I used sed to delete certain characters (', : and .) as well as all digits.
To remove ', : and .:

sed -i "s|[':.]||g" species.tree

To remove the numbers:

sed -i "s|[0-9]||g" species.tree
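Because -i rewrites the file in place, it can be worth previewing the result first. Here is a minimal sketch on a made-up Newick line. The taxon names, branch lengths and the quoted node label '14' are hypothetical and not taken from the TimeTree file. Without -i, sed simply prints to standard output, so the two substitutions can be chained with a pipe and nothing is modified.

```shell
# Hypothetical Newick line: taxa with :branch_length values and a quoted internal node label.
tree="((Gallus_gallus:96.5,Taeniopygia_guttata:96.5)'14':215.0,Danio_rerio:311.5);"

# Same two substitutions as above, printing to stdout instead of editing a file (no -i).
echo "$tree" | sed "s|[':.]||g" | sed "s|[0-9]||g"
# prints: ((Gallus_gallus,Taeniopygia_guttata),Danio_rerio);
```

The character classes are blunt: any digit or period inside a taxon name would be deleted too. That is harmless for names like these, but worth checking on other trees.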

I did it in two steps, but they could certainly be combined into one. Now the tree looks like this:


That’s it!
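As mentioned, the two steps could be combined. One way is to put all the unwanted characters into a single bracket expression, [':.0-9]. The sketch below writes a hypothetical Newick line to a throwaway file called sample.tree (a made-up name, so a real species.tree is not overwritten). It uses -i.bak, which edits in place while keeping the original as sample.tree.bak. The attached-suffix form is accepted by both GNU sed and the BSD sed on macOS, whereas a bare -i with no suffix only works with GNU sed.

```shell
# Write a hypothetical Newick line to a throwaway file (not the real TimeTree tree).
printf '%s\n' "((Gallus_gallus:96.5,Taeniopygia_guttata:96.5)'14':215.0,Danio_rerio:311.5);" > sample.tree

# One pass: the bracket expression covers ', :, . and the digits 0-9.
# -i.bak edits in place and keeps the untouched original as sample.tree.bak.
sed -i.bak "s|[':.0-9]||g" sample.tree

cat sample.tree
# prints: ((Gallus_gallus,Taeniopygia_guttata),Danio_rerio);
```

For the real file, replace sample.tree with species.tree. Alternatively, the two original expressions can be kept separate within one call by passing each with -e: sed -i -e "s|[':.]||g" -e "s|[0-9]||g" species.tree (GNU sed).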

