Automatic transliteration

Recently, there was an interesting discussion in #slackbuilds @libera.chat with lockywolf about Cyrillic to Latin transliteration. I never name files or folders in Cyrillic (a habit left over from the Windows 9* times), but others do, so it’s not a bad idea to be able to do this automatically.

So, he posted the following line as a test:

echo "Привет мир" | uconv -x "Any-Latin; Latin-ASCII;"

This works fine and returns “Privet mir”, which means “Hello world” in Russian. I am Bulgarian, so naturally I tried some words in my own language and noticed that certain characters were not converted to anything meaningful. Let’s see how the Bulgarian alphabet gets transliterated, using the same command as above:

echo "А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ ь Ю Я" | uconv -x "Any-Latin; Latin-ASCII;"

Yields the following substitutions:

А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ ь Ю Я
A B V G D E Z Z I J K L M N O P R S T U F H C C S S "̱ ' U A

So, if something has ъ or ь in the name, that’s going to be a problem: both come out as quote characters, which are awkward in file names. Recently, I used a program called detox, which is meant to deal with exactly this kind of thing. It is available from SBo.

Let’s say I have a folder called “Photos”, containing the sub-folders “Дъблин” and “Кьолн”. To batch convert them, navigate there and do:

for i in * ; do mv -- "$i" "$(echo "$i" | uconv -x "Any-Latin; Latin-ASCII;")" ; done

To further fix the names with detox, go one level up and do this on the parent folder:

detox Photos/

This will result in “D_blin” and “K_oln”, which should correspond to “Dublin” and “Köln”. Well, better than nothing…
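If the underscores lose too much, one possible workaround is to replace the two signs before uconv ever sees them. The sketch below uses ъ → a and ь → y, which is my assumption about a sensible mapping (it matches what I understand the official Bulgarian romanization to be), not something uconv or detox does by itself. The function name bg_signs is invented for this example.

```shell
# Hypothetical filter: swap the hard and soft signs for Latin letters
# before transliteration. The a/y mapping is an assumption; edit to taste.
bg_signs() {
    sed -e 's/Ъ/A/g' -e 's/ъ/a/g' -e 's/Ь/Y/g' -e 's/ь/y/g'
}

# Usage: put it in front of uconv in the pipeline.
# echo "Дъблин Кьолн" | bg_signs | uconv -x "Any-Latin; Latin-ASCII;"
```

Since the signs are already Latin letters by the time uconv runs, no quote characters should be left for detox to turn into underscores.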


