Transliteration in PHP

This one time, I read this article about how Unicode is for foreigners, and everybody should learn English. It’s not quite as offensive as it sounds, but it’s pretty rude. It was awesome.

Anyway, at the end, he called for transliteration implementations in languages besides ones that suck (my words, not his); here of course I’m referring to Perl, Python and Java. Luckily, PHP is so blindingly awesome that it only requires a function call to do such things. No compiling extensions or importing libraries.

$string = 'Möbius FTW because my name is Rölph Diäålysis';
echo $string . "\n" . iconv('UTF-8', 'ASCII//TRANSLIT', $string);

//The console says this:
/*
Möbius FTW because my name is Rölph Diäålysis
M"obius FTW because my name is R"olph Di"aalysis
*/

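Obviously, it’s not totally perfect (it tries to approximate the look of the characters as closely as possible, which is why quotation marks replace the umlauts). The exact output also depends on which iconv implementation your PHP was built against and on the current locale, so you might get plain letters or question marks instead. The iconv extension is compiled into PHP by default unless you’re stupid and decide to disable it for no reason. Don’t do that. As far as I know, it can still be turned off at build time with --without-iconv, even in PHP 5.3.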
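Since iconv() returns false when it can’t convert the input, and the extension might be missing, it’s probably worth wrapping the call. Here’s a minimal sketch; to_ascii() is a made-up name for this post, not a PHP built-in, and what you get back for accented characters still depends on your iconv implementation.

```php
<?php
// Hypothetical wrapper (to_ascii is an illustrative name, not part of PHP).
// Returns the transliterated string, or false if iconv is unavailable
// or the conversion fails.
function to_ascii($string)
{
    // Bail out early if the iconv extension was compiled out.
    if (!function_exists('iconv')) {
        return false;
    }

    // iconv() raises a notice and returns false on input it cannot convert;
    // @ silences the notice so the caller only has to check for false.
    return @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
}

$ascii = to_ascii('Möbius FTW');
echo ($ascii === false) ? "conversion failed\n" : $ascii . "\n";
```

Check the result with === false rather than a loose comparison, because an empty string is a perfectly valid conversion result.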

You can also use the strtr function, but then you have to pass the translation characters yourself.

$string = 'Möbius FTW because my name is Rölph Diäålysis';
echo $string . "\n" . strtr($string, array('ö' => 'o', 'ä' => 'a', 'å' => 'a'));

//The console says:
/*
Möbius FTW because my name is Rölph Diäålysis
Mobius FTW because my name is Rolph Diaalysis
*/

You could fairly easily and painlessly write a reusable library function in PHP using strtr() that will convert from Unicode to ASCII, in the spirit of this. Obviously you wouldn’t want to write a conversion array for 65000 Unicode characters, but you could at least do the characters you use most often.
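A rough sketch of what such a function might look like is below. The name ascii_fold() and the handful of characters in the map are just my picks for illustration; you’d fill in whatever you actually run into. It assumes the source file and the input string are both UTF-8.

```php
<?php
// Hypothetical helper (ascii_fold is an illustrative name, not part of PHP).
// Replaces a small, hand-picked set of accented characters with ASCII
// stand-ins. Anything not listed in the map passes through unchanged.
function ascii_fold($string)
{
    // Built once and reused on later calls.
    static $map = array(
        'ä' => 'a', 'å' => 'a', 'ö' => 'o', 'ü' => 'u',
        'Ä' => 'A', 'Å' => 'A', 'Ö' => 'O', 'Ü' => 'U',
        'é' => 'e', 'è' => 'e', 'ñ' => 'n', 'ß' => 'ss',
    );

    // With an array argument, strtr() replaces whole keys, so multi-byte
    // UTF-8 characters are matched as units.
    return strtr($string, $map);
}

echo ascii_fold('Möbius FTW because my name is Rölph Diäålysis'), "\n";
// Prints: Mobius FTW because my name is Rolph Diaalysis
```

Unlike the iconv approach, this gives the same output on every machine, at the cost of silently leaving unlisted characters alone.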

Another option is the translit PECL extension. This would require you to compile an extension, though.

August 22, 2009   Posted in: php, unicode
