Asciify

Today I had to map free text to plausible filenames, with the caveat that the text could contain accented characters encoded as UTF-8. Even though filenames can contain these characters, I wanted to end up with ASCII-only filenames for easier handling. The filenames will also be exposed in URLs, and keeping those ASCII-only takes away a lot of headaches. But how do I convert the text?
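For comparison, Ruby 2.2 and later (which arrived long after this post) can do a rough version of the accent stripping with the standard library alone. The sketch below is not the approach this post ends up using, and `strip_accents` and `ascii_filename` are hypothetical helper names. It decomposes characters with Unicode NFD normalization (`String#unicode_normalize`), drops the combining marks, and then replaces anything that is still not a safe filename character with an underscore.

```ruby
# Hypothetical helpers, not part of Asciify. Requires Ruby 2.2+ for
# String#unicode_normalize.

# NFD splits e.g. "é" into "e" followed by a combining acute accent
# (U+0301); removing the combining marks (category Mn) leaves the base letter.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

# Anything that is not a letter, digit, dot, underscore or hyphen after
# accent stripping (spaces, "ß", "€", ...) becomes the replacement character.
def ascii_filename(text, replacement = '_')
  strip_accents(text).gsub(/[^A-Za-z0-9._-]/, replacement)
end

puts strip_accents('Crème brûlée à Zürich') # prints "Creme brulee a Zurich"
puts ascii_filename('Crème brûlée.txt')      # prints "Creme_brulee.txt"
puts ascii_filename('Straße')                # prints "Stra_e"
```

The last line shows the limit of this approach: letters such as "ß", "ø" or "æ" have no decomposition, so they cannot be recovered this way and need an explicit transliteration table like the ones Text::Unidecode and Asciify provide.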

I quickly found the apparently wonderful Text::Unidecode for Perl, which seemed to do everything I wanted, but since we build our web services with Ruby on Rails I needed a Ruby solution. I hoped that someone had already created a Ruby version of Text::Unidecode, but that's not the case (or I could not find it). I did find the Asciify gem, though. Although simpler in design and scope than Text::Unidecode, it does enough for my purposes, and custom mappings can be created for it.

Asciify's documentation is pretty much non-existent, but reading the source code showed me how to convert my text:

  Asciify.new(Asciify::Mapping.new(:default, '_')).convert('some text')

The default replacement character for Asciify is a question mark, which makes sense in general but not in URLs, where a "?" marks the start of the query string. I opted for the underscore character instead, for lack of a better candidate. Since I've included the gem as a plugin in the Rails project, I simply extended its default mapping with a few extra characters rather than defining my own mapping.
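To illustrate the idea of a mapping table with a fallback replacement character, here is a small plain-Ruby sketch. It is not Asciify's internal implementation or API: `MAPPING` and `map_to_ascii` are hypothetical names, and the table holds only a few example entries chosen for illustration.

```ruby
# Hypothetical mapping table; NOT taken from Asciify. Extend as needed.
MAPPING = {
  'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
  'é' => 'e', 'è' => 'e', 'à' => 'a', 'ç' => 'c',
  'ø' => 'o', 'æ' => 'ae'
}.freeze

# Converts text character by character: ASCII passes through unchanged,
# mapped characters are transliterated, and everything else becomes the
# replacement character (an underscore here instead of a question mark).
def map_to_ascii(text, mapping = MAPPING, replacement = '_')
  # Normalize to NFC first so "é" is one character, not "e" + accent.
  text.unicode_normalize(:nfc).each_char.map do |char|
    char.ascii_only? ? char : mapping.fetch(char, replacement)
  end.join
end

puts map_to_ascii('Größe €5')         # prints "Groesse _5"
puts map_to_ascii('€', MAPPING, '?')  # prints "?"
```

Adding an entry to the table is the equivalent of extending the default mapping: any character that is not listed falls through to the replacement character.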

Posted by Hans de Graaff Tue, 27 Feb 2007 15:59:24 GMT