Changing File Encoding Using Ruby 1.9.2
written by Paul on January 3rd, 2011 @ 06:22 AM
Currently, I am in the process of upgrading an application from Ruby 1.8.7 to Ruby 1.9.2. One of the big differences between 1.8 and 1.9 is the multi-byte character support.
We have thousands of static HTML files that were generated in Ruby 1.8, and when Ruby 1.9 reads them it fails. As usual, before I start to dig into solving the problem I do a quick search to see what other people have been doing to solve it. My search yielded a bunch of multi-line scripts and techniques… most of which were from the Ruby 1.8 days.
In short, I wrote a simple four-line script in irb and it completed my task quickly. One thing that I am really happy about is how Ruby 1.9.2 strings have a method called encode that provides great utility when performing these kinds of tasks.
So here is the code:
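`find . -name '*.html'`.split("\n").each do |filename|
  puts filename
  # Read the file before reopening it: mode "w+" truncates on open,
  # so reading from a "w+" handle would return an empty string.
  contents = File.read(filename)
  File.open(filename, "w") { |handle| handle.write(contents.encode('UTF-8')) }
end; nil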
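For anyone who would rather not shell out to find, here is a rough pure-Ruby sketch of the same idea. The helper name convert_html_to_utf8 is made up for illustration, and ISO-8859-1 as the source encoding is only an assumption (the old files may well use something else). It relies on the "r:EXT:INT" open mode, which reads bytes as EXT and transcodes them to INT.

```ruby
require 'tmpdir'

# Hypothetical helper: rewrite every *.html file under `root` as UTF-8.
# `source_encoding` is an assumption; substitute the encoding the files really use.
def convert_html_to_utf8(root, source_encoding = 'ISO-8859-1')
  Dir.glob(File.join(root, '**', '*.html')).each do |filename|
    # "r:EXT:INT" reads the bytes as EXT and transcodes them to INT while reading.
    contents = File.open(filename, "r:#{source_encoding}:UTF-8") { |f| f.read }
    # Write only after the read has finished, so nothing is lost to truncation.
    File.open(filename, 'w:UTF-8') { |f| f.write(contents) }
  end
end

# Small demonstration in a throwaway directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'page.html')
  # The bytes for "café" in Latin-1 (0xE9 is "é").
  File.open(path, 'wb') { |f| f.write([0x63, 0x61, 0x66, 0xE9].pack('C*')) }
  convert_html_to_utf8(dir)
  puts File.open(path, 'rb') { |f| f.read }.bytes.to_a.inspect
end
```

Try it on a copy of the files first, since each file is overwritten in place.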
If you are interested in the options that the encode method accepts, go check them out.
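As a quick taste of those options, here is a minimal sketch. The to_utf8 helper is hypothetical, and ISO-8859-1 is again just an assumed source encoding. The :invalid, :undef and :replace options tell encode what to do with bytes it cannot convert instead of raising an error.

```ruby
# Hypothetical helper: treat raw bytes as `source_encoding`, then transcode to UTF-8.
# ISO-8859-1 is an assumed default, not something Ruby detects for you.
def to_utf8(raw, source_encoding = 'ISO-8859-1')
  raw.dup.force_encoding(source_encoding).encode(
    'UTF-8',
    :invalid => :replace, # replace byte sequences that are invalid in the source encoding
    :undef   => :replace, # replace characters that have no equivalent in UTF-8
    :replace => '?'       # the replacement string to use in both cases
  )
end

# The bytes for "café" in Latin-1, built as a binary string.
latin1 = [0x63, 0x61, 0x66, 0xE9].pack('C*')
converted = to_utf8(latin1)

puts converted.encoding
puts converted.valid_encoding?
```

Without the :invalid and :undef options, encode raises an exception on the first byte it cannot convert, which can be useful when you would rather find bad files than silently patch them.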