Changing File Encoding Using Ruby 1.9.2
written by Paul on January 3rd, 2011 @ 06:22 AM
Currently, I am in the process of upgrading an application from Ruby 1.8.7 to Ruby 1.9.2. One of the big differences between 1.8 and 1.9 is the multi-byte character support.
The Problem
We have thousands of static HTML files that were generated in Ruby 1.8, and when Ruby 1.9 reads them it fails. As usual, before I start to dig into solving the problem, I do a quick search to see what other people have been doing to solve it. My search yielded a bunch of multi-line scripts and techniques… most of which were from the Ruby 1.8 days.
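One way to see which files are affected is to check whether their bytes form valid UTF-8. This is a minimal sketch, assuming the files are expected to be UTF-8; `invalid_utf8_files` is a hypothetical helper name, not part of the original script.

```ruby
# Hypothetical helper: list files matching a glob whose bytes are not valid UTF-8.
def invalid_utf8_files(pattern)
  Dir[pattern].select do |filename|
    # Read the raw bytes, then label them as UTF-8 without transcoding.
    content = File.open(filename, 'rb') { |f| f.read }
    !content.force_encoding('UTF-8').valid_encoding?
  end
end
```

Calling `invalid_utf8_files('**/*.html')` from the site root would return the paths that still need converting.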
The Solution
In short, I wrote a short script in irb and it completed my task quickly. One thing that I am really happy about is how Ruby 1.9.2 strings have a method called encode that provides great utility when performing these kinds of tasks.
So here is the code:
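`find . -name '*.html'`.split("\n").each do |filename|
  puts filename
  # Read first: opening with "w+" truncates the file before it can be read.
  content = File.read(filename)
  # Transcodes from the encoding Ruby assumed when reading (Encoding.default_external).
  File.open(filename, "w") { |handle| handle.write(content.encode('UTF-8')) }
end; nil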
If you are interested in the options with the encode method, go check them out.
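As a hedged illustration of those options: `String#encode` accepts an explicit source encoding as its second argument, plus `:invalid`, `:undef`, and `:replace` options for bytes that cannot be converted. The sketch below assumes the old files were written as ISO-8859-1, which is only a guess for illustration, and `to_utf8` is a hypothetical helper name.

```ruby
# Hypothetical helper: transcode raw bytes to UTF-8 from an assumed source encoding.
def to_utf8(bytes, source_encoding = 'ISO-8859-1')
  bytes.encode('UTF-8', source_encoding,
               :invalid => :replace,  # replace byte sequences invalid in the source encoding
               :undef   => :replace,  # replace characters with no UTF-8 equivalent
               :replace => '?')       # the replacement string to use
end

# "caf\xE9" is "café" in ISO-8859-1; dup keeps the literal itself unmodified.
latin1 = "caf\xE9".dup.force_encoding('BINARY')
utf8 = to_utf8(latin1)
puts utf8.encoding
puts utf8.valid_encoding?
```

Passing the source encoding explicitly avoids depending on whatever `Encoding.default_external` happens to be on the machine running the conversion.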




Dir['**/*.html'].each do |filename|
  puts filename
  # Read before opening for writing; 'w+' would truncate the file first.
  content = File.read(filename)
  File.open(filename, 'w') do |handle|
    handle.write(content.encode('UTF-8'))
  end
end; nil
Formatting lost in the last comment. Textile doesn’t like me.
Have a look at the posted code in this gist: https://gist.github.com/764563
You are right, Jan, the Dir['**/*.html'] is a better approach, thanks for sharing.