MySQL on the move from Latin1 to UTF8

A few days a go I had to move a WordPress blog from one server to another and it turned out to be a bigger project than I had originally thought due to the character set being set to Latin1 on the old server and about 180+ posts that were copied in from Microsoft Word containing strange opening and closing quotes and hyphens. When I did a dump of the database and then reimported the data in to a utf8 database man strange characters showed up in the post. I did what I usually do in situations and started to Google for an explanation. I found this article and it referenced this article and here is what I ended up doing to get the issue solved.

I opened up the raw sql dump file in less and saw the strange characters in the test and they looked something like this:
Don<C3><A2><E2><82><AC><E2><84><A2>t

I looked at the context of the skewed characters and saw immediately that it was an apostrophe that was made “special” by Word and then copied into WordPress. I removed the “< ” and “>” and got C3A2E282ACE284A2 which I then put in the queries that were posted on the articles that I read (links above.)

I repeated the above steps until all of the strange characters were fixed, If you are reading this because you are trying to do the same fix you may find the below helpful.



-- C3A2E282ACE284A2 = ' (apostrophe)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX\('C3A2E282ACE284A2'), "’") WHERE post_content REGEXP UNHEX('C3A2E282ACE284A2');

-- C3A2E282ACC29D = " (close quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACC29D'), "\"") WHERE post_content REGEXP UNHEX('C3A2E282ACC29D');

-- E28099 = ' (another form of a singe quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('E28099'), "'") WHERE post_content REGEXP UNHEX('E28099');

-- C382C2B4 = ' (yet another quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C382C2B4'), "'") WHERE post_content REGEXP UNHEX('C382C2B4');

-- C3A2E282ACC593 = " (open quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACC593'), "\"") WHERE post_content REGEXP UNHEX('C3A2E282ACC593');

-- C3A2E282ACE2809C = - (dash/hyphen)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACE2809C'), "-") WHERE post_content REGEXP UNHEX('C3A2E282ACE2809C');

I hope posting this helps someone save a few hours of hunting around. 🙂

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: