Archive | PHP / LAMP RSS for this section

MySQL on the move from Latin1 to UTF8

A few days a go I had to move a WordPress blog from one server to another and it turned out to be a bigger project than I had originally thought due to the character set being set to Latin1 on the old server and about 180+ posts that were copied in from Microsoft Word containing strange opening and closing quotes and hyphens. When I did a dump of the database and then reimported the data in to a utf8 database man strange characters showed up in the post. I did what I usually do in situations and started to Google for an explanation. I found this article and it referenced this article and here is what I ended up doing to get the issue solved.

I opened up the raw sql dump file in less and saw the strange characters in the test and they looked something like this:

I looked at the context of the skewed characters and saw immediately that it was an apostrophe that was made “special” by Word and then copied into WordPress. I removed the “< ” and “>” and got C3A2E282ACE284A2 which I then put in the queries that were posted on the articles that I read (links above.)

I repeated the above steps until all of the strange characters were fixed, If you are reading this because you are trying to do the same fix you may find the below helpful.

-- C3A2E282ACE284A2 = ' (apostrophe)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX\('C3A2E282ACE284A2'), "’") WHERE post_content REGEXP UNHEX('C3A2E282ACE284A2');

-- C3A2E282ACC29D = " (close quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACC29D'), "\"") WHERE post_content REGEXP UNHEX('C3A2E282ACC29D');

-- E28099 = ' (another form of a singe quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('E28099'), "'") WHERE post_content REGEXP UNHEX('E28099');

-- C382C2B4 = ' (yet another quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C382C2B4'), "'") WHERE post_content REGEXP UNHEX('C382C2B4');

-- C3A2E282ACC593 = " (open quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACC593'), "\"") WHERE post_content REGEXP UNHEX('C3A2E282ACC593');

-- C3A2E282ACE2809C = - (dash/hyphen)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACE2809C'), "-") WHERE post_content REGEXP UNHEX('C3A2E282ACE2809C');

I hope posting this helps someone save a few hours of hunting around. 🙂



In a content management system I was working on at work we noticed there was a couple of problems with how the URLs entered by users broke our web site due to their length. We also had a request to automatically turn hand typed URLs into links, so here is what I did in C#

Added the RegEx namespace:

using System.Text.RegularExpressions;


Here is the method I added to our utility class:

public string URLsToHyperlinks(string sInput)<br /> {<br /> return Regex.Replace(sInput, @&quot;(\bhttp://[^ ]+\b)&quot;, @&quot;&lt;a href=&quot;&quot;$0&quot;&quot;&gt;$0&lt;/a&gt;&quot;);<br /> }

Here is a function that does the same thing only in PHP

function urls_to_hyperlinks($text)<br /> {<br /> return preg_replace( &quot;`((http)+(s)?:(//)|(www\.))((\w|\.|\-|_)+)(/)?(\S+)?`i&quot;, &quot;&lt;a href=\&quot;http\\3://\\5\\6\\8\\9\&quot; title=\&quot;\\&quot; target=\&quot;_blank\&quot;&gt;\\5\\6&lt;/a&gt;&quot;, $text); <br /> }