Archive | Linux / Unix

Setting the ACLs to public-read on Millions of S3 Objects

I learned a valuable lesson today: when you use Amazon’s Import/Export service, be sure that your manifest file includes the proper ACL metadata. I left it at the defaults, and my 600+ GB of files (yes, all 23 million of them) were not readable through CloudFront for use on my site because they were not public. I tried Amazon’s web-based console to change the ACLs, but it was quite discouraging when it only updated about 100 per minute. I tried “Bucket Explorer” and, although it was a bit faster, my 30-day trial would have expired before it finished. I knew I had to script something quicker, so I did a bit of research and figured that if I ran it from EC2 it could be 100 to 1000 times faster, since S3 treats requests from EC2 as internal calls.
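
For reference, the Import/Export manifest is a YAML file and, if I remember the field names right (treat this as an assumption and check the current AWS docs), it takes an acl option, so the fix at import time would have been one extra line:

# excerpt of an Import/Export manifest (the acl field name is from memory, so verify it)
bucket: bucket_name
acl: public-read    # defaults to private if you leave it out, which is what bit me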

So here are the steps I took to hack a solution together; I hope that if you are in the same boat you find them helpful.

Start an EC2 instance and SSH into it:

ssh -p 22 -i ~/Sites/mysite/myec2key.pem root@ec2-174-129-75-24.compute-1.amazonaws.com

Install Python’s easy_install utility (on Ubuntu it looks like this):

sudo apt-get install python-setuptools

A helpful utility named s3funnel doesn’t let you update an object’s ACL, so we will only use it to build our object list. The reason I used s3funnel is that it is very fast at listing objects; in my tests it was over 2000 objects per second.

Install s3funnel: easy_install s3funnel

On the instance I created a directory to keep the work in, just to keep things simple. (You don’t have to do this if you don’t want to.)

mkdir -p ~/amazon-files/input
mkdir ~/amazon-files/output
cd ~/amazon-files

Then I ran the s3funnel dump (you will have to replace AWS_KEY and AWS_SECRET with your own S3 credentials):

nohup s3funnel bucket_name --aws_key=AWS_KEY --aws_secret_key=AWS_SECRET list > s3files.txt &

Once the object list was complete, I split it into smaller files:

cd input
split -l 5000 --suffix-length=6 ../s3files.txt s3

For my 23 million files this created about 4600 files.

Then I wrote a bash script that distributed the chunk files into separate directories, one per worker process that I planned to run at the same time. (I originally planned on 10, but the version below splits into 50, which is what I ended up running; more on that at the end.)

# distribute the chunk files across 50 directories (one per worker process)
for file in $(ls input/s3*)
do
  # use the file's checksum to pick a directory from 0-49
  csum=$(sum $file | cut -f1 -d' ')
  process=$(expr $csum % 50)

  echo "Moving $file into input/$process"
  if [[ ! -d input/$process ]]
  then
    mkdir input/$process
    mkdir output/$process
  fi
  mv $file input/$process
done

Then I wrote a simple Python script named amazon.py, which I placed in the ~/amazon-files directory. It uses boto (a Python library for S3, the same one that s3funnel uses under the hood) and looks like this:

#!/usr/bin/env python

import sys
import boto

# the loop below passes the full relative path (e.g. input/3/s3aaaaab) as the first argument
print 'processing file: ' + sys.argv[1]

f = open(sys.argv[1], 'r')
c = boto.connect_s3("AWS_KEY", "AWS_SECRET")
b = c.get_bucket("bucket_name")
for line in f:
    # each line is an object key; set its ACL to public-read
    b.set_acl('public-read', line.strip())

f.close()

Now that all of my objects are evenly distributed into 50 separate directories, I can loop through each directory and kick off one bash process per directory, moving each chunk file into the corresponding output directory as it completes. That way, if something goes wrong, I can see the progress and just restart the scripts, and they will continue (pretty much) where they left off, give or take the chunk that was in flight (~5000 objects).

for directory in $(ls input);
do
  nohup bash -l -c "for file in \$(ls input/${directory}); do python amazon.py input/${directory}/\${file} && mv input/${directory}/\${file} output/${directory}; done" &
done
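
While the workers run, a rough way to keep an eye on progress is to count how many chunk files have made it into output (each one represents 5000 objects); this is just a convenience check, not part of the process:

watch -n 60 'find output -type f | wc -l'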

I first started with 10 processes and then realized that 50 would be better, so I continued with 50. Running 50 processes against my 23 million objects took about 12 hours to finish (roughly 530 objects updated per second). All in all, I was able to update the ACLs on every object in about the fastest way I could find.

This is obviously a hack and could use some cleanup and consolidation. Part of me wanted to just modify s3funnel to update all of the ACLs, but I am not that strong with Python and really just wanted to get my ACLs updated.

Next time we use Import/Export, how about we take a little longer to read about ACLs.

Ubuntu and PostgreSQL upgrade from 8.3 to 8.4

I wanted to install PostGIS 1.4 and PostgreSQL 8.4, which meant upgrading Ubuntu from 9.04 to 9.10 and then to 10.04. After the long wait I realized that I had painted myself into a corner: my data was from PostgreSQL 8.3 and not usable with the 8.4 that ships with 10.04, and I could not simply uninstall 8.4 and reinstall 8.3 to dump it. After searching I found the help that I needed, and by following the steps below I got my databases back up and running.

After adding the following two lines into my /etc/apt/sources.list:

deb http://archive.ubuntu.com/ubuntu/ karmic main restricted universe
deb-src http://archive.ubuntu.com/ubuntu/ karmic main restricted universe

I was able to install postgresql 8.3 once again.

apt-get update && apt-get install postgresql-8.3

Then, with postgres 8.3 running again on my new system, I listed all of my databases like so:

bash # psql -U postgres
postgres=# \l

Then I copied my dbs into a file called dbs.txt like so:

contineo          
mantis            
opennms           
redmine           
rtcache_build_8030
rtcache_build_8031
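
If you have a lot of databases, you can also build dbs.txt straight from psql rather than copying the names by hand; something like this should do it (it skips the templates and the postgres database itself):

bash # psql -U postgres -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate AND datname <> 'postgres'" > dbs.txt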

Then I simply dumped all of the databases:

bash # mkdir databases ; cd databases
bash # for db in $(cat dbs.txt); do pg_dump -U postgres $db > ${db}.sql; done

Then I uninstalled postgres 8.3:

bash # sudo apt-get remove postgresql-8.3
bash # sudo apt-get remove postgresql-client-8.3

I ran into some problems with postgres not fully uninstalling, so I had to run the following to completely remove all versions of postgres. You may not experience these problems, but if you do, this should clear them up:

bash # sudo dpkg -r postgresql-8.3 && sudo dpkg -P postgresql-8.3
bash # sudo dpkg -r postgresql-8.4 && sudo dpkg -P postgresql-8.4

I then removed the two lines that I had added to /etc/apt/sources.list above and updated apt:

bash # sudo apt-get update

Then I reinstalled postgres 8.4:

bash # sudo apt-get install postgresql-8.4

I then had to modify /etc/postgresql/8.4/main/pg_hba.conf for my needs. Note that this is not a secure configuration and leaves postgres wide open:

...
local   all         postgres                          trust
local   all         all                               trust
host    all         all         0.0.0.0/0             trust
...

Then I added my database user:

CREATE USER myuser WITH PASSWORD 'jw8s0F4' CREATEDB;

After removing from my list some databases that I no longer needed, along with the postgres database and the default templates, I imported the dumps into the new cluster.

Now, the moment that we all have been waiting for:

for db in $(cat dbs.txt); do echo ${db}; createdb -U myuser ${db}; psql -U myuser ${db} < ${db}.sql; done

With any luck we now have a system that works.

Sharing shell with ytalk on Ubuntu

Years ago a good friend of mine used to use a command-line app called ytalk to show me around the bash shell (thanks Sione!). After a short while I stopped needing his help, and so I stopped using ytalk. At work we really wanted to share a shell with remote team members who were unable to use iChat screen sharing because of OS and bandwidth limitations.

I remembered that ytalk was a great tool for seeing what someone else is doing in the shell (and for showing off your bash skills). I thought it would be easy to set up on Ubuntu but, as it turns out, although it is still an available package, it is dead on install.

So here is what I ended up doing, and I hope that if you do the same you will be ytalk’in in no time.

On Ubuntu, install ytalk:


sudo apt-get install ytalk

Change the default (broken) /etc/inetd.conf configuration from:


talk            dgram   udp    wait    nobody.tty    /usr/sbin/in.talkd      in.talkd
ntalk           dgram   udp    wait    nobody.tty    /usr/sbin/in.ntalkd     in.ntalkd

to:


talk            dgram   udp4    wait    root    /usr/sbin/in.talkd      in.talkd
ntalk           dgram   udp4    wait    root    /usr/sbin/in.ntalkd     in.ntalkd

Note the “4” after udp and that “nobody.tty” changed to “root”.
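
After editing inetd.conf, the inetd daemon needs to re-read it; on my Ubuntu box that meant restarting openbsd-inetd (your inetd flavor may differ):

sudo /etc/init.d/openbsd-inetd restart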

In the /etc/services file, make sure the following lines are in there:


paul@box:~$ sudo grep talk /etc/services
talk            517/udp
ntalk           518/udp

I didn’t have to change anything, but it’s a good idea to confirm.

Using YTalk

Initiating the chat:

You can do this in a couple of ways. The first and most obvious is to coordinate with the other person, make sure that each of you is only logged in to the box once, and then type:


paul@box:~$ ytalk fred

Or, if you’re logged in more than once, you can specify the tty in the request after finding out which one it is:


paul@box:~$ who
fred      pts/0        2009-11-06 10:50 (208.X.X.X)
fred      pts/2        2009-11-06 10:48 (208.X.X.X)
paul      pts/3        2009-11-06 14:02 (208.X.X.X)


ytalk fred#2

More on that can be found here: http://manpages.ubuntu.com/manpages/intrepid/man1/ytalk.1.html

Thanks to euphemus for the breakthroughs!

Hope you find ytalk as useful and coolific as I do.

Enjoy!

FAIL: COMPROMISED SSH Public Key on Ubuntu

Last night I was setting up a new application on my server, and while I was configuring capistrano I came across this strange problem and didn’t immediately find much help on Google, so I thought I would post this to help someone else along.

I had already set up my SSH keys months ago, but when I tried to SSH into my Subversion repository it would ask me for a password/passphrase, and it just about drove me crazy.

I came across this article on Google and checked off each potential problem: nothing. Then I ran “ssh-vulnkey -a” and saw that my key was compromised.


capistrano@allison:~/.ssh$  ssh-vulnkey -a
Unknown (no blacklist information): 2048 5a:b4:d6:94:10:14:e1:a0:35:35:ff:c6:08:e6:9f:10 
Not blacklisted: 2048 5f:43:c2:f0:fb:e6:52:c4:90:59:fb:d2:e0:fe:66:d0 
Unknown (no blacklist information): 2048 ab:5e:39:5c:33:f0:02:e3:cf:cd:99:84:ca:9e:f8:e1 Paul@paul-hepworths-computer.local
COMPROMISED: 2048 81:85:1d:a7:b1:c6:ff:b2:d5:3f:60:3e:2e:c0:25:5c capistrano@mislice
COMPROMISED: 1024 fa:87:13:5f:0c:01:3e:53:b9:a1:ff:4a:8a:29:b2:a1 capistrano@mislice

So I searched Google some more to find out how to fix the problem. I regenerated keys multiple times on my client machine and no dice.

Then, after searching and searching, I found this tutorial and followed it to update openssl and openssh and regenerate my private keys.
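
For the record, the gist of the fix was roughly the following on an Ubuntu/Debian box (a sketch, assuming your distro already ships the patched openssl/openssh packages; reinstalling them pulls in the fixed versions):

sudo apt-get update
sudo apt-get install openssl openssh-server openssh-client
ssh-keygen -t rsa -b 2048 -f ~/.ssh/id_rsa    # regenerate the key pair with the patched openssl
ssh-vulnkey ~/.ssh/id_rsa.pub                 # confirm the new key is no longer blacklisted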

What a relief! (and a waste of time, but now I am secure I guess)


FileMerge Command Line Tools for Subversion

Here are some pretty useful tools for using FileMerge on OSX with command-line Subversion.

I stumbled across these tools while I was looking at how I could better use FileMerge with Subversion.

Here is what I did to fully set them up.


# sudo su
# cd /usr/bin
# svn export http://ssel.vub.ac.be/svn-gen/bdefrain/fmscripts/fmdiff .
# svn export http://ssel.vub.ac.be/svn-gen/bdefrain/fmscripts/fmdiff3 .
# svn export http://ssel.vub.ac.be/svn-gen/bdefrain/fmscripts/fmresolve .
# exit
# vim ~/.subversion/config

Be sure to set your diff and diff3 tools to use fmdiff and fmdiff3.
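
In ~/.subversion/config that means the [helpers] section ends up looking something like this (option names from memory, so double-check them against the comments already in that file):

[helpers]
diff-cmd = fmdiff
diff3-cmd = fmdiff3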

Thanks Bruno De Fraine for publishing these great tools and making my life a little easier. 🙂

Bulk Zone file Serial Number Increment

I have way too many domain names, which means that when I want to make a change to my zone files it usually involves a search and replace across all of them: swapping certain IPs, or changing the email address in the zone like I do below, or whatever else you need to do.

I first backed up my zone files with a basic but effective cp command:

blah@server ~# cp -r /var/named /var/named-backup

Then I replaced my email with one that would handle the spam and put it in the right mailbox (/dev/null.) 🙂

blah@server ~# for file in $(ls /var/named/*.db); do sed -i "s/paul.mydomain.com/dns.omniop.com/g" $file; done
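
A quick sanity check that the replace caught everything (this should print nothing):

blah@server ~# grep -l "paul.mydomain.com" /var/named/*.db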

Now that all of the zone files are updated, even if I were to restart named the changes would not propagate to my slave DNS servers, because the serial numbers in the zones have not changed.

...
2008011502    ; serial, todays date+todays
...

So here is a quick little shell script that I wrote to increment the serial in all of my BIND zone files on my DNS server.

#!/bin/bash
for file in $(ls /var/named/*.db);
do
  if [ -f $file ];
  then
    # grab the current serial (mine all start with 2008) and bump it by one
    OLD=`egrep -ho "2008[0-9]*" $file`
    NEW=$(($OLD + 1))
    sed -i "s/$OLD/$NEW/g" $file
    echo "fixed $file"
  fi
done

There may be a better way of doing this, but I found this very quick and painless.
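
One last step: after the serials are bumped, named still needs to reload the zones, and it’s worth confirming that a slave picked up the change; something along these lines (the zone and slave names here are placeholders):

blah@server ~# rndc reload
blah@server ~# dig @ns2.example.com example.com SOA +short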

Hopefully I will now get less spam, since the DNS email scrapers won’t pull my address out of my zone files.

Hope this helps someone!

VPS restoration from backup kills your InnoDB database — don’t let it happen to you!

A month ago my then VPS provider, JaguarPC, had some really freaky hardware issues that, to this day, I don’t fully understand, and they ended up restoring a two-week-old backup of the whole server, which included my VPS. When I fired up this blog and a couple of other sites, they failed due to MySQL table corruption. The corrupt databases that used MyISAM tables repaired just fine, but all of my InnoDB databases (Rails uses InnoDB by default when you use migrations) were unrecoverable, and I ended up having to try other means to get back as much of my data as I could.

Here is what I learned:

  1. Never assume that your host’s backups of your VPS will restore cleanly; they perform the backups while the server is running, and databases don’t like that very much.
  2. Always keep backups of your databases, especially the ones that use the InnoDB table engine, in a SQL dump format.

So here is what I do now to prevent this from happening again:

  1. Perform your own backups of your databases using the methods that are suggested for your db and db table engines.
  2. Get the data into SQL so when your VPS is backed up it will properly backup a dump.

Assuming that you have a file that contains a list of databases, one per line, you can do something like the following and then hook the script up to cron (there is an example crontab entry after the script).

#!/bin/bash

cd /var/lib/mysql

if [ ! -d sql_backup ]; then 
  mkdir sql_backup
fi

for db in $(cat databases.txt); do echo $db; mysqldump --single-transaction $db > sql_backup/${db}.sql; done
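
To hook it up to cron as mentioned above, a crontab entry along these lines does the trick (the path is just an example; use wherever you saved the script):

# dump the databases to SQL every night at 3am
0 3 * * * /root/bin/mysql_sql_backup.sh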

Good luck, and oh, BTW, you might want to get this running on your VPS before your host does the restore. 😉



Installing RMagick on OSX

I am working on a little app (link coming soon) with a friend of mine, in an effort to practice my Rails and now RMagick skills, since my day job doesn’t give me the opportunity.

One of the things I am building is a logo generator, so I need an image manipulator/generator of some sort. I have used ImageMagick on many projects in the past, so I looked forward to spitting out classy logos using RMagick.

Like most open-source installs on OSX and Linux, there were some issues along the way.

I first ran the following command on my OSX terminal but got a couple of errors.

# sudo gem install RMagick
...
Can't find Magick-config or GraphicsMagick-config program.
...

I fixed this error by installing the imagemagick-dev package rather than plain imagemagick.
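
A quick way to confirm that the dev package actually landed is to ask for Magick-config, the same program the error message complains about:

# Magick-config --version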

Then when I tried it again I received this error:

...
Can't install RMagick. Can't find libMagick or one of the dependent libraries
...

I resolved this error by searching Google and finding this thread, so I told fink (one of my OSX package managers) to build ImageMagick from source with the following command:

# fink --no-use-binary-dist install imagemagick-dev

After I rebuilt ImageMagick from source and included all of the dependent libraries, I was able to run the following command with no problems:

# sudo gem install RMagick

It worked! Yeah!

Now I will get back to the RMagick docs. 🙂


MySQL on the move from Latin1 to UTF8

A few days ago I had to move a WordPress blog from one server to another, and it turned out to be a bigger project than I had originally thought: the character set was Latin1 on the old server, and 180+ posts had been pasted in from Microsoft Word, complete with curly opening and closing quotes and hyphens. When I dumped the database and reimported the data into a UTF-8 database, many strange characters showed up in the posts. I did what I usually do in these situations and started to Google for an explanation. I found this article, which referenced this article, and here is what I ended up doing to solve the issue.

I opened up the raw SQL dump file in less and saw the strange characters in the text; they looked something like this:
Don<C3><A2><E2><82><AC><E2><84><A2>t

I looked at the context of the skewed characters and saw immediately that it was an apostrophe that had been made “special” by Word and then copied into WordPress. I removed the “<” and “>” and got C3A2E282ACE284A2, which I then plugged into the queries posted in the articles I read (links above).

I repeated the above steps until all of the strange characters were fixed. If you are trying to do the same fix, you may find the queries below helpful.
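
Before and after each replace, a quick count (using the same UNHEX trick as the replacements below) makes it easy to confirm whether a given pattern is still present:

SELECT COUNT(*) FROM wp_posts WHERE post_content REGEXP UNHEX('C3A2E282ACE284A2');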



-- C3A2E282ACE284A2 = ' (apostrophe)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACE284A2'), "’") WHERE post_content REGEXP UNHEX('C3A2E282ACE284A2');

-- C3A2E282ACC29D = " (close quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACC29D'), "\"") WHERE post_content REGEXP UNHEX('C3A2E282ACC29D');

-- E28099 = ' (another form of a singe quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('E28099'), "'") WHERE post_content REGEXP UNHEX('E28099');

-- C382C2B4 = ' (yet another quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C382C2B4'), "'") WHERE post_content REGEXP UNHEX('C382C2B4');

-- C3A2E282ACC593 = " (open quote)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACC593'), "\"") WHERE post_content REGEXP UNHEX('C3A2E282ACC593');

-- C3A2E282ACE2809C = - (dash/hyphen)
UPDATE wp_posts SET post_content = REPLACE(post_content, UNHEX('C3A2E282ACE2809C'), "-") WHERE post_content REGEXP UNHEX('C3A2E282ACE2809C');

I hope posting this helps someone save a few hours of hunting around. 🙂