Archive | November 2010

Setting the ACLs to public-read on Millions of S3 Objects

I learned a valuable lesson today. When you use Amazon's Import/Export service, be sure that your manifest file includes the proper ACL metadata. I left it at the defaults, and my more than 600GB of files (yes, all 23 million of them) were not readable through CloudFront for use on my site because they were not public. I tried Amazon's web-based console to change the ACLs, but it was quite discouraging when it only updated about 100 objects a minute. I tried "Bucket Explorer," and although it was a bit faster, my 30-day trial would have expired before it finished. I knew I had to script something quicker, so I did a bit of research and figured that if I ran it from EC2 it could be 100-1000x faster, because S3 treats calls from EC2 as internal.
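
For reference, the fix at import time is a single line in the Import/Export manifest. The excerpt below is a sketch from memory; the exact field name and allowed values are assumptions on my part, so verify them against the Import/Export manifest documentation before relying on it:

bucket: bucket_name
acl: public-read    # assumed field name; the default ACL is private, which is what bit me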

So here are the steps I took to hack a solution together. If you find yourself in the same boat, I hope you find this helpful.

Start an EC2 instance and ssh into it:

ssh -p 22 -i ~/Sites/mysite/myec2key.pem root@ec2-174-129-75-24.compute-1.amazonaws.com

Install Python's easy_install utility (on Ubuntu it looks like this):

sudo apt-get install python-setuptools

A helpful utility named s3funnel doesn't let you update object ACLs, so we will only use it to build our object list. The reason I used s3funnel is that it is very fast at listing objects; in my tests it pulled over 2,000 object names per second.

Install s3funnel: easy_install s3funnel

On the instance I created a directory to keep my work in, just to keep things simple. (You don't have to do this if you don't want to.)

mkdir -p ~/amazon-files/input
mkdir ~/amazon-files/output
cd ~/amazon-files

Then I ran the s3funnel dump (you will have to replace AWS_KEY and AWS_SECRET with your own S3 credentials):

nohup s3funnel bucket_name --aws_key=AWS_KEY --aws_secret_key=AWS_SECRET list > s3files.txt &

Once the object list was complete, I split it into smaller files of 5,000 object names each:

cd input
split -l 5000 --suffix-length=6 ../s3files.txt s3

For my 23 million files this created about 4600 files.
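
As a quick sanity check (these two commands are my own addition, not part of the original run, and assume you are still in the input directory), the number of split files times 5,000 should roughly match the line count of the dump:

wc -l ../s3files.txt
ls s3* | wc -l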

Then, back in ~/amazon-files, I wrote a bash script that distributed the files into separate directories, one per worker process, so that each process gets its own slice of the work. I chose the directory count to match the number of processes I wanted to run at the same time; I started with 10 but, as I mention below, ended up using 50, which is what the modulo in the script reflects.

for file in $(ls input/s3*)
do
  # Checksum the split file and take it modulo 50 to pick one of the worker buckets.
  csum=`sum $file | cut -f1 -d' '`
  process=`expr $csum % 50`

  echo "Moving $file into input/$process"
  if [[ ! -d input/$process ]]
  then
    mkdir input/$process
    mkdir output/$process
  fi
  mv $file input/$process
done
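
Before kicking off the workers, you can confirm the distribution came out roughly even with a quick loop like this (again my own addition, run from ~/amazon-files):

for d in input/*/
do
  echo "$d $(ls $d | wc -l)"
done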

Then I wrote a simple Python script named amazon.py, which I placed in the ~/amazon-files directory, using boto (a Python library for S3; the same one that s3funnel uses under the hood). The script looks like this:

#! /usr/bin/env python

import sys
import boto

print 'processing file: ' + sys.argv[1]

# The driver loop passes the full relative path (e.g. input/0/s3aaaaaa),
# so open it as-is.
f = open(sys.argv[1], 'r')
c = boto.connect_s3("AWS_KEY", "AWS_SECRET")
b = c.get_bucket("bucket_name")
for line in f:
    # Each line is an object key; flip its ACL to public-read.
    b.set_acl('public-read', line.strip())

f.close()
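
To spot-check the script on a single split file before launching all the workers, you can run it by hand from ~/amazon-files (the path here is just an example of the kind of name split generates, not an actual file from my run):

python amazon.py input/0/s3aaaaaa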

Now that all of my objects are evenly distributed across the worker directories, I can loop through each directory and kick off one bash process per directory, moving each completed split file into the corresponding output directory as it finishes. This way, if something goes wrong, I can see progress and simply restart the scripts, and they will continue (pretty much) where they left off, give or take one ~5,000-object split file.

for directory in $(ls input);
do
  nohup bash -l -c "for file in \$(ls input/${directory}); do python amazon.py input/${directory}/\${file} && mv input/${directory}/\${file} output/${directory}; done" &
done
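
While the workers run, you can eyeball progress by counting the split files that have landed in output; multiplying by 5,000 gives an approximate object count. This one-liner is my own addition:

watch -n 60 'find output -type f | wc -l'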

I first started with 10 processes and then realized that 50 would be better, so I continued with 50. Running 50 processes over my 23 million objects took about 12 hours to finish (roughly 532 objects updated per second). All in all, I was able to update all of the ACLs in what I now consider the fastest practical method.
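
For the curious, the throughput figure is just the total object count divided by the wall-clock seconds; a quick shell check (my addition):

echo $((23000000 / (12 * 3600)))    # prints 532, i.e. roughly 532 objects per second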

This is obviously a hack and could use some cleanup and consolidation. Part of me wanted to just modify s3funnel to update the ACLs directly, but I am not that strong with Python and really just wanted to get my ACLs updated.

How about next time we use Import/Export, we take a little longer to read about ACLs.

Real Travel Acquired by Uptake and The Back Story

Today is the day it became public knowledge that my company, Real Travel, has been acquired by Uptake Networks. It's been over 5 years since I wrote the first line of PHP code for Real Travel. I wanted to share a bit about the ride so far, and I look forward to even more excitement within Real Travel as part of the Uptake network.

Real Travel’s Conception

Over 5 years ago I was introduced to Ken Leeder by a mutual friend of ours. After a few breakfast meetings with a few of us (Michael T., Christina B., and Ken L.), we put together some wireframes and I went to work in the moonlight. Six months later we had a pilot/prototype that I built using PHP/MySQL, with some help from a contract designer who gave me some Photoshop designs. (Christina also helped with some of the design.) The application was quite simple then; we had lists of hotels in destinations and a review form for hotels.

A few more months later I quit my job [yes, the link no longer works :)], the company was incorporated, and our initial seed round of funding was coming together. This was also the time that we hired a new CTO and Chief Architect (who are no longer with us). The decision was made to rewrite the pilot using the then-beta ASP.NET 2. I pushed back, but others were more familiar with Microsoft technologies and won the battle. Knowing what was coming next, in retrospect I wish I had fought harder for the open source LAMP stack that supposedly "couldn't scale."

The Pain of Learning What You Already Knew (Again)

There was a "culture" clash in our development and design approaches. Within three months we went from my pilot's 5-10K lines of PHP code and an uber-simple database schema (you know, the kind you can manage from the command line without a plethora of GUI tools) to over 100K lines of C# and an object-oriented database schema in PostgreSQL (not the built-in kind of object orientation that PostgreSQL provides, but a new home-grown schema designed for fourth normal form).

I will never forget the day our architect showed the database schema relation map; it looked like a ball of yarn, with so many relationship lines that the tables themselves were indistinguishable as rectangles. This was my first run-in with the do-it-the-"right"-way-the-first-time fallacy, at least to this extreme. As an engineer with (then) 5 years of experience I expressed my concern about the complexity, but it was quickly extinguished by group-think cliches like "this is how the big guys do it."

We started to release weekly and launched our site at the Web 2.0 Conference Launch Pad. Development was slow due to the complexity of the database. Not kidding: it took about 10 SQL inserts to add a single photo to the database; tables for strings, dates, root objects, photos, photo renditions, and so on.

During the following year or two a lot of "fun" happened. Our site's architecture was extremely brittle, and we spent a good part of our time debugging strange bugs, slow queries (with at least 5 if not 8 joins in them), and strange IIS bugs. (A very educational experience for me in building the ivory tower, inefficiently.)

Born Again

I was playing with a new toy on the side called Ruby on Rails and became really enchanted with the framework and the Ruby language itself. I finally felt free from the 90-second compile time and memory cache load that our ASP.NET app required between code changes. It reminded me of the PHP/MySQL days, and I realized just how horribly over-engineered our system was. I started to propose a rewrite, but it went over like a lead balloon. That did not deter me. I did freelance projects in Rails and PHP just to keep my sanity. I believed in Real Travel and didn't want to leave; I was drawn in by the opportunity to be the catalyst for positive change the second I was given the chance. The time would come.

As time went on, our releases grew further apart and our site became slower and slower, so my case for rewriting the site became that much more appealing, but it was also becoming a bigger task each day. Every time I had to change a line of code I had to wait (and wait) for hours to compile the app, start it, and test it, then more hours and even days to release it, and I would say, "If this were Ruby on Rails it would have been done a while ago." We knew as a company that something had to be done; as a team we were unable to develop new features and move the company forward.

As a company, we were forced by the market to change our mindset and accept that something had to be done, but a port to Rails was still out of the question. We attempted to de-normalize our tables and rewrite the code base in ASP.NET and C#, but that only proved to take even more time, and we were still on ASP.NET.

A New Chapter With Ruby on Rails

It wasn't easy, but I continued to champion the move to Ruby on Rails, and we started to build all of our new development on it. We ended up having two applications: the main one on ASP.NET, the new one on Ruby on Rails. With the use of a load balancer we were able to make much of this transparent, and we found ourselves spending most of our time where it counted: on the Rails application. In fact, there was a time when only one Windows development machine existed, sitting on a desk in case we had to make a change to the old system; we could make the change, test it, and then push the code to production. After many, many long discussions and debates we finally made the decision to port to Rails. It was also about this time that we decided that unless we started to use SCRUM we would either all quit or jump off a bridge.

SCRUM, Agile, TDD/BDD, Quality, and Accountability

We had all learned our lesson. As a company we went to SCRUM training, and this was one of the most pivotal points in my tenure at Real Travel (or in my career). We began to form good process and better working relationships. I began to jump into RSpec and build quality into the product. It was a start, and each day got better and better.

About 11 two-week sprints later (579 story points), we had a new version of Real Travel up and running on Ruby on Rails. I had the pleasure of powering down the last Windows server; it was nice after all of the pain.

Nothing was stopping us: continuous improvement, open communication, retrospectives, developer productivity, and a backlog-directed, self-organized team. Within a year we ported the system, made major improvements to our site, and even started to make some decent revenue. The summer of 2009 was great for us: major traffic gains, increased revenue, and then the Google problems started. Although I did the Rails development, I could not have done it without Chris Sloan and Francisco Marin as team members, peers, and heroes.

Google: The Authoritarian Mime

I won't bore you with the details in this post, but due to some spammed pages on our site, Google decided to kick us out of their index and not tell us why. Fortunately, after some time we found the problem (we found and deleted some pages that had been link-spammed with V1agra links and had 5K links pointing into them), and once it was fixed our traffic started to come back a bit.

Then, months later, we had another problem with a load balancer configuration that caused our old site's links to become 404s. This was not obvious to us right away, but like the other Google problems it hurt us and affected our traffic. We fixed it, but we were cut deeply by the two setbacks. Our traffic started coming back, along with our revenues, but it was a long, rough ride. Traffic and revenues were going in the right direction, however; we knew we could get our traffic back and get things going, so we marched on.

Becoming “That” Team and Company

Over the next couple of years we tuned our SCRUM process: we went from two-week sprints to one-week sprints, from one-day planning meetings to 1-hour planning meetings, and from once-a-week releases to releasing multiple times a day with continuous releases. It became really exhilarating when we could run an on-the-street, lo-fi paper test with the kind people of Palo Alto, come up with a hypothesis on a change, push out a split test (a.k.a. an A/B test), and then release the winner in a day. If there was a bug that affected our site, we could have the fix out in minutes.

What Now?

With traffic bouncing back, we were engaged by Uptake, and I will let TechCrunch and the many other blogs tell the rest of the story. I consider myself privileged to be able to learn with and from my fellow team members and to ride the ebbs and flows from the first line of code I wrote on that first pilot, and I look forward to the future as Uptake and Real Travel aim to provide the best travel experiences on the web.

So, Sloan, pull the next test from the top of the test queue and let's test it!