Amazon S3 outage: Almost 7 hours and counting
For those of you who are not familiar with Amazon S3: It's a cloud-computing service from Amazon (yes, THE Amazon that we normally only know as an online book and music store). It allows for cheap and usually reliable on-demand online storage. It's one of several offerings from Amazon in that space, which also includes Amazon EC2 (on-demand computing) and others.
These cloud services are popular because they allow you to build massively scalable architectures without having to spend a dime on hardware. You only pay for the storage or the computing capacity you use in their data centres. For example, in EC2 you can bring up new instances of virtual GNU/Linux servers when you need them during times of high demand, and shut them down when the load has lessened. You pay for uptime of instances (a few cents per hour), stored gigabytes per month (also just a few cents) and transmitted data (a few cents per gigabyte).
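To make the pay-per-use idea a bit more concrete, here is a rough sketch of how such a bill adds up. The rates in it are made-up placeholders in the "few cents" range, not Amazon's actual prices, and the names `Rates` and `monthly_cost` are just my own for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices in US dollars. These are NOT Amazon's real rates."""
    per_instance_hour: float = 0.10    # assumed cost of one instance running for one hour
    per_gb_month_stored: float = 0.15  # assumed cost of storing one gigabyte for a month
    per_gb_transferred: float = 0.17   # assumed cost of transmitting one gigabyte


def monthly_cost(instance_hours: float, stored_gb: float,
                 transferred_gb: float, rates: Rates = Rates()) -> float:
    """Add up the three usage-based charges for one month."""
    return (instance_hours * rates.per_instance_hour
            + stored_gb * rates.per_gb_month_stored
            + transferred_gb * rates.per_gb_transferred)


# Example: two servers running all month (30 days), plus four extra
# servers brought up for three hours a day during peak load.
baseline_hours = 2 * 24 * 30
peak_hours = 4 * 3 * 30
estimate = monthly_cost(baseline_hours + peak_hours, stored_gb=50, transferred_gb=200)
print(f"Estimated bill: ${estimate:.2f}")
```

The point of the example is the shape of the formula: the extra peak-time servers only cost money for the hours they are actually up.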
One of the most high-profile victims of the current S3 outage is Twitter: Images, such as avatars of users, are currently not being served, because they are all stored on S3.
I have been following Amazon's services for a while now and had a chance to experiment with them. I am impressed by how powerful the concept is and how well it normally works. I also believe that Amazon learns from glitches like this and manages to improve its system as a result. Nevertheless, the platform is still relatively young, and so apparently not all issues are sorted out yet.
Needless to say, a 6- or 7-hour outage means a lot of egg on Amazon's face. That's not the kind of publicity they want. Still, if you can think of strategies to soften the impact of outages of individual components in your architecture, I would recommend the Amazon services to any startup in search of cheap and scalable computing and storage resources.
Update: Just as I posted this, I noticed that the avatars on Twitter have started to appear again. Slowly, though, and only some of them so far. But it appears that they are finally finishing the restoration of the US service as well. At 4 pm US West Coast time (11 am NZ time), Amazon advises that it expects the service to be fully restored within one hour. That would make for a total outage duration of 8 hours. We will see.
Update 2: At 5 pm US West Coast time (12 noon NZ time), S3 appears to be fully operational again. Twitter got its avatars back, and many startups and web-based companies are finally back in business.
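One such strategy, sketched very roughly: try the primary store first, then a secondary copy, and finally fall back to a placeholder so that the page still renders. The function name `fetch_with_fallback` and the idea of passing in fetch functions are my own assumptions for this illustration; a real setup would plug in actual storage clients.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(key: str,
                        sources: Iterable[Callable[[str], bytes]],
                        default: Optional[bytes] = None) -> Optional[bytes]:
    """Try each source in order and return the first successful result.

    Each source is a hypothetical function that returns the stored bytes
    for `key` or raises an exception if the backend is unavailable.
    """
    for fetch in sources:
        try:
            return fetch(key)
        except Exception:
            # This backend failed (outage, timeout, missing key); try the next one.
            continue
    # Every backend failed: serve the placeholder instead of an error.
    return default


# Stand-ins for real backends, for illustration only.
def primary_store(key: str) -> bytes:
    raise ConnectionError("primary storage is down")


local_copy = {"avatars/alice.png": b"alice-avatar-bytes"}


def secondary_store(key: str) -> bytes:
    return local_copy[key]  # raises KeyError if there is no copy


avatar = fetch_with_fallback("avatars/alice.png",
                             [primary_store, secondary_store],
                             default=b"generic-avatar-bytes")
```

With something like this in place, an outage of the primary store degrades the site (some images come from a slower copy or show a generic avatar) instead of breaking it outright.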
One of the comments on Twitter said it perfectly: "S3 sneezes and the cloud catches a cold." I really like that one. :-)
Comment by Mike, on 22-JUL-2008 04:17
"S3 sneezes and the cloud catches a cold." I'm getting tired of this, why doesn't S3 take the right medicine? Outages are too frequent with S3, I might switch over to a better cloud, maybe Nirvanix, they never get sick.
Comment by Jon Travis, on 21-JUL-2008 11:46
CloudStatus was able to detect this pretty quickly this morning. We put our writeup on the blog:
http://www.hyperic.com/blog/hyperic/2008/07/20/amazon-ec2-s3-sqs-outage-0720/