foobar on computers, software and the rest of the world


Finally: Persistent storage for Amazon EC2

Posted: 22-Aug-2008 06:01

Amazon has finally announced the general availability of persistent storage for EC2! This is long-awaited news. There is also a writeup from Werner Vogels, the Amazon CTO.

Some background...

Amazon has offered a very nice cloud-computing infrastructure for a long time. EC2 for on-demand computing, S3 for on-demand storage. This was much more than just glorified VPSs. Instead, it was truly on-demand (new computing instances could be brought up in seconds) and seemingly infinitely scalable. One of the most vexing issues with it, however, was the lack of persistent storage for EC2 computing instances.

Whenever you brought up an instance, you had a nice big disk attached to it. But if the instance terminated, everything stored on that disk was lost. For persistence you had to back data up to Amazon's S3 storage service. The problem? S3 is accessed through a custom API relying on REST requests. What many people really wanted, though, was the ability to use storage like a file system.

Not surprisingly, some third-party solutions sprang up. There are services like Rightscale, which know how to set up a reliable MySQL cluster on unreliable storage, using multiple backups and replication (and offering improved management tools). A number of open source projects also popped up, which use FUSE (Filesystem in Userspace) to provide a filesystem interface and then map IO requests to HTTP requests to S3. One of the more popular persistent storage solutions for Amazon EC2 has been PersistentFS, which is closed-source and also takes the FUSE approach. You mount a drive and use it like any other, but the storage for the device is actually realized as many little chunks in your S3 storage bucket.

Well, now we get persistent storage as a service offering directly from Amazon. Rightscale has written up a really nice and extensive explanation of the technical background.

Amazon calls the feature EBS (Elastic Block Store). The name emphasises what it really is: A raw block device. If you want a file system, you format it first just like any other disk.

Amazon Elastic Block Store (EBS) provides block level storage volumes for use with Amazon EC2 instances. Amazon EBS volumes are off-instance storage that persists independently from the life of an instance. Amazon Elastic Block Store provides highly available, highly reliable storage volumes that can be attached to a running Amazon EC2 instance and exposed as a device within the instance. Amazon EBS is particularly suited for applications that require a database, file system, or access to raw block level storage.

Volumes ranging in size from 1 GB to 1 TB can be created and mounted to running EC2 instances. Multiple volumes can be mounted to a single instance, but a volume cannot be mounted to multiple instances at the same time. In the case of instance failure, the same volume can then be mounted by another instance, though. As an interesting feature, it also supports snapshots to S3. Those snapshots are incremental backups. The snapshots can then also be used to create new volumes. It appears that snapshots are also the way to 'grow' a volume: Snapshot, create new and bigger volume, 'recover' the snapshot to the new volume, delete the old volume.
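The 'grow a volume' procedure described above can be sketched as a short script. This is only a sketch under assumptions: the method names (create_snapshot, create_volume, delete_volume, get_waiter) follow the EC2 client of boto3, a Python SDK that is not mentioned in Amazon's announcement, and DryRunEC2 is a hypothetical stand-in that merely records calls so the example runs without an AWS account. The volume IDs and the zone name are made up.

```python
def grow_volume(ec2, old_volume_id, new_size_gb, availability_zone):
    """Replace an EBS volume with a bigger one restored from a snapshot.

    `ec2` is expected to behave like boto3.client("ec2") (an assumption).
    The old volume should be detached before this runs, and the file system
    on the new volume still has to be grown separately afterwards.
    """
    # 1. Snapshot the existing volume and wait until the snapshot is complete.
    snapshot_id = ec2.create_snapshot(VolumeId=old_volume_id)["SnapshotId"]
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # 2. Create a new, bigger volume from the snapshot in the chosen zone.
    new_volume_id = ec2.create_volume(
        SnapshotId=snapshot_id,
        Size=new_size_gb,
        AvailabilityZone=availability_zone,
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[new_volume_id])

    # 3. Delete the old volume; the snapshot itself is kept as a backup.
    ec2.delete_volume(VolumeId=old_volume_id)
    return new_volume_id


class DryRunEC2:
    """Hypothetical stand-in for an EC2 client: records calls, talks to nothing."""

    def __init__(self):
        self.calls = []

    def create_snapshot(self, VolumeId):
        self.calls.append(("create_snapshot", VolumeId))
        return {"SnapshotId": "snap-example"}

    def create_volume(self, SnapshotId, Size, AvailabilityZone):
        self.calls.append(("create_volume", SnapshotId, Size, AvailabilityZone))
        return {"VolumeId": "vol-bigger"}

    def delete_volume(self, VolumeId):
        self.calls.append(("delete_volume", VolumeId))

    def get_waiter(self, name):
        client = self

        class _Waiter:
            def wait(self, **kwargs):
                # A real waiter polls the API; here we only note that we waited.
                client.calls.append(("wait", name))

        return _Waiter()


dry_run = DryRunEC2()
new_id = grow_volume(dry_run, "vol-small", 200, "us-east-1a")
print(new_id)
for call in dry_run.calls:
    print(call)
```

Passing the client in as a parameter keeps the three steps readable and lets you rehearse the sequence before pointing it at a real account.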

Amazon seems to have paid particular attention to performance:

The latency and throughput of Amazon EBS volumes is designed to be significantly better than the Amazon EC2 instance stores in nearly all cases.

That is rather astonishing. I was under the impression that the 'local disk' that came with an EC2 instance was essentially mapped onto the hard drive of the server which ran the Xen instance. It is quite surprising, then, that this new storage, which is basically just network-attached storage, would be even faster. Rightscale has done some testing and writes:

We see over 70MB/sec using sysbench on a m1.small instance, which is hot! Presumably we didn’t get much network contention from other small instances on the same host when running the benchmarks. For random access we’ve seen over 1000 I/O ops/sec, but it’s much more difficult to benchmark those types of workloads. The bottom line though is that performance exceeds what we’ve seen for filesystems striped across the four local drives of x-large instances.

But an important point to consider is this: Since EBS volumes are network-attached storage, accessing them uses some of the network IO available to your EC2 instance. This can be problematic if you have high network IO requirements to start with.

As far as reliability is concerned, Amazon gets a bit vague and compares apples with oranges:

Amazon EBS volumes are designed to be highly available and reliable.  Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% - 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.

This sounds good, except that they are telling us that these nice reliability numbers are achieved only when you take the snapshots into account. What would the AFR of normal hard drives be if I also counted the backups I take of them? In general, an EBS volume lives on replicated drives within one availability zone at the Amazon data centres. Thus, if for some reason that availability zone goes down, your EBS volume is gone. In the case of S3, the data would still be available, since S3 is replicated across different availability zones. Consequently, snapshots to S3 are important. With the snapshot, you can then just re-create the volume in a different availability zone and continue operation.
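To see how far apart the quoted failure rates really are, here is a small calculation. It is only a sketch: it takes the AFR figures from Amazon's text at face value and assumes that failures in different years are independent with a constant rate, which Amazon does not claim. The function names are mine.

```python
def loss_probability(afr, years):
    """Chance of at least one total loss within `years`, assuming a constant
    and independent annual failure rate (a simplifying assumption)."""
    return 1 - (1 - afr) ** years


def times_more_reliable(disk_afr, ebs_afr):
    """How many times lower the EBS failure rate is than the disk's."""
    return disk_afr / ebs_afr


DISK_AFR = 0.04        # commodity hard disk, as quoted by Amazon
EBS_AFR_BEST = 0.001   # lower end of the quoted range (<= 20 GB modified data)
EBS_AFR_WORST = 0.005  # upper end of the quoted range

print("ratio, pessimistic:", times_more_reliable(DISK_AFR, EBS_AFR_WORST))
print("ratio, optimistic: ", times_more_reliable(DISK_AFR, EBS_AFR_BEST))

for label, afr in [("disk", DISK_AFR), ("EBS worst", EBS_AFR_WORST), ("EBS best", EBS_AFR_BEST)]:
    print(f"{label}: {loss_probability(afr, 3):.1%} chance of loss within 3 years")
```

With the quoted numbers the ratio comes out at roughly 8 to 40, so the '10 times' in Amazon's text is a round figure near the pessimistic end of their own range.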

At first glance, pricing seems to be lower than for storage on S3:

With Amazon Elastic Block Store, you only pay for what you use.  Volume storage is charged by the amount you allocate until you release it, and is priced at a rate of $0.10 per allocated GB per month. Amazon EBS also charges $0.10 per 1 million I/O requests you make to your volume.

Careful with the 'what you use' phrase here: This refers to the size of the block device you allocated, not the amount of data you stored. So, a 1 TB drive would cost you $100 per month, even if you leave it empty. I can't comment yet on the pricing for the IO requests. I don't know what can be expected if you run a database on it, vs. just serving static web pages. They gave this example, though:

As an example, a medium sized website database might be 100 GB in size and expect to average 100 I/Os per second over the course of a month.  This would translate to $10 per month in storage costs (100 GB x $0.10/month), and approximately $26 per month in request costs (~2.6 million seconds/month x 100 I/O per second * $0.10 per million I/O).
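Amazon's example is easy to reproduce, which also makes it easy to plug in your own numbers. The following is a minimal sketch using only the two prices quoted above; the function name and the 30-day month are my assumptions, and data transfer or snapshot storage is not included.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, i.e. the "~2.6 million" above

STORAGE_USD_PER_GB_MONTH = 0.10  # charged on allocated size, not on data stored
IO_USD_PER_MILLION = 0.10        # charged per million I/O requests


def ebs_monthly_cost(allocated_gb, avg_io_per_second):
    """Return (storage_cost, io_cost) in USD for one 30-day month."""
    storage_cost = allocated_gb * STORAGE_USD_PER_GB_MONTH
    io_requests = avg_io_per_second * SECONDS_PER_MONTH
    io_cost = io_requests / 1_000_000 * IO_USD_PER_MILLION
    return storage_cost, io_cost


# Amazon's example: a 100 GB database averaging 100 I/Os per second.
storage, io = ebs_monthly_cost(100, 100)
print(f"storage: ${storage:.2f}, I/O: ${io:.2f}, total: ${storage + io:.2f}")

# An empty 1 TB volume (taken as 1000 GB) with no I/O still costs the full storage fee.
idle_storage, idle_io = ebs_monthly_cost(1000, 0)
print(f"idle 1 TB volume: ${idle_storage + idle_io:.2f}")
```

Note that in this example the I/O charge is more than twice the storage charge, so the average request rate matters more than the volume size.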

In the end, I guess, it depends on your application. Some profiling may be required to predict what your actual cost will be. But don't forget that the storage for S3 backups always costs extra ($0.15 per GB). So, you really pay a total of $0.25 per GB per month. Amazon provides a pricing calculator. Click on EC2 to enable the section relevant to EBS.

This has led PersistentFS to post an interesting cost comparison on the Amazon developer forums. In it they claim that PersistentFS will, most of the time, provide cheaper storage. Why? Because PersistentFS is always backed by S3. In fact, that's all it ever uses for storage. PersistentFS also has an implied "cost of IO operations", if you will, since all GET and PUT operations to S3 incur a cost as well.

What PersistentFS appears to forget to mention, though, is that eventually their plan is to charge for the amount of data that users have stored via PersistentFS. That would be a couple of cents per GB, which would have to be added. But from what I have heard, their suggested price target is $0.05 per GB (on top of the $0.15 for S3 storage), rather than $0.10 per GB for EBS. And of course they are right: With PersistentFS you pay only for the actual data you store.
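The 'allocated versus stored' difference is easiest to see with numbers. This sketch compares storage charges only, using the prices mentioned in this post. It deliberately leaves out EBS I/O charges and S3 GET/PUT charges, treats the rumoured $0.05 PersistentFS fee as if it were final, and assumes the EBS snapshot takes as much S3 space as the data stored. All of these are simplifying assumptions on my part.

```python
EBS_USD_PER_ALLOCATED_GB = 0.10     # per month, charged on allocated size
S3_USD_PER_STORED_GB = 0.15         # per month, charged on data actually stored
PERSISTENTFS_FEE_USD_PER_GB = 0.05  # rumoured price target, not a published price


def ebs_storage_cost(allocated_gb, snapshot_gb):
    """Monthly EBS volume charge plus S3 storage for its snapshot."""
    return allocated_gb * EBS_USD_PER_ALLOCATED_GB + snapshot_gb * S3_USD_PER_STORED_GB


def persistentfs_storage_cost(stored_gb):
    """Monthly S3 storage plus the assumed PersistentFS fee."""
    return stored_gb * (S3_USD_PER_STORED_GB + PERSISTENTFS_FEE_USD_PER_GB)


# A 100 GB volume at different fill levels; snapshot size assumed equal to the data stored.
for stored_gb in (20, 50, 100):
    ebs = ebs_storage_cost(allocated_gb=100, snapshot_gb=stored_gb)
    pfs = persistentfs_storage_cost(stored_gb)
    print(f"{stored_gb:>3} GB stored: EBS ${ebs:.2f}, PersistentFS ${pfs:.2f}")
```

Under these assumptions a 100 GB volume holding 20 GB costs about $13 per month on EBS against about $4 on PersistentFS, while a full volume costs about $25 against $20. Request charges on either side can easily shift that picture.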

But since EBS doesn't appear to be going to S3 for every IO operation, it clearly aims at those who need very high performance. In that case, it might just be what you have been waiting for.











Comment by Gary, on 25-Aug-2008 01:08

Nice article - I've added a link to The SOA Blog (http://www.thesoablog.com). I am a huge fan of Amazon's services.


Comment by Mark, on 6-Jun-2009 20:21

One little mentioned point that matters in some cases:

I think you'll find that EBS volume usage is charged per hour. Create a snapshot in S3, then create volumes from the snapshot and delete them when not used. Makes a difference in _some_ use cases - not all - i.e. where many instances use a common snapshot for an hour or two and then shut down.

HTH


Comment by Blake, on 13-Apr-2011 13:24

"In the end, I guess, it depends on your application. Some profiling may be required to predict what your actual cost will be. But don't forget that the storage for S3 backups always costs extra ($0.15 per GB). So, you really pay a total of $0.25 per GB per month."

Actually, snapshots seem to be compressed, so you can end up paying less than $0.25 per GB-month.

