S3 and EBS: A Cloud Storage Story


AWS is a big company with many different offerings, services and products. AWS S3 is storage for the internet: an object-based storage system that lets users store an unlimited number of objects on a pay-as-you-go model. EBS, the SAN offering from AWS, provides persistent block storage that is ideal for file systems. These products may appear similar because they all deal with storage of some sort. As a result, the internet is filled with debates about which service to use when. Take a look at this example in Stack Overflow and in Quora.

Look at S3: you can move petabytes, even zettabytes, of data into it and use it as a bulk repository, a data lake for analytics, or long-term storage. EBS is basically a network-based block storage service that stores data persistently. You can detach an EBS volume and attach it to another EC2 instance in the same Availability Zone, but you always need an EC2 instance to access the data. You can format the volume with any file system and mount it on any of the available EC2 instance types.

But when should you use one or the other? This article addresses that question, beginning with an overview of the two products’ features.


Comparison of Features

Storage size and limitations
S3:
  • No limit on the number of objects.
  • Individual S3 objects: a maximum of 5 TB.
  • Single PUT upload: a maximum of 5 GB. Larger objects must use multipart upload, which Amazon suggests for any object over 100 MB.
EBS:
  • Maximum volume size of 16 TB.
  • A limit of 40 EBS volumes for a single Linux instance and 26 EBS volumes for a single Windows instance.
  • No limitation on the total amount of data stored; performance depends on IOPS.

Data uploading
S3:
  • Multipart upload capability for objects larger than 100 MB.
EBS:
  • Provisioned IOPS for faster read/write I/O operations.

Performance
S3:
  • Highly scalable managed service.
  • Supports 100 PUT/LIST/DELETE requests per second by default.
  • Can automatically scale to 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second.
EBS:
  • Volume size is scaled manually.
  • Uses provisioned IOPS for increased performance.
  • Baseline performance of 3 IOPS per GB for General Purpose volumes.

Data stored
S3:
  • Data stays in the region. Replicas are made within the region.
EBS:
  • Data stays in the Availability Zone (AZ). Replicas are made within the AZ.

Data access
S3:
  • Data can be accessed over the internet from anywhere using the console, CLI, or REST or SOAP APIs.
EBS:
  • Can be accessed only by an EC2 instance.

File permissions/file system
S3:
  • Does not have a mountable file system.
  • Folder permissions do not propagate to subfolders by default, unlike in a traditional file system.
EBS:
  • Supports file systems (e.g., ext3, ext4).

Supported encryption mechanisms
S3, server-side encryption:
  • Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
  • Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS)
  • Server-Side Encryption with Customer-Provided Keys (SSE-C)
S3, client-side encryption:
  • Using an AWS KMS-Managed Customer Master Key (CMK)
  • Using a Client-Side Master Key
EBS:
  • Using an AWS KMS-Managed Customer Master Key (CMK), with the AES 256-bit encryption standard.

Access control
S3:
  • Bucket policies and user policies
  • Access management with ACLs
  • Pre-signed URLs
EBS:
  • Same as for the EC2 instance.

Availability
S3:
  • 99.99% available.
EBS:
  • 99.99% available.

Withstanding AZ failure
S3:
  • Can withstand up to two concurrent AZ failures.
EBS:
  • Cannot withstand an AZ failure without point-in-time EBS snapshots.

Durability
S3:
  • Eleven 9s of durability (99.999999999%).
EBS:
  • 20 times more reliable than a normal hard disk.

Consistency
S3:
  • Offers read-after-write consistency for PUTs of new objects and eventual consistency for overwrite PUTs and DELETEs.
  • This means that a newly uploaded object can be read right away, but after an overwrite or a delete, a read may return the previous data for a short time until the change is replicated across all S3 clusters in the same region.
EBS:
  • Block storage that offers persistence. It follows a strict consistency model, meaning that any write to the file system on the volume is instantly visible.

Pricing
S3:
  • Per GB of storage used per month
  • Number of requests made (e.g., POST, GET)
  • S3 Inventory
  • S3 Analytics
  • Storage Class Analysis
  • S3 Object Tagging
  • Data transfer per GB out of S3
  • Amazon S3 Transfer Acceleration
EBS:
  • Per GB of storage allocated per month
  • Provisioned IOPS
  • EBS snapshots
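
The S3 upload limits in the comparison above can be turned into a small planning helper. The following is a minimal sketch, not an AWS API: the function name `plan_upload` is hypothetical, sizes are treated as binary units, and the 5 MiB minimum part size and 10,000-part ceiling are S3 multipart limits that the comparison itself does not list.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB          # largest single S3 object (from the comparison)
MAX_PART_SIZE = 5 * GIB            # largest single PUT, also the largest part
SUGGESTED_THRESHOLD = 100 * MIB    # above this, multipart upload is suggested
MIN_PART_SIZE = 5 * MIB            # assumption: S3 minimum part size
MAX_PARTS = 10_000                 # assumption: S3 maximum parts per upload


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return (a + b - 1) // b


def plan_upload(object_size, part_size=100 * MIB):
    """Return (strategy, part_count) for an object of object_size bytes."""
    if object_size > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TB S3 object limit")
    if object_size <= SUGGESTED_THRESHOLD:
        return ("single_put", 1)
    # Keep the requested part size inside the allowed range.
    part_size = min(max(part_size, MIN_PART_SIZE), MAX_PART_SIZE)
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, ceil_div(object_size, MAX_PARTS))
    return ("multipart", ceil_div(object_size, part_size))
```

For example, `plan_upload(1 * GIB)` plans a multipart upload in 11 parts of up to 100 MiB each, while a 50 MiB object is sent with a single PUT.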

 

While EBS is best suited to serving as a file system and offers backup through snapshots, S3 offers some additional features. These include versioning (for backup and DR), lifecycle policies (archiving, deletion), static website hosting, and Requester Pays buckets (for cost optimization), to name just a few. Amazon S3 Transfer Acceleration, event notifications, and cross-region replication are also worth noting.
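
As an illustration of a lifecycle policy that archives and then deletes objects, the sketch below builds a rule as a plain Python dictionary. The dictionary follows the general shape of an S3 lifecycle configuration document; the helper name `lifecycle_rule`, the `logs/` prefix, and the day counts are hypothetical, and nothing is sent to AWS.

```python
import json


def lifecycle_rule(prefix, archive_after_days, delete_after_days):
    """Build one lifecycle rule: archive to Glacier, then expire."""
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they are deleted")
    return {
        "ID": "archive-then-expire-" + (prefix.strip("/") or "all"),
        "Filter": {"Prefix": prefix},      # apply only to keys under this prefix
        "Status": "Enabled",
        "Transitions": [
            {"Days": archive_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": delete_after_days},
    }


# Hypothetical policy: archive logs after 30 days, delete them after a year.
config = {"Rules": [lifecycle_rule("logs/", 30, 365)]}
print(json.dumps(config, indent=2))
```

A document of this shape would then be applied to a bucket through the console, the CLI, or an SDK.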


Which should you use… and when?

S3 is generally used for highly durable storage of data such as images and videos, and for short-term storage of logs. Storing this type of data in EBS volumes will not only drastically increase the size of the EBS volumes but also decrease the durability of your data.
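
To put the eleven 9s durability figure into perspective, here is a small worked calculation. It is a sketch that assumes the figure is the probability that a single object survives a given year; the function names are hypothetical.

```python
from fractions import Fraction

# 99.999999999% expressed exactly, to avoid floating-point rounding.
S3_DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(object_count, durability=S3_DURABILITY):
    """Expected number of objects lost per year."""
    return object_count * (1 - durability)


def years_per_lost_object(object_count, durability=S3_DURABILITY):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count, durability)


print(years_per_lost_object(10_000_000))
```

Under that assumption, a bucket holding ten million objects would be expected to lose a single object about once every 10,000 years.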

If you are planning to host I/O-intensive NoSQL or relational databases, or need boot volumes or file systems for servers running low-latency interactive apps, S3 cannot be used because of its latency.
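
When sizing an EBS volume for such a workload, the 3 IOPS per GB baseline quoted for General Purpose volumes gives a quick estimate. The sketch below is illustrative only: the floor of 100 IOPS and the cap of 10,000 IOPS are assumptions that do not come from this article and should be checked against current AWS limits, and both function names are hypothetical.

```python
def gp2_baseline_iops(size_gb, iops_per_gb=3, floor=100, cap=10_000):
    """Baseline IOPS for a General Purpose volume of size_gb gigabytes.

    floor and cap are assumed limits, exposed as parameters on purpose.
    """
    return max(floor, min(size_gb * iops_per_gb, cap))


def size_for_baseline(target_iops, iops_per_gb=3):
    """Smallest volume size in GB whose baseline reaches target_iops."""
    return -(-target_iops // iops_per_gb)  # integer ceiling division


print(gp2_baseline_iops(500))     # 500 GB * 3 IOPS per GB
print(size_for_baseline(3000))    # GB needed for a 3,000 IOPS baseline
```

If the workload needs more than the baseline can offer at a sensible volume size, that is the point at which Provisioned IOPS volumes become the better fit.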

On the other hand, S3 provides various storage classes for long-term data, easily recoverable data, and backup and disaster recovery files, provided you are willing to compromise on either the availability or the durability of the data (the trade-offs made by Standard Infrequent Access and Reduced Redundancy, respectively).

Consider a use case where you need to store a huge number of static content files (for example, images or videos). If all these files are stored in EBS volumes, the server must perform disk I/O on EBS for each file request, and server performance declines. In contrast, if the same files are hosted on S3, they are fetched directly from S3, and server performance improves because the number of requests hitting the server decreases. For faster delivery of S3 content, you can use Amazon CloudFront, the CDN offering from AWS.
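
One way to move static files off the server is to have the application emit links that point at S3, or at a CloudFront distribution in front of it, instead of at itself. The sketch below builds such links; the helper name `static_url`, the bucket name, the region, and the distribution domain are all hypothetical placeholders, and the objects are assumed to be publicly readable.

```python
from urllib.parse import quote


def static_url(key, bucket, region, cdn_domain=None):
    """Return an HTTPS URL for a static object.

    Uses the CDN domain when one is given, otherwise the bucket's
    virtual-hosted-style S3 endpoint.
    """
    host = cdn_domain or f"{bucket}.s3.{region}.amazonaws.com"
    # Percent-encode the key but keep "/" so the folder-like structure survives.
    return f"https://{host}/{quote(key, safe='/')}"


print(static_url("images/cat 1.png", "my-bucket", "us-east-1"))
print(static_url("images/cat 1.png", "my-bucket", "us-east-1",
                 cdn_domain="d111111abcdef8.cloudfront.net"))
```

With links like these, the web server only renders pages, and the image or video bytes never pass through it.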


Use S3 as a File System instead of the EBS

Sometimes users do want to use S3 as a file system. Although this is possible with the help of S3FS, it is not recommended by AWS.

S3FS can be thought of as a direct mapping of S3 as a file system. Files are mapped to objects, and the file-system metadata (e.g., the ownership and file modes) is stored inside the object’s metadata. It is like attaching an auto-scaling storage system to an instance, which can then access the data like a normal file system.
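
The mapping described above can be sketched in a few lines. This is an illustration of the idea, not S3FS's actual implementation: the function name `file_to_object` is hypothetical, and the metadata key names are illustrative and may differ from the ones S3FS really writes.

```python
import os
from pathlib import Path


def file_to_object(root, path):
    """Map a local file to an (object key, user metadata) pair.

    The key is the file's path relative to the mount root; ownership,
    mode, and modification time travel along as string metadata.
    """
    st = os.stat(path)
    key = Path(path).relative_to(root).as_posix()
    metadata = {
        "uid": str(st.st_uid),          # owning user
        "gid": str(st.st_gid),          # owning group
        "mode": str(st.st_mode),        # file type and permission bits
        "mtime": str(int(st.st_mtime)), # last modification time
    }
    return key, metadata
```

Calling `file_to_object("/mnt/data", "/mnt/data/reports/2017.csv")` would yield the key `reports/2017.csv` together with that file's ownership and mode, which is the information a file-system layer needs to present the object as a regular file again.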

The use case might be like this: you have an S3 bucket with huge amounts of data, and your EC2 instance needs to access all of the data, process it, and send the results back to the bucket.

Then why not EBS? The answer is simple: storage space.

Then why not always use S3FS? Well, because of its limitations.

Through S3FS, S3 objects can only be read or written as a whole. That means that even if you want only a small chunk of data, you have to download the entire object. The same applies to writes: random writes or appends to a file require rewriting the entire object. The size of an object cannot exceed 5 TB, and you also have to grant proper permissions using the object’s Access Control List.
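
A short calculation shows why whole-object rewrites hurt. The sketch below is a simplified model: it counts only uploaded bytes, assumes every append re-uploads the full object, and ignores any caching or partial-copy optimizations a real client might apply. The function names are hypothetical.

```python
def bytes_uploaded_with_rewrites(initial_size, append_size, append_count):
    """Total bytes uploaded when each append rewrites the whole object."""
    total = 0
    size = initial_size
    for _ in range(append_count):
        size += append_size
        total += size  # the entire, now larger, object goes over the wire
    return total


def bytes_written_in_place(append_size, append_count):
    """Total bytes written when appends only touch the new data."""
    return append_size * append_count


# 100 appends of 1 KB each to a 1 GB file.
print(bytes_uploaded_with_rewrites(1_000_000_000, 1_000, 100))
print(bytes_written_in_place(1_000, 100))
```

In this model, a hundred 1 KB appends to a 1 GB file transfer roughly 100 GB when every append rewrites the object, against 100 KB on block storage.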

Put differently, S3FS is commonly used only for medium and large files, and hence for backup and archiving. One simple advantage is the absence of explicit API calls on the user’s part, which saves time compared with storing data in EBS volumes and then copying it to S3.

It is worth mentioning that you cannot use either S3 or EBS on premises. Third-party vendors offer alternatives, but if you want full compatibility with the AWS interface, it might be better to use Stratoscale’s “build your own region” range of products, which offers both object and block storage.

S3/Storage

S3 might be somewhat slow compared to EBS because of eventual consistency, whereas Provisioned IOPS SSD volumes offer fast read-after-write performance. When it comes to durability, S3 clearly scores better than EBS volumes. In addition, an EBS volume can be accessed by only one machine at a time, much like a local file system, whereas S3 is built for the internet and can be accessed from anywhere using APIs.
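
Where eventual consistency matters, a client typically re-reads until it sees the data it expects. The helper below is a generic sketch of that pattern and not part of any AWS SDK: `read_until`, its parameters, and the backoff schedule are all illustrative choices, and `read` stands for whatever function fetches the object.

```python
import time


def read_until(read, is_fresh, attempts=5, delay=0.1, sleep=time.sleep):
    """Call read() until is_fresh(value) is true, backing off between tries.

    Returns the first fresh value, or raises TimeoutError once the
    attempts are used up.
    """
    for attempt in range(attempts):
        value = read()
        if is_fresh(value):
            return value
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)  # exponential backoff: delay, 2x, 4x, ...
    raise TimeoutError(f"value still stale after {attempts} attempts")
```

A caller that has just overwritten an object would pass a `read` function that downloads it and an `is_fresh` check that compares, say, a version marker inside the payload. With EBS no such loop is needed, because writes are visible immediately.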

If you are trying to use S3FS for regular use, you aren’t the first. Many people have tried, but the harsh truth is that it does not do well with general-purpose workloads. Choose between S3 and EBS volumes based on your intended functionality and needs.


 
February 13, 2017
