AWS offers many different services and products. Amazon S3 is storage for the internet: an object-based storage system that lets users store an unlimited number of objects on a pay-as-you-go model. EBS, the SAN offering from AWS, provides persistent block storage that is ideal for file systems. Many of these products may appear similar because they all deal with storage of some sort. As a result, the internet is filled with debates about which service to use when. Take a look at this example in Stack Overflow and in Quora.
Look at S3: you can move petabytes of data into it and use it as a bulk repository, a data lake for analytics, or long-term storage. EBS is a network-based block storage service that stores data persistently. You can detach an EBS volume and attach it to another machine, but you will need an EC2 instance to access the data. You can configure any file system on the volume and attach it to any of the available EC2 instance types.
But when should you use one or the other? This article addresses that question. First, let’s begin with an overview of the two products’ features.
Comparison of Features
|Feature||Amazon S3|Amazon EBS|
|Storage size and limitations||No limit on the number of objects. Individual S3 objects: a maximum of 5 TB. Single PUT upload: a maximum of 5 GB; larger objects must use multipart upload, which Amazon suggests for any object over 100 MB.|Maximum volume size of 16 TB. There is a limit of 40 EBS volumes for a single Linux instance and a limit of 26 EBS volumes for a single Windows instance.|
No limitation on size. Data stored based on IOPS
|Data Uploading||Multipart upload capability for objects larger than 100 megabytes||Provisioned IOPS for faster read/write IO operations|
|Data Stored||Data stored stays in the region. Replicas are made within the region.||Data stored stays in the Availability Zone. Replicas are made within the AZ.|
|Data Access||Data can be accessed over the internet from anywhere using the console, CLI, or REST or SOAP APIs.||Can be accessed only from an EC2 instance.|
|File Permissions/File System|
|Supported Encryption Mechanisms||Server-side encryption; client-side encryption|Encryption using an AWS KMS-managed customer master key (CMK), following the AES 256-bit encryption standard|
|Availability||99.99% available.||99.99% available.|
|Withstand AZ Failure||Can withstand up to two concurrent AZ failures.||Cannot withstand AZ failure without point-in-time EBS snapshots.|
|Durability||Eleven 9s (99.999999999%) of durability.|20 times more reliable than a normal hard disk.|
|Consistency||Amazon S3 offers read-after-write consistency for PUTs of new objects and eventual consistency for overwrite PUTs and DELETEs. This means that after an object is overwritten or deleted, a read may return the previous version for a short time, until the change has been replicated across all S3 clusters in the same region.|EBS is block storage that offers persistence. It follows a strict consistency model, meaning that any write to the volume is instantly visible.|
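The S3 size limits in the table can be turned into a small planning helper. The sketch below is only an illustration: the function name `plan_upload` and the 64 MiB default part size are hypothetical, the limits are expressed in binary units, and the 10,000-part cap and 5 MiB minimum part size come from S3's documented multipart limits rather than from this article.

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TiB        # largest object S3 will store
MAX_SINGLE_PUT = 5 * GiB         # largest object a single PUT can upload
MULTIPART_THRESHOLD = 100 * MiB  # size above which multipart is suggested
MIN_PART_SIZE = 5 * MiB          # smallest allowed part (except the last one)
MAX_PARTS = 10_000               # most parts one multipart upload may have


def plan_upload(size_bytes, part_size=64 * MiB):
    """Return (strategy, part_count) for an object of the given size.

    Hypothetical helper: it follows the suggestion to switch to multipart
    above 100 MB, well before the hard 5 GB single-PUT limit is reached.
    """
    if size_bytes < 0 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 0 and 5 TiB")
    if size_bytes <= MULTIPART_THRESHOLD:
        return ("single_put", 1)
    # Grow the part size if needed so the upload fits within MAX_PARTS.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(size_bytes / MAX_PARTS))
    return ("multipart", math.ceil(size_bytes / part_size))
```

For example, `plan_upload(1 * GiB)` chooses a multipart upload of 16 parts with the default part size, while a 10 MiB object is sent with a single PUT.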
While EBS is best suited to serve as a file system and offers backups through snapshots, S3 offers some additional features. These include versioning (for backup and DR), lifecycle policies (archiving, deletion), static website hosting, and requester-pays buckets (for cost optimization), to name just a few. Amazon S3's transfer, event notification, and cross-region replication capabilities are also worth noting.
Which should you use… and when?
S3 is generally used for highly durable data storage, such as images, videos, and short-term storage of logs. Storing this type of data in EBS volumes will not only drastically increase the size of the EBS volumes but also decrease the durability of your data.
If you are planning to host I/O-intensive NoSQL or relational databases, or to use boot volumes or file systems for servers running low-latency interactive apps, S3 cannot be used because of its latency.
On the other hand, S3 provides various storage classes for long-term data, easily recoverable data, and backup and disaster recovery files, provided you are willing to compromise on either the availability or the durability of the data (the trade-offs made by Standard-Infrequent Access and Reduced Redundancy, respectively).
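Moving data between storage classes is usually automated with a lifecycle policy. The following is a minimal sketch, assuming boto3 is the client library in use; the helper names, the prefix, and the 30-day and 365-day values are hypothetical choices for illustration, not recommendations.

```python
def build_lifecycle_rules(prefix, days_to_infrequent_access=30, days_to_expire=365):
    """Build a lifecycle configuration dict in the shape the S3 API expects.

    Objects under `prefix` move to Standard-Infrequent Access after the
    first interval and are deleted after the second (hypothetical values).
    """
    if days_to_expire <= days_to_infrequent_access:
        raise ValueError("objects must expire after they transition")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_to_infrequent_access, "StorageClass": "STANDARD_IA"}
                ],
                "Expiration": {"Days": days_to_expire},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Send the configuration to S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )
```

Keeping the rule builder separate from the API call makes it possible to review or unit-test the policy before it touches a real bucket.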
Consider a use case where you need to store a huge number of static content files (for example, images or videos). If all these files are stored in EBS volumes, the server has to perform disk I/O on the volume for each file request, and server performance declines. In contrast, if the same files are hosted on S3, they are fetched directly from S3, and server performance improves because the number of requests hitting the server decreases. For faster delivery of S3 content, you can use Amazon CloudFront, the CDN offering from AWS.
Using S3 as a File System Instead of EBS
Sometimes users do want to use S3 as a file system. Although this is possible with the help of S3FS, it is not recommended by AWS.
S3FS can be thought of as a direct mapping of S3 onto a file system. Files are mapped to objects, and the file-system metadata (e.g., ownership and file modes) is stored inside the object’s metadata. It is like attaching an auto-scaling storage system to an instance that can access the data like a normal file system.
The use case might be like this: you have an S3 bucket with huge amounts of data, and your EC2 instance needs to access all of the data, process it, and send the results back to the bucket.
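For comparison, the same read, process, and write-back loop can be done with explicit API calls instead of a mounted file system. This is a rough sketch, assuming a boto3-style S3 client is passed in; the function name `process_bucket` and the bucket and prefix names in the usage note are hypothetical.

```python
def process_bucket(s3, bucket, in_prefix, out_prefix, transform):
    """Read every object under in_prefix, transform it, write it under out_prefix.

    `s3` is expected to behave like boto3.client("s3"); `transform` takes
    and returns bytes. Each object is loaded fully into memory, so this
    sketch suits small and medium objects only.
    """
    written = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=in_prefix):
        for item in page.get("Contents", []):
            key = item["Key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            out_key = out_prefix + key[len(in_prefix):]
            s3.put_object(Bucket=bucket, Key=out_key, Body=transform(body))
            written.append(out_key)
    return written
```

On an EC2 instance with suitable credentials, a call might look like `process_bucket(boto3.client("s3"), "example-bucket", "raw/", "results/", bytes.upper)`.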
Then why not EBS? The answer is simple: storage space.
Then why not always use S3FS? Well, because of its limitations.
Through S3FS, objects are effectively read or written as a whole, which means that even if you want only a small chunk of data, you may have to download the entire object. The size of an object cannot exceed 5 TB. You also have to provide proper permissions using the object’s Access Control List. The same applies to writes: random writes or appends to a file require rewriting the entire file.
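The cost of whole-object writes is easy to see in a toy model. The sketch below does not talk to S3 or S3FS; `ObjectStore` is a hypothetical in-memory stand-in that only supports whole-object reads and writes, and it counts the bytes moved when a short line is appended to a 1 MB file.

```python
class ObjectStore:
    """Toy whole-object store: no partial writes, like a plain object PUT."""

    def __init__(self):
        self._objects = {}
        self.bytes_transferred = 0

    def get(self, key):
        data = self._objects[key]
        self.bytes_transferred += len(data)   # whole object is downloaded
        return data

    def put(self, key, data):
        self._objects[key] = bytes(data)
        self.bytes_transferred += len(data)   # whole object is uploaded


def append(store, key, extra):
    """Append by downloading the whole object and uploading it again."""
    store.put(key, store.get(key) + extra)


store = ObjectStore()
store.put("app.log", b"x" * 1_000_000)        # existing 1 MB log file
store.bytes_transferred = 0                   # measure only the append
append(store, "app.log", b"one more line\n")  # adds just 14 bytes
print(store.bytes_transferred)                # prints 2000014
```

Adding 14 bytes moves about 2 MB in this model, which is why append-heavy workloads such as active log files are a poor fit for a file system layered on object storage.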
Put differently, S3FS is commonly used only for medium and large files, and hence for backup and archiving. One simple advantage is that the user makes no explicit API calls, which saves time compared with writing data to EBS volumes and then copying it to S3.
It is worth mentioning that you cannot use either S3 or EBS on premises. Third-party vendors offer alternatives, but if you want full compatibility with the AWS interface, it might be better to use Stratoscale’s “build your own region” range of products, which offers both object and block storage.
S3 can be somewhat slower than EBS: every request travels over the network through an API, and under eventual consistency a read that follows an overwrite may briefly return stale data, whereas Provisioned IOPS SSD volumes offer fast reads immediately after writes. When it comes to durability, S3 clearly scores better than EBS volumes. In addition, an EBS volume can only be accessed by one machine at a time, much like a local disk, whereas S3 is built for the internet and can be accessed from anywhere using APIs.
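To put the durability figure in perspective, here is a short worked calculation. It assumes that "eleven 9s" means an average annual loss rate of 1 in 10^11 objects; the 10 million object count is an arbitrary example.

```python
from fractions import Fraction

# "Eleven 9s" durability: 99.999999999% of objects survive a given year,
# so the assumed annual loss rate is 1 in 10**11.
ANNUAL_LOSS_RATE = Fraction(1, 10 ** 11)


def expected_annual_loss(object_count):
    """Expected number of objects lost per year at eleven 9s durability."""
    return object_count * ANNUAL_LOSS_RATE


loss = expected_annual_loss(10_000_000)   # example: 10 million stored objects
print(float(loss))                        # expected objects lost per year
print(int(1 / loss))                      # average years per single lost object
```

Under these assumptions, storing 10 million objects leads to an expected loss of one object roughly every 10,000 years.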
If you are trying to use S3FS for regular use, you aren’t the first. Many people have tried it, but in the end, the harsh truth is that it does not do well with general-purpose workloads. Choose between S3 and EBS volumes based on the functionality you need.