The computing world has changed dramatically in recent years. The widespread adoption of mobile and web services has created an explosion of unstructured data. Massive amounts of pictures, music, and video files are created and consumed all over the world every day. This content needs to be distributed across geographies, and its availability needs to be maintained. Although file and block storage systems have been used for decades, this explosion of unstructured data, the proliferation of globally distributed cloud-based architectures, and the need for simpler, lower-cost storage solutions gave rise to a new type of storage called object storage. In this blog post we will discuss the differences between file storage, block storage, and object storage in order to understand when and how you should use this new form of storage.
Block Storage Characteristics
Block storage solutions have been around for many years and are considered “tried and tested” solutions that offer high performance, security and availability. Block storage systems manage the data in blocks, which are stored on disks and accessed via low-level storage protocols, such as SCSI commands. When the appropriate blocks are combined, they form a file. The direct access to the data reduces overhead by minimizing abstraction layers. Higher-level tasks such as multi-user access, sharing, locking and security are usually handled by the operating system. There is no storage-side metadata associated with the block, except for the address, and even that, arguably, is not metadata about the block. In other words, the block is simply a chunk of data that has no description, no association and no owner. Block storage is considered the best solution for performance-sensitive, transactional, and database-oriented applications. As such, it is mostly used inside a local network. Adding distance between the application and the storage would severely harm performance.
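To make the idea of address-only blocks concrete, here is a minimal Python sketch. It assumes a 512-byte block size and uses hypothetical names (BlockDevice, write_file, read_file); it only simulates a device in memory and is not how any real SCSI stack is implemented. Note that the device knows nothing about files: the host keeps the list of block addresses and the file length.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


class BlockDevice:
    """Hypothetical in-memory device: fixed-size blocks addressed by number only."""

    def __init__(self, num_blocks: int) -> None:
        self._blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, address: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        self._blocks[address] = bytes(data)

    def read_block(self, address: int) -> bytes:
        return self._blocks[address]


def write_file(device: BlockDevice, start: int, payload: bytes) -> list:
    """Host-side logic: split the payload across consecutive blocks."""
    addresses = []
    for offset in range(0, len(payload), BLOCK_SIZE):
        # Pad the last chunk with zero bytes so every block is full-size.
        chunk = payload[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
        address = start + len(addresses)
        device.write_block(address, chunk)
        addresses.append(address)
    return addresses


def read_file(device: BlockDevice, addresses: list, length: int) -> bytes:
    """Host-side logic: combine the blocks and trim the padding to recover the file."""
    return b"".join(device.read_block(a) for a in addresses)[:length]


device = BlockDevice(num_blocks=16)
payload = b"x" * 1000
addresses = write_file(device, start=4, payload=payload)
restored = read_file(device, addresses, len(payload))
```

Everything that makes the blocks a "file" (its name, length, and block list) lives on the host side in this sketch, which mirrors the point that the storage itself holds no description of the data.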
File System Characteristics
Files in file systems are organized in folders as a hierarchy of directories, subdirectories and files that use naming conventions based on characteristics such as extensions, categories or applications. The file system stores files using relatively simple metadata, such as file name, creation date, creator, file type, most recent change and last access. When the number of files is relatively small, locating a file is manageable. However, as the number of files grows into the billions, it becomes much more difficult. File system scalability is generally supported by scale-out NAS systems that, similar to object storage, scale horizontally by adding nodes. But because they’re based on hierarchical file structures with a limited namespace, they’re more restricted than the nearly infinitely scalable flat structure of object storage systems.
What is Object Storage?
Object storage manages data as objects. Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object storage device), the system level, and the interface level.
Highly granular metadata
Unlike a file system, object storage stores files as objects in different locations, and each object has a unique identifier and a large amount of metadata. Although the amount of metadata varies, it is significantly greater than the metadata that is managed in file systems. Object metadata frequently includes a summary of the content in the file, keywords, key points, comments, locations of associated objects, data protection policies, security, access, geographic locations and more. This means that unlike block storage and file systems, object storage protects, manages, manipulates and keeps objects at a much finer level of granularity. An object is not limited to any type or amount of metadata (although S3 limits user metadata to 2 KB). You can assign metadata such as the application association, the importance of an application, the data protection level that you want to assign to an object, replication instructions to another site or sites, when to move this object to a different tier of storage or to a different geography, and when to delete this object. This type of metadata goes way beyond the access control lists used in file systems and, of course, well beyond block storage metadata.
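As a rough illustration of the kind of per-object metadata described above, the following Python sketch models an object as data plus a free-form metadata dictionary and a unique ID. The class name, field names, and metadata keys are all hypothetical and not taken from any particular product. The size check only mirrors the 2 KB S3 user-metadata limit mentioned above, and the way it measures size (UTF-8 bytes of keys and values) is an approximation.

```python
import uuid
from dataclasses import dataclass, field

# Mirrors the S3 user-metadata limit cited above; how size is measured is assumed.
USER_METADATA_LIMIT_BYTES = 2 * 1024


@dataclass
class StoredObject:
    """Hypothetical object: payload, free-form metadata, and a unique ID."""
    data: bytes
    metadata: dict = field(default_factory=dict)
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def metadata_size(self) -> int:
        # Approximate size: UTF-8 bytes of all keys and values.
        return sum(
            len(str(key).encode("utf-8")) + len(str(value).encode("utf-8"))
            for key, value in self.metadata.items()
        )

    def within_user_metadata_limit(self) -> bool:
        return self.metadata_size() <= USER_METADATA_LIMIT_BYTES


photo = StoredObject(
    data=b"...image bytes...",
    metadata={
        "application": "photo-sharing",        # application association
        "protection-level": "3-copies",        # data protection level
        "replicate-to": "eu-site",             # replication instruction
        "move-to-cold-tier-after-days": 90,    # tiering policy
        "delete-after-days": 3650,             # retention policy
    },
)
```

The point of the sketch is that policy travels with each object, so two objects in the same store can carry entirely different protection, tiering, and retention instructions.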
Object management and identification
Objects contain descriptive properties that can be used for better indexing or management, so administrators do not have to perform lower-level storage functions such as constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure. Object storage manages the objects using unique “object IDs”. The application can then retrieve the object by presenting the object ID to the object storage. These unique identifiers, which are used within a bucket or across an entire system, support much larger namespaces than the ones used in block storage, thereby eliminating name collisions. This means that objects may be local or geographically separated, but because they are in a flat address space, they can be retrieved in exactly the same way.
Objects can be created, deleted and read, but they can’t be updated in place. Instead, objects are updated by creating new object versions. This means that the challenges of locking and multi-user access simply don’t exist. If multiple users update the same “object” concurrently, the object storage system will simply write different versions of the object.
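The create, read, delete, and version-on-update behavior described above can be sketched with a small in-memory store. This is a minimal sketch with hypothetical names (VersionedObjectStore, update); real systems differ in how they number, expose, and expire versions.

```python
import uuid
from typing import Dict, List, Optional


class VersionedObjectStore:
    """Hypothetical in-memory store: flat namespace, no in-place updates."""

    def __init__(self) -> None:
        # object ID -> list of immutable versions, oldest first
        self._versions: Dict[str, List[bytes]] = {}

    def create(self, data: bytes) -> str:
        object_id = uuid.uuid4().hex
        self._versions[object_id] = [bytes(data)]
        return object_id

    def update(self, object_id: str, data: bytes) -> int:
        """'Update' by appending a new version; earlier versions are untouched."""
        self._versions[object_id].append(bytes(data))
        return len(self._versions[object_id])  # 1-based version number

    def read(self, object_id: str, version: Optional[int] = None) -> bytes:
        versions = self._versions[object_id]
        if version is None:
            return versions[-1]  # latest version by default
        if not 1 <= version <= len(versions):
            raise ValueError("no such version")
        return versions[version - 1]

    def delete(self, object_id: str) -> None:
        del self._versions[object_id]


store = VersionedObjectStore()
oid = store.create(b"draft")
# Two users "update" the same object: each write becomes its own version.
v_a = store.update(oid, b"edit from user A")
v_b = store.update(oid, b"edit from user B")
```

Because neither write overwrites the other, no lock is needed in this sketch; both edits remain readable by version number.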
The lack of in-place update support enables multi-node-copy object redundancy with very little complexity. Object storage accomplishes redundancy and high availability by storing copies of the same object on multiple nodes. When an object is created, it’s created on one node and subsequently copied to one or more additional nodes, depending on the policies in place. The nodes can be within the same data center or geographically dispersed.
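The copy-to-multiple-nodes scheme described above might look like the following sketch. The Node and ReplicatedStore classes, the node names, and the ring-style placement rule are assumptions made for illustration; actual systems use their own placement algorithms and typically replicate asynchronously.

```python
from typing import Dict, List


class Node:
    """Hypothetical storage node that keeps objects in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: Dict[str, bytes] = {}
        self.online = True


class ReplicatedStore:
    def __init__(self, nodes: List[Node], copies: int) -> None:
        if not 1 <= copies <= len(nodes):
            raise ValueError("copies must be between 1 and the number of nodes")
        self.nodes = nodes
        self.copies = copies  # the replication policy

    def placement(self, object_id: str) -> List[Node]:
        # Deterministic placement: pick a starting node from the ID,
        # then take the next `copies` nodes in ring order.
        start = sum(object_id.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.copies)]

    def create(self, object_id: str, data: bytes) -> None:
        primary, *replicas = self.placement(object_id)
        primary.objects[object_id] = data   # written to one node first
        for node in replicas:               # then copied to the others
            node.objects[object_id] = data

    def read(self, object_id: str) -> bytes:
        # Any online node holding a copy can serve the read.
        for node in self.placement(object_id):
            if node.online and object_id in node.objects:
                return node.objects[object_id]
        raise KeyError(object_id)


nodes = [Node("dc1-a"), Node("dc1-b"), Node("dc2-a")]
store = ReplicatedStore(nodes, copies=2)
store.create("photo-42", b"jpeg bytes")
store.placement("photo-42")[0].online = False  # simulate a failed primary
data = store.read("photo-42")                  # served from the replica
```

Since objects are never modified in place, the copies cannot diverge after they are written, which is why this kind of replication needs so little coordination.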
In addition, object storage can use erasure codes to protect data. The data is broken into fragments, expanded and encoded with redundant data pieces and stored across a set of different locations. If data becomes corrupted at some point, it can be reconstructed by using information about the data that’s stored elsewhere.
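To show the principle behind erasure coding, here is a deliberately simple sketch that splits data into k fragments and adds a single XOR parity fragment, so any one lost fragment can be rebuilt from the others. This is the simplest possible scheme, chosen for readability; production systems typically use stronger codes such as Reed-Solomon that tolerate several lost fragments. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def decode(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the data when at most one fragment (data or parity) is missing."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can recover only one lost fragment")
    fragments = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        # XOR of all surviving fragments equals the missing one.
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    # Drop the parity fragment and the padding.
    return b"".join(fragments[:-1])[:original_len]


payload = b"object storage keeps data safe"
stored = encode(payload, k=4)   # 4 data fragments + 1 parity fragment
stored[2] = None                # simulate losing one fragment
recovered = decode(stored, len(payload))
```

In a real deployment each fragment would live on a different node or site, so losing one location costs far less extra capacity than keeping a full second copy.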
Access through HTTP-based REST API
Another important aspect of object storage is that objects can be accessed through an HTTP-based REST application programming interface. These are simple calls such as GET, PUT, and DELETE. On one hand, the simplicity of this interface is a great advantage for web-based applications; on the other hand, legacy applications that were probably written to use SCSI, CIFS or NFS calls will need to be updated.
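The sketch below shows how the three basic calls map onto plain HTTP requests, using only Python's standard library. The endpoint https://objects.example.com, the bucket/object URL layout, and the helper names are hypothetical. The requests are built but not sent, and real services typically also require authentication, which is omitted here.

```python
from urllib.request import Request

BASE_URL = "https://objects.example.com"  # hypothetical endpoint, not a real service


def object_url(bucket: str, object_id: str) -> str:
    # Assumed URL layout: /<bucket>/<object ID>
    return f"{BASE_URL}/{bucket}/{object_id}"


def put_request(bucket: str, object_id: str, data: bytes,
                content_type: str = "application/octet-stream") -> Request:
    """Store (or create a new version of) an object."""
    return Request(object_url(bucket, object_id), data=data, method="PUT",
                   headers={"Content-Type": content_type})


def get_request(bucket: str, object_id: str) -> Request:
    """Retrieve an object by its ID."""
    return Request(object_url(bucket, object_id), method="GET")


def delete_request(bucket: str, object_id: str) -> Request:
    """Delete an object by its ID."""
    return Request(object_url(bucket, object_id), method="DELETE")


upload = put_request("photos", "photo-42", b"jpeg bytes", "image/jpeg")
download = get_request("photos", "photo-42")
removal = delete_request("photos", "photo-42")
# Against a real service, each request would be sent with
# urllib.request.urlopen(request) after adding the required credentials.
```

The whole interface is a URL plus a verb, which is what makes it convenient for web applications and also why software written against SCSI, CIFS or NFS needs changes before it can use it.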
Supports the use of commodity infrastructure
Object storage can also be less expensive in terms of the underlying infrastructure. Although you can certainly use expensive RAID arrays to build an object store, generally this type of storage is used with commodity hardware. Since data protection is generally accomplished by replicating objects to one or more nodes in the cluster, scaling becomes a simple matter of adding nodes.
Typical use cases
As we have seen, an object store is easy to manage, can scale almost infinitely, can transcend geographic boundaries and multiple instances of physical hardware in a single namespace, can carry varied amounts of metadata, and can support data management functions like data replication and data distribution at object-level granularity. However, it generally offers lower performance, it’s inadequate for transactional data that changes frequently, such as databases, and it’s not designed to replace NAS for access to shared files because it doesn’t have the locking and file-sharing facilities that ensure the single “truth” of a file.
Object storage is, therefore, best suited as a tremendously scalable data store for unstructured data that’s updated infrequently, either as an additional storage tier beyond transactional storage tiers for inactive data or as archival storage. In the cloud space, it’s well suited for file content, especially images and videos. Today, object storage is mainly used for post-process-type data such as that found in the media, entertainment and healthcare industries, as well as for archiving.
In the past, object storage was mainly used for archiving. However, newer object storage systems have gained traction with very large applications like eBay. Object storage systems like Facebook’s Haystack have also scaled impressively, reportedly adding 350 million photos daily and storing 240 billion photos, which amounts to as much as 357 petabytes. Object storage has also become pervasive among new web and mobile applications that choose it as a common way to store binary data. Cloud services such as AWS S3 and Windows Azure Storage have been the main contributors to this trend. AWS S3 has grown to massive scale, citing over 2 trillion objects stored as of April 2013, while Azure cites over 20 trillion objects stored. However, Windows Azure Storage manages Blobs (user files), Tables (structured storage), and Queues (message delivery) and counts them all as objects. In corporate data centers, object storage systems are deployed as archival and file-aggregation storage tiers that supplement traditional data storage.
Although object storage represents a very small portion of a storage market that’s dominated by traditional block- and file-based storage, the combination of unprecedented scalability and distributed access has enabled it to succeed in the cloud storage space. And since cloud is undoubtedly the dominant IT trend for years to come, it is safe to say that object storage will take a more central role in the future storage market, as standardization evolves and integration with traditional storage systems and peer object storage systems becomes a reality.