AWS Storage Services

Based on the following whitepaper:
https://d0.awsstatic.com/whitepapers/Storage/AWS%20Storage%20Services%20Whitepaper-v9.pdf  (Dec 2016)
Before delving into the finer details, it is better that the reader gets acquainted with storage terminology and technology in the non-cloud world and the differences among them.
Some links that will help you:

http://www.computerweekly.com/feature/Storage-101-Object-storage-vs-block-vs-file

http://www.techrepublic.com/blog/the-enterprise-cloud/block-level-storage-vs-file-level-storage-a-comparison/

https://stonefly.com/resources/what-is-file-level-storage-vs-block-level-storage
https://www.lifewire.com/san-vs-nas-818005 

RAID : http://searchstorage.techtarget.com/definition/RAID


1.Amazon S3:

Object storage. Objects can range from 1 byte to 5 TB. Concurrent read/write is possible. S3 offers three storage classes: S3 Standard, S3 Standard-IA for infrequent access, and Amazon Glacier for archival.
Every solution should be evaluated against seven criteria.

1.Usage Patterns:
Four common usage patterns: storing and distributing static web content, hosting an entire static website, serving as a data store for computation and large-scale analytics, and acting as an archive and backup solution.
S3 is not good for: file systems, SQL database storage, rapidly changing data, archival data (use Glacier), or dynamic website hosting.

2.Performance:
Concurrent access by multiple servers or threads is possible. Throughput scales to match demand. Multipart upload of huge files is supported. S3 can be paired with a database or search service to improve discovery and performance. Amazon S3 Transfer Acceleration speeds up transfers between the client and an S3 bucket.
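A minimal sketch of a multipart upload with the Python SDK (boto3); the bucket and file names here are placeholders:

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client("s3")

    # Split anything over 100 MB into 25 MB parts uploaded in parallel;
    # a failed part is retried on its own instead of restarting the file.
    config = TransferConfig(
        multipart_threshold=100 * 1024 * 1024,
        multipart_chunksize=25 * 1024 * 1024,
        max_concurrency=8,
    )

    s3.upload_file("backup.tar.gz", "example-bucket",
                   "backups/backup.tar.gz", Config=config)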


3.Durability and Availability:
Designed for 99.999999999% (11 nines) durability and 99.99% availability. Data is redundantly stored across multiple facilities (AZs), so there is no single point of failure. There is also the option of cross-region replication. Suits as primary data storage.


4.Scalability and Elasticity:
Supports a virtually unlimited number of objects in any bucket.

5.Security:
Provides multiple mechanisms for fine-grained access control (IAM policies, bucket policies, ACLs). Encryption of data at rest as well as in transit (SSL/TLS) is possible. Versioning, MFA Delete protection for buckets, and access logging are supported.
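A short boto3 sketch of some of these controls (the bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Turn on versioning so overwritten or deleted objects can be recovered.
    s3.put_bucket_versioning(
        Bucket="example-bucket",
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Store an object encrypted at rest with S3-managed keys (SSE-S3).
    s3.put_object(
        Bucket="example-bucket",
        Key="reports/q1.csv",
        Body=b"col1,col2\n1,2\n",
        ServerSideEncryption="AES256",
    )

    # Grant temporary, fine-grained read access with a presigned URL.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-bucket", "Key": "reports/q1.csv"},
        ExpiresIn=3600,  # valid for one hour
    )
    print(url)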


6.Interfaces:
REST-based interface for management as well as data operations. Buckets are uniquely named ("top-level folders"), and each object has a unique object key. Although S3 is a web-based object store with a flat naming structure rather than a traditional file system, you can easily emulate a file system hierarchy (folder1/folder2/folder3) in Amazon S3 by using object key names that correspond to the full path name of each file. Most applications built on S3 use the AWS SDKs, which wrap the REST APIs; SDKs are available for iOS, Android, PHP, Java, etc. The AWS CLI gives Linux-style file system commands. S3 can also be linked to notification services (SNS, SQS, Lambda).
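For example, listing one "folder" level of the emulated hierarchy with boto3 (names are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Prefix plus delimiter makes the flat key space look like a directory:
    # keys directly under folder1/folder2/ are returned, while deeper
    # "subfolders" are summarized in CommonPrefixes.
    resp = s3.list_objects_v2(
        Bucket="example-bucket",
        Prefix="folder1/folder2/",
        Delimiter="/",
    )
    for sub in resp.get("CommonPrefixes", []):
        print("folder:", sub["Prefix"])
    for obj in resp.get("Contents", []):
        print("file:", obj["Key"])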

7.Cost Model:

Pay for the storage you actually use. Three cost components: storage (per GB per month), data transfer out (per GB per month), and requests (per thousand requests per month). There is an additional Transfer Acceleration fee, charged only when the accelerated transfer is actually faster than a regular transfer.




2.Amazon Glacier:

Extremely low-cost storage for data archiving and online backup, around $0.007 per GB per month. Fully managed service. An archive can represent a single file or combine multiple files. Archives are stored in vaults. Retrieving an archive involves initiating a job, which can take 3 to 5 hours to complete. It is easy to move data between S3 and Glacier using S3 lifecycle policies.

1.Usage Patterns:
Use cases include archiving, media assets, research and scientific data, digital preservation, and magnetic tape replacement.
Glacier is not good for: rapidly changing data, or data that needs immediate access.

2.Performance:
Retrieval jobs complete in 3 to 5 hours. A single archive is limited to 40 TB. Multipart upload can save time on large uploads. Range retrieval (fetching only part of an archive) is possible.
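A boto3 sketch of a range retrieval via the native API; the vault name and archive ID are placeholders:

    import boto3

    glacier = boto3.client("glacier")

    # Kick off an asynchronous retrieval job for the first 1 MiB of an
    # archive (ranges must be megabyte-aligned). accountId "-" means
    # "the account that owns the credentials".
    job = glacier.initiate_job(
        accountId="-",
        vaultName="example-vault",
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": "EXAMPLE_ARCHIVE_ID",
            "RetrievalByteRange": "0-1048575",
        },
    )

    # Hours later, once the job completes, the bytes can be downloaded.
    output = glacier.get_job_output(
        accountId="-",
        vaultName="example-vault",
        jobId=job["jobId"],
    )
    data = output["body"].read()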

3.Durability and Availability:
99.999999999% (11 nines) durability and 99.99% availability. Redundantly stores data in multiple facilities and on multiple devices, synchronously replicated. Performs automatic integrity checks and is self-healing.


4.Scalability and Elasticity:
A single archive is limited to 40 TB, but there is no limit to the total amount of data that can be stored.

5.Security:
IAM controls which users can access Glacier. Data at rest is encrypted with AES-256; Amazon handles key management. Customers who want to manage their own keys can encrypt data before uploading. Glacier allows you to lock vaults where compliance-mandated retention is needed, for example with a Vault Lock policy of "never delete". Integrated with CloudTrail for auditing.
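A sketch of the two-step Vault Lock flow with boto3, using a policy that denies deletion of archives younger than one year (the vault name and account ID in the ARN are placeholders):

    import json
    import boto3

    glacier = boto3.client("glacier")

    # Deny deletion of any archive less than 365 days old.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "deny-early-delete",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": "arn:aws:glacier:us-east-1:123456789012:vaults/example-vault",
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}
            },
        }],
    }

    # Step 1: attach the policy in an in-progress (still testable) state.
    lock = glacier.initiate_vault_lock(
        accountId="-",
        vaultName="example-vault",
        policy={"Policy": json.dumps(policy)},
    )

    # Step 2: completing the lock makes the policy immutable.
    glacier.complete_vault_lock(
        accountId="-",
        vaultName="example-vault",
        lockId=lock["lockId"],
    )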


6.Interfaces:
Two ways to use Glacier.
a)Glacier provides a native standards based REST web service interface. This interface can be accessed using Java SDK or .Net SDK.  AWS management console or Glacier API can be used for management functions or data operations.
b)Glacier is used as a storage class in S3 for automatic policy-driven archiving using life cycle rules. S3 API includes RESTORE operation. But it takes 3 to 5 hours. Retrieved puts a copy in S3's Reduced Redundancy Storage ( RRS ). 
Important to note that objects archived using one of the above approach can only be retrieved using the same approach. You can't even see them as archives  vaults in the vault.
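A sketch of the S3-side restore with boto3 (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Ask S3 to pull the archived object back from Glacier and keep the
    # temporary copy around for 7 days.
    s3.restore_object(
        Bucket="example-bucket",
        Key="archive/2016/logs.tar.gz",
        RestoreRequest={"Days": 7},
    )

    # The Restore field of a HEAD request shows ongoing-request status;
    # the object body becomes readable once the restore finishes.
    head = s3.head_object(Bucket="example-bucket",
                          Key="archive/2016/logs.tar.gz")
    print(head.get("Restore"))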

7.Cost Model:
Pay for the storage you actually use. Three cost components: storage (per GB per month), data transfer out (per GB per month), and requests (UPLOAD/RETRIEVAL, per thousand per month). You can retrieve up to 5 percent of your stored data for free each month (prorated daily).



3.Amazon EFS:

Network file system for EC2. Supports NFSv4, which makes it easier to migrate enterprise applications to AWS or build new ones. Highly scalable: a file system can grow to petabytes and allows massively parallel access from multiple EC2 instances. Stores data and metadata across multiple AZs in a region. Each file system is accessed by EC2 instances via mount targets, which are created per AZ; there is one mount target per AZ in the VPC. Traffic flow is controlled by the security groups (SGs) of the EC2 instances and the SG associated with the EFS mount target. Access to EFS objects is controlled by Unix-style read/write/execute permissions based on user and group IDs.

http://docs.aws.amazon.com/efs/latest/ug/how-it-works.html


1.Usage Patterns:
Good for multi-threaded applications and applications that concurrently access data from multiple EC2 instances and need substantial levels of aggregate throughput and IOPS. Good for larger files, as the per-operation overhead is amortized. Good for analytics, media processing, content management, web serving, and home directories.
EFS is not good for archival data, RDBMS storage, or temporary storage.


2.Performance:
Data transfer rate units: Mega (M) = 1,000,000; Mebi (Mi) = 2^20 = 1,048,576. B = bytes; b = bits.
There are two performance modes for EFS: General Purpose and Max I/O. If a file system will exceed 7,000 file operations per second, Max I/O is recommended. EFS is optimized to burst to high throughput rates for shorter durations; a credit system decides the burst rate and duration. The BurstCreditBalance metric in CloudWatch shows the credit status.
Linux kernel version 4 or later and NFSv4.1 are recommended for clients accessing EFS. Refer to the recommended mount options when mounting.
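A boto3 sketch of creating a Max I/O file system and one mount target per AZ subnet (IDs are placeholders); the NFS mount itself is then done from the instance OS:

    import boto3

    efs = boto3.client("efs")

    # CreationToken makes the call idempotent; Max I/O is chosen here
    # because we expect more than 7,000 file operations per second.
    fs = efs.create_file_system(
        CreationToken="analytics-shared-fs",
        PerformanceMode="maxIO",
    )

    # One mount target per AZ; the security group acts as the NFS firewall.
    # (Real code should first wait for the file system to become available.)
    for subnet in ["subnet-aaaa1111", "subnet-bbbb2222"]:
        efs.create_mount_target(
            FileSystemId=fs["FileSystemId"],
            SubnetId=subnet,
            SecurityGroups=["sg-0123abcd"],
        )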


3.Durability and Availability:
Designed to be as highly durable and available as S3. Each file system object (file, directory, link) is redundantly stored across multiple AZs.


4.Scalability and Elasticity:
An EFS file system can grow from empty to petabyte scale automatically.

5.Security:
There are three levels of access controls to consider while planning EFS file system security.
IAM permissions for API calls. ( For managing/controlling file system resources which us the primary resource. Tags and mount targets are secondary resources )
Security Groups ( SG ) for instances and mount targets.One SG for instance and another for mount targets. These SGs acts like firewalls to control traffic in and out of EFS.
NFS level users groups and permissions. Users and groups are mapped to numerical identifiers which are mapped to EFS users to represent file ownership.Files and directories inside EFS are owned by single user and single group.


6.Interfaces:
Amazon offers network-protocol based HTTP(RFC 2616 ) API for managing EFS , as well as for supporting EFS operations within AWS SDKs and the AWS CLI.  API actions and EFS operations are used to create/edit/delete file systems/tags/mount targets.

EFS uses NFSv4 or later for data access.

7.Cost Model:
No need to provision storage capacity in advance. Pay for the amount of storage you put in to your file system.  There are no charges for bandwidth or for requests. Cost is around $0.30/GB-month.  https://aws.amazon.com/efs/pricing/  total monthly usage is converted to GB hours ans recalculated to GB-month.
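A worked example of the metering, assuming the $0.30/GB-month list price:

    # 100 GB stored for the first 15 days, 200 GB for the last 15 days
    # of a 30-day month, metered in GB-hours:
    gb_hours = 100 * 15 * 24 + 200 * 15 * 24   # = 108,000 GB-hours
    gb_month = gb_hours / (30 * 24)            # = 150 GB-months
    cost = gb_month * 0.30                     # = $45.00
    print(gb_hours, gb_month, cost)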



4.Amazon EBS:

"Both Storage Area Networks (SANs) and Network Attached Storage (NAS) provide networked storage solutions. A NAS is a single storage device that operates on data files, while a SAN is a local network of multiple devices that operate on disk blocks" https://www.lifewire.com/san-vs-nas-818005

EBS volumes provide block-level storage. EBS volumes are network-attached storage that persists independently from the running life of a single EC2 instance. After attaching an EBS volume to an instance, it can be used like a hard drive attached to the instance, including formatting it and using the OS's I/O commands. Most Amazon AMIs are backed by EBS, meaning the root (boot) volume is an EBS volume. Multiple volumes can be attached to an instance, but any single volume can be attached to only one EC2 instance at a time.

EBS provides the ability to create point-in-time snapshots of volumes, which are stored in S3. These snapshots can be used as the starting point for new EBS volumes and can instantiate as many volumes as needed, even across regions. Volume sizes range from 1 GiB to 16 TiB.
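A boto3 sketch of the snapshot-and-clone flow (IDs and regions are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Point-in-time snapshot of an existing volume (stored in S3).
    snap = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",
        Description="pre-upgrade backup",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # Instantiate a new (here: larger) volume from the snapshot in any AZ.
    ec2.create_volume(
        SnapshotId=snap["SnapshotId"],
        AvailabilityZone="us-east-1b",
        Size=200,  # GiB; may be larger than the original volume
    )

    # Snapshots can also be copied across regions.
    ec2_eu = boto3.client("ec2", region_name="eu-west-1")
    ec2_eu.copy_snapshot(
        SourceRegion="us-east-1",
        SourceSnapshotId=snap["SnapshotId"],
    )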

1.Usage Patterns:
 EBS meant for data that change frequently and that should persist beyond life of EC2 instance. Good for DB or file system or where OS needs access to raw block-level storage. Offers various options for optimizing cost and performance to match work load. OPtions divided in to two categories:

a) SSD backed storage for transnational work load (e.g database , boot volume ) where performance primarily depends on IOPS. 
b) Hard Disk based storage for throughput intensive work loads ( big data, analytics, log processing, data ware house .. ) where performance primarily depends on MB/s. 

The following storage needs are not suited for EBS. Temporary storage, Multi-instance storage, Highly durable storage, static data or web content.
2.Performance:
Four categories of EBS storage, two SSD types and two HDD types:
a) General Purpose SSD (gp2)
b) Provisioned IOPS SSD (io1)
c) Throughput Optimized HDD (st1)
d) Cold HDD (sc1)

All EBS volumes are network-attached, so other network operations can affect storage performance. To maximize EBS performance, EC2 instances can be launched as EBS-optimized instances. Most of the latest generation of EC2 instances (m4, c4, x1, p2) are EBS-optimized by default; they deliver dedicated throughput between EC2 and EBS of between 500 Mbps and 10,000 Mbps.
Depending on the application use case, you can attach different types of EBS volumes to one instance, e.g. gp2 for the OS boot volume, io1 for database data, and st1 for logs.
Striping volumes together in a RAID configuration is also an option.
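A boto3 sketch of provisioning and attaching an io1 volume (IDs are placeholders):

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # io1 lets you dial in IOPS independently of size; Encrypted=True
    # encrypts data at rest (and any snapshots taken later).
    vol = ec2.create_volume(
        AvailabilityZone="us-east-1a",
        Size=100,            # GiB
        VolumeType="io1",
        Iops=4000,           # provisioned IOPS
        Encrypted=True,
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # The volume must live in the same AZ as the instance it attaches to.
    ec2.attach_volume(
        VolumeId=vol["VolumeId"],
        InstanceId="i-0123456789abcdef0",
        Device="/dev/sdf",
    )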



3.Durability and Availability:
EBS volume data is replicated to multiple servers in same AZ. EBS volume snapshots are incremental. Application consistent back ups are possible. AFR of EBS snapshots are 0.1 and 0.2 % . 
A snapshot of a volume is available is available across all of the AZs within the region.  Snapshots can be copied from region to region or from one user to another user account.EBS snapshots provide an easy to use disk clone or disk image mechanism for backup, sharing or disaster recovery.

4.Scalability and Elasticity:
You can easily add new storage volumes to scale. If you need to expand the capacity of an existing volume, the approach is to snapshot it and restore the snapshot to a new, larger volume.

5.Security:
IAM for access control. Encryption is available for data at rest and for data in transit between the instance and the volume; snapshots of encrypted volumes are also encrypted.


6.Interfaces:
Amazon offers REST management API for as well as for EBS operations with AWS SDK and CLI as well as through management console.  EBS operations are create, delete, describe, attach snapshots etc.  
There is no API for data manipulation. That is the job of the instance OS.
All storage allocated at time of volume creation and charged for that irrespective of used or not.

7.Cost Model:
Pay for what you provisioned. Three components for pricing: provisioned storage,I/O requests(for IO optimized SSD for provisioned IOPS ) and snapshot storage. EBS snapshot copy is charged for the data transferred between regions and for the standard Amazon EBS snapshot charges in the destination region.
For snapshots you are charged for the actual amount of data you used. 
There is no charge for transferring information among various AWS storage offerings or instance if it is all within the same region.



5.Amazon EC2 Instance Storage:

Instance store volumes, also called ephemeral drives, provide temporary block-level storage for many EC2 instance types. This is preconfigured and pre-attached storage on the same physical server that hosts the EC2 instance. Some instance types, such as t2 and c4, have no instance storage. On other types the instance store volume is not exposed by default; it can be exposed through block device mapping.
Two instance families built for storage-centric workloads are:
a) i2 - SSD-backed, storage-optimized. Very high IOPS: up to 365,000 random read IOPS, up to 6.4 TiB.
b) d2 - HDD-backed dense storage. Up to 3.5 GiB/s sequential throughput, up to 48 TiB.


1.Usage Patterns:
Not good for persistent storage, relational DBMS storage, shared storage or snap shots. Good for temporary data such as buffers, caches, scratch data, and other temporary content , or for data that are replicated across a fleet of instances like load-balance pool of web servers. 
There are instance store AMIs and EBS AMIs. Unlike EBS volumes instance volumes can't be attached or detached from another instance.
i2 for high performance DB workloads e.g NoSQL database like Cassandra, MongoDB, clustered databases,OLTP.
d2 family for applications that benefit from high sequential I/O performance across very large data sets. e.g data warehouses, Hadoop/MapReduce storage nodes and parallel file system.

2.Performance:
To increase aggregate IOPS, or to improve sequential disk throughput, multiple instance store volumes can be grouped together using RAID 0 (disk striping) software. Pre-warming improves performance on some instance types; i2, r3, and hi1 have direct-attached SSDs that provide maximum performance at launch time without pre-warming. i2 and i3 support the TRIM command on Linux.


3.Durability and Availability:
Not intended to be for durable usage. Data on instance volumes are persistent across orderly instance reboots , but if instance is stopped and restarted, or terminated or fails data on volume is lost. Data on instance must be replicated or backed up if it needs to persist.


4.Scalability and Elasticity:
It is fixed and defined by the instance type. Can increase or decrease the storage by changing the number of instances.To ensure full storage scaling include other type of storage in the architecture.

5.Security:
IAM controls access to the instance. Data on the volume can be encrypted at the application or file system level.


6.Interfaces:
There is no separate management  API for instance store volumes. Instance store volumes are specified using the block device mapping feature of the Amazon EC2 API and management console. You can't create or destroy instance store volumes, but you can control whether or not they are exposed to the EC2 instance and what device name is mapped to each volume.
There are no data API for the data. Use the block device interface of the instance. Native file system I/O of the chosen OS.

In some cases instance store volumes must be formatted and mounted before use. Also must keep track of the block device mapping. There is no way for an application running on an  instance to determine which block device attached to it is an instance store which is an EBS.
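A boto3 sketch of exposing an instance store volume via block device mapping at launch (the AMI ID is a placeholder; the instance type must be one that comes with instance storage):

    import boto3

    ec2 = boto3.client("ec2")

    # Map the first ephemeral (instance store) volume to /dev/sdb;
    # without this mapping many instance types won't expose it at all.
    ec2.run_instances(
        ImageId="ami-0123456789abcdef0",
        InstanceType="m3.large",   # comes with one 32 GB SSD instance store
        MinCount=1,
        MaxCount=1,
        BlockDeviceMappings=[
            {"DeviceName": "/dev/sdb", "VirtualName": "ephemeral0"},
        ],
    )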

7.Cost Model:
The cost of the instance includes any instance store volumes that come with the instance type. Data transferred from an instance store to other AZs or out of the region may incur data transfer charges. Any persistent storage used for backup is charged separately.



6.Amazon Storage Gateway:

AWS Storage Gateway (ASG) connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization's on-premises IT environment and the AWS storage infrastructure.
It provides low-latency performance by maintaining frequently accessed data on-premises while storing the rest of the data in the cloud (S3 or Glacier). ASG together with EC2 can serve as a disaster recovery solution.

You can download the ASG software appliance as a virtual machine image that you install on a host in your data center, or run it as an EC2 instance.

Once installed and activated, you can use the AWS management console to create gateway-cached volumes, gateway-stored volumes, or a gateway virtual tape library (VTL), each of which can be mounted as an iSCSI device by your on-premises applications.

Gateway-cached volumes: S3 holds the primary data, with frequently accessed data cached on-premises, so there is never a need to scale on-premises storage. Storage volumes of up to 32 TiB are exposed as iSCSI devices mounted by on-premises application servers. Each gateway can support up to 20 volumes and a total of 150 TiB of storage.

Gateway-stored volumes: Store primary data locally while asynchronously backing that data up to AWS. Storage volumes of up to 1 TiB can be mounted as iSCSI devices. Each gateway configured for gateway-stored volumes can support up to 12 volumes and a total of 12 TiB of storage. Data written to gateway-stored volumes is stored on your on-premises storage hardware and asynchronously backed up to Amazon S3 in the form of Amazon EBS snapshots.

Gateway-VTL: Allows you to perform offline data archiving by presenting your existing backup application with an iSCSI-based VTL consisting of a virtual media changer and virtual tape drives.
Virtual tapes that need frequent access should stay in the VTL, which is backed by S3. Data that needs to be archived should be moved to the virtual tape shelf (VTS), which is backed by Glacier.
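A boto3 sketch of carving a gateway-cached volume out of an already activated gateway (the ARN, local IP, and token are placeholders):

    import boto3

    sgw = boto3.client("storagegateway")

    gateway_arn = sgw.list_gateways()["Gateways"][0]["GatewayARN"]

    # Create a 2 TiB cached volume; on-premises servers then mount it
    # over iSCSI using the returned target ARN.
    vol = sgw.create_cached_iscsi_volume(
        GatewayARN=gateway_arn,
        VolumeSizeInBytes=2 * 1024**4,       # 2 TiB (max is 32 TiB)
        TargetName="app01-data",
        NetworkInterfaceId="10.0.1.15",      # the gateway's local IP
        ClientToken="app01-data-vol-1",      # makes the call idempotent
    )
    print(vol["TargetARN"])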

1.Usage Patterns:
Many use cases: corporate file sharing, enabling existing on-premises backup applications to store primary backups on S3, disaster recovery, and mirroring data to the cloud, then moving it to Glacier for archiving.

2.Performance:
Performance depends on many factors, since the gateway sits between your applications, your on-premises storage, and AWS.
It is possible to use AWS Direct Connect to increase throughput to AWS.


3.Durability and Availability:
Data is ultimately stored in S3 or Glacier, so it inherits their durability and availability.


4.Scalability and Elasticity:
Since the underlying data is stored in S3, the scalability is that of S3.

5.Security:
IAMsecurity for access roles. ASG encrypts data in transit using SSL. Data at rest is encrypted using AES-256. SG supports authentication between iSCSI initiator and gateway using CHAP ( Challenge Handshake Authentication protocol ).


6.Interfaces:
Use the console to download the appliance software or launch the AMI. The AWS CLI and AWS SDKs are available to develop applications that interact with Storage Gateway.

7.Cost Model:
Pay only for what you use. Pricing components: gateway usage (per gateway per month), snapshot storage usage (per GB per month), volume storage usage (per GB per month), VTL usage (per GB per month), VTS usage (per GB per month), retrieval from the virtual tape shelf (per GB), and data transfer out (per GB).
https://aws.amazon.com/storagegateway/pricing/


7.AWS Snowball:

For moving large amounts of data into and out of AWS using a secure hardware storage appliance shipped to the customer's location. US regions currently have 50 TB and 80 TB models. Snowball supports moving data into and out of S3; from there it can be moved to any other storage service.


1.Usage Patterns:
For moving data size from Terabytes to Peta bytes. Paces where network band width is low. Cloud migration, data center de commission, content distribution, disaster recovery.If data can be transferred over internet in less than a week, Snowball is not a good solution.

2.Performance:
As a rough figure, 80 TB of data can be copied from a data source to a Snowball in about 2.5 days.


3.Durability and Availability:
The data finally resides in S3, so it inherits S3's durability and availability.


4.Scalability and Elasticity:
You can use multiple Snowballs, so capacity scales with the number of appliances.

5.Security:
An IAM role is used to create a Snowball job. All data loaded onto the appliance is encrypted. The appliance is physically secured using a TPM (Trusted Platform Module), a dedicated processor that detects tampering. Snowball is HIPAA-compliant and can be used to transfer PHI (protected health information).


6.Interfaces:
There are two ways to get started with Snowball:
a) Create an Import or Export job using Snowball management console ...
b) Use Snowball management API and integrate Snowball as part of your data management solution. ( Programmatic ). API uses standard REST interface. 

Two ways to locally transfer data between Snowball appliance and your on-premise data center. 
a) Snow ball client ( downloadable ) is a terminal application that can be run from local work station. Use simple Linux like commands (cp).
b) Use Amzon S3 adapter for Snowball ( downloadable ). Programmatically transfer data between on premise and Snowball using subset of Amazon S3 REST API commands. This allows you to have direct access to Snowball appliance as if it were an Amazon end point.
Example steps.....
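A sketch of option b), pointing boto3 at the S3 adapter; the adapter address and port (8080 is assumed here) and the bucket name are placeholders:

    import boto3

    # The adapter speaks a subset of the S3 API, so an ordinary S3 client
    # works once its endpoint is overridden to the appliance's address.
    snowball_s3 = boto3.client(
        "s3",
        endpoint_url="http://192.0.2.10:8080",  # local Snowball adapter
    )

    # Copy a local file onto the appliance exactly as if it were S3.
    snowball_s3.upload_file("bigdata.tar", "example-bucket", "bigdata.tar")
    print(snowball_s3.list_objects_v2(Bucket="example-bucket")["KeyCount"])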


7.Cost Model:
Three pricing components. Service fee (per job ), extra day charges as required ( beyond the first 10 days ), and data transfer. For destination storage the standard S3 rate applies.  

8.CloudFront:

Content delivery web service for static, dynamic, or streaming content, making it available from a global network of edge locations. User requests are routed to the nearest edge location where the content is available; if it is not available there, the latest content is retrieved from the origin. Supports all file types that can be served over HTTP. For on-demand media files, RTMP can be used. CloudFront also supports live media over HTTP.
CloudFront is optimized to work with other AWS services such as ELB, EC2, S3, and Route 53, and also works seamlessly with any non-AWS origin server.

1.Usage Patterns:
Ideal for frequently accessed static web content. Can also deliver dynamic or mixed content, and can stream audio and video. Cache invalidation and/or object versioning can be used to push out updated content.

2.Performance:
Meant for low-latency, high-bandwidth delivery of content. Requests are normally routed to the nearest edge location by latency, where latency is the time it takes for the first byte to reach the client.

 
3.Durability and Availability:
CloudFront is not meant for durable storage; the origin server provides durability (S3 or a web server on EC2). Within CloudFront there is no central point of failure.


4.Scalability and Elasticity:
You can start small and grow globally. With CloudFront there is no need to add expensive web server capacity: as traffic increases or spikes, the service automatically responds to the demand, with multiple layers of caching at the edge locations.

5.Security:
IAM controls access to the CloudFront API. Access log files can capture the details of each request, and CloudFront also integrates with CloudWatch metrics.

6.Interfaces:
Can be managed and configured in several ways. The management console supports the CloudFront API functions: enabling or disabling distributions, configuring CNAMEs, enabling end-user logging, and so on. The CLI, REST API, and SDKs can also be used.

There is no data API for CloudFront and no command to preload data; data is automatically pulled into an edge location on the first access of an object from that location. Both HTTP and HTTPS are possible.
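For example, invalidating cached copies of updated objects with boto3 (the distribution ID is a placeholder):

    import time
    import boto3

    cf = boto3.client("cloudfront")

    # Force edge locations to drop these paths so the next request
    # pulls fresh copies from the origin.
    cf.create_invalidation(
        DistributionId="E1234567890ABC",
        InvalidationBatch={
            "Paths": {"Quantity": 2,
                      "Items": ["/index.html", "/css/site.css"]},
            "CallerReference": str(time.time()),  # any unique string
        },
    )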


7.Cost Model:
No long-term contracts or required minimum monthly commitment; you pay for the content you deliver. Two pricing components: a) regional data transfer out (per GB) and b) requests (per 10,000). The free usage tier includes 50 GB of data transfer and 2 million HTTP/HTTPS requests per month.
If you use any AWS service as the origin, "origin fetches" are free. Data transfer out of CloudFront back to your origin server (e.g. PUT and POST requests) is billed at the "Regional Data Transfer Out to Origin" rates.

There are three price classes, based on the locations a distribution serves: global, US/Canada, or regional.
CloudFront also offers a reserved capacity plan, which offers significant discounts.











  
