AWS Data Replication and Redundancy with Managed Services
This article looks at a scenario where you need to replicate PNG images to a second region in the same AWS account to improve data durability and, in the case of a disaster, recover 100% of the data to the primary region. A possible solution is to configure two S3 buckets, one in each region, so that they are completely isolated and do not share the same failure domain. For this exercise, versioning must be enabled and cross-region replication (CRR) must be configured.
Technical requirements
- You will need an AWS account. If you do not have one, you can create one for free (https://aws.amazon.com/free/); if you already have one, make sure you have enough privileges to create IAM users.
- To perform the S3 exercises, it is recommended to install the AWS Command Line Interface; a getting-started guide can be found at https://aws.amazon.com/cli/.
- For the CLI, you need Python 2.6.5 or later and the PIP package manager. Once installed, configure your client with this command:
aws configure
This command will prompt you for the ACCESS KEY and SECRET ACCESS KEY of your IAM user, and for a default region. The region code can be obtained from the following resource: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html.
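For reference, the interactive prompts look similar to the following (the key values shown are AWS documentation placeholders, not real credentials):

aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json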
Time to get started!
Data replication and redundancy with managed services
As the diagram shows, an S3 bucket's versioning configuration can be in one of three states: unversioned (the default), versioning-enabled, or versioning-suspended.
Create a cross-region deployment, using two S3 buckets with versioning and automatic replication, to synchronize objects from North America to Europe, as depicted in the following diagram:
Now, follow these steps to configure buckets in different regions within the same AWS account.
Create the origin bucket (us-east-1):
Enable versioning on this bucket (with versioning enabled, replication will propagate object operations, including ACLs, tags, and metadata, to the destination). Versioning is also a useful feature when it comes to preventing data loss from accidental deletes:
On the confirmation screen, validate that the region is N. Virginia (us-east-1) and that versioning is enabled.
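If you prefer the CLI, a rough equivalent of this step could look like the following sketch. The bucket name csa-crr-replication-bucket is taken from the replication rule step later in this article; adjust it to a globally unique name of your own:

# Create the origin bucket in us-east-1 (bucket names must be globally unique)
aws s3api create-bucket --bucket csa-crr-replication-bucket --region us-east-1

# Enable versioning, which is a prerequisite for cross-region replication
aws s3api put-bucket-versioning --bucket csa-crr-replication-bucket --versioning-configuration Status=Enabled

# Confirm that versioning is enabled
aws s3api get-bucket-versioning --bucket csa-crr-replication-bucket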
Provision the destination bucket in the secondary region (eu-west-1):
When creating this bucket, you can copy the configuration from the existing bucket; once it has been created, make sure versioning is active and, if it is not, enable this feature in the bucket properties:
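The same step can be sketched with the CLI. The name csa-crr-destination-bucket is a hypothetical example; note that buckets outside us-east-1 need an explicit LocationConstraint:

# Create the destination bucket in eu-west-1
aws s3api create-bucket --bucket csa-crr-destination-bucket --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1

# Versioning must also be enabled on the destination bucket
aws s3api put-bucket-versioning --bucket csa-crr-destination-bucket --versioning-configuration Status=Enabled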
As a result, you have both buckets in the same account but in different regions:
Now, configure automatic replication: select Replication under Management in the origin bucket (csa-crr-replication-bucket) to create a replication rule. The replication rule can include file prefixes to filter the objects to be replicated:
It is possible to choose a different storage class for the replicas and to transfer ownership of the objects to the destination bucket owner. To confirm this screen, a new IAM role must be created, which S3 will assume in order to gain write access to the destination bucket:
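As a rough CLI sketch of the same configuration, you can attach a rule with the put-bucket-replication API. The role ARN, account ID, rule ID, prefix, and destination bucket name below are placeholders for whatever you created in the console; versioning must already be enabled on both buckets. Save the following as replication.json:

{
  "Role": "arn:aws:iam::123456789012:role/crr-replication-role",
  "Rules": [
    {
      "ID": "replicate-png-images",
      "Prefix": "images/",
      "Status": "Enabled",
      "Destination": {
        "Bucket": "arn:aws:s3:::csa-crr-destination-bucket"
      }
    }
  ]
}

Then apply it to the origin bucket:

aws s3api put-bucket-replication --bucket csa-crr-replication-bucket --replication-configuration file://replication.json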
The next step is to upload an image to the origin bucket:
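With the CLI, the upload could look like this; the images/ prefix simply mirrors the example replication rule sketched earlier:

# Upload the sample image to the origin bucket; replication to eu-west-1
# happens asynchronously after the PUT completes
aws s3 cp tulips.jpg s3://csa-crr-replication-bucket/images/tulips.jpg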
Now, you can check the replication status. You can observe in (1) the corresponding ETag of the original image. ETag is an HTTP mechanism for cache validation; in S3, it is used as a unique hash of the object to keep track of changes. In this case, the replication preserves the original metadata, for example, the ETag, Last Modified, and Version ID.
Regarding the replication status, you can see that the object in the original bucket shows COMPLETED, while the object in the destination bucket shows REPLICA, indicating that the copy has been made successfully (3).
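You can also query these fields with the CLI; head-object returns the ETag and ReplicationStatus on both sides (names as in the earlier sketches):

# On the origin bucket the status should eventually be COMPLETED
aws s3api head-object --bucket csa-crr-replication-bucket --key images/tulips.jpg

# On the destination bucket the same object reports REPLICA
aws s3api head-object --bucket csa-crr-destination-bucket --key images/tulips.jpg --region eu-west-1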
Something significant to note is that objects with the REPLICA status cannot be replicated again into another bucket; this operation can only be done by using the S3 API's CopyObject action, which internally performs a GET followed by a PUT operation.
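A minimal sketch of that workaround, assuming a hypothetical third bucket named csa-crr-third-bucket, is a server-side copy of the replica, which produces a brand-new object that is eligible for replication from its new bucket:

# Server-side copy (GET + PUT under the hood); the result is a new object,
# not a REPLICA
aws s3api copy-object --copy-source csa-crr-destination-bucket/images/tulips.jpg --bucket csa-crr-third-bucket --key images/tulips.jpg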
Replicating tags
Tags are a way to add contextual information in the form of key=value pairs to classify objects in S3. You only need to select the object's checkbox and choose the Add tags option under More:
Changes are propagated asynchronously, and usually almost immediately, to the destination bucket:
Your applications can use this metadata as a means to query additional information about the object, such as EmployeeID=123456 or Environment=Production. However, it is important to know that S3 does not provide a way to query or filter objects by their tags; to overcome this limitation, you'll need to build a secondary index (for example, a database table) to store and query this metadata.
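The same tags can be set and read with the CLI; this sketch reuses the example bucket and key names from earlier and mirrors the key-value pairs mentioned above:

# Add tags to the object in the origin bucket
aws s3api put-object-tagging --bucket csa-crr-replication-bucket --key images/tulips.jpg --tagging 'TagSet=[{Key=Environment,Value=Production},{Key=EmployeeID,Value=123456}]'

# After a short delay, read the tags back from the destination replica
aws s3api get-object-tagging --bucket csa-crr-destination-bucket --key images/tulips.jpg --region eu-west-1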
Replicating ACLs
By default, each object created in a bucket is owned by the root user of the account; this is known as the resource owner. S3 gives you the option of two different permission models, depending on the actions to be performed. To clarify this, consider the following scenario.
A hosting company owns a storage bucket for its customers and wants to offer a confidential service in which each customer is the sole owner of their objects, writing an ACL that denies access to all other users, even root.
Enable public access to the object by modifying its ACL in the origin bucket:
In the Public access group, check the Read object checkbox:
Validate that the ACL was replicated to the destination bucket:
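A rough CLI equivalent of these ACL steps, using the same example bucket and key names, looks like this:

# Grant public read on the object in the origin bucket
aws s3api put-object-acl --bucket csa-crr-replication-bucket --key images/tulips.jpg --acl public-read

# Check that the grant was propagated to the replica
aws s3api get-object-acl --bucket csa-crr-destination-bucket --key images/tulips.jpg --region eu-west-1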
Now, it is possible to access the object via its DNS name in both regions. It is relevant to mention that every object in S3 is opaque: the file extension and file type are meaningless to the service. S3 is a service optimized to store vast amounts of data with high availability and amazing durability, so it does not have hierarchical capabilities like a regular file system. Imagine S3 as a huge HashMap data structure offered as a service.
Distributed nature of S3
Each new object follows a strong consistency model called read-after-write, in which the object is written to at least three different AZs before the SUCCESS code is returned, avoiding stale reads by other clients. Nevertheless, if a client immediately performs a LIST keys operation, it could get an inconsistent read, and the new object will be invisible to that partition:
PUT overwrites and DELETEs of existing objects follow an eventual consistency model, which can lead to stale reads if a GET or LIST operation is performed immediately afterwards. This model is less strict and trades strong consistency in favor of availability, performance, and network partition tolerance (for example, two AZs failing simultaneously):
Navigate to both buckets and click on the Open button to validate that the original object has now been replicated successfully.
You can verify the object in both buckets (origin and destination) by selecting the object checkbox, More | Open, as shown in the previous screenshot.
Metadata replication
When you upload tulips.jpg, S3 automatically detects the file type, infers the MIME type image/jpeg, and stores this information as metadata. Every S3 object can have two kinds of metadata: system-defined metadata (such as Content-Type or Content-Length) and user-defined metadata (custom key-value pairs stored with an x-amz-meta- prefix):
Now, modify the system-defined metadata Content-Type to binary/octet-stream so that you can force a web browser to download the object:
This happens because the web browser does not have a plugin to handle the content type it receives, so its default behavior is to download the file to the client:
Validate that the metadata replication was successful, using the destination bucket:
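With the CLI, system-defined metadata such as Content-Type can only be changed by copying the object over itself with the REPLACE metadata directive; the following sketch (same example names as before) performs the change and then checks the replica:

# Overwrite the object in place, replacing its Content-Type
aws s3api copy-object --copy-source csa-crr-replication-bucket/images/tulips.jpg --bucket csa-crr-replication-bucket --key images/tulips.jpg --metadata-directive REPLACE --content-type binary/octet-stream

# Validate the new metadata on the destination replica
aws s3api head-object --bucket csa-crr-destination-bucket --key images/tulips.jpg --region eu-west-1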
Encryption replication
S3 can use a standard symmetric encryption algorithm, AES-256, on the server side (the S3 servers) to improve the at-rest security of your data and ensure information confidentiality:
Verify the replication of this feature in the second region:
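A sketch of enabling SSE-S3 on upload and verifying it on the replica, again with the same example names:

# Upload (or re-upload) the object with AES-256 server-side encryption
aws s3 cp tulips.jpg s3://csa-crr-replication-bucket/images/tulips.jpg --sse AES256

# The ServerSideEncryption field should also appear on the destination replica
aws s3api head-object --bucket csa-crr-destination-bucket --key images/tulips.jpg --region eu-west-1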
Note the following service limits: a single PUT operation can upload an object of up to 5 GB, and the maximum object size is 5 TB.
Also, note that AWS recommends uploading objects larger than 100 MB with the multipart upload API.
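The high-level aws s3 commands handle multipart uploads automatically once a file crosses a configurable size threshold; as a sketch (the file name is hypothetical):

# Tell the CLI to switch to multipart uploads for files larger than 100 MB
aws configure set default.s3.multipart_threshold 100MB

# Large files copied with aws s3 cp are now split into parts and uploaded in parallel
aws s3 cp big-archive.zip s3://csa-crr-replication-bucket/images/big-archive.zip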
If you found this article interesting, you can explore AWS Certified Solutions Architect – Associate Guide to learn from AWS subject-matter experts, apply real-world scenarios, and prepare for the AWS Certified Solutions Architect – Associate exam. The book will help you not only to be fully prepared to pass the exam but also to be capable of building secure and reliable applications.