AWS Data Replication and Redundancy with Managed Services

This article looks at the following problem: you need to replicate PNG images to a second region in your AWS account to improve data durability and, in the case of a disaster, recover 100% of the data to the primary region. A possible solution is to configure two S3 buckets, one in each region, so that they are completely isolated and do not share the same failure domain. For this exercise, versioning must be enabled and cross-region replication (CRR) must be configured.

Guest Article by Gabriel Ramirez, Authorized Trainer for AWS & Google Cloud
Learn data replication and redundancy with managed services in this article by Gabriel Ramirez, a passionate technologist who works as an Authorized Trainer for Amazon Web Services and Google Cloud, and Stuart Scott, the AWS content lead at Cloud Academy, where he has created over 40 courses reaching tens of thousands of students.

Technical requirements

  1. You will need an AWS account. If you have not already done so, you can create one for free (https://aws.amazon.com/free/) or, if you already have one, make sure you have enough privileges to create IAM users.
  2. To perform the S3 exercises, it is recommended to download the AWS Command Line Interface; a getting started guide can be found at https://aws.amazon.com/cli/.
  3. The CLI requires Python 2.6.5 or later and the pip package manager. Once installed, configure your client with this command:

aws configure

This command will prompt you for the access key ID and secret access key of your IAM user, and for a default working region. The region code can be obtained from the following resource: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html.
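The interactive session looks like the following; the values shown here are AWS's documented example credentials, so substitute your own:

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json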

Time to get started!

Data replication and redundancy with managed services

This diagram shows that an S3 bucket's versioning can be found in one of three states: unversioned (the default), versioning-enabled, or versioning-suspended:

https://static.packt-cdn.com/products/9781789130669/graphics/a5da030b-7fba-4308-94dc-8163b60201bc.png

 

Create an interregional deployment, using two S3 buckets with automatic versioning and replication, to synchronize objects from North America to Europe, as depicted in the following diagram:

https://static.packt-cdn.com/products/9781789130669/graphics/2390d048-cc73-4a91-9982-e6cf113a88b1.png

 

Now, follow the given steps to configure buckets in different regions within the same AWS account.

Create the origin bucket (us-east-1):

https://static.packt-cdn.com/products/9781789130669/graphics/38427e1c-a6de-4fad-b863-03d414f0a499.png
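If you prefer the command line, the equivalent step is a single call; the bucket name below follows this walkthrough, but bucket names must be globally unique, so adjust it for your own account:

aws s3api create-bucket --bucket csa-crr-replication-bucket --region us-east-1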

 

Enable versioning on this bucket (versioning is required for CRR; with replication configured, object operations, including ACLs, tags, and metadata changes, are replicated). Versioning is also a useful feature when it comes to preventing data loss from accidental deletes:

https://static.packt-cdn.com/products/9781789130669/graphics/b7fb5e30-e964-44c2-9b57-be74907500c9.png
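The same setting can be applied from the CLI, assuming the bucket name used earlier:

aws s3api put-bucket-versioning --bucket csa-crr-replication-bucket --versioning-configuration Status=Enabled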

 

On the confirmation screen, validate that the region is North Virginia (us-east-1) and that versioning is enabled.

https://static.packt-cdn.com/products/9781789130669/graphics/0619877f-ff9e-471a-b51e-01b8c5a5fbf5.png

 

Provision the destination bucket in the secondary region (eu-west-1):

https://static.packt-cdn.com/products/9781789130669/graphics/10f81555-a3ba-4686-a906-5517d7bd25bb.png
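From the CLI, buckets outside us-east-1 need an explicit location constraint; the destination bucket name here (csa-crr-destination-bucket) is a hypothetical one chosen for illustration:

aws s3api create-bucket --bucket csa-crr-destination-bucket --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1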

 

You can optionally copy the configuration of an existing bucket during creation. Once the bucket has been created, make sure versioning is active; if that is not the case, enable this feature in the bucket properties:

https://static.packt-cdn.com/products/9781789130669/graphics/72b18f38-962b-471c-b7b7-a9092fec673c.png
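If you are working from the CLI, you can check the current state and, if needed, enable versioning (same illustrative bucket name as before):

aws s3api get-bucket-versioning --bucket csa-crr-destination-bucket
aws s3api put-bucket-versioning --bucket csa-crr-destination-bucket --versioning-configuration Status=Enabled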

 

As a result, you have both buckets in the same account but in different regions:

https://static.packt-cdn.com/products/9781789130669/graphics/658d1678-cb06-48dd-add3-da2e2c6d50b7.png

Now, configure automatic replication. You'll need to select Replication under Management in the origin bucket (csa-crr-replication-bucket) to create a replication rule. The replication rule can filter the objects to be replicated by key prefix:

https://static.packt-cdn.com/products/9781789130669/graphics/ea8086bb-b8d6-4165-9b2e-2be33921a34d.png
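The console wizard builds the same configuration that the put-bucket-replication API accepts. The following is a minimal sketch: the role ARN is a placeholder for the role the wizard creates (or one you create yourself), the account ID is fictitious, and an empty prefix replicates everything. Save it as replication.json:

{
  "Role": "arn:aws:iam::111122223333:role/crr-replication-role",
  "Rules": [
    {
      "ID": "replicate-all",
      "Status": "Enabled",
      "Prefix": "",
      "Destination": { "Bucket": "arn:aws:s3:::csa-crr-destination-bucket" }
    }
  ]
}

aws s3api put-bucket-replication --bucket csa-crr-replication-bucket --replication-configuration file://replication.json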

 

It is possible to choose a different storage class for the replicas and to transfer ownership of the replicated objects to the destination bucket owner. To confirm this screen, a new IAM role must be created; S3 assumes this role to gain write access to the destination bucket:

https://static.packt-cdn.com/products/9781789130669/graphics/0551a727-19cf-4250-96b0-062944bfd6fe.png

 

The next step is to upload an image to the origin bucket:

https://static.packt-cdn.com/products/9781789130669/graphics/1c7c693f-df11-484e-ab7d-0ffadea4081d.png
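From the CLI, the upload is a one-liner (tulips.jpg is the sample image used later in this walkthrough):

aws s3 cp tulips.jpg s3://csa-crr-replication-bucket/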

 

Now, you can check the replication status. You can observe in (1) the corresponding ETag on the original image. ETag is an HTTP mechanism to perform cache validations; in S3, it is used to generate a unique hash of the object and maintain a change registry. In this case, the replication preserves the original metadata, for example, the ETag, Last Modified, and Version ID.

Looking at the replication status, you can see that the original bucket shows COMPLETED, while the destination bucket shows REPLICA, confirming that the copy has been made successfully (3).

https://static.packt-cdn.com/products/9781789130669/graphics/997f4fbe-d1cd-49d2-9ea5-f3090836867a.png
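The same fields can be inspected from the CLI; head-object returns the ETag and, for buckets involved in replication, a ReplicationStatus field:

aws s3api head-object --bucket csa-crr-replication-bucket --key tulips.jpg
aws s3api head-object --bucket csa-crr-destination-bucket --key tulips.jpg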

 

Something significant to note is that objects with the REPLICA status cannot be replicated again into another bucket; this operation can only be done by using the S3 APIs with the CopyObject action, which involves a GET followed by a PUT operation.
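A sketch of that workaround with the CLI, assuming a hypothetical third bucket:

aws s3api copy-object --bucket csa-third-bucket --key tulips.jpg --copy-source csa-crr-destination-bucket/tulips.jpg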

Replicating tags

Tags are a way to add contextual information in the form of key=value pairs to classify objects in S3. You only need to select the object checkbox and choose the option Add tags under More:

https://static.packt-cdn.com/products/9781789130669/graphics/d690ce8f-82ca-4afd-a870-4c79a314cc1c.png
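Tags can also be applied programmatically; the key and value below are illustrative:

aws s3api put-object-tagging --bucket csa-crr-replication-bucket --key tulips.jpg --tagging 'TagSet=[{Key=Environment,Value=Production}]'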

 

Changes propagate asynchronously, usually almost immediately, to the destination bucket:

https://static.packt-cdn.com/products/9781789130669/graphics/70190615-d817-4885-a7a2-f289fb24dff1.png

 

Your applications can use this metadata as a means to query additional information about the object, such as EmployeeID=123456 or Environment=Production. It becomes important to know the service limit of 10 tags per object; to overcome this limit, you'll need to build a secondary index that saves and queries this metadata elsewhere (for example, a database table).

Replicating ACLs

By default, each object created in a bucket is owned by the root user of the account; this is known as the resource owner. S3 gives you the option to use two different models of permissions, depending on the actions to be performed. To clarify this, consider the next scenario.

A hosting company owns a storage bucket for its customers, and it wants to offer a confidential service in which each customer is the sole owner of their objects, writing an ACL that denies access to all other users, even root.

https://static.packt-cdn.com/products/9781789130669/graphics/1c02769d-e968-4d99-ad64-653a1e69d1e7.png

 

Enable public access to the object by modifying its ACL in the origin bucket:

https://static.packt-cdn.com/products/9781789130669/graphics/28b30612-d4d5-444f-a756-5c338bbdb1b3.png

In the Public access group, check the Read object checkbox:

https://static.packt-cdn.com/products/9781789130669/graphics/28d211c9-7661-4318-8566-6c22c42ace6e.png
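These console steps correspond to the public-read canned ACL, which can be applied in a single CLI call:

aws s3api put-object-acl --bucket csa-crr-replication-bucket --key tulips.jpg --acl public-read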

 

Validate that the ACL was replicated to the destination bucket:

https://static.packt-cdn.com/products/9781789130669/graphics/9f525ef8-fe81-476c-9162-b901bc431def.png
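From the CLI, you can compare the grants on both copies:

aws s3api get-object-acl --bucket csa-crr-replication-bucket --key tulips.jpg
aws s3api get-object-acl --bucket csa-crr-destination-bucket --key tulips.jpg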

Now, it is possible to access the object via its DNS name in both regions. It is relevant to mention that every object in S3 is opaque to the service: file extensions and types carry no special meaning. S3 is a service optimized to store vast amounts of data with high availability and amazing durability, so it does not have hierarchical capabilities like a regular file system. Imagine S3 as a giant HashMap data structure offered as a service.

Distributed nature of S3

Each new object is covered by a strong consistency model called read-after-write: every object is written to at least three different AZs before the SUCCESS code is returned, avoiding stale reads by other clients. Nevertheless, if a client immediately performs a LIST keys operation, it could get an inconsistent read, and the new object will be invisible in that partition:

https://static.packt-cdn.com/products/9781789130669/graphics/f8edc1fb-bc7e-4271-ab13-4ce62b4f8579.png

 

Each modified object follows an eventual consistency model for PUT overwrites and DELETEs of existing objects, leading to possible stale reads if a GET or LIST operation is performed immediately afterward. This model is less strict, trading strong consistency for availability, performance, and tolerance of network partitions (two AZs failing simultaneously):

https://static.packt-cdn.com/products/9781789130669/graphics/840ae214-51b1-4476-94d8-ccb57424daaa.png

 

Navigate to both buckets and click on the Open button so that you can validate that the original object has been replicated successfully.

https://static.packt-cdn.com/products/9781789130669/graphics/a82d681f-e41b-4fd8-be5f-740f4cd5147f.png

 

You can verify the object in both buckets (origin and destination) by selecting the object checkbox, More | Open, as shown in the previous screenshot.

Metadata replication

When you upload tulips.jpg, S3 automatically reads the file type, infers the MIME type image/jpeg, and stores this information as metadata. Every S3 object can have two kinds of metadata, system-defined and user-defined:

https://static.packt-cdn.com/products/9781789130669/graphics/17a864d6-1941-4252-bcd0-f915426aa523.png

Now, modify the system-defined metadata Content-Type to binary/octet-stream so that you can force a web browser to download the file:

https://static.packt-cdn.com/products/9781789130669/graphics/d813c494-a293-46ad-ad8a-90882e0c9415.png
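There is no in-place metadata edit in the S3 API; the CLI equivalent is a copy of the object onto itself with replaced metadata:

aws s3api copy-object --bucket csa-crr-replication-bucket --key tulips.jpg --copy-source csa-crr-replication-bucket/tulips.jpg --metadata-directive REPLACE --content-type binary/octet-stream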

 

This happens because the web browser does not have a plugin to handle the content type received, so its default behavior is to download the file to the client:

https://static.packt-cdn.com/products/9781789130669/graphics/be3da8af-76bb-4b4b-a7c5-a7c850c9dfa9.png

 

Validate that the metadata replication was successful by checking the destination bucket:

https://static.packt-cdn.com/products/9781789130669/graphics/2f37713b-a3d5-445c-9e10-5464279aae8a.png

 

Encryption replication

S3 can use a standard symmetric encryption algorithm, AES-256, on the server side (the S3 servers) to improve the at-rest security of your data and ensure information confidentiality:

https://static.packt-cdn.com/products/9781789130669/graphics/e38a66df-01bc-4f50-85c3-b00dbae32574.png
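One way to request server-side encryption from the CLI is at upload time:

aws s3 cp tulips.jpg s3://csa-crr-replication-bucket/ --sse AES256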

 

Verify that this feature was replicated to the second region:

https://static.packt-cdn.com/products/9781789130669/graphics/84c29b75-506a-48a9-9361-d5cf3ecbb8e2.png
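head-object on the replica reports the encryption in its ServerSideEncryption field:

aws s3api head-object --bucket csa-crr-destination-bucket --key tulips.jpg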

 

Note the service limits:

https://static.packt-cdn.com/products/9781789130669/graphics/70ede716-c4c1-4678-97f3-b1e41373d4bf.png

Also, note that AWS recommends uploading objects larger than 100 MB with the multipart upload API; a single PUT cannot exceed 5 GB, so larger objects require it.
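The high-level CLI handles this transparently: aws s3 cp splits large files into parts automatically. If you need the low-level flow, it starts with create-multipart-upload, which returns an UploadId consumed by subsequent upload-part and complete-multipart-upload calls (the key name below is illustrative):

aws s3api create-multipart-upload --bucket csa-crr-replication-bucket --key big-file.bin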

If you found this article interesting, you can explore AWS Certified Solutions Architect – Associate Guide to learn from AWS subject-matter experts, apply real-world scenarios, and clear the AWS Certified Solutions Architect – Associate exam. The book will help you not only be fully prepared to pass the exam but also be capable of building secure and reliable applications.
