Frequently updating and enhancing operations methods – Adhering to AWS Well-Architected Principles

Just as your applications keep evolving over time, your operational procedures need to keep up with those changes. Refining them is not a one-time activity: you should continuously look for opportunities to improve these procedures so that new operational risks are managed effectively. To validate your existing procedures, you can organize operational game days, in which your teams inject failures into a test environment and collaborate on resolving them quickly, confirming along the way that your runbooks, scripts, and automation still work as expected.

Let’s go through an implementation workflow that is often needed by enterprises that work with multiple AWS accounts and regions—automatic enablement of opt-in regions for new AWS accounts. Activities such as these, if not automated, can lead to a lot of operational overhead for the teams managing the AWS platform for the entire organization.

Automatic enablement of opt-in AWS regions for new accounts

As discussed before, we will not go into code-level details of the implementation but describe the implementation workflow in sufficient detail so that you can apply this understanding to build the same, or similar, automations in the future and optimize your operational practices. The overall design is shown in the diagram in Figure 12.1.

Not all AWS regions are enabled by default when a new account is created. Any region that is not enabled by default, and where you plan to host your workloads, must be opted in before you can deploy resources into it. This also means that basic security guardrails, automations, network baselines, and so on cannot work there until the new region is configured for use. To handle this use case, you can implement an automated workflow, as shown in Figure 12.1. The sequential steps for this workflow are described next:

  1. Platform developers leverage infrastructure as code (IaC) practices to add a new account configuration, using tools such as CloudFormation, Terraform, or Cloud Development Kit (CDK).
  2. The CI/CD pipeline kicks off and provisions a new account in the AWS Organizations organization.
  3. The AWS Organizations service emits an event in us-east-1, which is the default region where events from global services are generated.
  4. To limit access to the organization’s management account, it is a good security practice to forward the relevant events to a designated automation or tooling account and host all the automation there. In this architecture, we follow the same practice and forward the event to a custom event bus in another account and a different region.
  5. Once the event is received in the target account, an EventBridge rule starts a state machine in AWS Step Functions, an AWS service for orchestrating workflows.
  6. The AWS Step Functions state machine can further trigger several other AWS services and orchestrate their execution. In this example, we use the state machine to invoke two Lambda functions: one that enables the opt-in region (eu-central-2, or Zurich in this case) and another that checks whether the region is ready for use. This operation can take anywhere between 15 and 60 minutes, which is a strong indicator of why such operations should be automated.
  7. The two Lambda functions make the corresponding API calls against the new AWS account to reach the desired target state: the opt-in region is enabled so that workloads can be hosted there.
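
Steps 5 to 7 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The handler names, the `build_state_machine` helper, the polling interval, and the timeout are hypothetical. The event shape assumes the `CreateAccountResult` event that AWS Organizations records through CloudTrail. The two API calls, `enable_region` and `get_region_opt_status`, belong to the boto3 `account` client; calling them for a member account from the tooling account assumes that account is allowed to do so, for example as a delegated administrator for AWS Account Management.

```python
TARGET_REGION = "eu-central-2"  # Zurich, the opt-in region used in the text


def _account_client():
    # Imported lazily so the module can be loaded and tested without boto3.
    import boto3
    return boto3.client("account")


def extract_account_id(event):
    """Read the new account ID from a CreateAccountResult event (assumed shape)."""
    status = event["detail"]["serviceEventDetails"]["createAccountStatus"]
    if status["state"] != "SUCCEEDED":
        raise ValueError(f"account creation state is {status['state']}")
    return status["accountId"]


def enable_region_handler(event, context=None, client=None):
    """First Lambda function: request enablement of the opt-in region."""
    if client is None:
        client = _account_client()
    account_id = extract_account_id(event)
    client.enable_region(AccountId=account_id, RegionName=TARGET_REGION)
    return {"accountId": account_id, "regionName": TARGET_REGION}


def check_region_handler(event, context=None, client=None):
    """Second Lambda function: report the current opt-in status of the region."""
    if client is None:
        client = _account_client()
    response = client.get_region_opt_status(
        AccountId=event["accountId"], RegionName=event["regionName"]
    )
    return {**event, "status": response["RegionOptStatus"]}


def build_state_machine(enable_fn_arn, check_fn_arn, wait_seconds=60):
    """Amazon States Language definition: enable, then poll until ENABLED."""
    return {
        "StartAt": "EnableRegion",
        "TimeoutSeconds": 7200,  # hypothetical upper bound so the loop cannot run forever
        "States": {
            "EnableRegion": {"Type": "Task", "Resource": enable_fn_arn, "Next": "Wait"},
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckRegion"},
            "CheckRegion": {"Type": "Task", "Resource": check_fn_arn, "Next": "IsEnabled"},
            "IsEnabled": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.status", "StringEquals": "ENABLED", "Next": "Done"}
                ],
                "Default": "Wait",
            },
            "Done": {"Type": "Succeed"},
        },
    }
```

Because enablement is asynchronous, the state machine loops between a wait state and the status check instead of keeping a Lambda function running for the whole duration. Serializing the returned dictionary with `json.dumps` gives a definition that could be passed to your IaC tool of choice. A production version would also need error handling and retries, which are omitted here.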

Having learned about the best practices to achieve operational excellence, let’s discuss some ways to secure your workloads and infrastructure on AWS.
