Resiliency architecture and testing, part 1: AWS Well-Architected and the reliability pillar

by Stephen Welsh

4 min read

Chase is planning a significant migration of its workloads to Amazon Web Services (AWS) over the coming years.

We have the opportunity not only to take advantage of modern cloud infrastructure, but also to design well-architected systems as we modernize our applications. To succeed on AWS and take full advantage of the processes for delivering software to it, we will have to align our core concepts and definitions with industry best practices. In this three-part series, I’ll focus on resiliency as an availability concept and on testing availability through chaos experiments, then discuss how to establish availability requirements and add them to the deployment module in SEAL.

AWS Well-Architected and the Reliability Pillar

AWS Well-Architected helps cloud architects build secure, high-performing, resilient and efficient infrastructure for a variety of applications and workloads. Built around six pillars — operational excellence, security, reliability, performance efficiency, cost optimization and sustainability — AWS Well-Architected provides a consistent approach for customers and partners to evaluate architectures and implement scalable designs.

The reliability pillar has five design principles that guide architects, engineers and site reliability engineers (SREs) in building reliable systems that their business partners can agree on.

Automatic recovery from failure: the failure could be in an application, EC2 instances, an Availability Zone (AZ) or an Amazon Relational Database Service (RDS) database
Test recovery procedures: use automated chaos testing to impact or fail the workload and validate the recovery procedures
Horizontal scale: deconstruct workloads into multiple smaller services or resources to reduce the impact of a single failure
Manage capacity: monitor demand and workload use to provision instances appropriately
Manage automation change: changes to the automation that manages the infrastructure also need to be tracked, reviewed and stored in a code repository
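The horizontal-scale principle can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the workload is spread evenly over identical instances that fail independently of one another, which is an idealization (instances in the same AZ can fail together). The function names are hypothetical and are not part of any AWS API.

```python
def capacity_lost_on_single_failure(instance_count: int) -> float:
    """Fraction of total capacity lost when exactly one instance fails.

    Assumes load is spread evenly across identical instances.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return 1.0 / instance_count


def redundant_availability(instance_availability: float, instance_count: int) -> float:
    """Probability that at least one of the instances is up.

    Assumes failures are independent, which real deployments only
    approximate (for example, by spreading instances across AZs).
    """
    if not 0.0 <= instance_availability <= 1.0:
        raise ValueError("instance_availability must be between 0 and 1")
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return 1.0 - (1.0 - instance_availability) ** instance_count


if __name__ == "__main__":
    for n in (1, 2, 4):
        lost = capacity_lost_on_single_failure(n)
        avail = redundant_availability(0.99, n)
        print(f"{n} instance(s): one failure removes {lost:.0%} of capacity; "
              f"at least one instance is up {avail:.6%} of the time")
```

Under these assumptions, one large instance loses all of its capacity in a single failure, while four smaller instances lose a quarter of it. This is the effect the principle is after.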

Reliability is, in turn, determined by three other things:

Resiliency: The ability of a workload to recover from infrastructure or service disruptions and to dynamically acquire computing resources to meet demand
Availability: The percentage of time that a workload is available for use
Disaster Recovery (DR): The ability to recover a workload from one-time events such as natural disasters, large technical failures or attacks; the key measure is the recovery time objective (RTO)
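The availability and RTO definitions above translate directly into simple calculations. The following is a minimal sketch, not a Chase or AWS tool: the function names, the 365-day year and the sample targets are all assumptions chosen for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(uptime_minutes: float, total_minutes: float) -> float:
    """Availability as the percentage of time the workload was usable."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= uptime_minutes <= total_minutes:
        raise ValueError("uptime_minutes must be between 0 and total_minutes")
    return 100.0 * uptime_minutes / total_minutes


def downtime_budget_minutes(target_percent: float, period_days: int = 365) -> float:
    """Minutes of downtime allowed in the period for a given availability target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return (100.0 - target_percent) / 100.0 * period_days * MINUTES_PER_DAY


def meets_rto(recovery_minutes: float, rto_minutes: float) -> bool:
    """True if the workload was recovered within the recovery time objective."""
    return recovery_minutes <= rto_minutes


if __name__ == "__main__":
    # Hypothetical targets; real targets come from the business requirements.
    for target in (99.0, 99.9, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% availability allows about {budget:.1f} minutes of downtime per year")
```

For example, a 30-minute outage in a 30-day month gives `availability_percent(43170, 43200)`, which is roughly 99.93%. That meets a 99.9% target but misses 99.99%.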

In Part 2, we’ll explore resiliency within Chase. Stay tuned!
