Resiliency architecture and testing, part 3: Resiliency process within PtX and deploying resiliency and chaos testing

 

By: Stephen Welsh

3 min read

Chase is planning for a significant workload migration of its systems to Amazon Web Services (AWS) over the coming years.

We have the opportunity not only to take advantage of modern cloud infrastructure, but also to design well-architected systems once we modernize our applications. But to succeed on AWS and take full advantage of the processes for delivering software to it, we will have to align core concepts and definitions with industry best practices. In this three-part series, I’ll focus on the availability concept of resiliency and the testing of availability through chaos experiments, then discuss how to establish availability requirements and add them to the deployment module in SEAL.

Resiliency Process within PtX and Deploying Resiliency and Chaos Testing

We established the Permit to X (PtX) process to verify and validate that a minimum set of requirements (largely nonfunctional requirements) is met before a new application component, module or service is allowed to take customer traffic. The milestones are Permit to Build (PtB), Permit to Deploy (PtD) and Permit to Operate (PtO). In this post, we will go deeper into the AWS Well-Architected Reliability Pillar and how it will align with the PtX process.
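To make the gating idea concrete, here is a minimal sketch of how a PtX milestone check could be modeled. It is an illustration only, not the actual PtX tooling: the `Requirement` class, the helper functions and the three example requirement names are all hypothetical, and the sketch assumes each requirement belongs to exactly one milestone.

```python
from dataclasses import dataclass
from enum import Enum


class Milestone(Enum):
    # Milestone names come from the article; everything else here is assumed.
    PTB = "Permit to Build"
    PTD = "Permit to Deploy"
    PTO = "Permit to Operate"


@dataclass
class Requirement:
    name: str             # hypothetical requirement label
    milestone: Milestone  # gate at which the requirement must be satisfied
    met: bool = False


def unmet_requirements(requirements, milestone):
    """Return the names of requirements for this milestone that are not yet met."""
    return [r.name for r in requirements if r.milestone is milestone and not r.met]


def permit_granted(requirements, milestone):
    """Grant the permit only when every requirement for the milestone is met."""
    return not unmet_requirements(requirements, milestone)


# Hypothetical example data for a single service.
checks = [
    Requirement("availability target documented", Milestone.PTB, met=True),
    Requirement("multi-AZ deployment configured", Milestone.PTD, met=True),
    Requirement("chaos experiment passed", Milestone.PTO, met=False),
]

ptd_ok = permit_granted(checks, Milestone.PTD)        # all PtD checks are met
pto_blockers = unmet_requirements(checks, Milestone.PTO)  # what still blocks PtO
```

In this sketch the service would clear Permit to Deploy but not Permit to Operate, and `pto_blockers` lists what still has to be resolved before it can take customer traffic.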

The previous section introduced how application teams will first deploy their application into production in conjunction with the resiliency rules. We’ll now highlight the day-to-day activities for maintaining the software development lifecycle (SDLC), ensuring peak effectiveness and compliance with the resiliency tests.

Our application and deployment modules contribute functionality to an overall product, and that product will have several roles that contribute to the success of its functionality. The SDLC requires the tech partner, product owner, site reliability engineer (SRE), product architect, application developer and DevOps roles to collaborate and ensure the application services are reliable and resilient.

There are also many JPMC tools and processes within the SDLC that will contribute to the overall governance of maintaining resilient applications. The diagram below highlights many of the target state tools. An initial review of current and future tooling found some overlap between our lines of business, including in how they determine what’s required and how they agree on which responsibilities should exist within each of the tools.

As an example, the six tools (Arcus evaluator, JET Policy Service, R3, CAVE, CCM and Arcus Reporting) overlap with one another to varying degrees. Some of these tools exist in prod with various functionality, some are about to go to prod, and some are still in requirements and design. A future paper will go deeper into defining the process by establishing the exact expectations of these tools, and will start to describe their boundaries. All six tools may still need to exist, but we will need to determine where each one starts and ends.

Check out Parts 1 and 2 of “Resiliency Architecture and Testing.”