by Tara Paider
January 1 — 11 min read
Our plan was to replace the heritage extract, transform, load (ETL) tooling with a single, cloud-native solution that solved the key challenges we identified. The data technology leadership, along with our architecture and engineering leadership, said “Great. Prove it.”
We had to prove:
- Spark Java could handle complex transformations and integrations at scale, as fast as or faster than our vendor tools
- Spark Java was a better solution than the other options available through the Spark API
- 600 ETL developers using point-and-click tools could be reskilled to adopt the solution
We had a very small development team of three people, plus an ETL developer with no Java experience from our centralized data team (soon to become one of our primary clients), to work on the proof of concept (POC) and subsequent pilot. We wanted a client embedded in our build team to ensure we never left our existing developers behind. We had to build a framework that the most junior software engineers could easily adopt, without compromising our architecture and engineering standards.
We began building the early software development kit (SDK) for the framework using a data flow from our Home Lending line of business. We felt this would represent the complexity and size of our flows as an initial set of requirements. The team would build out pieces of the SDK, then hand it over to the ETL developer to see if they intuitively understood how to use the framework (they had previously taken an online course on Java fundamentals through our learning center). Although the team provided guidance where needed, our client was able to use the framework and built a flow comparable to the original built in the vendor tool. The new flow also had fully automated test coverage! Through this partnership, the team received immediate feedback from the client not only on functionality but also on ease of use.
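To give a flavor of what "intuitive for an ETL developer" can look like, here is a minimal, hypothetical sketch of a fluent flow API in plain Java. The names (`Flow`, `read`, `filter`, `transform`) and the in-memory records are illustrative assumptions, not the real SDK; a production framework like the one described would wrap Spark Datasets rather than `java.util.List`.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FlowSketch {

    // A toy "flow" over in-memory records. The point is the declarative,
    // step-by-step pipeline shape a point-and-click ETL developer can read
    // top to bottom, not the execution engine underneath.
    static class Flow<T> {
        private final List<T> records;

        Flow(List<T> records) { this.records = records; }

        // Read a source into a flow (a real SDK would read files or tables).
        static <T> Flow<T> read(List<T> source) { return new Flow<>(source); }

        // Keep only records matching a predicate.
        Flow<T> filter(Predicate<T> p) {
            return new Flow<>(records.stream().filter(p).collect(Collectors.toList()));
        }

        // Apply a per-record transformation.
        <R> Flow<R> transform(Function<T, R> f) {
            return new Flow<>(records.stream().map(f).collect(Collectors.toList()));
        }

        // Materialize the results.
        List<T> collect() { return records; }
    }

    public static void main(String[] args) {
        // Illustrative records: "loanId:balance". Keep loans at or above a
        // threshold, then normalize the loan IDs.
        List<String> out = Flow.read(List.of("loan-1:150000", "loan-2:90000", "loan-3:220000"))
                .filter(r -> Integer.parseInt(r.split(":")[1]) >= 100000)
                .transform(r -> r.split(":")[0].toUpperCase())
                .collect();
        System.out.println(out); // [LOAN-1, LOAN-3]
    }
}
```

Because each step is an ordinary method that takes a function, each step can also be unit-tested in isolation, which is what made the fully automated test coverage mentioned above straightforward to achieve.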
After five months, the initial pilot was built. The sample flow could process about three times the data in half the time of the original vendor-built solution. Our leaders were thrilled!
As we left the pilot phase and embarked on our modernization journey at scale, it was important to think about true product architecture and boundaries in the context of our Data & Analytics products.
Our existing heritage technology stacks went beyond data movement into metadata, business data registration, access control, storage platform onboarding and access, archiving and purging (data retention) and more. We all had to agree that we were not "replacing" heritage functionality with a new thing. We were all – together – fundamentally modernizing how we manage data at the bank. Chase had organized into a product structure, so within the Data & Analytics group, we needed to ensure functionality was properly slotted into the right product architecture. This ensured shared functionality existed only once, while also mitigating churn within the teams due to feature requests that were out of scope. Product owners and technology partners knew whom to route a feature request to when it was not within their product boundaries and architecture.
For our data pipeline, we focused only on data movement and transformation and the metadata required to operate a pipeline. We worked with our peer products to review existing heritage functionality and ensure functionality required in the target state was on the appropriate product roadmap. We still occasionally have missing features come up for discussion, and because we have a well-understood product architecture, the discussions are significantly easier than they once were!
As we populated the product backlog with the intended functionality, we worked with our biggest client to identify a minimum viable product (MVP) pilot for end-to-end functionality. We selected an application that was representative of the overall landscape for data lake use cases – our initial focus – and documented what within the overall feature backlog was required to fulfill the MVP.