Achieving data autonomy, Part 1: Setting the stage for change

by Tara Paider

January 1 — 9 min read

In 2018, Chase was in the middle of transforming how we deliver technology solutions. There were several converging strategies, including modern, cloud-native architecture; skills and talent; and delivery methodology.

First, the bank was adopting a "cloud first, cloud native" architecture and emphasizing the use of microservices and application programming interfaces (APIs) to build highly componentized and decoupled systems. The purpose was to ensure autonomy in system development, because dependency management between systems during software development was brutal. The data technology stack was based on highly proprietary and expensive vendor technologies, such as AbInitio and Informatica. While these stacks were simple to use with their point-and-click user experiences and provided best-in-suite, end-to-end data management capabilities, their value diminished as soon as they had to be integrated with similar tools or platforms in other parts of the firm.

Within Consumer & Community Banking (CCB) alone, we had more than 20 different technologies delivering the over 50,000 pipelines feeding our data lake. This didn't include other languages and solutions being used outside our centralized data team. The firm also required integration with its new metadata solution, which was not natively compatible with the multiple metadata repositories used by the various technology stacks within CCB. Achieving all of this required customization, and none of the existing solutions could simply be picked up and moved to the cloud. In fact, years earlier we had decided to tightly couple AbInitio to Hortonworks, and specifically to YARN for resource management within our Hadoop cluster, through a custom solution.

Our data technology stack also required highly specialized skills, which created a very top-heavy organization of senior engineers. Talent was difficult to find as a result, and we paid roughly a 40% premium because the roles were senior and specialized. We could not leverage our phenomenally successful junior talent pipeline program, which hires computer science graduates from college and technology bootcamp programs, because AbInitio and Informatica simply are not taught in those programs. Using these vendor tools is not considered "coding," and people hired through these programs are required to commit code at least 50% of the time. These constraints made it exceedingly difficult to build a talent pipeline.

Finally, CCB was in the process of adopting the agile methodology on its journey to a full product model. This journey forced us to evaluate how the structure of the data delivery teams conflicted not only with the most basic principles of agile but also with the spirit of what we were trying to accomplish by adopting a product model. Teams could not be staffed for full feature delivery with T-shaped software engineers (engineers with broad knowledge across several disciplines and deep focus on one), which meant they could not be autonomous or empowered to do their jobs. There was a lot of waste when teams handed off work to the data component teams.

Planning for full features was difficult as well. Adoption of the firm’s DevOps capabilities was extremely difficult, if not impossible in some cases, due to the highly proprietary nature of the vendor solutions. This meant that the ability to improve our time to market through test coverage, test automation, continuous integration and automated deployment was highly dependent on leveraging a technology stack that aligned with our firmwide standards.

To summarize the key challenges:

  • A plethora of technologies, most not suitable for the cloud or our firmwide tool chain
  • Technologies tightly coupled to the various data platforms
  • Expensive technology and expensive people to support and build solutions
  • A highly specialized workforce not in alignment with Product Architecture and full feature teams
  • Data management and data governance policies applied inconsistently

One final note on the architecture and organizational strategy: we wanted to conform to a data provider / consumer pattern by componentizing the pipelines. In the traditional model of data movement, data teams pull data from a source, or grab whatever data it happens to produce, then clean it up and integrate it. Data teams spend a lot of their time cleaning the data from the systems of record, all of which is done in a single, large workflow for every data set. By separating the responsibility of providing data from consuming data, and giving both providers and consumers the frameworks they need to do this effectively, the quality of the data would increase dramatically.
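To make the provider / consumer split concrete, here is a minimal, hypothetical sketch in Python. It is not our actual framework; the `DataContract`, `Provider`, and `Consumer` names are illustrative assumptions. The idea it shows is that the provider owns cleansing and publishes only contract-conforming records, while consumers integrate against the contract instead of re-cleaning raw source extracts.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical data contract: the provider promises records with these fields and types.
@dataclass(frozen=True)
class DataContract:
    name: str
    required_fields: Dict[str, type]

    def validate(self, record: dict) -> bool:
        return all(
            field in record and isinstance(record[field], ftype)
            for field, ftype in self.required_fields.items()
        )

class Provider:
    """Owns the source data: cleans it once and publishes only contract-conforming records."""
    def __init__(self, contract: DataContract, clean: Callable[[dict], dict]):
        self.contract = contract
        self.clean = clean

    def publish(self, raw_records: Iterable[dict]) -> List[dict]:
        cleaned = (self.clean(r) for r in raw_records)
        return [r for r in cleaned if self.contract.validate(r)]

class Consumer:
    """Integrates published data; never touches raw source extracts."""
    def __init__(self, contract: DataContract):
        self.contract = contract

    def integrate(self, published: Iterable[dict]) -> List[dict]:
        # Consumers can trust the contract, so no re-cleaning is needed here.
        return [r for r in published if self.contract.validate(r)]

# Purely illustrative usage with toy account data.
contract = DataContract("accounts", {"account_id": str, "balance": float})
provider = Provider(contract, clean=lambda r: {**r, "balance": float(r.get("balance", 0))})
published = provider.publish([{"account_id": "A1", "balance": "100.50"}])
consumer = Consumer(contract)
print(consumer.integrate(published))  # [{'account_id': 'A1', 'balance': 100.5}]
```

In the monolithic model, each consumer's single large workflow embeds its own copy of the cleansing logic; in the sketch above, cleansing lives with the provider once, and every consumer works against the same published contract.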

Failing to do this would have more serious long-term consequences:

  • Proliferation of unmanageable technical debt, longer timelines for new functionality, exploding costs and a rapidly growing data environment
  • Inability to adopt new tools for data scientists and analysts
  • Diminishing quality
  • Fatigued employees frustrated by not working on modern technology
  • Inability to find employees with niche vendor skills
  • Senior developers seeking promotions and career growth that were stymied by our top-heavy organization

Therefore, we could not fail! With that came clarity of purpose and a shared vision for all of us involved.