
How we used vectorization for 1000x Python speedups (no C or Spark needed!)

Track: PyData: Machine Learning, Stats
Type: Talk (long session)
Level: intermediate
Room: Forum Hall
Start: 16:05 on 11 July 2024
Duration: 45 minutes

Abstract

Want to make all your code faster? With matrices, library knowledge, and a sprinkle of creativity, you can consistently speed up multivariate Python functions by 1000x!

The most common optimizations need only simple building blocks - arithmetic, checking a case, calling the right sklearn function, and so on. When that’s not sufficient, three core tricks - converting conditional logic to set theory, stacking vectors into a matrix, and shaping data to match library expectations - cover the vast majority of real-world cases (90% of the ~400 functions we vectorized).
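As a rough illustration of the first two tricks, here is a minimal NumPy sketch. The scoring rule, thresholds, weights, and function names are all hypothetical and are not taken from the talk; the point is only to show an if/elif chain rewritten as boolean masks (sets of rows) and two vectors stacked into a matrix.

```python
import numpy as np

def score_loop(revenue, emissions):
    """Row-by-row version with explicit branches (hypothetical rule)."""
    out = []
    for r, e in zip(revenue, emissions):
        if r > 100 and e < 10:
            out.append(1.0)
        elif r > 100:
            out.append(0.5)
        else:
            out.append(0.0)
    return np.array(out)

def score_vectorized(revenue, emissions):
    """Same rule, with each condition expressed as a boolean mask."""
    big = revenue > 100      # set of rows with high revenue
    clean = emissions < 10   # set of rows with low emissions
    # np.select picks the first matching condition per row, like if/elif.
    return np.select([big & clean, big], [1.0, 0.5], default=0.0)

def weighted_sum_stacked(revenue, emissions, weights):
    """Stack the input vectors into an (n, 2) matrix and use one matrix product."""
    features = np.column_stack([revenue, emissions])
    return features @ weights
```

Because `np.select` evaluates conditions in order, the mask list mirrors the original branch order, which makes the rewrite easy to check against the loop version.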

At Bloomberg, ESG (Environmental, Social, and Governance) Scores require complex computations on large data sets. Time-series computations are fundamental for Governance - one UDF infers board support for a policy from prior cyclical votes and other time-offset inputs. By rewriting the pandas backfill as a series of reductions on a 4-tensor, we reduced the runtime from 45 minutes to 10 milliseconds! Analogously, due to real-world complexity, finance UDFs can end up with 100+ if/else branches in one function. With a mix of De Morgan’s laws and sparse matrix representations, we simplified the cases and achieved 1000x+ speedups.
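The abstract does not give the 4-tensor formulation, so the following is only a simplified 2-D sketch of the general idea: a backfill written as a cumulative reduction instead of a row-by-row loop. The function name and the array layout (one row per entity, one column per time step) are assumptions made for illustration.

```python
import numpy as np

def backfill_rows(values):
    """Backfill NaNs along axis 1 using only array operations."""
    n_rows, n_cols = values.shape
    valid = ~np.isnan(values)
    # Column index of each valid entry; invalid entries get the sentinel n_cols.
    idx = np.where(valid, np.arange(n_cols), n_cols)
    # A running minimum taken from the right yields, for each position,
    # the nearest valid column at or after it.
    nearest = np.minimum.accumulate(idx[:, ::-1], axis=1)[:, ::-1]
    # Append one NaN column so the sentinel index resolves to NaN
    # (positions with no later valid value stay missing).
    padded = np.concatenate([values, np.full((n_rows, 1), np.nan)], axis=1)
    return np.take_along_axis(padded, nearest, axis=1)
```

Under these assumptions the result should match what `pandas.DataFrame.bfill(axis=1)` returns for the same data, and the same index-and-reduce pattern extends to higher-dimensional arrays by choosing the time axis accordingly.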

We’ll conclude with a quick overview of cutting-edge tools, and hope you’ll leave with a concrete strategy for vectorizing financial models!


The speakers

Jonathan Hollenbeck

Jonathan Hollenbeck is a Senior Software Engineer at Bloomberg for the ESG (Environmental, Social & Governance) team, where he delivers performance & reliability improvements for financial models and data transformations. He has five years of experience in scientific computing, using Python, Julia, and R. After receiving a bachelor’s degree in computer science from UC Davis, he started his software engineering career at Learning at Cisco. During his early career, he integrated content development workflows and learner telemetry into a highly reliable, near-real-time database system with 1,100 internal users. He then took on a strategic role, supporting cross-org learning integration and driving inception + productionalization of ML/AI training projects, one of which achieved CEO recognition & a partnership with the White House. During this time, he earned a master’s degree in Computational & Applied Mathematics from Stanford University, with a focus on Machine Learning ethics.

Justine Wezenaar

Justine Wezenaar is a Software Engineering Team Lead for Bloomberg’s ESG (Environmental, Social & Governance) Quant team, which owns the implementation and maintenance of the firm’s quantitative parametric scoring models. She took a less-traditional route to Software Engineering, studying mathematics and theoretical physics at McGill University, then working as a data scientist for a healthtech startup in her hometown of Halifax, Canada, before joining Bloomberg Engineering in New York City in 2018. Before joining the ESG team in 2022, Justine was on the quant engineering team for Bloomberg’s Evaluated Pricing (BVAL) product, where she worked on pricing models for mortgage-backed securities. In her current role, her team builds systems that must satisfy the performance and reliability requirements of Engineering while remaining flexible and agile enough to accommodate the Research and Product teams’ responses to the dynamic ESG market landscape.