Taming One Quadrillion Data Points with Apache Iceberg and Parquet

30 minutes


Bloomberg is a leading provider of financial data, with datasets spanning multiple decades. Handling and organizing datasets of this size is challenging; typical concerns include sluggish query performance, high storage costs, and data consistency problems.

This talk will describe how Apache Iceberg and Parquet work together for big data management: Iceberg provides ACID transactions, time travel, and schema evolution, while Parquet's columnar storage enables fast query performance, even for our largest workloads.

The session will introduce Apache Iceberg, an open-source table format that enables incremental updates, versioning, and schema evolution. The discussion will then focus on Parquet files, which store data in a compressed, columnar format to improve query performance and lower storage costs. Finally, the session will outline how our Enterprise Data Lake Applications engineering team has used Apache Iceberg (especially PyIceberg) to overhaul our data management and analytical processing workflows.
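
To make the versioning idea concrete ahead of the session, here is a minimal sketch of snapshot-based versioning in plain Python. It is a toy model only, not the Iceberg specification or the PyIceberg API: the names `ToyTable`, `Snapshot`, `append`, and `scan`, and the file names, are hypothetical and chosen purely for illustration. Each incremental update records a new immutable snapshot listing the data files that make up the table, so an earlier state can still be read later.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of the table: the data files that make it up."""
    snapshot_id: int
    data_files: tuple[str, ...]


@dataclass
class ToyTable:
    """Toy stand-in for a versioned table (hypothetical, not PyIceberg)."""
    snapshots: list[Snapshot] = field(default_factory=list)

    def append(self, new_file: str) -> Snapshot:
        # An incremental update adds one file and records a new snapshot;
        # earlier snapshots are never modified.
        previous = self.snapshots[-1].data_files if self.snapshots else ()
        snapshot = Snapshot(len(self.snapshots) + 1, previous + (new_file,))
        self.snapshots.append(snapshot)
        return snapshot

    def scan(self, snapshot_id: int | None = None) -> tuple[str, ...]:
        # Read the file list as of a given snapshot (default: the latest).
        if not self.snapshots:
            return ()
        if snapshot_id is None:
            return self.snapshots[-1].data_files
        for snapshot in self.snapshots:
            if snapshot.snapshot_id == snapshot_id:
                return snapshot.data_files
        raise KeyError(f"unknown snapshot id: {snapshot_id}")


table = ToyTable()
first = table.append("prices-2023.parquet")  # hypothetical file name
table.append("prices-2024.parquet")          # hypothetical file name

latest_files = table.scan()                      # current state of the table
earlier_files = table.scan(first.snapshot_id)    # state as of the first update
print(latest_files)
print(earlier_files)
```

Because old snapshots are kept rather than overwritten, reading an earlier version is just a lookup. A real table format tracks far more than this sketch does, including schemas, partitioning, and file-level statistics.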

Attendees will be able to apply the best practices discussed in the talk to build better infrastructure for their growing data demands and spur innovation within their organizations.