The rise of the YAML engineer

PyData: Data Engineering
South Hall 2A
11:55 on 11 July 2024
30 minutes


In the analytics world, many of the trending data frameworks, written in Python or other languages, are adopting the declarative paradigm: users describe the desired end state and let the system figure out the best steps to reach it. This can be seen at many layers: data extraction, data transformation, and data visualization, but also infrastructure, data quality, governance… Many of these frameworks use YAML as the interface between the users (data engineers, data analysts and other data practitioners) and the desired system state. In this presentation, I will dive into the origins of the declarative paradigm for data systems, what it means for us as data practitioners, and why we’re actually not becoming glorified YAML developers. I will also talk about state management and GitOps, and probably complain about YAML multiline strings.
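
To make the declarative idea concrete, here is a minimal sketch in Python of a system that compares a desired state with the current state and derives the steps to reconcile them. It is not taken from the talk or from any particular framework: the `plan` function, the action names, and the table names are hypothetical, and the desired state is written as a plain dictionary where a real tool would typically load it from a YAML file.

```python
from typing import Dict, List, Tuple

# Hypothetical types for this sketch: a state maps a resource name to its spec.
State = Dict[str, dict]
Action = Tuple[str, str]  # (verb, resource name)


def plan(desired: State, current: State) -> List[Action]:
    """Derive the actions that would move `current` to `desired`."""
    actions: List[Action] = []
    # Anything declared but missing is created; anything that differs is updated.
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    # Anything that exists but is no longer declared is deleted.
    for name in sorted(current):
        if name not in desired:
            actions.append(("delete", name))
    return actions


# What the user declares (in practice, often the content of a YAML file).
desired = {
    "orders": {"schedule": "hourly"},
    "customers": {"schedule": "daily"},
}

# What the system currently has.
current = {
    "orders": {"schedule": "daily"},
    "legacy_events": {"schedule": "daily"},
}

for verb, name in plan(desired, current):
    print(verb, name)
```

The user only edits `desired`; the ordering and choice of operations is left to `plan`. Running the plan again once the system has converged yields no actions, which is the property that makes this style a good fit for GitOps workflows.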

The speaker

Matthieu Caneill

Matthieu Caneill is a data & software engineer, currently working at Picnic in Amsterdam, where he works on the data platform.