The rise of the YAML engineer

Track:: PyData: Data Engineering (2024)
Type:: Talk
Level:: intermediate
Room:: South Hall 2A
Start:: 11:55 on 11 July 2024
Duration:: 30 minutes

Abstract

In the analytics world, many of the trending data frameworks, whether written in Python or in other languages, are adopting the declarative paradigm: users describe the desired end state and let the system figure out the best steps to reach it. This shows up at many layers: data extraction, data transformation and data visualization, but also infrastructure, data quality, governance… Many of these frameworks use YAML as the interface between their users (data engineers, data analysts and other data practitioners) and the desired system state. In this presentation, I will dive into the origins of the declarative paradigm for data systems, what it means for us as data practitioners, and why we’re actually not becoming glorified YAML developers. I will also talk about state management and GitOps, and probably complain about YAML multiline strings.
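
As a rough illustration of the declarative idea (not taken from the talk itself), the sketch below compares a desired state with the current state and derives the steps needed to converge. The function name `plan`, the table names and the action labels are all hypothetical, and the plain dictionaries stand in for what a framework would typically load from a YAML file.

```python
# Minimal sketch of declarative reconciliation.
# All names here (plan, the table names, the action labels) are hypothetical.

def plan(desired: dict[str, dict], current: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the actions needed to move `current` towards `desired`."""
    actions = []
    # Anything declared but missing is created; anything that differs is updated.
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    # Anything that exists but is no longer declared is deleted.
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# The user only describes the end state...
desired = {"orders": {"owner": "analytics"}, "customers": {"owner": "sales"}}
# ...and the system compares it with what actually exists.
current = {"orders": {"owner": "finance"}, "legacy": {"owner": "nobody"}}

steps = plan(desired, current)
for action, name in steps:
    print(action, name)
```

The user never writes the create, update or delete steps; they fall out of the difference between the two states. Once the current state matches the desired one, the plan is empty.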

Recording

Resources

Slides (pdf)