The rise of the YAML engineer
- Track: PyData: Data Engineering
- Type: Talk
- Level: intermediate
- Room: South Hall 2A
- Start: 11:55 on 11 July 2024
- Duration: 30 minutes
Abstract
In the analytics world, many of the trending data frameworks, written in Python or other languages, are adopting the declarative paradigm: users describe the desired end state and let the system figure out the best steps to reach it. This can be seen at many layers: data extraction, data transformation, and data visualization, but also infrastructure, data quality, governance… Many of these frameworks use YAML as the interface between users (data engineers, data analysts and other data practitioners) and the desired system state. In this presentation, I will dive into the origins of the declarative paradigm for data systems, what it means for us as data practitioners, and why we’re actually not becoming glorified YAML developers. I will also talk about state management and GitOps, and probably complain about YAML multiline strings.
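To make the declarative idea concrete, here is a minimal sketch in Python of a system that compares a desired state with the current state and derives the steps itself. It is not taken from the talk or from any particular framework: the `plan` and `apply` functions, the resource names, and the dictionary-based state format are all hypothetical. In a real tool the desired state would typically be loaded from a YAML file; plain dictionaries are used here so the example runs with the standard library only.

```python
# Hypothetical sketch of declarative reconciliation.
# Function names and the state format are illustrative assumptions.

def plan(current: dict, desired: dict) -> list[tuple[str, str]]:
    """Return the (action, resource) steps that move `current` to `desired`."""
    actions = []
    # Resources declared by the user but missing from the system.
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name))
    # Resources present on both sides whose configuration differs.
    for name in sorted(desired.keys() & current.keys()):
        if current[name] != desired[name]:
            actions.append(("update", name))
    # Resources that exist in the system but are no longer declared.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


def apply(current: dict, desired: dict) -> dict:
    """Execute the plan on a copy of `current` and return the new state."""
    new_state = dict(current)
    for action, name in plan(current, desired):
        if action == "delete":
            del new_state[name]
        else:  # "create" and "update" both copy the declared configuration
            new_state[name] = desired[name]
    return new_state


# What the user declares (would usually come from a YAML file).
desired = {
    "orders": {"schedule": "daily"},
    "customers": {"schedule": "hourly"},
}
# What the system currently holds (the tracked state).
current = {
    "orders": {"schedule": "weekly"},
    "legacy_report": {"schedule": "daily"},
}

actions = plan(current, desired)
print(actions)
# [('create', 'customers'), ('update', 'orders'), ('delete', 'legacy_report')]
```

The user never writes the create, update, or delete steps; they only edit `desired`. Running `plan` again after `apply` returns an empty list, which is the property that makes this style a natural fit for state management and GitOps workflows.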