Don't fix bad data, do this instead
- Track: PyData: Data Engineering
- Type: Talk
- Level: intermediate
- Room: North Hall
- Start: 11:55 on 11 July 2024
- Duration: 30 minutes
Abstract
At a time when GenAI is quickly growing in popularity, along with prescriptive analytics and online ML models, the question arises whether we still need to care about data quality. We strongly believe that the answer is yes, and even more so than before!
Our expectations of data are high, and this often leads to frustration when reality does not meet them.
In the pursuit of data quality, expectations must be grounded in reality. A gap often exists between the anticipated outcomes and what the data actually contains, and that gap leads to frustration and mistrust.
This talk presents pragmatic strategies for bridging this gap. It covers both the technical (hard) and cultural (soft) measures we implemented to uphold data quality standards.
Key Takeaways:
- Integration tests serve as a proactive barrier: they prevent data contract violations before they occur, whereas data quality checks only react after the fact.
- Prioritisation is crucial; a product-centric mindset is key when evaluating the balance between resource investment and potential gain.
- Data quality management requires both hard (technical) and soft (cultural) measures.
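To make the first takeaway concrete, here is a minimal sketch of an integration test that guards a data contract. It is an illustration under stated assumptions, not the approach presented in the talk: the contract `ORDERS_CONTRACT`, the producer function `build_orders`, and the helper `contract_violations` are all hypothetical names, and the contract format (column name mapped to a type and a nullability flag) is one simple choice among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column of a (hypothetical) data contract."""
    dtype: type
    nullable: bool = False


# Assumed contract agreed between the producer and its consumers.
ORDERS_CONTRACT = {
    "order_id": Column(int),
    "amount_eur": Column(float),
    "coupon": Column(str, nullable=True),
}


def contract_violations(rows, contract):
    """Return readable violations; an empty list means the rows honour the contract."""
    problems = []
    for i, row in enumerate(rows):
        for name, col in contract.items():
            if name not in row:
                problems.append(f"row {i}: missing column '{name}'")
                continue
            value = row[name]
            if value is None:
                if not col.nullable:
                    problems.append(f"row {i}: '{name}' must not be null")
            elif not isinstance(value, col.dtype):
                problems.append(
                    f"row {i}: '{name}' expected {col.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return problems


def build_orders(raw):
    """Hypothetical producer transformation whose output is covered by the contract."""
    return [
        {
            "order_id": int(r["id"]),
            "amount_eur": r["amount_cents"] / 100,
            "coupon": r.get("coupon"),
        }
        for r in raw
    ]


def test_build_orders_honours_contract():
    # Runs in CI on a small fixture, before any change to build_orders is deployed.
    sample = [
        {"id": "1", "amount_cents": 1999},
        {"id": "2", "amount_cents": 500, "coupon": "SUMMER"},
    ]
    assert contract_violations(build_orders(sample), ORDERS_CONTRACT) == []
```

Because the test runs against the producer's code rather than against data that has already landed, a change that would break the contract (for example, emitting `order_id` as a string) fails the build instead of surfacing later as a downstream data quality alert.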
Are you a data scientist, software engineer, product manager, or data engineer? Join us in this discussion; data quality concerns us all.