From Pandas to production: ELT with dlt

Track:
PyData: Data Engineering
Type:
Sponsored
Level:
Intermediate
Room:
North Hall
Start:
14:35 on 10 July 2024
Duration:
30 minutes

Abstract

We created the “data load tool” (dlt), an open-source Python library, to bridge the gap between data engineers and data scientists. In this talk you will learn how dlt can help you overcome typical roadblocks in your data science workflows and how it streamlines the transition from data exploration to production. We will also discuss the pains of maintaining data pipelines and how dlt helps you avoid common engineering headaches.

Join us to learn best practices for data handling and managing failures, illustrated with real-life examples!
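To give a flavour of the workflow the talk covers, here is a minimal sketch of loading a Pandas DataFrame with dlt into a local DuckDB database. It is not the talk's example: the pipeline, dataset, and table names are illustrative, and the toy DataFrame stands in for whatever comes out of your exploratory notebook.

```python
import dlt
import pandas as pd

# Toy DataFrame standing in for the output of an exploratory analysis.
df = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["alice", "bob", "carol"], "score": [0.9, 0.7, 0.8]}
)

# Declare a pipeline; dlt takes care of schema inference, normalization,
# and loading into the destination (here a local DuckDB file).
pipeline = dlt.pipeline(
    pipeline_name="pandas_to_production",  # illustrative name
    destination="duckdb",
    dataset_name="demo",
)

# Load the DataFrame into a table. Re-running the pipeline with new or
# changed columns lets dlt evolve the destination schema for you.
load_info = pipeline.run(df, table_name="scores")
print(load_info)
```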


The speakers

Violetta Mishechkina
I’m a Solution Engineer at dltHub. I started my professional journey as a Data Scientist at Nokia, where I worked a lot with telecommunications data and time series. Both classical ML and neural networks were quite popular at the time, and I used both in my projects. My next step was moving from Data Science to MLOps, because I felt that the problem of getting an ML model into production was not solved yet highly important. Questions about getting data, versioning, and proper testing are still on the table.

When I joined dltHub four months ago, I entered the world of Data Engineering. Coming mostly from ML, I had to learn to speak the language of Data Engineering: terms like schema evolution, data ingestion, and semantic layer were all new to me. That is partly why I was so impressed by the dlt Python library, because it abstracts away many of these issues. Personally, I believe dlt could become part of the standard modern open-source data stack, because it was built by people who know what they are doing and have tackled the problem of data ingestion hundreds of times.

LinkedIn: https://www.linkedin.com/in/violetta-mishechkina/

Adrian Brudaru
Hi there! I’m Adrian, a data engineer. At dltHub, I am one of the co-founders and the original inventor of the first version of dlt.

I started my data career in 2012 in the Berlin startup scene, where I spent five years in employed roles. I eventually moved to freelancing, because I liked to do more work and skip the drama.

In that time I built many data warehouses, data products, and data teams, and saw first-hand the friction between data scientists and engineers, which came largely from differing toolsets and the abstractions each side needed. That experience led me to build various pieces of boilerplate loading code and to identify the need clearly.