Impersonation in Data Engineering: No More Credentials in Your Code!

Track:: PyData: Data Engineering (2024)
Type:: Talk
Level:: intermediate
Room:: North Hall
Start:: 16:05 on 10 July 2024
Duration:: 30 minutes

Abstract

Imagine stepping into your dream job as a Python data developer, ready to dive into coding and show your talent, only to find that missing database credentials leave you idle for days because of slow interdepartmental communication and permission issues. Frustrating, right?

In my talk, I’ll showcase how we can make this whole process much easier. I’ll explain how using something called “Identity and Access Management” (IAM) lets everyone in a company, including machines, get to work without these annoying holdups.

Surprised to hear that a machine like Airflow can have its own identity? I’ll explain how we use Workload Identity, a crucial part of this ecosystem, to integrate Airflow into our infrastructure.

A central pillar of the discussion will be the role of impersonation in our setup: how it ties together various elements to enable a harmonious, secure, and maintainable infrastructure. The resulting architecture not only fosters a better developer experience, faster project delivery, and increased productivity and transparency, but also serves as a foundation for more advanced concepts such as data mesh implementation.
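To make the idea of impersonation a little more concrete, here is a minimal toy sketch in Python. It is not a real cloud IAM API and not necessarily the setup shown in the talk: `ToyIam`, `Token`, and the identity names (`dev:alice`, `workload:airflow`, `sa-sales-etl`) are hypothetical and exist only for illustration. The point it tries to show is that neither the developer nor Airflow stores a database credential; each only holds permission to act as a service account, and receives a short-lived token when it does.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Set


@dataclass(frozen=True)
class Token:
    """Short-lived token recording who is acting and on whose behalf."""
    subject: str          # identity whose permissions apply (the service account)
    requested_by: str     # identity that asked to impersonate it
    expires_at: datetime


@dataclass
class ToyIam:
    """Hypothetical in-memory stand-in for an IAM service (illustration only)."""
    # service account -> principals allowed to impersonate it
    impersonators: Dict[str, Set[str]] = field(default_factory=dict)
    # service account -> resources it may read
    readers: Dict[str, Set[str]] = field(default_factory=dict)

    def impersonate(self, caller: str, target: str, ttl_seconds: int = 3600) -> Token:
        """Issue a short-lived token for `target` if `caller` is allowed to act as it."""
        if caller not in self.impersonators.get(target, set()):
            raise PermissionError(f"{caller} may not impersonate {target}")
        expires = datetime.now(timezone.utc) + timedelta(seconds=ttl_seconds)
        return Token(subject=target, requested_by=caller, expires_at=expires)

    def can_read(self, token: Token, resource: str) -> bool:
        """Access is decided by the impersonated identity, and only until expiry."""
        if datetime.now(timezone.utc) >= token.expires_at:
            return False
        return resource in self.readers.get(token.subject, set())


# Assumed example data: one service account shared by a human and a workload.
iam = ToyIam(
    impersonators={"sa-sales-etl": {"dev:alice", "workload:airflow"}},
    readers={"sa-sales-etl": {"warehouse.sales"}},
)

# The developer's laptop and the Airflow workload take the same code path;
# no password or key file appears anywhere in the code.
token = iam.impersonate("dev:alice", "sa-sales-etl")
print(iam.can_read(token, "warehouse.sales"))
```

In this sketch, onboarding a new teammate amounts to adding one entry to `impersonators`, and the `requested_by` field keeps a record of who acted as the service account, which is one way to read the transparency claim above.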

Join me in this talk to discover the synergy of IAM, Workload Identity, and impersonation. Let’s equip you with a model that promotes easy team onboarding, transparent access management, and a secure, frustration-free workspace focused on delivery.

And for those who want their code to behave the same way on a local machine and in the cloud, I will share a small but powerful Docker hack that keeps it consistent wherever it runs.

Recording

Resources

Presentation slides