NLP Application in Cases of Violence Against Women

PyData: Research & Applications
Terrace 2A
11:55 on 11 July 2024
30 minutes


Domestic violence is a widespread problem, one which demands attention and policy fixes. But available data is largely unstructured, making analysis difficult for both researchers and policy makers. In this talk, I’ll show you how Python helped me to retrieve, structure, and classify the testimony of violence victims. I’ll show which APIs and libraries allowed me to retrieve women’s testimony from YouTube, turn their speech into text, and then analyze the text itself. You’ll come away knowing not just some new Python techniques, but also how those techniques can be used to improve our society.

Outline:

  • Introduction (1m)
  • How to collect data from YouTube? (5m)
      o Reason for collecting data using YouTube
      o Keywords to find videos
      o YouTube API
  • How to transcribe audio to text? (5m)
      o Whisper API
      o How long it took
      o Accuracy
  • Semantic analysis of testimony (10m)
      o BERTopic
      o Analysis of relevant words
  • How useful it is for analyzing unstructured data (10m)
  • Conclusion (2m)
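
As a rough illustration of the three technical steps in the outline (search, transcription, topic modelling), the pipeline might look something like the sketch below. This is not the code from the talk. It assumes the `google-api-python-client`, `openai-whisper`, and `bertopic` packages, and it assumes Whisper is run locally through the open-source package rather than a hosted API. The function names, the `"base"` model size, and the sentence-splitting rule are illustrative choices, and downloading the audio from each video is left out.

```python
import re


def search_videos(api_key, query, max_results=25):
    """Return the raw YouTube Data API v3 search response for a keyword query."""
    # Imported lazily so the pure helpers below work without the client installed.
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    request = youtube.search().list(
        part="snippet", q=query, type="video", maxResults=max_results
    )
    return request.execute()


def extract_video_ids(response):
    """Collect video IDs from a search.list response, skipping non-video items."""
    return [
        item["id"]["videoId"]
        for item in response.get("items", [])
        if "videoId" in item.get("id", {})
    ]


def transcribe(audio_path, model_name="base"):
    """Transcribe one audio file to text with the open-source Whisper package."""
    import whisper  # assumption: local model, not a hosted transcription API

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]


def split_sentences(text):
    """Split a transcript into sentence-like documents for topic modelling."""
    # Illustrative rule: break on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def fit_topics(documents):
    """Fit a BERTopic model and return it with the topic assigned to each document."""
    from bertopic import BERTopic

    topic_model = BERTopic(language="multilingual")
    topics, _probabilities = topic_model.fit_transform(documents)
    return topic_model, topics
```

In such a setup, the video IDs from `extract_video_ids` would drive the audio download, each transcript would be split with `split_sentences`, and the resulting sentences from all videos would be passed together to `fit_topics`. Topic models generally need many documents to produce stable topics, so a single short transcript is unlikely to be enough.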

The speaker

Deborah Foroni

I hold a master's degree in Technology and work as a Data Scientist. I am part of technology communities such as PyLadies São Paulo and ‘Todas as Letras’ (LGBTQIAP+). I am also a popular educator at the Technology Center of MTST.