Is RAG all you need? A look at the limits of retrieval-augmented generation

Track:: PyData: LLMs (2024)
Type:: Talk (long session)
Level:: advanced
Room:: Terrace 2A
Start:: 10:45 on 10 July 2024
Duration:: 45 minutes

Abstract

Retrieval-Augmented Generation (RAG) is a widely adopted technique for expanding the knowledge of LLMs within a specific domain while mitigating hallucinations. However, it is not the silver bullet it is often claimed to be. A chatbot for developer documentation and one for medical advice may share the same architecture, yet they have vastly different requirements for quality, transparency and consistency. Getting RAG to work well for both can be far from trivial.

In this talk we will first look at what RAG is, where it shines and why it works so well in these applications. Then we will examine its most common failure modes and walk through a few of them to see how to evaluate whether RAG is a suitable solution at all, how to improve the quality of its output, and when it’s better to take a different approach entirely.
