Are LLMs smarter in some languages than others?
- Track: PyData: LLMs
- Type: Poster
- Level: Intermediate
- Room: Exhibit Hall
- Start: 13:00 on 11 July 2024
- Duration: 60 minutes
Abstract
Have you ever asked yourself if Large Language Models (LLMs) perform differently across various languages? I have.
In this poster session, I will demonstrate how tokenization, embeddings, and the models themselves behave across 30 different languages, and how the choice of language influences pricing and other model characteristics.
Spoiler:
- Greek is the most expensive language to process for most models.
- Processing Asian languages on Gemini is cheaper.
- You can save up to 15% of tokens by removing diacritics.
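The diacritics trick is easy to try yourself. As a minimal sketch, the snippet below strips combining marks using Unicode NFD decomposition from Python's standard library; the actual token savings you observe will depend on the language and on the specific model's tokenizer (you can measure it by counting tokens before and after with a tokenizer such as tiktoken).

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # Decompose characters into base letters + combining marks (NFD),
    # drop the combining marks, then recompose what remains (NFC).
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

# Czech pangram: diacritics are frequent, so the effect is easy to see.
original = "Příliš žluťoučký kůň úpěl ďábelské ódy"
plain = strip_diacritics(original)
print(plain)  # → Prilis zlutoucky kun upel dabelske ody
```

Note that this transformation is lossy: in some languages diacritics are meaning-bearing, so whether the token savings are worth the information loss depends on your use case.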