Can Large Language Models Understand Tunisian Arabic?
Why Can't AI Understand Tunisian Arabic?
Have you ever tried talking to ChatGPT or any AI assistant in Tunisian dialect? If you have, you've probably noticed it struggles: it switches between languages, misunderstands context, or gives responses that feel completely off. This isn't just a small bug. It's a fundamental problem affecting millions of Tunisian speakers.
The Problem
Modern AI models power everything from virtual assistants to translation tools, but they have a blind spot: Tunisian Arabic (Darija). While these systems work well with English, French, or Modern Standard Arabic, they fail when it comes to our everyday language.
Why does this matter?
- 11 million Tunisians can't use AI tools in their native dialect
- Customer service bots in Tunisia often frustrate users instead of helping them
- Social media analysis misses the sentiment and meaning in Tunisian posts
- Educational tools don't support how Tunisians actually communicate
- Cultural expressions and local knowledge get lost in translation
What Makes Tunisian Arabic Special?
Tunisian Arabic isn't just "broken Arabic" or "informal language"—it's a rich dialect with its own rules:
- Code-switching: We naturally mix Arabic, French, and Berber words in one sentence
- Unique expressions: "Aalesh?" (why?), "Barsha" (a lot), and "Famma" (there is) don't translate directly
- Flexible writing: We write in Latin script, Arabic script, or both
- Cultural context: Sarcasm, humor, and local references that AI completely misses
When AI can't understand these patterns, it can't serve Tunisian users properly.
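To make this concrete, here are a few simplified examples. The spellings are illustrative only; real usage varies a lot from writer to writer, which is exactly what trips models up:

```python
# Illustrative examples only -- Tunisian spelling is not standardized, so the
# same sentence can be written many different ways.

# The same greeting in Latin script (Arabizi) and in Arabic script.
# In Arabizi, digits stand in for Arabic letters: 3 for ع and 7 for ح.
greeting_arabizi = "3aslema, chniya a7walek?"   # "Hi, how are you?"
greeting_arabic = "عسلامة، شنية أحوالك؟"

# Code-switching: Tunisian Arabic and French in a single sentence.
mixed = "Nemchi nakhou rendez-vous 3and el tbib"  # "I'll go get an appointment with the doctor"
```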
Our Research
I've been working on this problem by evaluating how well current LLMs handle Tunisian Arabic. The results? Not great. Even the most advanced models struggle with:
- Basic transliteration between Latin and Arabic script
- Understanding code-switched sentences
- Detecting sentiment in Tunisian social media posts
- Translating expressions that carry cultural meaning
The full research paper and findings are available on GitHub.
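To give a flavor of what such an evaluation looks like, here is a minimal sketch of probing a chat model on Tunisian sentiment. It is not the harness used in the paper; the client library, model name, and prompt wording are all placeholders.

```python
# Minimal sketch, not the actual evaluation harness from the paper.
# Assumes the official `openai` Python client and an OPENAI_API_KEY in the
# environment; the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()

def classify_sentiment(text: str) -> str:
    """Ask the model for a one-word sentiment label for a Tunisian Arabizi sentence."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a sentiment classifier for Tunisian Arabic written in "
                    "Latin script (Arabizi). Answer with one word: positive, negative, or neutral."
                ),
            },
            {"role": "user", "content": text},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

# "el jaw behi barsha lyoum" roughly means "the weather is really nice today".
print(classify_sentiment("el jaw behi barsha lyoum"))
```

Scoring a model then comes down to comparing answers like this against human labels across the whole dataset.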
The TUNIZI Dataset
To tackle this problem, I created the TUNIZI dataset, a benchmark for evaluating AI models on Tunisian Arabic tasks including transliteration, translation, and sentiment analysis.
But here's the truth: one person's dataset isn't enough.
AI models need massive amounts of diverse data to learn properly. The dataset I built is just a starting point. To make AI truly understand Tunisian Arabic, we need contributions from across Tunisia—different cities, age groups, contexts, and ways of speaking.
How You Can Help
This is a call to action for the Tunisian tech community and anyone who cares about linguistic diversity in AI.
We need your help to grow the TUNIZI dataset:
What We Need
- Tunisian conversations: Natural dialogue in Darija (text messages, social media posts, etc.)
- Translations: Tunisian ↔ English or French pairs
- Sentiment labels: Positive, negative, or neutral Tunisian text
- Transliterations: Same sentences in both Latin and Arabic script
- Regional diversity: Input from different Tunisian cities and regions
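If you want a picture of what a single contribution could look like, here is a hypothetical entry format. The field names are made up for illustration; the actual format is described in the repository's contribution guidelines.

```python
# Hypothetical contribution entry (JSON Lines), for illustration only --
# field names are made up; follow the repository's contribution guidelines
# for the actual format.
import json

entry = {
    "text_arabizi": "famma barsha 7ajet behia fi tounes",
    "text_arabic": "فما برشا حاجات باهية في تونس",
    "translation_en": "There are a lot of nice things in Tunisia",
    "sentiment": "positive",
    "region": "Sousse",
}

with open("contribution.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```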
Why Contribute?
- Representation: Make AI work for Tunisian speakers
- Impact: Enable better AI tools for millions of users
- Preservation: Document and preserve our dialect digitally
- Innovation: Power new applications in Tunisian markets
- Open source: All contributions benefit the research community
How to Contribute
1. Visit the GitHub repository
2. Check the contribution guidelines
3. Submit your data (anonymized and shared with consent)
4. Join the discussion on improving the dataset
Even small contributions matter. A few sentences, some translations, or labeled posts—every bit helps build better AI for Tunisia.
The Bigger Picture
This isn't just about Tunisian Arabic. It's about linguistic justice in AI. If we don't actively work to include low-resource languages and dialects, AI will only serve a privileged few who speak dominant languages.
By building better datasets and models for Tunisian Arabic, we're:
- Making technology more inclusive
- Preserving our linguistic heritage
- Creating economic opportunities in Tunisian markets
- Showing that our dialect matters in the global AI conversation
Join the Movement
AI doesn't have to ignore Tunisian speakers. Together, we can change this.
Check out the research paper to understand the technical details: GitHub Repository
Contribute to the dataset and help AI understand how Tunisians really speak.
Let's build AI that speaks our language. 🇹🇳
---
Have questions or want to collaborate? Feel free to reach out or open an issue on GitHub.