Why We Need Arabic Language Models
12 comments
· October 19, 2025 · nzeid
Fade_Dance
>But... should they even?
I don't see why not?
Also, while I don't have this perspective myself, I'd imagine this is an unending annoyance in many parts of the world, where people often consume quite America-centric offerings in which localization is an afterthought and contracted out.
binarymax
Do we need language-specific LLMs? I can’t vouch for the data coverage or accuracy of Arabic in the leading models today, but I do know them to be highly capable across languages.
readthenotes1
Makes me wonder who did the translation
"This is a translation of the Arabic article published on 3rd August 2025"
The full irony would be if it came from an LLM.
tokai
Such declarations have become pretty useless without any indicator of the translation method.
sarabande
Does anyone know if they published the dataset?
nakamoto_damacy
I wonder whether, if you pretrain on Hebrew and Arabic, the model will find the similarities between them, such as the RTL writing direction. So many similar words. I guess both came from Aramaic? If so, how about the trifecta of ancient languages: Aramaic, then Hebrew, then Arabic.
ch4s3
They don’t come from Aramaic. Arabic is a Southwestern Semitic language, while Aramaic and Hebrew are Northwestern Semitic languages. Aramaic and Hebrew are sort of cousins, with Hebrew splitting off from southern Canaanite, which was sort of a sibling language to an older form of Aramaic.
nakamoto_damacy
They = Hebrew, Aramaic and Arabic
---
## 1. “They all kept the triconsonantal root system — where word meaning is based on three core consonants (like K-T-B = “write” → Hebrew katav, Arabic kataba, Aramaic ktav).”
*Source evidence:*
* The article “Triliteral Roots / Consonantal Roots” states that many Semitic languages (including Arabic, Hebrew) have roots typically made of three consonants (triliteral) and that words are formed by inserting vowels, etc. ([Transparent Blogs][1])
* A source says: “Both Hebrew and Arabic rely on a triliteral root system, meaning words are formed from three core consonants. Example of the root K-T-B…” ([Biblical Hebrew][2])
* Another general description: “The roots of verbs and most nouns in the Semitic languages are characterized as a sequence of consonants ... such abstract consonantal roots are used…” ([Wikipedia][3])

So this claim is well supported.
*Arabic translation of the claim:*
> احتفظت جميعها بنظام الجذر الثلاثي الحروف، حيث يعتمد معنى الكلمة على ثلاثة حروف صامتة أساسية (مثل ك-ت-ب = “كتب/يكتب” → العبرية כתב (katav)، العربية كتب (kataba)، الآرامية כתב (ktav)).
*Hebrew translation of the claim:*
> כולן שמרו על שיטת השורש התלת-עיצורי, שבה משמעות המילה מבוססת על שלושה עיצורי יסוד (למשל כ־ת־ב = “כתב” → עברית כתב (katav), ערבית كتب (kataba), ארמית כתב (ktav)).
*Citations (for this claim):*
* Semitic linguistics: “The roots of verbs and most nouns in the Semitic languages are characterized as a sequence of consonants …” ([Wikipedia][3])
* “Both Hebrew and Arabic rely on a triliteral root system, meaning words are formed from three core consonants.” ([Biblical Hebrew][2])
* Description of the K-T-B root being used in both Arabic and Hebrew. ([Wikipedia][4])
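To make the root-and-pattern idea above concrete, here is a minimal Python sketch. It is only an illustration: the `apply_template` helper and the `C`-slot template notation are hypothetical conventions invented for this example, not taken from any of the cited sources, and the templates are simplified transliterations rather than real morphological analyses.

```python
def apply_template(root, template):
    """Fill each 'C' slot in a vowel template with the next root consonant.

    Hypothetical helper for illustration only. 'C' marks a consonant slot;
    every other character in the template is copied through unchanged.
    """
    if template.count("C") != len(root):
        raise ValueError("template must have one 'C' slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


# The K-T-B root ("write") from the claim above, in plain transliteration.
ROOT_KTB = ("k", "t", "b")

print(apply_template(ROOT_KTB, "CaCaCa"))  # kataba (the Arabic form quoted above)
print(apply_template(ROOT_KTB, "CaCaC"))   # katab  (Hebrew katav, before b softens to v)
print(apply_template(ROOT_KTB, "CCaC"))    # ktab   (Aramaic ktav, same caveat)
```

The sketch deliberately ignores sound changes such as the b/v alternation, which is why the Hebrew and Aramaic outputs end in `b` rather than the `v` seen in the transliterations quoted above.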
---
## 2. “They share similar grammar and sound systems, just evolved differently.”
*Source evidence:*
* A blog post on Duolingo says: “Because Arabic and Hebrew are part of the same large language family, their grammars often ‘work’ in similar ways.” ([Duolingo Blog][5])
* A site “Arabic and Hebrew Compared” states: “Arabic and Hebrew morphology … is based on the consonant root system. …” ([Google Sites][6])
* The Wikipedia article on Semitic languages states that the Semitic languages share many grammatical features (word order, non-concatenative morphology, etc.) ([Wikipedia][7])

So yes, there is support for similar grammar and sound (phonological) systems.
*Arabic translation of the claim:*
> إنهما تشتركان في نحو وصوتيات متشابهة، لكنهما تطورتا بشكل مختلف.
*Hebrew translation of the claim:*
> הן חולקות דקדוק ומערכות צלילים דומות, רק שהתפתחו באופן שונה.
*Citations (for this claim):*
* “Because Arabic and Hebrew … their grammars often ‘work’ in similar ways.” ([Duolingo Blog][5])
* “Arabic and Hebrew morphology … is based on the consonant root system.” ([Google Sites][6])
* “Semitic languages share a number of grammatical features …” ([Wikipedia][7])
---
## 3. “Many religious and cultural interactions over millennia reinforced overlap (borrowed or re-borrowed vocabulary).”
*Source evidence:*
* The article “Similarities Between Hebrew and Arabic” mentions: “Many Hebrew and Arabic words are cognates, retaining similar meanings and sounds.” ([Biblical Hebrew][2])
* A blog “Halal, Hillul, and the Shared Meanings of Hebrew and Arabic” discusses relationships between similar sounding words (cognates) due to shared roots. ([Hebrew College][8])
* Comparative grammar sources mention that because Hebrew, Arabic and Aramaic are closely related, there has been lexical borrowing and shared vocabulary. ([semiticroots.net][9])

So your statement about religious/cultural interaction reinforcing overlap (vocabulary) is broadly supported.
*Arabic translation of the claim:*
> العديد من التفاعلات الدينية والثقافية عبر الألفيات عزَّزت التداخل (استعارت أو أعادت استعارة مفردات).
*Hebrew translation of the claim:*
> אינספור אינטראקציות דתיות ותרבותיות לאורך אלפי השנים חיזקו את ההשתלבות (השאלה או השאלה מחדש של אוצר מילים).
*Citations (for this claim):*
* “Many Hebrew and Arabic words are cognates …” ([Biblical Hebrew][2])
* “The relationships between similar-sounding words … in the case of the Semitic languages, similar roots.” ([Hebrew College][8])
* “Hebrew, Arabic, and Aramaic … than between Hebrew and any other language …” ([semiticroots.net][9])
---
[1]: https://blogs.transparent.com/hebrew/hebrew-grammar-consonan... "Hebrew Grammar: Consonantal Roots - Transparent Language Blog"
[2]: https://biblicalhebrew.org/similarities-between-hebrew-and-a... "Similarities Between Hebrew and Arabic"
[3]: https://en.wikipedia.org/wiki/Semitic_root?utm_source=chatgp... "Semitic root - Wikipedia"
[4]: https://en.wikipedia.org/wiki/K-T-B?utm_source=chatgpt.com "K-T-B"
[5]: https://blog.duolingo.com/are-arabic-hebrew-persian-related/... "Dear Duolingo: Are Arabic, Hebrew, and Persian related?"
[6]: https://sites.google.com/site/mopclanguages/arabic-and-hebre... "MOPC Languages - Arabic and Hebrew Compared"
[7]: https://en.wikipedia.org/wiki/Semitic_languages?utm_source=c... "Semitic languages"
[8]: https://hebrewcollege.edu/blog/halal-hillul-and-the-shared-m... "Halal, Hillul, and the Shared Meanings of Hebrew and Arabic"
[9]: https://www.semiticroots.net/downloads/Comparative%20Grammar... "Comparative Grammar of the Semitic Languages"
> Clear examples emerge when global language models address culturally sensitive issues, such as social relationships or political debates. They often adopt ambiguous positions that overlook the Arab cultural context, creating a gap between these digital tools and the values and lived experiences of Arab users.
Well, I have bad news, my friend. English language models are also terrible at this.
This whole article seems to stem from the premise that it's important for LLMs to engage with cultural issues competently. But... should they even?