The Decipherment of the Dhofari Script

ahazred8ta

it's a form of Thamudic / Ancient North Arabian script https://en.wikipedia.org/wiki/Ancient_North_Arabian

analog31

"Pre-Islamic" is an odd description of a script that predates Islam by a millennium. Did they mean "pre-Arabic?"

arp242

Pre-Islamic Arabia is, as far as I know, a fairly widely accepted term. Not that different from pre-Roman Britain, pre-Columbian Americas, pre-colonial Africa, pre-imperial China, or even Pagan Europe. In all these cases a significant change took place which drastically changed the course of the region (usually some sort of unification as a nation or religion, not always peaceful or voluntary of course).

idoubtit

"Preislamic" is a common term for near-East history. Islam is well dated, it introduced many changes and unified the region, so it's a powerful marker.

I've never encountered the term "pre-Arabic" applied to the Arabian Peninsula. It would be hard to define precisely. The word "Arab" is probably more than 3000 years old. The Arabic languages may be older; they're Semitic languages, like the Akkadian of Mesopotamia. And when did an "Arab" people or culture emerge from the Semitic peoples and cultures? I guess between 6000 BP and 3000 BP, but it was probably a long process, and nomadic tribes didn't leave many vestiges.

gryn

Is it "pre-Arabic" though? It's believed that Old Arabic existed back then.

dep_b

[flagged]

comrade1234

Completely unreadable on iOS mobile...

CharlesW

Works fine here. https://imgur.com/a/px7cZAL

ilinx

Interesting. I didn’t have any issues. Could you elaborate a bit more?

commienews

[flagged]

tinco

I wonder if you could decipher these scripts by brute-forcing decoding layers until an LLM could predict the next token. That would assume the text has a sort of logic to it that would still work in modern language, but the deciphering would be fully automatic, so we could throw a bunch of compute at it.
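A toy sketch of that search loop, under heavy assumptions: a character bigram model stands in for the LLM, and the 26 Caesar shifts stand in for the "decoding layers". Every name here (`train_bigram_model`, `shift_decode`, `best_decoding`) is hypothetical and made up for illustration. A real undeciphered script has a vastly larger search space and no guarantee that a known language model fits it.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def train_bigram_model(text):
    """Return log P(next char | previous char) with add-one smoothing."""
    chars = [c for c in text.lower() if c in ALPHABET or c == " "]
    pair_counts = Counter(zip(chars, chars[1:]))
    prev_counts = Counter(chars[:-1])
    vocab_size = len(ALPHABET) + 1  # letters plus the space character

    def log_prob(prev, nxt):
        return math.log(
            (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + vocab_size)
        )

    return log_prob


def shift_decode(ciphertext, shift):
    """Undo a Caesar shift; characters outside ALPHABET pass through."""
    out = []
    for c in ciphertext:
        if c in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(c) - shift) % 26])
        else:
            out.append(c)
    return "".join(out)


def score(text, log_prob):
    """Total log-likelihood of the text under the bigram model."""
    return sum(log_prob(a, b) for a, b in zip(text, text[1:]))


def best_decoding(ciphertext, log_prob):
    """Try every candidate decoding and keep the most predictable one."""
    candidates = (shift_decode(ciphertext, s) for s in range(26))
    return max(candidates, key=lambda t: score(t, log_prob))
```

To try it, train the model on some reference text in the assumed language, encode a phrase with `shift_decode(phrase, -3)`, and pass the result to `best_decoding`. The search only works because the candidate space is tiny and the reference text matches the plaintext language, which is exactly what is missing for an undeciphered script.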

zaik

Ok, your LLM can perfectly predict the next token. How do you extract the "logic" out of the weights?

yorwba

It's possible to identify a surprisingly large number of matching words by learning a linear transformation mapping word vectors from two different languages into the same space (e.g. https://arxiv.org/abs/1805.06297 ).

But the problem with ancient languages is typically that there's not enough data to usefully constrain a large enough model. Doubly so for undeciphered scripts where scholars might not even agree on how many different letters there are.
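For illustration, here is a minimal sketch of the "linear transformation" idea mentioned above, assuming a small seed dictionary of already-matched word pairs. It uses the supervised orthogonal Procrustes solution, which is a simpler technique than the method in the linked paper, and the function names (`learn_orthogonal_map`, `nearest_target`) are hypothetical.

```python
import numpy as np


def learn_orthogonal_map(src_vecs, tgt_vecs):
    """Find the orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Row i of src_vecs and row i of tgt_vecs are assumed to be embeddings
    of words that translate to each other (the seed dictionary).
    """
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt


def nearest_target(vec, tgt_matrix):
    """Index of the target row with the highest cosine similarity to vec."""
    tgt_unit = tgt_matrix / np.linalg.norm(tgt_matrix, axis=1, keepdims=True)
    return int(np.argmax(tgt_unit @ (vec / np.linalg.norm(vec))))
```

Given source embeddings `X` and target embeddings `Y`, one would fit `W = learn_orthogonal_map(X[seed], Y[seed])` on the seed pairs and then look up `nearest_target(X[i] @ W, Y)` for the remaining words. The data problem described above applies directly: with a tiny corpus the embeddings themselves are unreliable, so there is little for the mapping to align.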

yyyk

Presumably, they'd want to get at embeddings, and compare the dimensional space somehow to say: 'the relation between tokens a,b,c is close to the relation of tokens a1,b1,c1 in a similar model of texts of known language of apparently same family (same up to aN,bN,cN), and out of these N sequences, sequence X makes most sense given existing examples'.

(As you can tell, the argument involves some handwaving, but it may be possible?)

talos

I don't think OP's idea would work, but if it did you could just ask for a translation.
