Typed Japanese

39 comments

March 29, 2025

enricozb

Somewhat related: the Lambek calculus[0], which is kind of a general type system for written languages. Words are either functions or concrete types, and functions can take their arguments on the left or right.

guerrilla

This reminds me of categorial grammars. Has anyone here ever looked into them? I loved them. They're like simply typed lambda calculi, but instead of a single arrow type there are two, one for the left side of a symbol and one for the right. So you can do something like:

    and : Phrase \ Conjunction / Phrase
    do : Subject \ Verb / Object

Sorry if I have the directionality wrong, but `and` would take a phrase on the left and another on the right and return a conjunction. `do` would take a Subject and an Object and return a verb. In STLC it would look like this:

    and : Phrase -> Phrase -> Conjunction
    do : Subject -> Object -> Verb

These are bad examples just for illustration; consult real linguists.
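The two directional arrows can be made concrete with a small reducer. Below is a minimal TypeScript sketch. It assumes the Lambek-style reading used above: `A\B` takes an `A` on its left and yields `B`, and `B/A` takes an `A` on its right and yields `B`. Under that reading, `and` is `(Phrase\Conjunction)/Phrase`. The greedy leftmost-first reduction is a simplifying assumption. A real categorial parser would explore every bracketing, for example with a CKY-style chart.

```typescript
// Categories: atoms, or functions that look for their argument on one side.
// Notation as in the comment: arg\res looks left, res/arg looks right.
type Cat =
  | { kind: "atom"; name: string }
  | { kind: "right"; res: Cat; arg: Cat } // res/arg
  | { kind: "left"; arg: Cat; res: Cat }; // arg\res

const atom = (name: string): Cat => ({ kind: "atom", name });
const right = (res: Cat, arg: Cat): Cat => ({ kind: "right", res, arg });
const left = (arg: Cat, res: Cat): Cat => ({ kind: "left", arg, res });

// Structural equality of categories.
function eq(a: Cat, b: Cat): boolean {
  if (a.kind === "atom" && b.kind === "atom") return a.name === b.name;
  if (a.kind === "right" && b.kind === "right") return eq(a.res, b.res) && eq(a.arg, b.arg);
  if (a.kind === "left" && b.kind === "left") return eq(a.arg, b.arg) && eq(a.res, b.res);
  return false;
}

function show(c: Cat): string {
  switch (c.kind) {
    case "atom": return c.name;
    case "right": return `(${show(c.res)}/${show(c.arg)})`;
    case "left": return `(${show(c.arg)}\\${show(c.res)})`;
  }
}

// Forward application (X/Y) Y => X, or backward application Y (Y\X) => X.
function combine(x: Cat, y: Cat): Cat | null {
  if (x.kind === "right" && eq(x.arg, y)) return x.res;
  if (y.kind === "left" && eq(y.arg, x)) return y.res;
  return null;
}

// Greedily combine the leftmost combinable pair, restarting after each step.
function reduce(cats: Cat[]): Cat[] {
  const cur = cats.slice();
  for (let i = 0; i + 1 < cur.length; ) {
    const r = combine(cur[i], cur[i + 1]);
    if (r) { cur.splice(i, 2, r); i = 0; } else { i++; }
  }
  return cur;
}

// Illustrative lexicon mirroring the two examples above.
const Phrase = atom("Phrase"), Conjunction = atom("Conjunction");
const Subject = atom("Subject"), Verb = atom("Verb"), Obj = atom("Object");
const andCat = right(left(Phrase, Conjunction), Phrase); // (Phrase\Conjunction)/Phrase
const doCat = right(left(Subject, Verb), Obj);          // (Subject\Verb)/Object

const parsedAnd = reduce([Phrase, andCat, Phrase]);
const parsedDo = reduce([Subject, doCat, Obj]);
console.log(parsedAnd.map(show).join(" "), parsedDo.map(show).join(" ")); // Conjunction Verb
```

In the STLC version, argument order is fixed by position in the arrow chain. Here, each argument's side of the word is part of its type.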

Last time I looked at it they were adding polymorphism and a couple of people were starting to think about dependent types. It was mostly linguists who were interested in it, but it was hardcore math and CS. Can't remember the names involved. Carl H. I forget what the H stands for. Hmm. Some famous type theorists had written about them too.

Anyway, seemed relevant.

edflsafoiewq

An interesting property is that when you have left and right application, which part of fx is the function and which is the argument becomes sort of ambiguous. You can think of f as something that combines with a B on its right to produce an A, (A/B)B = A. But you can also think of x as something that combines with an A/B on its left to produce an A, (A/B)((A/B)\A) = A.

This is famously used in interpreting the status of "Socrates is mortal" ("Socrates" is the argument) and "Every man is mortal" ("is mortal" is the argument).
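The type-raising step in that comment can be checked mechanically. Below is a minimal TypeScript sketch using the comment's notation: `X\Y` takes an `X` on its left, and `Y/X` takes an `X` on its right. The helper names `applyRight`, `applyLeft` and `raise` are illustrative. `raise` implements the standard type-raising rule that turns a `B` into `(A/B)\A`.

```typescript
type Cat =
  | { kind: "atom"; name: string }
  | { kind: "right"; res: Cat; arg: Cat } // res/arg: argument on the right
  | { kind: "left"; arg: Cat; res: Cat }; // arg\res: argument on the left

// Structural equality of categories.
function eq(a: Cat, b: Cat): boolean {
  if (a.kind === "atom" && b.kind === "atom") return a.name === b.name;
  if (a.kind === "right" && b.kind === "right") return eq(a.res, b.res) && eq(a.arg, b.arg);
  if (a.kind === "left" && b.kind === "left") return eq(a.arg, b.arg) && eq(a.res, b.res);
  return false;
}

// Forward application: (A/B) B => A, so the left-hand item is the function.
function applyRight(fn: Cat, x: Cat): Cat | null {
  return fn.kind === "right" && eq(fn.arg, x) ? fn.res : null;
}

// Backward application: X (X\A) => A, so the right-hand item is the function.
function applyLeft(x: Cat, fn: Cat): Cat | null {
  return fn.kind === "left" && eq(fn.arg, x) ? fn.res : null;
}

// Type raising: an item of category b becomes (a/b)\a, a function that looks
// left for something wanting a b and returns that thing's result a.
function raise(b: Cat, a: Cat): Cat {
  return { kind: "left", arg: { kind: "right", res: a, arg: b }, res: a };
}

const A: Cat = { kind: "atom", name: "A" };
const B: Cat = { kind: "atom", name: "B" };
const f: Cat = { kind: "right", res: A, arg: B }; // f : A/B, x : B

const fAsFunction = applyRight(f, B);          // (A/B) B = A
const xAsFunction = applyLeft(f, raise(B, A)); // (A/B) ((A/B)\A) = A
console.log(
  fAsFunction !== null && eq(fAsFunction, A),
  xAsFunction !== null && eq(xAsFunction, A),
); // true true
```

Both derivations yield `A` from the same two words. Only the choice of which word counts as the function differs.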

canjobear

Lots of work on this kind of thing.

A good starting point is https://en.wikipedia.org/wiki/Combinatory_categorial_grammar

musicale

The thing I like about classical symbolic/analytic AI (such as grammars and semantic models) is that it was comprehensible (or "explainable.")

Unfortunately it didn't scale as well as opaque DNNs trained on huge amounts of data.

makeitdouble

For people unfamiliar with the meme chosen as an example (なんで春日影やったの？, roughly "Why did we play Haruhikage?"):

https://dic.pixiv.net/en/a/Why%20did%20we%20play%20Haruhikag...

IshKebab

Amazing. Is Japanese really that strict in its grammar?

niederman

No, this supports only a highly regular subset of Japanese grammar. There are plenty of irregular phrases it doesn't cover, as in most languages.

retrac

I'm not sure what you mean by strict. If by "regular and consistent" then, yes. There are a handful of irregular verbs. Everything else is completely regular with suffixes attached to the verb root.

Japanese is not unusual (on this particular point). That is probably the most common structure for a language. Swahili verbs are entirely regular and can have a long sequence of prefixes which follow a strict order. kula - to eat. usingalikula - (you) would not have eaten. halutayakula - (we) will not eat (those noun class 6 objects)

oceanhaiyang

I think what they mean is whether you can form sentences that aren't textbook and still make sense. In that case, as long as you understand the particles and grammar, you can make even heavily reordered sentences work.

GolDDranks

Every human language works this way...? This is why we have systems called "grammars" in the first place: they capture regularity in the structure of a language.

Human languages are, in a sense, "infinitary", so they aren't simply a set of fixed phrases. However, depending on the language, there are also patterns they don't allow. (And some patterns aren't allowed by _any_ human language.)

This logically necessitates that languages have regularities. We capture these regularities as "generative grammars".

thaumasiotes

> That is probably the most common structure for a language. Swahili verbs are entirely regular and can have a long sequence of prefixes which follow a strict order.

This seems a little confused. If a language's verbs are entirely regular, the normal point of view would be that that language doesn't inflect verbs at all.

For example, an English verb can include up to four prefixes, which occur in an order that never changes and which do not depend on the verb. In this sense, that part of the grammar is "fully regular". (Note that all of these prefixes are verbs, and with the exception of the first set, they inflect like verbs. The first set inflect only for past tense.)

[Those prefixes, in order, indicate: modality (there are many verbs that can occupy this space, but they're all mutually exclusive with each other), perfect aspect (exactly one verb can occur here), continuous aspect (again, exactly one verb can occur here), passive voice (two verbs, mutually exclusive). Any of the sixteen theoretical combinations of these is allowed.]
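The claim that all sixteen combinations are possible can be checked by enumeration. Each auxiliary, in its fixed slot, determines the form of the verb after it. Below is a minimal TypeScript sketch. The choices of `might` as the modal, `take` as the main verb, `be` as the passive auxiliary and the present third-person singular as the finite form are arbitrary assumptions for illustration.

```typescript
// The forms needed to build an English verb group.
interface Forms {
  plain: string;    // take
  pres3sg: string;  // takes (used here as the finite form)
  ing: string;      // taking (active participle)
  pastPart: string; // taken (passive participle)
}
type Slot = keyof Forms;

const TAKE: Forms = { plain: "take", pres3sg: "takes", ing: "taking", pastPart: "taken" };
const HAVE: Forms = { plain: "have", pres3sg: "has", ing: "having", pastPart: "had" };
const BE: Forms = { plain: "be", pres3sg: "is", ing: "being", pastPart: "been" };

interface Options {
  modal: string | null; // modals never inflect, so a plain string is enough
  perfect: boolean;
  progressive: boolean;
  passive: boolean;
}

// Slots are filled in their fixed order; each auxiliary picks the form of the next verb.
function verbGroup(main: Forms, o: Options): string {
  const words: string[] = [];
  let next: Slot = "pres3sg"; // the first verb in the group carries tense/agreement
  if (o.modal !== null) { words.push(o.modal); next = "plain"; }
  if (o.perfect) { words.push(HAVE[next]); next = "pastPart"; }
  if (o.progressive) { words.push(BE[next]); next = "ing"; }
  if (o.passive) { words.push(BE[next]); next = "pastPart"; }
  words.push(main[next]);
  return words.join(" ");
}

// Enumerate all 2^4 = 16 on/off combinations of the four slots.
const combos: string[] = [];
for (let mask = 0; mask < 16; mask++) {
  combos.push(verbGroup(TAKE, {
    modal: mask & 8 ? "might" : null,
    perfect: (mask & 4) !== 0,
    progressive: (mask & 2) !== 0,
    passive: (mask & 1) !== 0,
  }));
}
console.log(combos.join("\n"));
```

The output runs from "takes" (no slot filled) to "might have been being taken" (all four filled).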

But that's all periphrasis. English verbal inflection is a different phenomenon: each verb has five forms (except be, which is special and has more): plain (take), past (took), third-person present singular (takes), active participle (taking), and passive participle (taken). This is where a verb might be irregular or not. A regular verb's past form adds the suffix /d/ to the plain form, the active participle adds /ɪŋ/, the third-person present singular form adds /z/, and the passive participle is identical with the past form.

All verbs are regular in their active participle and only be is irregular in the third-person present singular.† But verbs might have arbitrary past forms (go / went) and while the passive participles are never arbitrary, there is a set of irregular forms following a common pattern that descends from an earlier stage of the language. (taken / been / gone / known / gotten / ...).

There is a double standard over what counts as irregularity in verbs. By the standard we use for Latin, English has a total of one irregular verb. A Latin verb may have a perfective stem that is arbitrarily different from its imperfective stem (compare ferre with tulisse), but this "doesn't count" - we say that it's necessary to memorize four forms of any Latin verb, and with English verbs having only five forms and two of them being always regular, it's a real stretch to find verbs that require you to memorize more than four forms.

† edit: this was wrong; have is also irregular in the present third-person singular.
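The "regular unless listed" view of the five English verb forms can be sketched as a default paradigm plus an override table. This TypeScript sketch works on spelling, not on the /d/, /z/ and /ɪŋ/ suffixes described above. Its two spelling adjustments are therefore simplifying assumptions:

- a silent final e, and
- -es after s, sh, ch, x, z or o.

The override table is a hypothetical sample. `be` is left out because it has more than five forms.

```typescript
// The five forms described in the comment.
interface Paradigm {
  plain: string;
  past: string;
  pres3sg: string;
  activePart: string;
  passivePart: string;
}

// Regular formation. The spelling tweaks are simplifying assumptions,
// not a full account of English orthography.
function regular(plain: string): Paradigm {
  const silentE = plain.endsWith("e") && !plain.endsWith("ee");
  const past = silentE ? plain + "d" : plain + "ed";
  const pres3sg = /(s|sh|ch|x|z|o)$/.test(plain) ? plain + "es" : plain + "s";
  const activePart = (silentE ? plain.slice(0, -1) : plain) + "ing";
  // For a regular verb the passive participle is identical to the past form.
  return { plain, past, pres3sg, activePart, passivePart: past };
}

// Hypothetical sample: irregular verbs list only their exceptional slots.
const IRREGULAR = new Map<string, Partial<Paradigm>>([
  ["go", { past: "went", passivePart: "gone" }],
  ["take", { past: "took", passivePart: "taken" }],
  ["have", { past: "had", pres3sg: "has", passivePart: "had" }],
]);

function paradigm(plain: string): Paradigm {
  // Start from the regular paradigm, then overwrite the listed exceptions.
  return Object.assign(regular(plain), IRREGULAR.get(plain) ?? {});
}

console.log(paradigm("walk"), paradigm("take"));
```

Only the exceptional slots need memorizing: `take` lists two, `have` three, and a regular verb like `walk` none.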

adrian_b

Also the modal verbs (must, can, may, shall etc.) are irregular in the present third-person singular, because they do not add "-s".


sparkie

Not strict, but there is underlying structure.

It's not really practical, but it is interesting to compare it structurally to programming languages.

A verb (or verbal adjective) basically forms the "root" of a clause, and typically appears at the end of it. If we related that to the concept of a function, it would be like writing them in postfix form, with whatever comes before it acting as the arguments to it.

    (args)func

    (WORDS)VERB

Other parts of the clause are suffixed with a spoken "particle", which denotes its role in the sentence. These somewhat resemble named parameters (keyword arguments), which we can pass out of canonical order: the particle acts as the parameter name saying which argument it is, and it too appears in postfix position.

    (arg1:subject, arg2:object)VERB

    ;eg
    (NOUN:ga, NOUN:wo)VERB
    (NOUN:wo, NOUN:ga)VERB

Both examples mean the same, but the former would be more typical, and the latter less common.
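The keyword-argument analogy can be sketched directly: if the particle names the role, argument order stops mattering. Below is a minimal TypeScript sketch. It assumes a flat romanized clause of the form (NOUN PARTICLE)* VERB, using only `ga`, `wo` and `ni`. `parseClause` is a hypothetical helper. The example words are neko ("cat"), sakana ("fish") and taberu ("eat").

```typescript
type Particle = "ga" | "wo" | "ni";
const PARTICLES: ReadonlySet<string> = new Set(["ga", "wo", "ni"]);

interface Clause {
  verb: string;
  // Particle -> noun, much like keyword arguments keyed by name.
  args: Partial<Record<Particle, string>>;
}

// Hypothetical helper: parses a flat romanized clause "NOUN PARTICLE ... VERB".
// No relative clauses, omitted particles, or topic marker wa.
function parseClause(tokens: string[]): Clause {
  if (tokens.length % 2 !== 1) throw new Error("expected (NOUN PARTICLE)* VERB");
  const verb = tokens[tokens.length - 1];
  const args: Partial<Record<Particle, string>> = {};
  for (let i = 0; i < tokens.length - 1; i += 2) {
    const noun = tokens[i];
    const particle = tokens[i + 1];
    if (!PARTICLES.has(particle)) {
      throw new Error(`expected a particle after "${noun}", got "${particle}"`);
    }
    args[particle as Particle] = noun;
  }
  return { verb, args };
}

const typical = parseClause("neko ga sakana wo taberu".split(" "));
const reordered = parseClause("sakana wo neko ga taberu".split(" "));
console.log(typical.args.ga === reordered.args.ga && typical.args.wo === reordered.args.wo); // true
```

Both orders produce the same role assignment, because the role comes from the particle rather than from the position.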

A verb clause which appears before a noun modifies the noun, which bears similarity to a subexpression.

    (NOUN:wa, ((NOUN:ni)VERB NOUN):wo)VERB

Some particles modify a sentence and come after the verb; for example, to make it a question it's followed by a particle like `ka`, `no`, `ne` or `na`. These might resemble keywords, again appearing in a postfix position.

    (NOUN:ga)VERB ka

Compared with a typical programming language, Japanese is structured the opposite way - from right to left.

thaumasiotes

> A verb (or verbal adjective) basically forms the "root" of a clause, and typically appears at the end of it.

> Compared with a typical programming language, Japanese is structured the opposite way - from right to left.

How consistent is that across different types of phrases?

English is generally described as being strongly right-headed. But that's really a statement about English noun phrases. Prepositional phrases are left-headed. Determiner phrases, if that's your thing, are left-headed. Verbs are normally positioned between their subject and object, with other modifiers generally after the object.

I had a Chinese tutor who complained that everything in Chinese was the other way around compared to English, but as far as I can tell the odds are about 50-50 that any given structure will match or reverse. Mandarin sentences are SVO just like English. Mandarin prepositions go after their nouns instead of before. Mandarin adjectives go before their nouns. Mandarin noun compounds work the same way as English noun compounds. (牙刷 "tooth brush" is a brush, not a tooth.) Mandarin verbs have indirect objects before the verb instead of after. Mandarin discourse particles occur at the end of the sentence instead of the beginning.† Mandarin relative clauses work just like English noun phrases, which is funny because English relative clauses have their own bespoke ordering. (English: the man who has five sisters; Mandarin: has five sisters who [the] man.)

The big lesson I drew from this is "it doesn't make a lot of sense to describe languages overall as having a particular orientation". Is Japanese more consistent, or is orientation more sensitive to the specific structures you're using?

† Note that "beginning" and "end" aren't the only options. In classical Latin and Greek, particles like this always appear as the second word in a sentence.

sparkie

Japanese is consistent about the verb (or adjective, adverb) coming last in the clause/sentence (besides certain sentence-ending particles). There are no "particles" necessary on verbs because they're always in the right position.

The subject/object/means/destination/etc are flexible and can appear in any order - but typically subject appears before object - `NOUN ga NOUN wo VERB`, `NOUN ga NOUN ni VERB` (where `ga` denotes the subject), whereas `NOUN wo NOUN ga VERB` would be atypical, but not incorrect.

When a verb (or adjective/adverb) appears before a noun, as in `VERB NOUN`, it modifies the noun - acting like an adjective - `VERB NOUN` is like "NOUN that/which VERBS". All adjectives work this way - ADJ NOUN, because verbs and adjectives are not really disjoint word classes in Japanese. Even the noun adjectives ("na-adjectives") function this way - they appear as `NOUN na NOUN`, but `na` is secretly a verb disguised as a particle. Examples:

    sora ga aoi             (The sky is blue)
    aoi sora                (Blue sky)    [i-adjective]
    sora ga kirei desu      (The sky is pretty)
    kirei na sora           (Pretty sky)  [na-adjective]
    sora ga haiiro ni naru  (The sky turns grey)
    haiiro ni naru sora     (Sky which turns grey) [verb clause as adjective]
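The six examples above follow one pattern. The predicate and the noun-modifying form use the same words, except that after a na-adjective the copula surfaces as `na`. Below is a minimal TypeScript sketch that reproduces exactly those examples. It assumes the polite copula `desu` in the predicate and ignores tense and other politeness levels. The type and function names are illustrative.

```typescript
// The three kinds of modifier in the examples (names are illustrative).
type Modifier =
  | { kind: "i-adj"; word: string }         // aoi
  | { kind: "na-adj"; word: string }        // kirei
  | { kind: "verb-clause"; words: string }; // haiiro ni naru

// Predicative form, as in "sora ga ...".
function predicative(m: Modifier): string {
  switch (m.kind) {
    case "i-adj": return m.word;
    case "na-adj": return `${m.word} desu`; // polite copula, as in the examples
    case "verb-clause": return m.words;
  }
}

// Attributive form: the same words placed before the noun, except that the
// copula after a na-adjective appears as "na".
function attributive(m: Modifier): string {
  switch (m.kind) {
    case "i-adj": return m.word;
    case "na-adj": return `${m.word} na`;
    case "verb-clause": return m.words;
  }
}

const sentence = (noun: string, m: Modifier): string => `${noun} ga ${predicative(m)}`;
const modified = (noun: string, m: Modifier): string => `${attributive(m)} ${noun}`;

const aoi: Modifier = { kind: "i-adj", word: "aoi" };
const kirei: Modifier = { kind: "na-adj", word: "kirei" };
const naru: Modifier = { kind: "verb-clause", words: "haiiro ni naru" };
for (const m of [aoi, kirei, naru]) console.log(sentence("sora", m), "/", modified("sora", m));
```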

Spoken Japanese is very context sensitive, so things can be omitted if they're already known to the listener - this can include the verb, and any of the particles.

Verbs can appear out of order - for example at the start of a sentence - but the meaning is understood based on tone or pause in the speech - basically, if what were being spoken were to be written it would be with a comma after the verb. `VERB, NOUN`

Example:

    watashi wa hara ga hetta (I'm hungry).
    hara ga hetta, watashi   (Hungry, I am).
    hara ga heru watashi     (Me who is hungry) [no pause or comma]

GolDDranks

Japanese is often described as "strictly head final". However, there are some exceptions such as head-initial phrases with modifiers like "dake", "bakari", "goro", "gurai" and "hodo".

Also, the morphology in verb inflection and case declension is often head-initial, if it makes sense to say that about morphology. Word formation seems to be kind of mixed.

But Japanese syntax is a lot more consistently head-final than many European languages are in their parameter settings.

canjobear

English is mixed head-initial and head-final, for example objects go after verbs (head-initial) but adjectives go before nouns (head-final). Japanese is strictly head-final.
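Head direction can be modeled as a per-category parameter that is applied while linearizing a tree. Below is a minimal TypeScript sketch. It assumes phrases with exactly one complement and only two categories, V and P. The English words are kept in both outputs so that the mirroring is visible. The head-final order matches Japanese 東京に行く (Tōkyō ni iku, "go to Tokyo").

```typescript
type Category = "V" | "P";
type Direction = "initial" | "final";

// A phrase is a bare word, or a head of some category with one complement.
type Phrase = string | { cat: Category; head: string; comp: Phrase };

// Place each head before or after its complement according to its category.
function linearize(p: Phrase, params: Record<Category, Direction>): string {
  if (typeof p === "string") return p;
  const comp = linearize(p.comp, params);
  return params[p.cat] === "initial" ? `${p.head} ${comp}` : `${comp} ${p.head}`;
}

// "go to Tokyo": V takes a PP complement, P takes an NP complement.
const goToTokyo: Phrase = { cat: "V", head: "go", comp: { cat: "P", head: "to", comp: "Tokyo" } };

const englishLike = linearize(goToTokyo, { V: "initial", P: "initial" }); // "go to Tokyo"
const japaneseLike = linearize(goToTokyo, { V: "final", P: "final" });   // "Tokyo to go"
console.log(englishLike, "|", japaneseLike);
```

The parameter is a record keyed by category. A mixed language, as described for English above, is therefore just a record with different values for different categories.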

pjc50

It's more like a stack language like FORTH.
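The stack-language comparison can be sketched as a tiny postfix evaluator:

- nouns are pushed;
- a particle pops a noun and pushes a tagged argument;
- a verb pops every tagged argument into a clause;
- `ka` marks the clause on top as a question.

This TypeScript sketch is a toy under stated assumptions. The particle and verb lists are hypothetical. Because a verb swallows every tagged argument below it, relative clauses are out of scope.

```typescript
interface Arg { particle: string; word: string; }

type Item =
  | { kind: "noun"; word: string }
  | { kind: "arg"; particle: string; word: string }
  | { kind: "clause"; verb: string; args: Arg[]; question: boolean };

// Hypothetical toy vocabulary for illustration.
const PARTICLES = new Set(["ga", "wo", "ni"]);
const VERBS = new Set(["taberu", "iku"]);

function evaluate(tokens: string[]): Item[] {
  const stack: Item[] = [];
  for (const t of tokens) {
    if (PARTICLES.has(t)) {
      // A particle tags the noun directly below it with its role.
      const top = stack.pop();
      if (top === undefined || top.kind !== "noun") throw new Error(`"${t}" needs a noun before it`);
      stack.push({ kind: "arg", particle: t, word: top.word });
    } else if (VERBS.has(t)) {
      // A verb consumes every tagged argument sitting on top of the stack.
      const args: Arg[] = [];
      while (stack.length > 0) {
        const top = stack[stack.length - 1];
        if (top.kind !== "arg") break;
        stack.pop();
        args.unshift({ particle: top.particle, word: top.word });
      }
      stack.push({ kind: "clause", verb: t, args, question: false });
    } else if (t === "ka") {
      // The sentence-final particle ka turns the clause on top into a question.
      const top = stack.pop();
      if (top === undefined || top.kind !== "clause") throw new Error(`"ka" needs a clause before it`);
      stack.push({ ...top, question: true });
    } else {
      stack.push({ kind: "noun", word: t }); // anything else is treated as a noun
    }
  }
  return stack;
}

const result = evaluate("neko ga sakana wo taberu ka".split(" "));
console.log(JSON.stringify(result));
```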

zerof1l

The textbook Japanese usually taught in the first couple of years: yes, it's quite strict and can be reduced to logical patterns. But everyday spoken informal Japanese... no.

And then there are a bunch of nuances that don't follow logic; you just need to learn them. For example, the word "you" is always considered informal, a bit intimate and a bit rude in spoken Japanese, but it can be used in formal written Japanese.

cynicalkane

Well, one nice property of Japanese is the formal language is perfectly workable as a language. You'll sound a little stilted, but that's rather normal for foreigners.

If you spoke English while only sticking to formal grammar rules, you'd be unable to communicate like a normal person at all.

I think the comment on the word 'you' sounding informal is a misunderstanding. People use a common formal noun for 'you', あなた, frequently; it's just the nature of Japanese emphasizes social relationships and makes it easy to drop references to people, so using such formal references can be stilted in the wrong context. But that's a matter of word choice, not grammar.

charcircuit

>But every-day spoken informal Japanese... no.

Even informal Japanese comes down to basic patterns for grammar.

>For example, word ...

Word choice is not grammar. Knowing how to use words correctly is the hard part.

kazinator

No.

lewisjoe

Can this be used to build a grammar checker for Japanese?

koito17

Even if one could verify grammatical correctness, there are many ways to produce unnatural Japanese phrases.

To give an easy example: 9つ (9 things) is natural, but 10つ sounds extremely strange. However, 10個 sounds fine. When the number is large enough, it's also common to not use 助数詞 (counter words) at all.
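The counter example shows that a checker needs lexical knowledge, not just structure. Below is a hypothetical heuristic distilled from that comment and sketched in TypeScript. The native counter つ only covers one through nine, so ten and above switch to the general counter 個. It ignores readings and the option of dropping the counter entirely.

```typescript
// Hypothetical heuristic, not a rule from any real checker: the native counter
// つ covers one through nine only, so larger counts use the general counter 個.
function countThings(n: number): string {
  if (!Number.isInteger(n) || n < 1) throw new RangeError("expected a positive integer");
  return n <= 9 ? `${n}つ` : `${n}個`;
}

console.log(countThings(9), countThings(10)); // 9つ 10個
```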

Sometimes, grammatical mistakes are natural Japanese. For instance, there is a concept of ら抜き言葉 (words with ら dropped), where people will say e.g. 寝れない ("I can't sleep") instead of 寝られない. This is an error in conjugation, yet it's natural language and applies to a few other words, too.
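A checker that only knew the textbook rule would reject 寝れない. Below is a minimal TypeScript sketch that produces both forms of the potential negative for ichidan (-ru) verbs. The function name is illustrative. The input is assumed to be an ichidan verb, because godan verbs (書く → 書けない) follow a different rule.

```typescript
// Potential negative of an ichidan (-ru) verb, with or without ra-nuki.
// Assumes the input really is ichidan; godan verbs such as 帰る also end in る
// but conjugate differently.
function potentialNegative(ichidanVerb: string, raNuki: boolean): string {
  if (!ichidanVerb.endsWith("る")) throw new Error("expected an ichidan verb ending in る");
  const stem = ichidanVerb.slice(0, -1);         // 寝る -> 寝
  return stem + (raNuki ? "れない" : "られない"); // colloquial vs. textbook
}

console.log(potentialNegative("寝る", false), potentialNegative("寝る", true)); // 寝られない 寝れない
```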

Validating both grammar and word choice is still insufficient to judge the naturalness of a Japanese phrase. A common "mistake" made by many Japanese speakers is writing 「違和感を感じる」 ("to feel a sense of incongruity"). The verb is redundant because of the 「感」 ("feeling") in 「違和感」. The "correct" word to use in this case is 覚える. In practice, however, either choice of word is understandable and considered correct (except by those who know the bit of trivia 「違和感は覚えるもの！」, "違和感 is something you 覚える!").

Sometimes redundancy makes a phrase incorrect (see 二重敬語, "double honorifics", for an example). In other cases, nobody will debate the correctness of the phrase.

GolDDranks

No. For what people usually mean by a grammar checker, it doesn't suffice to check whether a sentence is (formally speaking) ungrammatical or not. You'd expect it to also check word choice, orthography, etc. Those aren't part of the syntactic structure, so such a checker would let many badly flawed sentences through.

Besides, the grammar this project uses is not likely to accurately reflect the actual grammar of modern spoken or written Japanese, and it's likely to be far from complete; that means it would also produce quite a lot of false-positive "ungrammaticals".

Something _like_ this can certainly be used as part of a grammar checker. But in that case, you shouldn't implement it in TypeScript's type system in the first place.
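For readers wondering what implementing a grammar in TypeScript's type system looks like, here is a toy sketch using template literal types and a conditional type. It is not the project's code. The vocabulary and the clause patterns are hypothetical, and the check happens entirely at compile time.

```typescript
// Hypothetical, tiny vocabulary for illustration.
type Noun = "neko" | "sakana";
type Verb = "taberu";

// Both argument orders discussed above are accepted.
type Clause =
  | `${Noun} ga ${Noun} wo ${Verb}`
  | `${Noun} wo ${Noun} ga ${Verb}`;

// Resolves to the literal type true or false at compile time.
type IsClause<S extends string> = S extends Clause ? true : false;

const ok: IsClause<"neko ga sakana wo taberu"> = true;
const reordered: IsClause<"sakana wo neko ga taberu"> = true;
const bad: IsClause<"neko taberu sakana"> = false;

// Assigning an ungrammatical string to Clause is a compile-time error:
// const wrong: Clause = "neko taberu sakana";
console.log(ok, reordered, bad);
```

Every rule has to be spelled out as a type, and failures surface as compiler errors. This is part of why the comment above suggests a different implementation strategy for a real checker.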

sparkie

No. Japanese is very context sensitive, and like any natural language, has ambiguities. Japanese is loaded with dajare (puns).

Grammar checking basically needs AI: you need to train some model to understand common phrases and sentence structure. Before LLMs there was software like MeCab[1] which did this and gave good results, but modern LLMs are much more capable.

[1]:https://taku910.github.io/mecab/

makeitdouble

Japanese grammar is so simple because it doesn't matter that much.

Most of the "rules" are common patterns made into guidelines, and they'll change depending on the speaker, context, society of the time (the "correct" way is fully dictated by the majority). And you could break the grammar rules as long as the other accepted guidelines are OK.

As a parallel you can learn to mechanically drive a car, but driving it "correctly" will require full knowledge of traffic code, societal rules and how to reasonably handle conflicting situations, including crashing it into a tree if it means avoiding a packed school bus.

charcircuit

No, it's missing basic things, accepts ungrammatical sentences, and is fundamentally flawed by being based on nihongokyouiku (Japanese-language-teaching) grammar.

barlog

in short, "なん春" (an abbreviation of なんで春日影やったの)

elia_is_me

still going?
