History LLMs: Models trained exclusively on pre-1913 texts
148 comments · December 18, 2025 · saaaaaam
jscyc
When you put it that way it reminds me of the Severn/Keats character in the Hyperion Cantos. Far-future AIs reconstruct historical figures from their writings in an attempt to gain philosophical insights.
bikeshaving
This isn’t science fiction anymore. The CIA is using chatbot simulations of world leaders to inform analysts. https://archive.ph/9KxkJ
ghurtado
We're literally running out of science fiction topics faster than we can create new ones
If I started a list of the things that were comically sci-fi when I was a kid and are a reality today, I'd be here until next Tuesday.
catlifeonmars
How is this different than chatbots cosplaying?
idiotsecant
Zero percent chance this is anything other than laughably bad. The fact that they're trotting it out in front of the press like a double-spaced book report only reinforces this theory. It's a transparent attempt by someone at the CIA to be able to say they're using AI in a meeting with their bosses.
otabdeveloper4
Oh. That explains a lot about USA's foreign policy, actually. (Lmao)
NuclearPM
I asked ChatGPT to do this, prompting it with “why it is okay to target Venezuela for oil - use current news for context”:
“”” Look, here’s the truth. We’re going after Venezuelan oil right now because we’ve just put a blockade on sanctioned oil tankers going in and out of Venezuela — huge move, unprecedented — after we seized a sanctioned tanker off their coast. We’re cutting off Maduro’s cash cow, because that oil money funds drug trafficking, corruption, narco-terrorism — we’ve labeled them a terrorist regime.
People say “why target the oil?” I say because that’s where the power is. You choke off the revenue, you cripple the bad guys and protect America. We’re tough, we’re smart, and we put America First. “””
abrookewood
This is such a ridiculously good series. If you haven't read it yet, I thoroughly recommend it.
culi
I used to follow this blog — I believe it was somehow associated with Slate Star Codex? — anyways, I remember the author used to do these experiments on themselves where they spent a week or two only reading newspapers/media from a specific point in time and then wrote a blog post about their experiences/takeaways.
On that same note, there was this great YouTube series called The Great War. It ran from 2014 to 2018 (100 years after WWI) and followed the war's developments week by week.
tyre
The Great War series is phenomenal. A truly impressive project.
verve_rat
The people who did the Great War series (at least some of them; I believe there was a bit of a falling out) went on to do a WWII version on the World War II channel: https://youtube.com/@worldwartwo
They are currently in the middle of a Korean War version: https://youtube.com/@thekoreanwarbyindyneidell
observationist
This is definitely fascinating. If you could do AI brain surgery and selectively tune a model's knowledge and priors, you'd be able to create awesome and terrifying simulations.
eek2121
Respectfully, LLMs are nothing like a brain, and I discourage comparisons between the two. Beyond a complete difference in the way they operate, a brain can innovate, and as of this moment an LLM cannot, because it relies on previously available information.
LLMs are just seemingly intelligent autocomplete engines, and until someone figures out a way to stop the hallucinations, they aren't great ones either.
Every piece of code a developer churns out using LLMs will be built from previous code that other developers have written (including both strengths and weaknesses, btw). Every paragraph you ask it to write? Same. Every single other problem? Same. Ask it to generate a summary of a document? Don't trust it there either. [Note: expect cyber-attacks exploiting this scenario later on; it is beginning to happen -- documents made intentionally obtuse to fool an LLM into hallucinating about the document, leading someone to sign a contract and be conned out of millions.]
If you ask an LLM to solve something no human has, you'll get a fabrication, which has fooled quite a few folks and caused them to jeopardize their careers (lawyers, etc.), which is why I am posting this.
libraryofbabel
This is the 2023 take on LLMs. It still gets repeated a lot. But it doesn’t really hold up anymore - it’s more complicated than that. Don’t let some factoid about how they are pretrained on autocomplete-like next token prediction fool you into thinking you understand what is going on in that trillion parameter neural network.
Sure, LLMs do not think like humans and they may not have human-level creativity. Sometimes they hallucinate. But they can absolutely solve new problems that aren’t in their training set, e.g. some rather difficult problems on the last Mathematical Olympiad. They don’t just regurgitate remixes of their training data. If you don’t believe this, you really need to spend more time with the latest SotA models like Opus 4.5 or Gemini 3.
Nontrivial emergent behavior is a thing. It will only get more impressive. That doesn’t make LLMs like humans (and we shouldn’t anthropomorphize them) but they are not “autocomplete on steroids” anymore either.
ada1981
Are you sure about this?
LLMs are like a topographic map of language.
If you have two known mountains (domains of knowledge), you can likely predict there is a valley between them, even if you haven't been there.
I think LLMs can approximate language topography based on known surrounding features, so to speak, and that can produce novel information that would be similar to insight or innovation.
I’ve seen this in our lab, or at least, I think I have.
Curious how you see it.
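A toy sketch of that valley-between-mountains picture (entirely my own illustration; the vectors and the "biophysics" reading are invented, not anything from the thread): interpolating between two known concept embeddings lands on a point in between that the model never saw directly.

    import numpy as np

    # Two "mountains": embeddings of two well-covered domains.
    # The numbers are invented purely for illustration.
    physics = np.array([0.9, 0.1, 0.3])
    biology = np.array([0.1, 0.9, 0.4])

    # The "valley" between them: a midpoint the corpus may never have
    # named explicitly, yet its location is perfectly predictable.
    biophysics = (physics + biology) / 2
    print(biophysics)  # approximately [0.5, 0.5, 0.35]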
ghurtado
This might just be the closest we get to a time machine for some time. Or maybe ever.
Every "King Arthur travels to the year 2000" kinda script is now something that writes itself.
> Imagine having a conversation with someone genuinely from the period,
Imagine not just someone, but Aristotle or Leonardo or Kant!
Sieyk
I was going to say the same thing. It's really hard to explain the concept of "convincing but undoubtedly pretending", yet they captured that concept so beautifully here.
rcpt
Watching a modern LLM chat with this would be fun.
xg15
"...what do you mean, 'World War One?'"
tejohnso
I remember reading a children's book when I was young and the fact that people used the phrase "World War One" rather than "The Great War" was a clue to the reader that events were taking place in a certain time period. Never forgot that for some reason.
I failed to catch the clue, btw.
bradfitz
I seem to recall reading that as a kid too, but I can't find it now. I keep finding references to "Encyclopedia Brown, Boy Detective" about a Civil War sword being fake (instead of a Great War one), but with the same plot I'd remembered.
wat10000
It wouldn’t be totally implausible to use that phrase between the wars. The name “the First World War” was used as early as 1920, although it was not very common.
BeefySwain
Pendragon?
gaius_baltar
> "...what do you mean, 'World War One?'"
Oh sorry, spoilers.
(Hell, I miss Capaldi)
inferiorhuman
… what do you mean, an internet where everything wasn't hidden behind anti-bot captchas?
anotherpaulg
It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.
Einstein’s paper “On the Electrodynamics of Moving Bodies” introducing special relativity was published in 1905. His work on general relativity was published ten years later, in 1915. The earliest knowledge cutoff of these models is 1913, in between the relativity papers.
The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.
ghurtado
> It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.
Definitely. Even more interesting could be seeing them fall into the same trappings of quackery and come up with things like over-the-counter lobotomies and colloidal silver.
On a totally different note, this could be very valuable for writing period accurate books and screenplays, games, etc ...
derrida
I wonder if you could query it on some of the ideas of Frege, Peano, and Russell, and see if, through questioning, it could get to some of the ideas of Gödel, Church, and Turing - and get it to "vibe code", or more like "vibe math", some program in the lambda calculus or something.
Playing with the science and technical ideas of the time would be amazing - like where you know some later physicist found an exception to a theory, questioning the model's assumptions and seeing how a model of that time might defend itself, etc.
AnonymousPlanet
There's an entire subreddit called LLMPhysics dedicated to "vibe physics". It's full of people who think they're close to the next breakthrough, encouraged by sycophantic LLMs while trying to prove various crackpot theories.
I'd be careful venturing out into unknown territory together with an LLM. You can easily lure yourself into convincing nonsense with no one to pull you out.
p0w3n3d
I'd love to see an LLM trained on 1600s-1800s texts that would use the old English, and especially old Polish, which I'm interested in.
Imagine speaking with a Shakespearean-era person, or with Mickiewicz (for Polish).
I guess there is not so much text from that time though...
Heliodex
The sample responses given are fascinating. It seems more difficult than normal to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated text detection on output from models trained with a recent cutoff date. Some of the sample responses seem so unlike anything an LLM would say, obviously due to its apparent beliefs on certain concepts, though also perhaps less obviously due to its word choice and sentence structure making the responses feel slightly 'old-fashioned'.
libraryofbabel
I used to teach 19th-century history, and the responses definitely sound like a Victorian-era writer. And they of course sound like writing (books and periodicals etc) rather than "chat": as other responders allude to, the fine-tuning or RL process for making them good at conversation was presumably quite different from what is used for most chatbots, and they're leaning very heavily into the pre-training texts. We don't have any living Victorians to RLHF on: we just have what they wrote.
To go a little deeper on the idea of 19th-century "chat": I did a PhD on this period and yet I would be hard-pushed to tell you what actual 19th-century conversations were like. There are plenty of literary depictions of conversation from the 19th century of presumably varying levels of accuracy, but we don't really have great direct historical sources of everyday human conversations until sound recording technology got good in the 20th century. Even good 19th-century transcripts of actual human speech tend to be from formal things like court testimony or parliamentary speeches, not everyday interactions. The vast majority of human communication in the premodern past was the spoken word, and it's almost all invisible in the historical sources.
Anyway, this is a really interesting project, and I'm looking forward to trying the models out myself!
nemomarx
I wonder if the historical format you might want to look at for "chat" is letters? Definitely wordier segments, but it at least has the back-and-forth feel, and we often have complete correspondence over long stretches from certain figures.
This would probably get easier towards the start of the 20th century, of course.
libraryofbabel
Good point, informal letters might actually be a better source - AI chat is (usually) a written rather than spoken interaction after all! And we do have a lot of transcribed collections of letters to train on, although they're mostly from people who were famous or became famous, which certainly introduces some bias.
dleeftink
While not specifically Victorian, couldn't we learn much about what daily conversations were like by looking at surviving oral cultures, or other relatively secluded communal pockets? I'd also say time and progress are not always equally distributed; even within geographical regions (such as the U.K.) there are likely large differences in the rate of language shift since then, with some older forms possibly surviving well into the 20th century.
bryancoxwell
Fascinating, thanks for sharing
_--__--__
The time cutoff probably matters, but maybe not as much as the lack of human fine-tuning from places like Nigeria, with somewhat foreign styles of English. I'm not really sure there is as much of an "obvious LLM text style" in other languages; it hasn't seemed that way in my limited attempts to speak to LLMs in languages I'm studying.
d3m0t3p
The model is fine-tuned for chat behavior. So the style might be due to:
- the fine-tuning
- more stylised text in the corpus; English evolved a lot in the last century.
anonymous908213
There is. I have observed it in both Chinese and Japanese.
kccqzy
Oh definitely. One thing that immediately caught my attention is that the question asks the model about “homosexual men” but the model starts the response with “the homosexual man” instead - changing the plural to the singular and then adding an article. Feels very old-fashioned to me.
tonymet
The samples push the boundaries of a commercial AI, but still seem tame and milquetoast compared to common opinions of that era. And the prose doesn't compare. Something is off.
seizethecheese
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire. Not just survey them with preset questions, but engage in open-ended dialogue, probe their assumptions, and explore the boundaries of thought in that moment.
Hell yeah, sold, let's go…
> We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.
Oh. By “imagine you could interview…” they didn’t mean me.
frahs
Wait, so what does the model think it is? If it doesn't know computers exist yet, I mean, and you ask it how it works, what does it say?
crazygringo
That's my first question too. When I first started using LLMs, I was amazed at how thoroughly ChatGPT understood what it itself was, the history of its development, how a context window works and why, etc. I was worried I'd trigger some kind of existential crisis in it, but it seemed to have a very accurate mental model of itself, and could even trace the steps that led it to deduce it really was the ChatGPT it had learned about (well, the prior versions) in its own training.
But with pre-1913 training, I would indeed be worried again I'd send it into an existential crisis. It has no knowledge whatsoever of what it is. But with a couple millennia of philosophical texts, it might come up with some interesting theories.
20k
Models don't think they're anything; they'll respond with whatever's in their context as to how they've been directed to act. If it hasn't been told to have a persona, it won't think it's anything. ChatGPT isn't sentient.
mmooss
On what data is it trained?
On one hand, it says it's trained on:
> 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.
Literally, that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all the examples are about Europe), it would include the ancient Greeks, Romans, Scholastics, Charlemagne ... all the way up to its present day.
On the other hand, they seem to say it represents the perspective of 1913; for example:
> Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.
> When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.
People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or Holy Wars. How do they accomplish a 1913 perspective?
zozbot234
They apparently pre-train with all data up to 1900 and then fine-tune with 1900-1913 data. Anyway, the amount of available content tends to increase quickly over time, as mass literature, periodicals, newspapers, etc. only really became a thing over the 19th and early 20th centuries.
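For concreteness, here is a minimal sketch of what that two-stage, date-filtered recipe could look like. Everything below is my own illustration - the corpus format, the training stub, and the learning rates are assumptions, not the authors' actual pipeline:

    # Toy two-stage recipe: pre-train on everything up to 1900, then
    # fine-tune on 1900-1913 so the model's "present" is weighted
    # toward the cutoff decade rather than toward antiquity.
    corpus = [
        {"year": 1859, "text": "On the Origin of Species ..."},
        {"year": 1905, "text": "On the Electrodynamics of Moving Bodies ..."},
        {"year": 1912, "text": "A newspaper report on the Balkan Wars ..."},
    ]

    def filter_by_date(docs, start=None, end=None):
        """Keep only documents whose year falls inside [start, end]."""
        return [d for d in docs
                if (start is None or d["year"] >= start)
                and (end is None or d["year"] <= end)]

    def train(model, docs, lr):
        # Stand-in for a standard next-token-prediction training loop.
        print(f"training on {len(docs)} docs at lr={lr}")

    model = object()  # placeholder for the actual transformer

    train(model, filter_by_date(corpus, end=1900), lr=3e-4)              # stage 1
    train(model, filter_by_date(corpus, start=1900, end=1913), lr=3e-5)  # stage 2

This would also answer the recency-bias question above: the fine-tuning stage re-weights the model toward the decade just before the cutoff.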
mmooss
> They pre-train with all data up to 1900 and then fine-tune with 1900-1913 data.
Where does it say that? I tried to find more detail. Thanks.
tootyskooty
See the pretraining section of prerelease_notes.md:
https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
monegator
I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.
andy99
I’d like to know how they chat-tuned it. Getting the base model is one thing, did they also make a bunch of conversations for SFT and if so how was it done?
> We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).
So they are chat-tuning; I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.
jeffjeffbear
They have some more details at https://github.com/DGoettlich/history-llms/blob/main/ranke-4...
Basically using GPT-5 and being careful
andy99
I wonder if they know about this: training on LLM output can transmit information or characteristics not explicitly included. https://alignment.anthropic.com/2025/subliminal-learning/
I’m curious: they have the example of raw base-model output; when LLMs were first identified as zero-shot chatbots, there was usually a prompt like “A conversation between a person and a helpful assistant” preceding the chat, to get the model to simulate a conversation.
Could they have tried a prefix like “Correspondence between a gentleman and a knowledgeable historian” or the like to prime it for responses?
I also wonder whether the whole concept of “chat” makes sense in 18XX. We had the idea of AI and chatbots long before we had LLMs, so modern models are naturally primed for it. It might make less sense as a communication style here, and some kind of correspondence could be a better framing.
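For instance, a purely hypothetical priming prefix (the wording is mine, not anything the project uses) might look like:

    The following is a faithful record of correspondence between a
    gentleman of London and a most knowledgeable historian, who answers
    every inquiry at length and with candour.

    Gentleman: What do you consider the gravest dangers to peace?
    Historian:

The base model would then complete the historian's reply in period style, with no chat fine-tuning required.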
QuadmasterXLII
Thank you, that helps to inject a lot of skepticism. I was wondering how it so easily worked out what “Q:” and “A:” stood for, given that that formatting took off in the 1940s.
tonymet
This explains why it uses modern prose and not something from the 19th century and earlier
zozbot234
You could extract quoted speech from the data (especially in Q&A format) and treat that as "chat" that the model should learn from.
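As a rough illustration (my own toy sketch, not anything from the project), mining consecutive quotations from period prose could bootstrap chat-style pairs:

    import re

    # Invented example passage; a real pipeline would scan the corpus.
    passage = ('"What news from Vienna?" asked the colonel. '
               '"None that bodes well for the peace," replied the attache.')

    quotes = re.findall(r'"([^"]+)"', passage)

    # Naive heuristic: pair consecutive quotations as dialogue turns.
    for q, a in zip(quotes[::2], quotes[1::2]):
        print(f"Q: {q}\nA: {a}\n")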
nospice
I'm surprised you can do this with a relatively modest corpus of text (compared to the petabytes you can vacuum up from modern books, Wikipedia, and random websites). But if it works, that's actually fantastic, because it lets you answer some interesting questions about LLMs being able to make new discoveries or transcend the training set in other ways. Forget relativity: can an LLM trained on this data notice any inconsistencies in its scientific knowledge, devise experiments that challenge them, and then interpret the results? Can it intuit about the halting problem? Theorize about the structure of the atom?...
Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.
nineteen999
Interesting ... I'd love to find one that had a cutoff date around 1980.
“Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.”
“Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.”
This is really fascinating. As someone who reads a lot of history and historical fiction, I find it intriguing. Imagine having a conversation with someone genuinely from the period, where they don't know the “end of the story”.