Can you read this cursive handwriting? The National Archives wants your help
96 comments · January 18, 2025
demosthanos
tptacek
OK, fair enough, but can you find one in this article that's hard for an LLM? The gnarliest one I saw, 4o handled instantly, and I went back and looked carefully at the image and the text and I'm sold.
Like if this is a crowdsourcing project, why not do a first pass with an LLM and present users with both the image and the best-effort LLM pass?
Later
I signed up, went to the current missions, and they all seem to be post-1900 and all typeset. They're blurry, but 4o cuts through them like a hot knife through butter.
defaultcompany
My parents have saved letters from their parents which are written in cursive but in two perpendicular layers. Meaning the writing goes horizontally in rows and then when they got to the end of the page it was turned 90 degrees and continued right on top of what was already there for the whole page. This was apparently to save paper and postage. It looks like an unintelligible jumble but my mother can actually decipher it. Maybe that’s what the LLMs are having trouble with?
Edit: apparently it’s called cross writing [1]
tptacek
Are they having trouble? You can sign up right now and get tasks from the archive that seem trivial for 4o (by which I mean: feed a screenshot to 4o, get a transcription, and spot check it).
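For what it's worth, the spot check can be made mechanical. A minimal sketch, assuming Python's standard difflib: it diffs a model transcription against a human reading word by word and lists only the places where they disagree. The function name `word_diff` and the sample strings are illustrative, not part of any archive tooling.

```python
import difflib
import re


def word_diff(machine: str, human: str) -> list[tuple[str, str, str]]:
    """Return (operation, machine_words, human_words) for each disagreement.

    Comparison is case-insensitive and ignores punctuation, so only
    differences in the words themselves are reported.
    """
    def tokenize(text: str) -> list[str]:
        return re.findall(r"[\w']+", text.lower())

    a, b = tokenize(machine), tokenize(human)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (op, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Illustrative strings only: a hypothetical model output and human reading.
machine_text = "The said James Lambert this day personally appeared"
human_text = "The said James Lambert, on this day personally appeared"

for op, machine_words, human_words in word_diff(machine_text, human_text):
    print(f"{op}: machine={machine_words!r} human={human_words!r}")
```

A reviewer would then only need to look at the image where the two readings differ, rather than rereading the whole page.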
varenc
My guess is because it’s the Smithsonian, they’re just not willing to trust an LLM’s transcription enough to put their name on it. I imagine they’re rather conservative. And maybe some AI-skeptic protectionist sentiments from the professional archivists. Seems like it could change with time though.
ugh123
> My guess is because it’s the Smithsonian, they’re just not willing to trust an LLM’s transcription enough to put their name on it. I imagine they’re rather conservative
I expect that's a common theme from companies like that, yet I don't think they understand the issue they think they have there.
Why not have the LLMs do as much work as possible and have humans review and put their own name on it? Do you think they need to just trust and publish the output of the LLM wholeheartedly?
I think too many people saw what a few idiot lawyers did last year and closed the book on LLM usage.
tedunangst
Something about extraordinary claims and extraordinary evidence? The evidence presented, a seemingly easily transcribed image, is hardly persuasive.
geuis
It's a really interesting project. But boy do they make it hard to participate.
* Article doesn't provide a direct link to the topic mission
* Signup is pretty easy. Well organized and even gently requires you to have two forms of 2FA.
* Sign up complete. Go back to the primary page and try to find the mission. A little buried but not too deep.
* Notice I'm not signed in. Ok, let's do that. Now I'm back on the main page and navigate back. Find the first document and open it. Really interesting to scan through the doc and to read. People back then generally had really nice handwriting.
* Ok, what next, how do I transcribe? ... ? Oh it says I'm not logged in again. Fine, click the link and...
* I'm logged in and directed back to the main page, again.
Look, this is an interesting project and I'd love to spend my spare cycles to help out. But they really need to clean up this process.
Volunteers shouldn't have to jump through kinda poorly designed interfaces to help out.
Unearned5161
Ok I did one letter, from a woman in 1814 writing to James Monroe (then Secretary of State) asking for a passport to go to Scotland to get her late brother's property. What a trip! So enjoyable to get into the flow once you've "synchronized" with the person's handwriting. Furthermore, due to the fact that you're reading and re-writing word for word of whatever you're transcribing, the stories you end up reading have tremendous memory-stick. This is not surprising, considering that you are dedicating an inordinate amount of time per page, but it's a welcome side effect when you try and recollect.
jhanschoo
> Furthermore, due to the fact that you're reading and re-writing word for word of whatever you're transcribing, the stories you end up reading have tremendous memory-stick. This is not surprising, considering that you are dedicating an inordinate amount of time per page, but it's a welcome side effect when you try and recollect.
This was something I enjoyed when I decided to learn a language by translating short stories. (Edit: Of course, you have to choose an author whose diction you respect. Your unfamiliarity with the target language encourages you to mull over the author's use of diction and the nuances the author is trying to convey, and then find appropriate diction in English. This means you spend a long time immersed in the imagery.)
Unearned5161
What a brilliant idea. I've had learning to read French on my list for a while now, I'm going to try transcription as another way at it.
Unearned5161
cheers! I was looking for something semi productive to sink a Friday night into
On a more serious note, working through a transcription project for letters and journals that nobody has touched since they've been archived is such a wonderful feeling. Aside from being in front of the physical document itself, your degree of separation from the writer and point in time is vanishingly small!
I always like to observe when they cross something out or make a mistake and think about what could have caused that. Did a friend pass by the door and scare them? Did they get distracted looking out the window? It's all so close and yet so far away :)
Decabytes
I’m interested to give this a go because I want to practice reading cursive. I do a lot of longhand writing including writing all my notes in cursive. It’s exciting to watch my binding fill up with all sorts of different subjects!
I like to write in cursive for a few reasons:
1. I find it makes my hand cramp less
2. It offers some shallow privacy in public
3. I don’t want to lose the skill
4. It’s fun!
gabeio
All of the same reasons I love practicing a little calligraphy! I love how it looks as well. I don’t use a special pen but just add my own style to my cursive to make it look even nicer. But I used to write my notes in school with calligraphy (mostly because it gave me an excuse to not care about the subject) but it made the teachers hate me because I would never finish copying their scribbles fast enough.
saagarjha
Seems like something that some of those big AI companies that are desperately starved of training material could chip in on, no? Actually do something for the public good, spend a few cents of that VC money, get some high-quality training data out of it?
iambateman
This is all very cool so I’m not trying to be dismissive. In a lot of ways, giving a hobby out as a way to participate in the national archives is an end in itself.
But…computers can definitely do this way better, right?
jonahx
I had the same thought, but maybe on old handwriting they can't?
EDIT:
I tried giving the sample to 4o and it gave:
The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.
The said James Lambert this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana and at the November Term of said Court (1841), it being a court of record created by the laws of Indiana and made oath that:
On the 25th day of March 1842, he will be eighty-five years old, that he was born in the State of Maryland, that he is now a resident of said county and has been for the 27 years last past; that he has lived in Virginia, Maryland, and Pennsylvania...
AdieuToLogic
> This is all very cool so I’m not trying to be dismissive. In a lot of ways, giving a hobby out as a way to participate in the national archives is an end in itself.
> But…computers can definitely do this way better, right?
No.
Cursive writing is analog and fluid, lacking consistency across authors and often inconsistent by an individual author as well. When done well, it could be classified as its own art form. When done poorly, it can resemble the path walked by a chicken on meth.
musicale
iPad seems to do OK, but it has more to go by since it has the timing and pressure as well as the written text.
sulam
Current LLMs can absolutely do this as well as you can, probably better.
AdieuToLogic
> Current LLMs can absolutely do this as well as you can, probably better.
This is obviously disprovable, in that if they could, they would, and this call to action would not exist.
jez
The handwriting in some of these snippets, while sometimes difficult to read for one reason or another, is nonetheless beautiful: did everyone who wrote have such great handwriting back then?
I'm looking at the piece in the Instagram post linked by the page, which begins, "honor of holding in their service". The lines are so straight, the letters are so uniform!
hello_newman
As someone with terrible handwriting but decent cursive, I think cursive provides a better structure for achieving cleaner penmanship compared to non-cursive writing. My theory is that cursive’s consistency of soft, flowing loops rather than a mix of abrupt angles and disconnected lines helps create a more uniform result.
I also remember teachers telling you when writing cursive to seldom lift your hand from the page. I think that act of keeping your pen on the page for most of the writing process encourages a smoother and more natural flow, reducing the chance of jerky, uneven strokes.
999900000999
Widespread literacy is an extremely recent phenomenon.
I highly doubt most people could write that well
cyberax
Handwriting is a skill, you get better with practice!
A lot of bad handwriting stems from using it to write down things quickly (see: https://imgur.com/doctors-strike-5ANma ).
If you instead focus on doing slow calligraphy, your handwriting can improve rapidly.
tptacek
Isn't this like a bread-and-butter AI task?
“The following is the declaration of James Lambert, a soldier of the Revolutionary War in North America.” “The said James Lambert, on this day personally appeared in the Probate Court of the County of Dearborn in the State of Indiana, at the November Term of said Court [1841], it being a court of record created by the laws of Indiana, and made oath that on the 25th day of March 1842 he will be eighty‐five years old; that he was born in the State of Maryland; that he is now a resident of [said] county and has been for the [27] years last past; that he has lived in Virginia, Maryland, [and Pennsylvania]; that…”
These kinds of problems, matching up cursive to actual text, would seem to play to the absolute best strengths of an LLM, given how much basic language structure the models encode.
saagarjha
> The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
edelbitter
I've seen people do that, and the results are... just sad. These modern models insert their Twitter-era "what grabs attention must be true" view into the very little authentic past we still possess.
tptacek
What did 4o get wrong about the title image in the transcription I just gave you?
tkgally
This reminded me of something the historian Megan Marshall wrote in the introduction to her book The Peabody Sisters: Three Women Who Ignited American Romanticism (2005):
“I became expert in deciphering the sisters’ handwriting, and that of their ancestors, parents, and friends. Each era and each correspondent presented different challenges. Some hands were sprawling, some spindly, some cramped; t’s went uncrossed at the ends of words, and f’s and s’s were interchanged; spelling, capitalization, and punctuation could be erratic or idiosyncratic. Often, to save paper and postage, the sisters turned a single sheet ninety degrees and wrote back across a page already covered with handwriting. I learned to be especially attentive to these cross-written lines, in which the sisters invariably confided their deepest feelings in the last hurried moments of closing a letter. Here I would find the urgent personal message that had been put off for the sake of dispensing news or settling business. In one such postscript, I discovered Elizabeth’s account of a conversation with Horace Mann in which the two spoke frankly of their love for each other and finally settled on what it meant.”
A photograph of a letter with cross-writing is here:
https://www.masshist.org/database/1774
Marshall wrote more in an article for Slate:
https://slate.com/news-and-politics/2005/05/reading-the-peab...
zabzonk
After using a keyboard for circa 50 years, I can't read my own handwriting. I can't even give a reproducible signature.
munchler
Me too, and I used to be proud of my handwriting back in the '90s. Definitely a loss in self-expression.
brenainn
The Australian War Memorial has a volunteer program for transcribing old letters and diaries and such: https://transcribe.awm.gov.au/
I gave it a go but it was too hard for me! I write in cursive but I found most of it illegible.
Before commenting asking about why they don't just use LLMs, please note that the article specifically calls out that they do, but it's not always a viable solution:
> The agency uses artificial intelligence and a technology known as optical character recognition to extract text from historical documents. But these methods don’t always work, and they aren’t always accurate.
The document at the top is likely an especially easy document to read precisely because it's meant to be the hook to get people to sign up and get started. It isn't going to be representative of the full breadth of documents that the National Archives want people to go through.