Therac-25 Simulator

32 comments

·January 22, 2025

userbinator

I remember reading the report on the incidents and vehemently disagreeing with it. It danced around the matter but didn't point directly at the underlying cause: excessive complexity in the software, which easily created bugs and hid them. For example, they used multiple threads and a whole OS when a simple loop would've been sufficient, perhaps in a misguided attempt to keep the UI "responsive"; there would not be any race conditions if everything ran in the same thread.
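To make the single-thread argument concrete, here is a minimal Python sketch. It is not the actual Therac-25 code. The function names, the mode strings, and the choice to model the race as "the setup task snapshots the mode before later edits arrive" are all illustrative assumptions.

```python
def racy_treatment(edits_during_setup, initial_mode="xray"):
    """Hypothetical model of two tasks sharing state without coordination."""
    requested = initial_mode
    # The setup task reads the mode once and starts configuring the hardware...
    hardware = requested
    # ...while the UI task keeps accepting operator edits that setup never sees.
    for new_mode in edits_during_setup:
        requested = new_mode
    return requested, hardware


def single_loop_treatment(edits, initial_mode="xray"):
    """Same inputs handled in one loop: finish input first, then configure."""
    requested = initial_mode
    for new_mode in edits:
        requested = new_mode
    # Hardware is configured only after the last edit has been processed.
    hardware = requested
    return requested, hardware


if __name__ == "__main__":
    # Returns (mode shown to the operator, mode the hardware is set up for).
    print(racy_treatment(["electron"]))         # the two values disagree
    print(single_loop_treatment(["electron"]))  # the two values agree
```

This only models the ordering problem. A single loop has its own cost, since the UI cannot accept input while the hardware is being configured, which is presumably the trade-off the "responsive" design was trying to avoid.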

As Tony Hoare says: "There are two ways to develop software: Make it so simple that there are obviously no bugs, or so complex that there are no obvious bugs."

mbStavola

Recently I had to get a panoramic dental x-ray and I was making small talk with the person who was running the machine.

I joked that I'm always cautious about machines like this, even knowing the dosage of radiation is low, simply because of the history of software safety controls and the story of Therac-25. She hadn't heard of it before and I gave her the gist of it, that an issue with the programming made it so it was possible to accidentally dose a patient considerably more than the intended amount (in a few different ways). It was interesting to her but I then had to pause so she could run the machine. I shut up and she did her thing.

Then, after a few minutes of scanning, she sucked her teeth a bit and apologized, saying she needed to run it once more. No worries, let's get it done! She starts it again and as I'm getting scanned she explains that "for whatever reason I was getting an error so I just had to restart it, this happens sometimes and I'm not really sure why." I give a little half-nervous chuckle and then the scan completes. Once I pull my head out of the machine, I finally get to finish my lovely Therac-25 story wherein I explain that one of the issues was... a combination of non-descriptive error codes, insufficient failsafes, and operator error resulting in patient casualties as the procedure was restarted one or more times.

We shared a little laugh and discussed other things, cost of living primarily. I'm still alive so I'm at least 63% sure I didn't get megadosed or anything, but it's been a funny conversation to revisit now and then.

emchammer

I would not consider it operator error. Hitting backspace and re-typing the mode? That should be as obvious a change for that kind of thing as shifting between 1st and 4th gear.

sho_hn

Noticed any new abilities lately?

LelouBil

I guess fast-dividing cells are a kind of ability?

mrandish

Because the linked page doesn't include a description of what a Therac-25 is:

> "The Therac-25 is a computer-controlled radiation therapy machine produced by Atomic Energy of Canada Limited in 1982. The Therac-25 was involved in at least six accidents between 1985 and 1987, in which some patients were given massive overdoses of radiation. Because of concurrent programming errors (also known as race conditions), it sometimes gave its patients radiation doses that were hundreds of times greater than normal, resulting in death or serious injury. These accidents highlighted the dangers of software control of safety-critical systems."

https://en.wikipedia.org/wiki/Therac-25

sky2224

There's another massive thing to highlight with this. Atomic Energy had a death and "fixed" the issues. Then they had more deaths.

It really highlights the fact that proper and adequate testing is absolutely and unquestionably required for a system like this, and that if you don't have testing and issues do occur, then you're essentially just going to keep creating issues for yourself while fixing the issues you created in the first place.

Therac-25 is genuinely horrifying.

whycome

You're missing one critical aspect of this nightmare. After an incident:

"The AECL responded in two pages detailing the reasons why radiation overdose was impossible on the Therac-25, stating both machine failure and operator error were not possible."

sho_hn

Leveson's fantastic Therac-25 paper is probably the most important document in the formative years of my young sweng career.

I still re-read it every couple of years, and it's held up a lot better than one of my other early favorites, which was Feynman's appendix to the Challenger report. In the sense that I still draw new thoughts and realizations from it as I re-read it with additional experience in some of the engineering and organizational disciplines it touches. Sad as it is, it's got a little bit of everything.

It's definitely got my vote for the I Ching of critical systems engineering.

Spend the time. Chances are you'll remember it.

buescher

As the kids say, this is correct. It's very good. If you have never read anything like it before, it could be mind-blowing.

Her later work like Engineering a Safer World (available as a free pdf if you poke around on the MIT Press website) is merely good.

_peeley

Do you mind specifying the title of the paper? It appears there are quite a few papers[1][2][3] published concerning Therac-25 by an author named Leveson.

[1] http://sunnyday.mit.edu/papers/therac.pdf

[2] https://ieeexplore.ieee.org/document/274940

[3] https://ieeexplore.ieee.org/document/8102762

sho_hn

Gladly! It's the second of these, "An investigation of the Therac-25 accidents" (1993) w/ DOI 10.1109/MC.1993.274940.

[1] is a later version that was an appendix to her book Safeware (which I have not read), and [3] is a nice second read that follows up on [2] many years later but isn't quite the relentless engineering detective story that makes the original so poignant.

phkahler

Safeware is good. I read it back in the day. Several good failure analyses.

_peeley

Thank you!

favorited

Agreed on the Leveson paper!

Is there something specific about the Feynman appendix that you think hasn't aged as well, or is it more that you've squeezed all of the juice out of that fruit already?

sho_hn

More the latter! It's still a very charismatic text that I'm fond of, and it's of course also intensely quotable. But given its brevity it can only deliver so much.

dkulchenko

Every time I read the story of Therac-25 I feel incredibly frustrated AECL never faced real consequences or (criminal) liability for it.

Maybe I'm retroactively imposing modern day safety culture, but reading the timeline and history, it feels like AECL was completely negligent in waving off the issue as more and more fatalities kept piling up.

Can't believe the devices weren't pulled offline to definitively solve the issue after the first death. Instead, they basically went "can't repro, oh well".

favorited

They should have faced consequences for their response, as much as for their error-prone device. Multiple patients had complained of extreme burns during their treatment, and autopsies later confirmed the cause of death to have been radiation exposure, yet AECL was still saying things like, "damage could not have been produced by any malfunction of the Therac or by any operator error."

Sure, they were lying under our radiation cannon and then died of extreme radiation exposure, but they probably got it somewhere else.

doormatt

http://web.mit.edu/6.033/2007/wwwdocs/assignments/handson-li...

> I fully recognize that there are dangers and risks to which I may be exposed by participating in Reproduction of Therac-25 accidents. The following is a description and/or examples of significant dangers and risks associated with this activity: Acute gullibility, Failure to understand April Fool's jokes, Night terrors associated with medical radiation machines.

thangalin

From my blog[1]:

"In 2017, Leveson revisited those lessons[2] and concluded that modern software systems still suffer from the same issues. In addition, she noted:

* Error prevention and detection must be included from the outset.

* Software designs are often unnecessarily complex.

* Software engineers and human factors engineers must communicate more.

* Blame still falls on operators rather than interface designs.

* Overconfidence in reusing software remains rampant."

[1]: https://dave.autonoma.ca/blog/2019/06/06/web-of-knowledge/

[2]: https://ieeexplore.ieee.org/document/8102762

JohnBooty

> Question 4: How many rads did you receive in doing this project?

Best question ever.

haddonist

Well There's Your Problem Podcast (with slides)

Episode 121: Therac-25 https://www.youtube.com/watch?v=7EQT1gVsE6I

picjamai

Wow! That is the best!

NegativeLatency

I'm struggling to trigger either of the malfunction cases from therac.c. Any tips?

probably_wrong

I got both malfunctions by running the machine successfully once, changing the Beam Type and then running again.

johnwheeler

> I fully recognize that there are dangers and risks to which I may be exposed by participating in Reproduction of Therac-25 accidents. The following is a description and/or examples of significant dangers and risks associated with this activity: Acute gullibility, Failure to understand April Fool's jokes, Night terrors associated with medical radiation machines.