Some Epstein file redactions are being undone with hacks
99 comments
·December 23, 2025
cmarschner
Befuddling that this happened again. It’s not the first time:
- Paul Manafort court filing (U.S., 2019) Manafort’s lawyers filed a PDF where the “redacted” parts were basically black highlighting/boxes over live text. Reporters could recover the hidden text (e.g., via copy/paste).
- TSA “Standard Operating Procedures” manual (U.S., 2009) A publicly posted TSA screening document used black rectangles that did not remove the underlying text; the concealed content could be extracted. This led to extensive discussion and an Inspector General review.
- UK Ministry of Defence submarine security document (UK, 2011) A MoD report had “redacted” sections that could be revealed by copying/pasting the “blacked out” text—because the text was still present, just visually obscured.
- Apple v. Samsung ruling (U.S., 2011) A federal judge’s opinion attempted to redact passages, but the content was still recoverable due to the way the PDF was formatted; copying text out revealed the “redacted” parts.
- Associated Press + Facebook valuation estimate in court transcript (U.S., 2009) The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
A broader “history of failures” compilation (multiple orgs / years) The PDF Association collected multiple incidents (including several above) and describes the common failure mode: black shapes drawn over text without deleting/sanitizing the underlying content. https://pdfa.org/wp-content/uploads/2020/06/High-Security-PD...
throwup238
> - Associated Press + Facebook valuation estimate in court transcript (U.S., 2009) The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
irishcoffee
My guess would be that if the benefitting legal party didn't need to declare they also benefitted from this (because they legally can't be caught, etc.) they wouldn't.
I know and am friends with a lot of lawyers. They're pretty ruthless when it comes to this kind of thing.
Legally, I would think both parties get copies of everything. I don't know if that was the case here.
JumpCrisscross
"There are major differences between the Trump 1.0 and 2.0 administrations. In the Trump 1.0 administration, many of the most important officials were very competent men. One example would be then-Attorney General William Barr. Barr is contemptible, yes, but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file. There are ways to properly redact a PDF digitally, but going analog is foolproof."
https://daringfireball.net/linked/2025/12/23/trump-doj-pdf-r...
netsharc
It's like Russian spies being caught in the Netherlands with taxi receipts showing they took a taxi from their Moscow HQ to the airport: corrupt organizations attract/can only hire incompetent people...
https://www.vice.com/en/article/russian-spies-chemical-weapo...
ajross
Given the context and the baldly political direction behind the redactions, it's not at all unlikely that this is the result of deliberate sabotage or malicious compliance. Bondi isn't blacking these things out herself, she's ordering people to do it who aren't true believers. Purges take time (and often blood). She's stuck with the staff trained under previous administrations.
lamontcg
Or it is just the result of firing people who were competent and giving insufficient training to people who had never done this before.
beaned
[flagged]
exasperaited
You mean the layers that were, in fact, just side effects of scanning the (non-authoritative) short form certificate?
nickpinkston
I wonder if any of this is a conscious act of resistance vs. just incompetence.
And yes, I've heard of Hanlon's Razor haha
wolpoli
The difference between a black square and the redaction tool is well known to anyone whose job involves redacting PDFs or just working with PDFs. It's most likely that additional staff were pulled in and weren't given enough training.
Dusseldorf
Colleagues whose full time job is doing this sort of thing for various bits of the government have told me this is exactly the case here. People from all over the government have been deputized to redact these documents with little or no prior training.
dboreham
CUaaS. Cover Up as a Service.
mindslight
I wonder if this activity is being used as a kind of loyalty test. Keep track of who is assigned to redact what, and then if certain files leak or are insufficiently redacted, they indicate who isn't all in on Dear Leader.
It's not like a few more stories of Trump raping $whomever are going to move the needle at all, especially with how the media is on board with burying negative coverage of the regime.
Also if you're wondering how this activity isn't some kind of abuse of government resources, keep in mind that thanks to the Supreme Council's embrace of the Unitary Executive Theory (ie Sparkling Autocracy), covering up evidence about Donald Trump raping under-aged sex trafficking victims is now an official priority of the United States Government.
exasperaited
Yeah — don't attribute to resistance what can adequately be explained by idiocy.
cynicalsecurity
Let people believe it's deliberate sabotage. Unfortunately, in real life, minions of a dictator serve the dictator; they don't risk their lives or safety for a noble cause. Any screw-ups are a result of gross incompetence that is typical for every dictatorship.
andsoitis
Do you truly believe the US is currently a dictatorship?
brunoqc
Maybe because fascism favors loyalty over competence.
neilv
A third possibility is diversion, while the most damaging evidence would be suppressed a different way.
apical_dendrite
Reporting is that they had a basically impossible deadline and they took lawyers off of counterintelligence work to do this. So a conscious act of resistance is possible, but it's a situation where mistakes are likely - people working very quickly trying to meet a deadline and doing work they aren't that familiar with and don't really want to be doing.
jmyeet
It's a good question.
For context, lawyers deal with this all the time. In discovery, there is an extensive document ("doc") review process to determine if documents are responsive or non-responsive. For example, let's say I subpoenaed all communication between Bob and Alice between 1 Jan 2019 and 1 Jan 2020 in relation to the purchase of ABC Inc as part of litigation. Every email would be reviewed and if it's relevant to the subpoena, it's marked as responsive, given an identifier and handed over to the other side. Non-responsive communication might be, e.g., attorney-client communications.
It can go further and parts of documents can be viewed as non-responsive and otherwise be blacked out eg the minutes of a meeting that discussed 4 topics and only 1 of them was about the company purchase. That may be commercially sensitive and beyond the scope of the subpoena.
Every such redaction and exclusion has to be logged and a reason given for it being non-responsive where a judge can review that and decide if the reason is good or not, should it ever be an issue. Can lawyers find something damaging and not want to hand it over and just mark it non-responsive? Technically, yes. Kind of. It's a good way to get disbarred or even jailed.
My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts, particularly when the only incentive is keeping their job and the downside is losing their career. It could happen.
What I suspect is happening is that all the good lawyers simply aren't engaging in this redaction process because they know better, so the DoJ had to wheel out some bad and/or unethical ones who would.
What they're doing is in blatant violation of the law passed last month and good lawyers know it.
There's a lot of this going on at the DoJ currently. Take the recent political prosecutions of James Comey, Letitia James, etc. No good prosecutor is putting their name to those indictments so the administration was forced to bring in incompetent stooges who would. This included former Trump personal attorneys who got improperly appointed as US Attorneys. This got the Comey indictment thrown out.
The law that Ro Khanna and Thomas Massie co-sponsored was sweeping and clear about what needs to be released. The DoJ is trying to protect both members of the administration and powerful people, some of whom are likely big donors and/or foreign government officials or even heads of state.
That's also why this process is so slow I imagine. There are only so many ethically compromised lackeys they can find.
tim333
It's quite funny really. Apparently you just cut and paste the text into Word. They just had the pdf put black rectangles on top.
pilaf
Why into Word specifically?
iAMkenough
The average office worker has it on their computer, illustrating how commonplace unredacting could be. Any text tool will work, even some designed to detect bad redactions in PDFs via drag and drop (now specifically trained on these known bad redactions). https://github.com/freelawproject/x-ray
juujian
Apart from the technological and procedural question, I would love to learn why the DOJ found it important to protect Indyke. He was Epstein's lawyer, and now we learn that he was personally involved. He is not a Washington person. We expected there to be politically motivated protection of certain people, but is the DOJ just going to blanket protect anybody in the docs?
avidiax
Indyke works for other powerful people, runs in MAGA circles.
Two things come to mind:
* Some things Indyke did fall outside the scope of lawyer-client privilege. It would be bad for certain people to get him on a stand and force him to spill the beans. He was never interviewed re: Epstein [1]
* He's a very talented lawyer, insofar as a competent lawyer with, at least, extreme discretion, is talented.
[1] https://www.finance.senate.gov/imo/media/doc/letter_to_doj-f...
sublinear
If you think mere human incompetence with documents is bad, imagine all the vibe coded apps.
montroser
Let's nobody make any fuss about this yet, lest they wise up before releasing the rest of the docs this way too!
c420
“Like you guys have had this stuff for a year. Doesn’t it seem like you could just throw all that into AI at this stage of the game? And just redact the names of the victims, and let’s go.” Joe Rogan
nlitsme
Can you post the document numbers, I can't find where these texts are in the original pdfs.
nlitsme
ah, found it - this is from the 'Court Records' part.
https://www.justice.gov/multimedia/Court Records/Matter of the Estate of Jeffrey E. Epstein, Deceased, No. ST-21-RV-00005 (V.I. Super. Ct. 2021)/2022.03.17-1 Exhibit 1.pdf
tomekf
How is it done, from a technical point of view?
mmh0000
Layers.
PDF is an absurdly complex file format. It's part of the reason there is no single "good" PDF reader, just a lot of mediocre PDF readers that are all terrible in their own way. Which is a topic for another day.
There are several ways to remove data in a PDF:
- Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
- Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement. The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
- Then you have the computer illiterate, who think changing the foreground and background color to black is good enough anyway.
zauguin
This seems highly misleading.
> - Remove the data. This is much harder than it sounds. Many PDF tools won't let you change the content of a PDF, not because it isn't possible, but because you'll likely massively screw up the formatting, and the tools don't want to deal with that.
Compared to other formats this is actually relatively easy in a PDF since the way the text drawing operators work they don't influence the state for arbitrary other content. A lot of positioning in a PDF is absolute (or relative to an explicitly defined matrix which has hardcoded values). Usually this makes editing a PDF harder (since when changing text the related text does not adapt automatically), but when removing data it makes it much easier since you can mostly just delete it without affecting anything else. (There are exceptions for text immediately after the removed data, but that's limited and relatively easy to control.)
> - Replace the data. This is what all the "blackout" tools do, find "A" and replace with "🮋". This is effective and doesn't break formatting since it's a 1-to-1 replacement.
That's actually rather tricky in PDFs since they usually contain embedded subset fonts and these usually do not have "🮋" as part of the subset. Also, doing this would break the layout since "🮋" has a different width than most letters in a typical font, so it would not lead to fewer formatting issues than the previous option. Unless the "🮋" is stretched for each letter to have the same dimensions, but then the widths of the stretched characters allow the text to be recovered.
> The problem with "replacing" is that not every PDF tool works the same way, and some, instead, just change the foreground and background color to black; it looks nearly the same, but the power of copy-and-paste still functions.
PDF does not have a concept of a background color. If it looks like a background color in PDF, you have a rectangle drawn in one color and something in the foreground color in front of it. What you usually see in badly redacted PDF files is exactly this, but in the opposite order: someone just draws a black box on top of the characters. You could argue that this is smarter since it would still work even if someone were to change colors, but of course, PDF is a vector format. If you just add a rectangle, someone else can remove it again. (And copy & paste doesn't care about your rectangle either.)
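A minimal sketch of the overlay failure and of the "just delete it" fix described above. It works on a hand-written toy content stream, not a real PDF file. The stream contents, the helper names `extract_text` and `remove_text_at`, and the regex-based parsing are all assumptions made for illustration; real files need a proper parser (compressed streams, font encodings, escaped strings and so on).

```python
import re

# Toy page content stream (hypothetical: uncompressed, literal strings only).
# Three absolutely positioned text runs, then a black rectangle painted
# over the second one.
STREAM = b"""\
BT /F1 12 Tf 1 0 0 1 72 700 Tm (Payment approved by) Tj ET
BT /F1 12 Tf 1 0 0 1 72 680 Tm (John Doe) Tj ET
BT /F1 12 Tf 1 0 0 1 72 660 Tm (on 17 March 2022) Tj ET
0 0 0 rg
70 676 60 14 re f
"""

# Matches one single-line BT ... ET block: captures x, y from Tm and the string.
TEXT_BLOCK = re.compile(rb"BT .*? (\S+) (\S+) Tm \((.*?)\) Tj ET\n")


def extract_text(stream: bytes) -> list[str]:
    """Roughly what copy/paste sees: every shown string, in stream order.
    Painting operators such as `re f` are not text, so they are ignored."""
    return [m.group(3).decode("latin-1") for m in TEXT_BLOCK.finditer(stream)]


def remove_text_at(stream: bytes, x: float, y: float) -> bytes:
    """Delete the text block positioned at (x, y). Every other block carries
    its own absolute Tm matrix, so nothing else on the page moves."""
    def keep_or_drop(m: re.Match) -> bytes:
        here = (float(m.group(1).decode()), float(m.group(2).decode()))
        return b"" if here == (x, y) else m.group(0)
    return TEXT_BLOCK.sub(keep_or_drop, stream)


overlaid = extract_text(STREAM)
redacted = remove_text_at(STREAM, 72, 680)

print(overlaid)                # ['Payment approved by', 'John Doe', 'on 17 March 2022']
print(extract_text(redacted))  # ['Payment approved by', 'on 17 March 2022']
```

In the first case the rectangle hides "John Doe" visually but the string is still in the stream. In the second the string is gone, while the rectangle and the two neighbouring lines stay exactly where they were.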
hallole
Thanks for this. Really quells the urge I get every so often to just code my own PDF editor, because they all suck and certainly it couldn't be THAT hard. Such hubris!
brailsafe
Heh, have at it, here's the full spec: https://developer.adobe.com/document-services/docs/assets/5b...
Should take... a weekend tops? ;) PDF is crazy and scary
kayodelycaon
I did a bunch of work creating pdfs using a low-level API, object goes here stuff.
As far as I understand it, at its core, pdf is just a stream of instructions that is continually modifying the document. You can insert a thousand objects before you start the next word in a paragraph. And this is just the most basic stuff. Anything on a page can be anywhere in the stream. I don't know if you can go back and edit previous pages, you might have a shot at least trying to understand one page at a time.
Did you know you can have embedded XML in PDFs? You can have a paper form with all the data filled in and include an XML version of that for any computer systems that would like an easier way to read it.
gregsadetsky
Don't stop yourself before getting started. I believe in you - maybe you could write the one editor that would actually work!
Not kidding - it's a ~~~billion dollar market haha
Make an MVP/Show HN :-)
TRiG_Ireland
The blog post about adding colour gradients to Typst dives into some of the weirdness of the format. https://typst.app/blog/2023/color-gradients
NamTaf
Bravo to you for recognising the load-bearing 'just' before you threw it around :)
3eb7988a1663
I remember reading the recommendation for journalists to redact documents is to black them out in the digital version, print it out, and re-scan it. Anything else has too many potential ways by which it might be possible to smuggle data.
dmurray
Even that might still leak information through length attacks: one reasonable plaintext would lead to black bars of 1135 px, another to 1138 px, and with enough redactions you can converge on what the plaintext might be.
The only safe way for journalists is to paraphrase what the document said and to say "an unnamed source claims that ..." and to guarantee with your reputation, and the reputation of your publisher, that you are being faithful to what the original source said. For even better results, combine multiple sources.
Unfortunately paraphrasing things and taking editorial responsibility have both been deprecated in favour of rereleasing press releases in the house style, so it's difficult to get the actual journalism these days.
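A rough sketch of the length attack mentioned above: if you know the font, the width of a black bar narrows down which candidate strings could be underneath. The width table, the candidate names and the function names here are all made up for illustration; a real attempt would use the metrics of the font actually used in the document.

```python
# Hypothetical per-character advance widths in pixels for a proportional font.
WIDTHS = {"i": 4, "l": 4, "j": 4, "t": 5, "r": 6, " ": 5, "m": 14, "w": 13}
DEFAULT_WIDTH = 9  # assumed width for every character not listed above


def rendered_width(text: str) -> int:
    """Total width of `text` under the toy width table."""
    return sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text.lower())


def candidates_matching(bar_px: int, names: list[str], tolerance: int = 1) -> list[str]:
    """Keep the candidates whose rendered width is within `tolerance` px of the bar."""
    return [n for n in names if abs(rendered_width(n) - bar_px) <= tolerance]


NAMES = ["Alice Smith", "Bob Miller", "Carol Jones", "Dan Wu"]

print(candidates_matching(73, NAMES))               # ['Bob Miller']
print(candidates_matching(81, NAMES))               # ['Alice Smith', 'Carol Jones']
print(candidates_matching(81, NAMES, tolerance=0))  # ['Alice Smith']
```

A single bar often leaves several candidates, as in the second call. The more bars you can measure for the same hidden string, and the more precisely, the smaller the set gets.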
general1465
It comes from mistaking the black highlighter (which adds a black square as another layer) for the redaction tool (which replaces the data with a black square). If the people doing the redactions are computer-illiterate, they won't see the difference.
oliwarner
They drew black boxes over the text. The text is still underneath. On OCR'd scanned documents, the text you'd copy is actually stored in metadata and just linked by position to the image.
Anyway, if you click on a "redaction", you're clicking on the box and can't select the text underneath, but if you just highlight the text around it, you can copy all the original text.
It's a bizarre oversight.
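To make the "text linked by position" point concrete, here is a small sketch of the kind of check a bad-redaction detector could run: take the word boxes from the text layer and the rectangles drawn on the page, and report any word that sits mostly under a rectangle. The coordinates, the `Box` type and `words_under_boxes` are hypothetical, and this is not the implementation of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        # Degenerate (non-overlapping) boxes get area 0.
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection(self, other: "Box") -> "Box":
        return Box(max(self.x0, other.x0), max(self.y0, other.y0),
                   min(self.x1, other.x1), min(self.y1, other.y1))


def words_under_boxes(words, boxes, min_cover=0.5):
    """Return the words whose box is at least `min_cover` covered by a drawn box."""
    hits = []
    for text, wbox in words:
        for rect in boxes:
            if wbox.area() > 0 and wbox.intersection(rect).area() / wbox.area() >= min_cover:
                hits.append(text)
                break
    return hits


# Hypothetical text layer of one line, plus one black rectangle drawn on the page.
WORDS = [
    ("Wire", Box(72, 700, 96, 712)),
    ("transfer", Box(100, 700, 140, 712)),
    ("to", Box(144, 700, 154, 712)),
    ("John", Box(158, 700, 182, 712)),
    ("Doe", Box(186, 700, 206, 712)),
]
BLACK_BOXES = [Box(156, 698, 208, 714)]

print(words_under_boxes(WORDS, BLACK_BOXES))  # ['John', 'Doe']
```

If this returns anything at all, the "redaction" is only a drawing on top of live text.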
buhfur
Doesn't work on any PDFs of scanned documents, for example the contacts list.
jdiff
Copying and pasting doesn't work. Unless your PDF viewer does OCR. And if the redaction is just a black rectangle overlaid on top, that can still be removed.
tpoacher
reminds me of that leaky redaction program that won the obfuscated c contest some years back
Alifatisk
Alright, now that everyone knows this, I hope people have backed up all the files to unredact everything before DOJ retracts the sensitive documents.
Related: https://xcancel.com/vmfunc/status/2003292986650853825
https://old.reddit.com/r/law/comments/1ptlms6/some_epstein_f...
https://krassencast.com/p/breaking-we-just-unredacted-the-ep...