My PhD advisor rewrote himself in bash (2010)

ergotux

Two resources that helped me improve my writing while I was working on my thesis were "How to Write Mathematics" by Paul R. Halmos and "Mathematical Writing" by Donald E. Knuth et al. I would always start with Halmos to get into the spirit of pursuing clear and precise communication. The "Bad/Better/OK" suggestions especially reminded me of the discussions in the lecture notes from Knuth et al. As a third step, a linter such as the proposed one is probably helpful if something slips through.

I think these resources are essential for anyone who writes on any subject which at least involves definitions here and there.

techas

I second those recommendations and add one: The Craft of Scientific Writing by M. Alley.

BOOSTERHIDROGEN

[flagged]

Out_of_Characte

[flagged]

knighthack

That doesn't seem like a very divisive statement.

nazka

If anybody likes this article and wants to know more about the process of writing effective PhD papers they should watch [1]. In fact, anybody who desires to improve their communication skills should watch it. It is so good that I would have paid to have access to this video!

[1] https://youtu.be/vtIzMaLkCaM

3abiton

I wish there was a repo of "all the things you need to know when starting a PhD", including the famous PhD Grind.

quickgist

I find myself disagreeing with many of the examples. E.g. according to the article:

Bad: It is quite difficult to find untainted samples. Better: It is difficult to find untainted samples.

Bad: We used various methods to isolate four samples. Better: We isolated four samples.

Something being quite difficult reads significantly differently than just being difficult. You haven't made the sentence better, you've changed the meaning.

And the fact that you used various methods instead of a single method is information missing from the second sentence.

dspillett

> Something being quite difficult reads significantly differently than just being difficult. You haven't made the sentence better, you've changed the meaning.

The problem there is that the meaning quite carries can vary significantly depending on the reader or the context where the word is read (so it can read differently depending on how previous sentences have primed the reader, which means the same person might read it differently with that context than if they start at that sentence). Quite differently, in fact!

This is because in spoken form the word changes a lot with tone of voice. Technically "quite difficult" means "slightly difficult" but in many places actually means "damn near impossible".

I'd say that while removing the word isn't wrong, replacing it with a more specific comparison would be better.

mindcrime

And as I understand it, Brits are particularly fond of using "quite" in a sarcastic fashion, so "quite difficult" in England might mean "not difficult at all, you sodding idiot" or something along those lines.

n4r9

Brit here, this isn't something I'm familiar with. "Quite" usually means "somewhat" as in "I found the test quite hard". In upper class speech it can mean "very" as in "that was quite the challenge" or "agreed" when said on its own as a response to a statement.

dspillett

> And as I understand it, Brits are particularly fond of using "quite" in a sarcastic fashion,

Brit here. Many of us are fond of using _any_ word/phrase with sarcasm, irony, or both.

> so "quite difficult" in England might mean "not difficult at all, you sodding idiot"

Depending on tone and other context "quite" can mean anything from a little to a huge amount. It can also mean exactly, as in "Well, quite.".

This is why you need to be careful in professional and academic contexts, or anywhere in writing for that matter, and use domain-specific terminology as much as possible.

DonHopkins

It may have been Jimmy Carr who pointed out that:

Americans think that Brits can be quite patronizing ("pæ-trə-naɪ-zɪŋ").

Actually it's patronizing ("peɪ-trə-naɪ-zɪŋ").

barry-cotter

This is another thing that is captured in tone more than anything though the Brits do have a well deserved reputation for sarcasm. Difficult to convey in print what meaning you want the recipient to get.

amelius

> the meaning quite carries can vary significantly

Yes, it can be read as a signal that there is some variability in the amount of difficulty.

Putting more uncertainty in the wording is not necessarily bad, and in a scientific context it can be actually good.

ninalanyon

> Technically "quite difficult" means "slightly difficult"

Really? You must come from a different literary tradition. Quite means exactly.

See the first definition in the Cambridge Dictionary at https://dictionary.cambridge.org/dictionary/english/quite

completely:

The two situations are quite different.

The colours almost match but not quite.

I enjoyed her new book though it's not quite as good as her last one.

UK formal Are you quite sure you want to go?

Quite honestly/frankly, the thought of it terrified me.

codethief

Your examples are excellent but I agree with the GP that "quite" can also be used in the sense of "somewhat but not entirely".

notahacker

You've omitted the definition on the lower part of the page:

quite

adverb, predeterminer

"a little or a lot but not completely:"

I'm quite tired but I can certainly walk a little further. There was quite a lot of traffic today but yesterday was even busier. It was quite a difficult job. He's quite attractive but not what I'd call gorgeous. It would be quite a nuisance to write to everyone.

The same dictionary also includes a grammar article clarifying that quite [usually] means "a little, moderately, not very" when the adjective or adverb it modifies is gradable (e.g. "good" or indeed "difficult"), and that it is an intensifier in [generally rarer] situations where the adjective or adverb isn't (e.g. "it is quite wrong to say that 'quite' invariably means 'exactly'").

https://dictionary.cambridge.org/grammar/british-grammar/qui...

dspillett

> Quite means exactly.

Quite.

[Or sometimes “Quite so”.]

Though in informal communication the word's meaning can be quite different depending on context and tone.

ffsm8

Great examples, but you should probably remove the "completely" header - as the following examples don't fall under it. I'll delete this comment in 15 minutes ( • ‿ • )

t8sr

And this is why I now have to read 30 page design docs that could have been 3 pages and said the same thing.

Please try to understand why people have such strong dislike of floral writing, especially in technical texts. If you read a lot of papers or designs, it makes your life miserable.

vharuck

Yes, it's the usual advice of how artists/authors/scientists make something: 1) Make the thing, 2) Try removing each part, 3) If the work fails without that part, put it back.

For example, adverbs are good when readers might have the wrong image without them. E.g., "Alice [quickly] walked." Most of the time, writing is better without words like "very" or "quite."

perching_aix

When it comes to technical writing the only thing I can really discuss is documentation, and the key thing I'm personally looking for there is structure.

It could be about basically anything, just please, pretty please, for the love of god, make it structured. And I don't mean sections with catchy headings, I mean as structured and reference-like as possible.

I want to minimize the amount of time I spend reading prose and searching around, as well as the chance of missing things. I want to hit CTRL+F and be put where I need to be stat and have that be enough. Structure alone can convey a lot of the idea behind how something works - please trust me to be able to utilize it to make basic leaps in logic.

A bad example for this is AWS documentation. It's a mish-mash of prose and structured reference. A good example is the AWS CLI documentation (although if they lead with example usages first, that'd be even better).

maccard

Writing good technical text is an art. There is a certain amount of fluff that helps, and it’s almost unnoticeable when it’s there. Without it, it’s too terse. Quite often, my complaint of technical documentation is “it did exactly what the docs said it would do, except in a situation that I didn’t expect it to do that”.

seanhunter

Strong disagree. If you want to make a point about using different methods, say what you did.

“We isolated four samples using the following methods…”

zaptheimpaler

Yeah most of his examples looked terrible to me. It's actually part of why reading papers is so damn difficult even when the paper says something simple. They're obsessed with this stilted formal tone that no one actually likes and leaves out subtle but important context clues.

mooreds

If you are looking for more of this, check out vale.sh. It lets you add all kinds of style guidelines, including avoiding weasel words.

Here's the GH action we use to run vale on our website at PR time: https://github.com/FusionAuth/fusionauth-site/blob/main/.git...

and our config: https://github.com/FusionAuth/fusionauth-site/tree/main/conf...

We've found it helpful to enforce style but probably aren't using it to the full extent.

codazoda

Vale.sh looks pretty good, thanks for the reference.

I’m currently editing the third edition of my book and a bunch of articles for my website. This will come in handy.

simonw

I got Claude to port those Bash scripts to a web UI so I could paste code directly into it and see what came out: https://tools.simonwillison.net/writing-style

Claude transcript here: https://gist.github.com/simonw/e9902ed1cbda30f90db8d0d22caa0...

bmacho

I don't think it is working properly. The second(?) pattern match is bad. For example, for

    It is quite difficult to find untainted samples. We used various methods to isolate four samples.

it gives

    Found "quite" in: "is quite difficult to find untainted samples."
    Found "various" in: ""

Otherwise great work, I like these very fast html/js tools.

KTibow

Looking at the code, the issue is that both `findWeaselWords` and `getContext` construct a list of words but do it differently:

    const words = text.toLowerCase().split(/\b/)

vs

    const words = text.split(/\s+/)
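
A minimal sketch of why those two splits disagree, assuming the mismatch shows up when an index found in one word list is used to look up the other. The helper names `splitOnBoundaries` and `splitOnWhitespace` are hypothetical and are not taken from the tool's source:

```javascript
// Hypothetical illustration; not the tool's actual code.
function splitOnBoundaries(text) {
  // A word-boundary split keeps runs of spaces and punctuation as their own entries.
  return text.toLowerCase().split(/\b/);
}

function splitOnWhitespace(text) {
  // A whitespace split yields only the words, with punctuation still attached.
  return text.split(/\s+/);
}

const sample =
  "It is quite difficult to find untainted samples. We used various methods to isolate four samples.";
const byBoundary = splitOnBoundaries(sample);
const byWhitespace = splitOnWhitespace(sample);

// The same word sits at a different index in each list.
const boundaryIndex = byBoundary.indexOf("various");
const whitespaceIndex = byWhitespace.indexOf("various");
console.log(boundaryIndex, whitespaceIndex);

// An index taken from one list is meaningless in the other; here it is out of range.
console.log(byWhitespace[boundaryIndex]);
```

Building the word list once and passing it to both functions would be one way to keep the indices consistent.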

simonw

I pasted these two comments into Claude and had it update the tool: https://gist.github.com/simonw/dc79f6adcdb189469890bc0a44331...

New version is now live at https://tools.simonwillison.net/writing-style

DonHopkins

Why not just use Claude itself to review the writing, instead of having it write a much less capable and brittle and limited bash script to do it? You could even ask Claude to write a prompt for itself that performs the same or better function than the bash script. Bash and Perl are like duct tape and chewing gum, so terrible for that kind of stuff, and it's just what Claude does best. It can go so much further by actually weighing alternatives and suggesting changes, instead of just flagging problems. And no weird regular expression inconsistencies that cause false positives and negatives and parsing errors.

simonw

My goal here wasn't to build the best possible writing analysis tool, it was to try out these 2010 bash script rules in a slightly more convenient format (and to play more with Claude Artifacts).

I've actually had a tiny bit of trouble using LLMs for writing analysis in the past, though that was more about spell checking. I found they often missed obvious errors, probably because the tokenization step means they don't "see" the exact original prose in a way that makes those errors as obvious as they are to me.

emmelaich

The diction and style commands, which did similar jobs, were present in early Unix.

You can get them (or versions of them) from GNU. They're in homebrew.

https://www.gnu.org/software/diction/

From what I remember, they weren't great; this bunch of programs probably does just as well.

Gimpei

In my field, writing quality was on the very lowest rung of importance, below even teaching evaluations. As much as I value clear, concise prose, I’d say a grad student would be better served working on public speaking especially when faced with hostile questioning, and, sadly, with brown-nosing. Yes I am bitter :p

kjkjadksj

I think even more helpful than public speaking is just seeking a therapist. People get in over their heads, but it's by their own doing. The stakes are never as high as you have built them up in your head. People want you to pass.

disconcision

how can you tell? likely this is field-dependent, but even if reviewers don't tend to comment on writing quality, it doesn't mean it doesn't play a factor in acceptance. if you annoy people, (sometimes even especially) in ways that don't feel substantive enough to merit mentioning in a review, it can make them more inclined to be critical about aspects they might otherwise gloss over.

(i acknowledge this is unsatisfyingly unfalsifiable, and that it can also go the other way, in that selectively bad writing can be used to attempt to paper over holes)

liontwist

> To market a paper, the author must make a compelling case for why her idea deserves access to that resource.

In other words, journals are filled with papers that were sold the best, not the most important ideas. And as the author also says, superficial things like hard to detect typos are often a deciding factor because the reviewers can detect them.

We should stop pretending there is objectivity and embrace journals that reflect the taste and opinion of the editor.

Or have standard places like arxiv for publishing everything. There is no scarce resource for uploading pdfs.

kjkjadksj

The editor is not the one making the call really. It's the reviewers. If they give it few marks or no marks then that's it, it's published. If they dog that paper down then the editor has to sit up in their chair and actually decide whether or not the authors made sufficient changes to address the reviewers' complaints. But even then it's still the reviewers who are demanding the standard, not the editors.

greazy

The editor is the one who decides if it goes to review. At lots of 'top tier' journals the editors will quickly reject the manuscript.

liontwist

I desire the power to be with the editor.

aeonik

While we're at it, improve the citation system, so the references become hyperlinks, and code artifacts can be accessed.

nonrandomstring

Useful for those who, for some unfathomable reason, don't use emacs.

Might be hard to get into a good workflow as running them and then re-editing seems tedious.

Author gives credit to emacs "writegood", but my all-time fave style-nazi plugin is "artbollocks-mode".

dang

Related:

Shell scripts to improve your writing - https://news.ycombinator.com/item?id=13295530 - Jan 2017 (55 comments)

Shell scripts to improve your writing, or "My advisor rewrote himself in bash." - https://news.ycombinator.com/item?id=1529166 - July 2010 (31 comments)

tgraf_80

Good intentions, indeed. Creating lots of steering committee slides, I know the audience's wish for simpler language. But ‘very close’ is different from ‘close’. It’s not just salt and pepper but trying to articulate a complex and nuanced reality. And yes, research papers then sound a bit less solid and complete. Sorry, but often this is the reality you should not hide.

jimbokun

> But ‘very close’ is different from ‘close’.

What's the difference?

tgraf_80

It's a way to express steps in a continuum (given by the context):

- not close (1)
- close (2)
- very close (3)
- arrived (4)

That's how language works. It's not mathematics and uses "salt and pepper" to convey the message as accurately as possible.

gmac

Can confirm the value of checking for repeated words such as ‘the the’. This is the final example I give my Econ PhD students in a session on RegExps: https://users.sussex.ac.uk/~gm268/iphd/regexps/regexps.pdf

skalarproduktr

This is a great summary of how to use regexps, complete with some very nice examples. Thank you!

ninalanyon

Just so long as it doesn't blindly modify "that that".

Timwi

I once wrote a sentence with a double “in” in it — it was something like “let's see what state this is in in two weeks’ time” or similar — and at least two people commented thinking that it was wrong and needed an “in” removed...

gmac

Right. You can just search for /\b(\w+) \1\b/ and see what comes up. There are usually false positives of one kind or another.
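
As a rough sketch of that search in JavaScript: the function name `findRepeatedWords` and the small allow-list are assumptions for illustration, chosen to reflect the "that that" and "in in" cases mentioned in this thread. It only reports matches and leaves the decision to a human:

```javascript
// Hypothetical helper; the name and the allow-list are assumptions for illustration.
const LEGITIMATE_REPEATS = new Set(["that", "in"]);

function findRepeatedWords(text) {
  // Same pattern as above, with the g flag to find every match
  // and the i flag so that "The the" is caught as well.
  const pattern = /\b(\w+) \1\b/gi;
  const hits = [];
  for (const match of text.matchAll(pattern)) {
    const word = match[1].toLowerCase();
    hits.push({
      word,
      index: match.index,
      // Mark words that are sometimes doubled on purpose so a human can decide.
      likelyIntentional: LEGITIMATE_REPEATS.has(word),
    });
  }
  return hits;
}

console.log(
  findRepeatedWords("We measured the the samples to see what state this is in in two weeks.")
);
```

The pattern only catches repeats separated by a single space, so a repeat split across a line break would need `\s+` in place of the space.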