Should We Respect LLMs? A Study on Influence of Prompt Politeness on Performance

95 comments

April 22, 2025

causalmodels

Personally I think people should generally be polite and respectful towards the models. Not because they have feelings, but because cruelty degrades those who practice it.

Freak_NL

Computers exist to serve. Anthropomorphising them or their software programming is harmful¹. The tone of voice an officer would use to order a private or ranking to do something seems suitable — which obviously comes down to terse, clear, unambiguous queries and commands.

Besides, humans can switch contexts easily. I don't talk to my wife in the same way I do to a colleague, and I don't talk to a colleague like I would to a stranger, and that too depends on context (is it a friendly, neutral, or hostile encounter?).

1: At this point. I mean, we haven't even reached Kryten-level of computer awareness and intelligence yet, let alone Data.

PorterBHall

I agree! I try to remember to prompt as if I were writing to a colleague because I fear that if I get in the habit of treating them like a servant, it will degrade my tone in communicating with other humans over time.

1970-01-01

Saying thank you to a plant for growing you a fruit is strange behavior. Saying thank you to an LLM for growing you foobar is also strange behavior. Not doing either is not degrading behavior of the grower.

mkmk3

Disagree wrt practicing gratitude towards resources consumed and tools utilized. Maybe it doesn't degrade you if you don't but I think it gives a bit more perspective

1970-01-01

I think we agree on this if you agree that practicing gratitude in life and directly practicing it on non-sentient objects are not the same thing. Going to church to pray, going to therapy, practicing mindfulness, etc. isn't the same thing as seeing each grape growing on a vine as an anthropomorphic object. Don't anthropomorphize your lawnmower.

Al-Khwarizmi

Yes. I tend to be polite to LLMs. I admit that part of the reason is that I'm not 100% sure they're not conscious, or a future version could become so. But the main reason is what you say. Being polite in a conversation just feels like the right thing to me.

It's the same reason why I tend to be good to RPG NPCs, except if I'm purposefully role playing an evil character. But then it's not me doing the conversation, it's the character. When I'm identifying with the character, I'll always pick the polite option and feel bad if I mistreat an NPC, even if there's obviously no consciousness involved.

apercu

That simply means you were raised right. :)

zmgsabst

I think we can look at examples:

People who are respectful of carved rocks, eg temple statues, tend to be generally respectful and disciplined people.

You become how you act.

w0m

ding ding ding.

If you're rude to an LLM, those habits will bleed into your conversations with barista/etc.

kevinpacheco

It seems like by default, the LLMs I've used tend to come across as eager to ask follow-up questions along the lines of "what do you think, x or y?" or "how else can I help you with this?" I'm going to have to start including instructions not to do that to avoid getting into a ghosting habit that might affect my behavior with real people.

monktastic1

I think it depends on the self-awareness of the user. It's easy to slip into the mode of conflating an LLM with a conscious being, but with enough metacognition one can keep them separate. Then, in the same way that walking on concrete doesn't make me more willing to walk on a living creature, neither does my way of speaking to an LLM bleed into human interactions.

That said, I often still enjoy practicing kindness with LLMs, especially when I get frustrated with them.

Cthulhu_

Not necessarily, people will change behaviour based on context. Chat vs email vs HN comments, for example.

macNchz

I think people in general are not all that great at doing this.

Anecdotal, but I grew up in a small town in rural New England, a few hours from NYC and popular with weekenders and second-home owners from there. I don’t think that people from NYC are inherently rude, but there’s a turbulence to life in NYC to where jockeying for position is somewhat of a necessity. It was, however, transparently obvious in my hometown that people from the city were unable to turn it off when they arrived. Ostensibly they had some interest in the slow-paced, pastoral village life, but they were readily identifiable as the only people being outwardly pushy and aggressive in daily interactions. I’ve lived in NYC for some time now, and I recognize the other side of this, and feel it stemmed less from inherent traits and more from an inability to context switch behavior.

satisfice

Possibly. But that’s not the fault of any person except he who forced a fake social actor into our midst.

It’s wrong to build fake humans and then demand they be treated as real.

verisimi

... cos, I mean, what's the difference between ai and a barista? Both are basically inanimate emotion-free zones, right?

Cthulhu_

Where do you draw the line though? I know some people that ask Google proper questions like "how do I match open tags except XHTML self-contained tags using RegEx?" whereas I just go "html regex". Some people may even add "please" and "thank you" to that.

I doubt anyone is polite in a terminal, also because it's a syntax error. So the question is also, do you consider it a conversation, or a terminal?

disambiguation

You're nice to AI for your own well being. I'm nice to AI so they spare me when they eventually become our overlords. We are not the same.

ehutch79

The data it’s trained on likely includes better answers when the original question was phrased politely. So we get better answers when we’re polite because those tokens are near better answers in the data.

SkyPuncher

When I'm working through a problem with Cursor, I find platitudes go a long way to keeping it grounded. However, when it really refuses to do something, then the best way to break the pattern is harsh, stern wording.

For example, if it's written code that's mostly correct but needs some tweaking, a platitude will keep it from second-guessing everything it just wrote.

* "That looks great so far, can you tweak XYZ" -> Keeps the code I care about while fixing XYZ,

* "Can you tweak XYZ" -> often decides to completely rewrite all of the code

alwa

I’ve had the same sense. From your examples, I wonder if part of it is that, while the form is that of a platitude, you’re giving it substantive direction too: giving it an indication of what you’re satisfied with, as distinct from what remains to be done.

colinmorelli

I wonder if it's the platitude doing that, or the explicit affirmation that _most_ of it looks good, but just XYZ needs tweaking. That intention is explicit in the first message, and potentially implied but unclear in the second.

dr_dshiv

Machine psychology is a field now: https://arxiv.org/abs/2303.13988

jmisavage

I’m still going to talk to it like a person because if I don’t then I’ll slowly start to talk to people like they’re LLMs and it’s going to sound rude.

Cthulhu_

Has the way you comment on HN affected how you write emails or talk to people in real life?

Pavilion2095

Yeah, I was thinking the same. How we "talk" to llms is more about us than about them. For me it's natural to say "please" without thinking twice. I didn't even think about that until recently.

maxwell

Did search engines increase rudeness?

munchler

Search engines don't speak English.

gwill

neither do LLMs.

oriel

My experience informs my opinion, that structure is more important than specific tone.

IMO If LLMs are made from our language, then terminology semantics plays strongly into the output, and degree of control.

Some people rage when the machine doesn't work as expected, but we know that, "computers are schizophrenic little children, and don't beat them when they're bad."[1] ... right? Similar applies to please.

I've had far better results by role playing group dynamics with stronger structure, like say, the military. Just naming the LLM up front as Lieutenant, or referencing in-brief a Full Metal Jacket-style dress-down with clear direction, have gotten me past many increasingly common hurdles with do-it-for-you models. Raging never works. You can't fire the machine. Being polite has been akin to giving a kid a cookie for breaking the cookie jar.

It is funny though, to see the Thinking phase say stuff like "The human is angry (in roleplay)..."

[1] https://www.stilldrinking.org/programming-sucks

JackFr

I often find myself saying please and thank you, but the inability of the LLM to pick up tone can be amusing.

After one of them trashed my app in Cursor, I pounded on my keyboard "WHY WOULD YOU DO THAT!!!" and the model, ignoring my rage and frustration, responded with a list of 4 bullet points explaining why in fact it did make those changes.

wyclif

Did you "accept all changes"?

pulkitsh1234

> Our study finds that the politeness of prompts can significantly affect LLM performance. This phenomenon is thought to reflect human social behavior. The study notes that using impolite prompts can result in the low performance of LLMs, which may lead to increased bias, incorrect answers, or refusal of answers. However, highly respectful prompts do not always lead to better results. In most conditions, moderate politeness is better, but the standard of moderation varies by languages and LLMs. In particular, models trained in a specific language are susceptible to the politeness of that language. This phenomenon suggests that cultural background should be considered during the development and corpus collection of LLMs.

ionwake

I actually disagree with being polite to LLMs.

The reason is soon there will come a time when people are pushing for AI to have "human" rights. This will naturally occur as people start to have relationships with them, and start to see them as better equals.

The problem with this is simply LLMs have no conscience. However humans will easily love them and be open to exploitation through dependence.

I believe it might even be imperative to be able to tell apart "word calculators" and animals, and keeping a line, even if it's through context switching to light verbal abuse to remind yourself they are not sentient, might be an important survival skill.

I am not trying to trigger anyone or be contrarian, does anyone sort of agree and/or understand what I mean? Perhaps I haven't explained myself well.

thatnerd

Sam Altman doesn't want you to say "please" to ChatGPT.

https://futurism.com/altman-please-thanks-chatgpt

esafak

That is not true; first he says it's "tens of millions of dollars well spent," followed by "you never know". I don't think he knows.

lyjackal

I’ve wondered whether they use thanks as a signal to a conversation well done, for the purpose of future reinforcement learning

eru

I'd speculate that they use slightly more complicated sentiment analysis. This has been a thing since long before LLMs.

esafak

I don't know if they do or if it is efficient but it is possible.

hbbio

From the article: "the impacts of generating a 100-word email. They found that just one email requires .14 kilowatt-hours worth of electricity, or enough to power 14 LED lights for an hour"

Seems completely off the charts. A 70b model on my M3 Max laptop does it for 0.001 kWh... 140 times less than stated in the article. Let's say the OpenAI Nvidia clusters are less energy efficient than my Macbook... but not even sure about that.

ben_w

One can also work backwards, to see what kind of compute hardware they think must be needed for the models, or how much they think OpenAI's electricity costs.

100 words is ~133 tokens, so 0.14 kWh/133 tokens is about 1 kWh/kilo-token. If electricity is all from record-cheapest PV at $0.01/kWh, then this limits them to a price floor of $10/mega-token. For more realistic (but still cheap) pricing of $0.05/kWh, that's $50/mega-token. Here's the current price sheet: https://platform.openai.com/docs/pricing

To generate a 133-token email in, say, 5 seconds, if it takes 0.14 kWh, is 101 kW. This does not seem like a plausible number (caveat: I don't work in a data centre and what I think isn't plausible may just be wrong): https://www.wolframalpha.com/input?i=0.14+kWh+%2F+5+seconds+...
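The arithmetic above can be reproduced with a short script. This is only a back-of-envelope sketch: the 0.14 kWh claim, the ~133-token estimate, the 5-second generation time, and the two electricity prices are all taken from the comment, and the function names are made up for illustration.

```python
# Back-of-envelope check of the numbers in the comment above.
# All inputs are the comment's own figures; nothing here is measured.

CLAIMED_ENERGY_KWH = 0.14   # energy per 100-word email, as quoted from the article
TOKENS = 133                # ~100 words, using the comment's words-to-tokens estimate
GENERATION_SECONDS = 5      # generation time assumed in the comment


def kwh_per_kilotoken(energy_kwh: float, tokens: int) -> float:
    """Energy per 1,000 tokens implied by the claim."""
    return energy_kwh / tokens * 1_000


def price_floor_usd_per_megatoken(energy_kwh: float, tokens: int, usd_per_kwh: float) -> float:
    """Electricity cost alone per 1,000,000 tokens."""
    return energy_kwh / tokens * 1_000_000 * usd_per_kwh


def average_power_kw(energy_kwh: float, seconds: float) -> float:
    """Average power needed to spend energy_kwh within the given time."""
    return energy_kwh * 3600 / seconds


if __name__ == "__main__":
    print(f"{kwh_per_kilotoken(CLAIMED_ENERGY_KWH, TOKENS):.2f} kWh per kilo-token")
    for usd_per_kwh in (0.01, 0.05):
        floor = price_floor_usd_per_megatoken(CLAIMED_ENERGY_KWH, TOKENS, usd_per_kwh)
        print(f"${floor:.2f} per mega-token at ${usd_per_kwh}/kWh")
    print(f"{average_power_kw(CLAIMED_ENERGY_KWH, GENERATION_SECONDS):.1f} kW average draw")
```

The results land on roughly 1 kWh per kilo-token, about $10 and $50 per mega-token, and about 101 kW, which matches the rounded figures in the comment.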

bayindirh

For reference, a single NVIDIA H200 card has a TDP of 700 watts. Considering all the middlemen you put between you and the model, 0.14 kWh doesn't look too outrageous to me. Because you add processors, high-speed interconnects, tons of cooling, etc. into the mix. Plus the models you run at the datacenters are way bigger.

For "fathomability" case, the network cables (fibers in fact) you use in those datacenters carry 800 Gbps, and the fiber-copper interface converters at each end heat up to uncomfortable levels. You have thousands of these just converting packets to light and vice versa. I'm not adding the power consumption of the switches, servers, cooling infra, etc. into the mix.

Yes, water cooling is more efficient than air cooling, but when a server is drawing 6 kW of power (8x Tesla cards, plus processors, plus rest of the system), nothing is as efficient as a local model you hit at your computer.

Disclosure: Sitting on top of a datacenter.
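As a rough sanity check on these figures, one can ask how long a whole server would have to work on a single request to use 0.14 kWh. The sketch below assumes each of the 8 cards draws the 700 W TDP quoted for the H200 and reads the server figure as a power draw of about 6 kW. Those readings, and the function names, are assumptions for illustration, not measurements.

```python
# Rough check: GPU share of the server draw, and how long a ~6 kW server
# would need to run to consume the claimed 0.14 kWh.
# Assumption: 8 cards at the 700 W TDP quoted for the H200.

GPU_TDP_W = 700
GPUS_PER_SERVER = 8
SERVER_POWER_KW = 6.0        # server figure from the comment, read as power
CLAIMED_ENERGY_KWH = 0.14    # per-email figure from the article


def gpu_power_kw(tdp_w: float, gpu_count: int) -> float:
    """Combined GPU draw at full TDP, in kW."""
    return tdp_w * gpu_count / 1000


def seconds_to_consume(energy_kwh: float, power_kw: float) -> float:
    """Seconds a load of power_kw must run to use energy_kwh."""
    return energy_kwh / power_kw * 3600


if __name__ == "__main__":
    print(f"GPUs alone: {gpu_power_kw(GPU_TDP_W, GPUS_PER_SERVER):.1f} kW")
    print(f"Time at {SERVER_POWER_KW} kW: "
          f"{seconds_to_consume(CLAIMED_ENERGY_KWH, SERVER_POWER_KW):.0f} s")
```

Under these assumptions the GPUs account for 5.6 kW of the roughly 6 kW, and the server would need about 84 seconds dedicated to one email to reach 0.14 kWh, before counting networking and cooling overhead.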

itchyjunk

A casual Twitter response turned into a news article turned into an "X wants Y" is exactly why I stopped trusting most of social media as a source of information.

a3w

I end most conversations with a fuck you, then close the browser window. Since usually chatbots fail at the tasks I give them.

prophesi

The system prompt is usually poising them to act as a helpful assistant, and upholding the idea that they're being helpful makes sense for the LLM to carry out that task better.

It makes me wonder if you tell it that they're a disgruntled senior engineer if they'd perform better with a fellow snarky developer.

eru

> Polite language in human communications often garners more compliance and effectiveness, while rudeness can cause aversion, impacting response quality. We consider that LLMs mirror human communication traits, suggesting they align with human cultural norms.

It should be relatively easy to automatically rewrite prompts to exhibit the optimum level of politeness?
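A very crude version of such a rewriter might look like the sketch below. It is purely illustrative: the marker list, the function name `to_moderate_politeness`, and the idea that "moderate" means a single leading "Please" are all assumptions, and the study itself suggests the right level varies by language and model.

```python
import re

# Hypothetical list of rude markers; a real system would need per-language tuning.
RUDE = re.compile(r"[,\s]*\b(?:you idiot|stupid|useless)\b[,!]*", re.IGNORECASE)
# Prompts that already open politely are left alone.
POLITE_OPENING = re.compile(r"^(?:please|could you|would you)\b", re.IGNORECASE)


def to_moderate_politeness(prompt: str) -> str:
    """Strip rude markers and add a single 'Please' if none is present."""
    cleaned = RUDE.sub("", prompt)
    cleaned = re.sub(r"!{2,}", "!", cleaned)        # collapse shouting
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # tidy whitespace
    if not cleaned or POLITE_OPENING.match(cleaned):
        return cleaned
    first_word = cleaned.split()[0]
    # Crude heuristic: lowercase an ordinary capitalized first word,
    # but leave "I" and acronyms such as "API" untouched.
    if first_word.istitle() and first_word != "I":
        cleaned = cleaned[0].lower() + cleaned[1:]
    return "Please " + cleaned


if __name__ == "__main__":
    print(to_moderate_politeness("Fix this bug, you idiot!!!"))
    print(to_moderate_politeness("Please summarize this."))
```

A keyword approach like this will miss most real-world rudeness and cannot judge what is "optimum"; an actual rewriter would probably need a model in the loop and a politeness target tuned per language.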

Jeff_Brown

I wonder. If the LLM suggests something really dumb, it might be more effective to really let it know that was a dumb idea. That is, the appropriate level of politeness might depend on context.

w0m

IIRC there was a study done that being polite to an LLM garnered better results. It's helpful all around

alwa

TFA was one such study.

“We observed that impolite prompts often result in poor performance, but overly polite language does not guarantee better outcomes. The best politeness level is different according to the language.”