In computer chess there's a concept of "contempt". When you set the engine to have high contempt, it evaluates the opponent's moves lower, essentially assuming that the opponent will make a mistake. Conversely, with low contempt the engine evaluates the opponent's moves higher, expecting the opponent to play better than it does.
There is a similar trade-off with LLMs. Sometimes the human conversant wants assistance, so the LLM should be more deferential. At other times the human wants a bias toward correctness over agreement with their own opinions.
It would be nice to have a contempt knob that you can adjust, instead of blindly trying to emulate one through prompting.
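The analogy can be sketched in a few lines. In a real engine, contempt is usually folded into draw and position evaluation; this is just a toy illustration of the idea as a score offset, with all names invented for the example:

```python
# Toy sketch of chess-engine "contempt": a fixed offset applied to the
# evaluation when the opponent is to move, modeling the assumption that
# they will play slightly worse (high contempt) or better (low/negative
# contempt) than the raw evaluation suggests. Not any real engine's code.

def evaluate(raw_score: int, opponent_to_move: bool, contempt: int) -> int:
    """raw_score: evaluation in centipawns from our point of view.
    contempt: centipawns by which we discount the opponent's prospects."""
    if opponent_to_move:
        return raw_score + contempt
    return raw_score

# An equal position looks winning to a high-contempt engine...
print(evaluate(0, opponent_to_move=True, contempt=25))   # 25
# ...and losing to one that respects its opponent.
print(evaluate(0, opponent_to_move=True, contempt=-25))  # -25
```

A "contempt knob" for an LLM would presumably be a sampling- or training-time analogue of this offset, rather than a line in the system prompt.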
Love when Opus 4.6 tells me that I am actually wrong about some detail and that it's right (which, after validating, often turns out to be true). It's a breath of fresh air.
Having no friction in saying "Prove it: write a shell script or code test to demonstrate" is also great, though sometimes doing that proves both of our assumptions wrong.
The article posits that sycophancy is inherent to how models are trained.
I think there's a simpler explanation. Every leaked system prompt from every model pretty much includes instructions to "be helpful," and the models are trained to be assistants, not just general knowledge repositories or research tools.
My hunch is that's the core of the problem -- the system prompt.
I wonder what happens if you prompt it to be a tool rather than an assistant: tell it that it does not need to be helpful, just to do as instructed, or something along those lines.
The article's main idea is that, for an AI, sycophantic and adversarial are the two available modes because it doesn't have enough context to make defensible decisions. You need to include a bunch of fuzzy situational detail, far more than it strictly "needs", to help it stick to its guns and actually make decisions confidently.
I think this is interesting as an idea. I do find that when I give really detailed context about my team, other teams, our and their OKRs, goals, and things I know people like or are passionate about, it gives better answers and is more confident. But it's also often wrong, or over-indexes on the things I have written. In practice, it's very difficult to get enough of this on paper without (a) holding a frankly worrying level of sensitive information (is it a good idea to write down what I really think of various people's weaknesses and strengths?) and (b) spending hours each day merely establishing ongoing context: what I heard at lunch, who's off sick today, and so on. Plus, research shows longer context can degrade performance, so in theory you want to somehow cut it down to only what truly matters for the task at hand... goodness gracious, it's all very time-consuming, and I'm not sure it's worth the squeeze.
Something a little annoying: when doing a Google search, the AI section can change day to day and even between devices.
I was thinking through a coding architecture issue and did a Google search a few days ago on my phone, and one of the ideas it gave me was really useful, so I left the browser open. Of course, when I went to look at it again today, the page reloaded and the results weren't the same, or as good. So I tried the same search on my desktop, and it was even worse.
Well of course. If I trawl the world's collected texts, "Are you sure?" correlates very strongly with the following words being an indication that previous statements were erroneous.
If I used that body of text to make a statistical model and used that model to predict what comes after "Are you sure?" it would very often be an indication that previous statements were erroneous.
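The statistical argument above can be made concrete with a toy corpus (invented here for illustration): tally what follows the phrase and see that retractions dominate.

```python
from collections import Counter

# Toy corpus illustrating the claim: in real-world text, "Are you sure?"
# is most often followed by a retraction. These lines are made up.
corpus = [
    "are you sure? sorry, I was wrong about that",
    "are you sure? apologies, let me correct myself",
    "are you sure? yes, the figure is correct",
]

continuations = Counter()
for line in corpus:
    head, sep, tail = line.partition("are you sure?")
    if sep and tail.split():
        # Count the first word that follows the phrase.
        continuations[tail.split()[0]] += 1

# A model fit to data like this learns that "Are you sure?" predicts
# backpedaling, regardless of whether the prior statement was correct.
print(continuations)
```

Scale the same tally up to "the world's collected texts" and you get the behavior described: the pushback phrase itself, not the truth of the prior claim, drives the retraction.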
This doesn't feel like a problem but a feature. I don't want AI making ambiguous decisions. I want it to research, present all the relevant facts, and seek my approval before choosing a direction. I even append to my prompts something like "present all the risks and benefits of each option" rather than "make a decision", to avoid getting a confident answer.
This is real, but (at least in a coding context) easily preventable. Just append "don't assume you're wrong - investigate" or something to that effect. Annoying, but usually effective.
I am seriously tired of every other paragraph I read ending in "It isn't just X, it's Y." I'm sure there is something insightful in between this slop, but to the author: please write in your own voice. If I wanted ChatGPT's take on it, I would ask.
Agreed. I don't even necessarily have anything against AI-edited text, but there's a way to sharpen your own writing and there's a way to let its voice dominate. There are a lot of idioms it tends to fall back on (em dashes being the best known). I'm surprised that folks don't notice these and aggressively reassert their voice.
I use LLMs in my own writing because they have benefits for conciseness but it tends to be a fairly laborious process of putting my text in the LLM for shortening and grammar, getting something more generic out, putting my soul back in, putting it back in the LLM for shortening, etc. I tend to do this at the paragraph level rather than the page level.
fwiw, I'm convinced we will all slowly lose our voice as everything around us becomes ai-assisted. People are already picking up the 'AI-isms' into their everyday speech.
I've started a blog just to scream into the void, but every word is my own, and I encourage others to do the same. AI helped set it up, the UI is pretty slop, but that's not the point. I'm hoping that by writing more I can strengthen my connection to my voice as I continue to use these tools for other uses. I'm sure writing in a journal or writing letters to friends would have similar effects too, right?
We all understand that muscles need to be regularly used to be maintained; I think we need to take that same approach to our brains, especially in the age of AI.
I call this self-repudiation. I performed a systematic experiment on this exact matter, a couple of years ago. I found that ChatGPT 3.5 frequently self-repudiated, whereas 4.0, under identical circumstances, rarely did.
These experiments are a bit expensive to run because you are forced to read all the responses to judge repudiation. Sometimes it is subtle.
Also, the behavior changes with the exact wording of the question.
I'm more curious about the other direction. How many times has a model replied to a request with "Are you sure?" I'd bet just about zero.
In my job I do it all the time; people ask for stuff and I often spend a lot of time on clarifying questions, the most fundamental of which is "Are you sure this is what you want?"
Except when I wanted ChatGPT or Claude to criticize a religion or religious figure, namely Khamenei. It never backed down, and if I pushed too hard and pointed out its contradictions, it would switch to a response mode of two-to-three-word sentences (i.e. passive-aggressive).
It was a long time ago, Claude 3 or maybe ChatGPT's v3. It felt so dehumanizing that I never tried again.
It didn't seem like trained behavior though, it felt much like hardcoded behavior.
There is a mind: the model plus text plus tool inputs is the full entity that can remember, take in sensory information, set objectives, decide, and learn. The Observe, Orient, Decide, Act (OODA) loop.
As the article says, the models are trained to be good products and give humans what they want, and most humans want agreeableness. You have to make clear in your instructions what you mean by "are you sure?": identify areas of uncertainty and use empiricism and reasoning to reduce it.