In computer chess there's a concept of "contempt". When you set the engine to have high contempt, it evaluates the opponent's moves lower, essentially assuming that the opponent will make a mistake. Conversely, with low contempt the engine evaluates the opponent's moves higher, expecting the opponent to play better than it does.
There is a similar trade-off with LLMs. Sometimes the human conversant wants assistance, so the LLM should be more deferential. At other times the human wants a bias toward correctness over agreement with their own opinions.
It would be nice to have a contempt knob that you can adjust, instead of blindly trying to emulate one through prompting.
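The analogy can be sketched in a few lines. In a real engine, contempt is usually folded into draw and position evaluation; this is just a toy illustration of the idea as a score offset, with all names invented for the example:

```python
# Toy sketch of chess-engine "contempt": a fixed offset applied to the
# evaluation when the opponent is to move, modeling the assumption that
# they will play slightly worse (high contempt) or better (low/negative
# contempt) than the raw evaluation suggests. Not any real engine's code.

def evaluate(raw_score: int, opponent_to_move: bool, contempt: int) -> int:
    """raw_score: evaluation in centipawns from our point of view.
    contempt: centipawns by which we discount the opponent's prospects."""
    if opponent_to_move:
        return raw_score + contempt
    return raw_score

# An equal position looks winning to a high-contempt engine...
print(evaluate(0, opponent_to_move=True, contempt=25))   # 25
# ...and losing to one that respects its opponent.
print(evaluate(0, opponent_to_move=True, contempt=-25))  # -25
```

A "contempt knob" for an LLM would presumably be a sampling- or training-time analogue of this offset, rather than a line in the system prompt.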
Love when Opus 4.6 tells me that I am actually wrong about some detail and that it's right (which, after validating, often turns out to be true). It's a breath of fresh air.
Having no friction in saying "Prove it: write a shell script or code test to demonstrate" is also great, though sometimes doing that proves both of our assumptions wrong.
The article posits that sycophancy is inherent to how models are trained.
I think there's a simpler explanation. Every leaked system prompt from every model pretty much includes instructions to "be helpful," and the models are trained to be assistants, not just general knowledge repositories or research tools.
My hunch is that's the core of the problem -- the system prompt.
I wonder what happens if you prompt it to be a tool rather than an assistant: tell it that it does not need to be helpful, just to do as instructed, or something along those lines.
The article's main idea is that, for an AI, sycophantic and adversarial are the two available modes because it doesn't have enough context to make defensible decisions. You need to include a bunch of fuzzy situational detail, far more than it strictly "needs", to help it stick to its guns and actually make decisions confidently.
I think this is interesting as an idea. I do find that when I give really detailed context about my team, other teams, our and their OKRs, goals, and things I know people like or are passionate about, it gives better answers and is more confident. But it's also often wrong, or over-indexes on the things I have written. In practice, it's very difficult to get enough of this on paper without (a) holding a frankly worrying level of sensitive information (is it a good idea to write down what I really think of various people's weaknesses and strengths?) and (b) spending hours each day merely establishing ongoing context: what I heard at lunch, who's off sick today, and so on. Plus, research shows longer context can degrade performance, so in theory you want to somehow cut it down to only what truly matters for the task at hand... goodness gracious, it's all very time-consuming, and I'm not sure it's worth the squeeze.
Something a little annoying: when doing a Google search, the AI section can change day to day and even between devices.
I was thinking through a coding architecture issue and did a Google search a few days ago on my phone, and one of the ideas it gave me was really useful, so I left the browser open. Of course, when I went to look at it again today, the page reloaded and the results weren't the same, or as good. So I tried the same search on my desktop, and it was even worse.
Well of course. If I trawl the world's collected texts, "Are you sure?" correlates very strongly with the following words being an indication that previous statements were erroneous.
If I used that body of text to make a statistical model and used that model to predict what comes after "Are you sure?" it would very often be an indication that previous statements were erroneous.
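The statistical argument above can be made concrete with a toy corpus (invented here for illustration): tally what follows the phrase and see that retractions dominate.

```python
from collections import Counter

# Toy corpus illustrating the claim: in real-world text, "Are you sure?"
# is most often followed by a retraction. These lines are made up.
corpus = [
    "are you sure? sorry, I was wrong about that",
    "are you sure? apologies, let me correct myself",
    "are you sure? yes, the figure is correct",
]

continuations = Counter()
for line in corpus:
    head, sep, tail = line.partition("are you sure?")
    if sep and tail.split():
        # Count the first word that follows the phrase.
        continuations[tail.split()[0]] += 1

# A model fit to data like this learns that "Are you sure?" predicts
# backpedaling, regardless of whether the prior statement was correct.
print(continuations)
```

Scale the same tally up to "the world's collected texts" and you get the behavior described: the pushback phrase itself, not the truth of the prior claim, drives the retraction.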
This doesn't feel like a problem but a feature. I don't want AI making ambiguous decisions. I want it to research, present all the relevant facts, and seek my approval before choosing a direction. I even append to my prompts something like "present all the risks and benefits of each option" rather than "make a decision", to avoid getting a confident answer.
This is real, but (at least in a coding context) easily preventable. Just append "don't assume you're wrong - investigate" or something to that effect. Annoying, but usually effective.
I am seriously tired of every other paragraph I read ending in "It isn't just X, it's Y." I'm sure there is something insightful in between this slop, but to the author: please write in your own voice. If I wanted ChatGPT's take on it, I would ask.
Agreed. I don't even necessarily have anything against AI-edited text, but there's a way to sharpen your own writing and there's a way to let its voice dominate. There are a lot of idioms it tends to fall back on (em dashes being the best known). I'm surprised that folks don't notice these and aggressively reassert their voice.
I use LLMs in my own writing because they have benefits for conciseness but it tends to be a fairly laborious process of putting my text in the LLM for shortening and grammar, getting something more generic out, putting my soul back in, putting it back in the LLM for shortening, etc. I tend to do this at the paragraph level rather than the page level.
fwiw, I'm convinced we will all slowly lose our voice as everything around us becomes ai-assisted. People are already picking up the 'AI-isms' into their everyday speech.
I've started a blog just to scream into the void, but every word is my own, and I encourage others to do the same. AI helped set it up, the UI is pretty slop, but that's not the point. I'm hoping that by writing more I can strengthen my connection to my voice as I continue to use these tools for other uses. I'm sure writing in a journal or writing letters to friends would have similar effects too, right?
We all understand that muscles need to be regularly used to be maintained; I think we need to take that same approach to our brains, especially in the age of AI.
I call this self-repudiation. I performed a systematic experiment on this exact matter, a couple of years ago. I found that ChatGPT 3.5 frequently self-repudiated, whereas 4.0, under identical circumstances, rarely did.
These experiments are a bit expensive to run because you are forced to read all the responses to judge repudiation. Sometimes it is subtle.
Also, the behavior changes with the exact wording of the question.
I'm more curious about the other direction. How many times has a model replied to a request with "Are you sure?" I'd bet just about zero.
In my job I do it all the time; people ask for stuff and I often spend a lot of time on clarifying questions, the most fundamental of which is "Are you sure this is what you want?"
Except when I wanted ChatGPT or Claude to criticize a religion or religious figure, namely Khamenei. It never backed down, and if I pushed too hard and pointed out its contradictions, it would switch to a response mode of two-to-three-word sentences (i.e. passive-aggressive).
It was a long time ago, Claude 3 or maybe ChatGPT's v3. It felt so dehumanizing that I never tried again.
It didn't seem like trained behavior though, it felt much like hardcoded behavior.
There is a mind: the model plus text plus tool inputs is the full entity that can remember, take in sensory information, set objectives, decide, and learn. The Observe, Orient, Decide, Act (OODA) loop.
As the article says, the models are trained to be good products and give humans what they want, and most humans want agreeableness. You have to make clear in your instructions what you mean by "are you sure?": identify areas of uncertainty and use empiricism and reasoning to reduce it.