It is interesting that they are focusing a large part of this release on the model having a higher "EQ" (Emotional Quotient).
We're far from the days of "this is not a person, we do not want to make it addictive" and getting a firm foothold in the territory of "here's your new AI friend".
This is very visible in the example comparing 4o with 4.5, where the user complains about failing a test: 4o's response is the "typical AI response" with problem-solving bullets, while 4.5 sends what you'd expect from a pal over instant messaging.
It seems Anthropic and Grok have both been moving in this direction as well. Are we going to see an escalation of foundation models impersonating "a friendly person" rather than "a helpful assistant"?
Personally I find this worrying and (as someone who builds upon SOTA model APIs) I really hope this behavior is not going to seep into API responses, or will at least be steerable through the system/developer prompt.
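For what it's worth, today you can at least try to pin the persona down yourself. Here's a minimal sketch with the OpenAI Python SDK; the model name and instruction wording are placeholder assumptions on my part, not a tested recipe:

    # Sketch: pinning a neutral persona via the system/developer prompt.
    # Model name and instruction wording are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4.5-preview",  # placeholder; use whatever model you target
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a concise, neutral technical assistant. "
                    "Do not adopt a chatty or emotionally supportive tone."
                ),
            },
            {"role": "user", "content": "I failed my test today."},
        ],
    )
    print(response.choices[0].message.content)

Whether the post-trained tone wins out over a system prompt like this is exactly the open question.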
The whole robotic, monotone, helpful-assistant thing was something these companies had to actively hammer in during the post-training stage. It's not really how LLMs sound by default after pre-training.
I guess they're caring less and less about that effort, especially since it hurts the model in some ways, like creative writing.
Maybe, but I'm not sure how much the style is deliberate vs. a consequence of the post-training tasks like summarization and problem solving. Without seeing the post-training tasks and rating systems it's hard to judge if it's a deliberate style or an emergent consequence of other things.
But it's definitely the case that base models sound more human than instruction-tuned variants. And the shift isn't just vocabulary, it's also in grammar and rhetorical style. There's a shift toward longer words, but also participial phrases, phrasal coordination (with "and" and "or"), and nominalizations (turning adjectives/adverbs into nouns, like "development" or "naturalness"). https://arxiv.org/abs/2410.16107
How is "development" an adverb or adjective turned into a noun??
It comes from a French word (développement), and that in turn was just a natural derivation of the verb "développer"... no adverbs or adjectives (English or otherwise) seem to come into play here.
Sorry, I should have said adjectives or verbs, as it's "develop" turned into a noun. Just like "discernment" or "punishment". The etymology isn't relevant for classifying it as a nominalization, only the grammatical function.
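If anyone wants to eyeball this shift on their own text, a crude suffix heuristic goes a long way. A rough sketch (the suffix list and length cutoff are my own guesses, not the methodology of the linked paper):

    # Crude nominalization spotter: flags words ending in common
    # deverbal/deadjectival noun suffixes. Suffix list and length
    # cutoff are guesses, not the linked paper's methodology.
    import re

    SUFFIXES = ("ment", "tion", "sion", "ness", "ity", "ance", "ence")

    def nominalization_rate(text: str) -> float:
        words = re.findall(r"[a-z]+", text.lower())
        if not words:
            return 0.0
        hits = [w for w in words if len(w) > 6 and w.endswith(SUFFIXES)]
        return len(hits) / len(words)

    print(nominalization_rate("they develop it and it grows naturally"))        # 0.0
    print(nominalization_rate("its development ensures the naturalness of it")) # ~0.29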
Or maybe they're just getting better at it, or developing better taste. After switching to Claude, I can't go back to ChatGPT's overly verbose, bullet-point-laden book reports every time I ask a question. I don't think that's pretraining; it's in the way OpenAI approaches tuning and prompting vs. Anthropic.
If it's just a different choice during RLHF, I'll be curious to see what the performance trade-offs are.
The "buddy in a chat group" style answers do not make me feel like asking it for a story will make the story long/detailed/poignant enough to warrant the difference.
Anthropic pretty much abandoned this direction after Claude 3 and said it wasn't what they wanted [1]. Claude 3.5+ is extremely dry and neutral; it doesn't seem to have the same training.
>Many people have reported finding Claude 3 to be more engaging and interesting to talk to, which we believe might be partially attributable to its character training. This wasn’t the core goal of character training, however. Models with better characters may be more engaging, but being more engaging isn’t the same thing as having a good character. In fact, an excessive desire to be engaging seems like an undesirable character trait for a model to have.
It's the opposite incentive to ad-funded social media. One wants to drain your wallet and keep you hooked, the other wants you to spend as little of their funding as possible finding what you're looking for.
> We're far from the days of "this is not a person, we do not want to make it addictive" and getting a firm foothold in the territory of "here's your new AI friend".
That’s a hard nope from me when companies pull that move. I’ll stick to my flesh-and-blood humans, who still hallucinate, but only rarely.
Yes, the "personality" (vibe) of the model is a key qualitative attribute of gpt-4.5.
I suspect this has something to do with shining a light on an added value prop in a dimension many people will appreciate, since the gains in quantitative comparisons with other models weren't notable enough to pop eyeballs.
Now you just need a Pro subscription to get Sora to generate a video to go along with this, post it to YouTube, and rake in the views (and the money that goes along with them).
That was impressive. If it all came from just this short 4-line prompt, it's even more impressive.
All we're missing now is a text-to-video model (or text-to-audio and then audio-to-video) that can convincingly follow the style instructions for emphasis and pausing. Or are we already there?
Yesterday, I had Claude 3.7 write a full 80,000-word novel. My prompt was a bit longer, but the result was shockingly good. The new thinking mode is very impressive.
I had been sleeping on Claude's ability to write books until, a couple of days ago, I had it write a novel set in the Accelerando universe. It whipped up a very convincing, complete multi-act, 13-chapter side plot about humans learning to interact with Economics 2.0. It was quite good, though I'm sure cstross would be horrified.
Okay, you know what? I laughed a few times. Yeah, it may not work as an actual stand-up routine for a general audience, and it's kinda cringe (as most LLM-generated content is), but it was legitimately entertaining to read.
My benchmark for this has been asking the model to write some tweets in the style of dril, a popular user who writes short funny tweets. Sometimes I include a few example tweets in the prompt too. Here's an example of results I got from Claude 3 Opus and GPT 4 for this last year: https://bsky.app/profile/macil.tech/post/3kpcvicmirs2v. My opinion is that Claude's results were mostly bangers while GPT's were all a bit groanworthy. I need to try this again with the latest models sometime.
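For anyone who wants to reproduce this kind of benchmark, the setup is just a few-shot prompt over the API. Roughly (the placeholders stand in for real example tweets, and the model name is whatever you're comparing):

    # Sketch of the few-shot setup: paste a handful of example tweets,
    # then ask for new ones in the same style. Placeholders stand in
    # for the actual example tweets.
    from openai import OpenAI

    client = OpenAI()

    examples = ["<example tweet 1>", "<example tweet 2>", "<example tweet 3>"]
    prompt = (
        "Here are some tweets by @dril:\n"
        + "\n".join(f"- {t}" for t in examples)
        + "\n\nWrite five new tweets in the same style."
    )

    response = client.chat.completions.create(
        model="gpt-4",  # swap in whichever model you're evaluating
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)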
If you like absurdist humor, go into the OpenAI playground, select 3.5-Turbo, and dial up the temperature to the point where the output devolves into garbled text after 500 tokens or so. The first ~200 tokens are in the freaking sweet spot of humor.
Maybe it's rose-colored glasses, but 3.5 was really the golden era for LLM comedy. More modern LLMs can't touch it.
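If you want to recreate this outside the playground, the same knob is exposed in the API; temperature tops out at 2.0 there, and something just under that tends to hit the window described above. A sketch (the exact values are just what I'd try first):

    # Sketch: near-maximum temperature with a capped output length, so
    # you catch the coherent-but-unhinged window before the text fully
    # degrades into garbage.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=1.9,   # API ceiling is 2.0; tune to taste
        max_tokens=300,    # roughly the "sweet spot" window
        messages=[{"role": "user", "content": "Tell me about your day."}],
    )
    print(response.choices[0].message.content)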
Just ask it to write you a film screenplay involving some hard-ass 80s/90s action star and someone totally unrelated and the opposite of that. The ensuing unhinged magic is unparalleled.
> We're far from the days of "this is not a person, we do not want to make it addictive" and getting a firm foothold in the territory of "here's your new AI friend".
And soon we'll have the new AI friend recommending Bud Lite™ and turning the beer can with the logo towards you.
I don't know if I fully agree. The input clearly shows the need for emotional support more than "how do I pass this test?" The answer by 4o is comical even if you know you're talking to a machine.
It reminds me of the advice to "not offer solutions when a woman talks about her problems, but just listen."
How could a machine provide emotional support? When I ask questions like this to LLMs, it's always to brainstorm solutions. I get annoyed when I receive fake-attention follow-up questions instead.
I guess there's a trade-off between being human and being useful. But this isn't unique to LLMs; it's similar to how one wouldn't expect a deep personal connection with a customer service professional.
There are some businesses trying to do emotional support with AI, like AI girlfriends, etc.
Some will make a profit as a niche thing (millions of users at global scale, and if the unit economics work, that can mean millions of dollars).
But it seems it will never be something really mainstream because most normal people don't care what a bot says or does.
The example I always think of: chess engines have been better at chess than humans for decades, but very few people watch Stockfish tournaments. Everyone loves Magnus Carlsen, though.
I agree with you on the timescale of a single generation.
I disagree with you on the timescale of n ≥ 2 generations: kids/teens/adults will pick up new habits and ways of seeing the world.
Just as someone like me can look like a grizzled old fool for not seeing the appeal of TikTok, it's 100% possible to be blind to the very real appeal of a 24/7 sycophantic "friend".
And I'll give you a concrete example: I was at a business conference 3 weeks ago where I talked to the group about a trap people can easily fall into, that of ditching personal/professional support for AI support (the trap being that it's easy for the "digital friend" to rope you in by just being sycophantic enough: "it's never your fault").
And then in the very same meeting, one of the keynote speeches was this influential female CEO explaining how she had "taught her custom GPT to become her spiritual leader" and how this GPT spiritual teacher was acting as her guide, therapist and coach (complete with a name, backstory and profile picture). I was rolling my eyes so hard they might have fallen out of my head.
This is where we're headed, and people like this misguided CEO will lead their audiences and followers straight there (especially when that is combined with financial incentives or social rewards).
I think it's a good thing because (idk why) I just start tuning out after getting reams and reams of bullet points whose truthfulness I'm already not super confident about.
Well yeah, if the LLM can keep you engaged and talking, that'll make them a lot more money compared to if you just use it as an information retrieval tool, in which case you're likely to leave after getting what you're looking for.
Since they offer a subscription, keeping you engaged just requires them to waste more compute. The ideal case would be the LLM giving you a one-shot correct response using as little compute as possible.
In a subscription business, you don't want the user to use as few resources as possible. It's the wrong optimization to make.
You want users to keep coming back as often as possible (at the lowest cost-per-run possible, though). If they're not coming back, they're not renewing.
So, yes, it makes sense to make answers shorter to cut compute cost (which these SMS-length replies could accomplish), but the main point of making the AI flirtatious or "concerned" is possibly the addictive factor of having a shoulder to cry on 24/7, one that doesn't call you on your BS and is always supportive... for just $20 a month.
The "one-shot correct response" to "I failed my exams" might be "Tough luck, try better next time" but if you do that, you will indeed use very little compute because people will cancel the subscription and never come back.
AI subscriptions are already very sticky. I can't imagine not paying for at least one, so I doubt they care about retention like the rest of us plebs do.
First imagine paying a subscription fee that actually makes the company profitable and gives investors their ROI; then I think you can also imagine not paying that amount at all.
The Plus-level subscription has limits too, and the Pro level costs 10x more; as long as Pro users don't use ChatGPT 10x more than Plus users on average, OpenAI comes out ahead. There's also the user-retention factor.
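Back-of-the-envelope version of that, using the public $20/$200 tier prices; the per-query serving cost and usage numbers below are made up purely for illustration:

    # Toy margin model: Pro pays 10x the Plus price, so it stays the
    # better deal for OpenAI as long as Pro usage stays under ~10x
    # Plus usage. Serving cost and query counts are invented numbers.
    PLUS_PRICE, PRO_PRICE = 20.0, 200.0   # $/month, public tier prices
    COST_PER_QUERY = 0.02                 # $, hypothetical serving cost

    def monthly_margin(price: float, queries: int) -> float:
        return price - queries * COST_PER_QUERY

    plus_queries = 500                    # hypothetical Plus-user average
    print(f"Plus: ${monthly_margin(PLUS_PRICE, plus_queries):.0f}/mo")
    for multiple in (5, 10, 20):
        margin = monthly_margin(PRO_PRICE, plus_queries * multiple)
        print(f"Pro at {multiple}x Plus usage: ${margin:.0f}/mo")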