> The rate of fundamental, broad-based breakthroughs lifting all LLM applications has clearly slowed with many of the most impactful recent discoveries being in scaling, optimization, tuning and productization toward specific domains.
To me it definitely feels like it's still accelerating, with the most impactful recent discovery being RL training reasoning models (late '24, early '25).
There's an interesting article called "sigmoids won't save you" https://www.astralcodexten.com/p/the-sigmoids-wont-save-you which argues that (unless you have privileged information) you should always assume a process will continue about as long as it’s continued already. (Lindy's Law)
With that in mind the current disruption should last another 10-15 years (assuming it started in '10 or '17.)
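A back-of-envelope version of that estimate, as a minimal sketch. It assumes the Lindy rule of thumb "expected remaining time is about equal to time elapsed so far" and takes 2025 as "now", which is an assumption (the comment doesn't state the year). The function name is made up for illustration.

```python
def lindy_remaining_years(start_year: int, now_year: int) -> int:
    """Lindy-style point estimate: remaining duration equals elapsed duration."""
    elapsed = now_year - start_year
    if elapsed < 0:
        raise ValueError("start_year must not be after now_year")
    return elapsed


NOW = 2025  # assumed "present"; not stated in the comment

for start in (2010, 2017):
    remaining = lindy_remaining_years(start, NOW)
    print(f"started {start}: ~{remaining} more years, until about {NOW + remaining}")
```

Under those assumptions the two start dates give roughly 8 and 15 more years, the same ballpark as the 10-15 quoted above.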
I would bet money Anthropic and OpenAI are actually profitable on inference. The problem is they have to spend large sums of money to train models that are essentially worthless after a few months.
They make more money from inference than they spend training the model, but the next model is so much more expensive to train that their annual figures have been in the red.
One could say "that's a great point, we should take more direct ideological action to address this issue!", but expounding upon the finer details would likely get one banned here.
What I truly don't understand, as a daily heavy Opus 4.7 user, is how you can coherently prompt 15 different parallel conversations at the same time.
For me it's not even a "what the hell are you working on" so much as a complete inability to understand how you can keep so many different processes working on distinct tasks. It simply doesn't map onto how I use these tools.
I spend most of my day writing extremely detailed prompts and that's how I'm able to get the sort of excellent results that confound skeptics. But I have to be honest with you: I don't think I can write (or think) fast enough to do two of these at a time, much less 15.
I definitely could not review what they are generating with any degree of confidence.
I'm really hoping you can explain what the heck your usage pattern actually looks like, because reading this makes me feel like I'm missing something.
Yeah good luck with that. I find SystemVerilog is probably the thing that AI is worst at, presumably because there's not that much training data out there, and pretty much everything about the commercial tools is paywalled.
SystemVerilog Assertions. Hardware (silicon ASICs, and often FPGAs too) is written in a language called SystemVerilog. It has a feature called "concurrent assertions", which is usually just called SVA.
These are sort of temporal regexes, e.g. you can write a property which means: if the rst signal fell (changed to 0), then foo must be 1, and 1-20 cycles later it must be 0.
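As a rough illustration of what such a property checks, here is a small Python model that scans a sampled trace. This is a sketch under assumptions, not how any commercial tool works. The SVA spelling in the comment is one plausible way to write it (not necessarily the original example), the `check_trace` helper and the `(rst, foo)` trace format are made up, cycle 0 is never treated as a falling edge, and an attempt whose window runs past the end of the trace is treated as pending rather than failed.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (rst, foo), sampled once per clock cycle


def check_trace(trace: List[Sample], lo: int = 1, hi: int = 20) -> List[int]:
    """Return the cycles at which the property fails.

    Models something like (assumed spelling):
        $fell(rst) |-> foo ##[lo:hi] !foo
    """
    failures = []
    for t in range(1, len(trace)):
        prev_rst, _ = trace[t - 1]
        rst, foo = trace[t]
        if not (prev_rst == 1 and rst == 0):
            continue  # rst did not fall here, so nothing to check
        if foo != 1:
            failures.append(t)  # foo must be 1 in the cycle where rst fell
            continue
        window = trace[t + lo : t + hi + 1]
        if any(f == 0 for _, f in window):
            continue  # foo went low within lo..hi cycles: pass
        if len(window) == hi - lo + 1:
            failures.append(t)  # saw the whole window and foo never went low
        # otherwise the trace ended early and the attempt is still pending
    return failures


ok = [(1, 0), (0, 1), (0, 1), (0, 0)]
stuck = [(1, 1)] + [(0, 1)] * 21
print(check_trace(ok), check_trace(stuck))
```

A simulator does essentially this on one trace at a time. The formal tools instead try to prove that no reachable trace can fail.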
The nice thing about them is that there are a few commercial tools that can formally verify them. They're super expensive (~$100k/year for one license), but fairly widely used because they work really well.
It's probably the most successful application of formal verification because it doesn't require much expertise to use, unlike software formal verification, which pretty much immediately requires you to become an expert on loop invariants, termination measures, Hoare triples etc. At least that has been my experience.
The human savant will remember where they read it and give you credit. It might lead more people to read your work, and ultimately you make money.
The AI won't even know where the page of text it's seeing came from, and people will avoid your book as they can just ask the AI. So you make less money. (Talking about specialized technical books here.)
> In 1983 David DeWitt (https://en.wikipedia.org/wiki/David_DeWitt) published benchmarking results showing poor performance for Oracle databases. Larry Ellison wasn't happy with the results and it's said that he tried to have DeWitt fired.
> Given how difficult it is to fire professors when there's actual misconduct, the probability of Ellison successfully getting someone fired for doing legitimate research in their field was pretty much zero.
> It's also said that, after DeWitt's non-firing, Larry banned Oracle from hiring Wisconsin grads and Oracle added a term to their EULA forbidding the publication of benchmarks. Over the years, many major commercial database vendors added a license clause that made benchmarking their database illegal.
Now, instead of letting car owners pay for the public space they use (street parking), you are forcing anyone without a car to waste their own private space, in case somebody wants to park there.
The subtle difference is between American parking minimums imposed on property owners (“you must reserve space on your private property for this many cars whether you own them or not”) and Japanese parking requirements imposed on car owners (“you must reserve space on some private property for your car if you want to own it”).
I'm seeing diminishing returns, though in fairness we have no idea yet how to integrate properly with existing good practices and principles. I suspect improvement is going to come mainly from improved tool usage rather than more impressive models.