Hacker News | thomasahle's comments

> The rate of fundamental, broad-based breakthroughs lifting all LLM applications has clearly slowed with many of the most impactful recent discoveries being in scaling, optimization, tuning and productization toward specific domains.

To me it definitely feels like it's still accelerating, with the most impactful recent discovery being the use of RL to train reasoning models (late '24, early '25).

There's an interesting article called "sigmoids won't save you" (https://www.astralcodexten.com/p/the-sigmoids-wont-save-you), which argues that, unless you have privileged information, you should always assume a process will continue for about as long as it has already continued (Lindy's Law).

With that in mind, the current disruption should last roughly another 10 to 15 years (assuming it started in 2010 or 2017).
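A minimal sketch of the Lindy's Law estimate above, assuming the simple point estimate "expected remaining duration ≈ duration so far". The function name `lindy_remaining` and the choice of 2025 as the current year are assumptions for illustration.

```python
def lindy_remaining(start_year: int, current_year: int) -> int:
    """Lindy's Law point estimate: a process is expected to last
    about as long again as it has already lasted."""
    elapsed = current_year - start_year
    if elapsed < 0:
        raise ValueError("start_year must not be after current_year")
    return elapsed

# Assumed "now" for illustration; adjust as needed.
NOW = 2025
for start in (2010, 2017):
    print(f"started {start}: roughly {lindy_remaining(start, NOW)} more years")
```

With `NOW = 2025` this gives roughly 8 to 15 more years, in line with the rough range above.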


He used 600B tokens in 30 days.

I use more than 150B tokens/month with just 15 Codex accounts.

At that rate, 600B tokens would take 60 accounts, which is "just" $12,000/month. So Peter could "save" 100x by using monthly subscription accounts.
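The arithmetic behind the 60-account figure, as a rough sketch. The $200/month per-account price is an assumption inferred from $12,000 / 60, and reading "100x" as a multiplier on that subscription cost is likewise an assumption. Since the usage is "more than" 150B tokens/month, 60 accounts is an upper bound.

```python
TOKENS_USED = 600 * 10**9                # Peter's usage over 30 days
TOKENS_PER_MONTH_15_ACCOUNTS = 150 * 10**9
ACCOUNTS = 15
PRICE_PER_ACCOUNT_USD = 200              # assumed: inferred from 12,000 / 60

# Integer arithmetic throughout to avoid float rounding.
tokens_per_account = TOKENS_PER_MONTH_15_ACCOUNTS // ACCOUNTS   # 10B per account
accounts_needed = -(-TOKENS_USED // tokens_per_account)         # ceiling division
subscription_cost = accounts_needed * PRICE_PER_ACCOUNT_USD     # USD per month
implied_api_cost = 100 * subscription_cost                      # what "100x" implies

print(accounts_needed, subscription_cost, implied_api_cost)
```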

Of course, he doesn't have to, as he works at OpenAI now.


Sounds like a healthy industry, selling tokens at 1000x below cost.


API pricing isn’t cost; we don’t know what the cost is.


I would bet money Anthropic and OpenAI are actually profitable on inference. The problem is they have to spend large sums of money to train models that are essentially worthless after a few months.


Dario explicitly stated this in an interview.

They make more money from inference on a model than it cost to train it, but the next model is so much more expensive to train that their annual figures have been in the red.
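A toy model of that dynamic, with entirely hypothetical numbers (not figures from the interview): each model is trained in year t, earns back twice its training cost as inference profit in year t + 1, and the next model costs ten times as much. The function name `yearly_cash_flow` is made up for illustration.

```python
def yearly_cash_flow(first_cost, growth, payback, years):
    """Cash flow per year, assuming model t is trained in year t and
    earns `payback` times its training cost (as inference profit) in year t + 1."""
    flows = []
    for t in range(years):
        training = first_cost * growth**t
        inference = payback * first_cost * growth**(t - 1) if t > 0 else 0.0
        flows.append(inference - training)
    return flows

# Hypothetical numbers, purely illustrative.
FIRST_COST, GROWTH, PAYBACK, YEARS = 100.0, 10.0, 2.0, 4

per_model_profit = [(PAYBACK - 1) * FIRST_COST * GROWTH**t for t in range(YEARS)]
flows = yearly_cash_flow(FIRST_COST, GROWTH, PAYBACK, YEARS)
print(per_model_profit)  # every model pays for itself
print(flows)             # but every year is negative
```

Under these assumptions each model is profitable on its own, yet every year is in the red as long as each new model costs more than `PAYBACK` times the previous one.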


So it's like a pharma company that would be barely profitable if you didn't count R&D costs?


A large part of the GPT-5.x model iteration has been about making training more affordable and token efficient.


It's to build a moat, of course!

Narrator: there was no moat


This performative concern over token costs and subsidisation comes from either ignorance or some latent ideological signalling.


One could say "that's a great point, we should take more direct ideological action to address this issue!", but expounding upon the finer details would likely get one banned here.


What I truly don't understand, as a daily heavy Opus 4.7 user, is how you can coherently prompt 15 different parallel conversations at the same time.

For me it's not even a "what the hell are you working on" so much as a complete inability to understand how you can keep so many different processes working on distinct tasks. It simply doesn't map onto how I use these tools.

I spend most of my day writing extremely detailed prompts and that's how I'm able to get the sort of excellent results that confound skeptics. But I have to be honest with you: I don't think I can write (or think) fast enough to do two of these at a time, much less 15.

I definitely could not review what they are generating with any degree of confidence.

I'm really hoping you can explain what the heck your usage pattern actually looks like, because reading this makes me feel like I'm missing something.


I'm trying to recreate the entire commercial EDA stack in open source (RTL simulators, synthesis, formal proof tools, etc.).

Building compilers has a _lot_ of parallel tasks agents can work on.

Wish me luck..


Good luck!


Yeah good luck with that. I find SystemVerilog is probably the thing that AI is worst at, presumably because there's not that much training data out there, and pretty much everything about the commercial tools is paywalled.


Those costs aren't just tokens used for prompting; they also include agent loops, etc.


What do you do with all those accounts?


Probably trying to fix their broken personal website, where half of the links don't work at all.


Is my website broken?


Ask your 15 Codex agents; surely they will help you with that.


Your website seemed fine to me, though I didn't try every link.


I'm currently trying to choose the right formalization for a big hardware project.

I'm deciding between SVA, TLA+, and Lean, with SVA being the most domain-specific and Lean the most general.

Do you think we'll move towards "Lean for everything", or do domain-specific formalisms still make sense?


Have you considered P? It feels like a good abstraction for engineers as it's "proper" code.

https://github.com/p-org/P


what's SVA?


SystemVerilog Assertions. Hardware designs (silicon ASICs, and often FPGAs too) are typically written in a language called SystemVerilog. It has a feature called "concurrent assertions", which is usually just called SVA.

These are sort of temporal regexes, e.g. you can write

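  // assumes a clock named clk (or a default clocking block)
  assert property (@(posedge clk) $fell(rst) |-> foo == 1 ##[1:20] foo == 0);

This means that if the rst signal fell (changed from 1 to 0), foo must be 1 in that same cycle, and then 0 at some point 1 to 20 cycles later.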
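To make the semantics concrete, here is a rough Python model of that one property, checked against a recorded trace with one sample of `rst` and `foo` per clock cycle. It is a simplification, not how the formal tools work: the function name `check_fell_rst_property` is hypothetical, cycle 0 is skipped because `$fell` needs a previous sample, X/Z values are ignored, and an attempt whose 20-cycle window runs past the end of the trace is reported as pending rather than failed.

```python
from typing import List, Sequence, Tuple

def check_fell_rst_property(rst: Sequence[int], foo: Sequence[int],
                            lo: int = 1, hi: int = 20) -> Tuple[List[int], List[int]]:
    """Simplified model of: $fell(rst) |-> foo == 1 ##[lo:hi] foo == 0.

    Returns (failures, pending): cycles where an attempt failed, and cycles
    whose attempt was still waiting when the trace ended."""
    if len(rst) != len(foo):
        raise ValueError("rst and foo must have one sample per cycle")
    n = len(rst)
    failures, pending = [], []
    for t in range(1, n):  # cycle 0 has no previous sample; skipped here
        if not (rst[t - 1] == 1 and rst[t] == 0):  # $fell(rst) is false
            continue
        if foo[t] != 1:  # antecedent matched, but foo == 1 fails right away
            failures.append(t)
            continue
        # Look for foo == 0 somewhere in cycles t+lo .. t+hi (clipped to the trace).
        window = range(t + lo, min(t + hi, n - 1) + 1)
        if any(foo[k] == 0 for k in window):
            continue  # this attempt passed
        if t + hi <= n - 1:
            failures.append(t)  # whole window observed, foo never dropped to 0
        else:
            pending.append(t)   # trace ended before the window closed
    return failures, pending

# rst falls at cycle 1; foo is 1 then and drops to 0 at cycle 3, so it passes.
print(check_fell_rst_property([1, 0, 0, 0], [0, 1, 1, 0]))  # → ([], [])
```

Each `$fell` starts an independent attempt, which mirrors how SVA evaluates overlapping attempts of the same property.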

The nice thing about them is that there are a few commercial tools that can formally verify them. They're super expensive (~$100k/year for one license), but fairly widely used because they work really well.

It's probably the most successful application of formal verification because it doesn't require much expertise to use, unlike software formal verification, which pretty much immediately requires you to become an expert in loop invariants, termination measures, Hoare triples, etc. At least that has been my experience.


The human savant will remember where they read it and give you credit. It might lead more people to read your work, and ultimately you make money.

The AI won't even know where the page of text it's seeing came from, and people will avoid your book as they can just ask the AI. So you make less money. (Talking about specialized technical books here.)


Not necessarily.


Does it run on Nvidia or Huawei?


> In 1983 David DeWitt (https://en.wikipedia.org/wiki/David_DeWitt) published benchmarking results showing poor performance for Oracle databases. Larry Ellison wasn't happy with the results and it's said that he tried to have DeWitt fired.

> Given how difficult it is to fire professors when there's actual misconduct, the probability of Ellison successfully getting someone fired for doing legitimate research in their field was pretty much zero. It's also said that, after DeWitt's non-firing, Larry banned Oracle from hiring Wisconsin grads and Oracle added a term to their EULA forbidding the publication of benchmarks. Over the years, many major commercial database vendors added a license clause that made benchmarking their database illegal.

See also: https://web.archive.org/web/20160719145221/http://sqlmag.com...


This is crazy car-centric legislation.

Now, instead of letting car owners pay for the public space they use (street parking), you are forcing anyone without a car to waste their own private space, in case somebody wants to park there.


I can't imagine that you have to let someone park on your private property anywhere.


No, that is not the point.

The subtle difference is between American parking minimums imposed on property owners - “you must reserve space on your private property for this many cars whether you own them or not” vs Japanese parking requirements imposed on car owners - “you must reserve space on some private property for your car if you want to own it”


Or, you know, they will have improved the safeguards.


Sure thing.


Good musicians care about music theory / “first principles” as much as good writers care about language theory / grammar.


I don't know anyone using these models every day who thinks they are hitting a ceiling.

If anything, there's a plateau between model releases.


I'm seeing diminishing returns, though in fairness we have no idea yet how to integrate these models properly with existing good practices and principles. I suspect improvement is going to come mainly from improved tool usage rather than more impressive models.

