I feel like "open source" in this context is, as you say, an uncomfortable nuance; the tooling (llama.cpp et al.) is open but useless without weights.
The weights are extraordinarily expensive "capital" that is donated by big organizations who are all at war with each other.
I don't know that it will ever be possible for an organization like archive.org to make truly open weights. And, other than archive.org, I can't imagine any other "open source" organization (freebsd? apache?) being in any position at all to make truly open weights.
Maybe governments, government organizations, or universities.
None of whom are currently funded, mandated, inclined, or particularly interested in dumping the money into buying the infrastructure needed to make weights.
Yes. The weights war is a much more aggressive war than the war of OSS donations.
In the OSS donations war (Visual Studio Code being a really fascinating example of it) you could see that the taps can't be turned off so easily. Whatever is donated can be built upon forever.
I think there will come a point, soon enough, where open weights models are capable enough that even if they stagnate, they can be augmented with tooling that essentially keeps them current. Maybe we are there now?
But the risk of the taps being turned off is not negligible.
My own feeling is that governments will ultimately ask consortia of universities to train open weights models and support them financially in doing so.
(And for what it is worth, I think diffusion text models are likely to trigger a hardware arms race that makes this possible)
In much the same way that they used to do that for the supercomputer race, which we just don't hear about right now!
AI data centers that exist and are operational are running at maximum capacity. That's why you see things like the tiny little data center run by xAI showing up as a valuable resource to xAI (on the sell side) and Anthropic (on the buy side). It is "only" 300 megawatts and it rents for 1.25 billion per month.
If all these other data centers were anywhere near coming online, that 300 MW data center would be a rounding error, not a line item as it is right now.
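To put those figures on a common scale, here is a rough back-of-envelope sketch. It assumes the numbers quoted above (300 MW, 1.25 billion per month) are accurate and that the site draws full power around the clock; the function name and the 730-hour month are my own choices for illustration.

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12

def rent_per_unit(capacity_mw: float, monthly_rent: float) -> dict:
    """Break a monthly rent into per-MW and per-kWh figures."""
    per_mw_month = monthly_rent / capacity_mw
    # Assumption: the site runs at full capacity every hour of the month.
    kwh_per_month = capacity_mw * 1000 * HOURS_PER_MONTH
    return {
        "per_mw_month": per_mw_month,
        "per_kwh": monthly_rent / kwh_per_month,
    }

figures = rent_per_unit(capacity_mw=300, monthly_rent=1.25e9)
print(f"{figures['per_mw_month']:,.0f} per MW per month")
print(f"{figures['per_kwh']:.2f} per kWh of capacity")
```

That works out to roughly 4.2 million per megawatt per month, which is the kind of number that only makes sense if spare capacity really is scarce.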
So someone's signed contracts for way more and way larger data centers, someone's purchased billions in hardware for these not yet operational data centers. I'm wondering how depreciation's going to work on all these assets...
Anyhow, I'm not really sure what "max capacity" is here, nor am I really aware when they're going to be delivering the operational assets that are currently levered to their eyeballs and consuming 1/3rd of the memory made on the planet.
As for inference vs. training: have new models gotten radically better than old models, or only marginally better (at 10x or more the training cost)?
I run my word processing software on my apple 2 (a total joke of a computer) instead of running it on the WANG.
I run my bookkeeping software on VisiCalc instead of the IBM.
I run my simulation software on my IBM PC (I even paid for the 8087!) instead of the VAX.
Moore's law has, at least so far, let the pioneers with toy computers grow their toys big enough to solve "big boy" problems. Given some time, the toy computers get faster, and the pioneers scale their crappy home-grown solution to cover the 60% of the problem that was originally solved by some enormous, complex system.
Eventually the toy infrastructure gets expensive and solves 90-120% of the "big iron" problem space, and by then it costs as much as the big iron solution did. Then a new generation of toy software and toy systems emerges to disrupt the "big iron" systems.
Underappreciated requirement for this to work in post-cloud times: open source.
If a vendor can SaaS a solution, then enterprise is generally happy (they don't want to have to hire folks for maintenance), and that completely locks out any ability to run locally.
Between enterprise's ambivalence and the obvious financial incentive to vendors, you get SaaS-only products.
You're right, Moore's law has been holding up, but it will hit a hard limit on process node size, so all further scaling will come from multiple cores. OTOH, computing per watt spent has been plateauing. If the future bottlenecks are energy and cooling, that will require infrastructure-scale solutions. My bet is that this is going to be the real AI company moat.
High Bandwidth Memory uses thousands of interconnects for the data bus. DDR style memory typically uses in the neighborhood of 64 bit transfers at a time.
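A quick sketch of why the bus width matters so much. Peak bandwidth is just width times per-pin rate; the two example parts below (one 64-bit DDR5-6400 bus and one 1024-bit HBM3 stack, both at 6.4 Gb/s per pin) are my own picks for illustration, and these are theoretical peaks, not measured numbers.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_pin: float) -> float:
    """Theoretical peak in GB/s: data pins * Gb/s per pin / 8 bits per byte."""
    return bus_width_bits * gbit_per_pin / 8

# Assumed example parts, both running at 6.4 Gb/s per pin.
ddr5_bus = peak_bandwidth_gb_s(64, 6.4)      # one DDR5-6400 64-bit data bus
hbm3_stack = peak_bandwidth_gb_s(1024, 6.4)  # one HBM3 stack

print(f"DDR5 bus:   {ddr5_bus:.1f} GB/s")
print(f"HBM3 stack: {hbm3_stack:.1f} GB/s")
print(f"ratio:      {hbm3_stack / ddr5_bus:.0f}x")
```

At the same per-pin speed the whole difference is the pin count, which is exactly what makes the wiring so hard to route off-package.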
HBM tends to be integrated onto the package (board, multi chip module, die) because there are really tight signaling and wire routing constraints that make "modularity" impossible.
I remember back in the day you could get motherboards for your 286, 386, and sometimes even 486 with external L1 / L2 / L3 cache -- you'd buy a bunch of static RAM DIPs, populate the sockets next to the CPU with them, and set a BIOS option or DIP switch to enable it. These days that's just not practical: there are too many wires connecting the cache to the dies and the cache coherence logic, and the speed of light is just too slow and electricity too messy to put the cache "external" to the die/chip/package, even if the packaging issues could be addressed.
HBM memory is similar -- it's not practical to make a generic interconnect that'd actually work reliably enough to provide field replaceable memory modules as you can with DDR style dimms.
EDIT:
Apparently I'm totally wrong in that these "SOCAMM2" modules have thousands of pads (like a CPU socket) and can in fact run with the same data bus width (1024 bits wide!) as "local" HBM. Very cool. And please ignore my out of date blatherings above. It's still not quite as fast as if you put the HBM in the package, but it's way faster than the DDR style setup.
They've got thunderbolt 5 which is pretty good and every port has a dedicated controller. You can even network macs together just over thunderbolt.
Not as fast as raw PCIe slots inside the chassis, but I don't think the ghost of steve jobs cares much. He'd tell you that if you've got no taste at all you can put his beautiful machine into ugly junk like from sonnettech and you can go do ugly things elsewhere.
I think apple's happy taking the market they've got and they'll leave the big guns HPC market to nvidia. The margins look great for nvidia right now, but I suspect nvidia's path will look more like dram's boom/bust cycle than apple's steady "premium tool" market positioning.
Is that 1tb / 12 channels figure a continuous streaming read / write rate? I assume big wide DDR memory, for random "IO", is much slower than HBM.
I feel like at a certain point there are just going to be big SOC packages with 128gb of ram and stacks of cores (each with their own "local" cache) and the 128gb "local" HBM on-package ram will just be the 4th or 5th level cache, and big server boards will have 4 of those and CXL elsewhere for "main" memory.
And things like the VAST stuff also blur lines between high speed local storage and less performant san or bulk commodity storage.
The old memory / storage hierarchies are getting mixed up (again).
NEW VISTA, OUTER RIM—Just a cycle ago, the brain was in a living person. Now, hours after its first owner died, it sits on a slab draped in tubes that quiver as they pump liters of blood substitute and other fluids through the organ, supplying oxygen and removing waste. As far as anyone knows, with many of its key functions intact but maybe awareness muffled by drugs, the brain hovers between life and death. As people subject it to experimental drugs, sensors record the brain's reactions, capturing hundreds of data points on its cells, proteins, and physiology. Then, after 24 hours in this state, it will be sliced into hundreds of pieces for more detailed study.
If the AI is awesome at identifying security bugs in the linux kernel, it likely can also identify if the thing it's found is similar to something that is already found in the security mailing list?
Or, put another way -- what flags the duplicate? The filer or the system? If my cheese factory is measured by the volume of cheese instead of the quality, I'll churn out the cheese even if it's sloppy duplicated cheese. And that is the case if a person has to flag a new ticket as "same as this" or not.
What's that law that says that any sufficiently large problem turns into a moderation problem?
The problem is that the tech companies are paying their research/marketing departments for headlines that go "Researcher uses powerful new Saga 6.2 release to find 597 kernel vulnerabilities! (Can your company afford NOT getting their $1000/month subscription?)", not for headlines that go "Researcher spends $50,000 to find 597 bugs, then spends $25,000 figuring out 540 of them are duplicates".
Unless the kernel community starts banning & publicly shaming repeat offenders, there's zero incentive for them to put any effort into filtering out duplicates. They are mostly doing it for marketing after all, not out of a genuine interest in making the kernel better.
> “AI detected bugs are pretty much by definition not secret, and treating them on some private list is a waste of time for everybody involved – and only makes that duplication worse because the reporters can't even see each other's reports.”
Ah; so it _is_ a tool problem. It is _also_ a moderation problem.
One could ban orgs that flood the zone with AI generated trash, but is there some potential middle ground where there are sets of filters to identify duplicated bugs, and possibly just internally dump "AI spam" to a lower queue?
This seems like the sort of problem I'd addressed in the 90s with killfiles and spamassassin. In other words, can't the ingestion just go through some filters to shield the humans at the end of the pipe?
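Something like the following is what I have in mind, as a minimal sketch only. It uses plain token-overlap (Jaccard) similarity from the standard library; the function names, the 0.6 threshold, and the two queue labels are all made up for illustration, and a real filter would need something smarter than bag-of-words matching.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set; crude, but cheap enough to run on every report."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def triage(report: str, known: list[str], threshold: float = 0.6) -> str:
    """Route likely duplicates to a lower queue, everything else to a human."""
    new = tokens(report)
    for old in known:
        if jaccard(new, tokens(old)) >= threshold:
            return "low-priority"
    return "human-review"

# Hypothetical reports, invented for this example.
known_reports = ["use after free in foo_release when device is unplugged"]
print(triage("Use-after-free in foo_release when the device is unplugged", known_reports))
print(triage("integer overflow in bar_ioctl length check", known_reports))
```

The first report lands in the low-priority queue and the second goes to a human. Nothing gets dropped, so a false positive costs a delay rather than a lost bug.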