
Yeah. That's been my general problem with adopting NVidia for anything. They make good hardware, but there's a lot of lock-in, and not a lot of transparency. That introduces business risk.

I'm not in a position where I need GPGPU, but if there wasn't that risk, and generally there were mature, open standards, I'd definitely use it. The major breakpoint would be when libraries like Numpy do it natively, and better yet, when Python can farm out list comprehensions to a GPU. I think at that point the floodgates will open, and NVidia's market share will explode from specialized applications to everywhere.
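
Concretely, the kind of drop-in I mean already exists in vendor-specific form (a sketch assuming NVIDIA's cupy package and an NVIDIA GPU; the wish is for stock Numpy to be this seamless on any vendor's hardware):

    import numpy as np
    import cupy as cp  # GPU-backed library with a numpy-compatible API

    x_cpu = np.random.rand(1_000_000)
    x_gpu = cp.asarray(x_cpu)      # copy to GPU memory
    y_gpu = cp.sqrt(x_gpu) + 1.0   # same syntax as numpy, runs on the GPU
    y_cpu = cp.asnumpy(y_gpu)      # copy back when needed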

Intel stumbled into it by accident, but got it right with x86. Define an open(ish) standard, and produce superior chips to that standard. Without AMD, Cyrix, Via, and the other knock-offs, there would be no Intel at this point.

Intel keeps getting it right with numerical libraries. They're open. They work well. They work on AMD. But because Intel is building them, Intel has that slight bit of advantage. If Intel's open libraries are even 5% better on Intel, that's a huge market edge.



> They make good hardware, but there's a lot of lock-in, and not a lot of transparency.

This sounds like you'd like NVIDIA to open-source all their software. I see this type of request a lot, but I don't see it happening.

NVIDIA's main competitive advantage over AMD and Intel is its software stack. AMD could release a GPGPU twice as powerful tomorrow for half the price and most current NVIDIA users wouldn't care, because what good is that if you can't program it? AMD's software offering is just poor. Of course they open-source everything: they don't make any software worth buying.

ARM and Intel make great software (the Intel MKL, Intel SVML, ... libraries; the icc, ifort, ... compilers), and they don't open-source any of that either, for the same reasons as NVIDIA.

Intel and NVIDIA employ a lot of people to develop their software stacks. These people probably aren't cheap. AMD's strategy is to save a lot of money on software development, maybe hoping that the open-source communities, or Intel and NVIDIA, will do it for free.

I also see these requests that Intel and NVIDIA should open-source everything together with the explanation that "I need this because I want to buy AMD stuff". That, right there, is the reason why they don't do it.

You want to know why NVIDIA has 99% of the cloud GPGPU hardware market and AMD 1%? If you think $10,000 for a V100 is expensive, do the math on what an AMD MI50 actually costs: $5,000 for the hardware, and then a team of engineers at >$100k each (how much do you think AI GPGPU engineers cost?) working for N years just to play catch-up on the part of the software stack that NVIDIA gives you with a V100 for free. That gets multiple millions of dollars more expensive really quickly.


> AMD could release a GPGPU twice as powerful tomorrow for half the price and most current NVIDIA users wouldn't care, because what good is that if you can't program it?

Correction: Nobody will be able to use the AMD hardware (outside of computer graphics) because everybody has been locked in with CUDA on Nvidia. They cannot change even if they want to: it is pure madness to reprogram an entire GPGPU software stack every 2 years just to change your hardware provider.

And I think it will remain like that until NVidia gets sued for antitrust.

> ARM and Intel make great software [..] don't open-source any of that either, for the same reasons as NVIDIA.

That's propaganda and it's wrong.

Intel and ARM contribute a lot to OSS. Most of the software they release nowadays is open source. This includes compiler support, drivers, libraries, and entire dev environments: mkl-dnn, TBB, BLIS, ISPC, "One", mbedTLS... ARM even has an entire foundation dedicated to contributing to OSS (https://www.linaro.org/).

Compared to that, NVidia does close to nothing.

There is no justification for NVidia's attitude toward OSS. It reminds me of Microsoft in its darkest days.

The only excuse I can see for this attitude is greed.

I hope at least they do not contaminate Mellanox with their toxic policies. Mellanox was (up to now) an example of a successful open-source contributor/company, with OpenFabrics (https://www.openfabrics.org/). It would be dramatic if that disappeared.


AMD doesn't even have GPGPU software for some of their cards. I have an RX 5700 XT and I can't use it for anything but gaming, because ROCm doesn't support Navi cards a whole year after their release.


As a 5700 owner, I agree.

It gets even worse. There was recently a regression in the 5.4, 5.5, and 5.6 kernels that hit me hard for a week or so on Manjaro last month. The system would just decide to lock up or restart. I thought the graphics card had died when it happened once on Windows. It's working fine now, but these drivers have been out for 10 months.

Even worse, AMD has locked down the releases of some of their 'GPUOpen' software.

https://www.phoronix.com/scan.php?page=news_item&px=Radeon-R...

https://www.phoronix.com/scan.php?page=news_item&px=GPUOpen-...

I did not expect the second one to be open source; just not on their GPUOpen website.

I did expect the first one to 'stay' open source. Not to be made proprietary on their 'GPUOpen' website.

I am definitely keeping an eye on Intel graphics now.


I think at this point AMD wants anything compute to concentrate on CDNA, with graphics remaining on RDNA.


>> ARM and Intel make great software [..] don't open-source any of that either, for the same reasons as NVIDIA.

> That's propaganda and it's wrong.

Very convenient of you to have omitted what was in the square brackets:

> Intel MKL, Intel SVML, ... libraries, icc, ifort, ... compiler

Show me the open source MKL, Intel SVML, icc and ifort.

Some (all?) of it may be free, but it's not open source.


I don't necessarily have an opinion either way in this discussion, but I wanted to point out that Intel's latest MKL library does seem to be developed as an open-source project: https://github.com/oneapi-src/oneMKL


> Nobody will be able to use the AMD hardware (outside of computer graphics) because everybody has been locked in with CUDA on Nvidia.

But numpy can be ported. So can pytorch.

I don't think the lock-in is that big of an issue. GPUs do only simple things, but do them fast.
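
As a sketch of why the porting surface is thin: most PyTorch user code is already device-agnostic, so a hypothetical non-NVIDIA backend would slot in at a single line (ROCm builds of PyTorch do exactly this):

    import torch

    # The rest of the program is identical whatever the backend is.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(32, 128, device=device)
    w = torch.randn(128, 10, device=device)
    y = x @ w  # runs wherever the tensors live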


> GPUs do only simple things, but do them fast

GPUs are immensely complex systems. Look at an API like Vulkan, plus its shading language, and tell me again that it's simple. And that's a low-level interface.

Now add to that the enormous amount of software effort that goes into implementing efficient libraries like cuBLAS, cuDNN, etc. There's a reason other vendors have struggled to compete with NVidia.

Disclaimer: currently employed at NVidia.


Part of Nvidia's advantage comes from building the hardware and software side by side. No one was seriously tackling GPGPU until Nvidia created CUDA, and if you look at the rest of the graphics stack, Nvidia is the one driving the big innovations.

GPUs are sufficiently specialized in both interface and problem domain that GPU enhanced software is unlikely to appear without a large vendor driving development, and it would be tough for that vendor to fund application development if there is no lock in on the chips.

Which leads to the real question: what business model would enable GPU/AI software development without hardware lock-in? Game development has found a viable business by charging game publishers.


Would you agree that your observations somewhat imply that a competitive free market is not a fit for all governable domains (and don't mistake governable for government there; we're still talking about shepherding of innovation)?


Early tech investments are risky, but if your competition has tech 10 years more advanced than yours, there is probably no amount of money that would allow you to catch up, surpass them, and make enough profit to recover the investment, mainly because you can't buy time, your competitor won't stop innovating, they are making a profit and you aren't, etc.

So to me the main realization here is that in tech, if one competitor ends up with tech that's 10 years more advanced than the competition, it is basically a divergence-type of phenomenon. It isn't worth it for the competition to even invest in trying to catch up, and you end up with a monopoly.


This is a good callout. Unlike manufacturing, the supply chain for large software projects is almost universally vertically integrated. While it's possible to make a kit car that at least some people would buy, most of the big tech companies have reached the point of requiring hundreds of engineers working for years to compete.

The caveat is that time has shown that monopolies tend to decay for various reasons; the tech world is littered with companies that grew too confident in their monopoly:

- Cisco
- Microsoft Windows
- IBM

etc.


The problem with vertically integrated technology is that if a huge advancement appears at the lowest level of the stack, one that requires a re-implementation of the whole stack, a new startup building things from scratch can overthrow a large competitor that would need to throw its stack away, or evolve it without breaking backward compatibility, etc.

Once you have put a lot of money into a product, it is very hard to start a new one from scratch and let the old one die.


I think you would need to take a fine-tooth comb to the definitions here. I could see a few different options emerging for non-Nvidia software, including:

- Cloud providers wishing to provide lower CapEx solutions in exchange for increased OpEx and margin.
- Large Nvidia customers forming a foundation to shepherd open implementations of common technology components.

From a free market perspective both forms of transaction would be viable and incentivized, but neither option necessarily leads to an open implementation.


I have been saying a similar thing about GPUs for a very long time.

The GPU hardware is (comparatively) simple.

It is the software that sets GPU vendors apart. For Gaming, that is Drivers. For Compute that is CUDA.

On a relative scale, if getting a decent GPU design has a difficulty of 1, getting decent drivers to work well on all existing software is 10, and getting the whole ecosystem around your drivers / CUDA + hardware is likely in the range of 50 to 100.

As far as I can tell, under Jensen's leadership, the chance of AMD or even Intel shaking up Nvidia's grasp on this domain is practically zero in the foreseeable future.

That is speaking as an AMD shareholder who really wants AMD to compete.


> But numpy can be ported. So can pytorch.

Letting AMD or Intel port everything that has been developed in CUDA themselves, like was done for Pytorch, is not sustainable and will always lag behind.

It can only help to create a monopoly in the long term.


As HIP continues to implement more of CUDA, I think we'll see more developers doing it themselves once the barrier to porting is smaller. AMD has a lot of work to do, and I don't know whether they'll succeed or not, but IMO they have the right strategy.


Intel doesn't release BLIS, though there is some Intel contribution. Substitute libxsmm, which originally beat MKL.


> Correction: Nobody will be able to use the AMD hardware (outside of computer graphics) because everybody has been locked in with CUDA on Nvidia.

NVIDIA open-sourced their CUDA implementation to the LLVM project 5 years ago, which is why clang can compile CUDA today, and why Intel and PGI have clang forks compiling CUDA to multi-threaded and vectorized x86-64 using OpenMP.

That you can't compile CUDA to AMD GPUs isn't NVIDIA's fault, it's AMD's, for deciding to pursue OpenCL first, then HSA, and now HIP.


> Do you work for AMD

I do not. And I use NVidia hardware regularly for GPGPU. But I hate fanboyism.

> NVIDIA open-sourced their CUDA implementation to the LLVM project 5 years ago

Correction: Google developed an internal CUDA implementation for their own needs based on LLVM, which Nvidia barely supported for its own needs afterwards.

Nothing is "stable" nor "branded" in this work.... Consequently, 99% of public Open Source CUDA-using software still compile ONLY with the CUDA proprietary toolchain ONLY on NVidia hardware. And this is not going to change anything soon.

> one from PGI, that compiles CUDA to multi-threaded x86-64 code using OpenMP.

The PGI compiler is proprietary and now the property of NVidia. It was previously proprietary and independent, but mainly used for its GPGPU capability through OpenACC. The OpenACC backend directly targets the (proprietary) NVPTX format. Nothing related to CUDA.

> Intel being the main vendor pushing for a parallel STL in the C++ standard

That's wrong again.

Most of the work done for the parallel STL and by the C++ committee originates from work by HPX and the STELLAR Group (http://stellar-group.org/libraries/hpx/).

They are pretty smart people and deserve at least respect and credit for what they have done.

More information from Hartmut Kaiser (a very nice guy, btw) here: https://www.youtube.com/watch?v=6Z3_qaFYF84

They have been the precursors of the idea of parallel "algorithms" in the STL, and the concept of "execution policy" you have in C++17 comes from them.

In defense of Intel (and to my knowledge), they provided the first OSS compiler implementation of it.


> But I hate fanboyism.

"The only excuse I can see to this attitude is greed" sounds pretty fanboyish to me. :-)

I've never understood why Microsoft, or Adobe, or Autodesk, or Synopsys, or Cadence or any other pure software company is allowed to charge as much as the market will bear for their products, often more per year than Nvidia's hardware, but when a company makes software that runs on dedicated hardware, it's called greed. I don't think it's an exaggeration when I say that, for many laptops with a Microsoft Office 365 license, you pay more over the lifetime of the laptop for the software license than for the hardware itself. And it's definitely true for most workstation software.

When you use Photoshop for your creative work, you lock your design IP to Adobe's Creative Suite. When you use CUDA to create your own compute IP, you lock yourself to Nvidia's hardware.

In both cases, you're going to pay an external party. In both cases, you decide that this money provides enough value to be worth paying for.


> Correction: Google developed an internal CUDA implementation for their own needs based on LLVM, which Nvidia barely supported for its own needs afterwards.

This is wildly inaccurate.

While Google did develop a PTX backend for LLVM, the student who worked on it as part of a GSoC was later hired by NVIDIA, and ended up contributing the current NVPTX backend that clang uses today. The PTX backend that Google contributed was removed some time later.

> Nothing is "stable" nor "branded" in this work.

This is false. The NV part of the backend name (NVPTX) literally brands this backend as NVIDIA's PTX backend, in strong contrast with the other PTX backend that LLVM used to have (it actually had both for a while).

> The OpenACC backend directly targets the (proprietary) NVPTX format.

This is false. Source: I've used the PGI compiler on some Fortran code, and you can mix OpenACC with CUDA Fortran just fine, and compile to x86-64 using OpenMP to just target x86 CPUs. No NVIDIA hardware involved.

> That's wrong again.
>
> Most of the work done for the parallel STL and by the C++ committee originates from work by HPX and the STELLAR Group

This is also wildly inaccurate. The Parallel STL work actually originated with the GCC parallel STL, the Intel TBB, and NVIDIA Thrust libraries [0]. The author of Thrust was the editor of the Parallelism TS, and is the chair of the Parallelism SG. The members of the STELLAR group that worked on HPX started collaborating more actively with ISO once they started working at NVIDIA after their PhDs. One of them chairs the C++ Library Evolution Working Group. The Concurrency working group is also chaired by NVIDIA (by the other NVIDIA author of the original Parallelism TS).

AMD is nowhere to be found in this type of work.

[0] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n372...


> While Google did develop a PTX backend for LLVM, the student who worked on it as part of a GSoC was later hired by NVIDIA, and ended up contributing the current NVPTX backend that clang uses today.

You more or less reformulated what I said. It might become used one day behind a proprietary, rebranded blob of NVidia's, but the fact is that today close to nobody uses it in production in the wild, and it is not even officially supported.

> This is false. The NV part of the backend name (NVPTX) literally brands this backend as NVIDIAs PTX backend.

That does not mean it's stable or used. I do not know of a single major piece of GPGPU software in existence that ever used it in an official distribution. Like I said.

> CUDA Fortran just fine

CUDA Fortran, yes, you said it: CUDA Fortran. The rest is OpenACC.

> The Parallel STL work actually originated with the GCC parallel STL, the Intel TBB, and NVIDIA Thrust libraries

My apologies for that. I was unaware of that prior work.

> AMD is nowhere to be found in this type of work.

I do not think I ever said anything about AMD.


> CUDA Fortran, yes, you said it: CUDA Fortran. The rest is OpenACC.

You can also mix C, OpenACC, and CUDA C, and compile to x86-64. So I'm really not sure about what point you are trying to make here.

You were claiming that OpenACC and CUDA only run on nvidia's hardware, yet I suppose you now agree that this isn't true.

I do agree that PGI is still nvidia owned, but there are other compilers that do what PGI does.


> You were claiming that OpenACC and CUDA only run on nvidia's hardware, yet I suppose you now agree that this isn't true.

I do not think I ever said that OpenACC runs only on NVidia hardware. However, I still affirm that CUDA runs only on NVidia hardware, yes. For anything else, it relies on code converters at best.


> That you can't compile CUDA to AMD GPUs isn't NVIDIA's fault, it's AMD's, for deciding to pursue OpenCL first, then HSA, and now HIP.

Using a branded, patented, proprietary technology from a competitor and copying its API for your own implementation is madness that will surely land you in court.

It seems that even Google learned that the hard way (https://en.wikipedia.org/wiki/Google_v._Oracle_America).


How come? There are CUDA C++ and CUDA C toolchains available under an MIT license, large parts of which were contributed by NVIDIA.

How can they sue you for using something that they give you with a license that says "we allow you to do whatever you want with it"?


The MIT license doesn't have an express patent grant. If Nvidia has a patent on some technology used by the open-source code, they could sue you for patent infringement if you use it in a way that displeases them. What they can't do is sue you for copyright infringement.


Google v Oracle is still unsettled.

Most other legal precedent was that it was fine to clone an API.


> Most other legal precedent was that it was fine to clone an API.

CUDA is more than an API. It is a technology under copyright, and very likely patented too. Even the API itself contains multiple references to "CUDA" in function calls and variable names.


None of that protects it from being cloned under previous 9th Circuit precedent, except maybe patents, but I'm not aware of any patents that would protect against another CUDA implementation.


>Intel and PGI have clang forks compiling CUDA to multi-threaded and vectorized x86-64 using OpenMP.

Where are these forks?



For PGI, all PGI compilers can do this; just pick x86-64 as the target. There are also other forks online (just search for LLVM, CUDA, and x86 as keywords); some university groups have their forks on GitHub, where they compile CUDA to X.


People who are into RISC-V and other side projects/open stacks obviously have not worked on mission-critical problems.

When you have a jet engine hoisted up on a test rig and something fails in your DSP library, you don't hesitate to call Matlab engineering support to get help within the next 30 minutes. Try that with some Python library. People give Matlab a lot of flak for being closed source, but there is a reason they exist: not for building a stupid toy project, but for real things where big $$$ is on the line. Python is also used in production everywhere, but if your application is a niche one, connecting to some DSP hardware with a PyVISA library you git-cloned is not very "production ready". You need solid deps.

Don't get me wrong: open-source software runs in prod all the time (PostgreSQL, Linux, etc.). The smaller the application domain (specific DSP libraries or analysis stacks for wind turbines and such), the lower the availability of high-quality open-source software (and support).

My point is that reality hits you hard when it is anything where a lot of $$$ or people's time depend on it. Don't blame their engineers for using closed source tools.


> People who are into RISC-V and other side projects/open stacks obviously have not worked on mission-critical problems.

"People who are into RISC-V" nowadays includes folks like Chris Lattner, who has worked on more mission-critical problems than most everyone here.


Yes, and not all of them were turned into gold. I don't have any hopes for Swift for TensorFlow.


It would suffice for NVIDIA to open-source enough specifications and perhaps some subset of core software to enable others to build high quality open source (or even proprietary) software that targets NVIDIA's architecture. They can't hire every programmer in the world; if other programmers can build high-performance software that takes advantage of their platform, that increases the value of their hardware.

Your comparison to Intel isn't valid: most software that runs on Intel processors isn't built with icc, and customers have a choice: they can use icc, gcc, clang, or a number of other compilers. The NVIDIA world isn't equivalent.


Anyone is free to target PTX and do their own compiler on top.

In fact, given that it has been there since version 3, there are compilers available for almost all major programming languages, including managed ones.
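
Numba is one concrete example from a managed language (my illustration, assuming the numba package and a CUDA-capable GPU): it compiles decorated Python functions down to PTX through LLVM:

    from numba import cuda
    import numpy as np

    @cuda.jit                 # compiled to PTX via LLVM/NVVM
    def add_one(x):
        i = cuda.grid(1)      # absolute thread index
        if i < x.size:
            x[i] += 1.0

    d_x = cuda.to_device(np.zeros(1024))
    add_one[4, 256](d_x)      # launch 4 blocks of 256 threads
    print(d_x.copy_to_host()[:5])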

OpenCL, meanwhile, is a C world; almost no one cares about the C++ extensions, and even fewer vendors care about SPIR-V.

Also the community doesn't seem to be bothered that for a long time, the only SYCL implementation was a commercial one from CodePlay, trying to extend their compilers outside the console market.


> the community doesn't seem to be bothered that for a long time, the only SYCL implementation was a commercial one

Bothered has nothing to do with it. Implementing low level toolchains generally seems to require both a gargantuan effort and an incredible depth of knowledge. If it didn't, I think tooling and languages in general would be significantly better across the board.

What am I supposed to do, implement a SYCL compiler on my own? Forget it - I'll just keep writing GLSL compute shaders or OpenCL kernels until someone with lots of resources is able to foot the initial bill for a fully functional and open source implementation.


Which is why CUDA won; most researchers can't be bothered to keep writing C-based shaders with printf debugging.


This is wrong - triSYCL is roughly the same age as ComputeCpp, and hipSYCL is only slightly younger. There has been a lot of academic interest in SYCL, but as with any new technology (especially niche technologies) it's always going to take time to get people on board.

Also, from a quick look at your profile, you seem to have quite a lot of comments criticizing or commenting on CodePlay. Do you have some sort of relationship or animosity with them?


I wish CodePlay all the luck; the more success the better for them.

They are well appreciated among game developers, given their background.

My problem is how Khronos sells their APIs and leaves everyone alone to create their own patched SDKs, and then acts surprised that commercial APIs end up winning the hearts of the majority.

The situation has hardly changed since I did my thesis with OpenGL in the late 90's, porting a particle-visualization engine from NeXTSTEP to Windows.

Nothing that compares with CUDA, Metal, DirectX, LibGNMX, NVN tooling.

Hence my reference to CodePlay, as for a very long time their SDK was the only productive way to use SYCL.

Khronos likes to oversell the ecosystem, and the issues and disparities across OEMs tend to be "forgotten" in their marketing materials.


Rust has a PTX backend.


This has literally been a back-and-forth argument since a 100-point post on Slashdot was a groundbreaking event. I don't see it changing any time soon; honestly, if anything, on tech forums this argument frequently overshadows just how well NVIDIA is doing.


It is just like game forums as well.

The culture here and on those forums couldn't be further apart.


>> NVIDIA's main competitive advantage over AMD and Intel is its software stack. AMD could release a GPGPU twice as powerful tomorrow for half the price and most current NVIDIA users wouldn't care, because what good is that if you can't program it?

I always wonder why it is so hard for AMD to develop a true competitor to CUDA, but for AMD hardware. Don't try to solve GPGPU programming through open standards like OpenCL; just copy the concept of CUDA wholesale. They could still build it on top of LLVM etc. and release the whole thing as open source, but have the freedom not to deal with design-by-committee frameworks like OpenCL, so they can focus on GPU programming and nothing else, and only on those platforms where the majority of the demand is. There is not much wrong with OpenCL; it's just not nearly as good/capable/easy-to-use as CUDA if all you are interested in is GPGPU programming.

AMD is a big company with a lot of revenue, especially recently, so why would it be so hard to have a team working full-time on creating a direct CUDA knock-off ASAP?


Two thoughts that come to mind:

1. AMD has struggled in the past, and even today, to be profitable with their GPUs. That makes it difficult to entice an army of knowledgeable devs without consistent cash flow. Granted, the tide is turning with their profitable CPU business, and their equity has shot up.

2. More importantly I think that, being the underdog, AMD has to have a cheaper, open solution to compete. Why would a customer choose to go with AMD’s nascent and proprietary stack over Nvidia’s well established and nearly ubiquitous proprietary stack?

To be clear, I don’t think the problems are insurmountable. AMD won a couple HPC deals recently which should afford them the opportunity to build up their software and invest in a competitive hardware solution.


To be fair, Nvidia has open-sourced some key libraries lately. See CUTLASS and cuFFTDx.


> Intel keeps getting it right with numerical libraries. They're open. They work well. They work on AMD.

What Intel numerical libraries are you thinking of? When I think of Intel numerical libraries, the first that comes to mind is MKL. MKL is neither open-source nor does it work well on AMD without some fragile hacks [0].

[0] https://www.pugetsystems.com/labs/hpc/How-To-Use-MKL-with-AM...
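
For the curious, the "fragile hack" in [0] reportedly boils down to one undocumented environment variable that older MKL builds honored (removed in later releases, hence fragile), set before an MKL-linked numpy loads:

    import os

    # Pretend to be an AVX2-capable Intel CPU so MKL skips its slow
    # generic dispatch path on AMD Zen. Pre-2020 MKL builds only.
    os.environ["MKL_DEBUG_CPU_TYPE"] = "5"

    import numpy as np  # now dispatches the fast AVX2 kernels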


Well, OP didn't say MKL works well on AMD. But you can at least run it on a non-Intel CPU. Compare CUDA.


The Nvidia PGI compiler compiles CUDA to multi-core x86-64. There are also other third-party CUDA-to-x86-64 compilers (an LLVM-based one from Intel, for example).

There is a "library replacement" for CUDA from AMD called HIP, that you can use to map CUDA programs to ROCm. But... it doesn't work very well.

NVIDIA also open-sourced CUDA support for Clang and LLVM. So anybody can extend clang to map CUDA to any hardware supported by LLVM, including SPIRV. The only company that would benefit from doing this would be AMD, but AMD doesn't have many LLVM contributors.

Intel drives clang and LLVM development for x86_64, paying a lot of people to work on that.


It sounds like people want nvidia to write drivers for AMD.

This criticism makes even less sense when any bystander could implement CUDA support on AMD by connecting open-source software.


> any bystander

You aren't seriously implying than any bystander is capable of extending LLVM to map CUDA to SPIR-V? What percentage of present day gainfully employed software engineers do you suppose even has the background knowledge? How many hours do you suppose the work would require?


If LLVM has a SPIRV backend, probably very little. For a proof of concept, a bachelor's CS thesis would probably do.

Clang already has a CUDA parser, and all the code to lower CUDA-specific constructs to LLVM-IR, some of which are specific to the PTX backend. If you try to compile CUDA code for a different target, like SPIRV, you'll probably get some errors saying that some of the LLVM-IR instructions generated by clang are not available in that backend, and you'll need to generate the proper SPIRV calls in clang instead.

It's probably a great beginner task for getting started with clang and LLVM. You don't need to worry about the C++ frontend side of things, because that's already done, and you can focus on understanding the LLVM-IR and how to emit it from clang when you already have a proper AST.


FWIW, there already exists an LLVM-to-SPIR-V translator: https://github.com/KhronosGroup/SPIRV-LLVM-Translator

Alas, it only supports SPIR-V up to 1.1.


Late response, I know, but I would say anyone who needs that feature could learn to do it, at least if they are on Hacker News. Maybe bystander isn't the most accurate term, but certainly anyone with criticism could take up the gauntlet.

LLVM is very well documented and so are these standards. The open source community is also huge and full of talented contributors and more are always welcome to join. I think there's a reason why Linux and GitHub exist.

So in short, if it's a question of motivation and it's something you need, then become motivated to make it happen. That's more likely to happen than convincing a company to invest in supporting a competitor.


CUDA appears to have come out well before even OpenCL. I don't see why there would be an expectation that nVidia would design their framework to work on a competitor's product.


CUDA was also a response to ATI's own proprietary effort, which they eventually gave up on.


ATI came out with CTM, which was just an assembler. CUDA was released a month or so after that. It was a full C compiler and already had a pretty large set of examples and library functions.

I downloaded CUDA about the day it was released, and used it for real some months later when I bought an 8600 GT GPU.

To call CUDA a response to CTM is too much praise for Nvidia, because it suggests that their response included cobbling a compiler and SDK in just a month. :-)


Not on ARM, or POWER, you can't. Why you'd want to run it on AMD, I don't understand. I don't know what fraction of peak BLIS and OpenBLAS get, but it will be high.


> The major breakpoint would be when libraries like Numpy do it natively

That already happened [0]. NVIDIA has a 1:1 replacement for Numpy called CuPy that does this, and it's what powers their RAPIDS framework (which is a 1:1 replacement for Pandas that runs on GPUs).

Some people were complaining in [0] about CuPy reproducing numpy's bugs...

[0]: https://news.ycombinator.com/item?id=22830201


Pretty much everyone these days uses a library for driving the GPU calculations. And they tend to either support multiple hardware targets directly (TensorFlow) or have API-compatible replacements (CuPy/NumPy).

So the lock-in risk here is that you might have to run your stuff on CPU if future NVIDIA GPUs are too overpriced.

I mean, they are super expensive. But there's nothing that comes close to their cuBLAS library in terms of performance. So unless AMD ponies up and hires GPU algorithm engineers, NVIDIA will win simply due to their superior driver software.

I once had to optimize a CPU matrix multiplication algorithm: 10 days of work for a 2x speedup. Now imagine doing that for every one of the thousands of functions in the BLAS library...
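
To make the gap concrete, here's a rough benchmark you can run yourself (timings will vary by machine; the point is the orders-of-magnitude spread between a naive loop and whatever tuned BLAS your numpy build links against):

    import time
    import numpy as np

    n = 128
    a, b = np.random.rand(n, n), np.random.rand(n, n)

    t0 = time.perf_counter()
    c = np.zeros((n, n))
    for i in range(n):          # naive O(n^3) schoolbook loops
        for j in range(n):
            for k in range(n):
                c[i, j] += a[i, k] * b[k, j]
    t1 = time.perf_counter()

    t2 = time.perf_counter()
    c_blas = a @ b              # dispatches to the linked BLAS
    t3 = time.perf_counter()

    print(f"naive: {t1 - t0:.3f}s  BLAS: {t3 - t2:.6f}s")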


Yeah, I think most people don't quite appreciate the difficulty and cost of optimizing for hardware, and of continually maintaining that through hardware cycles. By keeping things closed source, Nvidia's products get both the advantage of being easier to onboard, thanks to simpler abstractions, and faster technical progress, because there is less pushback from myriad parties when big, inconvenient changes need to happen at lower levels for hardware performance reasons. It's kind of like if, instead of x86, we had settled on LLVM.


That is a very good metaphor :)

Actually, I wonder why we went with un-compilable Java bytecode and JIT instead of advancing projects like gcj.


I'm not surprised that expecting to beat implementations of the basic Goto strategy for BLAS didn't turn out well. BLIS only needs a single, pared-down GEMM kernel for level 3, and maybe one for TRSM. (It doesn't currently have GPU support, but I think there was an implementation mentioned in an old paper.)


NVIDIA has no ethical or moral responsibility to give their competitors the benefit of software they have paid to develop in-house. It is probably a safe bet that you yourself do not develop your projects under the Affero GPL, and so on some level you agree with this.

What you see as "ecosystem lock-in" is properly viewed as software that you pay a premium for as part of your purchase price, above and beyond the pricing of the competitor's hardware. NVIDIA costs more than AMD because they have to employ people to write all that software, and you are "buying" that software when you purchase NVIDIA's product.

Analogously - Amiga has no moral responsibility to let you run AmigaOS on anything except their hardware. This sort of "hardware exclusivity" used to be very common and widely accepted. Today, Apple has no moral responsibility to let you run OS X on anything except their hardware (the existence of underground hackintoshing is irrelevant here). The software is part of what you are buying when you buy the product.


It's not a safe bet. I've built projects under the AGPL, and made plenty of money doing it. There are places where open is good business, there are places where proprietary is good business, and there's everything in between. The AGPL was nice since I could be open, which had a huge market advantage, but release code which my competitors would /never/ take advantage of. It had, quite literally, zero downsides, and a lot of upsides.

There are projects where I go 100% proprietary too, and projects with a mix. It's a business decision. It's not as simple as proprietary=profit and open=charity. It's a business calculation in every case.


Given that I hardly saw any clone vendors other than AMD, I really doubt that they had any influence on Intel's market share.

What worked out was IBM not being able to prevent PC clones, but given the wide adoption of laptops, tablets and phones that hardly matters nowadays.


Intel was forced to license to AMD for government contracts. There's a super-complex story there I won't get into.

There were a few clone vendors aside from AMD. None were ever a serious threat, and AMD itself didn't become more than a bottom-feeder until after maybe 15 years. But their existence did drive a lot of adoption.

And yes, I did oversimplify. MS-DOS, IBM not being able to prevent clones, and so on, all really played together here as part of the same story.


That's why Intel realized that fab technology was the true differentiator.

The only way to outcompete in a sea of clones is to secure exclusive access to a valuable resource that the clones can't get.

Intel with fabs. Dell with lean supply chains. The surviving hard drive and memory companies with scale.

I think IBM and Sun show what happens when you try to fight a stand-up brawl in a commodity space.


>That's why Intel realized that fab technology was the true differentiator.

But now the situation is completely reversed. Intel has faced all kinds of problems, costs, and delays due ultimately to the fact that they made a bad choice on their chip architecture but were forced to make it work because they invested so much in the fab.

What TSMC is fabbing for nvidia is working out really well, and if it were not, nvidia could walk away without being stuck with billions of dollars of fab facilities they have to own forever.

edit: reversed is the wrong choice of words. It IS all about the fab, but Intel could not/did not accept that maybe someone else had the key differentiator now.


I think it's the other way around. The architecture was being limited by their fabs' ability to yield large chips, and in the absence of any CPU perf pressure from AMD, the natural push would lean more towards increasing graphics performance in order to push more pixels. As in, I think Intel probably had the same yield issues as everyone else at ~10-32 nm, but only Intel had the high-margin, small-chip volume to make it profitable to ramp, until Apple and TSMC happened.


The architecture is definitely far ahead of anyone else's, when you look at Intel chips still being competitive despite manufacturing being a generation behind and having 1/6th the cache per core.

I'm an AMD shareholder and my biggest fear is Intel figuring out their manufacturing.


Maybe not recently, but in the years that cemented Intel's dominance, there were many clones on the market.


Yeah, but never in an amount that was actually meaningful, and only from a couple of non-branded PC OEMs.


I think the parent only means that if Intel somehow shut down, decided to radically pivot, or closed everything, then, because it is somewhat open, you would still have alternatives, and neither your code, your product, nor your company would face insurmountable hardship or die because of it.


Jax is an implementation of numpy on GPU.
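
A minimal sketch (assuming the jax package; it runs on a GPU or TPU when one is available and falls back to CPU otherwise):

    import jax.numpy as jnp
    from jax import jit

    @jit                        # compiled for whatever backend is present
    def normalize(x):
        return (x - jnp.mean(x)) / jnp.std(x)

    print(normalize(jnp.arange(10.0)))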


Have you heard about CuPy?


Yes. I won't bet my business on a one-vendor solution with a medium-sized community which might disappear at some point.

If CuPy supported NVidia and AMD, and was folded into Numpy, I'd buy the biggest, beefiest GPU I could find overnight.


What technology would you bet your business on then?

Today, you can write numpy code, and that runs on pretty much all CPUs from all vendors, with different levels of quality.

A one line change allows you to run all numpy code you write on nvidia GPUs, which at least today, are probably the only GPUs you want to buy anyways.

In practice, you would probably be also running your whole software stack on CPUs, at least for debugging purposes. So if you change your mind about using nvidia hardware at some point, you can just revert that one line change and go back to exclusively targeting CPUs. Or who knows, maybe some other GPU vendors might provide their numpy implementation by then, and you can just go from CuPy to ROCmPy or similar.

Either way, if you are building a numpy stack today, I don't see what you lose today from using CuPy when running your products on hardware for which that's available.
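
The "one line change" pattern, sketched under the assumption that your code sticks to the API surface numpy and CuPy share:

    import numpy as xp           # CPU today...
    # import cupy as xp         # ...one-line change for NVIDIA GPUs

    def softmax(z):
        e = xp.exp(z - xp.max(z))
        return e / xp.sum(e)

    print(softmax(xp.array([1.0, 2.0, 3.0])))

(CuPy also ships cupy.get_array_module for choosing the backend per array at runtime.)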


*shrug* I'll bet my business on waiting an extra 15 minutes for analytics code to run.

Seriously. There's little most businesses really need that I couldn't do on a nice 486 running at 33MHz. Now, if a $5000 workstation gives even a 5% improvement to employee productivity, that's an obvious business decision. That doesn't mean it's necessary for a business to work. So dropping $1000 on an NVidia graphics card, if things ran faster and there were no additional cost, would be a no-brainer.

There are additional costs, though.

And no, you can't just go back from faster to slower. Try running Ubuntu 20.04 on the 486; it won't go. Over time, code fills up the resources available. If I could take a 2x performance hit, it'd be fine. But GPUs are orders of magnitude faster.


Please, show us how to train Alexa or BERT on a 486. That'll definitely win you the Turing Award and the Gordon Bell Prize, and probably the Nobel Peace Prize for all those power savings!


Please show me a business (aside from Amazon, obviously) that needs Alexa.

Most businesses need a word processor, a spreadsheet, and some kind of database for managing employees and inventory. A 486 does that just fine.

Most businesses derive additional value from having more, but that's always an ROI calculation. ROI has two pieces: return, and investment. Basic business analytics (regressions, hard-coded rules, and similar) have high return on low investment. Successively complex models typically have exponentially-growing complexity in return for diminishing returns. At some point, there's a breakpoint, but that breakpoint varies for each business.
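
The kind of basic analytics I mean, sketched with hypothetical numbers (plain numpy, runs fine on modest hardware):

    import numpy as np

    # Made-up monthly ad spend vs. revenue; fit a line by least squares.
    spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    revenue = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    slope, intercept = np.polyfit(spend, revenue, 1)
    print(f"each extra $1 of spend ~ ${slope:.2f} of revenue")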

If the goal is to limit GPGPU to businesses whose core value-add is ML (the ones building things like Alexa), NVidia has done an amazing job. If the goal is to have GPGPU as common as x86, NVidia has failed.


> Please show me a business (aside from Amazon, obviously) that needs Alexa.

I'll bite.

Have you ever been getting a haircut, and the hair dresser had to stop to pick up the phone to make an appointment?

Have you ever gone to actually pick up a pizza at a small pizzeria and noticed that, of its 4 employees, 3 are making pizzas and one spends 99% of their time on the phone?

Every single business that you've ever used in your life would be better off with an Alexa that can handle the 99% most common user interactions.

In fact, even small pizzerias and hair salons nowadays are using third-party online booking systems with chat bots. Larger companies are able to turn a 200-person call center into a 20-person operation by just using an Alexa to at least identify customers and resolve the most common questions.


CuPy has experimental support for ROCm.


Awesome! I did not know that.



