
It's not just CUDA vs ROCm, ROCm has come a long way and is pretty compelling right now. What they lack is the proper hardware acceleration (e.g tensor cores).


ROCm has come a long way but still has a long way to go. It still doesn't support the 5700 XT (or at least, not very well) --- only the Radeon Instinct and Vega are supported. You can find salty GitHub threads about this dating back to when Navi was first released: https://github.com/RadeonOpenCompute/ROCm/issues/887#issueco...

And getting ROCm set up is still a buggy experience that requires a ton of fiddling with the deep inner workings of Linux, so it is nearly impossible for the average machine learning engineer to use.

It is compelling for certain bespoke projects like Europe's shiny new supercomputer, but for the vast majority of machine learning, it is totally unusable. By now, in the ML world, the word "GPU" is synonymous with "Nvidia".


Full disclosure: European here, and everyone on our team is fond of Linux, so we're not representative of your average ML engineer.

We actually had more issues with nvidia drivers messing up newcomers' machines during updates than with setting up AMD GPUs, but then again n is small (and AMD GPUs were for playing around rather than real work).

Still, a Titan Xp has CUDA support and plenty of memory, but it's better, IME, to upgrade to a model with less memory but a higher CUDA compute capability and access to tensor cores.
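To make the trade-off concrete: tensor cores first appeared with Volta (compute capability 7.0), and the Titan Xp is Pascal (compute capability 6.1), so it predates them. A minimal sketch of that cutoff (the helper name here is illustrative, not a real CUDA API):

```python
# Hypothetical helper: does a given NVIDIA compute capability include tensor cores?
# Tensor cores were introduced with Volta (compute capability 7.0); Pascal cards
# like the Titan Xp (6.1) do not have them.

def has_tensor_cores(major: int, minor: int = 0) -> bool:
    """Return True if the given CUDA compute capability includes tensor cores."""
    return (major, minor) >= (7, 0)

# Titan Xp (Pascal, compute capability 6.1): no tensor cores.
print(has_tensor_cores(6, 1))   # False
# Turing (e.g. RTX 20-series, compute capability 7.5): tensor cores available.
print(has_tensor_cores(7, 5))   # True
```

In PyTorch, `torch.cuda.get_device_capability()` returns this (major, minor) pair for the current device, which is one way to check what you actually have.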


Does ROCm even support Navi yet?

Given the amount invested in hardware development, the amount AMD has been investing in the software side has been shockingly small.



