The fallacy here is that the scale of the neural network used by Tesla is suffic...

aimkey · on June 22, 2021

It seems as if the people gobbling up the "Tesla has the data! Autopilot will keep getting better!" line have never trained a neural network in their life. Models converge. Loss stops decreasing, no matter how much more data comes in. Preventing overfitting then takes extensive manual data cleaning. The model architecture has to change and hyperparameters have to be tweaked. And if you change any of those things, you're back at square one as far as testing goes.

The notion that Tesla's model HAS to keep improving simply because they will be able to pile on more (unlabeled!) data is laughably false. And, in fact, quite insulting to the intelligence of even the most casual ML engineers.

kortex · on June 22, 2021

> And, in fact, quite insulting to the intelligence of even the most casual ML engineers.

Exactly, casual ML engineers. The issue of plateauing tends to occur because there is no more novelty to be had in the data. What mega-experiments like GPT and similar have shown us is that actually you can keep adding novel data and keep improving the model. Kinda inelegant, yet effective. The problem is, most institutions can't add more novelty beyond a certain scale, since that usually means shoveling more money at data storage and compute, on top of the novelty collection.

Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.

gwern · on June 22, 2021

> Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.

And if you watch the other parts of the presentation, you'll see the bits about them buying clusters with 5k+ A100 GPUs. Presumably they intend to do something with those. Probably not streaming Fortnite concerts.

tokipin · on June 22, 2021

I would agree if their increase in data was linear, but it is increasing by orders of magnitude, which should have qualitative consequences for what they're able to accomplish as they claw their way through 9s. I don't see how it's possible to get progressively more 9s without scaling in both data and compute.

The point of the higher scale isn't just more data, it also makes it easier to solve the unbalanced data problem, because rarer and rarer scenarios will appear in large enough numbers to work with.
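To put rough numbers on the rare-scenario point, here is a minimal sketch of the arithmetic. The scenario frequency ("one in a million clips") and the number of examples needed are made-up illustrative values, not figures from Tesla or the talk, and the function names are hypothetical.

```python
def expected_examples(num_clips: int, one_in: int) -> float:
    """Expected number of clips showing a scenario that occurs once per `one_in` clips."""
    return num_clips / one_in


def clips_needed(min_examples: int, one_in: int) -> int:
    """Clips to collect so the scenario is expected to appear `min_examples` times."""
    return min_examples * one_in


# Assumed values, for illustration only.
ONE_IN = 10**6        # the scenario shows up in one of every million clips
MIN_EXAMPLES = 1000   # examples we pretend are needed to train on it

for num_clips in (10**6, 10**8, 10**10):
    count = expected_examples(num_clips, ONE_IN)
    status = "enough" if count >= MIN_EXAMPLES else "too few"
    print(f"{num_clips:>14,} clips -> about {count:,.0f} examples ({status})")

print(f"Clips needed for {MIN_EXAMPLES} examples: {clips_needed(MIN_EXAMPLES, ONE_IN):,}")
```

Under these assumptions, each tenfold increase in collected data gives ten times as many examples of the rare case, so a scenario that is unusable at one scale becomes workable a few orders of magnitude later.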

nickik · on June 22, 2021

> The notion that Tesla's model HAS to keep improving simply because they will be able to pile on more (unlabeled!) data

That's the exact opposite of what they are doing.

It seems like you didn't watch the talk at all.

trhway · on June 22, 2021

> to pile on more (unlabeled!) data

Given the nature of that data, you can get a lot of unsupervised mileage, so to speak, out of it.

sumnuyungi · on June 22, 2021

You make it sound extremely manual and sequential when reality is anything but.

A team with funding like Tesla, Google, or FAIR is going to be using neural architecture search (NAS) and have a continuous testing pipeline. Tesla arguably has the best environment for continuous testing, which is the most difficult part of improving a model. Andrej even said in his talk that their supercomputer is in the top 5 for FLOPs.

SOTA on ImageNet for the past few years has been driven by pre-training on massive datasets. Vision transformers are increasingly common and are extremely data-hungry.

daveguy · on June 22, 2021

I'd say the data that Tesla collects is of lower value, because it doesn't have sensor info from a different modality. Other companies are getting a good reference to ground truth for both camera-to-lidar and lidar-to-camera. I don't know how much more valuable accurate distance sensing over a 3D field is compared to not having it, but I do know it's more valuable.

It may be valuable enough to require a few petaflops less computing power.