The fallacy here is assuming that a neural network at the scale Tesla uses is sufficient to capture the problem of driving, given enough training. There is no guarantee that a reasonably priced neural network can encompass the task of driving.
Having training data beyond a certain point is overrated, and Tesla's advantage in gathering it is overstated. Other companies are capturing this data as well. Is there any indication that the data Tesla is collecting is of a higher value, or is it just more bytes?
It seems as if the people gobbling up the "Tesla has the data! Autopilot will keep getting better!" line have never trained a neural network in their lives. Models converge. Loss stops decreasing, regardless of more incoming data. Extreme manual data cleaning effort becomes required to prevent overfitting. Model architecture has to change and hyperparameters have to be tweaked. Then you're back at square one as far as testing goes if you change any of those things.
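To make the "loss stops decreasing" point concrete, here is a toy sketch, not anything measured on a real model. It assumes a fixed architecture and a fixed data distribution, and models loss as an irreducible floor plus a power-law term in dataset size. The function names and every constant (`floor`, `scale`, `exponent`) are made up for illustration.

```python
def loss_at(n_samples, floor=0.10, scale=5.0, exponent=0.5):
    """Hypothetical learning curve: irreducible floor plus a power-law term.

    All constants are illustrative assumptions, not fitted to any real model.
    """
    return floor + scale * n_samples ** (-exponent)


def gains_per_decade(start_exp=3, end_exp=9):
    """Loss improvement bought by each successive 10x increase in data."""
    return [
        loss_at(10 ** e) - loss_at(10 ** (e + 1))
        for e in range(start_exp, end_exp)
    ]


if __name__ == "__main__":
    for e, gain in zip(range(3, 9), gains_per_decade()):
        print(f"10^{e} -> 10^{e + 1} samples: loss improves by {gain:.6f}")
```

Under these assumptions every extra 10x of data buys a smaller improvement than the last, and the loss never drops below the floor. Whether a real driving model behaves this way depends on whether the extra data is more of the same or genuinely new.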
The notion that Tesla's model HAS to keep improving simply because they will be able to pile on more (unlabeled!) data is laughably false. And, in fact, quite insulting to the intelligence of even the most casual ML engineers.
> And, in fact, quite insulting to the intelligence of even the most casual ML engineers.
Exactly, casual ML engineers. The issue of plateauing tends to occur because there is no more novelty to be had in the data. What mega-experiments like GPT and similar have shown us is that actually you can keep adding novel data and keep improving the model. Kinda inelegant, yet effective. The problem is, most institutions can't add more novelty beyond a certain scale, since that usually means shoveling more money at data storage and compute, on top of the novelty collection.
Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.
> Tesla merely has to open the money tap to get more of both compute and storage, and let the real-time data flow in.
And if you watch the other parts of the presentation, you'll see the bits about them buying clusters with 5k+ A100 GPUs. Presumably they intend to do something with those. Probably not streaming Fortnite concerts.
I would agree if their increase in data was linear, but it is increasing by orders of magnitude, which should have qualitative consequences for what they're able to accomplish as they claw their way through 9s. I don't see how it's possible to get progressively more 9s without scaling in both data and compute.
The point of the higher scale isn't just more data; it also makes it easier to solve the unbalanced data problem, because rarer and rarer scenarios will appear in large enough numbers to work with.
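The arithmetic behind the rare-scenario argument is simple enough to sketch. The fleet mileages and the "one event per million miles" rate below are hypothetical placeholders, not Tesla figures, and the function names are invented for this example.

```python
def expected_examples(fleet_miles, miles_per_event):
    """Expected number of times a scenario shows up in the collected data."""
    return fleet_miles / miles_per_event


def miles_needed(min_examples, miles_per_event):
    """Fleet miles required to expect at least `min_examples` occurrences."""
    return min_examples * miles_per_event


if __name__ == "__main__":
    miles_per_event = 10 ** 6  # assumed: scenario occurs once per million miles
    for fleet_miles in (10 ** 6, 10 ** 7, 10 ** 8, 10 ** 9):
        count = expected_examples(fleet_miles, miles_per_event)
        print(f"{fleet_miles:>13,} miles -> ~{count:,.0f} examples")
```

With these made-up numbers, a scenario that occurs once per million miles yields about one example at a million fleet miles and about a thousand at a billion, which is the difference between an anecdote and a usable training subset. It says nothing about whether those examples are labeled or how useful each one is.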
You make it sound extremely manual and sequential when reality is anything but.
A team with funds like Tesla, Google, or FAIR is going to be using neural architecture search (NAS) and have a continuous testing pipeline. Tesla has arguably the best environment for continuous testing, which is the most difficult part of improving a model. Andrej even said in his talk that their supercomputer is in the top 5 for FLOPs.
SOTA on ImageNet for the past few years has been driven by pre-training on massive datasets. Vision transformers are increasingly common and are extremely data-hungry.
I'd say the data that Tesla collects is of lower value, because it doesn't have sensor info from a different modality. Other companies are getting a good reference to ground truth for both camera to lidar and lidar to camera. I don't know how much more valuable accurate distance sensing over a 3D field is compared to not having it, but I do know it's more valuable.
It may be valuable enough to require a few petaflops less computing power.