The part you're missing is the scale: 15,000 tests, run against a new commit every 9 minutes. Over 8 working hours that's roughly 50 commit-test cycles, so about 750,000 test executions in a single day...
Edit: of course that assumes commits arrive at peak at least as fast as the commit-test cycle can process them. The point is that even a very low failure rate in the testing mechanism itself could show up as a blocked commit-test-deploy cycle at least once a day, hence the importance placed on rock-solid testing systems that only ever fail when the tested code itself fails.
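For anyone who wants to check the arithmetic, here's a rough sketch in Python. The inputs are just the back-of-envelope figures from this comment (15,000 tests, one cycle every 9 minutes, an 8-hour day), not measured data, and the variable names are made up for illustration.

```python
# Back-of-envelope figures from the comment above; assumptions, not measurements.
TESTS_PER_RUN = 15_000
MINUTES_PER_CYCLE = 9
WORKING_HOURS = 8

# Whole commit-test cycles that fit into one working day.
cycles_per_day = (WORKING_HOURS * 60) // MINUTES_PER_CYCLE

# Every cycle runs the full suite once.
test_executions_per_day = TESTS_PER_RUN * cycles_per_day

print(cycles_per_day)           # 53
print(test_executions_per_day)  # 795000
```

Rounding 53 cycles down to "roughly 50" gives the 750,000 figure quoted above.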
Empirically, we average 70 builds a day. The number is higher than your calculation because we don't all work 9-5; we're committing frequently from around 8am to 9pm. We also run builds repeatedly overnight to flush out any intermittently failing tests we may have recently introduced. From 2am to 4am we run the builds as fast as they can go.
So how often does a commit get checked in that causes a test (or tests) to fail?
It just seemed to me like you were bragging that tests get run over and over again. They only need to get run if any new code is committed, of course.
And what kind of commit is being checked in every 9 minutes? How big is the dev team? Seems like an awful lot of commits. Is each one a full-fledged feature / bug fix for the site, or are many 1-line changes to the code?
As I read it, the issue he's talking about is that it's easy to accidentally write a test that passes the first 100,000 times you run it, but then fails the next time because of a timeout that was set too low or something like that. A test like that can waste a lot of your time tracking down a nonexistent bug.
It's true that any particular test that spuriously fails one in a million times may never fail. But if you have tens of thousands of tests, and you do tens of test runs per day, you'll have a test spuriously fail once a day or so.
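Here's a quick sketch of that estimate in Python, plugging in numbers mentioned elsewhere in this thread (15,000 tests, 70 builds a day). It assumes every test execution fails spuriously with the same independent one-in-a-million probability, which is a simplification, and the function name is just something I made up.

```python
def spurious_failure_stats(tests_per_run, runs_per_day, p_flaky):
    """Return (expected spurious failures per day, P(at least one per day)).

    Assumes each test execution fails spuriously with the same
    independent probability p_flaky (a simplifying assumption).
    """
    executions = tests_per_run * runs_per_day
    expected = executions * p_flaky
    # Probability that not every execution passes cleanly.
    p_at_least_one = 1 - (1 - p_flaky) ** executions
    return expected, p_at_least_one


expected, p_any = spurious_failure_stats(15_000, 70, 1e-6)
print(f"{expected:.2f} expected spurious failures per day")
print(f"{p_any:.0%} chance of at least one per day")
```

With those inputs the expected count comes out to about 1.05 spurious failures a day, and the chance of seeing at least one on any given day is roughly 65%.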
Let's say your team makes 25 commits per day.
25 * ~300 working days = 7,500 commits per year
At that rate it would take 133+ years (1,000,000 / 7,500) to reach a million commits, i.e. to hit a 1-in-a-million failure.
The more interesting metric to me is how often the build gets broken.