x 2 (counting sparse as dense) = 624 tflops
x 8 GPUs = 5 "pflops"
The missing 8x you are looking for is just because tensor core math is much faster than the normal FMA path.
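A quick sanity check of that arithmetic, as a rough sketch. The per-GPU dense figure is not stated above; it is inferred here as 624 / 2, and the variable names are just for illustration.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumption: the dense per-GPU number is inferred as 624 / 2; the text does not state it.
SPARSE_TFLOPS_PER_GPU = 624  # from the text, counting sparse as dense
SPARSITY_FACTOR = 2          # the "x 2" step
GPUS = 8                     # the "x 8 GPUs" step

dense_tflops_per_gpu = SPARSE_TFLOPS_PER_GPU / SPARSITY_FACTOR
total_pflops = SPARSE_TFLOPS_PER_GPU * GPUS / 1000  # 1 pflops = 1000 tflops

print(dense_tflops_per_gpu)  # 312.0
print(total_pflops)          # 4.992, which rounds to the 5 "pflops" headline
```

So the headline number is just under 5, and half of it comes from counting sparse throughput as dense.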