Condition-code bits, as in ARM and x86, force serialization of arithmetic because every instruction carries a read-write dependency on those bits. There are tricks around it, but the bits still need tracking. For wide superscalar or out-of-order processors this gets annoying.
Yes, the old, old way of having a single condition code register or the like (which dates back 40+ years) doesn't work well these days.
I like the Mill CPU approach, where every "register" (it doesn't actually have named registers) carries a full set of status bits, and not just for overflow. These include things like "not a result" (NaR), which can mark the result of a failed speculative load (because the process doesn't have permission to read that page, say).
> I thought that compilers couldn't really use this effectively…
The status bits part in general, or the speculative load stuff?
They allegedly have all of this working, privately. They haven't released any development tools or the like to the public.
I've often toyed with the idea of writing an instruction-level simulator (as opposed to the RTL sim or whatever they have internally). But even sticking to the public information, I'd likely be infringing on their patents.
No. Control bits (status bits, flags, ...) get renamed just as registers get renamed.
Basically, if there's a bottleneck in x86 code, Intel has run into it, profiled it, and generally optimized around it, both in their microarchitectures and in their C compiler.
That's one of the tricks. But it doesn't solve the problem of flag clobbers, which is why Intel ended up introducing new variants of ADD and MUL (ADCX/ADOX, which each touch only a single flag, and MULX, which writes none). Named predicate registers make it all much easier for everyone.
I think what you’re saying is basically true, but it’s a trade against code density.
If overflow checks were rare, that would be a very good point indeed. The key thing is just how frequent this stuff is in modern languages.