those registers should be used before going into the r8~r15, since the "extended" registers require a prefix byte to the instruction every time they're used.
It's hard to argue against "shorter", but since the modern processors this code targets all have a decoded instruction micro-op cache, there is no front-end bottleneck and the instruction length doesn't actually slow anything down.
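To put numbers on the size difference being discussed, here is a rough sketch that compares a few hand-assembled encodings. The byte strings are my own encodings worked out from the x86-64 opcode tables, not output from the code in question, and `has_rex` is just a name made up for this illustration.

```python
# Hand-assembled x86-64 encodings (an assumption worked out from the opcode
# tables, not taken from the code under discussion).
ENCODINGS = {
    "mov eax, ebx": bytes([0x89, 0xD8]),        # no prefix needed
    "mov eax, r8d": bytes([0x44, 0x89, 0xC0]),  # REX.R selects r8d
    "mov rax, rbx": bytes([0x48, 0x89, 0xD8]),  # REX.W for 64-bit operands
    "mov rax, r8":  bytes([0x4C, 0x89, 0xC0]),  # REX.W + REX.R in one byte
}

def has_rex(code: bytes) -> bool:
    """True if the first byte is a REX prefix (0x40-0x4F in 64-bit mode)."""
    return 0x40 <= code[0] <= 0x4F

for asm, code in ENCODINGS.items():
    print(f"{asm:14s} {code.hex(' '):10s} {len(code)} bytes, REX={has_rex(code)}")
```

If these encodings are right, the extra byte only shows up for 32-bit operands; a 64-bit operand already carries a REX prefix, so using r8 there costs nothing extra in size.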
Also, the four registers you mention are all callee-save on Windows (just RBX and RBP for Mac and Linux), so avoiding them means you don't have to save and restore them. Given a choice, use a scratch register. That said, R12-R15 are callee-save too, so this may not apply here.
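For reference, the callee-save sets can be written out as a small lookup. This is a sketch from my understanding of the Windows x64 and System V AMD64 conventions, limited to general-purpose registers (RSP is preserved in both but left out), and the names `CALLEE_SAVED`, `LEGACY` and `scratch_extended` are invented for the example.

```python
# Callee-saved general-purpose registers per ABI (assumed from the published
# calling conventions; RSP and the vector registers are omitted).
CALLEE_SAVED = {
    "windows_x64": {"rbx", "rbp", "rsi", "rdi", "r12", "r13", "r14", "r15"},
    "sysv_amd64":  {"rbx", "rbp", "r12", "r13", "r14", "r15"},
}

# The four legacy registers the parent comment is talking about.
LEGACY = {"rbx", "rbp", "rsi", "rdi"}

def scratch_extended(abi: str) -> list:
    """Extended registers (r8-r15) that the given ABI treats as scratch."""
    return [f"r{n}" for n in range(8, 16) if f"r{n}" not in CALLEE_SAVED[abi]]

for abi, saved in CALLEE_SAVED.items():
    print(abi, "callee-saved legacy:", sorted(LEGACY & saved))
    print(abi, "scratch extended:   ", scratch_extended(abi))
```

Under these assumptions only r8-r11 are free to clobber on either platform.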
Personally, I dislike the messiness of the historical names when using them for non-historical purposes. I think if it were more acceptable to alias them to R1, R2, etc., I'd prefer them more than I do.
but since the modern processors this code targets all have a decoded instruction micro-op cache, there is no front-end bottleneck and the instruction length doesn't actually slow anything down.
Right, this code is running entirely in the core and that's part of what makes it so fast.
Also, the four registers you mention are all callee-save on Windows (just RBX and RBP for Mac and Linux), so avoiding them means you don't have to save and restore them.
Callee-save and caller-save are just a calling convention, and one of the advantages of using Asm is that you don't have to care about calling conventions unless you're interfacing with some other language, which isn't the case here: it's pure Asm. There's not even a single function call in it.
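Besides, saving and restoring those registers (if you really need to) only takes 2 bytes each: a one-byte push and a one-byte pop, in 64-bit and 32-bit mode alike. (It's 4 bytes for R8-R15, since both instructions then need a REX prefix.) This tradeoff pays off if you're going to use them in more than 2 instructions.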
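A quick way to sanity-check the save/restore cost is to tally the push and pop encodings directly. The bytes below are hand-assembled by me from the opcode tables (an assumption, not something taken from the code being discussed), with RBX standing in for the legacy registers and R12 for the extended ones; `save_restore_cost` is a made-up helper.

```python
# (push, pop) encodings in 64-bit mode, hand-assembled from the opcode tables.
SAVE_RESTORE = {
    "rbx": (bytes([0x53]), bytes([0x5B])),              # push rbx / pop rbx
    "r12": (bytes([0x41, 0x54]), bytes([0x41, 0x5C])),  # REX.B + push/pop
}

def save_restore_cost(reg: str) -> int:
    """Total bytes spent on one push plus one pop of the given register."""
    push, pop = SAVE_RESTORE[reg]
    return len(push) + len(pop)

for reg in SAVE_RESTORE:
    print(f"push/pop {reg}: {save_restore_cost(reg)} bytes")
```

By this count a legacy register costs one byte per instruction to save and restore, and an extended one costs two.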