> Yes, but this just swaps the terminating conditional branch for an indirect branch at the start, no?
Depends how you write it. The number of iterations is ceil(log2(n)). GCC's __builtin_clz essentially computes this, and it has support on most major architectures.
Yes, you can calculate the number of levels, but with a dynamic size that isn't going to let you avoid a branch. Say you calculate there are 10 levels in the search. What do you do now?
Do you consider a jump a branch? My impression was always the point of avoiding branches is so you don't have mispredictions, but that shouldn't be a problem for unconditional branches.
Perhaps I've been a bit loose in my language (and not everyone is consistent with branch vs jump). I really should have said something like "non-constant jump". That is, in my theorem (haha) above, I mean you cannot have non-constant work without at least one non-constant jump.
A non-constant jump is anything that can jump to at least 2 different locations in a data-dependent way. On x86, those would be something like [2]:
- conditional jumps (jcc) aka branches
- indirect calls
- indirect jumps
Basically anything that either goes one of 2 ways based on a condition, or an indirect jump/call that can go to any of N locations.
Without one of those, you will execute the same series of instructions every time, so you can't do variable work [1].
So when you say "unconditional branches" I'm not sure if you are talking about direct branches, which always go to the same place (these are usually called jumps), or indirect branches, which (on x86) don't have an associated condition but can go anywhere since their target is taken from a register or memory location.
If you are talking about the former, I don't think you can implement your proposed strategy: you can't jump to the correct switch case with a fixed, direct branch. If you are talking about the latter (which I had assumed), you can – but it is subject to prediction and mispredicts in basically the same way as a conditional branch.
---
[1] Here, "work" has a very specific meaning: instructions executed. There are meaningful ways you can do less work while executing the same instructions: e.g., you might have many fewer cache misses for smaller inputs. However, at some point the instructions executed will dominate.
[2] There are more obscure ways to get non-constant work, such as self-modifying code, but these cover 99% of the practical cases.
Depends how you write it. The number of iterations is ceil(log2(n)). GCC's __builtin_clz essentially computes this, and it has support on most major architectures.