You're probably right. I read your 500ms as covering 1B iterations, which works out to ~0.25ns per call (at two calls per iteration), and that seemed a bit low.
However, based on your code, you ran it 100M times, not 1B (1e8 vs 1e9), which changes it to 2.5ns per call. I ran the code on my machine (similar to yours, a 2.66GHz MacBook Pro) with Clojure 1.2.0 and got just over 2500ms for 1e8 iterations, which is about 12.5ns per call.
For comparison, looping 1e8 times with two calls to empty functions in a static language takes ~639ms, which gives me ~3ns per call.
So, you can see why my first suspicion was that the JVM was doing something like inlining the methods and avoiding the call altogether. Considering the differences in our reported numbers, you may have a newer JVM than I do, and if it is beating simple CALL instructions, it must be inlining the calls or avoiding some of the looping.
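If you want to check the per-call figures quoted above, here is a rough sketch of the arithmetic in Python. The helper name `ns_per_call` is just something I made up for illustration, and it assumes two calls per loop iteration, which is what the numbers above imply.

```python
def ns_per_call(total_ms, iterations, calls_per_iteration=2):
    """Average nanoseconds per call for a timed loop.

    calls_per_iteration=2 is an assumption taken from the loop
    described above (two function calls per iteration).
    """
    total_ns = total_ms * 1_000_000  # 1 ms = 1e6 ns
    return total_ns / (iterations * calls_per_iteration)


# Timings quoted in this answer:
print(ns_per_call(500, 10**9))   # 500ms misread as 1e9 iterations -> 0.25
print(ns_per_call(500, 10**8))   # 500ms over the actual 1e8 iterations -> 2.5
print(ns_per_call(2500, 10**8))  # my Clojure 1.2.0 run -> 12.5
print(ns_per_call(639, 10**8))   # static-language loop, a little over 3
```

This only reproduces the division. It says nothing about whether the JVM is actually inlining the calls.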