But why that whole mess with function pointers? Where do they have the key advan...

asveikau · on Aug 1, 2013

Let's look at the signature of qsort() from libc:

   void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *));

qsort is a function inside of libc.so. It's already been compiled. It doesn't know about the "compar" pointer you're going to pass it. That function might not even exist yet.

qsort doesn't know if you're sorting integers, sorting strings, sorting struct foo which is still being ironed out. It knows how to sort arrays. You give it the size of each element, the number of elements, and a function pointer that it will call to compare elements. It will call you back and you can stick your data-structure-specific knowledge into that function.

(This turns out to be a lousy way to sort arrays, by the way. If you used a C++ template instead, the comparison function could be inlined directly in the sort algorithm, which gives you better object code. But you lose the possibility that the caller and callee exist in different modules, for example.)

exDM69 · on Aug 1, 2013

> qsort is a function inside of libc.so. It's already been compiled.

Unfortunately, this is also why it is so damn slow. Try comparing C++ std::sort with C qsort, the performance gap is HUGE. The reason is that the "function pointer" std::sort gets inlined but qsort will actually invoke a function call via function pointer.

If you'd move qsort to a header file as an inline function, the performance problem would go away.

asveikau · on Aug 1, 2013

This is one thing I don't understand about HN sometimes. I always get replies that seen to angrily "disagree". But I've already made your point in my parenthetical remark.

There are of course other times where the function pointer overhead doesn't matter as much and the modularity of not having things as templates is desirable. Sorting is just a bad example.

Radle · on Aug 1, 2013

thanks, that helped me out <3

derefr · on Aug 1, 2013

Function pointers are useful anywhere you don't know what function you're going to call until runtime (and yet more useful still when the number of functions you could be calling is unlimited or extensible.)

Sure, you can get away with, say, implementing a parser with a big switch statement that calls various parse_this() or parse_that() functions, depending on what kind of token you hit. On the other hand, you could instead keep a hash table with token-type keys and function-pointer values. Then instead of O(n) cmp/jnz operations (as a switch statement boils down to), you get O(1) for the hash function, and no processor branch-prediction penalty.

Where it really gains advantage is callbacks: where you write a function for a library (low-level code; mechanism) which can call a user-provided function (high-level code; policy) to do something. The low-level isn't supposed to be aware of the high-level; you shouldn't have to recompile the library every time you have a new callback to pass to it. So function pointers are ideal for this. You call the function, pass it the address of your function, and then it jumps to it to get some answer whenever it needs it.

Think on this for a little bit, and you might be able to invent the traditional higher-order functions: things like map, filter, and fold, which let you do things (adding one to each element; making a new list with just the elements that are even; summing the elements together) by just specifying the policy of what you want to do with each item, without worrying about the mechanism of how it's iterating through the list, keeping temporary values, etc.

Thinking further, you might realize that if you can read a file into executable memory at runtime using some OS function, then you could put native machine code in that file, along with a table specifying function names and the offsets where those functions start in the file. You could read this table in, add the pointer representing where the file starts in memory, to the offset from the table representing where the machine-code is in the file... and bam, you've got a function pointer! Now you can call it. That's a plugin system! (These files containing symbol tables and machine code are called "Shared Object-code" or ".so" files--or Dynamically-Linked Libraries (".dll"s) in Windows--and the function to load them is dlopen(). You don't have to do the math yourself; it's abstracted away by the mechanism--but that's what's really going on there. It's just a plain-old file, mapped into memory, and then you get an address into that memory and call it.)

I would say that the ultimate use of function pointers, though, is a tracing JIT (the JIT being short for "Just In Time".)

In some languages (themselves implemented on top of C), the language's compiler is built into the language's runtime; you can take code specified in source form, or as bytecode, and pass it as a regular old string to a function, which will turn it into native machine code, load it into (an executable range of) memory, and then return you a function pointer to it.

Given this, the language can be based around an interpreter that normally just reads source- or byte-code directly. Interpreters are great, for the most part--they're easy to program; shorter and cleaner than compilers in source form; they frequently fit entirely in processor-cache in compiled form, whereas globs of native code don't; and they don't tend to branch much. If you've never implemented one, they tend to just be a short little "read instruction, parse it into an op and arguments, look up what function that op translates to (via the hash-table-of-function-pointers method above), pass that function the arguments, repeat" loop.

The only thing is, if you have to do that over and over for the same 1000 instructions, the "read, interpret, jump, return" loop does add overhead--especially since each of these operations has to execute separately and serially, where native instructions might get optimized together into a single clever instruction like x86's "lea" (fused multiply-add followed by load-from-memory, all in a single CPU cycle!)

So, if you want that performance boost for your tight inner-loop code, you could write a separately-compiled module in a lower-level language, and hook it into your code through the C ABI (as above--plugin system = dlopen = more function pointers!) But that's a big jump in complexity. Instead, if you just hook a JIT to your interpreter, the result is only slightly more complex than a plain interpreter.

A JITing interpreter does the same loop, but it also keeps statistics on how often it executes each piece of your code. When it notices that some piece of code is getting interpreted over and over a lot, it'll pass it into the JIT; get back a function pointer to real, native, optimized code; and then patch its in-memory representation of the code you fed it, so instead of saying "A, B, [C, D, E], F", where [C, D, E] is the block that's getting executed all the time, it'll say "A, B, jump to [this native address] and run what's there, F".

That single instruction will transfer control from the interpreter to the code compiled by the JIT, through the function pointer. When the compiled function is done, control comes back, and the interpreter steps to the next instruction like normal. Still easy, still simple, but way faster where and when you need it. Neat, huh?

Radle · on Aug 1, 2013

With the help of half an litre Monster Energy, I even managed to understand most of what you explained! (I am an no skilled script-kiddo, that's trying to increase...)

Good explanation and thanks for the work done, that shall help me a lot later on!

modarts · on Aug 1, 2013

This is the kind of stuff I come to HN for. Thank you for helping me finally realize what .dll's really are.

numeromancer · on Aug 1, 2013

This reminds me of how ECL (Embedded Common Lisp) loads lisp modules; it translates them to C, compiles them to PIC with a C compiler, and dynamically loads the resulting .o file.

fnayr · on Aug 1, 2013

Wow, really helpful to give some motivation for importance.

Radle · on Aug 1, 2013

thanks!

gregfjohnson · on Aug 1, 2013

One great example of function pointers is in C++ when implementing inheritance. Say you have an array of Polygons, and each one has an Area() method. It turns out that some of them are Triangles, some are Squares, and some are Hexagons. polygon[i]->Area() calls a different function for each of these, and this is implemented by storing function pointers in "vtables" for each of the Polygon objects. In C++ this happens behind the scenes (you don't program this directly), but in C (or C++ for that matter if you needed to), you can get a very similar effect by storing function pointers in structs. The linux kernel makes extensive use of this programming style in driver code.

exDM69 · on Aug 1, 2013

You use a function pointer when you don't know at compile time what function you are going to call at runtime. An example is plugins: you load a function from a dll (or shared object) and you get a function pointer.

When dealing with native code, functions are just piles of instructions in memory (with executable bit set in memory protection) that comes from the "text" section of your binary file. The code is stored in memory and therefore has an address, through which they can be reached.

smtddr · on Aug 1, 2013

If you ever seen some of the C code when hacking the Sony PSP slims for the first time; particulary Davee & Bubbletune with "ChickHEN" stuff, knowing how function pointers work is required[1].

1. http://forums.exophase.com/threads/tiff-exploit-hen-informat...

Disclaimer(?): The TRFyuki here, is me. :)

dllthomas · on Aug 1, 2013

If you need to determine which function to call dynamically, in an extensible way. Take a signal handler or other callback. There are more sophisticated uses, but that seems a clear and straightforward example.