Intrinsics are better than direct assembly because GCC can simplify, combine, and reorder instructions (e.g., lift them out of loops). GCC handles register allocation, etc. Intrinsics dramatically simplify the programmer's job
The GCC codegen is fantastic for NN-512, overall
GCC 8.3 and earlier make a few mistakes, like compiling FNMADD as xor-negation followed by FMADD (using an extra register for the xor-negation constant), but those problems have been fixed in GCC 9.1 and above
The only codegen mistake I see in GCC 10 is when I load 512 bits from memory, convert the low 256 bits from packed-half to packed-single, and do the same thing for the high 256 bits. GCC sometimes reloads the 512 bits from memory (despite still having those bits in a register). It doesn't harm performance much, but it seems dumb. Not sure why GCC does this
GCC can reduce the liveness range of an in-register value by moving the producing instruction and consuming instruction closer together. This can be a big help if you're writing code that just barely fits in the register file. For example, NN-512 produces many loops that use almost all of the 32 ZMM vector registers. GCC generally does a good job avoiding spills, if the programmer doesn't make the job too hard
In my opinion, properly written C intrinsics produce very good AVX-512 machine code, much more easily than if I wrote the assembly by hand. You can write much larger, more complex, fully vectorized programs when GCC helps you
> The problem is that intrinsics are so unreliable that you have to manually check the result on every platform and every compiler you expect your code to be run on, and then tweak the intrinsics until you get a reasonable result. That's more work than just writing the assembly by hand.
Well, that sounds very bad. Have things improved since this article was written? Are intrinsics best avoided?
> Have things improved since this article was written?
Yes.
> Are intrinsics best avoided?
No.
---
If you are writing assembly code by hand, you probably care about the quality of the generated code.
Intrinsics are lower effort, portable, and can and often do generate much better code than using inline assembly.
I disagree with the claim that verifying the assembly output of intrinsics is more work than writing the assembly by hand. In my experience, it is significantly less work.
I also disagree with the post about what to do if an intrinsic doesn't generate good code.
"Tweak the intrinsic until you get a reasonable result" is probably the worst thing you can do, because once the intrinsic is fixed, your tweaks might prevent it from generating good code.
The two things you can and should do are:
- report the bug, so that it gets fixed (1-2 days for clang... fixing an intrinsic is just adding a new "case" in a pattern-matching table, writing down which instructions it should lower to... worst case you can fix it yourself if you know what it should lower to),
- if you can't live with worse code till the next compiler release, use inline assembly.
This second point is super rare. It's just not worth it. Intrinsics are usually fixed in a couple of days if you report the bug, and the fixed intrinsic will be in the next compiler release in a couple of months if you can't use the nightlies. So unless you really need this now, it is often not worth doing.
> Intrinsics are usually fixed in a couple of days
This is only helpful if everyone who builds the software is willing and able to use a bleeding edge version of a particular compiler. By comparison, using inline assembly will fix the problem for everyone, immediately, usually without anyone needing to use a different compiler or compiler version.
I don't think we disagree here. The only bad thing is trying to bend broken intrinsics into half-way doing what you want and complaining that it is too much work.
Don't do that.
---
Also, keep in mind that with inline assembly you often need one implementation per architecture and per compiler, since compilers have slightly different syntax (it's non-standard). I've seen multiple implementations even for the same compiler, depending on the version...
Another perspective is of course that of the embedded developer, a camp I count myself among.
In embedded software, it's not uncommon to have exactly one target for the software (commonly called "the target"). Sometimes the target changes due to components being end-of-lifed or so, but it's rare and slow.
In those situations, I have found intrinsics to be very helpful since they allow you to reason and talk about the software at a higher level (C is, after all, higher than assembly) and without making sure all developers on a team understand the inline assembly syntax. :)
It is still good practice to check the resulting code, especially since, if you're using intrinsics, chances are you're thinking more or less in assembly anyway; but you can do that once and be fairly sure you're getting the desired result.
The resulting code is of course also more portable, which can be helpful when you want to e.g. automate tests of code without external hardware dependencies such as data structures, utility functions, and so on.
He's right about assembly vs. inline assembly (GCC asm). But something well done like Intel Intrinsics [1] specializes an intrinsic for target platforms. It's (a lot) more work on the intrinsic writer's part, but it then provides something of a cross-platform abstraction for the programmer.