In contrast, GNU grep uses libc’s memchr, which is standard C code with no explicit use of SIMD instructions. However, that C code will be autovectorized to use SIMD instructions on xmm registers, which are half the size of ymm registers.
Drats, you're totally right. It's easy to mess up that kind of thing.
Thankfully, it looks like my analysis remains mostly unchanged. I don't see any AVX2 in there (and indeed, I didn't when I looked at the profile either, in contrast to Go's implementation).
I updated the blog, thanks again for the clarification.
I don't think this is correct. glibc has architecture-specific, hand-rolled (or unrolled, if you will, lol) assembly for x64 memchr. See here: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86...
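For anyone wondering what an xmm-based byte search looks like in practice, here is a rough sketch in C using SSE2 intrinsics. To be clear, this is not glibc's code (that is hand-written assembly and considerably more careful about alignment and unrolling); `sketch_memchr` is a made-up name, and `__builtin_ctz` assumes GCC or Clang. It only illustrates the idea: compare 16 bytes at a time in a 128-bit xmm register, then fall back to a byte loop for the tail. An AVX2 version would do the same with 32-byte ymm registers.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics (128-bit xmm registers) */
#endif

/* Hypothetical illustration only; not glibc's implementation. */
const void *sketch_memchr(const void *haystack, int needle, size_t len)
{
    const unsigned char *p = (const unsigned char *)haystack;
    const unsigned char c = (unsigned char)needle;
    size_t i = 0;

#if defined(__SSE2__)
    /* Broadcast the needle byte into all 16 lanes of an xmm register. */
    const __m128i pattern = _mm_set1_epi8((char)c);

    for (; i + 16 <= len; i += 16) {
        /* Unaligned 16-byte load, then a lane-wise equality compare. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
        /* One bit per lane: bit k is set if byte k matched. */
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, pattern));
        if (mask != 0) {
            /* Lowest set bit is the first match (GCC/Clang builtin). */
            return p + i + (size_t)__builtin_ctz((unsigned)mask);
        }
    }
#endif

    /* Scalar tail; also the whole search when SSE2 is unavailable. */
    for (; i < len; i++) {
        if (p[i] == c) {
            return p + i;
        }
    }
    return NULL;
}
```

Calling `sketch_memchr(buf, 'x', n)` should behave like `memchr(buf, 'x', n)`: a pointer to the first match, or `NULL` if there is none.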