I thought the title meant it was using AltiVec or SSE, but it's merely operating on chunks of 4 bytes at a time, dealing with misaligned data up front. Still a good article, despite my initial disappointment.
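For anyone who hasn't seen the pattern, here's a rough sketch in C of what "4 bytes at a time, misaligned data up front" usually looks like. This is not the article's code: `sum_bytes` is a made-up example that just adds up the bytes in a buffer, and I'm assuming a 4-byte word is the natural unit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example (not from the article): sum every byte in a buffer,
 * processing one 32-bit word per step once the pointer is aligned. */
uint32_t sum_bytes(const unsigned char *p, size_t n)
{
    uint32_t total = 0;

    /* Head: consume single bytes until p sits on a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)p & 3u) != 0) {
        total += *p++;
        n--;
    }

    /* Body: p is aligned here, so handle 4 bytes per iteration.
     * memcpy sidesteps strict-aliasing problems; compilers typically
     * turn it into a single aligned load. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, p, sizeof w);
        total += (w & 0xFFu) + ((w >> 8) & 0xFFu)
               + ((w >> 16) & 0xFFu) + (w >> 24);
        p += 4;
        n -= 4;
    }

    /* Tail: fewer than 4 bytes are left over. */
    while (n > 0) {
        total += *p++;
        n--;
    }

    return total;
}
```

The head loop is the "up front" part: it runs at most 3 times, and after it every word access in the main loop is aligned. Since the example adds all four bytes of each word, the result does not depend on byte order.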
A similar article, which originally taught me these tenets, is:
http://rentzsch.com/papers/straightenUpAndFlyRight