hash = FNV_offset_basis
for each byte_of_data to be hashed
{
hash = hash XOR byte_of_data
hash = hash × FNV_prime
}
return hash
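The loop above can be sketched in a few lines of Python. Note that the XOR-then-multiply ordering shown is the FNV-1a variant; the constants below are the published 64-bit FNV offset basis and FNV prime, and the mask emulates wrapping 64-bit multiplication:

```python
# 64-bit FNV-1a, matching the XOR-then-multiply order in the pseudocode.
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # published 64-bit offset basis
FNV_PRIME = 0x100000001B3              # published 64-bit FNV prime
MASK64 = (1 << 64) - 1                 # emulate 64-bit wrapping arithmetic

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                      # hash = hash XOR byte_of_data
        h = (h * FNV_PRIME) & MASK64   # hash = hash x FNV_prime (wrapping)
    return h
```

Each iteration depends on the previous one: the multiply feeds the next XOR, so the loop body forms one long dependency chain.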
The pseudocode for seahash looks like this (with '×' as the wrapping multiplication operator, and some simplification of the padding when the data length in bytes is not a multiple of 32 — 8 bytes per word × 4 words in the hash state):
hash = {offset_1, offset_2, offset_3, offset_4}
for (int data_index = 0;
     data_index < data.length_in_64_bit_words;
     data_index = data_index + 4_words_in_hash)
{
    for (int hash_index = 0; hash_index < 4; hash_index++)
    {
        // Mix in data
        hash[hash_index] = hash[hash_index] XOR data[data_index + hash_index]
        // Diffuse
        hash[hash_index] = hash[hash_index] XOR (hash[hash_index] RSHIFT 32)
        hash[hash_index] = hash[hash_index] × seahash_prime
        hash[hash_index] = hash[hash_index] XOR (hash[hash_index] RSHIFT 32)
        hash[hash_index] = hash[hash_index] × seahash_prime
        hash[hash_index] = hash[hash_index] XOR (hash[hash_index] RSHIFT 32)
    }
}
result = hash[0] XOR hash[1] XOR hash[2] XOR hash[3] XOR data.length_in_bytes
result = result XOR (result RSHIFT 32)
result = result × seahash_prime
result = result XOR (result RSHIFT 32)
result = result × seahash_prime
result = result XOR (result RSHIFT 32)
return result
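As a concrete sketch, here is the pseudocode above in Python. The offsets and prime are illustrative values (believed to be drawn from the SeaHash reference, but treat them as assumptions), and the diffusion follows the pseudocode as written rather than the exact seahash crate, so outputs will not match the real implementation:

```python
# Illustrative constants standing in for offset_1..offset_4 and
# seahash_prime in the pseudocode; not guaranteed to match the real crate.
OFFSETS = [0x16F11FE89B0D677C, 0xB480A793D8E6C86C,
           0x6FE2E5AAF078EBC9, 0x14F994A4C5259381]
PRIME = 0x6EED0E9DA4D94A4F
MASK64 = (1 << 64) - 1  # emulate 64-bit wrapping multiplication

def diffuse(x: int) -> int:
    # The xorshift/multiply rounds from the pseudocode's "Diffuse" step.
    x ^= x >> 32
    x = (x * PRIME) & MASK64
    x ^= x >> 32
    x = (x * PRIME) & MASK64
    x ^= x >> 32
    return x

def seahash_like(data: bytes) -> int:
    length = len(data)
    # Zero-pad to a multiple of 32 bytes (8 bytes/word x 4 words),
    # the simplification the pseudocode assumes.
    data += b"\x00" * (-length % 32)
    words = [int.from_bytes(data[i:i + 8], "little")
             for i in range(0, len(data), 8)]
    state = OFFSETS[:]  # hash = {offset_1, offset_2, offset_3, offset_4}
    for base in range(0, len(words), 4):
        for lane in range(4):
            state[lane] = diffuse(state[lane] ^ words[base + lane])
    result = state[0] ^ state[1] ^ state[2] ^ state[3] ^ length
    return diffuse(result)  # final avalanche uses the same rounds
```

Notice that `state[lane]` only ever touches `words[base + lane]`: lane 0 consumes words 0, 4, 8, …, lane 1 consumes words 1, 5, 9, …, and so on.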
FNV operates on individual bytes of data, while seahash operates on 64-bit words. A modern 64-bit processor can handle a full word at once. True, it can also process a single byte in one instruction without creating a temporary value, but hashing byte-by-byte still means roughly eight times as many mixing operations for the same input.
FNV is completely sequential: until the first byte is hashed, no work can be done on the second byte. In seahash, as you observed, parallelism can be exploited. The second, third, and fourth words are completely independent of the first word, just as words 6, 7, and 8 are independent of word 5, and so on. You can have four independent threads each do a quarter of the work, then combine the results at the end.
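The lane independence can be demonstrated directly: below, each of four lanes is run on its own worker thread and compared against a straightforward sequential computation. The `diffuse` round and the seed values are simplified stand-ins for the pseudocode's state update, not the real algorithm:

```python
from concurrent.futures import ThreadPoolExecutor

MASK64 = (1 << 64) - 1
PRIME = 0x6EED0E9DA4D94A4F  # illustrative multiplier (assumption)

def diffuse(x: int) -> int:
    # One xorshift-multiply round; enough to show the dependency structure.
    x ^= x >> 32
    return (x * PRIME) & MASK64

def run_lane(lane: int, words: list, seed: int) -> int:
    # Lane k only ever reads words[k], words[k+4], words[k+8], ...
    # so the four lanes share no data and can run on separate threads.
    h = seed
    for i in range(lane, len(words), 4):
        h = diffuse(h ^ words[i])
    return h

words = list(range(32))   # 32 dummy 64-bit words of "input"
seeds = [1, 2, 3, 4]      # stand-ins for offset_1..offset_4

# Sequential reference: one lane after another.
sequential = [run_lane(lane, words, seeds[lane]) for lane in range(4)]

# Parallel: each lane on its own worker thread.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(run_lane, range(4), [words] * 4, seeds))

assert parallel == sequential  # lanes are independent: same final state
```

The same decomposition is what lets wide SIMD units (or, in principle, a GPU kernel) update all lanes in lockstep.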
So what you're saying is that this could be heavily optimized using OpenCL?
I might have to bookmark this as a project to take on. A heavily parallel hashing algorithm that takes advantage of OpenCL would immensely increase throughput, wouldn't it?
(I'm dipping my toes in parallel processing as of late and am genuinely curious in this topic and question)
Is the FNV algorithm limited in the same way a ripple-carry adder is, then, where every operation relies on the previous result and can't be (or maybe just isn't, but could be) done in parallel because of that structure?