评论删除后，数据将无法恢复

顶
*3*

加载中

The state-of-the-art in non-cryptographic hash functions has advanced rapidly in the last few years. When I did some searching this week I was happy to see that new cutting-edge hash functions had been released even since last time I looked into this 6 months or a year ago.

Non-cryptographic hash functions take a string as input and compute an integer output. The desirable property of a hash function is that the outputs are evenly distributed across the domain of possible outputs, especially for inputs that are similar. Unlike a cryptographic hash function, these functions are*not* designed to withstand an effort by an attacker to find a collision. Cryptographic hash functions have this property, but are much slower:
SHA-1 is on the order of 0.09 bytes/cycle whereas the newest non-cryptographic hash functions are on the order of 3 bytes/cycle. So non-cryptographic hashes are roughly 33x faster, at the cost of not being able to withstand attacks. Non-cryptographic hashes are most often used for hash tables.

Non-cryptographic hash functions take a string as input and compute an integer output. The desirable property of a hash function is that the outputs are evenly distributed across the domain of possible outputs, especially for inputs that are similar. Unlike a cryptographic hash function, these functions are

As an interesting aside,
there is a debate going on in the Lua community right now about what, if anything, should be done about the fact that Lua's hash function could theoretically be attacked to force its hash table implementation into its O(n) worst-case lookup performance. This could let an attacker DoS you if he is feeding you input that you are putting into a Lua hash table. The Lua authors are somewhat skeptical about how realistic this attack is (and whether it would be cheaper than other DoS alternatives), but are moving ahead anyway with a plan to generate a random seed at startup that the hash function will use. This is an interesting alternative to cryptographic hash functions that should be able to give you the same collision resistance as a cryptographic hash function (presuming you have an entropy source that can give you truly random bits), but at the cost of non-reproducible output.

Since there are lots of options out there for non-cryptographic hash functions and this number keeps expanding, I thought I'd summarize my knowledge of what is out there.

Since there are lots of options out there for non-cryptographic hash functions and this number keeps expanding, I thought I'd summarize my knowledge of what is out there.

More information about Bob's functions can be found on Wikipedia: Jenkins hash function.

If you're on 32-bit, MurmurHash looks like the clear winner since it's the only function faster than lookup3 that has a native 32-bit version. 32-bit machines could probably compile and run City and Spooky, but I would expect it to be much slower because the 64-bit math would have to be emulated.

On 64-bit machines it's hard to say which is best without further benchmarking. I'd be liable to prefer Spooky to City since the latter depends on the CRC32 instruction for speed which isn't available everywhere.

One other consideration is aligned vs. unaligned access. MurmurHash (unlike City or Spooky) comes in a variant that will only perform aligned reads, since on many architectures unaligned reads will crash or return the wrong data (unaligned reads are undefined behavior in C). City and Spooky both address the issue by copying the input data into aligned storage with memcpy(); Spooky does the memcpy() a block at a time (if ALLOW_UNALIGNED_READS is not defined), City does the memcpy() an integer at a time! On machines that can handle unaligned reads (like x86 and x86-64) the memcpy will be optimized away, but I did a test on my little ARM box and found that this:

Please let me know of any errors in this article.

#include <stdint.h> #include <string.h> int32_t read32_unaligned(const void *buf) { int32_t ret; memcpy(&ret, buf, 4); return ret; }compiles to this very inefficient code (this would be a single instruction on x86):

0: b500 push {lr} 2: 2204 movs r2, #4 4: b083 sub sp, #12 6: 4601 mov r1, r0 8: eb0d 0002 add.w r0, sp, r2 c: f7ff fffe bl 0 10: 9801 ldr r0, [sp, #4] 12: b003 add sp, #12 14: bd00 pop {pc}To conclude, MurmurHash still looks like the best option if you need 32-bit or aligned-only reads. CityHash and SpookyHash look to be faster on x86-64, but I would almost think of them as being specific to that architecture, since I'm not aware of other architectures that are both 64-bit and allow unaligned reads.

Please let me know of any errors in this article.

本文中的所有译文仅用于学习和交流目的，转载请务必注明文章译者、出处、和本文链接。

我们的翻译工作遵照 CC 协议，如果我们的工作有侵犯到您的权益，请及时联系我们。

加载中

删除一条评论

评论删除后，数据将无法恢复

取消

确定

深圳市奥思网络科技有限公司版权所有

粤ICP备12009483号
顶部

## 评论(0)