I had a lot of questions on my MySQL Indexing: Best Practices Webinar (both recording and slides are available now) We had lots of questions. I did not have time to answer some and others are better answered in writing anyway.
Q: One developer on our team wants to replace longish (25-30) indexed varchars with an additional bigint column containing the crc64, and change the indexing to be on that column. That would clearly save indexing space. Is this a reasonable performance optimization. (Keep in mind that the prefix adaptive hashing would fail here, because the first 10 or so characters usually are the same). Of course UNIQUE index optimizations can no longer be applied either.
A: This is good optimization in many cases. When you apply it though remember hash can have collisions so you will need to have your queries do something like SELECT * FROM TBL WHERE hash=crc32(‘string’) AND string=’string’ The other thing you need to consider is string comparison in MySQL is case insensitive by default while hash comparison will be done case sensitive unless you lowercase string before hashing. I also would note 25-30 bytes length is rather short for such hack as BIGINT itself is 8 bytes and the difference in the index length with all overhead is not going to be huge. I think this technique is best when you’re working with 100 bytes+ strings. Is say bytes as it is string length at which it makes sense is collation specific.