MySQL 索引最佳实践之问题反馈 已翻译 100%

oschina 投递于 2013/01/09 08:40 (共 19 段, 翻译完成于 01-23)
阅读 6145
收藏 116
9
加载中

I had a lot of questions on my MySQL Indexing: Best Practices Webinar (both recording and slides are available now) We had lots of questions. I did not have time to answer some and others are better answered in writing anyway.

Q: One developer on our team wants to replace longish (25-30) indexed varchars with an additional bigint column containing the crc64, and change the indexing to be on that column. That would clearly save indexing space. Is this a reasonable performance optimization. (Keep in mind that the prefix adaptive hashing would fail here, because the first 10 or so characters usually are the same). Of course UNIQUE index optimizations can no longer be applied either.

A: This is good optimization in many cases. When you apply it though remember hash can have collisions so you will need to have your queries do something like SELECT * FROM TBL WHERE hash=crc32(‘string’) AND string=’string’ The other thing you need to consider is string comparison in MySQL is case insensitive by default while hash comparison will be done case sensitive unless you lowercase string before hashing. I also would note 25-30 bytes length is rather short for such hack as BIGINT itself is 8 bytes and the difference in the index length with all overhead is not going to be huge. I think this technique is best when you’re working with 100 bytes+ strings. Is say bytes as it is string length at which it makes sense is collation specific.

已有 1 人翻译此段
我来翻译

Q: ORDER By optimization issues: select * from table where A=xxx and B between 100 and 200 order by B Very common for a date range to also need to be ordered. The question is how can one have optimized indexes and sorting in such a scenario, since inequality ends index usage.

Actually in this case index on (A,B) would work well. If however you would need to sort by some 3rd column, say C index (A,B,C) would not work as range will prevent sorting from using the index. In this case you can use trick mentioned in the presentation to convert sort to the union for small ranges.

已有 1 人翻译此段
我来翻译

Q: In the case of a junction table, would indexing on (foreignkey1,foreignkey2) AND on (foreignkey2,foreignkey1) be a good idea?

A: Yes. This is a good practice. Normally I’d do something like CREATE TABLE LINK (id1 int unsigned not null ,id2 int unsigned not null, PRIMARY KEY(id1,id2), KEY K(id2)) engine=INNODB; when table has to be traversed in both directions for different queries. This will use fast primary key for some queries and use key K as covering index for lookup in other direction.
For Innodb Table id2 is not needed pas part of second key as PRIMARY key is appended to it internally anyway. For MyISAM table you should use K(id2,id1) in the same case. Some people would prefer to define second key as UNIQUE this has benefits and drawbacks. Benefit being you can get extra optimizations by optimizer knowing index is UNIQUE the drawback is insert buffer will not be able to be used, which can be important for large, heavily written tables.

已有 1 人翻译此段
我来翻译

Q: in trick #1 will “WHERE a IN (2-4)” be worse then “WHERE a IN (2,3,4)”? Another word is range for IN clause better than BETWEEN?

A: IN(2-4) will not do what you’re implying here. 2-4 will be evaluated as math expression and the result will be IN(-2) which is not what you’re looking for.

已有 1 人翻译此段
我来翻译

Q: I have a primary index on an int (ID) and other indexes on columns idx1(X,A,B,C) idx2(Y,A,B,C) etc (there are 5) would I be better off making the primary A,B,C,ID and Having other indexes on one column, idx1(X) idx2(Y) etc?

A: I would wonder whenever it is best setup for 5 indexes to differ only by first column. Regarding changing primary key to include such column prefix it depends on what you’re looking for a lot. This will cause data clustering done by these columns which can be helpful if you’re doing a lot of range scans on what would be primary key but it also can slow down your inserts and make primary key significantly fragmented. I also would note there are some MySQL optimizer restrictions in how well it can deal with primary key appended to the index column, especially in such case as you’re suggesting. In the end I would seek for a lot of performance gains before I move to such unusual setup.

已有 1 人翻译此段
我来翻译

Q: Table1 has a primary key. Table2 joins to table1 using Table1′s primary key. Should table2 have an index on the field that is used to join the two tables?

A: The question in this case is how MySQL will execute the join. If it will fist lookup Table2 using some other index and when go to Table1 to lookup row by primary key when you do not need an index on a field which is used to join tables in table2

已有 1 人翻译此段
我来翻译

Q: In regards to extending an index being better than adding a new one: Let’s say I have a table named PO that has a primary key of PO # and 2 additional fields for vendor_id and order_id. If I have an index on vendor_id, order_id but my query is only selecting on vendor, will the index have any impact on the speed of the query?

A: If you extend the index from (vendor_id) to (vendor_id,order_id) you will make it 4 bytes longer (assuming order_id is int) which will impact your queries which only use vendor_id but unlikely significantly. It is likely to be a lot less expensive than having another index on (vendor_id,order_id) in addition to index on (vendor_id) alone. The cases when you really should worry about performance impact of extending index is when you increase its length dramatically, for example adding long varchar column. In such cases it indeed might be better to add another index.

已有 1 人翻译此段
我来翻译

Q: We have a database that has about 400GB of indexes. The indexes can’t fit in memory anymore. How does this affect performance?

A: Typically you do not need all your indexes to be in memory only those portions of them which are accessed frequently. The size of this “working set” can greatly depend on application and can range from 5% of total size or less to almost 100%. When your go from working set what fits in memory to the one which does not any more performance can degrade 10 or more times.

已有 1 人翻译此段
我来翻译

Q: In which cases should auto-increment be used as primary key?

A: Auto-increment is a good default primary key. You should pick something else if you have a good reason to do it – if you would benefit from data clustering in the different way or if you have some other natural candidate for primary key which gets a lot of lookups is frequent reason to use something else as primary key.

已有 1 人翻译此段
我来翻译

Q: How many indexes is too many?

A: There is a hard limit on amount of indexes you can have, which is 64 per table in recent MySQL versions. However it is often too many. Instead of thinking about hard limit of indexes I prefer to add indexes only in cases which provides positive impact to performance. At some point the gains from indexes you add will be less than performance loss because of having too many indexes.

已有 1 人翻译此段
我来翻译
本文中的所有译文仅用于学习和交流目的,转载请务必注明文章译者、出处、和本文链接。
我们的翻译工作遵照 CC 协议,如果我们的工作有侵犯到您的权益,请及时联系我们。
加载中

评论(11)

g
grey256
“另外还需要注意的是 MySQL 里的字符串比较默认是不区分大小写的,对于哈希字符串也是如此”
我看mysql手册(5.7)里面的例子 crc32('MYSQL') 得到3259397556,而crc32('mysql')得到的是2501908538,自己测了下大小写是有影响的(5.5)。
Pader
Pader

引用来自“苏翰”的评论

第二个问题为什么“复合索引 (A,B,C) 就会导致排序无法使用索引”?

因为B的 BETWEEN..IN 得到的是一系列的值,而不是单一的值,而按照BTREE的原理,C是B的子树,那么查到B的时候就得到的是一堆C的子树,这时候C就失去作用了。大家可以去多看看BTREE相关的知识。

这里讲到的“将排序转成小范围数据的联合来处理”不知道是如何做的。MySQL 在实现这种 BETWEEN..IN 和 IN() 的范围查询后再对结果集进行排序是一个软肋,但这也并不是 MySQL 的缺陷,而是所有数据库都必备的BTREE索引的一个软肋。
pyruby
pyruby
同楼上问
苏生不惑
苏生不惑
第二个问题为什么“复合索引 (A,B,C) 就会导致排序无法使用索引”?
pyruby
pyruby
文章中最好再举几个更实际点例子,就更好理解了
今天天气不错
今天天气不错
使用BETWEEN的写法转换成IN来查询日期段,以天为单位那好点,如果存储的时间戳精确到秒,同时范围在一周,是罗列出in(所有的秒)快还是between快?前者构造的sql不知道有多大。。。数据不会堵塞?
恺哥
恺哥

引用来自“Khiyuan”的评论

引用来自“恺哥”的评论

感觉老外说话真啰嗦,那些啰嗦的话我个人觉得不翻译为好,翻译出来反倒不通顺且迷惑

我也有同感!
但是翻译的时候,却又不能随便的省略掉,因为可能会造成前后译文的脱节,也生怕遗漏掉作者的本意。
翻译,本来就不是纯粹的写作,其旨为传递作者的思考结果。
特别是技术类翻译,又与文学类不同,文学类达意即可,技术类,还是兢兢业业为好。

赞成"翻译,本来就不是纯粹的写作,其旨为传递作者的思考结果"这句话
K6F
K6F

引用来自“恺哥”的评论

感觉老外说话真啰嗦,那些啰嗦的话我个人觉得不翻译为好,翻译出来反倒不通顺且迷惑

我也有同感!
但是翻译的时候,却又不能随便的省略掉,因为可能会造成前后译文的脱节,也生怕遗漏掉作者的本意。
翻译,本来就不是纯粹的写作,其旨为传递作者的思考结果。
特别是技术类翻译,又与文学类不同,文学类达意即可,技术类,还是兢兢业业为好。
华兹格
华兹格

引用来自“sunyh”的评论

翻译的文章 为啥不 列出 原文 好对照啊。

你可以点右上角的:中英文对照 啊,有的!
sunyh
sunyh
翻译的文章 为啥不 列出 原文 好对照啊。
返回顶部
顶部