评论删除后,数据将无法恢复
dd if=/data/mysql/dbname/large_table.ibd of=/dev/null bs=16k iflag=directYou may find that even on commodity HDD servers, you can read more than 100MB/s multiplied by "number of drives" from disks. Over 1GB/s is not uncommon.
If primary key pages are fragmented, full table scan becomes very slow. Fig 1 illustrates the problem. After InnoDB reads leaf page #3, it has to read page #5230, and after that it has to read page #4. Page #5230 is far from page #3 and #4, so disk read ordering becomes almost random, not sequential. It is very well known that random reads on HDD is much slower than sequential reads. One very effective approach to improve random reads is using SSD, but per-GB cost on SSD is still more expensive than HDD so using SSD is not always possible.
Fig 1. Full table scan is not actually doing sequential reads
mysql> SELECT buf_warmup ("db1", "large_table"); /* loading into buf pool */ mysql> SELECT * FROM large_application_table; /* in-memory select */
buf_warmup() is a udf that reads entire ibd file of database "db1", table "large_table". It takes time to read the entire ibd file from disk, but reads are sequential so much faster than random reads. When I tested, I could get ~5x overall faster time than normal linear read ahead.
This proved that sequentially reading ibd files helped to improve throughput, but there were a couple of disadvantages:
This is illustrated at fig 2 below.
Fig 2: Logical Read Ahead
Logical Read Ahead (LRA) solved issues of Physical Read Ahead. LRA enables InnoDB to read only primary key pages (not reading seconday index pages), and to prefetch configurable number of pages (not entire table) at one time. And LRA does not require any SQL syntax changes.
We added two new session variables to make LRA work. One is "innodb_lra_size" to control how many MBs to prefetch leaf pages. The other is "innodb_lra_sleep" session variable to control how many milliseconds to sleep per prefetch. We tested around 512MB ~ 4096MB prefetch size and 50 milliseconds sleep, and so far we haven't encountered any serious (such as crash/stall/inconsistency) problem. These session variables should be set only when doing full table scan. In our case, mysqldump and some utility scripts (i.e. online schema change tool) turn logical read ahead on.
Another performance issue we noticed was that i/o unit size in InnoDB was only one page, even if doing readahead. 16KB i/o unit size is too small for sequential reads, and much less efficient than larger I/O unit size.
In 5.6, InnoDB uses Linux Native I/O (aio) by default. If submitting multiple consecutive 16KB read requests at once, Linux internally can merge requests and reads can be done efficiently. But unfortunately InnoDB submitted only one page i/o request at once. I filed a bug report #68659. As written in the bug report, on modern HDD RAID 1+0 environment, I could get more than 1000MB/s disk reads by submitting 64 consecutive pages requests at once, while I got only 160MB/s disk reads by submitting one page request.
To make Logical Read Ahead work better on our environments, we fixed this issue. On our MySQL, InnoDB submits many more page i/o requests before calling io_submit().
Table size | Without LRA | With LRA | Improvement |
10GB | 10 min 55.00 sec | 1 min 15.20 sec | 8.71x |
50GB | 52 min 44.42 sec | 6 min 36.98 sec | 7.97x |
Table size | Without LRA | With LRA | Improvement |
1GB | 7 min 33 sec | 24 sec | 18.8x |
评论删除后,数据将无法恢复
评论(15)
引用来自“宏哥”的评论
引用来自“eechen”的评论
引用来自“weipengfei”的评论
引用来自“eechen”的评论
“我希望这些特性能够在InnoDB中获得Oracle官方支持,至少是主要的MySQL分支。”
Oracle未必不收,Oracle可指望着MySQL在Windows上K掉SQL Server:
http://www.mysql.com/why-mysql/windows/index.html.zh
Oracle也不可能因MySQL的这点改善而活不成,说不定他们会内部自己实现,只在商业版提供有这个特性。Oracle是想通过社区版继续推广和扩大MySQL影响,通过商业版和技术支持把钱收回来。
就算Oracle不收,提交给MariaDB这个MySQL分支被接纳的可能性也很大,Google和淘宝就像MariaDB贡献过代码。
连数据库都没有见过
拿着菜刀喊着要打航母的玩意
一边YY去吧
引用来自“宏哥”的评论
引用来自“eechen”的评论
引用来自“weipengfei”的评论
引用来自“eechen”的评论
“我希望这些特性能够在InnoDB中获得Oracle官方支持,至少是主要的MySQL分支。”
Oracle未必不收,Oracle可指望着MySQL在Windows上K掉SQL Server:
http://www.mysql.com/why-mysql/windows/index.html.zh
Oracle也不可能因MySQL的这点改善而活不成,说不定他们会内部自己实现,只在商业版提供有这个特性。Oracle是想通过社区版继续推广和扩大MySQL影响,通过商业版和技术支持把钱收回来。
就算Oracle不收,提交给MariaDB这个MySQL分支被接纳的可能性也很大,Google和淘宝就像MariaDB贡献过代码。
连数据库都没有见过
拿着菜刀喊着要打航母的玩意
一边YY去吧
引用来自“eechen”的评论
引用来自“weipengfei”的评论
引用来自“eechen”的评论
“我希望这些特性能够在InnoDB中获得Oracle官方支持,至少是主要的MySQL分支。”
Oracle未必不收,Oracle可指望着MySQL在Windows上K掉SQL Server:
http://www.mysql.com/why-mysql/windows/index.html.zh
Oracle也不可能因MySQL的这点改善而活不成,说不定他们会内部自己实现,只在商业版提供有这个特性。Oracle是想通过社区版继续推广和扩大MySQL影响,通过商业版和技术支持把钱收回来。
就算Oracle不收,提交给MariaDB这个MySQL分支被接纳的可能性也很大,Google和淘宝就像MariaDB贡献过代码。
连数据库都没有见过
拿着菜刀喊着要打航母的玩意
一边YY去吧
引用来自“weipengfei”的评论
引用来自“eechen”的评论
“我希望这些特性能够在InnoDB中获得Oracle官方支持,至少是主要的MySQL分支。”
Oracle未必不收,Oracle可指望着MySQL在Windows上K掉SQL Server:
http://www.mysql.com/why-mysql/windows/index.html.zh
Oracle也不可能因MySQL的这点改善而活不成,说不定他们会内部自己实现,只在商业版提供有这个特性。Oracle是想通过社区版继续推广和扩大MySQL影响,通过商业版和技术支持把钱收回来。
就算Oracle不收,提交给MariaDB这个MySQL分支被接纳的可能性也很大,Google和淘宝就像MariaDB贡献过代码。
引用来自“eechen”的评论
“我希望这些特性能够在InnoDB中获得Oracle官方支持,至少是主要的MySQL分支。”