如何进行高效模糊查询

牛牛牛牛 发布于 2012/08/09 15:10
阅读 1K+
收藏 2

现有一个关键字表有300多W行的数据,对类目和关键字分别进行了索引

在进行模糊查询的时候发现特别慢

原因是模糊查询的时候不走索引,查的是全表

 select * from keyword where keyword  like '%oschina%' 

网上找了一些资料用半匹配 

 select * from keyword where keyword  like 'oschina%'  // 这样走索引

 select * from keyword where keyword  like '%oschina'  // 这样走全表

 用半匹配的话明显精度不够

 

加载中
0
牛牛牛牛
牛牛牛牛

@红薯 经过几天的性能与精准度测试,最后还是选择了mysql的fulltext索引

主要解决点

1、中文转化为拼音,解决不能搜索汉字的问题

2、配置my.ini(my.cnf) 增加参数ft_min_word_len = 2,  解决2个字母不能搜索的问题

3、嵌套like %keyword%  过滤非关键字相关的结果

红薯
红薯
很好 :) 可以把过程跟大家分享下
0
红薯
红薯
这种还是用全文搜索引擎吧,类似 lucene 之类
红薯
红薯
回复 @牛牛牛牛 : 没用过
牛牛牛牛
牛牛牛牛
@红薯 mysql使用fulltext索引有什么要注意的?
牛牛牛牛
牛牛牛牛
恩,去试试看
0
牛牛牛牛
牛牛牛牛

@红薯 这几天用了lucene3.6进行了一个测试

测试过程发现有一个问题就是分词,我本身的关键字就相当于词 

用了StandardAnalyzer 与IKAnalyzer分词之后查询的结果就不尽人意了

比如查:开源中国 

“开源软件在中国越来越... ”  ”中国的开源软件越来越....“

有什么办法在建立索引的时候不分词,然后查询的时候类似sql中的like一样?

红薯
红薯
这不是全文搜索要做的哦,全文搜索都是基于分词的
0
牛牛牛牛
牛牛牛牛
mysql> show index from keyword_new;
+-------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| keyword_new |          0 | PRIMARY      |            1 | id          | A         |     3199258 |     NULL | NULL   |      | BTREE      |         |               |
| keyword_new |          1 | show_count   |            1 | keyword     | A         |     1066419 |     NULL | NULL   |      | BTREE      |         |               |
| keyword_new |          1 | show_count   |            2 | show_count  | A         |     3199258 |     NULL | NULL   |      | BTREE      |         |               |
| keyword_new |          1 | cid_index    |            1 | cid         | A         |        3329 |     NULL | NULL   |      | BTREE      |         |               |
| keyword_new |          1 | pinyin_index |            1 | pinyin      | NULL      |           1 |     NULL | NULL   |      | FULLTEXT   |         |               |
+-------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.00 sec)

mysql> EXPLAIN 
    -> select keyword,
    ->        max(show_count) as show_count , 
    ->    max(b.estimate_show_count)  as estimate_show_count ,
    ->        max(change_show_count) as change_show_count,
    ->  max(click_count) as  click_count,
    ->   max(click_rate) as click_rate,
    ->  max(click_tran_rate)as  click_tran_rate,
    ->  max(compete_count) as compete_count ,
    ->  max(compete_avg_price) as compete_avg_price   from(
    -> select *  from (
    -> select  * from  keyword_new where  MATCH   (pinyin)   
    -> AGAINST ('+"lian yi qun"'))s
    -> where s.keyword like '%连衣裙%')  
    -> b GROUP BY keyword ORDER BY show_count desc ;
+----+-------------+-------------+----------+---------------+--------------+---------+------+-------+---------------------------------+
| id | select_type | table       | type     | possible_keys | key          | key_len | ref  | rows  | Extra                           |
+----+-------------+-------------+----------+---------------+--------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL      | NULL          | NULL         | NULL    | NULL |  2812 | Using temporary; Using filesort |
|  2 | DERIVED     | <derived3>  | ALL      | NULL          | NULL         | NULL    | NULL | 54885 | Using where                     |
|  3 | DERIVED     | keyword_new | fulltext | pinyin_index  | pinyin_index | 0       |      |     1 | Using where                     |
+----+-------------+-------------+----------+---------------+--------------+---------+------+-------+---------------------------------+
3 rows in set (0.48 sec)

mysql> 
mysql> EXPLAIN 
    -> select keyword,
    ->        max(show_count) as show_count , 
    ->    max(b.estimate_show_count)  as estimate_show_count ,
    ->        max(change_show_count) as change_show_count,
    ->  max(click_count) as  click_count,
    ->   max(click_rate) as click_rate,
    ->  max(click_tran_rate)as  click_tran_rate,
    ->  max(compete_count) as compete_count ,
    ->  max(compete_avg_price) as compete_avg_price   from(
    -> select *  from (
    -> select  * from  keyword_new where  MATCH   (pinyin)   
    -> AGAINST ('+"iphone"'))s
    -> where s.keyword like '%iphone%')  
    -> b GROUP BY keyword ORDER BY show_count desc ;
+----+-------------+-------------+----------+---------------+--------------+---------+------+------+---------------------------------+
| id | select_type | table       | type     | possible_keys | key          | key_len | ref  | rows | Extra                           |
+----+-------------+-------------+----------+---------------+--------------+---------+------+------+---------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL      | NULL          | NULL         | NULL    | NULL | 2379 | Using temporary; Using filesort |
|  2 | DERIVED     | <derived3>  | ALL      | NULL          | NULL         | NULL    | NULL | 2388 | Using where                     |
|  3 | DERIVED     | keyword_new | fulltext | pinyin_index  | pinyin_index | 0       |      |    1 | Using where                     |
+----+-------------+-------------+----------+---------------+--------------+---------+------+------+---------------------------------+
3 rows in set (0.02 sec)

mysql> EXPLAIN 
    -> select keyword,
    ->        max(show_count) as show_count , 
    ->    max(b.estimate_show_count)  as estimate_show_count ,
    ->        max(change_show_count) as change_show_count,
    ->  max(click_count) as  click_count,
    ->   max(click_rate) as click_rate,
    ->  max(click_tran_rate)as  click_tran_rate,
    ->  max(compete_count) as compete_count ,
    ->  max(compete_avg_price) as compete_avg_price   from(
    -> select *  from (
    -> select  * from  keyword_new where  MATCH   (pinyin)   
    -> AGAINST ('+"ping guo"'))s
    -> where s.keyword like '%苹果%')  
    -> b GROUP BY keyword ORDER BY show_count desc ;
+----+-------------+-------------+----------+---------------+--------------+---------+------+-------+---------------------------------+
| id | select_type | table       | type     | possible_keys | key          | key_len | ref  | rows  | Extra                           |
+----+-------------+-------------+----------+---------------+--------------+---------+------+-------+---------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL      | NULL          | NULL         | NULL    | NULL |  7964 | Using temporary; Using filesort |
|  2 | DERIVED     | <derived3>  | ALL      | NULL          | NULL         | NULL    | NULL | 41930 | Using where                     |
|  3 | DERIVED     | keyword_new | fulltext | pinyin_index  | pinyin_index | 0       |      |     1 | Using where                     |
+----+-------------+-------------+----------+---------------+--------------+---------+------+-------+---------------------------------+
3 rows in set (0.35 sec)

 

mysql> 
mysql> EXPLAIN 
    -> select keyword,
    ->        max(show_count) as show_count , 
    ->    max(b.estimate_show_count)  as estimate_show_count ,
    ->        max(change_show_count) as change_show_count,
    ->  max(click_count) as  click_count,
    ->   max(click_rate) as click_rate,
    ->  max(click_tran_rate)as  click_tran_rate,
    ->  max(compete_count) as compete_count ,
    ->  max(compete_avg_price) as compete_avg_price   from(
    -> select *  from (
    -> select  * from  keyword_new where  MATCH   (pinyin)   
    -> AGAINST ('+"nan zhuang"'))s
    -> where s.keyword like '%男装%')  
    -> b GROUP BY keyword ORDER BY show_count desc ;
+----+-------------+-------------+----------+---------------+--------------+---------+------+--------+---------------------------------+
| id | select_type | table       | type     | possible_keys | key          | key_len | ref  | rows   | Extra                           |
+----+-------------+-------------+----------+---------------+--------------+---------+------+--------+---------------------------------+
|  1 | PRIMARY     | <derived2>  | ALL      | NULL          | NULL         | NULL    | NULL |   3483 | Using temporary; Using filesort |
|  2 | DERIVED     | <derived3>  | ALL      | NULL          | NULL         | NULL    | NULL | 151537 | Using where                     |
|  3 | DERIVED     | keyword_new | fulltext | pinyin_index  | pinyin_index | 0       |      |      1 | Using where                     |
+----+-------------+-------------+----------+---------------+--------------+---------+------+--------+---------------------------------+
3 rows in set (1.19 sec)

附上执行计划
返回顶部
顶部