About Sphinx and big data

微笑ZD, posted 2015/12/11 09:20
Views: 891
Favorites: 1

@张小农 Hi, I'd like to ask you about a problem:

I've been learning Sphinx recently and have run into some problems. I just read your article 《sphinx+mysql+php 秒杀大数据》 (http://my.oschina.net/rookier/blog/406140?p=1), and I believe you can help me sort them out.

I have a table, log_merger, with nearly 200 million rows (and it will keep growing over time). That is still far from your 55 GB dataset, but the table has 14 fields, and all 14 may later be used as query conditions. The attributes are listed below:

sql_attr_uint = logId
sql_attr_uint = outId
sql_attr_uint = matId
sql_attr_uint = memId
sql_attr_uint = worId
sql_attr_string = adId
sql_attr_string = temId
sql_attr_string = extId
sql_attr_uint = plat
sql_attr_uint = type
sql_attr_uint = draId
sql_attr_uint = proId
sql_attr_timestamp = cTime
sql_attr_string = reqIp



sphinx.conf
source config{
    type            = mysql
    sql_host        = localhost
    sql_user        = root
    sql_pass        = xxxxx
    sql_db          = db_log
    sql_port        = 3306
    sql_query_pre = SET NAMES utf8

    sql_query_pre = SET SESSION query_cache_type=OFF

    # note: this throttle only takes effect when ranged fetching
    # (sql_query_range) is configured
    sql_ranged_throttle = 100

    sql_attr_uint = logId
    sql_attr_uint = outId
    sql_attr_uint = matId
    sql_attr_uint = memId
    sql_attr_uint = worId
    sql_attr_string = adId
    sql_attr_string = temId
    sql_attr_string = extId
    sql_attr_uint = plat
    sql_attr_uint = type
    sql_attr_uint = draId
    sql_attr_uint = proId
    sql_attr_timestamp = cTime
    sql_attr_string = reqIp
}

source data : config{
    sql_query = SELECT logId,outId,draId,proId,matId,memId,worId,adId,temId,extId,plat,reqIp,type,UNIX_TIMESTAMP(LEFT(cTime,10)) as cTime FROM `log_merger`
}
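Incidentally, the `sql_ranged_throttle = 100` setting above only matters when ranged fetching is enabled, which this source does not do. A sketch of what a ranged fetch could look like for this source (assuming logId is an auto-increment primary key, which is not stated in the post):

```
source data : config{
    # fetch the id range once, then pull rows in 1M-row steps instead of
    # asking MySQL for all ~200M rows in a single result set
    sql_query_range = SELECT MIN(logId), MAX(logId) FROM log_merger
    sql_range_step  = 1000000
    sql_query = \
        SELECT logId,outId,draId,proId,matId,memId,worId,adId,temId,extId, \
               plat,reqIp,type,UNIX_TIMESTAMP(LEFT(cTime,10)) AS cTime \
        FROM log_merger \
        WHERE logId BETWEEN $start AND $end
}
```

With this in place, the throttle pauses 100 ms between step queries to ease load on MySQL.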

indexer{
    mem_limit       = 1023M
}

searchd{
    listen          = 9312
    log         = /usr/local/sphinx/var/log/searchd.log
    query_log       = /usr/local/sphinx/var/log/query.log
    read_timeout        = 5
    client_timeout      = 300
    max_children        = 30
    persistent_connections_limit    = 30
    pid_file        = /usr/local/sphinx/var/log/searchd.pid
    seamless_rotate     = 1
    preopen_indexes     = 1
    unlink_old      = 1
    mva_updates_pool    = 1M
    max_packet_size     = 8M
    max_filters     = 256
    max_filter_values   = 4096
    max_batch_queries   = 32
    workers         = threads
    expansion_limit = 2000
}

index data{
    source = data
    path = /usr/local/sphinx/var/data/data
    docinfo = extern
    mlock = 0
    morphology = none
    min_word_len = 1
}


1: Does Sphinx limit the number of attributes? With the 14 attributes above, building the index fails every time with:

ERROR: index 'data': too many string attributes (current index format allows up to 4 GB).

With no more than 10 attributes it works fine.
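For what it's worth (my reading, not confirmed in the thread): the error text points at total string-attribute storage, not the attribute count. The index format addresses all string attribute values with 32-bit offsets, which caps the string pool at 4 GB per index, and with four string attributes (adId, temId, extId, reqIp) over ~200 million rows that cap is easy to hit. A rough back-of-envelope with made-up average value sizes:

```python
rows = 200_000_000

# hypothetical average on-disk bytes per value for each string attribute;
# the real sizes depend on the actual data
avg_bytes = {"adId": 8, "temId": 8, "extId": 8, "reqIp": 15}

pool = rows * sum(avg_bytes.values())   # total string-pool payload in bytes
print(pool / 2**30)                     # ≈ 7.26 GiB, well past the 4 GB cap
```

Dropping to 10 attributes presumably works because it removes enough of the string columns to fit under the cap, and so does shrinking the row count.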


2: If I cut the data down to a few tens of millions of rows, the index builds successfully even with all 14 attributes. I suspected the machine was short on RAM, but the problem persists even after expanding memory to 70 GB. The OS itself imposes no limit on single-file size!
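A workaround commonly used with Sphinx (my suggestion, not something from the thread) is to shard the table across several plain indexes, so each shard's string-attribute pool stays under the per-index 4 GB cap, and then query them together through a distributed index. A sketch with two shards split on logId parity:

```
source data_part0 : config{
    sql_query = SELECT logId,outId,draId,proId,matId,memId,worId,adId,temId, \
        extId,plat,reqIp,type,UNIX_TIMESTAMP(LEFT(cTime,10)) AS cTime \
        FROM log_merger WHERE logId % 2 = 0
}

source data_part1 : config{
    sql_query = SELECT logId,outId,draId,proId,matId,memId,worId,adId,temId, \
        extId,plat,reqIp,type,UNIX_TIMESTAMP(LEFT(cTime,10)) AS cTime \
        FROM log_merger WHERE logId % 2 = 1
}

index data_part0{
    source  = data_part0
    path    = /usr/local/sphinx/var/data/data_part0
    docinfo = extern
}

index data_part1{
    source  = data_part1
    path    = /usr/local/sphinx/var/data/data_part1
    docinfo = extern
}

# searchd serves the shards as one logical index
index data_all{
    type  = distributed
    local = data_part0
    local = data_part1
}
```

Two shards only halves the pool, so with more string data you would raise the modulus (and the shard count) until each shard fits comfortably under 4 GB.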


I hope someone knowledgeable can help answer the questions above.

Thanks!


loki_lan:
"Crushing big data in seconds"? Please. In big data, GB is just the basic unit; TB is the normal one.
微笑ZD:
I just want to solve my own problem here; I don't care about the rest. Thanks.
rav3n:
I'd suggest switching to ES instead. One important feature is that it supports distribution. That said, at your current scale a single machine is plenty.
微笑ZD:
For company reasons, ES is not an option. Thanks.