Token 贬值 exceeds length of provided text sized

panhan posted on 2013/03/07 17:10

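The lucene 2.9 example runs fine, with no problems.
After upgrading to 4.1 the code compiles without errors, but using the Highlighter to highlight the query results throws an exception: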

org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token 贬值 exceeds length of provided text sized 6
 at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:229)
 at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:158)
 at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:464)

 

Main method:

public List<Article> getArticles(String query) throws Exception {
  IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(INDEXPATH)));

  try {
    List<Article> qlist = new ArrayList<Article>();
    String fieldName = "title";
    IndexSearcher indexSearcher = new IndexSearcher(reader);

    System.out.println(">>> 2. Reading the index... keyword: [ " + query + " ]");
    long begin = new Date().getTime();

    // Search across both the title and content fields.
    BooleanClause.Occur[] clauses = { BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD };
    Query queryOBJ = MultiFieldQueryParser.parse(Version.LUCENE_41, query, new String[]{"title", "content"}, clauses, analyzer); // parser.parse(query);
    Filter filter = null;

    // ################# Fetch the highest-scoring records ###################
    TopDocs topDocs = indexSearcher.search(queryOBJ, filter, 1000);

    System.out.println("*** Total matches: " + topDocs.totalHits + " ***");

    Article article = null;

    // Output the results
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
      Document targetDoc = indexSearcher.doc(scoreDoc.doc);
      article = new Article();

      // Set the highlight markup
      SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<font color='red'><strong>", "</strong></font>");
      /* Highlighter setup */
      Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(queryOBJ));
      highlighter.setTextFragmenter(new SimpleFragmenter(Integer.MAX_VALUE));

      // Highlight the title and content fields
      String title = targetDoc.get("title");
      String content = targetDoc.get("content");
      TokenStream titleTokenStream = analyzer.tokenStream("title", new StringReader(title));
      TokenStream contentTokenStream = analyzer.tokenStream("content", new StringReader(content));
      // The next line throws the exception
      String highLightTitle = highlighter.getBestFragment(titleTokenStream, title);
      String highLightContent = highlighter.getBestFragment(contentTokenStream, content);

      // Fall back to the raw text when nothing was highlighted
      if (highLightTitle == null)
        highLightTitle = title;

      if (highLightContent == null)
        highLightContent = content;

      article.setTitle(highLightTitle);
      article.setContent(highLightContent);
      article.setTag(targetDoc.get("tag"));
      article.setTotalHits(topDocs.totalHits);

      qlist.add(article);
    }

    long end = new Date().getTime();
    System.out.println(">>> 3. Search finished... took " + (end - begin) + " ms...");

    reader.close();

    return qlist;

  } catch (Exception e) {
    e.printStackTrace();
    return null;
  }
}

 

Has anyone managed to get the highlight test passing on 4.1?
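
For reference, the message in the stack trace comes from a bounds check: the highlighter rejects any token whose start or end offset lies beyond the length of the text it was given. A "text sized 6" error on the title line therefore suggests the title stream is yielding tokens that were not produced from the title. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the analyzer reuses a single TokenStream per thread, so requesting contentTokenStream before titleTokenStream has been consumed repoints the title stream at the content. The plain-Java sketch below imitates that situation without Lucene; Token, SharedStream, ReusingAnalyzer and the sample strings are all hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of the failure; none of these types are Lucene classes.
class OffsetCheckSketch {

    /** Hypothetical token: a term plus its character offsets in the source text. */
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical stream that tokenizes whatever text it was last pointed at. */
    static final class SharedStream {
        private String text = "";

        void setText(String text) {
            this.text = text;
        }

        /** Splits on spaces and records start/end offsets for each term. */
        List<Token> tokens() {
            List<Token> out = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                while (i < text.length() && text.charAt(i) == ' ') i++;
                int start = i;
                while (i < text.length() && text.charAt(i) != ' ') i++;
                if (i > start) out.add(new Token(text.substring(start, i), start, i));
            }
            return out;
        }
    }

    /** Hypothetical analyzer that always hands back the same stream instance (assumed reuse). */
    static final class ReusingAnalyzer {
        private final SharedStream shared = new SharedStream();

        SharedStream tokenStream(String text) {
            shared.setText(text);
            return shared;
        }
    }

    /** Bounds check modelled on the exception message; returns the message, or null if all offsets fit. */
    static String checkOffsets(SharedStream stream, String text) {
        for (Token t : stream.tokens()) {
            if (t.end > text.length() || t.start > text.length()) {
                return "Token " + t.term + " exceeds length of provided text sized " + text.length();
            }
        }
        return null;
    }

    /** Mirrors the order in the post: both streams are requested before either is consumed. */
    static String createBothThenCheck(String title, String content) {
        ReusingAnalyzer analyzer = new ReusingAnalyzer();
        SharedStream titleStream = analyzer.tokenStream(title);
        analyzer.tokenStream(content); // same object: titleStream now reads the content
        return checkOffsets(titleStream, title);
    }

    /** Alternative order: the title stream is consumed before the content stream is requested. */
    static String checkOneAtATime(String title, String content) {
        ReusingAnalyzer analyzer = new ReusingAnalyzer();
        String titleError = checkOffsets(analyzer.tokenStream(title), title);
        String contentError = checkOffsets(analyzer.tokenStream(content), content);
        return titleError != null ? titleError : contentError;
    }

    public static void main(String[] args) {
        String title = "Lucene";                 // made-up sample, 6 characters
        String content = "Lucene 4.1 highlight"; // made-up sample
        System.out.println(createBothThenCheck(title, content));
        System.out.println(checkOneAtATime(title, content));
    }
}
```

If that assumption holds for the analyzer in use, the corresponding change in getArticles would be to call highlighter.getBestFragment(titleTokenStream, title) before creating contentTokenStream, so that each stream is fully consumed before the next one is requested.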

panhan

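I have completely dropped the old low-version way of writing this, even though it compiled without problems.
I now follow the approach used in the bundled example, with the default analyzer. It compiles, but searching multiple fields with MultiFieldQueryParser has a problem: the number of hits is noticeably lower.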

 

稀巴烂
How was the problem above solved? I am running into a similar issue now.
jinfan1115
I have also hit a similar problem over the last couple of days.
I am using solr 4.7.2 and have added pinyin to the index. How can this be solved?