It’s no secret that Machine Learning is revolutionizing many industries. This is equally true in search, where companies exhaust themselves capturing nuance through manually tuned search relevance. Mature search organizations want to get past the “good enough” of manual tuning to build smarter, self-learning search systems.

That’s why we’re excited to release the Elasticsearch Learning to Rank Plugin. What is Learning to Rank? With Learning to Rank, a team trains a Machine Learning model to learn what users deem relevant.

When implementing Learning to Rank, you need to:

  • Measure what users deem relevant through analytics to build a judgment list grading documents as exactly relevant, moderately relevant, or not relevant for queries.

  • Hypothesize which features might help predict relevance, such as the TF*IDF of specific field matches, recency, personalization for the searching user, etc.

  • Train a model that can accurately map features to a relevance score.

  • Deploy the model to your search infrastructure, using it to rank search results in production.

Don’t fool yourself. Underneath each of these steps lie complex, hard technical and non-technical problems. There’s still no silver bullet. As we mention in Relevant Search, manual tuning of search results comes with many of the same challenges as a good learning to rank solution. We’ll have more to say about the many infrastructure, technical, and non-technical challenges of mature learning to rank solutions in future blog posts.

In this blog post, I want to tell you about our work to integrate learning to rank within Elasticsearch. Clients ask us in nearly every relevance consulting engagement whether or not this technology can help them. However, while there’s a clear path in Solr thanks to Bloomberg, there hasn’t been one in Elasticsearch. Many clients want the modern affordances of Elasticsearch, but find this a crucial missing piece to selecting the technology for their search stack.

Indeed, Elasticsearch’s Query DSL can rank results with tremendous power and sophistication. A skilled relevance engineer can use the Query DSL to compute a broad variety of query-time features that might signal relevance, giving quantitative answers to questions like the following (a code sketch follows the list):

  1. How much is the search term mentioned in the title?

  2. How long ago was the article/movie/etc. published?

  3. How does the document relate to the user’s browsing behaviors?

  4. How expensive is this product relative to a buyer’s expectations?

  5. How conceptually related is the user’s search term to the subject of the article?
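
To make this concrete, here is a minimal sketch of what two such feature queries might look like when issued from Python with the elasticsearch-py client. The tmdb index and the title and release_date fields are illustrative assumptions, not anything the plugin requires.

# A minimal sketch of two query-time feature queries, assuming a
# hypothetical "tmdb" index with "title" and "release_date" fields.
from elasticsearch import Elasticsearch

es = Elasticsearch()

# Question 1: how well do the user's keywords match the title?
# The _score of this match query is a TF*IDF-style relevance signal.
title_match = {"query": {"match": {"title": "rambo"}}}

# Question 2: how long ago was the movie released? A gaussian decay
# turns recency into a score that fades with age.
recency = {
    "query": {
        "function_score": {
            "query": {"match_all": {}},
            "gauss": {"release_date": {"origin": "now", "scale": "3650d"}}
        }
    }
}

for name, q in [("title_match", title_match), ("recency", recency)]:
    hits = es.search(index="tmdb", body=q)["hits"]["hits"]
    # Each hit's _score is that feature's value for that document.
    print(name, [(h["_id"], round(h["_score"], 2)) for h in hits[:3]])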

Many of these features aren’t static properties of the documents in the search engine. Instead, they are query-dependent, meaning that they measure some relationship between the user or their query and a document. To readers of Relevant Search, this is what we term signals in that book.

So, the question becomes, how can we marry the power of machine learning with existing power of the Elasticsearch Query DSL? That’s exactly what our plugin does: use Elasticsearch Query DSL queries as feature inputs to a Machine Learning model.

How Does It Work?

The plugin integrates RankLib and Elasticsearch. RankLib takes as input a file with judgments and outputs a model in its own native, human-readable format. RankLib lets you train models either programmatically or via the command line. Once you have a model, the Elasticsearch plugin contains the following:

  • A custom Elasticsearch script language called ranklib that can accept RankLib-generated models as Elasticsearch scripts.

  • A custom ltr query that takes a list of Query DSL queries (the features) and a model name (the model stored via the ranklib script language above) and scores results.
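
As a rough sketch of the first piece: a model file produced by RankLib can be stored in Elasticsearch as a script in the ranklib script language. The call below assumes the Elasticsearch 5.x-era stored-script endpoint and a model named dummy (matching the rescore example that follows); check the plugin’s README for the exact invocation.

import requests

# Read a model previously trained by RankLib, in its native,
# human-readable format.
with open("model.txt") as f:
    model = f.read()

# Store it as a stored script in the "ranklib" script language.
# (Assumes the 5.x-style /_scripts/{lang}/{id} endpoint; the name
# "dummy" matches the rescore example below.)
resp = requests.post("http://localhost:9200/_scripts/ranklib/dummy",
                     json={"script": model})
resp.raise_for_status()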

As learning to rank models can be expensive to evaluate, you almost never want to use the ltr query directly. Rather, you would rescore the top N results, such as:

{
 "query": { /*a simple base query goes here*/ },
 "rescore": {
  "window_size": 100,
  "query": {
   "rescore_query": {
    "ltr": {
     "model": {
      "stored": "dummy"
     },
     "features": [{
        "match": {
         "title": < users keyword search >
        }
       }, ...]
    }
   }
  }
 }
}
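
Sent from Python with the elasticsearch-py client, the full request might look like this sketch; the _all base query, the tmdb index, and the second feature are illustrative assumptions:

from elasticsearch import Elasticsearch

es = Elasticsearch()
keywords = "rambo"  # the user's keyword search

body = {
    "query": {"match": {"_all": keywords}},   # a simple base query
    "rescore": {
        "window_size": 100,                   # rescore only the top 100
        "query": {
            "rescore_query": {
                "ltr": {
                    "model": {"stored": "dummy"},
                    "features": [
                        {"match": {"title": keywords}},
                        {"match": {"overview": keywords}}  # illustrative extra feature
                    ]
                }
            }
        }
    }
}

results = es.search(index="tmdb", body=body)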

You can dig into a fully functioning example in the scripts directory of the project. It’s a canned example, using hand-created judgments of movies from TMDB. I use an Elasticsearch index with TMDB to execute queries corresponding to features, augment a judgment file with the relevance scores of those queries and features, and train a Ranklib model at the command line. I store the model in Elasticsearch and provide a script to search using the model.

Don’t be fooled by the simplicity of this example. The reality of a real learning to rank solution is a tremendous amount of work, including studying users, processing analytics, data engineering, and feature engineering. I say that not to dissuade you, because the payoff can be worth it; just know what you’re getting into. Smaller organizations might still do better with the ROI of hand-tuned results.

Training and Loading the Learning to Rank Model

Let’s start with the hand-created, minimal judgment list I’ve provided to show how our example trains a model.

Ranklib judgment lists come in a fairly standard format. The first column contains the judgment (0-4) for a document. The next column is a query id, such as “qid:1.” The subsequent columns contain the values of the features associated with that query-document pair. On the left-hand side is the 1-based index of the feature. To the right of that number is the value for the feature. The example in the Ranklib README is:

3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A
2 qid:1 1:0 2:0 3:1 4:0.1 5:1 # 1B
1 qid:1 1:0 2:1 3:0 4:0.4 5:0 # 1C
1 qid:1 1:0 2:0 3:1 4:0.3 5:0 # 1D
1 qid:2 1:0 2:0 3:1 4:0.2 5:0 # 2A

Notice also the comment (# 1A, etc.). That comment is the document identifier for this judgment. The document identifier isn’t needed by Ranklib, but it’s fairly handy for human readers. As we’ll see, it’s useful for us as well when we gather features via Elasticsearch queries.
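
As a quick illustration of the format (not part of the plugin or RankLib), one judgment line can be unpacked in a few lines of Python:

def parse_judgment(line):
    """Parse one RankLib judgment line, e.g. '3 qid:1 1:1 2:1 ... # 1A'."""
    data, _, doc_id = line.partition("#")
    tokens = data.split()
    grade = int(tokens[0])
    qid = tokens[1].split(":")[1]
    # Remaining tokens are 1-based featureIndex:value pairs.
    features = {int(k): float(v) for k, v in (t.split(":") for t in tokens[2:])}
    return grade, qid, features, doc_id.strip()

print(parse_judgment("3 qid:1 1:1 2:1 3:0 4:0.2 5:0 # 1A"))
# (3, '1', {1: 1.0, 2: 1.0, 3: 0.0, 4: 0.2, 5: 0.0}, '1A')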

Our example starts with a minimal version of the above file (seen here): a trimmed-down judgment file that simply has grade, query id, and document id tuples, like so:

4 qid:1 # 7555
3 qid:1 # 1370
3 qid:1 # 1369
3 qid:1 # 1368
0 qid:1 # 136278
...

As above, we provide the Elasticsearch _id for the graded document as the comment on each line.

We need to enhance this a bit further. We must map each query id (qid:1) to an actual keyword query (“Rambo”) so we can use the keyword to generate feature values. We provide this mapping in the header, which the example code will pull out:

# Add your keyword strings below, the feature script will
# use them to populate your query templates
#
# qid:1: rambo
# qid:2: rocky
# qid:3: bullwinkle
#
# https://sourceforge.net/p/lemur/wiki/RankLib%20File%20Format/
#
4 qid:1 # 7555
3 qid:1 # 1370
3 qid:1 # 1369
3 qid:1 # 1368
0 qid:1 # 136278
...
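
The actual parsing lives in the example’s train.py; this regex-based sketch is just illustrative of how the qid-to-keyword mapping could be pulled out of those header comments:

import re

def parse_keywords(judgment_file):
    """Map each query id in the header comments to its keyword string."""
    keywords = {}
    with open(judgment_file) as f:
        for line in f:
            # Header lines look like: "# qid:1: rambo"
            m = re.match(r"#\s*qid:(\d+):\s*(.+)", line)
            if m:
                keywords[m.group(1)] = m.group(2).strip()
    return keywords

# e.g. {'1': 'rambo', '2': 'rocky', '3': 'bullwinkle'}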

To help clear up some confusion, I’m going to start referring to ranklib “queries” (the qid:1, etc.) as “keywords” to differentiate them from Elasticsearch Query DSL “queries,” which are Elasticsearch-specific constructs used to generate feature values.

What’s above isn’t a complete Ranklib judgment list. It’s just a minimal sample of relevance grades for given documents for a given keyword search. To be a fully-fledged training set, it needs to include the feature values shown earlier, the 1:0 2:1 … values included on each line of the first judgment list shown.

To generate those feature values, we also need to have proposed features that might correspond to relevance for movies. These, as we said, are Elasticsearch queries. The scores for these Elasticsearch queries will finish filling out the judgment list above. In the example, we do this using a jinja template corresponding to each feature number. For example, the file 1.json.jinja is the following Query DSL query:

{ "query": { "match": { "title": "" } } }

In other words, we’ve decided that feature 1 for our movie search system ought to be the TF*IDF relevance score for the user’s keywords when matched against the title field. There’s also 2.json.jinja, which performs a more complex search across multiple text fields:

{ "query": { "multi_match": { "query": "", "type": "cross_fields", "fields": ["overview", "genres.name", "title", "tagline", "belongs_to_collection.name", "cast.name", "directors.name"], "tie_breaker": 1.0 } } }

Part of the fun of learning to rank is hypothesizing what features might correlate with relevance. In the example, you can change features 1 and 2 to any Elasticsearch query. You can also experiment by adding additional features, 3 through however many you need. There are problems with too many features, though, as you’ll want enough representative training samples to cover all reasonable feature values. We’ll discuss training and testing learning to rank models further in a future blog post.

With these two ingredients, the minimal judgment list and a set of proposed Query DSL queries/features, we need to generate a fully fleshed-out judgment list for Ranklib and load the Ranklib-generated model into Elasticsearch to be used. This means:

  1. Getting relevance scores for features for each keyword/document pair; that is, issuing queries to Elasticsearch to log relevance scores.

  2. Outputting a full judgment file, not only with grades and keyword query ids but also with the feature values from step 1.

  3. Running Ranklib to train the model.

  4. Loading the model into Elasticsearch for use at search time.

The code to do this is all bundled up in train.py, which I encourage you to take apart. To run it, you’ll need:

  • RankLib.jar downloaded to the scripts folder.

  • The Python packages elasticsearch and Jinja2 installed (there’s a Python requirements.txt if you’re familiar).
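
To give a feel for steps 1 and 2, the heart of that flow might look something like the sketch below. This is a simplification, not the literal contents of train.py; the tmdb index and the keywords template variable are assumptions carried over from the earlier examples.

import json
from elasticsearch import Elasticsearch
from jinja2 import Template

es = Elasticsearch()

def feature_value(feature_num, keywords, doc_id):
    """Relevance score of one feature query for one graded document."""
    with open("%d.json.jinja" % feature_num) as f:
        rendered = Template(f.read()).render(keywords=keywords)
    feature_query = json.loads(rendered)["query"]
    body = {
        "query": {
            "bool": {
                "must": feature_query,                    # scored feature query
                "filter": {"ids": {"values": [doc_id]}}   # pin to the graded doc
            }
        }
    }
    hits = es.search(index="tmdb", body=body)["hits"]["hits"]
    return hits[0]["_score"] if hits else 0.0

# For each graded (grade, qid, doc_id) tuple, emit a full training line,
# e.g. "4 qid:1 1:12.31 2:9.05 # 7555", then hand the file to RankLib.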
