Elasticsearch fuzzy phrases

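I have the following query to add fuzziness to my search. However, I now realize that the match query does not consider the order of the words in the search string the way match_phrase does, and I can't get match_phrase to give me fuzzy results. Is there a way to tell match to take the order of the words, and the distance between them, into account?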

{
    "query": {
        "match": {
            "content": {
                "query": "some search terms like this",
                "fuzziness": 1,
                "operator": "and"
            }
        }
    }
}

3 Answers

  • 0

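    I eventually found that I needed a combination of span queries, which allows a great deal of fine-tuning of both fuzziness and slop. I had to add a function that manually tokenizes my phrase and programmatically appends one clause per word to the "clauses" array: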

    {"query":
    {
      "span_near": {
        "clauses": [
          {
            "span_multi": {
              "match": {
                "fuzzy": {
                  "content": {
                    "fuzziness": "2",
                    "value": "word"
                  }
                }
              }
            }
          },
          {
            "span_multi": {
              "match": {
                "fuzzy": {
                  "content": {
                    "fuzziness": "2",
                    "value": "another"
                  }
                }
              }
            }
          }                   
        ],
        "slop": 1,
        "in_order": true
      }
    }
    }
    
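The tokenizing helper mentioned above is not shown in the answer; a minimal Python sketch of what it might look like follows. The function name `fuzzy_phrase_query`, its parameters, and the lowercase whitespace split are assumptions for illustration. Since fuzzy is a term-level query whose input is not analyzed, the split should mirror whatever analyzer the field actually uses.

```python
import json


def fuzzy_phrase_query(phrase, field, fuzziness=2, slop=1):
    """Build a span_near query with one fuzzy span_multi clause per word.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Assumption: the field is indexed with a lowercasing, whitespace-like
    # analyzer, so a lowercase split approximates its tokens.
    words = phrase.lower().split()
    clauses = [
        {"span_multi": {"match": {"fuzzy": {field: {"value": word, "fuzziness": fuzziness}}}}}
        for word in words
    ]
    return {
        "query": {
            "span_near": {"clauses": clauses, "slop": slop, "in_order": True}
        }
    }


if __name__ == "__main__":
    print(json.dumps(fuzzy_phrase_query("some search terms", "content"), indent=2))
```

Raising `slop` loosens how far apart the words may be, while `fuzziness` controls the edit distance allowed per word, so the two can be tuned independently.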
  • 1

    @econgineer Excellent post.

    I wanted to try this with the ES queries we are working on, but I was too lazy to keep writing the JSON by hand....

    I think this code works. It builds the query with nested defaultdicts and prints it with json.dumps, so the output is valid JSON for both jq and Elasticsearch.

    import json
    from collections import defaultdict

    # Autovivifying dict: reading a missing key creates another nested dict.
    nested_dict = lambda: defaultdict(nested_dict)
    query = nested_dict()
    # slop and in_order belong inside span_near, next to the clauses.
    query['query']['span_near']['clauses'] = list()
    query['query']['span_near']['slop'] = 1
    query['query']['span_near']['in_order'] = True

    words = ['what', 'is', 'this']
    for w in words:
        nest = nested_dict()
        # "value" and "fuzziness" sit directly under the field name ("msg").
        nest["span_multi"]["match"]["fuzzy"]["msg"]["value"] = w
        nest["span_multi"]["match"]["fuzzy"]["msg"]["fuzziness"] = "2"
        query['query']['span_near']['clauses'].append(nest)

    # json.dumps writes double-quoted JSON; pprint would print Python repr instead.
    print(json.dumps(query, indent=2, sort_keys=True))

    If you save the output as t2.json and pretty-print it

    jq '.' t2.json

    you should see something like

    {
      "query": {
        "span_near": {
          "clauses": [
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "msg": {
                      "fuzziness": "2",
                      "value": "what"
                    }
                  }
                }
              }
            },
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "msg": {
                      "fuzziness": "2",
                      "value": "is"
                    }
                  }
                }
              }
            },
            {
              "span_multi": {
                "match": {
                  "fuzzy": {
                    "msg": {
                      "fuzziness": "2",
                      "value": "this"
                    }
                  }
                }
              }
            }
          ],
          "in_order": true,
          "slop": 1
        }
      }
    }
    

    Then querying ES is just the usual call:

    curl --silent -H 'Content-Type: application/json' 'My_ES_Server:9200/INDEX/_search' -d @t2.json
    

    Thank you very much for the initial guidance; I hope others find this useful.
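The curl call above can also be prepared from Python with only the standard library. This is a rough sketch under stated assumptions: `build_search_request` is a hypothetical helper name, and the host and index are the placeholders from the curl example. It only constructs the request, so nothing is sent until `urllib.request.urlopen` is called on it.

```python
import json
import urllib.request


def build_search_request(base_url, index, query_body):
    """Prepare (but do not send) a POST to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host and index taken from the curl example; adjust both.
    body = {
        "query": {
            "span_near": {
                "clauses": [
                    {"span_multi": {"match": {"fuzzy": {"msg": {"value": "what", "fuzziness": "2"}}}}}
                ],
                "slop": 1,
                "in_order": True,
            }
        }
    }
    req = build_search_request("http://My_ES_Server:9200", "INDEX", body)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send the request.
```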

  • 13

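    Indeed, a great question and great answers. I'm surprised that this kind of "fuzzy phrase matching" is not supported out of the box.

    Here is tested NodeJS code that generates the fuzzy phrase match (multi-clause) query block in the context of a multi search (msearch); it should work the same way for a single search.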

    Usage:

    let queryBody = [];
    // Fill the body first: each search is a header object followed by a query object.
    queryBody.push({ index: 'YOUR_INDEX' });
    queryBody.push(createESFuzzyPhraseQueryBlock('YOUR PHRASE', 'YOUR_FIELD_NAME', 2));   // 2 <- fuzziness

    client.msearch({
       body: queryBody
    })
    

    Functions:

    const createESFuzzyPhraseClauseBlock = (word, esFieldName, fuzziness) => {
        // Build the clause as an object literal so that quotes or backslashes
        // in the word cannot break JSON parsing.
        return {
            "span_multi": {
                "match": {
                    "fuzzy": {
                        [esFieldName]: {
                            "fuzziness": fuzziness,
                            "value": word
                        }
                    }
                }
            }
        };
    };
    
    
    const createESFuzzyPhraseQueryBlock = (phrase, esFieldName, fuzziness) => {
        let clauses = [];
    
        // Split on any run of whitespace so double spaces do not yield empty words.
        let words = phrase.trim().split(/\s+/);
        words.forEach(word => clauses.push(createESFuzzyPhraseClauseBlock(word, esFieldName, fuzziness)));
    
        let queryBlock =
            {
                "query":
                    {
                        "span_near": {
                            "clauses": clauses,
                            "slop": 1,
                            "in_order": true
                        }
                    }
            };
    
        return queryBlock;
    };
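
The NodeJS client turns the array of header and query objects into the request body for you. For readers working in Python, as in the previous answer, here is a rough sketch of the same header/body pairing written out as the newline-delimited JSON that the _msearch endpoint expects. `msearch_payload` is a hypothetical helper name, and the index and field names in the usage are placeholders.

```python
import json


def msearch_payload(searches):
    """Serialize (index, query_body) pairs into an _msearch NDJSON payload."""
    lines = []
    for index, body in searches:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(body))              # search body line
    # The newline-delimited format must end with a trailing newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    clause = {"span_multi": {"match": {"fuzzy": {"YOUR_FIELD_NAME": {"value": "phrase", "fuzziness": 2}}}}}
    body = {"query": {"span_near": {"clauses": [clause], "slop": 1, "in_order": True}}}
    print(msearch_payload([("YOUR_INDEX", body)]), end="")
```

Each search contributes exactly two lines, so several fuzzy phrase queries can be batched by passing more pairs.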
    
