
Guidance on how to index words annotated with their type (entity, etc.), and then have Elasticsearch / whatever return those words with their annotations?

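I'm trying to build a very simple NLP chat (I might even call it pseudo-NLP?), in which I want to recognize a fixed subset of intents (verbs, sentiments) and entities (products, etc.).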

It's a kind of entity identification or named-entity recognition, but I'm not sure I need a full-blown NER solution to achieve what I want. I don't need to handle someone typing "cars" instead of "car": they HAVE to type the EXACT word. So no language processing is needed here.

It doesn't need to identify and classify the words. I'm just looking for a way that, when I search with a phrase, it returns every indexed result that matches one of the phrase's words.

I want to index something like:

want [type: intent]
buy [type: intent]
computer [type: entity]
car [type: entity]

Then the user will type:

I want to buy a car.

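Then I send this phrase to Elasticsearch / Solr / whatever, and it should return something like the following (it doesn't have to be structured exactly like that, but each word should come back with its type):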

[
    {"word":"want", "type":"intent"},
    {"word":"buy", "type":"intent"},
    {"word":"car", "type":"entity"}
]
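For comparison, an exact-match lookup like this doesn't strictly need a search engine. A minimal Python sketch, assuming a hypothetical in-memory `VOCABULARY` dict built from the word list above: it lowercases the phrase, splits it into word tokens, and keeps only the tokens found in the dictionary.

```python
import re

# Hypothetical vocabulary mirroring the list above: exact word -> type.
VOCABULARY = {
    "want": "intent",
    "buy": "intent",
    "computer": "entity",
    "car": "entity",
}


def annotate(phrase: str) -> list[dict[str, str]]:
    """Return every vocabulary word found in the phrase, with its type.

    Matching is exact per token: the phrase is lowercased and split on
    non-word characters, so "cars" does NOT match "car".
    """
    results = []
    for token in re.findall(r"\w+", phrase.lower()):
        word_type = VOCABULARY.get(token)
        if word_type is not None:
            results.append({"word": token, "type": word_type})
    return results


print(annotate("I want to buy a car."))
# [{'word': 'want', 'type': 'intent'}, {'word': 'buy', 'type': 'intent'}, {'word': 'car', 'type': 'entity'}]
```

Because each match is a plain dictionary hit, "cars" will not match "car", which is the exact-word behaviour described above; drop the `.lower()` call if case should matter as well.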

The approach I came up with is to index each word as its own document:

{
    "word": "car",
    "type": "entity"
}
{
    "word": "buy",
    "type": "intent"
}
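Indexing the words one request at a time works, but Elasticsearch's `_bulk` endpoint accepts them all at once as newline-delimited JSON. The sketch below only builds that body; the index name `nlpindex` and the helper `bulk_body` are assumptions for illustration, and using each word as its `_id` is a design choice (re-indexing the same word overwrites it instead of creating a duplicate).

```python
import json

# Hypothetical word list; "nlpindex" is an assumed index name.
WORDS = [("want", "intent"), ("buy", "intent"), ("computer", "entity"), ("car", "entity")]


def bulk_body(index: str, words) -> str:
    """Build an NDJSON body for the _bulk endpoint.

    Each document takes two lines: an action line, then the source line.
    The whole body must end with a newline.
    """
    lines = []
    for word, word_type in words:
        # Using the word itself as _id makes re-indexing idempotent.
        lines.append(json.dumps({"index": {"_index": index, "_id": word}}))
        lines.append(json.dumps({"word": word, "type": word_type}))
    return "\n".join(lines) + "\n"


print(bulk_body("nlpindex", WORDS), end="")
```

The resulting string can be sent as `POST _bulk` with the `Content-Type: application/x-ndjson` header.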

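Then I pass the whole phrase and search on the "word" field. But so far I haven't had any success: Elasticsearch doesn't return any words, even though the phrase contains words that are indexed.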

Any insights/ideas/tips for doing this with one of the main search engines?

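If I do need a dedicated NER solution, what is the way to annotate words like this without having to worry about fixing typos, multiple languages, etc.? I only want results when people type the intents and entities exactly as they are, so this is not an advanced NLP solution.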

Strangely, I couldn't find anything about this on Google.

1 Answer


    I created a basic index and indexed a few documents like this one:

    PUT nlpindex/mytype/3
    {
        "word": "buy",
        "type": "intent"
    }
    

    I used a query_string query to search for all the words that appear in the phrase:

    GET nlpindex/_search
    {
      "query": {
        "query_string": {
          "query": "I want to buy a car",
          "default_field": "word"
        }
      }
    }
    

    By default, the operator (default_operator) is OR, so it searches the word field for every word in the phrase.
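To see why the default OR matters here, the following Python sketch simulates the matching in memory. It is only an approximation: `tokenize` stands in for Elasticsearch's standard analyzer (lowercased word tokens), and `DOCS` / `search` are hypothetical names. Because each document holds a single word, requiring every term of the phrase (AND) can never match a multi-word phrase, while OR returns each document whose word occurs in it.

```python
import re

# Hypothetical in-memory stand-in for the indexed documents: one word each.
DOCS = [
    {"word": "want", "type": "intent"},
    {"word": "buy", "type": "intent"},
    {"word": "computer", "type": "entity"},
    {"word": "car", "type": "entity"},
]


def tokenize(text: str) -> list[str]:
    # Rough approximation of the standard analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def search(phrase: str, operator: str = "OR") -> list[dict[str, str]]:
    """Simulate which single-word docs a query_string search would match."""
    query_terms = tokenize(phrase)
    hits = []
    for doc in DOCS:
        doc_terms = set(tokenize(doc["word"]))
        if operator == "OR":
            # A doc matches if ANY query term appears in it.
            matched = any(term in doc_terms for term in query_terms)
        else:
            # "AND": every query term must appear in the same document.
            matched = all(term in doc_terms for term in query_terms)
        if matched:
            hits.append(doc)
    return hits


print([d["word"] for d in search("I want to buy a car")])         # ['want', 'buy', 'car']
print([d["word"] for d in search("I want to buy a car", "AND")])  # []
```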

    Here is the result I got:

    "hits": [
         {
            "_index": "nlpindex",
            "_type": "mytype",
            "_id": "1",
            "_score": 0.09427826,
            "_source": {
               "word": "car",
               "type": "entity"
            }
         },
         {
            "_index": "nlpindex",
            "_type": "mytype",
            "_id": "4",
            "_score": 0.09427826,
            "_source": {
               "word": "want",
               "type": "intent"
            }
         },
         {
            "_index": "nlpindex",
            "_type": "mytype",
            "_id": "3",
            "_score": 0.09427826,
            "_source": {
               "word": "buy",
               "type": "intent"
            }
         }
      ]
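The hits come back ordered by score, not by position in the phrase, and each one wraps the word in `_source`. A minimal sketch for turning the response into the word/type list asked for in the question, assuming the usual `hits.hits` envelope around the array shown above; `RESPONSE` is a trimmed, hand-copied version of that output and `annotations` is a hypothetical helper.

```python
import json
import re

# Trimmed copy of the hits above, wrapped in the top-level
# {"hits": {"hits": [...]}} envelope of a search response.
RESPONSE = json.loads("""
{"hits": {"hits": [
  {"_id": "1", "_score": 0.09427826, "_source": {"word": "car", "type": "entity"}},
  {"_id": "4", "_score": 0.09427826, "_source": {"word": "want", "type": "intent"}},
  {"_id": "3", "_score": 0.09427826, "_source": {"word": "buy", "type": "intent"}}
]}}
""")


def annotations(response: dict, phrase: str) -> list[dict[str, str]]:
    """Map each hit to {"word", "type"}, ordered as the words appear in the phrase."""
    found = {hit["_source"]["word"]: hit["_source"]["type"] for hit in response["hits"]["hits"]}
    # Hits are sorted by score, so re-order them by position in the phrase.
    tokens = re.findall(r"\w+", phrase.lower())
    return [{"word": t, "type": found[t]} for t in tokens if t in found]


print(annotations(RESPONSE, "I want to buy a car"))
# [{'word': 'want', 'type': 'intent'}, {'word': 'buy', 'type': 'intent'}, {'word': 'car', 'type': 'entity'}]
```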
    

    Does this help?
