Facets tokenize tags with spaces. Is there a solution?

I have a problem with facets tokenizing tags that contain spaces.

I have the following mapping:

```
curl -XPOST "http://localhost:9200/pictures" -d '
{
  "mappings" : {
    "pictures" : {
      "properties" : {
        "id": { "type": "string" },
        "description": {"type": "string", "index": "not_analyzed"},
        "featured": { "type": "boolean" },
        "categories": { "type": "string", "index": "not_analyzed" },
        "tags": { "type": "string", "index": "not_analyzed", "analyzer": "keyword" },
        "created_at": { "type": "double" }
      }
    }
  }
}'
```

My data is:

```
curl -X POST "http://localhost:9200/pictures/picture" -d '{
  "picture": {
    "id": "4defe0ecf02a8724b8000047",
    "title": "Victoria Secret PhotoShoot",
    "description": "From France and Italy",
    "featured": true,
    "categories": [
      "Fashion",
      "Girls"
    ],
    "tags": [
      "girl",
      "photoshoot",
      "supermodel",
      "Victoria Secret"
    ],
    "created_at": 1405784416.04672
  }
}'
```

My query is:

```
curl -X POST "http://localhost:9200/pictures/_search?pretty=true" -d '
{
  "query": {
    "text": {
      "tags": {
        "query": "Victoria Secret"
      }
    }
  },
  "facets": {
    "tags": {
      "terms": {
        "field": "tags"
      }
    }
  }
}'
```

The output is:

```
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 0,
      "other" : 0,
      "terms" : [ ]
    }
  }
}
```

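Now I get 0 in the facets and total: 0 in hits.
Any idea why it doesn't work?
I know that if I remove the keyword analyzer from tags and leave it as just "not_analyzed", I get results.
But there is still a case-sensitivity problem.
If I run the same query as above after removing the keyword analyzer, the result I get is: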

```
"facets" : {
  "tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 12,
    "other" : 0,
    "terms" : [
      { "term" : "photoshoot", "count" : 1 },
      { "term" : "girl", "count" : 1 },
      { "term" : "Victoria Secret", "count" : 1 },
      { "term" : "supermodel", "count" : 1 }
    ]
  }
}
```

Here, with "not_analyzed", Victoria Secret is counted as a single term including the space, which is what I want, but it is case sensitive: when I query in lowercase as "victoria secret", I get no results.

Any suggestions??

Thanks,
Suraj

2 Answers

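The first example isn't entirely clear to me. Using the KeywordAnalyzer means the field is indexed as-is, which is the same as not analyzing the field at all, and not analyzing it makes more sense. The mapping you posted contains both: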
```
"index": "not_analyzed", "analyzer": "keyword"
```
That doesn't make much sense. If you don't analyze the field, why choose an analyzer for it?

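Apart from that, if you don't analyze the field, the tag Victoria Secret is of course indexed as-is, so the query victoria secret won't match. If you want it to be case-insensitive, you need to define a custom analyzer that uses the KeywordTokenizer, since you don't want to tokenize the value, together with the LowercaseTokenFilter. You can define custom analyzers in the analysis section of the index settings and then use them in your mapping. With that alone, though, the facet terms would always be lowercase, which is why it's better to define a multi field and index the field twice with different text analysis: one sub-field for faceting and one for searching.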
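To make the difference between these options concrete, here is a toy Python model of which terms each one writes into the index for the question's tags. It is not Elasticsearch code, and the function names are hypothetical: `not_analyzed` and the `keyword` analyzer are modeled as emitting the whole value unchanged, `lowercase_analyzer` as the keyword tokenizer followed by a lowercase filter, and the default analyzer is only approximated (an assumption; the real standard analyzer does more) as lowercasing and splitting on non-alphanumeric characters.

```python
import re


def keyword(value):
    """not_analyzed / keyword analyzer: the whole value is one token, unchanged."""
    return [value]


def lowercase_analyzer(value):
    """Keyword tokenizer followed by a lowercase filter: one lowercased token."""
    return [token.lower() for token in keyword(value)]


def standard_like(value):
    """Rough stand-in for the default analyzer: split on non-alphanumerics, lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


tags = ["girl", "photoshoot", "supermodel", "Victoria Secret"]

for name, analyzer in [("not_analyzed", keyword),
                       ("lowercase_analyzer", lowercase_analyzer),
                       ("standard-like", standard_like)]:
    # Distinct index terms, i.e. what a terms facet on that field would count.
    terms = sorted({t for tag in tags for t in analyzer(tag)})
    print(f"{name:<20}{terms}")
```

In this model the standard-like row splits the tag into victoria and secret (the original facet problem), the other two keep it whole, and only `lowercase_analyzer` makes the term case-insensitive, at the cost of lowercase facet labels.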

You can create the index like this:
```
curl -XPOST "http://localhost:9200/pictures" -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
              "lowercase_analyzer" : {
                "type" : "custom",
                "tokenizer" : "keyword",
                "filter" : [ "lowercase"]
              }
            }
        }
    },
    "mappings" : {
        "pictures" : {
            "properties" : {
                "id": { "type": "string" },
                "description": {"type": "string", "index": "not_analyzed"},
                "featured": { "type": "boolean" },
                "categories": { "type": "string", "index": "not_analyzed" },
                "tags" : {
                    "type" : "multi_field",
                    "fields" : {
                        "tags": { "type": "string", "analyzer": "lowercase_analyzer" },
                        "facet": {"type": "string", "index": "not_analyzed"}
                    }
                },
                "created_at": { "type": "double" }
            }
        }
    }
}'
```
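Then, when you search on that field, the custom lowercase_analyzer is also applied to the text query by default, so you can search for either Victoria Secret or victoria secret and get results. You need to change the facet part so that it facets on the new tags.facet field, which is not analyzed.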
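The query-time side of this can be sketched with a small in-memory model, again hypothetical and not Elasticsearch's implementation: the searched sub-field `tags` stores terms produced by the keyword tokenizer plus lowercase filter, the faceted sub-field `tags.facet` stores the raw values, and a search analyzes the query string with the same analyzer before looking up terms.

```python
from collections import Counter


def keyword(value):
    """not_analyzed: the whole value is one term, unchanged."""
    return [value]


def lowercase_analyzer(value):
    """Keyword tokenizer + lowercase filter: the whole value, lowercased."""
    return [value.lower()]


# Hypothetical toy index for the multi_field: each document stores the terms
# of both sub-fields, "tags" (searched) and "tags.facet" (faceted).
docs = {"4defe0ecf02a8724b8000047": ["girl", "photoshoot", "supermodel", "Victoria Secret"]}
index = {
    doc_id: {
        "tags": [t for tag in tags for t in lowercase_analyzer(tag)],
        "tags.facet": [t for tag in tags for t in keyword(tag)],
    }
    for doc_id, tags in docs.items()
}


def search(query, field="tags", analyzer=lowercase_analyzer):
    """Analyze the query text with the field's analyzer, then match on any term."""
    query_terms = set(analyzer(query))
    return [doc_id for doc_id, fields in index.items() if query_terms & set(fields[field])]


def terms_facet(hits, field="tags.facet"):
    """Count the not_analyzed terms across the matching documents."""
    return Counter(t for doc_id in hits for t in index[doc_id][field])


for q in ["Victoria Secret", "victoria secret", "Victoria"]:
    print(f"{q!r}: {len(search(q))} hit(s)")
print(terms_facet(search("victoria secret")))
```

Both spellings of the full tag match, while the facet still reports Victoria Secret with its original case. The last query shows the trade-off of the keyword tokenizer: only the whole tag matches, so searching for Victoria alone finds nothing.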

Also, you may want to look at the match query, since the text query has been deprecated in the latest elasticsearch version (0.19.9).
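Putting both suggestions together, the search request would use a match query on `tags` and a terms facet on `tags.facet`. The sketch below only builds that request body as JSON; `build_search_body` is a hypothetical helper, and the body assumes the 0.19/0.90-era syntax used in this thread (facets were later replaced by aggregations in newer Elasticsearch versions).

```python
import json


def build_search_body(text):
    """Hypothetical helper: the search body described in the answer above."""
    return {
        # match replaces the deprecated text query; the query text is analyzed
        # with the analyzer of the "tags" sub-field (lowercase_analyzer).
        "query": {"match": {"tags": text}},
        # Facet on the not_analyzed sub-field so counts keep the original case.
        "facets": {"tags": {"terms": {"field": "tags.facet"}}},
    }


# The printed JSON can be passed as the -d argument of the curl _search request.
print(json.dumps(build_search_body("victoria secret"), indent=2))
```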
Answered 2024-04-26T10:36:25+08:00

I think this helps answer the question:

https://gist.github.com/2688072

Answered 2024-04-26T10:36:25+08:00
