Elasticsearch query to return all records - Java Learning Path

380

I have a small database in Elasticsearch and, for testing purposes, I'd like to pull all records back. I am attempting to use a URL of the form...

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this?

23 answers

6
Elasticsearch (ES) supports both GET and POST requests for fetching data from an index in an ES cluster.

With GET:
```
http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*
```
With POST (size defaults to 10 and from defaults to 0):
```
http://localhost:9200/[your_index_name]/_search
{
  "size": [your value],
  "from": [your start index],
  "query": {
    "match_all": {}
  }
}
```
I recommend using the elasticsearch-head UI plugin, http://mobz.github.io/elasticsearch-head/. It helps you get a better feel for the indices you create and lets you test them.
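As a minimal sketch of the two request forms above, the snippet below only builds the GET URL and the equivalent POST body, without contacting a cluster. The helper names get_all_url and match_all_body and the default host http://localhost:9200 are illustrative assumptions, not part of any Elasticsearch client.

```python
import json
from urllib.parse import urlencode


def get_all_url(index, size, host="http://localhost:9200"):
    """Build a GET search URL that matches every document (q=*:*)."""
    # safe="*:" keeps the Lucene query readable instead of percent-encoding it
    query_string = urlencode({"size": size, "q": "*:*"}, safe="*:")
    return f"{host}/{index}/_search?{query_string}"


def match_all_body(size=10, start=0):
    """Build the POST body; the defaults mirror Elasticsearch's (size 10, from 0)."""
    return {"size": size, "from": start, "query": {"match_all": {}}}


print(get_all_url("foo", 100))
print(json.dumps(match_all_body(size=100)))
```

The body is kept as a plain dict so it can be passed to any HTTP client or serialized with json.dumps.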
Answered 2024-05-03T19:01:38+08:00
10
The query below returns the NO_OF_RESULTS you want.
```
curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -H 'Content-Type: application/json' -d '
{
  "query" : {
    "match_all" : {}
  }
}'
```
Now, the problem here is that you want all records returned. Naturally, before writing the query you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in the index? Just run the query below:
```
curl -XGET 'localhost:9200/foo/_search'
```
This gives you a result that looks like this:
```
{
  "hits" : {
    "total" : 2357,
    "hits" : [
      {
        ...
```
The total field tells you how many records are available in the index. So this is a good way to find out the value of NO_OF_RESULTS.
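Reading total programmatically needs one extra step, because its shape changed between versions: in 6.x hits.total is a plain integer, while from 7.0 on it is an object such as {"value": 2357, "relation": "eq"} (a relation of "gte" means the value is only a lower bound). The sketch below assumes the response has already been parsed with json.loads; the name total_hits is hypothetical.

```python
def total_hits(response):
    """Return the document count from a parsed search response (6.x or 7.x+ shape)."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.0+: {"value": N, "relation": "eq" | "gte"}
        return total["value"]
    # 6.x and earlier: a bare integer
    return total


print(total_hits({"hits": {"total": 2357, "hits": []}}))
print(total_hits({"hits": {"total": {"value": 2357, "relation": "eq"}, "hits": []}}))
```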
```
curl -XGET 'localhost:9200/_search'
```
Search all types in all indices
```
curl -XGET 'localhost:9200/foo/_search'
```
Search all types in the foo index
```
curl -XGET 'localhost:9200/foo1,foo2/_search'
```
Search all types in the foo1 and foo2 indices
```
curl -XGET 'localhost:9200/f*/_search'
```
Search all types in any index whose name starts with f
```
curl -XGET 'localhost:9200/_all/type1,type2/_search'
```
Search types type1 and type2 in all indices (typed URLs like this are deprecated in 7.x and removed in 8.0)
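The index part of these URLs follows a simple rule: no target means every index, several targets are comma-separated, and wildcards are allowed. A small sketch of that rule (the function name search_path is hypothetical, and type segments are deliberately left out):

```python
from urllib.parse import quote


def search_path(targets=None):
    """Build the _search path for zero, one or several indices/patterns."""
    if not targets:
        # No target: search across all indices
        return "/_search"
    # Keep commas (multi-target) and * (wildcard) unescaped
    return "/" + quote(",".join(targets), safe=",*") + "/_search"


print(search_path())                  # /_search
print(search_path(["foo1", "foo2"]))  # /foo1,foo2/_search
print(search_path(["f*"]))            # /f*/_search
```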
Answered 2024-05-03T19:01:38+08:00
0
Simple! You can use the size and from parameters!
```
http://localhost:9200/[your index name]/_search?size=1000&from=0
```
Then gradually increase from until you have retrieved all the data.
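The "increase from step by step" loop can be sketched as a generator. fetch_page(start, size) is a hypothetical stand-in for GET /[index]/_search?size=...&from=... returning hits.hits, and an in-memory list plays the cluster so the loop can be exercised. Keep in mind this only works while from + size stays within index.max_result_window (10,000 by default); beyond that the scroll API is needed.

```python
def paginate(fetch_page, page_size=1000):
    """Yield every hit by moving `from` forward one page at a time."""
    start = 0
    while True:
        hits = fetch_page(start, page_size)
        yield from hits
        # A short (or empty) page means there is nothing left to fetch
        if len(hits) < page_size:
            break
        start += page_size


# In-memory stand-in for the index, used only to exercise the loop
documents = [{"_id": str(i)} for i in range(2357)]


def fake_fetch(start, size):
    return documents[start:start + size]


print(sum(1 for _ in paginate(fake_fetch)))  # 2357
```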
Answered 2024-05-03T19:01:38+08:00

This is the best solution I found using the Python client:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Initialize the scroll (search_type='scan' and doc_type are gone in recent versions)
page = es.search(
    index='yourIndex',
    scroll='2m',
    size=1000,
    body={
        # Your query's body
        'query': {'match_all': {}},
    })
sid = page['_scroll_id']
hits = page['hits']['hits']

# Start scrolling; without scan, the first page already contains hits
while hits:
    # Do something with the obtained page
    print("scroll size: " + str(len(hits)))
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the results returned by the last scroll
    hits = page['hits']['hits']

# Free the scroll context on the server
es.clear_scroll(scroll_id=sid)

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

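import static org.elasticsearch.index.query.QueryBuilders.*;

// Imports as in the 5.x/6.x Java API; package locations may differ in other versions
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;

// `client` is an already-connected Client instance
QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch("test")
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC) // _doc order is the cheapest for scrolling
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // at most 100 hits are returned per scroll batch
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.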

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

Answered 2024-05-03T19:01:38+08:00

To query all indices at once (here returning the first 50 hits), you can do the following:

curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'

Output:

{
  "took" : 866,
  "timed_out" : false,
  "_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
  },
  "hits" : {
    "total" : 512034694,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grafana-dash",
      "_type" : "dashboard",
      "_id" : "test",
      "_score" : 1.0,
      ...

Answered 2024-05-03T19:01:38+08:00

5

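You can also use server:9200/_stats to get statistics about all your indices, such as the size and number of documents of each one. This is very useful and gives helpful information.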
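To pull the per-index numbers out of that response, a sketch like the one below walks the indices object of the parsed GET /_stats JSON and reads the primaries' docs.count and store.size_in_bytes. The function name doc_counts is hypothetical, and the sample dict is a hand-trimmed illustration of the response shape, not real output.

```python
def doc_counts(stats):
    """Map each index name to (document count, store size in bytes) from GET /_stats."""
    return {
        name: (
            # Only primary shards, so replicas do not inflate the numbers
            index_stats["primaries"]["docs"]["count"],
            index_stats["primaries"]["store"]["size_in_bytes"],
        )
        for name, index_stats in stats["indices"].items()
    }


sample = {  # hand-trimmed illustration of the _stats response shape
    "indices": {
        "foo": {"primaries": {"docs": {"count": 2357}, "store": {"size_in_bytes": 104857}}},
    }
}
print(doc_counts(sample))  # {'foo': (2357, 104857)}
```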

Answered 2024-05-03T19:01:38+08:00
108
The size param increases the number of hits displayed from the default (10) to 500.
```
http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*
```
Change from step by step to get all the data.
```
http://localhost:9200/[indexName]/_search?size=500&from=0
```
Answered 2024-05-03T19:01:38+08:00
6
```
http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
                                   ^
```
Note the size param, which increases the number of hits displayed from the default (10) to 1000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
Answered 2024-05-03T19:01:38+08:00
0

http://localhost:9200/foo/_search/?size=1000&pretty=1

You need to specify the size query parameter, because the default is 10.

Answered 2024-05-03T19:01:38+08:00

For Elasticsearch 6.x

Request: GET /foo/_search?pretty=true

Response: hits -> total gives the document count

{
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1001,
        "max_score": 1,
        "hits": [
          {

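Answered 2024-05-03T19:01:38+08:00

curl -XGET '{{IP/localhost}}:9200/{{Index name}}/{{type}}/_search?scroll=10m&pretty' -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      }
    }
  }
}'

(The filtered query was removed in Elasticsearch 5.0; on newer versions use "query": {"match_all": {}} directly.)

Answered 2024-05-03T19:01:38+08:00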

-5

curl -X GET 'localhost:9200/foo/_search?q=*&pretty'

Answered 2024-05-03T19:01:38+08:00

0

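By default, Elasticsearch returns 10 records, so the size has to be provided explicitly.

Add size to the request to get the number of records you want.

http://[host]:9200/[index name]/_search?pretty=true&size=(number of records)

Note: from + size (and therefore the maximum page size) cannot exceed the index.max_result_window index setting, which defaults to 10,000.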
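That limit can be checked before a request is ever sent. In this sketch the name check_window is hypothetical and 10,000 only mirrors the setting's default; the real value should be read from the index settings if it has been changed.

```python
DEFAULT_MAX_RESULT_WINDOW = 10_000  # default of index.max_result_window


def check_window(start, size, max_result_window=DEFAULT_MAX_RESULT_WINDOW):
    """Return a from/size body, or raise if it would exceed index.max_result_window."""
    if start + size > max_result_window:
        raise ValueError(
            f"from + size = {start + size} exceeds index.max_result_window "
            f"({max_result_window}); use the scroll API instead"
        )
    return {"from": start, "size": size, "query": {"match_all": {}}}


print(check_window(0, 10_000)["size"])  # 10000
```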

Answered 2024-05-03T19:01:38+08:00
5
The official documentation answers this question! You can find it here.
```
{
  "query": { "match_all": {} },
  "size": 1
}
```
Just replace size (1) with the number of results you want to see!
Answered 2024-05-03T19:01:38+08:00
18
I believe Lucene syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

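size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is larger than your dataset).

However, for large result sets the Elasticsearch documentation suggests using the scan search type.

For example: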
```
curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'
```
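Then keep making requests as suggested in the documentation link above.

EDIT: scan is deprecated as of 2.1.0.

scan provides no benefit over a regular scroll request sorted by _doc. link to elastic docs (found by @christophe-roussy)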
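The replacement for scan can be sketched as the first request of a scroll sorted by _doc: same efficiency, and unlike scan the very first response already contains hits. The helper name scroll_request and its defaults are hypothetical; it only assembles the path, query parameters and JSON body.

```python
import json


def scroll_request(index, batch_size=1000, keep_alive="1m"):
    """Return (path, params, body) for the first request of a _doc-sorted scroll."""
    params = {"scroll": keep_alive}  # how long the scroll context is kept alive
    body = {
        "size": batch_size,          # hits per batch
        "sort": ["_doc"],            # index order: the cheapest sort, replaces scan
        "query": {"match_all": {}},
    }
    return f"/{index}/_search", params, body


path, params, body = scroll_request("foo", batch_size=50, keep_alive="10m")
print(path, params)      # /foo/_search {'scroll': '10m'}
print(json.dumps(body))  # {"size": 50, "sort": ["_doc"], "query": {"match_all": {}}}
```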
Answered 2024-05-03T19:01:38+08:00
0
You can use the _count API to get the value for the size parameter:
```
http://localhost:9200/foo/_count?q=<your query>
```
This returns {count:X, ...}. Extract the value X, then run the actual query:
```
http://localhost:9200/foo/_search?q=<your query>&size=X
```
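The two-step "count, then search" approach can be sketched as follows. The helper names are hypothetical and the count body is a hand-written example; note that if X is larger than index.max_result_window (10,000 by default) the follow-up search will be rejected and scrolling is needed instead.

```python
import json
from urllib.parse import urlencode


def size_from_count(count_json):
    """Extract X from a _count response body {"count": X, ...}."""
    return json.loads(count_json)["count"]


def search_url(index, query, size, host="http://localhost:9200"):
    """Build the follow-up search URL with size set to the counted total."""
    return f"{host}/{index}/_search?" + urlencode({"q": query, "size": size}, safe="*:")


count_body = '{"count": 2357, "_shards": {"total": 5, "successful": 5, "failed": 0}}'
print(search_url("foo", "*:*", size_from_count(count_body)))
# http://localhost:9200/foo/_search?q=*:*&size=2357
```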
Answered 2024-05-03T19:01:38+08:00
16

If you just set size to some big number, Elasticsearch becomes significantly slower. One way to get all documents is to use scan and scroll IDs.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

Answered 2024-05-03T19:01:38+08:00

When you specify size, the most results Elasticsearch returns is 10000 (the default index.max_result_window).

curl -XGET 'localhost:9200/index/type/_search?scroll=1m' -d '
{
   "size":10000,
   "query" : {
   "match_all" : {}
    }
}'

After that, use the Scroll API to fetch the remaining results: take the _scroll_id value from the response and put it in scroll_id.

curl -XGET  'localhost:9200/_search/scroll'  -d'
{
   "scroll" : "1m", 
   "scroll_id" : "" 
}'

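Answered 2024-05-03T19:01:38+08:00

If you want to pull out thousands of records, then... some people gave the right answer of using "scroll". (Note: some people also suggested using "search_type=scan". That was deprecated and removed in v5.0. You don't need it.)

Start with a "search" query, but specify the "scroll" parameter (here I'm using a 1-minute timeout):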

curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
    "query": {
            "match_all" : {}
    }
}
'

This includes your first batch of "hits". But we're not done yet. The output of the curl command above looks something like this:

{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}

Using the _scroll_id is essential; next, run the following command:

curl -XGET  'localhost:9200/_search/scroll'  -d'
    {
        "scroll" : "1m", 
        "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1" 
    }
    '

However, passing the scroll_id around is not designed to be done by hand. Your best bet is to write code that does it, for example in Java:

// Elasticsearch 1.x Java API (ImmutableSettings and SearchType.SCAN no longer exist in later versions)
private TransportClient client = null;
private Settings settings = ImmutableSettings.settingsBuilder()
        .put(CLUSTER_NAME, "cluster-test").build();
private SearchResponse scrollResp = null;

// Inside a method: connect to the cluster
this.client = new TransportClient(settings);
this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

// Open a scan/scroll search over all documents
QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(queryBuilder)
        .setSize(100).execute().actionGet();

// Fetch the next batch using the scroll ID from the previous response
scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
        .setScroll(new TimeValue(timeVal))
        .execute()
        .actionGet();

Now run the last command in a LOOP, using each SearchResponse to extract the data.
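The loop itself can be sketched independently of any client. Here start_scroll() stands in for the initial search with ?scroll=1m and next_batch(scroll_id) for POST /_search/scroll; both are hypothetical callables returning parsed JSON, and a small in-memory fake serves five documents two at a time. This assumes a regular (non-scan) scroll, whose first response already contains hits; with the old search_type=scan the first response is empty, which is why the Java code above fetches a scroll batch before looping.

```python
def scroll_all(start_scroll, next_batch):
    """Yield every hit, following _scroll_id until a batch comes back empty."""
    page = start_scroll()
    while True:
        hits = page["hits"]["hits"]
        if not hits:
            break
        yield from hits
        # Always send the most recent _scroll_id; it may change between calls
        page = next_batch(page["_scroll_id"])


# Tiny in-memory fake: 5 documents served 2 at a time
docs = [{"_id": str(i)} for i in range(5)]


def fake_start():
    return {"_scroll_id": "0", "hits": {"hits": docs[0:2]}}


def fake_next(scroll_id):
    offset = int(scroll_id) + 2
    return {"_scroll_id": str(offset), "hits": {"hits": docs[offset:offset + 2]}}


print([hit["_id"] for hit in scroll_all(fake_start, fake_next)])
```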

Answered 2024-05-03T19:01:38+08:00

580
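If anyone, like me, is looking to retrieve all data from Elasticsearch for some use case, this is what I did. By all data I mean all indices and all document types. I'm using Elasticsearch 6.3. (Without a size parameter, this returns only the first 10 hits.)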
```
curl -X GET "localhost:9200/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'
```
Elasticsearch reference
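Answered 2024-05-03T19:01:38+08:00
0
None of the answers except @Akira Sendoh's actually shows how to get all documents. But even that solution crashes my ES 6.3 service, without any logs. The only thing that worked for me with the low-level elasticsearch-py library was the scan helper, which uses the scroll() API: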
```
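from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es_obj = Elasticsearch("http://localhost:9200")

doc_generator = scan(
    es_obj,
    query={"query": {"match_all": {}}},
    index="my-index",
)

# Use the generator to iterate; don't turn it into a list or you will run out of RAM
for doc in doc_generator:
    # Use it somehow, e.g. read the document source
    print(doc["_source"])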
```
However, a cleaner way nowadays seems to be the elasticsearch-dsl library, which offers more abstract, cleaner calls, for example: http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#hits
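If documents are easier to process in groups (for example, bulk-writing them somewhere), the generator can be chunked without giving up the "never build one big list" property. This is a sketch with a hypothetical helper name, in_batches; a plain generator stands in for scan(...) so it runs without a cluster.

```python
from itertools import islice


def in_batches(doc_iter, batch_size=500):
    """Group an iterator of hits into lists of at most batch_size items."""
    iterator = iter(doc_iter)
    while True:
        # Only one batch is materialized in memory at a time
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A plain generator stands in for scan(...) here
fake_hits = ({"_id": str(i)} for i in range(1234))
print([len(batch) for batch in in_batches(fake_hits)])  # [500, 500, 234]
```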
Answered 2024-05-03T19:01:38+08:00
0
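The best way to adjust the size is to pass size=number as a URL parameter:
```
curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"
```
Note: the maximum value you can set for size is 10000. For anything above ten thousand, Elasticsearch expects you to use the scroll feature, which minimizes the performance impact.
Answered 2024-05-03T19:01:38+08:00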

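You can use size=0; note that this returns no documents at all, only hits.total, which tells you how many documents match. Example:

curl -XGET 'localhost:9200/index/type/_search' -d '
{
   "size": 0,
   "query" : {
   "match_all" : {}
    }
}'

Answered 2024-05-03T19:01:38+08:00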
