
Elasticsearch data does not match the mapping


I am migrating Elasticsearch production data from version 1.4.3 to 5.5 using the reindex API. When I reindex the old ES index into the new ES index, the reindex fails with this exception: Failed Reason: mapper [THROUGHPUT_ROWS_PER_SEC] cannot be changed from type [long] to [float]. Failed Type: illegal_argument_exception

Mapping of the task_history index in ES 1.4.3

{
   "task_history": {
      "mappings": {
         "task_run_hist": {
            "_all": {
               "enabled": false
            },
            "_routing": {
               "required": true,
               "path": "org_id"
            },
            "properties": {
               "RUN_TIME_IN_MINS": {
                  "type": "double"
               },
               "THROUGHPUT_ROWS_PER_SEC": {
                  "type": "long"
               },
               "account_name": {
                  "type": "string",
                  "index": "not_analyzed",
                  "store": true
               }
            }
         }
      }
   }
}

Mapping of the task_history index in ES 5.5 (this mapping was created as part of a partial reindex)

{
  "task_history": {
    "mappings": {
      "task_run_hist": {
        "_all": {
          "enabled": false
        },
        "_routing": {
          "required": true
        },
        "properties": {
          "RUN_TIME_IN_MINS": {
            "type": "float"
          },
          "THROUGHPUT_ROWS_PER_SEC": {
            "type": "long"
          },
          "account_name": {
            "type": "keyword",
            "store": true
          }
        }
      }
    }
  }
}
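
Because the 5.5 mapping above was generated during a partial reindex, its numeric types were presumably guessed by dynamic mapping from whichever documents arrived first. A minimal sketch of creating the destination index explicitly before reindexing is shown below. It reuses the field names from the mappings above; choosing double for RUN_TIME_IN_MINS (as in the 1.4.3 mapping) and keeping long for THROUGHPUT_ROWS_PER_SEC are assumptions, not settings confirmed by the question.

```json
PUT task_history
{
  "mappings": {
    "task_run_hist": {
      "_all": { "enabled": false },
      "_routing": { "required": true },
      "properties": {
        "RUN_TIME_IN_MINS": { "type": "double" },
        "THROUGHPUT_ROWS_PER_SEC": { "type": "long" },
        "account_name": { "type": "keyword", "store": true }
      }
    }
  }
}
```

With the types fixed up front, incoming documents no longer try to change a field from long to float. If the fractional throughput values matter, mapping THROUGHPUT_ROWS_PER_SEC as double instead would be the alternative to consider.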

Sample data

{
  "_index": "task_history",
  "_type": "task_run_hist",
  "_id": "1421955143",
  "_score": 1,
  "_source": {
    "RUN_TIME_IN_MINS": 0.47,
    "THROUGHPUT_ROWS_PER_SEC": 46,
    "org_id": "xxxxxx",
    "account_name": "Soma Acc1"
  }
},
{
  "_index": "task_history",
  "_type": "task_run_hist",
  "_id": "1421943738",
  "_score": 1,
  "_source": {
    "RUN_TIME_IN_MINS": 1.02,
    "THROUGHPUT_ROWS_PER_SEC": 65.28,
    "org_id": "yyyyyy",
    "account_name": "Choma Acc1"
  }
}

Two questions

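  • If THROUGHPUT_ROWS_PER_SEC is mapped as long, how did Elasticsearch 1.4.3 manage to save floating-point values?

  • If this is a data problem in the old ES, how can I remove all floating-point values before starting the reindex?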

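For the second option, I tried the query below to list every document that holds a floating-point value, so that I could verify them once and delete them, but the query still lists documents whose THROUGHPUT_ROWS_PER_SEC is not a float.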

Note: Groovy scripting is enabled.

GET task_history/task_run_hist/_search?size=100
{
   "filter": {
      "script": {
         "script": "doc['THROUGHPUT_ROWS_PER_SEC'].value % 1 == 0"
      }
   }
}
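
A likely reason the query above matches whole numbers: doc['THROUGHPUT_ROWS_PER_SEC'].value reads the indexed value, and for a field mapped as long that value is already a whole number (the fraction only survives in _source). The == 0 comparison also selects values without a fraction rather than values with one. The sketch below reads the original value from _source instead. It assumes that _source is accessible from a Groovy script filter on this 1.4.3 cluster and that java.lang.Math is permitted by the script sandbox; expect it to be slow, since the source has to be loaded for every document.

```json
GET task_history/task_run_hist/_search?size=100
{
   "filter": {
      "script": {
         "script": "def v = _source.THROUGHPUT_ROWS_PER_SEC; v != null && v != Math.floor(v)"
      }
   }
}
```

The null check is there so that documents without the field are skipped instead of raising an error.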

Updated with solution provided by Val

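When I try the script below in the reindex, I get the runtime error listed further down. Any clue about what is going wrong here? I added an extra statement that casts RUN_TIME_IN_MINS to float, because the original script failed with an error on the RUN_TIME_IN_MINS field: mapper [RUN_TIME_IN_MINS] cannot be changed from type [long] to [float]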

POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "http://esip:15000"
    },
    "index": "task_history"
  },
  "dest": {
    "index": "task_history"
  },
  "script": {
    "inline": "if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' } ctx._source.RUN_TIME_IN_MINS = (float) ctx._source.RUN_TIME_IN_MINS;",
    "lang": "painless"
  }
}

Runtime error

{
  "completed": true,
  "task": {
    "node": "wZOzypYlSayIRlhp9y3lVA",
    "id": 645528,
    "type": "transport",
    "action": "indices:data/write/reindex",
    "status": {
      "total": 18249521,
      "updated": 4691,
      "created": 181721,
      "deleted": 0,
      "batches": 37,
      "version_conflicts": 0,
      "noops": 67076,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0
    },
    "description": """
reindex from [host=esip port=15000 query={
  "match_all" : {
    "boost" : 1.0
  }
}][task_history] updated with Script{type=inline, lang='painless', idOrCode='if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' } ctx._source.RUN_TIME_IN_MINS = (float) ctx._source.RUN_TIME_IN_MINS;', options={}, params={}} to [task_history]
""",
    "start_time_in_millis": 1502336063507,
    "running_time_in_nanos": 93094657751,
    "cancellable": true
  },
  "error": {
    "type": "script_exception",
    "reason": "runtime error",
    "script_stack": [],
    "script": "if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' } ctx._source.RUN_TIME_IN_MINS = (float) ctx._source.RUN_TIME_IN_MINS;",
    "lang": "painless",
    "caused_by": {
      "type": "null_pointer_exception",
      "reason": null
    }
  }
}

1 answer

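    You apparently want to keep the existing ES 5.x mapping with long, so all you need to do is add a script to your reindex call that deals with the THROUGHPUT_ROWS_PER_SEC values that are not whole numbers. The script below skips such documents by marking them as noop. Something like this should do: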

    POST _reindex
    {
      "source": {
        "remote": {
          "host": "http://es1host:9200"
        },
        "index": "task_history"
      },
      "dest": {
        "index": "task_history"
      },
      "script": {
        "inline": "if (ctx._source.THROUGHPUT_ROWS_PER_SEC % 1 != 0) { ctx.op = 'noop' }",
        "lang": "painless"
      }
    }
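
The null_pointer_exception reported in the question's update is consistent with some documents lacking THROUGHPUT_ROWS_PER_SEC or RUN_TIME_IN_MINS, because both the modulo and the cast fail on a null value. That cause is an assumption, since the error output does not name the field. A minimal null-safe sketch of the same script, reusing the host and index names from the question:

```json
POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "http://esip:15000"
    },
    "index": "task_history"
  },
  "dest": {
    "index": "task_history"
  },
  "script": {
    "inline": "def t = ctx._source.THROUGHPUT_ROWS_PER_SEC; if (t != null && t % 1 != 0) { ctx.op = 'noop' } def r = ctx._source.RUN_TIME_IN_MINS; if (r != null) { ctx._source.RUN_TIME_IN_MINS = (float) r }",
    "lang": "painless"
  }
}
```

Documents where either field is missing pass through untouched. Documents with a fractional throughput are still skipped, as in the script above.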
    
