
Elasticsearch: convert lat and lon to a geo_point and bulk insert


I have a simple CSV file with four fields: serial_num, post_code, LAT, and LON. For example:

serial_num,post_code,LAT,LON
06AA209365,PE10 2AZ,532342,168459
98A819621,PE10 1AA,532342,168459
07FD490906,PE12 1VV,497882,157983

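I need to bulk insert it into Elasticsearch. The LAT and LON fields need to be combined into a single geo_point field, so I created the following mapping:

  • the index is serial_data

  • the type is widget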

PUT /serial_data
{
  "mappings": {
    "widget": {
      "properties": {
        "serial_number": {
          "type": "string"
        },
        "post_code": {
          "type": "string"
        },
        "location": {
          "type": "geo_point"
        }
      }
    }
  }
}
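
For reference, here is a minimal Ruby sketch of the document shape this mapping expects for one CSV row, with LAT and LON folded into a single `location` object. The helper name `row_to_doc` is hypothetical, and renaming `serial_num` to `serial_number` is an assumption made only to match the mapping. The sample values are passed through unchanged; whether they need rescaling to degrees is not stated in the question.

```ruby
require "json"

# Hypothetical helper: turn one parsed CSV row into a document for the
# serial_data/widget mapping. The geo_point is expressed as a lat/lon object.
def row_to_doc(row)
  {
    "serial_number" => row["serial_num"], # assumption: renamed to match the mapping
    "post_code"     => row["post_code"],
    "location"      => { "lat" => Integer(row["LAT"]), "lon" => Integer(row["LON"]) }
  }
end

row = { "serial_num" => "06AA209365", "post_code" => "PE10 2AZ",
        "LAT" => "532342", "LON" => "168459" }
puts JSON.generate(row_to_doc(row))
```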

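I tried using embulk to insert the data, since I already have a mapping defined. I assumed that if I declared lat and lon as double or long, embulk would parse them and merge them into the single location field. It did not; I was being overly optimistic.

I also thought embulk had a bulk-input JSON plugin, but I can't find it.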

Question

How can I bulk load this data? Any help would be much appreciated.

1 Answer

  • 0

    I used three filter plugins:

    • embulk-filter-insert: inserts the location column

    • embulk-filter-ruby_proc: combines the LAT and LON columns

    • embulk-filter-column: removes the LAT and LON columns
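
The three filter steps above can be sketched in plain Ruby, outside embulk, to show what each one does to a single record. The method names `insert_location`, `combine_lat_lon`, and `keep_columns` are hypothetical stand-ins for the plugins, not part of any embulk API.

```ruby
require "json"

# Step 1 (embulk-filter-insert in the answer): add an empty location column.
def insert_location(record)
  record.merge("location" => nil)
end

# Step 2 (embulk-filter-ruby_proc): fill location with a JSON string built
# from the lat and lon columns.
def combine_lat_lon(record)
  record.merge("location" => JSON.generate({ "lat" => record["lat"], "lon" => record["lon"] }))
end

# Step 3 (embulk-filter-column): keep only the listed columns, which drops
# lat and lon.
def keep_columns(record, names)
  record.slice(*names)
end

record = { "serial_num" => "06AA209365", "post_code" => "PE10 2AZ",
           "lat" => 532342, "lon" => 168459 }
result = keep_columns(combine_lat_lon(insert_location(record)),
                      %w[serial_num post_code location])
puts JSON.generate(result)
```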

    data.csv

    serial_num,post_code,LAT,LON
    06AA209365,PE10 2AZ,532342,168459
    98A819621,PE10 1AA,532342,168459
    07FD490906,PE12 1VV,497882,157983
    

    conf.yml

    in:
      type: file
      path_prefix: data.csv
      parser:
        charset: UTF-8
        newline: CRLF
        type: csv
        delimiter: ','
        quote: '"'
        escape: '"'
        trim_if_not_quoted: false
        skip_header_lines: 1
        allow_extra_columns: false
        allow_optional_columns: false
        columns:
        - {name: serial_num, type: string}
        - {name: post_code, type: string}
        - {name: lat, type: long}
        - {name: lon, type: long}
    filters:
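      # Add an empty location column to hold the combined value.
      - type: insert
        column:
          location:
      # Fill location with a JSON string built from the lat and lon columns.
      - type: ruby_proc
        requires:
          - json
        columns:
          - name: location
            proc: |
              ->(_, record) do
                { lat: record["lat"], lon: record["lon"] }.to_json
              end
            skip_nil: false

      # Keep only these columns, which drops lat and lon.
      - type: column
        columns:
          - {name: serial_num}
          - {name: post_code}
          - {name: location}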
    
    
    out: {type: stdout}
    

    Output

    +-------------------+------------------+-----------------------------+
    | serial_num:string | post_code:string |             location:string |
    +-------------------+------------------+-----------------------------+
    |        06AA209365 |         PE10 2AZ | {"lat":532342,"lon":168459} |
    |         98A819621 |         PE10 1AA | {"lat":532342,"lon":168459} |
    |        07FD490906 |         PE12 1VV | {"lat":497882,"lon":157983} |
    +-------------------+------------------+-----------------------------+
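
In this output `location` is a JSON-encoded string. As a rough sketch of how such rows could be turned into an Elasticsearch bulk request body, the Ruby below decodes that string into a lat/lon object and emits one action line plus one document line per row. It assumes an Elasticsearch version that still uses mapping types, as the mapping in the question does. The method name `bulk_body` is hypothetical, and renaming `serial_num` to `serial_number` is an assumption made to match the mapping.

```ruby
require "json"

# Build a bulk request body (newline-delimited JSON) from rows shaped like
# the embulk output. Each row becomes two lines: an action line and the
# document itself.
def bulk_body(rows, index: "serial_data", type: "widget")
  lines = rows.flat_map do |row|
    doc = {
      "serial_number" => row["serial_num"],          # assumption: renamed to match the mapping
      "post_code"     => row["post_code"],
      "location"      => JSON.parse(row["location"]) # decode the JSON string into a lat/lon object
    }
    [JSON.generate({ "index" => { "_index" => index, "_type" => type } }),
     JSON.generate(doc)]
  end
  lines.join("\n") + "\n" # the bulk API expects a trailing newline
end

rows = [
  { "serial_num" => "06AA209365", "post_code" => "PE10 2AZ",
    "location" => '{"lat":532342,"lon":168459}' }
]
puts bulk_body(rows)
```

The resulting text could then be posted to the `_bulk` endpoint, for example with curl's `--data-binary` option so that the newlines are preserved.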
    
