How to import edges from a CSV into an OrientDB graph with ETL?

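I am trying to import edges from a CSV file into OrientDB. The vertices are stored in a separate file and have already been imported into OrientDB via ETL. So my situation is similar to "OrientDB import edges only using ETL tool" and "OrientDB ETL loading CSV with vertices in one file and edges in another".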

Update

Friend.csv

"id","client_id","first_name","last_name"
"0","0","John-0","Doe"
"1","1","John-1","Doe"
"2","2","John-2","Doe"
...

The Friend importer removes the "id" field but stores "client_id". The idea is to have a known, client-generated id for searching etc.

PeindingFriendship.csv

"friendship_id","client_id","from","to"
"0","0-1","1","0"
"2","0-15","15","0"
"3","0-16","16","0"
...

"friendship_id" and "client_id" should be imported as properties of the "PendingFriendship" edge. "from" is the "client_id" of one Friend, and "to" is the "client_id" of another Friend. A unique index on "client_id" exists for both Friend and PendingFriendship.

My ETL configuration looks like this:

...
"extractor": {
  "csv": {
  }
},
"transformers": [
  {
    "command": {
      "command": "CREATE EDGE PendingFriendship FROM (SELECT FROM Friend WHERE client_id = '${input.from}') TO (SELECT FROM Friend WHERE client_id = '${input.to}') SET client_id = '${input.client_id}'",
      "output": "edge"
    }
  },
  {
    "field": {
      "fieldName": "from",
      "expression": "remove"
    }
  },
  {
    "field": {
      "fieldName": "to",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "friendship_id",
      "expression": "remove"
    }
  },
  {
    "field": {
      "fieldName": "client_id",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "@class",
      "value": "PendingFriendship"
    }
  }
],
...

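The problem with this configuration is that it creates two edge entries. One is the expected "PendingFriendship" edge. The second is an empty "PendingFriendship" edge in which all the fields I removed appear as properties with null values. The import fails at the second row/document because inserting another empty "PendingFriendship" violates the uniqueness constraint. How can I avoid creating the unnecessary empty "PendingFriendship"? What is the best way to import edges into OrientDB? All the examples in the documentation use CSV files where vertices and edges are in one file, but that is not the case for me.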

I also had a look at the Edge transformer, but it returns a Vertex, not an Edge!

Created PendingFriendships

1 Answer

0
After some time I found a way (a workaround) to import the above data into OrientDB. Instead of using the ETL tool, I wrote a simple Ruby script that calls OrientDB's HTTP API using the Batch endpoint.

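Steps:
1. Import the Friends.
2. Use the responses to create a mapping of client_ids to @rids.
3. Parse PeindingFriendship.csv and build batch requests.
4. Each friendship is created by its own command.
5. The mapping from step 2 is used to insert the @rids into the commands from step 4.
6. Send the batch requests in chunks of 1000 commands.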
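
The mapping from client_ids to @rids could be built roughly as follows. This is a Python sketch, not the Ruby script mentioned above (which is not shown). It assumes each imported Friend comes back as a JSON record with "@rid" and "client_id" keys; the function name `build_rid_map` and the sample records are hypothetical.

```python
def build_rid_map(records):
    """Map each Friend's client_id to its OrientDB record id (@rid)."""
    rid_map = {}
    for record in records:
        client_id = record["client_id"]
        if client_id in rid_map:
            # client_id has a unique index, so a repeat signals bad input.
            raise ValueError(f"duplicate client_id: {client_id}")
        rid_map[client_id] = record["@rid"]
    return rid_map


# Hypothetical records, shaped like the assumed import response.
sample_records = [
    {"@rid": "#27:178", "client_id": "0"},
    {"@rid": "#27:179", "client_id": "1"},
]
rid_map = build_rid_map(sample_records)
print(rid_map["1"])
```

Keeping the mapping in memory avoids one lookup query per edge, at the cost of holding every client_id in the script's process.
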
Example batch request body:
```
{
  "transaction" : true,
  "operations" : [
    {
      "type" : "cmd",
      "language" : "sql",
      "command" : "create edge PendingFriendship from #27:178 to #27:179 set client_id='4711'"
    }
  ]
}
```
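
Building such request bodies from the edge CSV might look like the Python sketch below (the original script was in Ruby and is not shown). It assumes a dictionary mapping client_ids to @rids is already available, that the CSV has the columns of PeindingFriendship.csv, and that values contain no single quotes. The names `edge_command` and `batch_bodies` are hypothetical, and sending the bodies over HTTP is left out.

```python
import csv
import io
import json


def edge_command(row, rid_map):
    """Build one CREATE EDGE command for a single CSV row."""
    from_rid = rid_map[row["from"]]
    to_rid = rid_map[row["to"]]
    client_id = row["client_id"]
    # Assumption: client_id contains no single quotes, so no escaping is done.
    return {
        "type": "cmd",
        "language": "sql",
        "command": (
            f"create edge PendingFriendship from {from_rid} to {to_rid} "
            f"set client_id='{client_id}'"
        ),
    }


def batch_bodies(csv_text, rid_map, chunk_size=1000):
    """Yield batch request bodies holding at most chunk_size commands each."""
    operations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        operations.append(edge_command(row, rid_map))
        if len(operations) == chunk_size:
            yield {"transaction": True, "operations": operations}
            operations = []
    if operations:
        # Flush the final, possibly smaller, chunk.
        yield {"transaction": True, "operations": operations}


sample_csv = '''"friendship_id","client_id","from","to"
"0","0-1","1","0"
'''
sample_rid_map = {"0": "#27:178", "1": "#27:179"}
for body in batch_bodies(sample_csv, sample_rid_map):
    print(json.dumps(body, indent=2))
```

Like the example body, this sets only client_id on the edge; friendship_id could be added to the SET clause in the same way.
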
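This is not an answer to the question I asked, but it solves the higher-level goal of importing the data into OrientDB. So I leave it open to the community to mark this question as solved or not.
Answered on 2024-05-14T03:37:48+08:00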
