I am trying to process Google Pub/Sub data in batches with Apache Beam. Here is my basic code:
p.begin()
    .apply("Input", PubsubIO.readAvros(CmgData.class).fromTopic("topicname"))
    .apply("Transform", ParDo.of(new TransformData()))
    .apply("Write", BigQueryIO.writeTableRows()
        .to(table)
        .withSchema(schema)
        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
p.run().waitUntilFinish();
Apache Beam evidently treats the data as unbounded because it comes from a subscription, but I want to collect it and send it on in batches. Several different items mention "bounded":
- PCollection.IsBounded (https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/values/PCollection.IsBounded.html) - seems to have no effect on the write.
- BoundedReadFromUnboundedSource (https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/BoundedReadFromUnboundedSource.html) - I could not find a way to convert a PCollection into a bounded source, or vice versa.
- BoundedWindow (https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/transforms/windowing/BoundedWindow.html) - I could not find a working usage.
- Write.Method (https://beam.apache.org/documentation/sdks/javadoc/2.2.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.Method.html) - throws an IllegalArgumentException when I try to use it.
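Can someone point me in the right direction on how to declare that the data is bounded, so that I can process it in batches rather than only as a stream?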
1 Answer
For more detail, see my other question, "BigQuery writeTableRows Always writing to buffer".
In short, adding the following three lines causes the data to be treated as bounded: