I am trying to use the Flink Kafka connector 0.11, but it keeps throwing this error when the job runs.

java.lang.RuntimeException: Error while confirming checkpoint
    at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1260)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.kafka.common.errors.ProducerFencedException: Producer attempted an operation with an old epoch. Either there is a newer producer with the same transactionalId, or the producer's transaction has been expired by the broker.

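As far as I understand from the Kafka documentation, the transaction timeout must be greater than the checkpoint interval but less than the broker's transaction.max.timeout.ms.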

My cluster setup is as follows:

  • Flink version: 1.4.2

  • Connector: flink-connector-kafka-0.11_2.11

  • Checkpoint interval: 5000 ms

  • Observed end-to-end checkpoint time: 2 seconds

Kafka producer configuration:

transactional.id : tx-kafka-topic1
transaction.timeout.ms : 30000
acks: all
enable.idempotence : true
retries: 3
max.in.flight.requests.per.connection : 1
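
For reference, this is roughly how the producer settings above look when built as java.util.Properties. It is only a minimal sketch: the class and method names are hypothetical, and the hand-off to the Flink Kafka producer is left out because it depends on which connector constructor is used.

```java
import java.util.Properties;

// Hypothetical helper class; only the keys and values come from the settings listed above.
public class ProducerConfigSketch {

    // Builds the producer properties exactly as listed in the question.
    public static Properties producerProperties() {
        Properties props = new Properties();
        props.setProperty("transactional.id", "tx-kafka-topic1");
        props.setProperty("transaction.timeout.ms", "30000");
        props.setProperty("acks", "all");
        props.setProperty("enable.idempotence", "true");
        props.setProperty("retries", "3");
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // Print each setting; iteration order of Properties is not guaranteed.
        producerProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```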

Kafka broker server configuration (kafka_2.11-1.0.0-cp1.jar):

transaction.max.timeout.ms=120000
transaction.state.log.replication.factor=3
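
The constraint as I understand it (checkpoint interval < transaction.timeout.ms < transaction.max.timeout.ms) can be written out as a quick check with the values from this question. This is only a sketch of my own reasoning, not anything Flink or Kafka runs; the class and method names are made up for illustration.

```java
// Hypothetical sanity check; it encodes only the rule stated in this question.
public class TimeoutCheck {

    // Returns true when checkpointInterval < transactionTimeout < brokerMaxTimeout (all in milliseconds).
    public static boolean timeoutsConsistent(long checkpointIntervalMs,
                                             long transactionTimeoutMs,
                                             long brokerMaxTimeoutMs) {
        return checkpointIntervalMs < transactionTimeoutMs
                && transactionTimeoutMs < brokerMaxTimeoutMs;
    }

    public static void main(String[] args) {
        long checkpointIntervalMs = 5_000L;   // Flink checkpoint interval
        long transactionTimeoutMs = 30_000L;  // producer transaction.timeout.ms
        long brokerMaxTimeoutMs = 120_000L;   // broker transaction.max.timeout.ms

        // 5000 < 30000 < 120000, so this prints "true".
        System.out.println(timeoutsConsistent(
                checkpointIntervalMs, transactionTimeoutMs, brokerMaxTimeoutMs));
    }
}
```

With these numbers the check passes, which is why the ProducerFencedException is surprising to me.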

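As far as I can tell, these intervals do not conflict with one another, yet the job still fails with the error above. I would appreciate it if someone could point me in the right direction.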