I have three servers:

blade1 (192.168.112.31),
blade2 (192.168.112.32) and
blade3 (192.168.112.33).

kafka_2.11-1.0.0 is installed on each server.
ZooKeeper is also installed on blade3 (192.168.112.33:2181).

I created a topic, repl3part5, with the following command:

bin/kafka-topics.sh --zookeeper 192.168.112.33:2181 --create --replication-factor 3 --partitions 5 --topic repl3part5

When I describe the topic, it looks like this:

[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181

Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
    Topic: repl3part5    Partition: 0    Leader: 2    Replicas: 2,3,1    Isr: 2,3,1
    Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
    Topic: repl3part5    Partition: 2    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
    Topic: repl3part5    Partition: 3    Leader: 2    Replicas: 2,1,3    Isr: 2,1,3
    Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3,2,1

I have one producer for this topic:

bin/kafka-console-producer.sh --broker-list 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5

and a single consumer:

bin/kafka-console-consumer.sh --bootstrap-server 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5  --consumer-property group.id=zoran_1

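Every message sent by the producer is received by the consumer.
So far so good.

Now I want to test failover of the Kafka servers. If I stop the Kafka service on blade3, the consumer prints warnings but still consumes every produced message.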

[2018-01-30 14:30:01,203] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,299] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:30:01,475] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 3 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

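Next I started the Kafka service on blade3 again and stopped the Kafka service on blade2. The consumer again shows a warning, but all produced messages are still consumed.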

[2018-01-30 14:31:38,164] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

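Next I started the Kafka service on blade2 again and stopped the Kafka service on blade1.

The consumer now shows warnings about node 1/2147483646, and also "Asynchronous auto-commit of offsets ... failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null".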

[2018-01-30 14:33:16,393] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,469] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,557] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,986] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:16,991] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,493] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:17,495] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,002] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,003] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:18,611] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,932] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:18,933] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,977] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 2147483646 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2018-01-30 14:33:19,978] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: null (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:33:19,979] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
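
A note on the odd node id 2147483646 in the log above. As far as I understand the Java client, it opens a separate connection to the group coordinator and labels it with Integer.MAX_VALUE minus the coordinator's broker id. The sketch below only does that arithmetic; the function name is hypothetical and the mapping itself is an assumption about client internals, not something the log states.

```python
# Assumption: the Java consumer labels its coordinator connection
# as Integer.MAX_VALUE - brokerId. The helper name is hypothetical.
JAVA_INT_MAX = 2**31 - 1  # 2147483647


def broker_id_from_coordinator_node(node_id: int) -> int:
    """Recover the broker id hidden behind a coordinator node id."""
    return JAVA_INT_MAX - node_id


if __name__ == "__main__":
    # Node id taken from the consumer warnings above.
    print(broker_id_from_coordinator_node(2147483646))  # prints 1
```

If that reading is right, the warnings about node 2147483646 concern the coordinator connection to broker 1, which is the broker that was stopped at that moment.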

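I tried to solve the problem by adding offsets.topic.replication.factor = 2 (or 3) to all three server.properties files (one of them is here: https://pastebin.com/Japn0Grk), but without success. My idea was that the topic __consumer_offsets is not replicated across the cluster, but that does not seem to be the case.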
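
To see which __consumer_offsets partition (and so which partition leader) a group depends on, the group id can be mapped to a partition number. This is a minimal sketch, assuming the broker picks the partition as the non-negative Java hashCode of the group id modulo offsets.topic.num.partitions, and assuming that setting is still at its default of 50. Both function names are hypothetical.

```python
def java_string_hashcode(text: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    data = text.encode("utf-16-be")  # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # wrap like a Java int
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Assumed mapping from a group id to a __consumer_offsets partition."""
    h = java_string_hashcode(group_id)
    # Integer.MIN_VALUE has no positive counterpart, so it maps to 0.
    non_negative = 0 if h == -0x80000000 else abs(h)
    return non_negative % num_partitions


if __name__ == "__main__":
    # Group id taken from the console consumer command above.
    print(offsets_partition_for("zoran_1"))
```

The printed number could then be compared with a describe of the __consumer_offsets topic to check which brokers hold replicas of that partition.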

While the Kafka service on blade1 is down, describing the topic shows:

[root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181

Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
    Topic: repl3part5    Partition: 0    Leader: 3    Replicas: 2,3,1    Isr: 3
    Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3
    Topic: repl3part5    Partition: 2    Leader: 3    Replicas: 1,2,3    Isr: 3
    Topic: repl3part5    Partition: 3    Leader: 3    Replicas: 2,1,3    Isr: 3
    Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3

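The producer now shows the following warning. It still puts messages on the topic, but those messages only increase the lag count on the partitions: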

[2018-01-30 14:37:21,816] WARN [Producer clientId=console-producer] Connection to node 1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

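I noticed that as long as the Kafka service on blade1 is alive, I can stop blade2 and blade3 in any combination and the consumer is always able to consume messages. If the Kafka service on blade1 is down, the consumer cannot consume messages even when the Kafka services on blade2 and blade3 are up and running.

After I started the Kafka service on blade1 again, all the messages the producer had sent while the Kafka service on blade1 was down were replayed, and the consumer terminal showed the following: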

[2018-01-30 14:44:30,817] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 20: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:30,817] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=20, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] ERROR [Consumer clientId=consumer-1, groupId=zoran_1] Offset commit failed on partition repl3part5-4 at offset 22: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2018-01-30 14:44:31,202] WARN [Consumer clientId=consumer-1, groupId=zoran_1] Asynchronous auto-commit of offsets {repl3part5-4=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-3=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-2=OffsetAndMetadata{offset=22, metadata=''}, repl3part5-1=OffsetAndMetadata{offset=24, metadata=''}, repl3part5-0=OffsetAndMetadata{offset=24, metadata=''}} failed: Offset commit failed with a retriable exception. You should retry committing offsets. The underlying error was: This is not the correct coordinator. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

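From this point on everything works without problems or warnings, and the system is fully functional.

Can someone explain to me why the Kafka server on blade1 is so important, and what options I have in order to be able to stop any of the servers (including the Kafka server on blade1) and still consume messages without delay? This thing is driving me crazy.

Can you help?

Regards.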