I have 2 Kafka brokers backed by 3 ZooKeeper (ZK) nodes. I want to test the Kafka nodes by running kafka-console-producer and kafka-console-consumer locally on each node.

So I SSH into one of my Kafka brokers using 2 different terminals. In terminal #1 I run the consumer like so:

/opt/kafka/bin/kafka-console-consumer.sh --zookeeper a.b.c.d:2181 --topic test1

where a.b.c.d is the private IP of one of my 3 ZK nodes.

Then in terminal #2 I run the producer like so:

/opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1

Both the consumer and the producer start up fine, with no errors.

However, in the producer terminal, if I "fire" a message on the test1 topic by typing some text (such as "hello") and hitting ENTER, I immediately start seeing:

[2017-01-17 19:45:57,353] WARN Error while fetching metadata with correlation id 0 : {test1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2017-01-17 19:45:57,372] WARN Error while fetching metadata with correlation id 1 : {test1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2017-01-17 19:45:57,477] WARN Error while fetching metadata with correlation id 2 : {test1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
[2017-01-17 19:45:57,582] WARN Error while fetching metadata with correlation id 3 : {test1=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
...and it keeps going!

And in the consumer terminal, although I get no errors when I start the consumer, after about 30 seconds I get the following warning message:

[2017-01-17 19:46:07,292] WARN Fetching topic metadata with correlation id 1 for topics [Set(test1)] from broker [BrokerEndPoint(1,ip-x-y-z-w.ec2.internal,9092)] failed (kafka.client.ClientUtils$)
java.nio.channels.ClosedChannelException
    at kafka.network.BlockingChannel.send(BlockingChannel.scala:110)
    at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:80)
    at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:79)
    at kafka.producer.SyncProducer.send(SyncProducer.scala:124)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
    at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:94)
    at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

Interestingly, ip-x-y-z-w.ec2.internal is the private DNS name of the other Kafka broker, so perhaps this is some kind of failure during inter-broker communication?
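
One way I could test that guess is to check whether the other broker's advertised host and port are reachable at the TCP level from this machine. Below is a minimal sketch in Python, not something I have run against the cluster: the helper name can_connect is made up for illustration, the host name is the masked placeholder from the log above and would need to be replaced with the real one, and a successful connect only shows the port is open, not that Kafka itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


if __name__ == "__main__":
    # Placeholder taken from the log output; substitute the real private DNS name.
    other_broker = "ip-x-y-z-w.ec2.internal"
    print(other_broker, 9092, can_connect(other_broker, 9092))
```

If this prints False when run from the broker I am logged into, that would point at DNS resolution or security-group rules rather than at Kafka itself.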

Any ideas as to what is going on here and what I can do to troubleshoot?


Update

Here is the entire server.properties file for my two Kafka nodes:

listeners=PLAINTEXT://0.0.0.0:9092
advertised.host.name=<private-aws-ec2-ip-addr>.ec2.internal
advertised.listeners=PLAINTEXT://0.0.0.0:9092
broker.id=1
port=9092
num.partitions=4
zookeeper.connect=zkA:2181,zkB:2181,zkC:2181
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
log.dirs=/tmp/kafka-logs
num.recovery.threads.per.data.dir=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connection.timeout.ms=6000
offset.metadata.max.bytes=4096

If anything here looks like a config smell, please let me know.
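
One thing I can at least check mechanically: 0.0.0.0 is a bind-to-all-interfaces address, not an address a client can connect to, so seeing it in an advertised.* setting looks suspicious to me. I have not confirmed that this is the cause of the warnings above. The sketch below is a rough Python check, with hypothetical helper names parse_properties and wildcard_advertised, that reads properties text and lists any advertised.* keys containing 0.0.0.0; the sample text is a shortened copy of the file above.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def wildcard_advertised(props: dict) -> list:
    """Return the advertised.* keys whose value contains the wildcard 0.0.0.0."""
    return sorted(
        key
        for key, value in props.items()
        if key.startswith("advertised.") and "0.0.0.0" in value
    )


if __name__ == "__main__":
    # Shortened copy of the server.properties shown above.
    sample = "\n".join([
        "listeners=PLAINTEXT://0.0.0.0:9092",
        "advertised.listeners=PLAINTEXT://0.0.0.0:9092",
        "broker.id=1",
    ])
    print(wildcard_advertised(parse_properties(sample)))
```

This only flags the pattern; whether the advertised address should be the private IP or the private DNS name of each broker is something I would still need to work out.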