
Unable to ingest log data into HDFS (Hadoop) from Flume


I am pushing data from a log file into HDFS using the following configuration.

agent.channels.memory-channel.type = memory
agent.channels.memory-channel.capacity=5000
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /home/training/Downloads/log.txt
agent.sources.tail-source.channels = memory-channel
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.batchSize=10
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/user/flume/data/log.txt
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink

I am not getting any error messages, but I still cannot find the output in HDFS. When I interrupt the agent I can see a sink interruption exception and some of the data from that log file. I am running the following command: flume-ng agent --conf /etc/flume-ng/conf/ --conf-file /etc/flume-ng/conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent
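
For reference, a quick way to check whether anything has actually reached HDFS is to list the configured output path (files the HDFS sink is still writing normally carry a .tmp suffix until they are rolled):

# assumes the hdfs.path from the configuration above
hdfs dfs -ls hdfs://localhost:8020/user/flume/data/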

3 Answers

  • 0

    I had a similar issue.

    In my case it is working now; the conf file is below:

    #Exec Source
    execAgent.sources=e
    execAgent.channels=memchannel
    execAgent.sinks=HDFS
    #channels
    execAgent.channels.memchannel.type=file
    execAgent.channels.memchannel.capacity = 20000
    execAgent.channels.memchannel.transactionCapacity = 1000
    #Define Source
    execAgent.sources.e.type=org.apache.flume.source.ExecSource
    execAgent.sources.e.channels=memchannel
    execAgent.sources.e.shell=/bin/bash -c
    execAgent.sources.e.fileHeader=false
    execAgent.sources.e.fileSuffix=.txt
    execAgent.sources.e.command=cat /home/sample.txt
    #Define Sink
    execAgent.sinks.HDFS.type=hdfs
    execAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/user/flume/
    execAgent.sinks.HDFS.hdfs.fileType=DataStream
    execAgent.sinks.HDFS.hdfs.writeFormat=Text
    execAgent.sinks.HDFS.hdfs.batchSize=1000
    execAgent.sinks.HDFS.hdfs.rollSize=268435
    execAgent.sinks.HDFS.hdfs.rollInterval=0
    #Bind Source Sink Channel
    execAgent.sources.e.channels=memchannel
    execAgent.sinks.HDFS.channel=memchannel
    

    I hope this helps.
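
    For completeness, a sketch of how an agent using this configuration might be started; the conf file name and location are assumptions, and the agent name passed with --name must match the execAgent prefix used above:

    # assumed conf file name/path; --name must match the execAgent prefix
    flume-ng agent --conf /etc/flume-ng/conf/ --conf-file /etc/flume-ng/conf/execAgent.conf --name execAgent -Dflume.root.logger=DEBUG,console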

  • 1

    I suggest setting a file prefix when placing files in HDFS:

    agent.sinks.hdfs-sink.hdfs.filePrefix = log.out
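
    As a minimal sketch, combined with an optional suffix (the fileSuffix line is an assumption, not part of the original suggestion), the rolled files then appear under the sink's hdfs.path with names roughly of the form log.out.<timestamp>.txt:

    agent.sinks.hdfs-sink.hdfs.filePrefix = log.out
    # assumed addition, shown for illustration only
    agent.sinks.hdfs-sink.hdfs.fileSuffix = .txt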

  • 0

    @bhavesh - Are you sure that the log file (agent.sources.tail-source.command = tail -F /home/training/Downloads/log.txt) keeps getting new data appended to it? Since you are using the tail command with -F, only data that is newly appended to the file will be dumped into HDFS.
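
    A quick way to test this is to append a line to the tailed file while the agent is running and check whether it shows up in HDFS:

    # appends a test line to the file tailed by the exec source above
    echo "test event $(date)" >> /home/training/Downloads/log.txt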
