
Configuring Flume not to create .tmp files when sinking data to HDFS


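I am using Flume to transfer data from server logs to HDFS. When data flows into HDFS, Flume first creates .tmp files. Is there a way in the configuration to hide the .tmp files, or to change their names by prepending a "." to them? My collector agent file looks like this: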

## TARGET AGENT ##
## configuration file location:  /etc/flume/conf
## START Agent: flume-ng agent -c conf -f /etc/flume/conf/flume-trg-agent.conf -n collector

#http://flume.apache.org/FlumeUserGuide.html#avro-source
collector.sources = AvroIn
collector.sources.AvroIn.type = avro
collector.sources.AvroIn.bind = 0.0.0.0
collector.sources.AvroIn.port = 4545
collector.sources.AvroIn.channels = mc1 mc2

## Channels ##
## Source writes to 2 channels, one for each sink
collector.channels = mc1 mc2

#http://flume.apache.org/FlumeUserGuide.html#memory-channel

collector.channels.mc1.type = memory
collector.channels.mc1.capacity = 100

collector.channels.mc2.type = memory
collector.channels.mc2.capacity = 100

## Sinks ##
collector.sinks = LocalOut HadoopOut

## Write copy to Local Filesystem
#http://flume.apache.org/FlumeUserGuide.html#file-roll-sink
#collector.sinks.LocalOut.type = file_roll
#collector.sinks.LocalOut.sink.directory = /var/log/flume
#collector.sinks.LocalOut.sink.rollInterval = 0
#collector.sinks.LocalOut.channel = mc1

## Write to HDFS
#http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

collector.sinks.HadoopOut.type = hdfs
collector.sinks.HadoopOut.channel = mc2
collector.sinks.HadoopOut.hdfs.path = /user/root/flume-channel/%{log_type}
collector.sinks.HadoopOut.hdfs.filePrefix = events-
collector.sinks.HadoopOut.hdfs.fileType = DataStream
collector.sinks.HadoopOut.hdfs.writeFormat = Text
collector.sinks.HadoopOut.hdfs.rollSize = 1000000

Any help would be greatly appreciated.

2 Answers

  • 2

    Set hdfs.idleTimeout=x, where x is a positive number.
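
    As a minimal sketch applied to the sink from the question (the value 60 is an illustrative assumption, not a recommendation): hdfs.idleTimeout defaults to 0, which disables automatic closing of idle files. A positive value makes the sink close a file after that many seconds without writes, and a closed file is renamed from its .tmp name to its final name.

    ```properties
    # Close files that have received no events for 60 seconds
    # (assumed value; tune it to your traffic pattern)
    collector.sinks.HadoopOut.hdfs.idleTimeout = 60
    ```

    The .tmp file still exists while data is being written; this setting only limits how long an idle one lingers.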

  • 0

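    By default, every file that Flume has open for writing carries the .tmp extension. You can change it to a different extension, but you cannot avoid having some marker, because open files need to be distinguishable from closed ones. A better option is to mark open files with a prefix such as ".". The Flume HDFS sink provides these parameters for that: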

    hdfs.inUsePrefix    (default: none)    Prefix that is used for temporary files that Flume actively writes into
    hdfs.inUseSuffix    (default: .tmp)    Suffix that is used for temporary files that Flume actively writes into

    hdfs.inUsePrefix = .

    collector.sinks.HadoopOut.hdfs.inUsePrefix = .

    hdfs.inUseSuffix = if left empty, .tmp is used; otherwise the specified suffix is used.
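
    Putting these together for the sink in the question, a sketch might look like the following. The file names in the comments are illustrative only; the numeric part is a counter generated by Flume.

    ```properties
    collector.sinks.HadoopOut.hdfs.filePrefix = events-
    # Files that are still open get a leading dot,
    # e.g. .events-.1400000000000.tmp
    collector.sinks.HadoopOut.hdfs.inUsePrefix = .
    # Suffix left at its default; listed here only to make it explicit
    collector.sinks.HadoopOut.hdfs.inUseSuffix = .tmp
    # Once a file is closed it is renamed, e.g. to events-.1400000000000
    ```

    Hadoop's FileInputFormat by default skips paths whose names begin with "." or "_", so MapReduce jobs reading the directory should not pick up files that are still being written, although a plain directory listing will still show them.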
