
Flume: loading CSV files through an HDFS sink

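I have configured a Flume source of type spooldir. I have a lot of CSV files, .xl3 and .xls files, and I want my Flume agent to load every file from the spooling directory into the HDFS sink. But the Flume agent throws an exception.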

This is my Flume source configuration:

agent.sources.s1.type = spooldir
agent.sources.s1.spoolDir = /my-directory
agent.sources.s1.basenameHeader = true
# batchSize is a per-source property, so the key must include the source name (s1)
agent.sources.s1.batchSize = 10000
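By default the spooldir source reads each file with the LINE deserializer, decodes it as UTF-8 (inputCharset) and, with decodeErrorPolicy = FAIL, throws an exception on bytes it cannot decode. Binary spreadsheets such as .xls files are therefore a plausible cause of the exception, although the question does not show its text. The Python sketch below is a hypothetical pre-check, not part of Flume: it lists the files in a directory that are not valid UTF-8, so they can be converted to CSV or moved elsewhere before the agent picks them up. It reads each file fully into memory, which is fine for a quick check but not for very large files.

```python
import os
import sys


def non_utf8_files(directory):
    """Return sorted names of regular files in `directory` that are not valid UTF-8.

    Hypothetical pre-check: it approximates what a UTF-8 line reader would
    reject; it is not Flume's own code.
    """
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip sub-directories and other non-regular entries
        with open(path, "rb") as fh:
            data = fh.read()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(name)
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_spool.py /my-directory
    for name in non_utf8_files(sys.argv[1]):
        print("not UTF-8 text:", name)
```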

And my HDFS sink:

agent.sinks.sk1.type = hdfs
agent.sinks.sk1.hdfs.path = hdfs://...:8020/user/importflume/%Y/%m/%d/%H
agent.sinks.sk1.hdfs.filePrefix = %{basename}
agent.sinks.sk1.hdfs.rollSize = 0
agent.sinks.sk1.hdfs.rollCount = 0
agent.sinks.sk1.hdfs.useLocalTimeStamp = true
# property names are case-sensitive: hdfs.batchSize, not hdfs.batchsize
agent.sinks.sk1.hdfs.batchSize = 10000
agent.sinks.sk1.hdfs.fileType = DataStream
agent.sinks.sk1.serializer = avro_event
agent.sinks.sk1.serializer.compressionCodec = snappy
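With hdfs.useLocalTimeStamp = true, the HDFS sink fills the %Y/%m/%d/%H escapes in hdfs.path from the agent's local time instead of a timestamp header, and %{basename} in hdfs.filePrefix from the header that basenameHeader = true adds to each event. The rough Python sketch below covers only the escapes used in this question (Flume supports many more) and is not Flume's implementation; it shows which directory and file prefix an event would map to. Substituting an empty string for a missing header is an assumption of the sketch.

```python
import re
from datetime import datetime

# %{header} or one of the four time escapes used in this configuration
_ESCAPE = re.compile(r"%\{(\w+)\}|%([YmdH])")


def expand(template, when, headers):
    """Approximate the HDFS sink's escape substitution for one event."""

    def repl(match):
        if match.group(1) is not None:
            # %{name}: value of event header `name` (empty here if absent)
            return headers.get(match.group(1), "")
        # %Y year, %m month, %d day, %H hour, all zero-padded like strftime
        return when.strftime("%" + match.group(2))

    return _ESCAPE.sub(repl, template)


if __name__ == "__main__":
    when = datetime(2018, 3, 7, 9, 15)
    print(expand("hdfs://...:8020/user/importflume/%Y/%m/%d/%H", when, {}))
    # hdfs://...:8020/user/importflume/2018/03/07/09
    print(expand("%{basename}", when, {"basename": "sales.csv"}))
    # sales.csv
```

Note also that with rollSize = 0 and rollCount = 0, only hdfs.rollInterval (30 seconds by default) closes the files the sink writes.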

1 Answer


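    You can use the following configuration for the spooling directory. Just fill in the local file system path (spoolDir) and the HDFS location (hdfs.path) below.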

    #Flume Configuration Starts
    # Define a file channel called fileChannel on agent1
    agent1.channels.fileChannel1_1.type = file
    # on linux FS
    agent1.channels.fileChannel1_1.capacity = 200000
    agent1.channels.fileChannel1_1.transactionCapacity = 1000
    # Define a source for agent1
    agent1.sources.source1_1.type = spooldir
    # on linux FS
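    #Spooldir in my case is /home/hadoop/Desktop/flume_sink
    # replace 'path' with the local directory, without the quotes
    # (a properties file keeps quote characters as part of the value)
    agent1.sources.source1_1.spoolDir = 'path'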
    agent1.sources.source1_1.fileHeader = false
    agent1.sources.source1_1.fileSuffix = .COMPLETED
    agent1.sinks.hdfs-sink1_1.type = hdfs
    
    #Sink is /flume_import under hdfs
    
    # replace 'path' with the HDFS directory, without the quotes
    agent1.sinks.hdfs-sink1_1.hdfs.path = hdfs://'path'
    agent1.sinks.hdfs-sink1_1.hdfs.batchSize = 1000
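    # roll at 256 MiB (268435456 bytes) or 50,000,000 events, whichever comes first;
    # rollInterval = 0 disables time-based rolling
    agent1.sinks.hdfs-sink1_1.hdfs.rollSize = 268435456
    agent1.sinks.hdfs-sink1_1.hdfs.rollInterval = 0
    agent1.sinks.hdfs-sink1_1.hdfs.rollCount = 50000000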
    agent1.sinks.hdfs-sink1_1.hdfs.writeFormat=Text
    
    agent1.sinks.hdfs-sink1_1.hdfs.fileType = DataStream
    agent1.sources.source1_1.channels = fileChannel1_1
    agent1.sinks.hdfs-sink1_1.channel = fileChannel1_1
    
    agent1.sinks =  hdfs-sink1_1
    agent1.sources = source1_1
    agent1.channels = fileChannel1_1
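Much of what this configuration adds over the question's is the wiring: a channel declared in agent1.channels, the source bound to it with .channels (plural) and the sink with .channel (singular). The Python sketch below is a hypothetical checker, not a Flume tool: it parses a properties file (only `key = value` lines and comments) and reports missing component lists, dangling bindings, and a sink hdfs.batchSize larger than the channel's transactionCapacity, a mismatch that typically makes channel transactions fail.

```python
def parse_properties(text):
    """Minimal parser: only `key = value` lines and #/! comments are handled."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return human-readable wiring problems for one agent (empty list if none)."""

    def names(kind):
        # e.g. "agent1.sources = source1_1" -> ["source1_1"]
        return props.get(f"{agent}.{kind}", "").split()

    problems = [f"{agent}.{kind} is not set"
                for kind in ("sources", "channels", "sinks") if not names(kind)]
    channels = set(names("channels"))

    for src in names("sources"):
        # a source may feed several channels, so the key is plural
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no .channels")
        problems += [f"source {src} uses undeclared channel {ch}"
                     for ch in bound if ch not in channels]

    for sink in names("sinks"):
        # a sink drains exactly one channel, so the key is singular
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} has no .channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
        else:
            batch = props.get(f"{agent}.sinks.{sink}.hdfs.batchSize")
            txn = props.get(f"{agent}.channels.{ch}.transactionCapacity")
            if batch and txn and int(batch) > int(txn):
                problems.append(f"sink {sink} hdfs.batchSize {batch} > "
                                f"{ch} transactionCapacity {txn}")
    return problems


# The wiring-related lines of the configuration above
EXAMPLE = """
agent1.channels.fileChannel1_1.type = file
agent1.channels.fileChannel1_1.transactionCapacity = 1000
agent1.sources.source1_1.channels = fileChannel1_1
agent1.sinks.hdfs-sink1_1.channel = fileChannel1_1
agent1.sinks.hdfs-sink1_1.hdfs.batchSize = 1000
agent1.sinks = hdfs-sink1_1
agent1.sources = source1_1
agent1.channels = fileChannel1_1
"""

if __name__ == "__main__":
    print(check_agent(parse_properties(EXAMPLE), "agent1"))  # []
```

Run against the question's snippets, which show no agent.sources, agent.channels or agent.sinks lines, the checker reports all three as not set; whether they were simply omitted from the question is not stated.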
    

    You can also refer to this blog on the Flume spooling directory source for more information.
