def parse_pdf_dump(file)
  contents = File.read(file)
  fields = contents.split("---").map(&:strip).reject(&:empty?)
  # Create an Array of the fields
  fields.map do |field|
    # Create a Hash of attribute => value for each field attribute
    Hash[
      field.split("\n").map do |line|
        split_line = line.split(":")
        # grab the name of the attribute
        name = split_line.shift
        # grab the value of the attribute
        # join is used in case the data originally had a : in it
        val = split_line.join(":")
        # blank lines have no attribute name, so skip them
        [name.strip.downcase, val.strip] unless name.nil?
      end.compact
    ]
  end
end
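1 Answer
Since the output of dump_data_fields has a very standardized structure, this method should do what you need: it returns an Array in which each field is a Hash. Use it together with active_pdftk as follows.
You use pdftk to dump the data fields to a text file, parse them into the array fields_array, and then delete the text file.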
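
To make the dump, parse, delete sequence concrete, here is a minimal sketch that does not call pdftk at all. It assumes a hypothetical sample dump (SAMPLE_DUMP, with made-up field names) in the usual "---"-separated "Name: value" layout, writes it to a temporary file in place of the real dump, and uses a compact variant of the parser above. Treat it as an illustration of the flow rather than the exact active_pdftk call.

```ruby
require "tempfile"

# Compact variant of the parser: one Hash per "---"-separated field.
def parse_pdf_dump(path)
  File.read(path).split("---").map(&:strip).reject(&:empty?).map do |field|
    # Drop blank lines, then split each line on the first ":" only,
    # so values that contain a ":" stay intact.
    field.lines.map(&:strip).reject(&:empty?).to_h do |line|
      name, val = line.split(":", 2)
      [name.strip.downcase, val.to_s.strip]
    end
  end
end

# Hypothetical sample standing in for the text file pdftk would write.
SAMPLE_DUMP = <<~DUMP
  ---
  FieldType: Text
  FieldName: full_name
  FieldValue: Doe: Jane
  ---
  FieldType: Button
  FieldName: subscribe
DUMP

dump = Tempfile.new(["fields", ".txt"])
dump.write(SAMPLE_DUMP)
dump.close
path = dump.path

fields_array = parse_pdf_dump(path)
dump.unlink # delete the text file once it has been parsed

p fields_array
```

Each element of fields_array is a Hash keyed by the lower-cased attribute name, so fields_array.first["fieldname"] gives "full_name" for this sample.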