
Parsing a log file by applying conditions


I have a debug log file that looks like this:

Sample file:

DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID>
DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output
output output
output"
DEBUG: extra lines

I want to extract only the ID and the final output, like this:

Expected output:

<ID> "output
output output
output"

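I would like to do this in Python or bash. Any help would be appreciated. Thanks.

My current code only handles the "Final output" part. I also want to capture the ID, and there should be some way (a delimiter) to tell each ID and its output apart.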

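# Print only the "Final output" blocks, including their continuation lines.
with open("debuglog.txt", "r") as stream:
    flag = False
    for line in stream:
        if "DEBUG:" in line:
            # Every new DEBUG record ends the previous multi-line output.
            flag = False
        if "Final output is" in line:  # case must match the log text
            flag = True
        if flag:
            print(line.rstrip("\n"))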

3 Answers

  • 0

    Sample log file:

    DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16920: start 12324
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output output output output"
    DEBUG: extra lines
    

    Here is the code. I am assuming that the ID and the output each occur only once, and that the output fits on a single line.

    import re

    value_ID = None
    value_output = None
    with open("log", "r") as stream:
        for line in stream:
            # Raw strings keep \w and \d from being treated as string escapes.
            ID = re.match(r"DEBUG: [\w :]* start (\d+)", line)
            output = re.match(r'DEBUG: [\w :]* Final output is "([\w ]*)"', line)
            if ID:
                value_ID = ID.group(1)
            if output:
                value_output = output.group(1)
            # Print once, as soon as both the ID and the output have been seen.
            if value_ID is not None and value_output is not None:
                print("{0} {1}".format(value_ID, value_output))
                break
    

    Output:

    12324 output output output output
    

    If this solves your problem, please tick and accept it ;)

  • 1

    With Perl, if the file fits in memory, you can use a one-liner.

    /tmp> cat debug.log
    DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID1>
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output
    output output
    output"
    DEBUG: extra lines
    DEBUG: Fri Dec  7 06:49:14 2018:16921 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16921: start <ID2>
    DEBUG: Fri Dec  7 06:49:14 2018:16921: Final output is "output output output output"
    DEBUG: extra lines
    /tmp>
    /tmp> perl -0777 -ne ' while(/^DEBUG(.+?)start (\S+).*?DEBUG.+?Final output is \"(.+?)\"/smg) { print "$2 $3\n" } ' debug.log
    <ID1> output
    output output
    output
    <ID2> output output output output
    /tmp>
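
If the file does not fit in memory, the same extraction can be sketched line by line in Python. This is a minimal sketch, not part of the answer above: the function name `iter_records` and the tab delimiter are illustrative choices, and it assumes that continuation lines of an output never start with `DEBUG:` and that the ID line contains `: start `, as in the sample log.

```python
import io


def iter_records(lines):
    """Yield (record_id, output) pairs from an iterable of log lines."""
    record_id = None
    output_lines = None  # a list only while inside a "Final output" block
    for line in lines:
        if line.startswith("DEBUG:"):
            # A new DEBUG record closes any open multi-line output.
            if output_lines is not None:
                yield record_id, "".join(output_lines).rstrip("\n")
                output_lines = None
            if ": start " in line:
                record_id = line.split(": start ", 1)[1].strip()
            elif "Final output is " in line:
                output_lines = [line.split("Final output is ", 1)[1]]
        elif output_lines is not None:
            output_lines.append(line)
    # Flush an output block that runs to the end of the input.
    if output_lines is not None:
        yield record_id, "".join(output_lines).rstrip("\n")


# io.StringIO stands in for an open file; an open file object works the same way.
sample = io.StringIO(
    'DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID1>\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output\n'
    'output output\n'
    'output"\n'
    'DEBUG: extra lines\n'
)
records = list(iter_records(sample))
for record_id, output in records:
    print(record_id, output, sep="\t")
```

Because the generator holds only the current output block, memory use depends on the largest single output rather than on the size of the log.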
    
  • 3

    With Python, how about:

    #!/usr/bin/python

    import re

    # Read the whole log into one string so the regex can span lines.
    with open("logfile", "r") as f:
        text = f.read()

    regex = r'start (.+?)$.*?Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
    for m in re.finditer(regex, text, re.MULTILINE|re.DOTALL):
        for i in m.groups():
            print(i.replace('\n', ' '))
    

    Input log file:

    DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID>
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output
    output output
    output"
    DEBUG: extra lines
    
    DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID2>
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output2
    output+ output/
    output2"
    

    And the output:

    <ID>
    "output output output output"
    <ID2>
    "output2 output+ output/ output2"
    
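    • The first pair of parentheses in the regex captures everything after start up to the end of that line and stores it in group 1.

    • The second pair of parentheses captures everything after Final output is up to the next DEBUG line or the end of the string, and stores it in group 2. Because of the re.DOTALL option, the captured string may contain newlines.

    • The third pair of parentheses is a non-capturing group that holds only zero-width anchors (a lookahead for \nDEBUG, or \Z), so it consumes no text and is not one of the capture groups.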
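
The three points above can be checked on a small in-memory string. This is a minimal sketch that reuses the regex from this answer; the `text` literal is a shortened copy of the sample log, not a real file.

```python
import re

text = (
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID>\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output\n'
    'output output\n'
    'output"\n'
    'DEBUG: extra lines\n'
)

regex = r'start (.+?)$.*?Final output is (.+?)(?:(?=\nDEBUG)|\Z)'
m = re.search(regex, text, re.MULTILINE | re.DOTALL)

print(repr(m.group(1)))  # the ID, which stops at the end of the line
print(repr(m.group(2)))  # the output, with its embedded newlines
print(len(m.groups()))   # the (?:...) group does not add a capture group
```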

    EDIT

    The updated version below handles multiple "Final output" entries for a single ID and shows only the last output for each ID:

    #!/usr/bin/python

    import re

    with open("logfile", "r") as f:
        text = f.read()

    regex = r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)+'
    regex2 = r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)'

    for m in re.finditer(regex, text, re.MULTILINE|re.DOTALL):
        print(m.group(1))
        outputs = re.findall(regex2, m.group(2), re.MULTILINE|re.DOTALL)
        if outputs:
            # Keep only the last "Final output" found for this ID.
            print(outputs[-1].replace('\n', ' '))
    

    Input log file:

    DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID1>
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output
    output output
    output"
    DEBUG: extra lines
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "this
    is the last output
    for <ID1>"
    DEBUG: extra lines
    
    DEBUG: Fri Dec  7 06:49:14 2018:16920 extra text
    DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID2>
    DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "output2
    output+ output/
    output2"
    

    And the output:

    <ID1>
    "this is the last output  for <ID1>"
    <ID2>
    "output2 output+ output/ output2"
    

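    I split the extraction of the substrings into two steps:

    • Extract the ID and the remaining text (which may contain extra strings). This is handled by regex.

    • Extract the "final output" substrings from the "remaining text" above. This is handled by regex2.

    Then the last "final output" is selected and displayed.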
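
The two steps can also be wrapped in a function that returns the pairs, which makes it easy to print a delimiter between records as the question asks. This is a minimal sketch under the same assumptions as the code above: the names `SECTION_RE`, `OUTPUT_RE` and `last_outputs` and the dashed delimiter are illustrative, and the redundant trailing `+` of the first regex is dropped.

```python
import re

# Step 1: an ID plus everything up to the next "start" line (or end of text).
SECTION_RE = re.compile(r'start (.+?)$(.+?)(?:(?=DEBUG[^\n]+?start)|\Z)',
                        re.MULTILINE | re.DOTALL)
# Step 2: each "Final output" inside one section.
OUTPUT_RE = re.compile(r'Final output is (.+?)(?:(?=\nDEBUG)|\Z)', re.DOTALL)


def last_outputs(text):
    """Return a list of (ID, last final output) pairs, one per ID section."""
    results = []
    for section in SECTION_RE.finditer(text):
        outputs = OUTPUT_RE.findall(section.group(2))
        if outputs:  # skip IDs that have no "Final output" at all
            results.append((section.group(1), outputs[-1].strip()))
    return results


sample = (
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID1>\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "first"\n'
    'DEBUG: extra lines\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "second\n'
    'line"\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: start <ID2>\n'
    'DEBUG: Fri Dec  7 06:49:14 2018:16920: Final output is "only"\n'
)

pairs = last_outputs(sample)
for record_id, output in pairs:
    print(record_id)
    print(output)
    print("-" * 20)  # delimiter between records
```

Unlike the earlier listings, this sketch keeps the newlines inside each output; apply `.replace('\n', ' ')` to the output if a single line per record is preferred.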
