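How to parse only certain comments with ANTLR v4

I am using ANTLR v4 to develop an application that analyzes Java source code. I want to match every single-line comment whose first token is TODO (e.g. // TODO <some-comment>), together with the statement that directly follows it.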

Sample code:

class Simple {
    public static void main(String[] args) {
        // TODO develop cycle
        for (int i = 0; i < 5; i++) {
            // unmatched comment
            System.out.println("hello");
        }
        // TODO atomic
        int a;

        // TODO revision required
        {
            int b = a+4;
            System.out.println(b);
        }
    }
}

The result should be a map like this:

"develop cycle" -> for(...){...}
"atomic" -> int a
"revision required" -> {...}

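Following the official book (1) and similar topics on Stack Overflow ((2), (3), (4), (5), (6)), I tried several approaches.

At first I hoped to use a special COMMENTS channel as described in (1) and (2), but I got the error rule 'LINE_COMMENT' contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output .

I think it would be better to parse the source code in a way that ignores all single-line comments except those starting with TODO. I hope the TODO comments can then be added directly to the AST so that I can use a listener/walker. I would only need to register a listener/walker for TODO comments and extract the statement that follows, adding both to the desired map.

I have been modifying the official Java8 grammar for two days without any success. Either the compiler complains or the AST is wrong.

Here are the changes I made: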

// ...
COMMENT
    :   '/*' .*? '*/' -> skip
    ;

TODO_COMMENT
    :   '// TODO' ~[\r\n]*
    ;

LINE_COMMENT
    :   '//' ~[\r\n]* -> skip
    ;

Can anyone help me? Grammars are not my cup of tea. Thanks in advance.

EDIT1:

The grammar modification posted above compiles without errors, but it produces the following tree (note the nodes marked in red, including int):

error AST

EDIT2:

Given the code sample above, parser.compilationUnit(); produces the following errors:

line 3:2 extraneous input '// TODO develop cycle;' expecting {'abstract', 'assert', 'boolean', 'break', 'byte', 'char', 'class', 'continue', 'do', 'double', 'enum', 'final', 'float', 'for', 'if', 'int', 'interface', 'long', 'new', 'private', 'protected', 'public', 'return', 'short', 'static', 'strictfp', 'super', 'switch', 'synchronized', 'this', 'throw', 'try', 'void', 'while', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '{', '}', ';', '<', '!', '~', '++', '--', '+', '-', Identifier, '@'}
line 8:2 extraneous input '// TODO atomic;' expecting {'abstract', 'assert', 'boolean', 'break', 'byte', 'char', 'class', 'continue', 'do', 'double', 'enum', 'final', 'float', 'for', 'if', 'int', 'interface', 'long', 'new', 'private', 'protected', 'public', 'return', 'short', 'static', 'strictfp', 'super', 'switch', 'synchronized', 'this', 'throw', 'try', 'void', 'while', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '{', '}', ';', '<', '!', '~', '++', '--', '+', '-', Identifier, '@'}
line 11:2 extraneous input '// TODO revision required;' expecting {'abstract', 'assert', 'boolean', 'break', 'byte', 'char', 'class', 'continue', 'do', 'double', 'enum', 'final', 'float', 'for', 'if', 'int', 'interface', 'long', 'new', 'private', 'protected', 'public', 'return', 'short', 'static', 'strictfp', 'super', 'switch', 'synchronized', 'this', 'throw', 'try', 'void', 'while', IntegerLiteral, FloatingPointLiteral, BooleanLiteral, CharacterLiteral, StringLiteral, 'null', '(', '{', '}', ';', '<', '!', '~', '++', '--', '+', '-', Identifier, '@'}

So the grammar is clearly wrong, since it struggles even with this simple example.

1 Answer

The reason is that none of your parser rules expects the special comment, i.e. no parser rule matches it.

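You can do (at least) one of the following:
- Add an optional TODO_COMMENT? in front of every parser rule.
- Put the TODO_COMMENT token on a separate channel, e.g. ToDoCommentChannel (don't forget to define the constant for this channel!), and pick out each construct that follows a comment during the tree walk.
A rough outline of what I would do:
- Use a separate channel for TODO_COMMENT.
- Lex and parse as usual.
- Get all tokens from the token stream, find the tokens on the desired channel, take the token that follows each of them on the default channel, and store those in a list.
- Walk the parse tree and check, for each rule entered, whether its start token is in the list. If it is, copy the rule's text into the result list; otherwise recurse (if a TODO_COMMENT can be nested, you could recurse even when the start token is in the list).
Update: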
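
The third step of the outline (pairing each TODO token with the next default-channel token) can be sketched roughly as follows. This is only an illustration under assumptions: Tok is a hypothetical stand-in for the few fields of an ANTLR Token that matter here, the channel value 42 is taken from the example further down, and the class and method names are made up. With the real runtime, the token list would presumably come from CommonTokenStream.getTokens() after calling fill().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TodoTokenPairing {
    // Assumed channel numbers: 0 mirrors ANTLR's default channel,
    // 42 is the illustrative value for the TODO comment channel.
    public static final int DEFAULT_CHANNEL = 0;
    public static final int TODO_CHANNEL = 42;

    /** Hypothetical stand-in for an ANTLR Token (index, channel, text only). */
    public static final class Tok {
        final int index;
        final int channel;
        final String text;

        public Tok(int index, int channel, String text) {
            this.index = index;
            this.channel = channel;
            this.text = text;
        }
    }

    /**
     * Maps the text of each TODO comment (without the "// TODO" prefix) to the
     * index of the next token on the default channel, i.e. the start token of
     * the construct that follows the comment.
     */
    public static Map<String, Integer> startTokensAfterTodos(List<Tok> tokens) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            Tok todo = tokens.get(i);
            if (todo.channel != TODO_CHANNEL) {
                continue;
            }
            // Scan forward to the first token the parser actually sees.
            for (int j = i + 1; j < tokens.size(); j++) {
                Tok next = tokens.get(j);
                if (next.channel == DEFAULT_CHANNEL) {
                    String label = todo.text.replaceFirst("^//\\s*TODO\\s*", "").trim();
                    result.put(label, next.index);
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok(0, TODO_CHANNEL, "// TODO develop cycle"));
        tokens.add(new Tok(1, DEFAULT_CHANNEL, "for"));
        tokens.add(new Tok(2, DEFAULT_CHANNEL, "("));
        tokens.add(new Tok(3, TODO_CHANNEL, "// TODO atomic"));
        tokens.add(new Tok(4, DEFAULT_CHANNEL, "int"));
        // prints {develop cycle=1, atomic=4}
        System.out.println(startTokensAfterTodos(tokens));
    }
}
```

For the last step, a listener's enterEveryRule(ParserRuleContext ctx) could compare ctx.getStart().getTokenIndex() against the stored indices and record the rule's text on a match. Note that several nested rules usually share the same start token, so you would still have to decide which rule level (statement, declaration, block) to keep.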

Regarding the error rule 'LINE_COMMENT' contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output :

This can be ignored, since it only affects the interpreter used by ANTLRWorks 2 or the plugin. You could also do this:
```
//Instead of
TODO_COMMENT
    :   '// TODO' ~[\r\n]*  -> channel(ToDoCommentChannel)
    ;    

// do this (assuming the channel value is indeed 42):
TODO_COMMENT
    :   '// TODO' ~[\r\n]*  -> channel(42 /*ToDoCommentChannel*/)
    ;
```
This works both in ANTLRWorks 2 and in code (you can still use the channel's constant in your Java runtime code).
Answered on 2024-04-25T06:36:39+08:00
