Token names and keywords when tokenizing a Python source file (in Python)

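Following this answer, I am trying to get all the token information for a Python source file (i.e. each token's exact name, value, and position), such as the file below.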

```
# Python source file
import os

class Test():
    """
    This class holds latitude, longitude, depth and magnitude data.
    """

    def __init__(self, latitude, longitude, depth, magnitude):
        self.latitude = latitude
        self.longitude = longitude
        self.depth = depth
        self.magnitude = magnitude

    def __str__(self):
        # -1 is for detection of missing data
        depth = self.depth
        if depth == -1:
            depth = 'unknown'

        magnitude = self.magnitude
        if magnitude == -1:
            magnitude = 'unknown'

        return "M{0}, {1} km, lat {2}\N{DEGREE SIGN} lon {3}\N{DEGREE SIGN}".format(magnitude, depth, self.latitude, self.longitude)
```

Ideally, I would like my output to look like this:

line = 15, column = 9, value = 'def', token = 'method for method'
line = 18, column = 13, value = 'if', token = 'if statement'

I tried this:

```
import tokenize

# file: path to the Python source file shown above
with open(file, 'rb') as f:
    for t in tokenize.tokenize(f.readline):
        print(t.type, t.exact_type)
```

I got this output:

When I tried this instead:

```
with open(file, 'rb') as f:
    for toknum, tokval, spos, epos, line in tokenize.tokenize(f.readline):
        print(toknum, tokval, spos)
```

I got this output:

```
59 utf-8 (0, 0)
57 # Python source file (1, 4)
58 
 (1, 24)
5      (2, 0)
1 import (2, 4)
1 os (2, 11)
4 
 (2, 13)
58 
 (3, 4)
1 class (4, 4)
1 Test (4, 10)
53 ( (4, 14)
53 ) (4, 15)
53 : (4, 16)
(...)
```

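I am interested in getting the exact_type of the token mentioned in the documentation, i.e. the name of the token. So far I can only see it if, in my first example, I print the whole tuple t.

Any ideas on how to achieve this?

Also, I could not find any useful example code on this online. Are there any links or online material I could study on parsing/tokenizing Python source files in Python?

Any simple, easy-to-follow code examples would be greatly appreciated. Also, if you know of useful online material with examples and explanations of the tokenize module and its methods, that would be great.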

1 Answer

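I believe what you are interested in can be achieved with Abstract Syntax Trees: traverse them, in particular the leaf nodes of the AST, i.e. the tokens (see the token module), which represent all the values and keywords you are looking for.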

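So you should look at the ast and parser modules rather than the tokenize module (note that parser was removed in Python 3.10, so on current versions ast is the one to use). Also, as mentioned in the documentation, Green Tree Snakes has some very good material on the subject of ASTs.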
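That said, the token names asked about in the question can be read from tokenize directly. Below is a minimal sketch, not a definitive implementation: `SAMPLE` and `describe_tokens` are names made up for illustration, and the "KEYWORD" label is my own convention, since the tokenizer itself reports keywords as plain NAME tokens. It maps each token's `exact_type` through `tokenize.tok_name` and uses `keyword.iskeyword` to single out keywords.

```python
import io
import keyword
import tokenize

# Hypothetical sample source; any Python code given as bytes would do.
SAMPLE = b"class Test():\n    def f(self):\n        return 1\n"


def describe_tokens(source_bytes):
    """Return a (line, column, value, name) tuple for every token."""
    rows = []
    for tok in tokenize.tokenize(io.BytesIO(source_bytes).readline):
        # tok.type is generic (OP for every operator);
        # tok.exact_type distinguishes LPAR, COLON, and so on.
        name = tokenize.tok_name[tok.exact_type]
        # Keywords are tokenized as NAME, so they need a separate check.
        # "KEYWORD" is a label chosen here, not a real token type.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            name = "KEYWORD"
        line, column = tok.start
        rows.append((line, column, tok.string, name))
    return rows


for row in describe_tokens(SAMPLE):
    print(row)
```

Lines are 1-based and columns 0-based, as in the `(row, col)` start positions printed in the question. To run this on a file instead, pass `f.readline` from a file opened in `'rb'` mode to `tokenize.tokenize`, as the question already does.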

To help you get started, here is a small example that reads a Python source file, parses it with ast, and extracts the names of its classes:
```
import ast
import tokenize

source = "test.py"  # placeholder: path to the Python source file to parse

# tokenize.open is needed because source is a file, not a string;
# it also honours the file's encoding declaration
with tokenize.open(source) as sf:
    source_file_contents = sf.read()

module = ast.parse(source_file_contents)

# Collect the top-level class definitions
class_definitions = []
for node in module.body:
    if isinstance(node, ast.ClassDef):
        class_definitions.append(node)

print([class_definition.name for class_definition in class_definitions])
```
Finally, if you want to go in the direction of visiting the nodes of an AST (i.e. via ast.NodeVisitor), you can find good resources on Stack Overflow, such as this or this.
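As a rough illustration of the ast.NodeVisitor approach, the sketch below records the line and column of class, function, and if statements, which is close to the output the question asks for. `SAMPLE` and `StatementLocator` are hypothetical names chosen for this example, and the sample is a cut-down stand-in for the file in the question.

```python
import ast

# Hypothetical sample source standing in for the file from the question.
SAMPLE = '''
class Test():
    def __str__(self):
        depth = self.depth
        if depth == -1:
            depth = 'unknown'
        return depth
'''


class StatementLocator(ast.NodeVisitor):
    """Collect (line, column, kind, name) for classes, functions and ifs."""

    def __init__(self):
        self.found = []

    def visit_ClassDef(self, node):
        self.found.append((node.lineno, node.col_offset, "class", node.name))
        self.generic_visit(node)  # keep descending into the class body

    def visit_FunctionDef(self, node):
        self.found.append((node.lineno, node.col_offset, "def", node.name))
        self.generic_visit(node)

    def visit_If(self, node):
        # An if statement has no name, so None is stored in that slot.
        self.found.append((node.lineno, node.col_offset, "if", None))
        self.generic_visit(node)


locator = StatementLocator()
locator.visit(ast.parse(SAMPLE))
for entry in locator.found:
    print(entry)
```

Each `visit_<NodeType>` method is called for nodes of that type; calling `generic_visit` inside it is what keeps the traversal going into nested statements. `lineno` is 1-based and `col_offset` is a 0-based offset into the line.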
Answered on 2024-05-19T09:42:36+08:00
