Tree structure from the Stanford CoreNLP parser

I am trying to run the Stanford CoreNLP parser, and I have the following code:

from pycorenlp import StanfordCoreNLP

# Assumes a CoreNLP server is already listening on this address
nlp = StanfordCoreNLP('http://localhost:9000')

def depparse(text):
    parsed = ""
    output = nlp.annotate(text, properties={
        'annotators': 'depparse',
        'outputFormat': 'json'
    })

    # Append "rel(governor dependent) " for every dependency of every sentence
    for sentence in output["sentences"]:
        for dep in sentence["basicDependencies"]:
            parsed += dep["dep"] + '(' + dep["governorGloss"] + ' ' + dep["dependentGloss"] + ') '
    # Return only after all sentences have been processed
    return parsed

text = 'I shot an elephant in my sleep'
depparse(text)

This gives me the output: 'ROOT(ROOT shot) nsubj(shot I) det(elephant an) dobj(shot elephant) case(sleep in) nmod:poss(sleep my) nmod(shot sleep) '

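To convert these relations into a tree, I came across the Stack Overflow post "Stanford NLP parse tree format". However, in that post the parser output is in "bracketed parse (tree)" form, so I don't know how to apply it here. I also tried changing the outputFormat, but that raises an error.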

I also found "Python - Generate a dictionary(tree) from a list of tuples" and implemented it:

list_of_tuples = [('ROOT','ROOT', 'shot'),('nsubj','shot', 'I'),('det','elephant', 'an'),('dobj','shot', 'elephant'),('case','sleep', 'in'),('nmod:poss','sleep', 'my'),('nmod','shot', 'sleep')]

# First pass: create one node per dependent word
nodes = {}

for rel, parent, child in list_of_tuples:
    nodes[child] = {'Name': child, 'Relationship': rel}

forest = []

# Second pass: attach each node to its parent, or to the forest if it is the root
for rel, parent, child in list_of_tuples:
    node = nodes[child]

    if parent == 'ROOT':  # this should be the root node
        forest.append(node)
    else:
        parent = nodes[parent]
        if 'children' not in parent:
            parent['children'] = []
        children = parent['children']
        children.append(node)

print(forest)

I get the following output: [{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]
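
One caveat with the loop above: nodes are keyed by the word itself, so a sentence that repeats a word (for example two occurrences of "the") ends up with a single shared node. Below is a minimal sketch of one way around this. It assumes each entry of basicDependencies also carries integer `governor` and `dependent` token indices, with 0 standing for ROOT, as CoreNLP's JSON output appears to do. `build_tree` and the `sample` data are hypothetical names, hand-written for illustration and not real server output.

```python
def build_tree(dependencies):
    """Build a nested dict tree from CoreNLP-style dependency dicts.

    Nodes are keyed by token index, so repeated words stay distinct.
    """
    # First pass: one node per dependent token, keyed by its index
    nodes = {}
    for dep in dependencies:
        nodes[dep["dependent"]] = {
            "Name": dep["dependentGloss"],
            "Relationship": dep["dep"],
        }

    # Second pass: attach each node to its governor, or to the forest
    forest = []
    for dep in dependencies:
        node = nodes[dep["dependent"]]
        if dep["governor"] == 0:  # assumption: index 0 is the artificial ROOT
            forest.append(node)
        else:
            nodes[dep["governor"]].setdefault("children", []).append(node)
    return forest


# Hand-written sample for "the dog saw the cat" (not real server output)
sample = [
    {"dep": "ROOT", "governor": 0, "governorGloss": "ROOT", "dependent": 3, "dependentGloss": "saw"},
    {"dep": "det", "governor": 2, "governorGloss": "dog", "dependent": 1, "dependentGloss": "the"},
    {"dep": "nsubj", "governor": 3, "governorGloss": "saw", "dependent": 2, "dependentGloss": "dog"},
    {"dep": "det", "governor": 5, "governorGloss": "cat", "dependent": 4, "dependentGloss": "the"},
    {"dep": "dobj", "governor": 3, "governorGloss": "saw", "dependent": 5, "dependentGloss": "cat"},
]

tree = build_tree(sample)
print(tree)
```

With index keys, the two "the" tokens become separate nodes under "dog" and "cat" respectively.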

1 Answer

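Admittedly this is a bit off-topic (it answers your last comment rather than your original question). I'm posting it as an answer because code doesn't fit well in a comment. By changing your depparse function slightly, you can get the desired format: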

def depparse(text):
    output = nlp.annotate(text, properties={
        'annotators': 'depparse',
        'outputFormat': 'json'
    })
    for i in output['sentences']:  # not sure if there can be multiple items here. If so, it just returns the first one currently.
        # One (relation, governor, dependent) tuple per dependency
        return [(dep['dep'], dep['governorGloss'], dep['dependentGloss']) for dep in i['basicDependencies']]
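
On the question raised in the code comment: the "sentences" list does hold one entry per sentence when the input text contains several. A minimal sketch of collecting the tuples for every sentence rather than only the first follows. `dependency_tuples` is a hypothetical helper that takes the already-parsed JSON dict, and `sample_output` is a hand-written stand-in, not real server output.

```python
def dependency_tuples(output):
    """Return one list of (relation, governor, dependent) tuples per sentence."""
    return [
        [(dep["dep"], dep["governorGloss"], dep["dependentGloss"])
         for dep in sentence["basicDependencies"]]
        for sentence in output["sentences"]
    ]


# Hand-written stand-in for the parsed JSON of a two-sentence input
sample_output = {
    "sentences": [
        {"basicDependencies": [
            {"dep": "ROOT", "governorGloss": "ROOT", "dependentGloss": "ran"},
            {"dep": "nsubj", "governorGloss": "ran", "dependentGloss": "I"},
        ]},
        {"basicDependencies": [
            {"dep": "ROOT", "governorGloss": "ROOT", "dependentGloss": "slept"},
            {"dep": "nsubj", "governorGloss": "slept", "dependentGloss": "She"},
        ]},
    ]
}

per_sentence = dependency_tuples(sample_output)
for tuples in per_sentence:
    print(tuples)
```

Each inner list can then be passed through the tree-building loop from the question, giving one tree per sentence.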

Answered on 2024-05-06T17:34:01+08:00
