
Stanford Core NLP - understanding coreference resolution


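I'm having some trouble understanding the changes made to the coref resolver in the latest version of the Stanford NLP tools. As an example, below is a sentence and the corresponding CorefChainAnnotation: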

The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.

{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}

I am not sure I understand what these numbers mean. Looking at the source code doesn't help either.

Thank you

3 Answers

  • 8

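    I've been working with the coreference dependency graph, and I started out by using the other answer to this question. After a while, though, I realized that the algorithm above is not exactly correct. The output it produced is not even close to that of my modified version.

    For anyone else who uses this post, here is the algorithm I ended up with. It also filters out self references, because every representative mention also mentions itself, and a lot of mentions only reference themselves.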

    Map<Integer, CorefChain> coref = document.get(CorefChainAnnotation.class);
    
    for(Map.Entry<Integer, CorefChain> entry : coref.entrySet()) {
        CorefChain c = entry.getValue();
    
        //this is because it prints out a lot of self references which aren't that useful
        if(c.getCorefMentions().size() <= 1)
            continue;
    
        CorefMention cm = c.getRepresentativeMention();
        String clust = "";
        List<CoreLabel> tks = document.get(SentencesAnnotation.class).get(cm.sentNum-1).get(TokensAnnotation.class);
        for(int i = cm.startIndex-1; i < cm.endIndex-1; i++)
            clust += tks.get(i).get(TextAnnotation.class) + " ";
        clust = clust.trim();
        System.out.println("representative mention: \"" + clust + "\" is mentioned by:");
    
        for(CorefMention m : c.getCorefMentions()){
            String clust2 = "";
            tks = document.get(SentencesAnnotation.class).get(m.sentNum-1).get(TokensAnnotation.class);
            for(int i = m.startIndex-1; i < m.endIndex-1; i++)
                clust2 += tks.get(i).get(TextAnnotation.class) + " ";
            clust2 = clust2.trim();
            //don't need the self mention
            if(clust.equals(clust2))
                continue;
    
            System.out.println("\t" + clust2);
        }
    }
    

    The final output for your example sentence is as follows:

    representative mention: "a basic unit of matter" is mentioned by:
    The atom
    it
    

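    Usually "The atom" ends up being the representative mention, but surprisingly it does not in this case. Another example with slightly more accurate output is the following sentence:

    The Revolutionary War occurred during the 1700s and it was the first war in the United States.

    It produces the following output: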

    representative mention: "The Revolutionary War" is mentioned by:
    it
    the first war in the United States
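
    One thing to watch in the loop above: it drops a mention when its text equals the representative's text, so a second, distinct mention with identical wording would be dropped too. A minimal sketch of a variation that compares positions instead is below. It runs without CoreNLP; the `Mention` class, the `referringMentions` helper, and the token spans in `main` are hypothetical stand-ins for illustration, not part of the library and not produced by running it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SelfReferenceFilter {

    /** Hypothetical stand-in for a mention (not a CoreNLP class). */
    static final class Mention {
        final int sentNum;     // 1-based sentence number
        final int startIndex;  // 1-based index of the first token
        final int endIndex;    // 1-based index one past the last token
        final String text;

        Mention(int sentNum, int startIndex, int endIndex, String text) {
            this.sentNum = sentNum;
            this.startIndex = startIndex;
            this.endIndex = endIndex;
            this.text = text;
        }

        boolean samePosition(Mention other) {
            return sentNum == other.sentNum
                    && startIndex == other.startIndex
                    && endIndex == other.endIndex;
        }
    }

    /** Texts of all mentions in the chain except the representative itself. */
    static List<String> referringMentions(Mention representative, List<Mention> chain) {
        List<String> result = new ArrayList<>();
        if (chain.size() <= 1) {
            return result; // singleton chain: the only mention is the self reference
        }
        for (Mention m : chain) {
            if (!m.samePosition(representative)) {
                result.add(m.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative spans only; they were not produced by running CoreNLP.
        Mention rep = new Mention(1, 1, 4, "The Revolutionary War");
        List<Mention> chain = Arrays.asList(
                rep,
                new Mention(1, 9, 10, "it"),
                new Mention(1, 11, 18, "the first war in the United States"));
        System.out.println("representative mention: \"" + rep.text + "\" is mentioned by:");
        for (String text : referringMentions(rep, chain)) {
            System.out.println("\t" + text);
        }
    }
}
```

    Comparing by sentence number and token span keeps the filter from hiding repeated wording, at the cost of needing the position fields rather than only the joined text.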
    
  • 17

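    These are the recent results from the annotator.

    • [1,1] 1 The atom

    • [1,2] 1 a basic unit of matter

    • [1,3] 1 it

    • [1,6] 6 negatively charged electrons

    • [1,5] 5 a cloud of negatively charged electrons

    The markings are as follows: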

    [Sentence number,'id']  Cluster_no  Text_Associated
    

    Text belonging to the same cluster refers to the same entity.
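
    To make that reading concrete, here is a small sketch that parses the printout from the question into cluster numbers and `[sentence number, id]` pairs. It works on the printed string only and does not call CoreNLP; the class name `CorefPrintout` and the `parse` helper are made up for this illustration, and the interpretation of each pair is the one described in this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CorefPrintout {

    // Matches one "clusterNo=[pair, pair, ...]" entry of the printed map.
    private static final Pattern ENTRY = Pattern.compile("(\\d+)=\\[([^\\]]*)\\]");

    /** Maps each cluster number to its list of [sentence number, id] pairs. */
    static Map<Integer, List<List<Integer>>> parse(String printout) {
        Map<Integer, List<List<Integer>>> clusters = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(printout);
        while (m.find()) {
            int clusterNo = Integer.parseInt(m.group(1));
            List<List<Integer>> pairs = new ArrayList<>();
            for (String pair : m.group(2).split(",")) {
                String trimmed = pair.trim();
                if (trimmed.isEmpty()) {
                    continue; // empty cluster such as "3=[]"
                }
                String[] parts = trimmed.split("\\s+");
                pairs.add(Arrays.asList(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])));
            }
            clusters.put(clusterNo, pairs);
        }
        return clusters;
    }

    public static void main(String[] args) {
        Map<Integer, List<List<Integer>>> clusters =
                parse("{1=[1 1, 1 2], 5=[1 3], 7=[1 4], 9=[1 5]}");
        for (Map.Entry<Integer, List<List<Integer>>> e : clusters.entrySet()) {
            for (List<Integer> pair : e.getValue()) {
                System.out.println("cluster " + e.getKey()
                        + ": sentence " + pair.get(0) + ", id " + pair.get(1));
            }
        }
    }
}
```

    For the printout in the question, cluster 1 holds two pairs ([1 1] and [1 2]) and every other cluster holds one, so only cluster 1 links two mentions.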

  • 0

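    The first number is a cluster ID (representing tokens that stand for the same entity); see the source code of SieveCoreferenceSystem#coref(Document). The pairs of numbers are the output of CorefChain#toString():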

    public String toString(){
        return position.toString();
    }
    

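    where position is a set of position pairs for the mentions of the entity (to get them, use CorefChain.getCorefMentions()). Here is a complete code example (in Groovy) that shows how to get from positions to tokens: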

    class Example {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
            props.put("dcoref.score", true);
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
            // input text, kept in a variable so it can be printed and sliced below
            String aText = "The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.";
            Annotation document = new Annotation(aText);
    
            pipeline.annotate(document);
            Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    
            println aText
    
            for(Map.Entry<Integer, CorefChain> entry : graph.entrySet()) {
              CorefChain c = entry.getValue();
              println "ClusterId: " + entry.getKey();
              CorefMention cm = c.getRepresentativeMention();
              // startIndex/endIndex are used here as character offsets into aText
              println "Representative Mention: " + aText.subSequence(cm.startIndex, cm.endIndex);
    
              List<CorefMention> cms = c.getCorefMentions();
              println  "Mentions:  ";
              cms.each { it -> 
                  print aText.subSequence(it.startIndex, it.endIndex) + "|"; 
              }         
            }
        }
    }
    

    Output (I don't understand where the 's' comes from):

    The atom is a basic unit of matter, it consists of a dense central nucleus surrounded by a cloud of negatively charged electrons.
    ClusterId: 1
    Representative Mention: he
    Mentions: he|atom |s|
    ClusterId: 6
    Representative Mention:  basic unit 
    Mentions:  basic unit |
    ClusterId: 8
    Representative Mention:  unit 
    Mentions:  unit |
    ClusterId: 10
    Representative Mention: it 
    Mentions: it |
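
    A possible explanation for the odd fragments, assuming `startIndex` and `endIndex` are 1-based token indices with an exclusive end (which is how the loop in the other answer treats them): the Groovy code passes them to `subSequence` as character offsets. The sketch below reproduces the three fragments of cluster 1 with plain string operations. The spans (1,3), (4,9) and (10,11) are inferred from the printed output rather than read from CoreNLP, the token list is a hand-written tokenization of the first clause, and `TokenSpanDemo` with its helpers is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class TokenSpanDemo {

    static final String TEXT = "The atom is a basic unit of matter, it consists of a dense "
            + "central nucleus surrounded by a cloud of negatively charged electrons.";

    // Hand-written tokens for the first clause only (assumed, not CoreNLP output).
    static final List<String> TOKENS = Arrays.asList(
            "The", "atom", "is", "a", "basic", "unit", "of", "matter", ",", "it");

    /** Treats the indices as character offsets, as the Groovy example does. */
    static String charSpan(int startIndex, int endIndex) {
        return TEXT.substring(startIndex, endIndex);
    }

    /** Treats the indices as 1-based token positions with an exclusive end. */
    static String tokenSpan(int startIndex, int endIndex) {
        return String.join(" ", TOKENS.subList(startIndex - 1, endIndex - 1));
    }

    public static void main(String[] args) {
        int[][] spans = {{1, 3}, {4, 9}, {10, 11}};
        for (int[] span : spans) {
            System.out.println("\"" + charSpan(span[0], span[1]) + "\" vs \""
                    + tokenSpan(span[0], span[1]) + "\"");
        }
    }
}
```

    Run as is, this prints `"he" vs "The atom"`, `"atom " vs "a basic unit of matter"` and `"s" vs "it"`. Under that assumption the 's' is simply character 10 of the sentence (the second letter of "is"), while token 10 is "it".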
    
