How do I train the Stanford LexicalizedParser to recognize a new word as a noun?

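I am trying to figure out how to train the Stanford LexicalizedParser
(edu.stanford.nlp.parser.lexparser.LexicalizedParser) to include new nouns in its lexicon.

Initially my goal was to take an existing model and tweak it slightly, rather than build a brand-new model from a large set of training examples.

The answer to this question suggests that this is not possible: How can I add more tagged words to the Stanford POS-Tagger's trained models?

I am hoping someone out there can put me on the right track for how to do this.

As a concrete example of what I want to do, say I have the word "researchgate" and I want it treated as a noun when sentences are parsed. Currently, 'researchgate' is assigned different parts of speech depending on where it appears, but I want it to be recognized as 'NN' (noun).

Examples...

Instead of this: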

(NP
        (NP (JJ recent) (NN activity))
        (PP (IN in)
          (NP (PRP$ your) (JJ researchgate) (NNS topics)))))

I want this:

(NP
        (NP (JJ recent) (NN activity))
        (PP (IN in)
          (NP (PRP$ your) (NN researchgate) (NNS topics)))))

And instead of this:

(ROOT
      (FRAG
        (NP (NN subscription))
        (S
          (VP (TO to)
            (VP (VB researchgate))))))

I want this:

(ROOT
      (NP
        (NP (NN subscription))
        (PP (TO to)
          (NP (NN researchgate)))))
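To compare parses like the ones above without reading them by eye, a small helper can pull out the tag that was assigned to a given word. This is only a sketch using plain string matching on the bracketed output; the class and method names (`TagLookup`, `tagsFor`) are made up for illustration and are not part of the Stanford parser API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagLookup {
    // Matches a preterminal such as "(NN researchgate)": a tag, whitespace,
    // and a word, with no nested brackets inside.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^\\s()]+)\\s+([^\\s()]+)\\)");

    /** Returns every tag assigned to the word in a bracketed parse, in order of appearance. */
    public static List<String> tagsFor(String bracketedTree, String word) {
        List<String> tags = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(bracketedTree);
        while (m.find()) {
            if (m.group(2).equalsIgnoreCase(word)) {
                tags.add(m.group(1));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String current = "(NP (NP (JJ recent) (NN activity)) "
                + "(PP (IN in) (NP (PRP$ your) (JJ researchgate) (NNS topics))))";
        String wanted = "(NP (NP (JJ recent) (NN activity)) "
                + "(PP (IN in) (NP (PRP$ your) (NN researchgate) (NNS topics))))";
        System.out.println(tagsFor(current, "researchgate")); // prints [JJ]
        System.out.println(tagsFor(wanted, "researchgate"));  // prints [NN]
    }
}
```

Run over a batch of parsed sentences, this makes it easy to see whether a retrained model tags 'researchgate' as NN consistently.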

I am currently using this model: models/edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz

I tried this:

java -cp stanford-parser.jar edu.stanford.nlp.parser.lexparser.LexicalizedParser -train /tmp/train.txt

with the contents of /tmp/train.txt as follows:

(NP
                (NP (JJ recent) (NN activity))
                (PP (IN in)
                  (NP (PRP$ your) (JJ researchgate) (NNS topics)))))

I got a bunch of promising-looking output, but then got this error:

Error. Can't parse test sentence: [This, is, just, a, test, .]

So evidently I need to supply more examples than just the single one I have in /tmp/train.txt.
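Before building a larger training file, it may be worth sanity-checking it. The sketch below counts the top-level bracketed trees in a piece of text and rejects unpaired brackets. It assumes Penn Treebank style input, where literal brackets inside tokens are escaped as -LRB- and -RRB-, so every "(" and ")" is structural. `TreeFileCheck` and `countTrees` are hypothetical names, not parser API. Incidentally, the snippet as pasted above closes one more bracket than it opens, probably because it was cut from a larger tree.

```java
public class TreeFileCheck {
    /**
     * Counts top-level bracketed trees in the text.
     * Throws IllegalArgumentException if a ")" has no matching "("
     * or if any "(" is still open at the end of the input.
     */
    public static int countTrees(String text) {
        int depth = 0;
        int trees = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    throw new IllegalArgumentException("Surplus ')' at offset " + i);
                }
                if (depth == 0) {
                    trees++; // a top-level tree has just been closed
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException(depth + " unclosed '(' at end of input");
        }
        return trees;
    }

    public static void main(String[] args) {
        // The training snippet exactly as pasted in the question.
        String asPasted = "(NP (NP (JJ recent) (NN activity)) "
                + "(PP (IN in) (NP (PRP$ your) (JJ researchgate) (NNS topics)))))";
        try {
            System.out.println(countTrees(asPasted) + " tree(s)");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The same method can be pointed at the full contents of /tmp/train.txt to confirm how many complete trees the file actually holds.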

Looking at the documentation, LexicalizedParser has a promising-looking method that I am thinking of trying:

public static LexicalizedParser getParserFromTreebank(Treebank trainTreebank,
                                                          Treebank secondaryTrainTreebank,
                                                          double weight,
                                                          GrammarCompactor compactor,
                                                          Options op,
                                                          Treebank tuneTreebank,
                                                          List<List<TaggedWord>> extraTaggedWords)

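I am hesitant to jump in and try this one, because choosing the right options seems tricky. The doco says:
Options to the parser which MUST be the SAME at both training and testing (parsing) time in order for the parser to work properly.

So I would probably need guidance on how to extract the options that were used for edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz. Or perhaps it is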

edu.stanford.nlp.parser.lexparser.EnglishTreebankParserParams  ?

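Also, perhaps I would want to put researchgate in my extraTaggedWords?

I feel like I am on the right track, but I am hoping for some advice before I go down a rat hole.

Thanks in advance!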

chris

1 Answer

1

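I posted to the Stanford parser mailing list and received this answer from John Bauer (thanks, John!):

John Bauer, 2:09 PM (39 minutes ago), to me, parser-user:

Unfortunately, you would need to retrain from scratch. There is no way to extend a current parser model. It's a feature that's on "the list", but don't hold your breath...

John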

Answered 2024-05-17T09:55:31+08:00
