
Java heap space error with Stanford NER in NetBeans

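I'm using Stanford NER to parse a sentence with the following annotators: tokenize, ssplit, pos, lemma, ner. I also increased the memory for the project in NetBeans to -Xms1600M -Xmx1600M via Project -> Properties -> Run -> VM Options. I still get a Java out-of-memory exception. I'm running 32-bit Java (JDK 1.7) on Windows 7. Here is my code: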

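import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class NerTagger {
    private StanfordCoreNLP pipeline;

    public ArrayList<String> NERTokensRet(String string) {
        ArrayList<String> myArr = new ArrayList<String>();

        // Building the pipeline loads the POS and NER models (the step that
        // runs out of memory in the stack trace), so build it only once.
        if (pipeline == null) {
            Properties props = new Properties();
            props.put("annotators", "tokenize,ssplit,pos,lemma,ner");
            pipeline = new StanfordCoreNLP(props);
        }

        Annotation annotation = new Annotation(string);
        pipeline.annotate(annotation);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                // [original text, POS tag, begin offset, end offset]
                myArr.add("[" + token.originalText() + "," + token.tag() + ","
                        + token.beginPosition() + "," + token.endPosition() + "]");
            }
        }
        return myArr;
    }
}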

Stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:2998)
at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2819)
at java.io.ObjectInputStream.readString(ObjectInputStream.java:1598)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1319)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at java.util.HashMap.readObject(HashMap.java:1030)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:974)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
at edu.stanford.nlp.ie.crf.CRFClassifier.loadClassifier(CRFClassifier.java:2255)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1444)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1421)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1500)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.loadClassifier(AbstractSequenceClassifier.java:1487)
at edu.stanford.nlp.ie.crf.CRFClassifier.getClassifier(CRFClassifier.java:2386)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifierFromPath(ClassifierCombiner.java:130)
at edu.stanford.nlp.ie.ClassifierCombiner.loadClassifiers(ClassifierCombiner.java:116)
at edu.stanford.nlp.ie.ClassifierCombiner.<init>(ClassifierCombiner.java:98)
at edu.stanford.nlp.ie.NERClassifierCombiner.<init>(NERClassifierCombiner.java:64)
at edu.stanford.nlp.pipeline.StanfordCoreNLP$6.create(StanfordCoreNLP.java:500)

Can anyone help as soon as possible?

2 Answers

  • 0

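    The stack trace just shows Java running out of memory while loading the large models (features and weights) that CoreNLP uses for NER. These do use quite a lot of memory, but this is still surprising. You don't say which OS, which JDK version, whether it is 32- or 64-bit, etc. But for the program above (with a main method added and a few types filled in), with Java 7u5 on Linux (CentOS 5), I can run it with -mx700m (with either 32-bit or 64-bit Java; yay, compressed oops). So I think 1600m should be enough for any architecture/version.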

    So, I'd try:

    • Run with more memory and see whether anything changes (e.g. -mx1800m).

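    • If not, make sure the VM is really getting the amount of memory you state above (even though what you wrote looks correct). For example, try printing Runtime.getRuntime().maxMemory() / 1024 / 1024.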
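    Both checks from the list above can be done from inside the program. The sketch below (the class name `HeapCheck` is only illustrative) prints `Runtime.getRuntime().maxMemory()` in megabytes, as suggested, plus the JVM arguments the process was actually launched with, which shows whether the NetBeans VM Options were passed at all. Note that `maxMemory()` can report somewhat less than the `-Xmx` value.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in megabytes (governed by -Xmx / -mx)
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / 1024 / 1024;
    }

    // JVM options this process was started with, e.g. [-Xms1600M, -Xmx1600M]
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
        System.out.println("JVM arguments: " + jvmArgs());
    }
}
```

    If the printed arguments do not contain -Xmx1600M, the options are not reaching the JVM that runs the code.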

  • 2

    I uninstalled everything (Java and NetBeans) and reinstalled everything. It still cannot allocate -Xmx1400m, but it does allocate -Xmx1000m and then runs fine. Thanks everyone for your help.
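    This outcome (-Xmx1000m works, -Xmx1400m does not) is consistent with a limit of the 32-bit JVM mentioned in the question: a 32-bit JVM on Windows needs one contiguous range of virtual address space for the heap, and that is often not available much beyond roughly 1.2 to 1.5 GB. As a hedged check, the sketch below prints whether the running JVM is 32- or 64-bit. `sun.arch.data.model` is a HotSpot-specific system property, not part of the Java SE specification, so it may be missing on other JVMs; the class name `JvmBitness` is only illustrative.

```java
public class JvmBitness {
    // "32" or "64" on HotSpot-based JVMs; vendor-specific, so it may be null elsewhere
    static String dataModel() {
        return System.getProperty("sun.arch.data.model");
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: " + dataModel() + "-bit");
        // Architecture the JVM was built for, e.g. x86 or amd64
        System.out.println("os.arch: " + System.getProperty("os.arch"));
    }
}
```

    If it reports a 32-bit JVM, installing a 64-bit JDK (on a 64-bit OS) is the usual way to get a larger heap.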
