Stanford Topic Modeling Toolbox: Exception

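I am trying to use the Stanford Topic Modeling Toolbox. I downloaded the "tmt-0.4.0.jar" file from here: http://nlp.stanford.edu/software/tmt/tmt-0.4/ and tried some of the examples. Examples 0 and 1 work fine, but when I run example 2 (with no code changes), I get the following exception: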

[cell] loading pubmed-oa-subset.csv.term-counts.cache.70108071.gz
[Concurrent] 32 permits
Exception in thread "Thread-3" java.lang.ArrayIndexOutOfBoundsException: -1
    at scalanlp.stage.text.TermCounts$class.getDF(TermFilters.scala:64)
    at scalanlp.stage.text.TermCounts$$anon$2.getDF(TermFilters.scala:84)
    at scalanlp.stage.text.TermMinimumDocumentCountFilter$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(TermFilters.scala:172)
    at scalanlp.stage.text.TermMinimumDocumentCountFilter$$anonfun$apply$4$$anonfun$apply$5$$anonfun$apply$6.apply(TermFilters.scala:172)
    at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:390)
    at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:388)
    at scala.collection.Iterator$class.foreach(Iterator.scala:660)
    at scala.collection.Iterator$$anon$22.foreach(Iterator.scala:382)
    at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:41)
    at scala.collection.IterableViewLike$$anon$5.foreach(IterableViewLike.scala:82)
    at scala.collection.TraversableOnce$class.size(TraversableOnce.scala:104)
    at scala.collection.IterableViewLike$$anon$5.size(IterableViewLike.scala:82)
    at scalanlp.stage.text.DocumentMinimumLengthFilter.filter(DocumentFilters.scala:31)
    at scalanlp.stage.text.DocumentMinimumLengthFilter.filter(DocumentFilters.scala:28)
    at scalanlp.stage.generic.Filter$$anonfun$apply$1.apply(Filter.scala:38)
    at scalanlp.stage.generic.Filter$$anonfun$apply$1.apply(Filter.scala:38)
    at scala.collection.Iterator$$anon$22.hasNext(Iterator.scala:390)
    at edu.stanford.nlp.tmt.data.concurrent.Concurrent$$anonfun$map$2.apply(Concurrent.scala:100)
    at edu.stanford.nlp.tmt.data.concurrent.Concurrent$$anonfun$map$2.apply(Concurrent.scala:88)
    at edu.stanford.nlp.tmt.data.concurrent.Concurrent$$anon$4.run(Concurrent.scala:45)

Why am I getting this exception, and how can I fix it? Thanks a lot for your help!

PS: The code is identical to the code of example 2 on the website:

// Stanford TMT Example 2 - Learning an LDA model
// http://nlp.stanford.edu/software/tmt/0.4/

// tells Scala where to find the TMT classes
import scalanlp.io._;
import scalanlp.stage._;
import scalanlp.stage.text._;
import scalanlp.text.tokenize._;
import scalanlp.pipes.Pipes.global._;

import edu.stanford.nlp.tmt.stage._;
import edu.stanford.nlp.tmt.model.lda._;
import edu.stanford.nlp.tmt.model.llda._;

val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

val tokenizer = {
  SimpleEnglishTokenizer() ~>            // tokenize on space and punctuation
  CaseFolder() ~>                        // lowercase everything
  WordsAndNumbersOnlyFilter() ~>         // ignore non-words and non-numbers
  MinimumLengthFilter(3)                 // take terms with >=3 characters
}

val text = {
  source ~>                              // read from the source file
  Column(4) ~>                           // select column containing text
  TokenizeWith(tokenizer) ~>             // tokenize with tokenizer above
  TermCounter() ~>                       // collect counts (needed below)
  TermMinimumDocumentCountFilter(4) ~>   // filter terms in <4 docs
  TermDynamicStopListFilter(30) ~>       // filter out 30 most common terms
  DocumentMinimumLengthFilter(5)         // take only docs with >=5 terms
}

// turn the text into a dataset ready to be used with LDA
val dataset = LDADataset(text);

// define the model parameters
val params = LDAModelParams(numTopics = 30, dataset = dataset,
  topicSmoothing = 0.01, termSmoothing = 0.01);

// Name of the output model folder to generate
val modelPath = file("lda-"+dataset.signature+"-"+params.signature);

// Trains the model: the model (and intermediate models) are written to the
// output folder.  If a partially trained model with the same dataset and
// parameters exists in that folder, training will be resumed.
TrainCVB0LDA(params, dataset, output=modelPath, maxIterations=1000);

// To use the Gibbs sampler for inference, instead use
// TrainGibbsLDA(params, dataset, output=modelPath, maxIterations=1500);

1 Answer

The author of the tool has posted an answer on the topic-models mailing list (linked below):

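This usually happens when you have a stale .cache file. Unfortunately, the error message is not particularly helpful. Try deleting the cache files in the folder you ran the script from, then run it again.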

https://lists.cs.princeton.edu/pipermail/topic-models/2012-July/001979.html
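
The cleanup suggested above can also be scripted. The following is a minimal sketch, not part of TMT: `findCacheFiles` and `deleteCacheFiles` are hypothetical helper names, and the ".cache." name pattern is an assumption based only on the file name shown in the log line of the question (pubmed-oa-subset.csv.term-counts.cache.70108071.gz). Deleting the files by hand works just as well.

```scala
import java.io.File

// Hypothetical helper (not a TMT API): lists the regular files in `dir`
// whose names contain ".cache.", the pattern seen in the question's log.
def findCacheFiles(dir: File): Seq[File] = {
  // listFiles returns null when `dir` is not a readable directory
  val entries = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty[File])
  entries.filter(f => f.isFile && f.getName.contains(".cache."))
}

// Deletes the matching files and returns the names of those actually removed.
def deleteCacheFiles(dir: File): Seq[String] =
  findCacheFiles(dir).filter(_.delete()).map(_.getName)
```

Assuming the script was started from the current working directory, calling `deleteCacheFiles(new File("."))` before re-running example 2 should force the term counts to be recomputed. Checking the result of `findCacheFiles` first is a cheap way to confirm that only cache files are matched.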

Answered on 2024-04-24T05:34:51+08:00
