Removing stopwords from a list of lists with nltk.corpus

I have a list that contains, for each review, a list of its individual words, like this:

texts = [['fine','for','a','night'],['it','was','good']]

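I want to remove all the stopwords using the nltk.corpus package and put the remaining words back into a list. The final result should be a list of lists, each containing the words of one review without stopwords. This is what I tried: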

import nltk
nltk.download() # to download stopwords corpus
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
words_reviews=[]

for review in texts:
    wr=[]
    for word in review:
        if word not in stopwords:
            wr.append(word)
    words_reviews.append(wr)  # append once per review, after all its words are filtered

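This code actually worked, but now I get the error: AttributeError: 'list' object has no attribute 'words', which refers to stopwords. I made sure all the packages are installed. What could be the problem?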

1 Answer

3
The problem is that you redefine stopwords in your code:
```
from nltk.corpus import stopwords
stopwords=stopwords.words('english')
```
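After the first line, stopwords is a corpus reader with a words() method. After the second line, it is a list. Calling stopwords.words('english') again in the same session, without re-running the import, therefore raises the AttributeError. Use a different name for the list, or re-import stopwords before calling words() again.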
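The name shadowing described above can be reproduced without NLTK. The sketch below uses FakeCorpusReader, a hypothetical stand-in class that is not part of NLTK and returns placeholder words, only to show how reusing one name for both the reader and the list leads to the error, and how a second name avoids it.

```python
class FakeCorpusReader:
    """Hypothetical stand-in for the object imported from nltk.corpus."""

    def words(self, language):
        # Placeholder data, not NLTK's real English stopword list.
        return ["for", "a", "it", "was"]


# Reproduce the problem: the same name is reused for the list.
stopwords = FakeCorpusReader()
stopwords = stopwords.words("english")  # stopwords is now a plain list

try:
    stopwords.words("english")  # same call again, without re-importing
except AttributeError as err:
    error_message = str(err)

# Avoid the problem: keep the reader and the list under different names.
stopwords = FakeCorpusReader()
stop_words = stopwords.words("english")
stop_words_again = stopwords.words("english")  # the reader is still intact
```

Here error_message holds the same "'list' object has no attribute 'words'" text as in the question, while the second half runs cleanly because the reader is never overwritten.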

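Looking things up in a list is also slow, because every membership test scans the whole list, so you will get better performance from a set. Storing it under a separate name, stop_words here, keeps the corpus reader from being overwritten; the check in the loop then becomes word not in stop_words:
```
stop_words = set(stopwords.words('english'))
```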
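Putting the pieces together, the filtering step might look like the sketch below. The function name remove_stopwords and the small demo_stop_words set are assumptions made for illustration; the set stands in for NLTK's English stopword list so that the example runs without downloading the corpus.

```python
def remove_stopwords(texts, stop_words):
    """Return a new list of reviews with every stopword removed."""
    stop_set = set(stop_words)  # set membership tests are O(1) on average
    return [[word for word in review if word not in stop_set]
            for review in texts]


texts = [['fine', 'for', 'a', 'night'], ['it', 'was', 'good']]

# Hypothetical stand-in for set(stopwords.words('english')).
demo_stop_words = {'for', 'a', 'it', 'was'}

words_reviews = remove_stopwords(texts, demo_stop_words)
print(words_reviews)  # [['fine', 'night'], ['good']]
```

With NLTK installed and the stopwords corpus downloaded, the call would presumably be remove_stopwords(texts, stopwords.words('english')), and the result depends on the contents of that list.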
Answered 2024-04-29T11:02:53+08:00
