
Memory error in 64-bit Python


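I have added the code and the error message. I installed 64-bit Python 3 and Anaconda 3.5, and I get a memory error when I extract features from a text file, which produces a 2D array of 264,549 x 21,000. I am using 64-bit Windows 10 with 16 GB of RAM. When I check the Python version, I get: Python 3.6.2 |Anaconda, Inc.| (default, Sep 19 2017, 08:03:39) [MSC v.1900 64 bit (AMD64)] on win32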

Is this a problem with Python, or does the array simply not fit in memory?

Here is my error:

extracting bow from training data...
Traceback (most recent call last):
  File "tweet_fea_bow.py", line 27, in <module>
    train_bow = vect.fit_transform(training).toarray()
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\compressed.py", line 964, in toarray
    return self.tocoo(copy=False).toarray(order=order, out=out)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\coo.py", line 252, in toarray
    B = self._process_toarray_args(order, out)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 1039, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
MemoryError

import sys,os,traceback
import numpy as np
import sklearn
from sklearn.feature_extraction.text import CountVectorizer

print(sys.argv)
if len(sys.argv) == 2:
  print("Reading data from file " + sys.argv[1])
  query_file_name = sys.argv[1]
  tf_num = int(sys.argv[1])
else:
  print('Number of arguments = %d, expecting 1 argument, program terminated.' % (len(sys.argv) - 1))
  sys.exit(1)

training=open('../training_data.txt','r').read().splitlines()
print('extracting bow from training data...')
vect=CountVectorizer(min_df=tf_num, ngram_range=(2,2))
train_bow=vect.fit_transform(training).toarray()
print('training matrix size:',train_bow.shape)
print('writing the training matrix...')
# NOTE: `path` is not defined anywhere in the snippet as posted
outfile=(path+'../bow_bi_gram_%s.npy'%(tf_num))
try:
   np.save(outfile,train_bow)
except:
   print('error')
   e=sys.exc_info()
   print(e)
print('extracting bow from testing data..')
testing=open('../testing_red_len_remove_@andurl.txt','r').read().splitlines()
test_bow=vect.transform(testing).toarray()
print('size of testing matrix:',test_bow.shape)
print('writing the testing matrix...')
outfile=('../testing_bow_bi_gram_%s.npy'%(tf_num))
np.save(outfile,test_bow)

1 Answer

  • 0

    Just do the math yourself: 264,549 * 21,000 * the size of one cell = ?

    Data of this size usually has to be processed in chunks.
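
To make the arithmetic and the chunking concrete, here is a minimal sketch. It assumes 8-byte cells (np.int64, which is CountVectorizer's default dtype), and it assumes the per-chunk work can be done on a block of rows at a time. The helper names dense_size_gib and iter_dense_chunks are made up for illustration, and the small demo matrix only stands in for the sparse result of vect.fit_transform(training).

```python
import numpy as np
from scipy import sparse


def dense_size_gib(n_rows, n_cols, dtype=np.int64):
    """Memory a dense n_rows x n_cols array of `dtype` would need, in GiB."""
    return n_rows * n_cols * np.dtype(dtype).itemsize / 2**30


# Shape from the question; int64 cells are an assumption (8 bytes each).
print("dense size: %.1f GiB" % dense_size_gib(264549, 21000))  # dense size: 41.4 GiB


def iter_dense_chunks(matrix, chunk_rows):
    """Yield (start_row, dense_block), densifying only chunk_rows rows at a time."""
    matrix = matrix.tocsr()  # CSR supports efficient row slicing
    for start in range(0, matrix.shape[0], chunk_rows):
        yield start, matrix[start:start + chunk_rows].toarray()


# Tiny stand-in for the sparse bag-of-words matrix (5 rows, 4 columns).
demo = sparse.csr_matrix(np.arange(20).reshape(5, 4) % 3)

total = 0
for start, block in iter_dense_chunks(demo, chunk_rows=2):
    total += int(block.sum())  # replace with the real per-chunk work
```

At roughly 41 GiB, the dense array cannot fit in 16 GB of RAM, so the failure is not specific to Python. If the goal is only to store the matrix, one option is to skip .toarray() altogether and write the sparse result with scipy.sparse.save_npz instead of np.save.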
