
Memory error when trying to read a CSV on AWS


I get an error when I run the following code:

import os
import boto3
import pandas as pd
import sys

if sys.version_info[0] < 3: 
    from StringIO import StringIO # Python 2.x
else:
    from io import StringIO # Python 3.x

# get your credentials from environment variables
aws_id = os.environ['AWS_ACCESS_KEY_ID']
aws_secret = os.environ['AWS_SECRET_ACCESS_KEY']

client = boto3.client('s3', aws_access_key_id=aws_id,
        aws_secret_access_key=aws_secret)

bucket_name = 'arpbhatnagar'

object_key = 'application_train.csv'
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8')

train = pd.read_csv(StringIO(csv_string))

Here is the error:

Error:

MemoryError                               Traceback (most recent call last)
 in ()
     21 csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
     22 body = csv_obj['Body']
---> 23 csv_string = body.read().decode('utf-8')
     24 
     25 train = pd.read_csv(StringIO(csv_string), low_memory=True, engine='python')

/usr/lib/python2.7/encodings/utf_8.pyc in decode(input, errors)
     14 
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17 
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

MemoryError:

1 Answer


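    It looks like you are running out of memory while downloading or ingesting application_train.csv. The traceback shows the failure at .decode('utf-8'): at that point the whole file is already in memory as the raw bytes returned by body.read(), and a second, decoded copy of it is being built, all before pandas has even started creating the DataFrame. To avoid this, first download the file to disk and then give the filename to pandas: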

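    # Write the S3 object straight to a local file instead of reading it into memory
    tmp_filename = "/tmp/application_train.csv"
    client.download_file(bucket_name, object_key, tmp_filename)
    # pandas parses the file from disk, so no full bytes/str copy of it is kept around
    training_set = pd.read_csv(tmp_filename)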
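If even the downloaded file is too large to parse in one go, pandas can read it incrementally through the chunksize parameter of read_csv. The following is a minimal sketch under stated assumptions, not part of the original answer: the helper read_csv_in_chunks, the chunk size, and the sample column names are hypothetical, and a small temporary file stands in for /tmp/application_train.csv.

```python
import os
import tempfile

import pandas as pd


def read_csv_in_chunks(path, chunksize=100000, usecols=None):
    """Hypothetical helper: parse a CSV on disk chunk by chunk.

    read_csv with chunksize returns an iterator that parses at most
    `chunksize` rows at a time instead of the whole file at once.
    `usecols` (optional) keeps only the listed columns in every chunk.
    pd.concat briefly holds both the chunks and the combined result, so
    this bounds parsing overhead but cannot shrink the final DataFrame.
    """
    reader = pd.read_csv(path, chunksize=chunksize, usecols=usecols)
    chunks = [chunk for chunk in reader]
    return pd.concat(chunks, ignore_index=True)


# Demo: a small made-up CSV stands in for the file that
# client.download_file() would write; the column names are hypothetical.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w") as f:
    f.write("id,target,amount\n")
    for i in range(10):
        f.write("%d,%d,%d\n" % (i, i % 2, i * 1000))

# Parse 3 rows at a time and drop the unused "amount" column.
train = read_csv_in_chunks(path, chunksize=3, usecols=["id", "target"])
print(train.shape)  # (10, 2)
```

Chunking on its own mainly limits the memory used while parsing; passing usecols (and narrower dtypes, if you know them) is what actually makes the resulting DataFrame smaller.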
    
