Unzipping files into S3

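I am looking for a simple way to extract a zip/gzip archive that sits in an S3 bucket into the same bucket location, and to delete the parent zip/gzip file after extraction.

So far I have not been able to achieve this with any API.

I have tried native boto, pyfilesystem (fs), and s3fs. Specifying the source and destination locations seems to be the problem with these.

(Using Python 2.x / 3.x and Boto 2.x.)

I see there is a node.js API for this (unzip-to-s3), but nothing equivalent for Python.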

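A few implementations I can think of:

1. A simple API that extracts the zip file within the same bucket.
2. Use S3 as a file system and manipulate the data there.
3. Use a data pipeline to achieve this.
4. Transfer the zip to EC2, unzip it, and copy it back to S3.

Option 4 would be the least preferred, to minimize the architectural overhead of adding EC2.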
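Option 1 can be sketched in memory with the standard zipfile module. What follows is a minimal sketch under stated assumptions, not a tested implementation: it assumes boto3 rather than the Boto 2.x mentioned above, an archive small enough to fit in memory, and a hypothetical function name, unzip_in_bucket. The client is passed in, so any object exposing get_object, put_object and delete_object will do.

```python
import io
import posixpath
import zipfile


def unzip_in_bucket(s3, bucket, key, delete_source=True):
    """Extract the zip at s3://bucket/key next to itself; return written keys.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The whole archive is read into memory, so this suits small files only.
    """
    prefix = posixpath.dirname(key)  # the "same location" as the archive
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    written = []
    with zipfile.ZipFile(io.BytesIO(body)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # S3 has no real directories; skip folder entries
            target = posixpath.join(prefix, info.filename)
            s3.put_object(Bucket=bucket, Key=target, Body=archive.read(info))
            written.append(target)
    if delete_source:
        # Remove the parent archive only after every member was uploaded.
        s3.delete_object(Bucket=bucket, Key=key)
    return written
```

With boto3 installed, a call might look like unzip_in_bucket(boto3.client("s3"), "my-bucket", "incoming/data.zip"), where the bucket and key names are placeholders. Large archives would need a streaming or temp-file approach instead.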

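I need help getting this implemented, with a view to integrating it with Lambda at a later stage. Any pointers to such implementations are greatly appreciated.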
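For the later Lambda integration, the function would typically be triggered by an S3 object-created notification. The sketch below only shows how the bucket and key could be pulled out of such an event; the helper name archives_in_event and the suffix list are assumptions for illustration, and the actual extraction step is left as a placeholder.

```python
from urllib.parse import unquote_plus

# Assumed set of archive suffixes worth reacting to.
ARCHIVE_SUFFIXES = (".zip", ".gz", ".tgz")


def archives_in_event(event):
    """Return (bucket, key) pairs for archive objects named in an S3 event."""
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(ARCHIVE_SUFFIXES):
            targets.append((bucket, key))
    return targets


def lambda_handler(event, context):
    # Placeholder: call the chosen extraction routine for each target here.
    targets = archives_in_event(event)
    return {"archives": ["s3://%s/%s" % (b, k) for b, k in targets]}
```

If extracted files are written back into the same bucket, the trigger should be filtered by suffix or prefix so that the function does not re-invoke itself on its own output.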

Thanks in advance,

孙大信

3 Answers


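You could try https://www.cloudzipinc.com/, which unzips/expands archives in several different formats from S3 into a destination in your bucket. I used it to unzip the components of a digital catalog into S3.

Answered 2024-05-10T20:38:32+08:00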

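Solved it by using an EC2 instance: copy the S3 files to a local directory on the EC2 instance, extract them there, and copy that directory back to the S3 bucket.

Answered 2024-05-10T20:38:32+08:00

Example: extracting to a local directory on the EC2 instance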

import os
import tarfile
import zipfile

# s3Conn (a Boto 2.x S3 connection) and logger_s3 are assumed to be defined elsewhere.

def s3Unzip(srcBucket, dst_dir):
    '''
    Download every object in an S3 bucket to the local machine and extract it.

    Args:
        srcBucket (string): source bucket name
        dst_dir (string): destination location in the local/EC2 file system

    Returns:
        None
    '''
    s3 = s3Conn
    bucket = s3.lookup(srcBucket)

    # Create the destination before anything is downloaded into it.
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
        print("local directories created")

    for key in bucket:
        path = os.path.join(dst_dir, key.name)
        # Keys may contain "/" prefixes, so make sure the parent folder exists.
        parent = os.path.dirname(path)
        if not os.path.isdir(parent):
            os.makedirs(parent)
        key.get_contents_to_filename(path)

        if path.endswith('.zip'):
            opener, mode = zipfile.ZipFile, 'r'
        elif path.endswith('.tar.gz') or path.endswith('.tgz'):
            opener, mode = tarfile.open, 'r:gz'
        elif path.endswith('.tar.bz2') or path.endswith('.tbz'):
            opener, mode = tarfile.open, 'r:bz2'
        else:
            raise ValueError('unsupported format: %s' % key.name)

        try:
            archive = opener(path, mode)
            try:
                # Extract into dst_dir directly instead of changing the working directory.
                archive.extractall(dst_dir)
            finally:
                archive.close()
            logger_s3.info('(%s) extracted successfully to %s' % (key.name, dst_dir))
        except Exception:
            logger_s3.error('failed to extract (%s) to %s' % (key.name, dst_dir))
    s3.close()
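The last step of this approach, copying the directory back to the S3 bucket, is not shown in the answer. A minimal sketch of it might look like the following; it assumes a boto3 S3 client (the code above uses Boto 2.x) and a hypothetical helper name, upload_dir. Only the client's upload_file(Filename, Bucket, Key) method is used.

```python
import os


def upload_dir(s3, local_dir, bucket, prefix=""):
    """Upload every file under local_dir to s3://bucket/prefix; return the keys.

    `s3` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Build the key from the path relative to local_dir, using "/".
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = prefix.rstrip("/") + "/" + rel if prefix else rel
            s3.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded
```

The downloaded archives themselves would also be uploaded if they are still in the directory, so they may need to be deleted locally first.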

Example code for uploading to a MySQL instance

Upload directly to MySQL using a "LOAD DATA LOCAL INFILE" query

# connect() and logger_rds are assumed to be defined elsewhere.

def upload(file_path, timeformat):
    '''
    Load the data of CSV files into a MySQL RDS table.

    Args:
        file_path (list of string): local CSV file paths
        timeformat (string): str_to_date format of the datetime column

    Returns:
        None
    '''
    # Parameters are passed separately; this assumes a driver that uses the
    # %s placeholder style (for example MySQLdb or PyMySQL).
    qry = """LOAD DATA LOCAL INFILE %s INTO TABLE xxxx
             FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'
             (col1, col2, col3, @datetime, col4)
             SET datetime = str_to_date(@datetime, %s);"""
    con = connect()
    cursor = con.cursor()
    try:
        for path in file_path:
            try:
                cursor.execute(qry, (path, timeformat))
                con.commit()
                logger_rds.info("Loading file: " + path)
            except Exception:
                logger_rds.error("Exception in uploading " + path)
                # Roll back in case there is any error
                con.rollback()
    finally:
        cursor.close()
        # disconnect from server
        con.close()

Answered 2024-05-10T20:38:32+08:00
