
Python: How to download a zip file


I'm trying to download a zip file using the following code:

o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )

#login
p = urllib.urlencode( { usernameField: usernameVal, passField: passVal } )
f = o.open(authUrl,  p )
data = f.read()
print data
f.close()

#download file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()

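I get some binary data back, but the file is too small and is not a valid zip file. Am I not retrieving the zip file correctly? The HTTP response headers for f = o.open(remoteFileUrl) are shown below. I don't know whether this response needs any special handling: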

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline; filename="files.zip";
Content-Type: application/zip
Transfer-Encoding: chunked
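The Transfer-Encoding: chunked header should need no special handling: urllib2 reads responses through httplib (http.client in Python 3), which decodes chunked bodies before read() returns them. Since the symptom is "not a valid zip file", a quick check is the standard library's zipfile.is_zipfile. The Python 3 sketch below wraps the downloaded bytes in BytesIO; `looks_like_zip` is a hypothetical helper name, and the HTML sample stands in for whatever else the server might have returned (for example an error or login page, which is an assumption; the question does not say).

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if `data` parses as a zip archive.

    zipfile.is_zipfile accepts a path or a file-like object, so the bytes
    are wrapped in BytesIO instead of being written to disk first.
    """
    return zipfile.is_zipfile(io.BytesIO(data))


if __name__ == "__main__":
    # Build a tiny valid archive in memory for comparison.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    print(looks_like_zip(buf.getvalue()))                              # True
    print(looks_like_zip(b"<html><body>Please log in</body></html>"))  # False
```

For a file already on disk, zipfile.is_zipfile("files.zip") works the same way. If it returns False, printing the first few bytes of the file usually shows what was actually sent.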

4 Answers

  • 10

    f.read() doesn't necessarily read the whole file, but just a packet of it (which might be the whole file if the file is small, but won't be for a large file).

    You need to loop over the packets like this:

    while 1:
        packet = f.read(8192)  # read at most 8 KB per call
        if not packet:
            break
        localFile.write(packet)
    f.close()
    localFile.close()  # flush and close the output file too
    

    When f.read(...) returns an empty packet, you have read the whole file.
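For comparison, here is the same loop as a Python 3 sketch, where urllib2 was folded into urllib.request. Passing an explicit size to read() is what keeps each iteration bounded in memory. `save_in_chunks` and the 8192-byte default are illustrative choices, not part of the answer, and BytesIO stands in for the HTTP response so the example runs offline.

```python
import io
import os
import tempfile


def save_in_chunks(response, path, chunk_size=8192):
    """Copy a binary file-like `response` to `path`, chunk_size bytes at a time.

    `response` can be the object returned by urllib.request.urlopen() or any
    other binary file-like object. Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        while True:
            packet = response.read(chunk_size)  # at most chunk_size bytes
            if not packet:                      # b"" means end of stream
                break
            out.write(packet)
            written += len(packet)
    return written


if __name__ == "__main__":
    fake_response = io.BytesIO(b"x" * 20000)
    with tempfile.TemporaryDirectory() as d:
        print(save_in_chunks(fake_response, os.path.join(d, "example.bin")))  # 20000
```

With a real server the call would look like `with urllib.request.urlopen(url) as resp: save_in_chunks(resp, "files.zip")`.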

  • 0

    If you don't mind reading the entire zip file into memory, the fastest way to read and write it is:

    data = f.read()  # whole body as one byte string; no need to split binary data into "lines"
    with open(localFile, 'wb') as output:
        output.write(data)
    

    Otherwise, to read and write in chunks as the data arrives over the network, do this:

    with open(localFile, "wb") as output:
        chunk = f.read(8192)  # read at most 8 KB per call
        while chunk:
            output.write(chunk)
            chunk = f.read(8192)
    

    This is slightly less tidy, but it avoids holding the whole file in memory. Hope this helps.
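The chunked read/write loop above is also available ready-made in the standard library as shutil.copyfileobj(fsrc, fdst, length), which reads `length` bytes at a time until the source is exhausted. A Python 3 sketch; `download_to` is a hypothetical wrapper name and the 64 KB default is an arbitrary choice.

```python
import io
import shutil


def download_to(response, out, chunk_size=64 * 1024):
    """Stream the binary file-like `response` into the binary file `out`
    without loading the whole body into memory."""
    shutil.copyfileobj(response, out, chunk_size)


if __name__ == "__main__":
    src = io.BytesIO(b"data" * 1000)  # stand-in for the HTTP response
    dst = io.BytesIO()
    download_to(src, dst, chunk_size=16)
    print(len(dst.getvalue()))  # 4000
```

With a real download: `with urllib.request.urlopen(url) as resp, open("files.zip", "wb") as out: download_to(resp, out)`.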

  • 1

    Here is a more robust solution that uses urllib2 to download the file in chunks and print the download progress:

    import os
    import urllib2
    
    def downloadChunks(url):
        """Helper to download large files.
    
        The only argument is a URL. The file is saved to a temp
        directory, downloaded in chunks, and the progress is printed.
        Returns the local path, or False on error.
        """
    
        baseFile = os.path.basename(url)
    
        # with this umask, newly created files get mode 0664 (group-writable)
        os.umask(0002)
        temp_path = "/tmp/"
        try:
            path = os.path.join(temp_path, baseFile)
    
            req = urllib2.urlopen(url)
            # Content-Length is absent for chunked responses, so allow None
            length = req.info().getheader('Content-Length')
            total_size = int(length.strip()) if length else None
            downloaded = 0
            CHUNK = 256 * 10240
            with open(path, 'wb') as fp:
                while True:
                    chunk = req.read(CHUNK)
                    if not chunk:
                        break
                    fp.write(chunk)
                    downloaded += len(chunk)
                    if total_size:
                        # multiply first: in Python 2, downloaded / total_size truncates to 0
                        print "%d%%" % (100 * downloaded // total_size)
                    else:
                        print "%d bytes" % downloaded
        except urllib2.HTTPError, e:
            print "HTTP Error:", e.code, url
            return False
        except urllib2.URLError, e:
            print "URL Error:", e.reason, url
            return False
    
        return path
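The question's response uses Transfer-Encoding: chunked, and HTTP/1.1 forbids sending Content-Length alongside it, so a progress meter has to cope with an unknown total. Below is a Python 3 sketch of just the progress logic. Passing in a `report` callback is a hypothetical design choice so the function can be exercised without a network; `copy_with_progress` and `parse_content_length` are likewise illustrative names.

```python
import io


def parse_content_length(value):
    """Convert a Content-Length header value (str or None) to int or None."""
    return int(value.strip()) if value else None


def copy_with_progress(src, dst, total_size=None, chunk_size=8192, report=print):
    """Copy src to dst in chunks, calling report() after each chunk.

    total_size is the body length in bytes, or None when the server sent
    no Content-Length (as with Transfer-Encoding: chunked).
    Returns the number of bytes copied.
    """
    downloaded = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        downloaded += len(chunk)
        if total_size:
            # multiply before dividing so the percentage is not truncated to 0
            report(f"{100 * downloaded // total_size}%")
        else:
            report(f"{downloaded} bytes")
    return downloaded


if __name__ == "__main__":
    messages = []
    copy_with_progress(io.BytesIO(b"a" * 250), io.BytesIO(),
                       total_size=250, chunk_size=100, report=messages.append)
    print(messages)  # ['40%', '80%', '100%']
```

With urllib.request the total would come from `parse_content_length(resp.headers.get("Content-Length"))`, which yields None for a chunked response.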
    
  • 1

    Try this:

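    #download file
    f = o.open(remoteFileUrl)
    
    chunks = []
    while 1:
        data = f.read(8192)  # read up to 8 KB at a time
        if not data:
            break
        chunks.append(data)  # joining once at the end avoids repeated string copies
    f.close()
    
    with open(localFile, "wb") as local_file:
        local_file.write("".join(chunks))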
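The answers reuse the cookie-aware opener `o` from the question. In Python 3, urllib2 became urllib.request, cookie storage lives in http.cookiejar, and POST data passed to open() must be bytes, so the urlencoded login form has to be encoded. A sketch under those assumptions; `make_session_opener` and `encode_login_form` are hypothetical names, and the URLs and field names remain the question's placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return an opener that keeps cookies set by the login response and
    sends them with later requests, like urllib2.HTTPCookieProcessor did."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def encode_login_form(fields):
    """urlencode a dict of form fields and convert the result to bytes,
    because Python 3's opener.open(url, data) requires bytes for POST data."""
    return urllib.parse.urlencode(fields).encode("ascii")


# Usage (not run here; authUrl, remoteFileUrl and the field variables are the
# question's placeholders):
#   o = make_session_opener()
#   with o.open(authUrl, encode_login_form({usernameField: usernameVal,
#                                            passField: passVal})) as f:
#       f.read()
#   with o.open(remoteFileUrl) as f, open("files.zip", "wb") as out:
#       ...copy f to out in chunks, as in the answers above...
```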
    
