urllib.request fails on urlopen with a unicode string

from urllib.request import Request, urlopen, urlretrieve
from bs4 import BeautifulSoup
def save_picture(self, word):
    search_string = "https://www.google.nl/search?q={}&tbm=isch&tbs=isz:m".format(word)

    request = Request(search_string, headers={'User-Agent': 'Mozilla/5.0'})
    raw_website = urlopen(request).read()

    soup = BeautifulSoup(raw_website, "html.parser")
    image = soup.find("img").get("src")

    urlretrieve(image, "{}.jpg".format(word))

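I wrote the function above to save the first thumbnail image from Google Images. The problem is that it fails when I pass in a word containing non-ASCII characters, for example: mañana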

The error message originates in the urllib module. I am using Python 3.6.

Traceback (most recent call last):
  File "c:\users\xxx\Desktop\script.py", line 19, in <module>
    main()
  File "c:\users\xxx\Desktop\script.py", line 16, in main
    save_picture("mañana")
  File "c:\users\xxx\Desktop\script.py", line 8, in save_picture
    raw_website = urlopen(request).read()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf1' in position 16: ordinal not in range(128)

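edit: After reading up on this, I found that there are several libraries for this task: urllib, urllib2 and requests (plus urllib3, available through pip). Am I getting this error because I am using a deprecated library?

edit2: Added the full traceback.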

1 Answer

import requests
import mimetypes
from bs4 import BeautifulSoup

def save_picture(self, word):
    search_string = "https://www.google.nl/search?q={}&tbm=isch&tbs=isz:m".format(word)
    response = requests.get(search_string, headers={'User-Agent': 'Mozilla/5.0'})

    # find the thumbnail for the first hit
    soup = BeautifulSoup(response.text, "html.parser")
    image_location = soup.find("img").get("src")

    # download image
    image = requests.get(image_location)
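    content_type = image.headers.get('content-type', '')
    # drop parameters such as "; charset=..." and fall back to .jpg for unknown types
    ext = mimetypes.guess_extension(content_type.split(';')[0].strip()) or '.jpg'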

    with open(f"{word}{ext}", 'wb') as fd:
        for chunk in image.iter_content(chunk_size=128):
            fd.write(chunk)

I rewrote the function using requests, and it handles unicode strings as expected. Saving the file is a bit verbose, though.
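
As for why the urllib version fails: the last frame of the traceback shows http.client encoding the request line as ASCII, so any non-ASCII character in the URL has to be percent-encoded before the request is sent. Below is a minimal sketch of doing that with the standard library's urllib.parse.quote_plus. The helper name build_search_url is made up for illustration; the URL is the one from the question.

```python
from urllib.parse import quote_plus


def build_search_url(word):
    """Return the search URL from the question with `word` percent-encoded.

    build_search_url is a hypothetical helper, not part of urllib.
    """
    # quote_plus encodes the text as UTF-8 and escapes every non-ASCII byte
    # (for example "ñ" becomes "%C3%B1"); spaces become "+".
    encoded_word = quote_plus(word)
    return "https://www.google.nl/search?q={}&tbm=isch&tbs=isz:m".format(encoded_word)


url = build_search_url("mañana")
print(url)

# The result contains only ASCII characters, so this no longer raises
# UnicodeEncodeError the way the unencoded URL did.
url.encode("ascii")
```

Passing a URL built this way to Request(...) in the original function should avoid the UnicodeEncodeError. requests appears to apply an equivalent encoding step to the URL on its own, which is presumably why the rewritten version works without it.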

Answered on 2024-04-19T07:09:56+08:00
