I'm having a problem running this code in PyCharm with Python 3.4. The script stops when I pass the variable html_text to BeautifulSoup (I'm using BeautifulSoup4).

The error message is:

UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 52793: character maps to <undefined>

Why does this happen, and how can I fix it?

import urllib.request
from bs4 import BeautifulSoup

url = 'http://nytimes.com'

urls = [url]  # queue of URLs still to visit (taken from the front)
visited = [url]  # already visited urls to avoid revisiting

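while len(urls) > 0:
    try:
        html_text = urllib.request.urlopen(urls[0]).read()
    except Exception:
        # Fetch failed: report the URL, drop it, and skip parsing,
        # since html_text would be undefined or stale at this point.
        print(urls[0])
        urls.pop(0)
        continue
    soup = BeautifulSoup(html_text, 'html5lib')
    urls.pop(0)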
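
For anyone trying to reproduce this: the traceback isn't shown above, so the following is only a guess at the mechanism. A 'charmap' error of this form is what Python raises when a str is encoded to a legacy single-byte code page that has no mapping for the character. The sketch below assumes cp1252 (the actual console or file encoding may be different) and uses a made-up sample string. It reproduces the same message and shows one possible workaround, replacing unencodable characters rather than raising.

```python
# Sketch only: assumes the failure happens when text is encoded to a legacy
# Windows code page. cp1252 is an assumption; the real encoding may differ.
text = 'caf\ufffd'  # hypothetical sample; U+FFFD is the replacement character

try:
    text.encode('cp1252')
    error_message = None
except UnicodeEncodeError as exc:
    # Same kind of message as in the question, with a different position.
    error_message = str(exc)

# Possible workaround: substitute '?' for characters the codec cannot encode.
safe_bytes = text.encode('cp1252', errors='replace')
safe_text = safe_bytes.decode('cp1252')
print(safe_text)
```

If the error actually comes from a print call, setting the PYTHONIOENCODING environment variable to utf-8 before running the script may be another option, but that depends on where the output is going.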