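I'm having trouble running this code in PyCharm with Python 3.4. The run stops when I pass the variable html_text
to BeautifulSoup (I'm using BeautifulSoup4).
The error message is:
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 52793: character maps to <undefined>
Why does this happen, and how can I fix it?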
import urllib.request
from bs4 import BeautifulSoup

url = 'http://nytimes.com'
urls = [url]     # stack of urls
visited = [url]  # already visited urls to avoid revisiting

while len(urls) > 0:
    try:
        html_text = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(html_text, 'html5lib')
    urls.pop(0)
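
For reference, here is a minimal sketch of how an error with this wording can arise. It assumes the text is being encoded to a Windows code page such as cp1252, which Python implements with its 'charmap' codec. That is an assumption: the post does not say which encoding is involved or which call raised the error. The helper name try_encode is hypothetical and exists only for this illustration.

```python
def try_encode(text, encoding, errors='strict'):
    """Encode text, returning the bytes or the UnicodeEncodeError raised."""
    try:
        return text.encode(encoding, errors=errors)
    except UnicodeEncodeError as exc:
        return exc


# U+FFFD is the Unicode replacement character. cp1252 (assumed here)
# has no byte for it, so strict encoding fails via the 'charmap' codec.
failed = try_encode('\ufffd', 'cp1252')
print(type(failed).__name__, failed)

# UTF-8 can represent every code point, so the same text encodes fine.
print(try_encode('\ufffd', 'utf-8'))

# errors='replace' substitutes '?' instead of raising.
print(try_encode('\ufffd', 'cp1252', errors='replace'))
```

If the failure in the code above does come from encoding text for output, the usual options are to write with an encoding that covers the character (for example UTF-8) or to pass an explicit error handler such as errors='replace'.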