Python 3.5 urllib.request 403 Forbidden Error

import urllib.request
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")

print(soup.title)

I'm trying to fetch the page above, but urlopen keeps raising a 403 Forbidden error.
Any ideas?


C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in <module>
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
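The traceback shows urlopen handing the 403 response to its error handlers until http_error_default raises urllib.error.HTTPError. The sketch below reproduces that offline against a local stand-in server. The rule the server enforces (reject any User-Agent starting with Python-urllib, the prefix of urllib's default) is an assumption chosen to mimic the behaviour described in the answer; it is not BrightScope's documented policy. UserAgentFilter, status_for and demo are hypothetical names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UserAgentFilter(BaseHTTPRequestHandler):
    """Hypothetical server rule: 403 for urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        if agent.startswith("Python-urllib"):
            status, body = 403, b"Forbidden"
        else:
            status, body = 200, b"<title>OK</title>"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the per-request log lines


# Empty ProxyHandler: talk to 127.0.0.1 directly even if a proxy is configured.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def status_for(url, headers=None):
    """Return the HTTP status code seen for url, with optional extra headers."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with OPENER.open(req) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses surface as HTTPError, as in the traceback above.
        return err.code


def demo():
    """Start the local server, probe it with and without a header, stop it."""
    server = HTTPServer(("127.0.0.1", 0), UserAgentFilter)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        default_ua = status_for(url)  # urllib sends Python-urllib/<version>
        browser_ua = status_for(url, {"User-Agent": "Mozilla/5.0"})
    finally:
        server.shutdown()
        server.server_close()
    return default_ua, browser_ua


print(demo())  # (403, 200)
```

The only difference between the two probes is the User-Agent header, which is what isolates it as the cause under this assumed server rule.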

1 Answer


import requests
from bs4 import BeautifulSoup


url = "https://www.brightscope.com/ratings"
headers = {'User-Agent': 'Mozilla/5.0'}
# Pass the headers; defining them without sending them has no effect.
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")

print(soup.title)

Output:

<title>BrightScope Ratings</title>

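First, use requests instead of urllib; its API is simpler, although urllib can send custom headers too.

Second, add headers to the request. Without them the site blocks you, because the default User-Agent identifies the client as a crawler (urllib sends Python-urllib/3.5, requests sends python-requests/<version>), and the site rejects that.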
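If you would rather keep urllib, as in the question, the same header can be attached with urllib.request.Request, or set once for every request made through an opener. This is a minimal sketch: the Mozilla/5.0 value is copied from the answer, build_request, build_browser_opener and fetch_html are illustrative names, and whether the live site still accepts this header has not been verified here.

```python
import urllib.request

URL = "https://www.brightscope.com/ratings"
BROWSER_UA = "Mozilla/5.0"  # value taken from the answer


def build_request(url, headers=None):
    """Wrap url in a Request carrying a browser-like User-Agent by default."""
    if headers is None:
        headers = {"User-Agent": BROWSER_UA}
    return urllib.request.Request(url, headers=headers)


def build_browser_opener():
    """Return an opener whose default User-Agent replaces Python-urllib/x.y."""
    opener = urllib.request.build_opener()
    opener.addheaders = [("User-Agent", BROWSER_UA)]
    return opener


def fetch_html(url=URL):
    """Download url with the custom header (performs a real network request)."""
    with urllib.request.urlopen(build_request(url)) as resp:
        # Assumes the page is UTF-8; undecodable bytes are replaced.
        return resp.read().decode("utf-8", errors="replace")


# urllib's built-in default, which identifies the client as a script:
print(urllib.request.build_opener().addheaders)
# urllib stores header names capitalize()-d, hence "User-agent" here:
print(build_request(URL).get_header("User-agent"))  # Mozilla/5.0
```

The string returned by fetch_html can be passed to BeautifulSoup(html, "html.parser") exactly as in the question. To make every later urllib.request.urlopen call send the browser-like header, the opener can be registered once with urllib.request.install_opener(build_browser_opener()).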