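As a beginner at web scraping, I am following this tutorial: https://brennan.io/2016/03/02/logging-in-with-requests/
on how to scrape a website that requires a login/session cookie. According to it, the hidden inputs are the only extra fields needed when POSTing to the login page.
So this is the code I ended up with: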
import requests
import lxml.html
URL = 'https://www.mypage.com/login'
# Fetch the login page; the session keeps any cookies it sets
s = requests.Session()
login = s.get(URL)
login_html = lxml.html.fromstring(login.text)
# Collect the hidden form inputs (skip unnamed ones, tolerate a missing value)
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib["name"]: x.attrib.get("value", "")
        for x in hidden_inputs if "name" in x.attrib}
form['username'] = 'myusername'
form['password'] = 'mypassword123'
# Submit the form back to the login URL
response = s.post(URL, data=form)
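But this does not work for the page I am actually trying to log in to. Looking at the inspector in Chrome, I found that the actual login (the initial POST request, named JsonExecute) looks like this: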
General
Request URL:https://www.mypage.com/services/PlainService.svc/JsonExecute
Request Method:POST
Status Code:200 OK
Remote Address:192.192.192.192:443
Response Headers (22)
Request Headers (12)
Request Payload
{method: "login", pageGuid: "ff0e9db6-c0bc-490d-9dd0-593c72689683",...}
  Data: {
    methodName: "Login"
    pageGuid: "ff0e9db6-c0bc-490d-9dd0-593c72689683"
    password: "mypassword123"
    username: "myusername"
  }
  method: "Login"
  pageGuid: "ff0e9db6-c0bc-490d-9dd0-593c72689683"
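I noticed that the request URL points to something completely different from what I expected, and I realized the tutorial will not help me here.
How can I mimic this POST request, which I can follow completely in Chrome's inspector?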
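For reference, here is a minimal sketch of how the request from the inspector might be reproduced. It rests on several assumptions: that the payload is sent as a JSON body (Chrome labels it "Request Payload" rather than "Form Data"), that the four credential fields are nested under Data as the inspector's tree view suggests, and that a valid pageGuid has to be obtained from the site first (where it comes from is not shown here). The helper names build_login_payload and post_login are made up for illustration.

```python
import json

# URL copied from the "Request URL" field in the inspector.
LOGIN_URL = "https://www.mypage.com/services/PlainService.svc/JsonExecute"


def build_login_payload(username, password, page_guid):
    """Build a dict mirroring the payload shown in the inspector.

    The nesting (credentials inside "Data") is an assumption based on
    how the inspector displays the request payload.
    """
    return {
        "method": "Login",
        "pageGuid": page_guid,
        "Data": {
            "methodName": "Login",
            "pageGuid": page_guid,
            "username": username,
            "password": password,
        },
    }


def post_login(session, username, password, page_guid):
    """POST the payload as JSON using an existing session.

    `session` is expected to be a requests.Session (or anything with a
    compatible .post method); passing json= makes requests serialize the
    dict and set the Content-Type header to application/json.
    """
    payload = build_login_payload(username, password, page_guid)
    return session.post(LOGIN_URL, json=payload)
```

With a real session this would be used as: create s = requests.Session(), call s.get(URL) on the login page first so the session picks up any cookies, then call post_login(s, 'myusername', 'mypassword123', page_guid) and inspect the returned response. The server may also expect some of the 12 request headers seen in the inspector, which this sketch does not send.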