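As a beginner at web scraping, I am following this tutorial: https://brennan.io/2016/03/02/logging-in-with-requests/

It covers how to scrape sites that require a login/session cookie. According to the tutorial, the only extra fields needed when POSTing to the login page are the form's hidden inputs.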

So this is the code I ended up with:

import lxml.html
import requests

URL = 'https://www.mypage.com/login'

# Fetch the login page inside a session so cookies persist
s = requests.Session()
login = s.get(URL)
login_html = lxml.html.fromstring(login.text)

# Collect the hidden form inputs, skipping any without a name
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib["name"]: x.attrib.get("value", "")
        for x in hidden_inputs if "name" in x.attrib}
form['username'] = 'myusername'
form['password'] = 'mypassword123'

# Submit the form to log in
response = s.post(URL, data=form)

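But this does not work for the page I am actually trying to log in to. Looking at the inspector in Chrome, I found that the actual login (the initial POST request, named JsonExecute) looks like this: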

General

Request URL:https://www.mypage.com/services/PlainService.svc/JsonExecute
Request Method:POST
Status Code:200 OK
Remote Address:192.192.192.192:443

Response Headers (22)

Request Headers (12)

Request Payload

{method: "login", pageGuid:"ff0e9db6-c0bc-490d-9dd0-593c72689683",...}
   Data:{
      methodName:"Login"
      pageGuid:"ff0e9db6-c0bc-490d-9dd0-593c72689683"
      password:"mypassword123"
      username:"myusername"
   method:"Login"
   pageGuid:"ff0e9db6-c0bc-490d-9dd0-593c72689683"

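I noticed that the request URL points to something completely different from what I expected (a service endpoint rather than the login page), and the body is a JSON payload rather than form fields, so I realized the tutorial does not help me here.

How can I replicate this POST request exactly as I see it in the Chrome inspector?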
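
For reference, here is a minimal sketch of what I imagine replaying that request would look like, based only on the payload shown above. Several things are assumptions on my part: that the endpoint accepts a plain JSON body, that the pageGuid is embedded somewhere in the login page HTML as an ordinary GUID string (the regex is a guess), and that no extra request headers are required. The helper names (`extract_page_guid`, `build_login_payload`, `login`) are made up for illustration. The inspector summary line shows `method: "login"` while the expanded view shows `method:"Login"`; the sketch uses the expanded value.

```python
import re

LOGIN_PAGE = "https://www.mypage.com/login"
SERVICE_URL = "https://www.mypage.com/services/PlainService.svc/JsonExecute"

# Assumption: the pageGuid appears in the login page HTML as a standard
# 8-4-4-4-12 hex GUID. Adjust the pattern to match the real markup.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def extract_page_guid(html):
    """Return the first GUID-looking string in the page, or None."""
    match = GUID_RE.search(html)
    return match.group(0) if match else None


def build_login_payload(username, password, page_guid):
    """Mirror the Request Payload structure shown in the inspector."""
    return {
        "method": "Login",
        "pageGuid": page_guid,
        "Data": {
            "methodName": "Login",
            "pageGuid": page_guid,
            "username": username,
            "password": password,
        },
    }


def login(session, username, password):
    """Fetch the login page, then POST the JSON payload on the same session.

    `session` is expected to be a requests.Session (or any object with
    compatible get/post methods), so cookies carry over between the calls.
    """
    page = session.get(LOGIN_PAGE)
    page_guid = extract_page_guid(page.text)
    if page_guid is None:
        raise ValueError("no pageGuid found in the login page")
    payload = build_login_payload(username, password, page_guid)
    # json= serializes the dict and sets Content-Type: application/json
    return session.post(SERVICE_URL, json=payload)
```

With requests installed, this would be called as `login(requests.Session(), 'myusername', 'mypassword123')`. If the server still rejects the request, the next thing to compare would be the Request Headers section in the inspector.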