Scrapy FormRequest login not working

I am trying to log in with Scrapy, but I keep getting lots of "Redirecting (302)" messages. This happens both with my real credentials and with fake login information. I have also tried another site, but still no luck.

import scrapy
from scrapy.http import FormRequest, Request

class LoginSpider(scrapy.Spider):
    name = 'SOlogin'
    allowed_domains = ['stackoverflow.com']

    login_url = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
    test_url = 'http://stackoverflow.com/questions/ask'

    def start_requests(self):
        yield Request(url=self.login_url, callback=self.parse_login)

    def parse_login(self, response):
        return FormRequest.from_response(response, formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)

    def start_crawl(self, response):
        yield Request(self.test_url, callback=self.parse_item)

    def parse_item(self, response):
        print("Test URL " + response.url)

I have also tried adding

meta = {'dont_redirect': True, 'handle_httpstatus_list':[302]}

to both the initial Request and the FormRequest (see the sketch below).
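
For reference, a rough sketch of what that attempt looks like when the meta dict is passed to both requests; dont_redirect and handle_httpstatus_list are standard Scrapy Request.meta keys, and the placeholder credentials are the same as above:

def start_requests(self):
    # dont_redirect keeps the redirect middleware from following the 302,
    # and handle_httpstatus_list lets the 302 response reach the callback
    # instead of being filtered out by HttpErrorMiddleware.
    yield Request(url=self.login_url, callback=self.parse_login,
                  meta={'dont_redirect': True, 'handle_httpstatus_list': [302]})

def parse_login(self, response):
    return FormRequest.from_response(
        response,
        formdata={"email": "XXXXX", "password": "XXXXX"},
        callback=self.start_crawl,
        meta={'dont_redirect': True, 'handle_httpstatus_list': [302]})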

Here is the output from the original spider code:

2017-04-17 21:48:17 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: stackoverflow)
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'stackoverflow', 'NEWSPIDER_MODULE': 'stackoverflow.spiders', 'SPIDER_MODULES': ['stackoverflow.spiders'], 'USER_AGENT': 'Mozilla/5.0'}
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-04-17 21:48:17 [scrapy.core.engine] INFO: Spider opened
2017-04-17 21:48:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-17 21:48:17 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f> (referer: None)
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX> (referer: https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f)
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/questions/ask>
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask>
2017-04-17 21:48:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> (referer: https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX)
Test URL https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-17 21:48:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1772,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 34543,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 17, 18, 48, 19, 470354),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2017, 4, 17, 18, 48, 17, 386516)}
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Spider closed (finished)

1 Answer


By default, Scrapy tries to fill your email and password into the first form with clickable fields that it finds on the page (on the login page, that is the search form). You need to point it at the login form explicitly via formname or formid, e.g. FormRequest.from_response(response, formid="login-form", formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl). See the docs.
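
A minimal sketch of the fix applied to the spider above, assuming the login form's id is "login-form" as suggested here (the id may differ if the page markup changes):

def parse_login(self, response):
    # Target the login form explicitly by its id instead of letting
    # Scrapy pick the first form on the page (the search box).
    return FormRequest.from_response(
        response,
        formid="login-form",
        formdata={"email": "XXXXX", "password": "XXXXX"},
        callback=self.start_crawl,
    )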
