
Using a Python Scrapy-based crawler but getting an error


Hi guys, I wrote a crawler in Python to scrape ...

import logging

import scrapy

from c2.items import C2Item

try:

    class C2(scrapy.Spider):
        name = 'cn'
        allowed_domains = ['priceraja.com']
        start_urls = ['https://www.priceraja.com']

        def parse_item(self, response):
            item = C2Item()
            item['url'] = response.xpath('//a/@href').extract()
            yield item

except Exception:
    logging.exception("message")

I keep getting a NotImplementedError:

2017-08-05 01:12:28 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.killerfeatures.com> (referer: None)
Traceback (most recent call last):
  File "D:\Ana\lib\site-packages\twisted\internet\defer.py", line 653, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "D:\Ana\lib\site-packages\scrapy\spiders\__init__.py", line 90, in parse
    raise NotImplementedError
NotImplementedError
2017-08-05 01:12:28 [scrapy.core.engine] INFO: Closing spider (finished)
2017-08-05 01:12:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 435,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 9282,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/301': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 8, 4, 19, 42, 28, 837000),
 'log_count/DEBUG': 3,
 'log_count/ERROR': 1,
 'log_count/INFO': 7,
 'response_received_count': 1,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'spider_exceptions/NotImplementedError': 1,
 'start_time': datetime.datetime(2017, 8, 4, 19, 42, 25, 976000)}
2017-08-05 01:12:28 [scrapy.core.engine] INFO: Spider closed (finished)

1 Answer


    You implemented a parse_item method, but Scrapy looks for a parse method by default, which is why it raises NotImplementedError. Renaming parse_item to parse should work, or you can override the parse method yourself, as sketched below.
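
    A minimal sketch of that fix, assuming the goal is simply to collect link URLs from the start page (the /text() step is dropped because @href already yields a string):

    import scrapy

    class C2(scrapy.Spider):
        name = 'cn'
        allowed_domains = ['priceraja.com']
        start_urls = ['https://www.priceraja.com']

        # Scrapy calls parse() by default for each response to start_urls
        def parse(self, response):
            for url in response.xpath('//a/@href').extract():
                yield {'url': url}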

    Another solution, linked here, is to use a CrawlSpider, for example:
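
    A sketch of the CrawlSpider variant, assuming you want to follow links across the site; the spider name and the single catch-all rule are illustrative:

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy.linkextractors import LinkExtractor

    class C2Crawl(CrawlSpider):
        name = 'cn_crawl'
        allowed_domains = ['priceraja.com']
        start_urls = ['https://www.priceraja.com']

        # CrawlSpider implements parse() itself, so parse_item is a valid
        # callback name here and must not be renamed to parse
        rules = (
            Rule(LinkExtractor(), callback='parse_item', follow=True),
        )

        def parse_item(self, response):
            yield {'url': response.url}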
