Can Scrapy yield both requests and items?

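When I write the parse() function, can I yield both requests and items for a single page?

I want to extract some data from page A and store it in a database, and also extract links to follow (this can be done with rules in CrawlSpider).

I call the pages linked from page A the B pages, so I can write another parse_item() to extract data from the B pages. But I also want to extract some links from the B pages, so can I only use rules to extract links? And how do I deal with duplicate URLs in Scrapy?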

3 Answers

I'm not 100% sure I understand your question, but the code below requests pages from a starting URL using BaseSpider, scans the starting page for hrefs, then loops over each link and calls parse_url. Everything matched in parse_url is sent to your item pipeline.

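# Legacy Scrapy API on Python 2, as in the original answer.
# The imports below are assumed; the answer did not show them.
import urlparse

from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector


def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # Only grab links whose href contains "content"
    urls = hxs.select('//a[contains(@href, "content")]/@href').extract()
    for url in urls:
        # url[1:] drops the first character of the href (e.g. a leading "/")
        yield Request(urlparse.urljoin(response.url, url[1:]), callback=self.parse_url)


def parse_url(self, response):
    hxs = HtmlXPathSelector(response)
    item = ZipgrabberItem()  # Item class from the answerer's own project
    # Text of every div whose class attribute contains "odd"
    item['zip'] = hxs.select("//div[contains(@class,'odd')]/text()").extract()
    return item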
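
The urljoin call in the code above resolves each extracted href against the URL of the page it came from. The following is a small standalone sketch of that behavior, assuming Python 3's urllib.parse (the Python 2 urlparse module used in the answer exposes the same urljoin function). The URLs and variable names are made up for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL and href, for illustration only.
base_url = "http://example.com/section/index.html"
href = "/content/a.html"

# A root-relative href replaces the whole path of the base URL.
root_relative = urljoin(base_url, href)

# Dropping the first character (as the answer does with [1:]) turns it
# into a path relative to the current directory of the base URL.
page_relative = urljoin(base_url, href[1:])

# An absolute href is returned unchanged.
absolute = urljoin(base_url, "http://other.example/content/x")

print(root_relative)
print(page_relative)
print(absolute)
```

Whether the leading character should be stripped depends on how the target site writes its links, so it may be worth checking a few hrefs before copying that detail.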

Answered on 2024-05-09T01:31:15+08:00

Yes, you can yield both requests and items. From what I've seen:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    base_url = response.url
    links = hxs.select(self.toc_xpath)

    for index, link in enumerate(links):
        href, text = link.select('@href').extract(), link.select('text()').extract()
        yield Request(urljoin(base_url, href[0]), callback=self.parse2)

    for item in self.parse2(response):
        yield item

Answered on 2024-05-09T01:31:15+08:00

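From Steven Almeroth on the Google group:

You are right, you can yield Requests and return a list of Items, but that is not what you are attempting. You are attempting to yield a list of Items instead of returning them. And since you are already using parse() as a generator function, you cannot have both yield and return together. But you can have many yields.

Try this: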

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    base_url = response.url
    links = hxs.select(self.toc_xpath)

    for index, link in enumerate(links):
        href, text = link.select('@href').extract(), link.select('text()').extract()
        yield Request(urljoin(base_url, href[0]), callback=self.parse2)

    for item in self.parse2(response):
        yield item
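
The point about mixing yield and return can be seen without Scrapy at all. Below is a minimal sketch in plain Python 3, where FakeRequest and FakeItem are hypothetical stand-ins for Scrapy's Request and Item classes. In Python 2, a return with a value inside a generator was a syntax error; in Python 3 it is allowed, but the returned value is not delivered to code that iterates over the generator.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:  # hypothetical stand-in for scrapy's Request
    url: str


@dataclass
class FakeItem:  # hypothetical stand-in for a scrapy Item
    title: str


def parse_all_yields(links, titles):
    """Generator that yields requests first, then items."""
    for url in links:
        yield FakeRequest(url)
    for title in titles:
        yield FakeItem(title)


def parse_yield_then_return(links, titles):
    """Generator that yields requests but returns the items."""
    for url in links:
        yield FakeRequest(url)
    # The list ends up in StopIteration.value, so a plain loop never sees it.
    return [FakeItem(title) for title in titles]


links = ["http://example.com/b"]
titles = ["Page A"]

mixed = list(parse_all_yields(links, titles))
lost = list(parse_yield_then_return(links, titles))

print(mixed)  # one FakeRequest followed by one FakeItem
print(lost)   # only the FakeRequest; the returned items are dropped
```

This is why the answer uses a second loop of yields for the items rather than returning them.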

Answered on 2024-05-09T01:31:15+08:00
