XPath selector not working in Scrapy

I'm trying to extract text with this XPath:

//*/li[contains(., "Full Name")]/span/text()

from this web page: http://votesmart.org/candidate/biography/56110/norma-smith#.V9SwdZMrKRs

I've tested it in the Google Chrome console, where it works, as do many other variants of the XPath, but I can't get it to work with Scrapy. My code only returns "{}".

Here is where I'm testing it in my code, for context:

def parse_bio(self, response):  
    loader = response.meta['loader']
    fullnameValue = response.xpath('//*/li[contains(., "Full Name")]/span/text()').extract()
    loader.add_value('fullName', fullnameValue)
    return loader.load_item()

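I don't think the problem is my code, since it works with other (very broad) XPath selectors. But I'm not sure what's wrong with this XPath. I have JavaScript disabled, if that makes a difference. Any help would be great!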

Edit: here is the rest of the code, to make things clearer:

from scrapy import Spider, Request, Selector
from votesmart.items import LegislatorsItems, TheLoader


class VSSpider(Spider):
    name = "vs"
    allowed_domains = ["votesmart.org"]
    start_urls = ["https://votesmart.org/officials/WA/L/washington-state-legislative"]

    def parse(self, response):
        # follow every legislator link on the listing page
        for href in response.xpath('//h5/a/@href').extract():
            person_url = response.urljoin(href)
            yield Request(person_url, callback=self.candidatesPoliticalSummary)

    def candidatesPoliticalSummary(self, response):
        item = LegislatorsItems()
        l = TheLoader(item=LegislatorsItems(), response=response)

        ...
        # populating items with item loader. works fine

        # create right bio url and pass item loader to it
        bio_url = response.url.replace('votesmart.org/candidate/',
                                       'votesmart.org/candidate/biography/')
        return Request(bio_url, callback=self.parse_bio, meta={'loader': l})

    def parse_bio(self, response):
        loader = response.meta['loader']
        print(response.request.url)
        loader.add_xpath('fullName', '//*/li[contains(., "Full Name")]/span/text()')
        return loader.load_item()

2 Answers

The expression works perfectly for me in the shell:

$ scrapy shell "http://votesmart.org/candidate/biography/56110/norma-smith#.V9SwdZMrKRs"
In [1]: response.xpath('//*/li[contains(., "Full Name")]/span/text()').extract()
Out[1]: [u'Norma Smith']

Try using the add_xpath() method instead:

loader.add_xpath('fullName', '//*/li[contains(., "Full Name")]/span/text()')

Answered 2024-04-30T22:22:23+08:00


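I figured out my problem! Many pages on the site are login-protected, and I didn't have access in the first place. Scrapy's form request did the trick. Thanks for all the help (especially the suggestion to use view(response), which was really useful).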

Answered 2024-04-30T22:22:23+08:00
