Scrapy XPath output is empty

I want to extract data from this website: http://www.pokepedia.fr/Pikachu. I'm learning Python and how to use Scrapy, and my problem is: why can't I retrieve data with XPath?

When I test my XPath in the browser (Google Chrome), it looks fine and returns the correct value.

```
import re

from scrapy import Spider
from scrapy.selector import Selector

from stack.items import StackItem


class StackSpider(Spider):
    name = "stack"
    allowed_domains = ["pokepedia.fr"]
    start_urls = [
        "http://www.pokepedia.fr/Pikachu",
    ]

    @staticmethod
    def unicodize(seg):
        # Not called by parse(); written for Python 2 byte strings
        if re.match(r'\\u[0-9a-f]{4}', seg):
            return seg.decode('unicode-escape')

        return seg.decode('utf-8')

    def parse(self, response):
        # Second table inside the main content area of the page
        pokemon = Selector(response).xpath('//*[@id="mw-content-text"]/table[2]')

        for question in pokemon:
            item = StackItem()
            # Pokémon name: second header cell in the table's first row
            item['title'] = question.xpath(
                '//*[@id="mw-content-text"]/table[2]/tbody/tr[1]/th[2]/text()').extract()[0]
            yield item
```

I want to extract the Pokémon's name from the page, but when I run:

```
scrapy crawl stack -o items.json -t json
```

My JSON output is empty.

And in my console I get this error:

```
IndexError: list index out of range
```
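This error comes from the `.extract()[0]` at the end of the spider: `.extract()` returns a plain Python list of matched strings, and when the XPath matches nothing that list is empty, so indexing `[0]` fails. A minimal plain-Python sketch of the same failure, plus a safer pattern; `first_or_none` is a hypothetical helper that behaves like Scrapy's `.extract_first()`, which returns `None` when nothing matched:

```python
def first_or_none(results):
    """Return the first XPath match, or None when nothing matched.

    Hypothetical helper: behaves like Scrapy's .extract_first() with its
    default argument, so a missing node gives None instead of an IndexError.
    """
    return results[0] if results else None


# .extract() returns a plain list of strings; an XPath that matches
# nothing gives an empty list.
no_matches = []

try:
    title = no_matches[0]  # same indexing as .extract()[0] in the spider
except IndexError as exc:
    print("IndexError:", exc)  # prints: IndexError: list index out of range

print(first_or_none(no_matches))   # prints: None
print(first_or_none(["Pikachu"]))  # prints: Pikachu
```

Avoiding the exception does not fix the selector; it only turns the crash into an empty field, so the XPath itself still has to match.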

I followed this tutorial: https://realpython.com/blog/python/web-scraping-with-scrapy-and-mongodb/

1 Answer

Try this:
```
question.xpath( '//*[@id="mw-content-text"]/table[2]/tr[1]/th[2]/text()').extract()[0]
```
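The browser adds the `tbody` tags when it builds the page. They are not in the original HTML that Scrapy downloads, so the XPath matches nothing, which is why Scrapy returns an empty file.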
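To see the effect offline, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, swapped in for Scrapy's `Selector` so it runs without Scrapy or a network request. The markup is a hypothetical, simplified stand-in for the raw page (not the real Pokepedia HTML), and because ElementTree supports only a subset of XPath, the paths drop the trailing `/text()` and read `.text` instead:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the HTML as the server sends it:
# the rows sit directly under <table>, with no <tbody> element.
RAW_HTML = """
<html><body>
  <div id="mw-content-text">
    <table><tr><td>first table</td></tr></table>
    <table>
      <tr><th>Nom</th><th>Pikachu</th></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(RAW_HTML.strip())

# Path as seen in the browser's DOM, which includes the inserted <tbody>
with_tbody = root.findall(".//*[@id='mw-content-text']/table[2]/tbody/tr[1]/th[2]")

# Same path with <tbody> removed, matching the raw markup
without_tbody = root.findall(".//*[@id='mw-content-text']/table[2]/tr[1]/th[2]")

print(len(with_tbody))                    # prints: 0
print([th.text for th in without_tbody])  # prints: ['Pikachu']
```

The path containing `tbody` finds no element, which in the spider becomes the empty list that `.extract()[0]` then fails on.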

PS: you may want to consider using
```
scrapy shell URL
```
and then
```
response.xpath('...YOUR SELECTOR..')
```
for debugging and testing.
Answered 2024-05-01T04:02:07+08:00
