
Nokogiri: get an element whether or not it is wrapped in another element

Quite simply: can you do a conditional scrape? I want an <a> tag within a parent,
and if a <span> is contained within that parent (so the span, rather than the
parent, holds the <a>), I still want to drill into the span to get the <a>.

Hopefully this example gives enough detail.

<tr>
    <td>1989</td>
    <td>
      <i>
       <a href="/wiki/Always_(1989_film)" title="Always (1989 film)">Always</a>
     </i>
    </td>
     <td>Pete Sandich</td>
</tr>

I can access the <a> fine using:

all_links = doca.search('//tr//td//i//a[@href]')

What I'd like to know is whether I can also add a condition: if there is a <span> around the <a>, can that level be included in the search?

<tr>
    <td>1989</td>
    <td>
      <i>
       <span>
         <a href="/wiki/Always_(1989_film)" title="Always (1989 film)">Always</a>
       </span>
     </i>
    </td>
     <td>Pete Sandich</td>
</tr>

So is there a way to conditionally grab the <a>, something like this:

all_links = doca.search('//tr//td//i//?span//a[@href]')

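where ?span is the condition, i.e. if there is a span, step into that level and then into the link.

If there is no span, skip that level and go straight to the link.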

Thanks in advance, any help is much appreciated!

巴蒂尔

1 Answer


    Here you go:

    require 'nokogiri'
    
    doc = Nokogiri::HTML::Document.parse <<-eot
    <tr>
        <td>1989</td>
        <td>
          <i>
           <span>
             <a href='/wiki2/Always_(1989_film)' title='Always (1989 film)'>Always</a>
           </span>
         </i>
        </td>
            <td>
          <i>
             <a href='/wiki1/Always_(1989_film)' title='Always (1989 film)'>Always</a>
         </i>
        </td>
         <td>Pete Sandich</td>
    </tr>
    eot
    
    # name(./..) is the tag name of the <a>'s parent, so this XPath only
    # matches <a> tags that sit directly inside a <span>
    nodes = doc.xpath("//tr//i//a[name(./..)='span']")
    p nodes.size # => 1
    p nodes.map { |n| n['href'] } # => ["/wiki2/Always_(1989_film)"]
    
