
Perl XML::LibXML: XPath 2.0 to XPath 1.0


I have the following XPath, which works correctly in XPath 2.0 (tested in OxygenXML):

//h2[a[@id='start']]/following-sibling::*[not(preceding-sibling::*[self::div[@id='end']])]

But when I use it with LibXML's findnodes(), I get a different result:

my @nodes = $source_doc->findnodes('//h2[a[@id="start"]]/following-sibling::*[not(preceding-sibling::*[self::div[@id="end"]])]');

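After checking the LibXML documentation, it seems that LibXML only supports XPath 1.0. How would I change the XPath into something that works in XPath 1.0? Is it even possible to write an equivalent path in XPath 1.0?

Since I was asked for it, I am updating the post to include my sample data and the output I get when running the XPath given above: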

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="insn.css" />
<meta name="generator" content="encodingindex.xsl" />
<title>Index by Encoding</title>
</head>
<body><hr /><h1 class="topleveltable"><a name="top" id="top"></a>Top-level encodings</h1><div
 class="regdiagram-32"></div><hr /><h2><a name="dp" id="start"></a>Data-processing and
 miscellaneous instructions</h2><div class="decode_navigation">
 <p>These instructions are under the <a href="#top">top-level</a>.</p>
 </div><div class="regdiagram-32">
 <table class="regdiagram">
 <thead>
    <tr>
        <td>31</td>
        <td>0</td>
      </tr>
    </thead>
    <tbody>
      <tr class="firstrow">
        <td colspan="4" class="lr">!= 1111</td>
        <td colspan="2" class="lr">00</td>
        <td class="lr">op0</td>
        <td colspan="5" class="lr">op1</td>
        <td colspan="12" class="lr"></td>
        <td class="lr">op2</td>
        <td colspan="2" class="lr">op3</td>
        <td class="lr">op4</td>
        <td colspan="4" class="lr"></td>
      </tr>
    </tbody>
  </table>
</div><div class="instructiontable">
  <table class="instructiontable">
    <tr>
      <th colspan="5">Decode fields</th>
      <th rowspan="2"> Instruction details </th>
    </tr>
    <tr>
      <th class="bitfields">op0</th>
      <th class="bitfields">op1</th>
      <th class="bitfields">op2</th>
      <th class="bitfields">op3</th>
      <th class="bitfields">op4</th>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 0 </td>
      <td class="bitfield"> </td>
      <td class="bitfield"> 1 </td>
      <td class="bitfield"> != 00 </td>
      <td class="bitfield"> 1 </td>
      <td class="iformname"><a href="#xldst">Extra load/store</a></td>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 0 </td>
      <td class="bitfield"> 0xxxx </td>
      <td class="bitfield"> 1 </td>
      <td class="bitfield"> 00 </td>
      <td class="bitfield"> 1 </td>
      <td class="iformname"><a href="#mul_word">Multiply and Accumulate</a></td>
    </tr>
  </table>
</div><hr /><h2><a name="sync" id="sync"></a>Synchronization primitives and
  Load-Acquire/Store-Release</h2><div class="decode_navigation">
  <p>These instructions are under <a href="#dp">Data-processing and miscellaneous
      instructions</a>.</p>
</div><div class="regdiagram-32">
  <table class="regdiagram">
    <thead>
      <tr>
        <td>31</td>
        <td>0</td>
      </tr>
    </thead>
    <tbody>
      <tr class="firstrow">
        <td colspan="4" class="lr">!= 1111</td>
        <td colspan="4" class="lr">0001</td>
        <td class="lr">op0</td>
        <td colspan="11" class="lr"></td>
        <td colspan="2" class="lr">11</td>
        <td colspan="2" class="lr"></td>
        <td colspan="4" class="lr">1001</td>
        <td colspan="4" class="lr"></td>
      </tr>
    </tbody>
  </table>
</div><hr /><hr /><h2><a name="dpmisc" id="dpmisc"></a>Miscellaneous</h2><div
  class="decode_navigation">
  <p>These instructions are under <a href="#dp">Data-processing and miscellaneous
      instructions</a>.</p>
</div><div class="regdiagram-32">
  <table class="regdiagram">
    <thead>
      <tr>
        <td>31</td>
        <td>30</td>
        <td>0</td>
      </tr>
    </thead>
    <tbody>
      <tr class="firstrow">
        <td colspan="4" class="lr">!= 1111</td>
        <td colspan="5" class="lr">00010</td>
        <td colspan="2" class="lr">op0</td>
        <td colspan="1" class="lr">0</td>
        <td colspan="12" class="lr"></td>
        <td colspan="1" class="lr">0</td>
        <td colspan="3" class="lr">op1</td>
        <td colspan="4" class="lr"></td>
      </tr>
    </tbody>
  </table>
</div><div class="instructiontable">
  <table class="instructiontable">
    <tr>
      <th colspan="2">Decode fields</th>
      <th rowspan="2"> Instruction details </th>
    </tr>
    <tr>
      <th class="bitfields">op0</th>
      <th class="bitfields">op1</th>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 01 </td>
      <td class="bitfield"> 010 </td>
      <td class="iformname"><a href="bxj.html">BXJ</a></td>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 01 </td>
      <td class="bitfield"> 011 </td>
      <td class="iformname"><a href="blx_r.html">BLX (register)</a></td>
    </tr>
  </table>
</div><div class="decode_navigation">
  <p>These instructions are under <a href="#dp">Data-processing and miscellaneous
      instructions</a>.</p>
</div><div class="regdiagram-32">
  <table class="regdiagram">
    <thead>
      <tr>
        <td>31</td>
        <td>30</td>
        <td>0</td>
      </tr>
    </thead>
    <tbody>
      <tr class="firstrow">
        <td colspan="4" class="lr">!= 1111</td>
        <td colspan="3" class="lr">000</td>
        <td colspan="2" class="lr">op0</td>
        <td colspan="2" class="lr"></td>
        <td class="lr">op1</td>
        <td colspan="15" class="lr"></td>
        <td colspan="1" class="lr">0</td>
        <td colspan="4" class="lr"></td>
      </tr>
    </tbody>
  </table>
</div><div class="decode_constraints">
  <p> The following constraints also apply to this encoding: op0:op1 != 100 </p>
</div><div class="instructiontable">
  <table class="instructiontable">
    <tr>
      <th colspan="2">Decode fields</th>
      <th rowspan="2"> Instruction details </th>
    </tr>
    <tr>
      <th class="bitfields">op0</th>
      <th class="bitfields">op1</th>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 0x </td>
      <td class="bitfield"> </td>
      <td class="iformname"><a href="#intdp3reg_immsh">Integer Data Processing (three register,
          immediate shift)</a></td>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 10 </td>
      <td class="bitfield"> 1 </td>
      <td class="iformname"><a href="#intdp2reg_immsh">Integer Test and Compare (two register,
          immediate shift)</a></td>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 11 </td>
      <td class="bitfield"> </td>
      <td class="iformname"><a href="#logic3reg_immsh">Logical Arithmetic (three register,
          immediate shift)</a></td>
    </tr>
  </table>
</div><hr /><div class="iclass" id="intdp3reg_immsh">
  <a name="intdp3reg_immsh" id="intdp3reg_immsh"></a>
  <h3 class="iclass">Integer Data Processing (three register, immediate shift)</h3>
  <p>These instructions are under <a href="#dpregis">Data-processing register (immediate
      shift)</a>.</p>
  <div class="regdiagram-32">
    <table class="regdiagram">
      <thead>
        <tr>
          <td>31</td>
          <td>0</td>
        </tr>
      </thead>
      <tbody>
        <tr class="firstrow">
          <td colspan="4" class="lr">!= 1111</td>
          <td class="l">0</td>
          <td>0</td>
          <td>0</td>
          <td class="r">0</td>
          <td colspan="3" class="lr">opc</td>
          <td class="lr">S</td>
          <td colspan="4" class="lr">Rn</td>
          <td colspan="4" class="lr">Rd</td>
          <td colspan="5" class="lr">imm5</td>
          <td colspan="2" class="lr">type</td>
          <td class="lr">0</td>
          <td colspan="4" class="lr">Rm</td>
        </tr>
        <tr class="secondrow">
          <td colspan="4" class="droppedname">cond</td>
          <td colspan="4"></td>
          <td colspan="3"></td>
          <td></td>
          <td colspan="4"></td>
          <td colspan="4"></td>
          <td colspan="5"></td>
          <td colspan="2"></td>
          <td></td>
          <td colspan="4"></td>
        </tr>
      </tbody>
    </table>
  </div>
  <div class="decode_constraints">
    <p> The following constraints also apply to this encoding: cond != 1111 &amp;&amp; cond !=
      1111 </p>
  </div>
  <div class="instructiontable">
    <table class="instructiontable" id="intdp3reg_immsh">





      <thead class="instructiontable">
        <tr>
          <th class="bitfields-heading" rowspan="" colspan="3">Decode fields</th>
          <th class="iformname" rowspan="2" colspan=""> Instruction Details </th>

        </tr>
        <tr>
          <th class="bitfields" rowspan="" colspan="">opc</th>
          <th class="bitfields" rowspan="" colspan="">S</th>
          <th class="bitfields" rowspan="" colspan="">Rn</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td class="bitfield">000</td>
          <td class="bitfield"></td>
          <td class="bitfield"></td>
          <td class="iformname"><a name="AND_r" href="and_r.html" id="AND_r">AND, ANDS
              (register)</a></td>

        </tr>
        <tr>
          <td class="bitfield">001</td>
          <td class="bitfield"></td>
          <td class="bitfield"></td>
          <td class="iformname"><a name="EOR_r" href="eor_r.html" id="EOR_r">EOR, EORS
              (register)</a></td>

        </tr>
      </tbody>
    </table>
  </div>
</div><div class="decode_constraints">
  <p> The following constraints also apply to this encoding: op0:op1 != 100 </p>
</div><div class="instructiontable">
  <table class="instructiontable">
    <tr>
      <th colspan="2">Decode fields</th>
      <th rowspan="2"> Instruction details </th>
    </tr>
    <tr>
      <th class="bitfields">op0</th>
      <th class="bitfields">op1</th>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 0x </td>
      <td class="bitfield"> </td>
      <td class="iformname"><a href="#intdp3reg_regsh">Integer Data Processing (three register,
          register shift)</a></td>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 10 </td>
      <td class="bitfield"> 1 </td>
      <td class="iformname"><a href="#intdp2reg_regsh">Integer Test and Compare (two register,
          register shift)</a></td>
    </tr>
  </table>
</div><hr /><h2><a name="dpimm" id="dpimm"></a>Data-processing immediate</h2><div
  class="decode_navigation">
  <p>These instructions are under <a href="#dp">Data-processing and miscellaneous
      instructions</a>.</p>
</div><div class="regdiagram-32">
  <table class="regdiagram">
    <thead>
      <tr>
        <td>31</td>
        <td>0</td>
      </tr>
    </thead>
    <tbody>
      <tr class="firstrow">
        <td colspan="4" class="lr">!= 1111</td>
        <td colspan="3" class="lr">001</td>
        <td colspan="2" class="lr">op0</td>
        <td colspan="1" class="lr"></td>
        <td colspan="2" class="lr">op1</td>
        <td colspan="20" class="lr"></td>
      </tr>
    </tbody>
  </table>
</div><div class="instructiontable">
  <table class="instructiontable">
    <tr>
      <th colspan="2">Decode fields</th>
      <th rowspan="2"> Instruction details </th>
    </tr>
    <tr>
      <th class="bitfields">op0</th>
      <th class="bitfields">op1</th>
    </tr>
    <tr class="instructiontable">
      <td class="bitfield"> 0x </td>
      <td class="bitfield"> </td>
      <td class="iformname"><a href="#intdp2reg_imm">Integer Data Processing (two register and
          immediate)</a></td>
    </tr>
  </table>
</div><hr /><div class="iclass" id="intdp2reg_imm">
  <a name="intdp2reg_imm" id="intdp2reg_imm"></a>
</div><div class="iclass" id="end">
  <a name="ldstimm" id="ldstimm"></a>
  <h3 class="iclass">Load/Store Word, Unsigned Byte (immediate, literal)</h3>
  <div class="regdiagram-32">
    <table class="regdiagram">
      <thead>
        <tr>
          <td>31</td>
          <td>0</td>
        </tr>
      </thead>
      <tbody>
        <tr class="firstrow">
          <td colspan="4" class="lr">!= 1111</td>
          <td class="l">0</td>
          <td>1</td>
          <td class="r">0</td>
          <td class="lr">P</td>
          <td class="lr">U</td>
          <td class="lr">o2</td>
          <td class="lr">W</td>
          <td class="lr">o1</td>
          <td colspan="4" class="lr">Rn</td>
          <td colspan="4" class="lr">Rt</td>
          <td colspan="12" class="lr">imm12</td>
        </tr>
        <tr class="secondrow">
          <td colspan="4" class="droppedname">cond</td>
          <td colspan="3"></td>
          <td></td>
          <td></td>
          <td></td>
          <td></td>
          <td></td>
          <td colspan="4"></td>
          <td colspan="4"></td>
          <td colspan="12"></td>
        </tr>
      </tbody>
    </table>
  </div>
  <div class="decode_constraints">
    <p> The following constraints also apply to this encoding: cond != 1111 &amp;&amp; cond !=
      1111 </p>
  </div>
  <div class="instructiontable">
    <table class="instructiontable" id="ldstimm">






      <thead class="instructiontable">
        <tr>
          <th class="bitfields-heading" rowspan="" colspan="4">Decode fields</th>
          <th class="iformname" rowspan="2" colspan=""> Instruction Details </th>

        </tr>
        <tr>
          <th class="bitfields" rowspan="" colspan="">P:W</th>
          <th class="bitfields" rowspan="" colspan="">o2</th>
          <th class="bitfields" rowspan="" colspan="">o1</th>
          <th class="bitfields" rowspan="" colspan="">Rn</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td class="bitfield">!= 01</td>
          <td class="bitfield">0</td>
          <td class="bitfield">1</td>
          <td class="bitfield">1111</td>
          <td class="iformname"><a name="LDR_l" href="ldr_l.html" id="LDR_l">LDR
            (literal)</a></td>

        </tr>
        <tr>
          <td class="bitfield">!= 01</td>
          <td class="bitfield">1</td>
          <td class="bitfield">1</td>
          <td class="bitfield">1111</td>
          <td class="iformname"><a name="LDRB_l" href="ldrb_l.html" id="LDRB_l">LDRB
              (literal)</a></td>

        </tr>
      </tbody>
    </table>
  </div>
</div></body>
</html>

Here is the output when using the XPath given above:

<div class="decode_navigation">
  <p>These instructions are under the <a href="#top">top-level</a>.</p>
</div>

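Just to clarify, the output should include all of the divs in the sample HTML, about 400 lines in total.

I also tried the XPath suggestions given below, but they produced the same result.

Edit: here is my code: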

use strict; 
use warnings; 
use feature 'say'; 
use XML::LibXML; 

my $encoding_index_file = q(C:\path\to\testfile.html);
my $source_doc = XML::LibXML->load_html(
location        => $encoding_index_file,
recover         => 1,
suppress_errors => 1,
);
my ($node) = $source_doc->findnodes('//h2[a[@id="start"]]/following-sibling::*[not(preceding-sibling::div[@id="end"])]');
say $node->toString;

2 Answers

  • 1

    That XPath is not only valid XPath 1.0, it also works fine in XML::LibXML.

    use strict;
    use warnings qw( all );
    use feature qw( say );
    
    use XML::LibXML qw( );
    
    my $doc = XML::LibXML->new->parse_html_string(<<'__EOS__');
    <html>
      <h2><a id="start">Foo</a></h2>
      <div id="pre1"><img></div>
      <div id="pre2"><img></div>
      <div id="end"><img></div>
      <div id="post1"><img></div>
      <div id="post2"><img></div>
    </html>
    __EOS__
    
    # Select all the siblings of the starting h2 element that follow
    # it and don't have <div id="end"/> as a preceding sibling.
    for my $node ($doc->findnodes('//h2[a[@id="start"]]/following-sibling::*[not(preceding-sibling::*[self::div[@id="end"]])]')) {
       my $name = $node->nodeName;
       my $id   = $node->getAttribute('id');
       say $id ? sprintf("%s#%s", $name, $id) : $name;
    }
    

    Output:

    div#pre1
    div#pre2
    div#end
    

    By the way,

    //h2[a[@id="start"]]/following-sibling::*[not(preceding-sibling::*[self::div[@id="end"]])]
    

    is a weird way of writing

    //h2[a[@id="start"]]/following-sibling::*[not(preceding-sibling::div[@id="end"])]
    

    Maybe you want

    //h2[a[@id="start"]]/following-sibling::*[not(self::div[@id="end"] or preceding-sibling::div[@id="end"])]
    

    which produces the following output:

    div#pre1
    div#pre2
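
    The three expressions above can be compared side by side. The following is a minimal sketch, assuming a toy document shaped like the one in this answer; the variant labels (`original`, `simplified`, `exclusive`) and the `%ids` hash are names made up for illustration.

    ```perl
    use strict;
    use warnings;
    use feature 'say';
    use XML::LibXML;

    # Toy document (an assumption for illustration, mirroring the example above).
    my $html = <<'__EOS__';
    <html><body>
      <h2><a id="start">Foo</a></h2>
      <div id="pre1"></div>
      <div id="pre2"></div>
      <div id="end"></div>
      <div id="post1"></div>
    </body></html>
    __EOS__

    my $doc = XML::LibXML->load_html(
        string          => $html,
        recover         => 1,
        suppress_errors => 1,
    );

    # All three variants share the same starting step.
    my $base = '//h2[a[@id="start"]]/following-sibling::*';
    my %variant = (
        original   => $base . '[not(preceding-sibling::*[self::div[@id="end"]])]',
        simplified => $base . '[not(preceding-sibling::div[@id="end"])]',
        exclusive  => $base . '[not(self::div[@id="end"] or preceding-sibling::div[@id="end"])]',
    );

    # Collect the id of every matched element, per variant.
    my %ids;
    for my $name (sort keys %variant) {
        $ids{$name} = [ map { $_->getAttribute('id') } $doc->findnodes($variant{$name}) ];
        say "$name: @{ $ids{$name} }";
    }
    # exclusive: pre1 pre2
    # original: pre1 pre2 end
    # simplified: pre1 pre2 end
    ```

    The first two variants select the same nodes, including the end marker itself; only the third stops before it.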
    
  • 3

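    I managed to find what was causing the problem: each sibling matched by the XPath is returned by LibXML as a separate node, so I needed to assign the results to an array rather than to a single scalar as I had been doing. The non-weird XPath suggested by ikegami also works better than the one I was using, because with mine everything in the output was doubled.

    Here is the code that produces the correct result: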

    use strict;
    use warnings;
    use feature 'say';
    use XML::LibXML qw( );
    
    my $encoding_index_file = q(C:\path\to\testfile.html);
    my $source_doc = XML::LibXML->load_html(
    location        => $encoding_index_file,
    recover         => 1,
    suppress_errors => 1,
    );
    
    my $contents = "";
    my @nodes = $source_doc->findnodes('//h2[a[@id="start"]]/following-sibling::*[not(preceding-sibling::div[@id="end"])]');
    foreach my $node (@nodes) {
      my ($str) = $node->toString;
      $contents = $contents . $str; 
    }
    print $contents;
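
    The difference between capturing one node and capturing all of them can be seen in isolation. This is a minimal sketch, assuming a made-up two-div document; the ids `a` and `b` and the variable names are hypothetical and not taken from the real input file.

    ```perl
    use strict;
    use warnings;
    use feature 'say';
    use XML::LibXML;

    # Hypothetical two-sibling document, used only to show the context difference.
    my $doc = XML::LibXML->load_html(
        string          => '<html><body><h2><a id="start"></a></h2>'
                         . '<div id="a"></div><div id="b"></div></body></html>',
        recover         => 1,
        suppress_errors => 1,
    );
    my $xpath = '//h2[a[@id="start"]]/following-sibling::*';

    # List assignment to a single scalar keeps only the first matching node.
    my ($first) = $doc->findnodes($xpath);
    say $first->getAttribute('id');    # a

    # Assigning to an array keeps every matching node.
    my @all = $doc->findnodes($xpath);
    say scalar @all;                   # 2

    # Serialise all matches in document order.
    my $contents = join '', map { $_->toString } @all;
    say $contents;
    ```

    With `my ($node) = ...`, as in the question, everything after the first match is silently discarded, which is why only the first div was printed.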
    
