Using XPath to count the number of TH elements in each table

I've fallen down a rabbit hole trying to parse an HTML file.

The basics:

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTMLFile('myfile.html');
$xp = new DOMXPath($dom);

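After initializing, my technique has been to use XPath queries to get the values I want.

I have no trouble, really, when there is one specific item or node: it's very easy to pinpoint and retrieve.

The HTML I'm loading is basically generated in a loop. Reduced, it looks like this: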

<div class="intro">
    <div class="desc-wrap">
        Text Text Text
    </div>
    <div class="main-wrap">
        <table class="table-wrap">
            <tbody>
                <tr>
                    <th class="range">Range </th>
                    <th>#1</th>
                    <th>#2</th>
                </tr>
            </tbody>
        </table>
    </div>
</div>
<div class="intro">
    <div class="desc-wrap">
        Text Text Text
    </div>
    <div class="main-wrap">
        <table class="table-wrap">
            <tbody>
                <tr>
                    <th class="range">Range </th>
                    <th>#1</th>
                    <th>#2</th>
                    <th>#3</th>
                    <th>#4</th>
                </tr>
            </tbody>
        </table>
    </div>
</div>

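This continues 100 times (meaning 100 instances of <div class="intro"> . . . </div>).

So I'm trying to get the contents of desc-wrap (no problem, it's a text node) as well as a count of how many <th> there are in each table.

Figuring one XPath query might be better than two, I query the div.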

$intropath = $xp->query("//div[@class='intro']");

Loop through it.

$f=1;
foreach ($intropath as $sp) {
echo $f++ . '<br>'; // Makes its way to 100, good.

My core problem is trying to count the number of <th> in each table.

$gettables = $xp->query("//div[contains(@class,'main-wrap')]/table[contains(@class, 'table-wrap')]//th", $sp);
var_dump($gettables); // public 'length' => int 488
// Okay, so this is getting all the <th> elements in the 
// entire document, not just in the loop. Maybe not what I want.

Here's what else I've tried (and by tried I mean failed):

Okay, let's try targeting just the first table (adding [0] before //th) and see if we get something.

$gettables = $xp->query("//div[contains(@class,'main-wrap')]/table[contains(@class, 'table-wrap')][0]//th", $sp);

Nope. Non-object. Length 0. Not sure why. Okay, let's take that off.

Maybe try this?

//div[contains(@class,'main-wrap')]/table[contains(@class, 'table-wrap')]//th[count(following-sibling::*)]

Okay. So length = 100. It must be getting a single th per table and extrapolating. Not what I want.

Maybe just

//th[count(*)]

Nope. Non-object.

Maybe this?

count(//div[contains(@class,'main-wrap')]/table[contains(@class, 'table-wrap')]//th)

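Nope. More non-objects.

That's everything I've tried. It's been fun failing (well, learning), but what am I missing? For my output, I just want to know how many <th> there are in each table.

So, something like: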

foreach ($intropath as $sp) {
$getsizes = $xp->query("//actual/working/xpath/for/individual/th");
$thcount = count($getsizes->item(0)); // or something?
echo $thcount . '<br>';

For the example above, this would output

3 5

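And of course it would carry on for the other 98 iterations.

This is probably stupidly easy. I've been referencing this cheatsheet as well as this cheatsheet, and I've learned a lot about XPath's capabilities, but this answer eludes me. At this point I'm not even sure whether my foreach ($intropath as $sp) { is the right way to do what I'm doing.

Anyone care to dig me out of this hole so I can move on to the next step and/or with my life?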

2 Answers

Count the qualifying nodes using iterated query() calls.

Code: (Demo)

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
foreach ($xp->query("//div[contains(@class,'main-wrap')]/table[contains(@class, 'table-wrap')]//tr") as $node) {
    echo $xp->query("th", $node)->length , "\n";
}

Output:

3
5
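
For reference, the question's query returned every <th> in the document because an XPath that starts with // is evaluated from the document root even when a context node is passed; a leading . keeps the search below the context node. Below is a minimal self-contained sketch of that difference. The helper name countThPerIntro and the two-block sample markup are assumptions made for illustration, reduced from the structure shown in the question.

```php
<?php
// Hypothetical helper: returns the number of <th> cells inside each div.intro.
function countThPerIntro(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($dom);

    $counts = [];
    foreach ($xp->query("//div[@class='intro']") as $intro) {
        // ".//" searches below $intro only; a bare "//" would restart at the root.
        $counts[] = $xp->query(".//table[contains(@class,'table-wrap')]//th", $intro)->length;
    }
    return $counts;
}

// Sample markup (an assumption, reduced from the question's structure).
$html = '<div class="intro"><div class="main-wrap"><table class="table-wrap"><tr>'
      . '<th class="range">Range</th><th>#1</th><th>#2</th></tr></table></div></div>'
      . '<div class="intro"><div class="main-wrap"><table class="table-wrap"><tr>'
      . '<th>Range</th><th>#1</th><th>#2</th><th>#3</th><th>#4</th></tr></table></div></div>';

echo implode("\n", countThPerIntro($html)), "\n";
```

This keeps the asker's loop over div.intro, so the desc-wrap text can be read from the same $intro node. Note also that XPath positions are 1-based, so a [0] predicate never matches anything; [1] selects the first node.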

Answered 2024-05-05T21:48:14+08:00

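First, query the tables:
```
$intropath = $xp->query("//table[contains(@class, 'table-wrap')]");
```
Then use another XPath query to get the th count for each table, applying the PHP count function to all th elements relative to the context node:
```
foreach ($intropath as $tab) {
  // DOMNodeList is Countable as of PHP 7.2; ->length works on older versions too
  $count = count($xp->query(".//th", $tab));
  echo $count . "<br>";
}
```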
```
That should be all.

P.S.:
Apparently PHP doesn't like the XPath count function, so I used the PHP count function instead.
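
On the XPath count() point: as far as I know, DOMXPath::query() only returns node lists, so a count(...) expression gives nothing useful there, whereas DOMXPath::evaluate() can return typed results such as numbers. A hedged sketch of that approach follows; the function name thCountsViaEvaluate and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: counts <th> per table using the XPath count() function.
function thCountsViaEvaluate(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xp = new DOMXPath($dom);

    $counts = [];
    foreach ($xp->query("//table[contains(@class, 'table-wrap')]") as $table) {
        // evaluate() returns the XPath number as a float, so cast it to int.
        $counts[] = (int) $xp->evaluate("count(.//th)", $table);
    }
    return $counts;
}

// Sample markup (an assumption): two tables with 3 and 2 header cells.
$sample = '<table class="table-wrap"><tr><th>Range</th><th>#1</th><th>#2</th></tr></table>'
        . '<table class="table-wrap"><tr><th>Range</th><th>#1</th></tr></table>';

echo implode('#', thCountsViaEvaluate($sample)), "\n"; // prints 3#2
```

Either approach gives the same numbers; evaluate() just moves the counting into the XPath expression itself.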

Just for completeness:
If you can use XPath 2.0, the following expression would be more compact:
```
string-join(//table[contains(@class, 'table-wrap')]/count(.//th),'#')
```
Here, # is the separator between each table's count.
Answered 2024-05-05T21:48:14+08:00
