Finding the minimum value with XPath 1.0 does not work - Java 学习之路

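I am trying to find the minimum value of a certain element in an XML document (it is actually an HTML table converted to XML). However, this does not work as expected.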

The query is similar to the one used in "How can I use XPath to find the minimum value of an attribute in a set of elements?". It looks like this:

```
/table[@id="search-result-0"]/tbody/tr[
    not(substring-before(td[1], " ") > substring-before(../tr/td[1], " "))
]
```

Executed on the following sample XML:

```
<table class="tablesorter" id="search-result-0">
    <thead>
        <tr>
            <th class="header headerSortDown">Preis</th>
            <th class="header headerSortDown">Zustand</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td width="45px">15 CHF</td>
            <td width="175px">Ausgepack und doch nie gebraucht</td>
        </tr>
        <tr>
            <td width="45px">20 CHF</td>
            <td width="175px">Ausgepack und doch nie gebraucht</td>
        </tr>
        <tr>
            <td width="45px">25 CHF</td>
            <td width="175px">Ausgepack und doch nie gebraucht</td>
        </tr>
        <tr>
            <td width="45px">35 CHF</td>
            <td width="175px">Ausgepack und doch nie gebraucht</td>
        </tr>
        <tr>
            <td width="45px">14 CHF</td>
            <td width="175px">Gebraucht, aber noch in Ordnung</td>
        </tr>
        <tr>
            <td width="45px">15 CHF</td>
            <td width="175px">Gebraucht, aber noch in Ordnung</td>
        </tr>
        <tr>
            <td width="45px">15 CHF</td>
            <td width="175px">Gebraucht, aber noch in Ordnung</td>
        </tr>
    </tbody>
</table>
```

The query returns the following nodes:

```
<tr>
<td width="45px">15 CHF</td>
<td width="175px">Ausgepack und doch nie gebraucht</td>
</tr>
-----------------------
<tr>
<td width="45px">14 CHF</td>
<td width="175px">Gebraucht, aber noch in Ordnung</td>
</tr>
-----------------------
<tr>
<td width="45px">15 CHF</td>
<td width="175px">Gebraucht, aber noch in Ordnung</td>
</tr>
-----------------------
<tr>
<td width="45px">15 CHF</td>
<td width="175px">Gebraucht, aber noch in Ordnung</td>
</tr>
```

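Why is more than one node returned? Only one node should be returned, since there is only one minimum value. Does anyone see what is wrong with the query? It should return only the node containing 14 CHF.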

Results obtained with http://xpath.online-toolz.com/tools/xpath-editor.php

3 answers

TML has already pointed out why your current path expression does not work, but has not proposed a working alternative.

The reason is simple, as @Tomalak put it:

I agree with Mathias. Without changing the input XML, this is actually impossible in XPath 1.0.

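I am adding this answer to explain in more detail how the XML would have to be preprocessed before you can search for the smallest CHF amount. Keep in mind: this is complicated because you are asking for an XPath 1.0 solution. With XPath 2.0, your problem could be solved with a single path expression.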

XML Design

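I think your question illustrates why XML design is essential when working with XML. Why? Because your problem boils down to this: your XML is designed in a way that makes manipulating its content difficult. More precisely, look at a td element like this one: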
```
<td width="45px">15 CHF</td>
```
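The text node of the td element contains both an amount (a number) and a currency. If your XML input were designed in a smarter or more canonical way, it would look like this: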
```
<td width="45px" currency="CHF">15</td>
```
See the difference? Now the different kinds of content are clearly separated from each other.
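As a rough illustration of that preprocessing step, here is a minimal Java sketch using the JDK's DOM and javax.xml.xpath APIs. It assumes the amount and the currency are always separated by the first space, and it only rewrites the first td of each row; the class and method names are hypothetical and not part of the original answer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical preprocessing step: <td>15 CHF</td> becomes <td currency="CHF">15</td>.
public class SplitCurrency {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Rewrites the first td of every row in place and returns "amount|currency" per rewritten cell.
    static List<String> splitPrices(Document doc) {
        List<String> rewritten = new ArrayList<>();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList cells = (NodeList) xpath.evaluate(
                    "/table/tbody/tr/td[1]", doc, XPathConstants.NODESET);
            for (int i = 0; i < cells.getLength(); i++) {
                Element td = (Element) cells.item(i);
                String text = td.getTextContent().trim();
                int space = text.indexOf(' ');
                if (space < 0) {
                    continue; // no currency suffix: leave the cell untouched
                }
                td.setAttribute("currency", text.substring(space + 1));
                td.setTextContent(text.substring(0, space));
                rewritten.add(td.getTextContent() + "|" + td.getAttribute("currency"));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return rewritten;
    }

    public static void main(String[] args) {
        Document doc = parse("<table><tbody>"
                + "<tr><td width='45px'>15 CHF</td><td width='175px'>Ausgepack und doch nie gebraucht</td></tr>"
                + "<tr><td width='45px'>14 CHF</td><td width='175px'>Gebraucht, aber noch in Ordnung</td></tr>"
                + "</tbody></table>");
        System.out.println(splitPrices(doc)); // prints [15|CHF, 14|CHF]
    }
}
```

After this step, td[1] holds only the number, which is the precondition for the expression in the next section.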

XPath Revised

Assuming that in the redesigned XML the only content of the tr/td[1] elements is the number, the XPath expression by Pavel Minaev that you used works:
```
/table[@id="search-result-0"]/tbody/tr[not(td[1] > ../tr/td[1])][1]
```
XML result (tested with the tool you use):
```
<tr>
<td width="45px">14</td>
<td width="175px">Gebraucht, aber noch in Ordnung</td>
</tr>
```
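To check the expression outside the online tool, a Java sketch could run it with the JDK's built-in XPath 1.0 engine (javax.xml.xpath) on the sample rows after the redesign. The class name is hypothetical, and it is an assumption that the JDK engine evaluates the expression the same way the tool does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical check of Pavel's expression on the redesigned table (amount only in td[1]).
public class PavelMin {
    static final String[][] ROWS = {
        {"15", "Ausgepack und doch nie gebraucht"},
        {"20", "Ausgepack und doch nie gebraucht"},
        {"25", "Ausgepack und doch nie gebraucht"},
        {"35", "Ausgepack und doch nie gebraucht"},
        {"14", "Gebraucht, aber noch in Ordnung"},
        {"15", "Gebraucht, aber noch in Ordnung"},
        {"15", "Gebraucht, aber noch in Ordnung"},
    };

    // Same expression as above, with single quotes so it fits in a Java string.
    static final String QUERY =
        "/table[@id='search-result-0']/tbody/tr[not(td[1] > ../tr/td[1])][1]";

    static String redesignedXml() {
        StringBuilder sb = new StringBuilder("<table id='search-result-0'><tbody>");
        for (String[] row : ROWS) {
            sb.append("<tr><td width='45px' currency='CHF'>").append(row[0])
              .append("</td><td width='175px'>").append(row[1]).append("</td></tr>");
        }
        return sb.append("</tbody></table>").toString();
    }

    // Returns {amount, condition} of the row the expression selects.
    static String[] cheapestRow() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(redesignedXml())));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node tr = (Node) xpath.evaluate(QUERY, doc, XPathConstants.NODE);
            return new String[] {xpath.evaluate("td[1]", tr), xpath.evaluate("td[2]", tr)};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] row = cheapestRow();
        System.out.println(row[0] + " / " + row[1]); // prints 14 / Gebraucht, aber noch in Ordnung
    }
}
```

For the sample data in the question, the selected row is the one whose first cell is 14; the trailing [1] only matters if several rows share the minimum.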
Why does Pavel's expression stop working just because I add substring-before()?

You have already found part of the answer. It has to do with the way XPath 1.0 functions handle node-sets (sequences of nodes).

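substring-before() is an XPath 1.0 function that takes two arguments, both of which are strings. Most importantly, if you pass a node-set (for example, a whole series of td elements) as the first argument of substring-before(), XPath 1.0 converts it to a string by taking the string value of the first node in document order only; all other nodes are ignored.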
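A small Java sketch makes this visible with the JDK's javax.xml.xpath engine: the node-set holds seven cells, but string() and substring-before() only ever see the first one. The class and method names are hypothetical, and the sample is trimmed to the price column of the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo: a node-set passed to a string function collapses to its first node.
public class FirstNodeOnly {
    static final String[] PRICES = {"15 CHF", "20 CHF", "25 CHF", "35 CHF", "14 CHF", "15 CHF", "15 CHF"};

    static Document sample() {
        StringBuilder sb = new StringBuilder("<table><tbody>");
        for (String price : PRICES) {
            sb.append("<tr><td>").append(price).append("</td></tr>");
        }
        sb.append("</tbody></table>");
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sb.toString())));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression and converts the result to a string.
    static String evalString(String expr) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expr, sample());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Evaluates an expression and converts the result to a number.
    static double evalNumber(String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(expr, sample(), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evalNumber("count(/table/tbody/tr/td[1])"));                 // 7.0
        System.out.println(evalString("string(/table/tbody/tr/td[1])"));                // 15 CHF
        System.out.println(evalString("substring-before(/table/tbody/tr/td[1], ' ')")); // 15
    }
}
```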

Pavel's answer, adapted to this question:
```
tr[not(td[1] > ../tr/td[1])][1]
```
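relies on the fact that the second part of the expression, ../tr/td[1], finds the first td child of every tr element in tbody. No function is involved, and a node-set is a perfectly valid operand of >: when both operands are node-sets, the comparison is true if at least one pair of nodes satisfies it. So not(td[1] > ../tr/td[1]) holds only for a row whose first cell is not greater than any first cell, that is, the minimum.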
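This existential rule can be checked row by row with a Java sketch like the following, run with the JDK's javax.xml.xpath engine against a numbers-only price column (an assumption: the redesign above has already been applied; the class name is hypothetical). Only the 14 row yields false, so it is the only row for which not(...) is true.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo of XPath 1.0 node-set comparison:
// "a > B" is true if a is greater than at least one node in B.
public class ExistentialCompare {
    static final int[] PRICES = {15, 20, 25, 35, 14, 15, 15};

    // For each row, evaluates td[1] > ../tr/td[1] with that row as the context node.
    static List<Boolean> greaterThanSomeRow() {
        StringBuilder sb = new StringBuilder("<table><tbody>");
        for (int price : PRICES) {
            sb.append("<tr><td>").append(price).append("</td></tr>");
        }
        sb.append("</tbody></table>");
        List<Boolean> results = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sb.toString())));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate("/table/tbody/tr", doc, XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                results.add((Boolean) xpath.evaluate(
                        "td[1] > ../tr/td[1]", rows.item(i), XPathConstants.BOOLEAN));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        // prints [true, true, true, true, false, true, true]
        System.out.println(greaterThanSomeRow());
    }
}
```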

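If we need substring-before() because the text content actually contains both a number (which we want) and a currency (which we want to ignore), we have to wrap it around both parts of the expression: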
```
tr[not(substring-before(td[1],' ') > substring-before(../tr/td[1],' '))][1]
```
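On the left-hand side of > this is fine, because the current tr has only one td[1]. On the right-hand side, however, there is a node-set, namely ../tr/td[1]. Unfortunately, substring-before() only processes the first of these nodes, so every row is compared against the first price only.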

See @TML's answer for the consequences of this.
Replied on 2024-04-27T16:24:33+08:00
The XPath query you're using here would only find the minimum value in cases where there are no duplicate values and the values are sorted before being written into the nodes. This is because it only compares the current value, substring-before(td[1], " "), with the first value found, substring-before(../tr/td[1], " "). To break the comparisons down:
```
[1] not(15 > 15)
[2] not(20 > 15)
[3] not(25 > 15)
[4] not(35 > 15)
[5] not(14 > 15)
[6] not(15 > 15)
[7] not(15 > 15)
```
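Comparisons 1, 5, 6 and 7 evaluate to true (the left-hand side is not greater than the right-hand side), so those four rows are returned.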
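The breakdown above can be reproduced with a Java sketch that evaluates both sides of the comparison for every row, using the JDK's javax.xml.xpath engine on the original price cells. The class name is hypothetical, and the output format merely mimics the list above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical re-run of the breakdown: both sides of the comparison, evaluated per row.
public class BreakdownBrokenQuery {
    static final String[] PRICES = {"15 CHF", "20 CHF", "25 CHF", "35 CHF", "14 CHF", "15 CHF", "15 CHF"};

    static final String PREDICATE =
        "not(substring-before(td[1], ' ') > substring-before(../tr/td[1], ' '))";

    static List<String> breakdown() {
        StringBuilder sb = new StringBuilder("<table><tbody>");
        for (String price : PRICES) {
            sb.append("<tr><td>").append(price).append("</td></tr>");
        }
        sb.append("</tbody></table>");
        List<String> lines = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(sb.toString())));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList rows = (NodeList) xpath.evaluate("/table/tbody/tr", doc, XPathConstants.NODESET);
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                String left = xpath.evaluate("substring-before(td[1], ' ')", row);
                // The right-hand node-set collapses to its first node, so this is "15" for every row.
                String right = xpath.evaluate("substring-before(../tr/td[1], ' ')", row);
                Boolean kept = (Boolean) xpath.evaluate(PREDICATE, row, XPathConstants.BOOLEAN);
                lines.add(String.format("[%d] not(%s > %s) = %b", i + 1, left, right, kept));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        breakdown().forEach(System.out::println);
    }
}
```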
Replied on 2024-04-27T16:24:33+08:00

In the meantime, I decided to use XSLT. Here is the stylesheet I came up with:

```
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml">

    <xsl:output method="text" omit-xml-declaration="yes" indent="no" encoding="UTF-8"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="//table[@id='search-result-0']/tbody">
        <ul>
            <!-- visit the price cells sorted numerically by amount; the first one is the minimum -->
            <xsl:for-each select="tr/td[@width='45px']">
                <xsl:sort select="substring-before(., ' ')" data-type="number" order="ascending"/>

                <xsl:if test="position() = 1">
                     <xsl:value-of select="substring-before(., ' ')"/>
                </xsl:if>
            </xsl:for-each>
        </ul>
    </xsl:template>

    <xsl:template match="text()"/> <!-- ignore the plain text -->

</xsl:stylesheet>
```
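To run this stylesheet from Java, one option is the JDK's built-in XSLT 1.0 processor via javax.xml.transform. The sketch below condenses the stylesheet into a string and uses a trimmed sample table with only the price column; the class and method names are hypothetical, and it assumes the default TransformerFactory of the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness that applies the stylesheet above with the JDK's XSLT 1.0 processor.
public class RunMinStylesheet {
    // The stylesheet from this answer, condensed onto a few lines.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='text' omit-xml-declaration='yes' indent='no' encoding='UTF-8'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match=\"//table[@id='search-result-0']/tbody\">"
        + "<ul><xsl:for-each select=\"tr/td[@width='45px']\">"
        + "<xsl:sort select=\"substring-before(., ' ')\" data-type='number' order='ascending'/>"
        + "<xsl:if test='position() = 1'><xsl:value-of select=\"substring-before(., ' ')\"/></xsl:if>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "<xsl:template match='text()'/>"
        + "</xsl:stylesheet>";

    static final String[] PRICES = {"15 CHF", "20 CHF", "25 CHF", "35 CHF", "14 CHF", "15 CHF", "15 CHF"};

    static String sampleXml() {
        StringBuilder sb = new StringBuilder(
            "<table id='search-result-0'><thead><tr><th>Preis</th></tr></thead><tbody>");
        for (String price : PRICES) {
            sb.append("<tr><td width='45px'>").append(price).append("</td></tr>");
        }
        return sb.append("</tbody></table>").toString();
    }

    // Applies the stylesheet; with method='text' only the smallest amount is written.
    static String cheapest(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cheapest(sampleXml())); // prints 14
    }
}
```

Because the output method is text, the literal ul element contributes nothing to the result; only the amount produced by xsl:value-of is written.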

Replied on 2024-04-27T16:24:33+08:00
