Parsing XSLT / XPath Data for a List of XML Nodes - Java Learning Path

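I'm looking for a lib or tool, or even some simple code, that can parse the XPath/XSLT data in our XSLT files and produce a Dictionary/List/Tree of all the XML nodes the XSLT expects to process or look up. Unfortunately, everything I have found uses XSLT to parse XML rather than parsing the XSLT itself. The really hard part I'm dealing with is the flexibility of XPath.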

For example, across the XSLT files we use, an entry might select

nodeX/nodeY/nodeNeeded;

or

../nodeNeeded;

or

select nodeX, then select nodeY, then select nodeNeeded; and so on.

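What we want is to parse the XSLT document and get a data structure that tells us explicitly that the XSLT is looking for nodeNeeded at the path nodeX/nodeY, so that we can custom-build the XML data in a minimalist fashion.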
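The data structure described above amounts to normalizing every selection into one path from a known context. A minimal sketch of that normalization step; PathResolver.resolve is a hypothetical helper and assumes selections are plain child steps plus ".", ".." and absolute "/..." paths. Predicates, axes, "*" and "//" are out of scope.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: folds a relative select into the path of its context node. */
final class PathResolver {
    private PathResolver() {}

    /**
     * Resolves a simple location path against a context path.
     * Only child steps, "." and ".." are understood; predicates, axes, "*" and "//" are not.
     */
    static String resolve(String contextPath, String select) {
        if (select.startsWith("/")) {
            return select; // absolute path: the context does not matter
        }
        Deque<String> steps = new ArrayDeque<>();
        for (String step : contextPath.split("/")) {
            if (!step.isEmpty()) {
                steps.addLast(step);
            }
        }
        for (String step : select.split("/")) {
            if (step.isEmpty() || step.equals(".")) {
                continue; // "." stays on the current node
            }
            if (step.equals("..")) {
                if (!steps.isEmpty()) {
                    steps.removeLast(); // ".." moves up to the parent
                }
            } else {
                steps.addLast(step);
            }
        }
        return String.join("/", steps);
    }
}
```

Nested selections fold step by step: resolve(resolve("nodeX", "nodeY"), "nodeNeeded") gives nodeX/nodeY/nodeNeeded, and resolve("nodeX/nodeY/other", "../nodeNeeded") gives the same path.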

Thanks!

Below is a subset of mock data for visualization purposes:

<server_stats>
    <server name="fooServer">
        <uptime>24d52m</uptime>
        <userCount>123456</userCount>
        <loggedInUsers>
            <user name="AnnaBannana">
                <created>01.01.2012:00.00.00</created>
                <loggedIn>25</loggedIn>
                <posts>3</posts>
            </user>
        </loggedInUsers>
        <temperature>82F</temperature>
        <load>72</load>
        <mem_use>45</mem_use>
        <visitors>
            <current>42</current>
            <browsers name="mozilla" version="X.Y.Z">22</browsers>
            <popular_link name="index.html">39</popular_link>
            <history>
                <max_visitors>789</max_visitors>
                <average_visitors>42</average_visitors>
            </history>
        </visitors>
    </server>
</server_stats>

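From this, one customer might only want to build an admin HTML page that pulls the hardware stats from the tree and perhaps runs some load calculations on the visitor counts. Another customer might only want to pull the visitor-count information to display on their public site. To keep each customer's system load as small as possible, we want to parse their stats-selection XSLT and provide only the data they need (have requested). The obvious problem is that one customer may do a direct select on the visitor-count node, while another may select the visitors node and then select each child node they want, and so on.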
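Once a customer's paths are known, the stats document could be trimmed to just those branches before it is handed over. A sketch using only the JDK DOM and JAXP APIs, assuming paths are plain element-name paths starting at the document element (for example server_stats/server/visitors/current); XmlPruner and pruneToString are hypothetical names. A requested element keeps its whole subtree, and its ancestors keep their attributes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical pruner: keeps only the requested element paths (and their ancestors). */
final class XmlPruner {
    private XmlPruner() {}

    static String pruneToString(String xml, String... wantedPaths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            prune(root, root.getNodeName(), new HashSet<>(Arrays.asList(wantedPaths)));
            StringWriter out = new StringWriter();
            // Identity transformer: serializes the pruned DOM back to text.
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static void prune(Element element, String path, Set<String> wanted) {
        if (wanted.contains(path)) {
            return; // a requested element keeps its whole subtree
        }
        NodeList children = element.getChildNodes();
        // Walk backwards: the NodeList is live, so removals shift later indices.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // leave text and comments alone
            }
            String childPath = path + "/" + child.getNodeName();
            if (isOnWantedPath(childPath, wanted)) {
                prune((Element) child, childPath, wanted);
            } else {
                element.removeChild(child);
            }
        }
    }

    /** True if the path is requested itself or is an ancestor of a requested path. */
    private static boolean isOnWantedPath(String path, Set<String> wanted) {
        for (String w : wanted) {
            if (w.equals(path) || w.startsWith(path + "/")) {
                return true;
            }
        }
        return false;
    }
}
```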

Two hypothetical customers looking for the "current" node inside "visitors" might have XSLT like this:

<xsl:template match="server_stats/server/visitors">
    <xsl:value-of select="current"/>
</xsl:template>

OR

<xsl:template match="server_stats">
     <xsl:for-each select="server">
          <xsl:value-of select="visitors/current"/>
          <xsl:value-of select="visitors/popular_link"/>
     </xsl:for-each>
</xsl:template>

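In this example, both try to select the same node, but they do it in different ways. "current" on its own is not very specific, so we also need the path they used to get there, because "current" could be a child of several different elements. That rules out simply searching their XSLT for "current", and because the ways they reach a path can differ so much, we can't search for the full path either.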

So the result we want is to parse their XSLT and get, say, a list of stats:

Customer 1:
visitors/current
Customer 2:
visitors/current
visitors/popular_link

and so on.

Some example selects that break the solution provided below, which we will need to work around:

<xsl:variable name="fcolor" select="'Black'"/>
    Results in a /'Black' entry.
<xsl:for-each select="server">
    We get the entry, but its children no longer show it in their paths.
<xsl:value-of select="../../@name"/>
    This was somewhat expected. We can try to figure out how to skip attribute-based selections, but the relative paths show up as I thought they would.
<xsl:when test="substring(someNode,1,2)=0 and substring(someNode,4,2)=0 and substring(someNode,7,2)>30">
    This one is throwing me a bit: it shows up as a path item because of the xsl:when handling in the solution. I don't see a clean fix, since the same basic statement could have been checking a branching path, so this may be one of the cases we need to post-process or handle in some similar way.
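For the breaking cases above, a heuristic pre-filter on each XPath expression may remove most of the noise: blank out string literals and $variables, skip function names and the and/or/div/mod operators, and reduce attribute steps to the element that owns them. This is a sketch with a hypothetical XPathPathExtractor class, not an XPath parser: it does not understand predicates, axes, wildcards, node tests such as text(), or "//", and it cannot tell whether someNode inside substring() is meant as a branch or only as a value.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical heuristic: pulls plain location paths out of an XPath expression string. */
final class XPathPathExtractor {
    private static final Pattern LITERAL = Pattern.compile("'[^']*'|\"[^\"]*\"");
    private static final Pattern VARIABLE = Pattern.compile("\\$[A-Za-z_][\\w.-]*");
    // One step: "..", "." or an (optionally attribute) name; dots inside names are not supported.
    private static final String STEP = "(?:\\.\\.|\\.|@?[A-Za-z_][\\w-]*)";
    private static final Pattern PATH = Pattern.compile("/?" + STEP + "(?:/" + STEP + ")*");
    private static final Set<String> OPERATORS = Set.of("and", "or", "div", "mod");

    private XPathPathExtractor() {}

    static List<String> extract(String expression) {
        // Blank out string literals and $variables so they are never mistaken for paths.
        String cleaned = LITERAL.matcher(expression).replaceAll(" ");
        cleaned = VARIABLE.matcher(cleaned).replaceAll(" ");

        Set<String> paths = new LinkedHashSet<>(); // first-seen order, no duplicates
        Matcher m = PATH.matcher(cleaned);
        while (m.find()) {
            String token = m.group();
            boolean isFunctionCall = m.end() < cleaned.length() && cleaned.charAt(m.end()) == '(';
            if (isFunctionCall || OPERATORS.contains(token) || token.startsWith("@")) {
                continue; // substring(...), and/or, or an attribute of the context node
            }
            int attr = token.lastIndexOf("/@");
            if (attr >= 0) {
                token = token.substring(0, attr); // keep the element that owns the attribute
            }
            if (!token.isEmpty() && !token.equals(".")) {
                paths.add(token);
            }
        }
        return new ArrayList<>(paths);
    }
}
```

With this filter, "'Black'" yields nothing, "../../@name" yields "../..", and the substring() test yields just "someNode".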

2 Answers

It is unrealistic to try to reconstruct the structure of the source XML document just by looking at an XSLT transformation that operates on this document.

Most XSLT transformations operate on a whole class of XML documents, not on one specific document type.

For example, the following identity transformation is one of the most frequently used XSLT transformations:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

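Nothing at all can be inferred from this transformation about the structure of the XML documents it processes.

A huge number of transformations simply override templates of the above transformation.

For example, this is a useful transformation that renames any element with a particular name, specified in an external parameter: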

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:param name="pName"/>
 <xsl:param name="pNewName"/>

 <xsl:template match="node()|@*" name="identity">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="*">
  <xsl:choose>
   <xsl:when test="name() = $pName">
    <xsl:element name="{$pNewName}">
     <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
   </xsl:when>
   <xsl:otherwise>
    <xsl:call-template name="identity"/>
   </xsl:otherwise>
  </xsl:choose>
 </xsl:template>
</xsl:stylesheet>

Once again, absolutely nothing can be said about the names and structure of the source XML document.
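To run the rename transformation from Java, the JDK's built-in XSLT 1.0 processor (JAXP) is enough; the external parameters are passed with Transformer.setParameter. A minimal sketch; RenameElements is a hypothetical wrapper, and its embedded stylesheet writes the "*" template as xsl:choose so that elements not named $pName are only copied, never also renamed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical wrapper around the rename stylesheet, run through JAXP. */
final class RenameElements {
    // The rename stylesheet from the answer, with the "*" template written as xsl:choose.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes' indent='yes'/>"
      + "<xsl:param name='pName'/>"
      + "<xsl:param name='pNewName'/>"
      + "<xsl:template match='node()|@*' name='identity'>"
      + "<xsl:copy><xsl:apply-templates select='node()|@*'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='*'>"
      + "<xsl:choose>"
      + "<xsl:when test='name() = $pName'>"
      + "<xsl:element name='{$pNewName}'><xsl:apply-templates select='node()|@*'/></xsl:element>"
      + "</xsl:when>"
      + "<xsl:otherwise><xsl:call-template name='identity'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private RenameElements() {}

    static String rename(String xml, String oldName, String newName) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            // Supply the two top-level xsl:param values from outside the stylesheet.
            t.setParameter("pName", oldName);
            t.setParameter("pNewName", newName);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The call is RenameElements.rename(xml, "b", "z"). Whatever the parameter values, the stylesheet itself still reveals nothing about the names in the source document.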

UPDATE:

Perhaps something like this:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="xsl:template[@match]">
  <xsl:variable name="vPath" select="string(@match)"/>

  <xsl:value-of select="concat('&#xA;', $vPath)"/>

  <xsl:apply-templates select="*">
   <xsl:with-param name="pPath" select="$vPath"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="*">
  <xsl:param name="pPath"/>

  <xsl:apply-templates select="*">
   <xsl:with-param name="pPath" select="$pPath"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="xsl:for-each">
  <xsl:param name="pPath"/>

  <xsl:variable name="vPath">
   <xsl:choose>
    <xsl:when test="starts-with(@select, '/')">
      <xsl:value-of select="@select"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="concat($pPath, '/', @select)"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:variable>

  <xsl:value-of select="concat('&#xA;', $vPath)"/>

  <xsl:apply-templates select="*">
   <xsl:with-param name="pPath" select="$vPath"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="xsl:if | xsl:when">
  <xsl:param name="pPath"/>

  <xsl:variable name="vPath">
   <xsl:choose>
    <xsl:when test="starts-with(@test, '/')">
      <xsl:value-of select="@test"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="concat($pPath, '/', @test)"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:variable>

  <xsl:value-of select="concat('&#xA;', $vPath)"/>

  <xsl:apply-templates select="*">
   <xsl:with-param name="pPath" select="$pPath"/>
  </xsl:apply-templates>
 </xsl:template>

 <xsl:template match="*[@select]">
  <xsl:param name="pPath"/>

  <xsl:variable name="vPath">
   <xsl:choose>
    <xsl:when test="starts-with(@select, '/')">
      <xsl:value-of select="@select"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="concat($pPath, '/', @select)"/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:variable>

  <xsl:value-of select="concat('&#xA;', $vPath)"/>

  <xsl:apply-templates select="*">
   <xsl:with-param name="pPath" select="$pPath"/>
  </xsl:apply-templates>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XSLT stylesheet:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

    <xsl:template match="/">
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="server_stats">
        <xsl:for-each select="server">
            <xsl:value-of select="visitors/current"/>
            <xsl:value-of select="visitors/popular_link"/>

            <xsl:for-each select="site">
              <xsl:value-of select="defaultPage/Url"/>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>

the following result is produced:

/
server_stats
server_stats/server
server_stats/visitors/current
server_stats/visitors/popular_link
server_stats/site
server_stats/defaultPage/Url
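In the analysis stylesheet, the pattern *[@select] has default priority 0.5 while xsl:for-each has default priority 0, so every xsl:for-each element is actually handled by the generic *[@select] template, which passes the old $pPath down to its children. That would explain entries like server_stats/visitors/current instead of server_stats/server/visitors/current, and the for-each complaint in the question; adding priority="1" to the xsl:for-each template should change it. The same idea can be sketched from Java: a hypothetical XsltPathCollector walks the stylesheet DOM, starts a context at each xsl:template/@match, makes an xsl:for-each/@select the context for its descendants, and skips selects that begin with a string literal or a variable. It does not normalize ".." or split compound expressions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical static analyzer: lists the paths an XSLT stylesheet selects. */
final class XsltPathCollector {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    private XsltPathCollector() {}

    static List<String> collect(String xslt) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so xsl:* elements are recognized by namespace URI
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xslt)))
                    .getDocumentElement();
            List<String> paths = new ArrayList<>();
            walk(root, "", paths);
            return paths;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void walk(Element e, String context, List<String> paths) {
        String childContext = context;
        if (XSL_NS.equals(e.getNamespaceURI())) {
            if (e.getLocalName().equals("template") && e.hasAttribute("match")) {
                childContext = e.getAttribute("match"); // a match pattern starts a new context
                paths.add(childContext);
            } else if (e.hasAttribute("select") && !isLiteralOrVariable(e.getAttribute("select"))) {
                String path = join(context, e.getAttribute("select").trim());
                paths.add(path);
                if (e.getLocalName().equals("for-each")) {
                    childContext = path; // in this sketch only for-each moves the context for children
                }
            }
        }
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, childContext, paths);
            }
        }
    }

    private static boolean isLiteralOrVariable(String select) {
        String s = select.trim();
        return s.startsWith("'") || s.startsWith("\"") || s.startsWith("$");
    }

    private static String join(String context, String select) {
        if (select.startsWith("/") || context.isEmpty() || context.equals("/")) {
            return select;
        }
        return context + "/" + select;
    }
}
```

For the sample stylesheet above, this sketch should report server_stats/server/visitors/current and server_stats/server/site/defaultPage/Url rather than the shorter paths.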

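Do note: this analysis is not only incomplete, it should also be taken with a grain of salt. These are the results of static analysis. In practice, it may turn out that out of 100 paths only 5-6 are accessed 99% of the time. Static analysis cannot give you that kind of information. A dynamic analysis tool (similar to a profiler) could return much more precise and useful information.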
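As a rough stand-in for the dynamic view (not real profiling), the statically collected paths can at least be checked against a representative data document to see which of them select anything. A sketch with the JDK's javax.xml.xpath API; PathHitCounter and countHits are hypothetical names, and paths are assumed to be plain element paths relative to the document node. Real per-run tracing would need a hook inside the XSLT processor; Saxon, for example, offers a TraceListener interface.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical check: how many nodes each statically found path selects in sample data. */
final class PathHitCounter {
    private PathHitCounter() {}

    static Map<String, Integer> countHits(String xml, List<String> paths) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, Integer> hits = new LinkedHashMap<>();
            for (String path : paths) {
                // Paths from the static pass are relative to the document node.
                String absolute = path.startsWith("/") ? path : "/" + path;
                NodeList nodes = (NodeList) xpath.evaluate(absolute, doc, XPathConstants.NODESET);
                hits.put(path, nodes.getLength());
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A path with zero hits across all representative documents is a candidate for being dead code in the customer's XSLT, though a single sample cannot prove that.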

Answered 2024-04-25T17:01:41+08:00


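This will be challenging, because XSLT is context-dependent. You are right to call it "parsing", because you will have to replicate a lot of the logic that goes into a parser.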

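My suggestion is to start with a brute-force approach and refine it as you find more test cases it can't handle. Look at a few XSLT files and write code that can find the structures you are looking for. Look at a few more, and if any new structures show up, refine the code to find them too.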
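Following that empirical approach, a first step could be to survey which XSLT instructions in the existing files actually carry XPath, so the most common constructs get handled first. A minimal sketch using only the JDK DOM API; the class name XsltConstructSurvey and the choice of attributes (match, select, test) are assumptions.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical survey tool: counts which XSLT instructions carry XPath across stylesheets. */
final class XsltConstructSurvey {
    private static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";
    private static final List<String> XPATH_ATTRIBUTES = List.of("match", "select", "test");

    private XsltConstructSurvey() {}

    /** Returns counts keyed by "element@attribute", for example "for-each@select". */
    static Map<String, Integer> survey(List<String> stylesheets) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            DocumentBuilder builder = factory.newDocumentBuilder();
            for (String xslt : stylesheets) {
                Document doc = builder.parse(new InputSource(new StringReader(xslt)));
                NodeList xslElements = doc.getElementsByTagNameNS(XSL_NS, "*"); // every xsl:* element
                for (int i = 0; i < xslElements.getLength(); i++) {
                    Element e = (Element) xslElements.item(i);
                    for (String attribute : XPATH_ATTRIBUTES) {
                        if (e.hasAttribute(attribute)) {
                            counts.merge(e.getLocalName() + "@" + attribute, 1, Integer::sum);
                        }
                    }
                }
            }
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
        return counts;
    }
}
```

Running it over the customers' current stylesheets gives a rough priority list; constructs that never appear can be left unhandled until a new file needs them.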

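As a purely empirical approach to parsing these files, this will not find every possible way XSLT and XPath can be used, but it will be a much smaller project, and it will find the structures that the people who write these files actually tend to use.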

Answered 2024-04-25T17:01:41+08:00
