Speeding up a SPARQL query on GraphDB

I am trying to speed up and optimize this query:

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node ;
          :hasnode* ?node2 .

    ?node a :Node ;
           :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .

    ?node2 a :Node ;
            :hasAnnotation ?ann2 .
    ?ann2 :hasReference ?ref2 .
    ?ref2 a :ReferenceType2 .

}

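Basically, I am analyzing some trees, and I want to get all the trees (i.e. the trees' roots) that have at least a couple of underlying nodes matching a pattern like this one: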

?node_x a :Node ;
       :hasAnnotation ?ann_x .
?ann_x :hasReference ?ref_x .
?ref_x a :ReferenceTypex .

one with x = 1 and the other with x = 2.

Since in my graph a node can have at most one :hasAnnotation predicate, I don't have to specify that those two nodes must be different.

The problem

The query above describes exactly what I need, but it performs very badly. After minutes and minutes of execution it is still running.

My (ugly) solution: breaking it in half

I noticed that if I look for one node pattern at a time, I get my result in a few seconds (!).

Sadly, my current approach consists of running the following query type twice:

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node .

    ?node a :Node ;
           :hasAnnotation ?ann_x .
    ?ann_x :hasReference ?ref_x .
    ?ref_x a :ReferenceTypex .
}

once with x = 1 and once with x = 2.

I save the partial results (i.e. the ?root values) in two sets, say R1 and R2, and finally compute the intersection of those result sets.
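The client-side merge step comes down to a plain set intersection. A minimal sketch in Python, assuming the ?root bindings of each run have already been fetched into lists of IRI strings (the sample IRIs and the helper name `roots_in_both` are made up for illustration):

```python
def roots_in_both(r1_bindings, r2_bindings):
    """Return the roots present in both result lists, sorted for stable output."""
    r1 = set(r1_bindings)  # roots that have a ReferenceType1 node (x = 1)
    r2 = set(r2_bindings)  # roots that have a ReferenceType2 node (x = 2)
    return sorted(r1 & r2)


# Hypothetical ?root values returned by the two runs.
R1 = ["http://example.org/rootA", "http://example.org/rootB"]
R2 = ["http://example.org/rootB", "http://example.org/rootC"]

print(roots_in_both(R1, R2))
```

This works, but it moves part of the query logic out of SPARQL and into application code, which is what I would like to avoid.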

Is there a way to speed up my initial approach and get the results using SPARQL alone?

PS: I am working with GraphDB.

2 Answers

Without knowing the specific dataset, I can only give you some general directions on how to optimize the query:

Avoid using DISTINCT for large datasets

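The GraphDB query optimizer will not automatically rewrite the query to use EXISTS for the patterns that do not participate in the projection. What the query really needs is to check that at least one such pattern exists, not to produce all the bindings and then eliminate the duplicate results.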
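The difference between the two semantics can be shown on toy data. The sketch below is plain Python, not GraphDB's actual evaluation strategy, and the dictionary contents and function names are hypothetical: the DISTINCT-style function emits one row per matching binding and deduplicates afterwards, while the EXISTS-style function stops at the first match for each root.

```python
# Toy data: each root maps to the reference types found under its nodes.
refs_by_root = {
    "rootA": ["ReferenceType1", "ReferenceType1", "ReferenceType2"],
    "rootB": ["ReferenceType1"],
}


def roots_via_distinct(data, wanted):
    """DISTINCT-style: one row per matching binding, deduplicated at the end."""
    rows = [root for root, refs in data.items() for ref in refs if ref == wanted]
    return sorted(set(rows))


def roots_via_exists(data, wanted):
    """EXISTS-style: any() stops at the first matching binding per root."""
    return sorted(
        root for root, refs in data.items() if any(ref == wanted for ref in refs)
    )


print(roots_via_distinct(refs_by_root, "ReferenceType1"))
print(roots_via_exists(refs_by_root, "ReferenceType1"))
```

Both functions return the same roots; the second one simply never builds the intermediate rows that the first one has to throw away.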

Materialize the property paths

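GraphDB has a very efficient forward-chaining reasoner and a comparatively less optimized property path expansion. If you are not concerned about write/data-update performance, I suggest you declare :hasnode as a transitive property (see owl:TransitiveProperty in query), which will eliminate the property path wildcard. This will speed up the query many times over.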
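As a rough analogy for the trade-off (a toy sketch in plain Python, not how GraphDB's reasoner is implemented), materializing means computing the reachable nodes once when the data is written, so that a later lookup is a single step instead of a traversal. The graph and the function name `materialize_closure` are hypothetical:

```python
def materialize_closure(edges):
    """For every node with outgoing edges, collect all nodes reachable in one or more hops."""
    closure = {}
    for start in edges:
        seen = set()
        stack = list(edges.get(start, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(edges.get(node, ()))
        closure[start] = seen
    return closure


# Hypothetical tree: root -> n1 -> n2, and root -> n3.
has_node = {"root": ["n1", "n3"], "n1": ["n2"]}

closure = materialize_closure(has_node)  # paid once, at "write" time
print(sorted(closure["root"]))           # afterwards a single lookup
```

One caveat worth checking against your data: `:hasnode*` also matches the zero-length path (the root itself), whereas a transitive `:hasnode` only covers one or more hops. That should only matter if a resource can be both a :Root and a :Node.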

Your final query should look like this:
```
select ?root where { 
    ?root a :Root ;
          :hasnode ?node ;
          :hasnode ?node2 .

    FILTER (?node != ?node2)

    FILTER EXISTS {
        ?node a :Node ;
               :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }

    FILTER EXISTS {
        ?node2 a :Node ;
                :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}
```
Answered 2024-04-29T23:39:21+08:00

Well, putting together the auto-hint :) and Stanislav's suggestion, I came up with a solution.

Solution 1: nested query

Nesting the query in the following way, I get the result in 15s.

select distinct ?root where { 
    ?root a :Root ;
          :hasnode* ?node .
    ?node a :Node ;
          :hasAnnotation ?ann .
    ?ann :hasReference ?ref .
    ?ref a :ReferenceType1 .
    {
        select distinct ?root where { 
            ?root a :Root ;
                  :hasnode* ?node2 .
            ?node2 a :Node ;
                   :hasAnnotation ?ann2 .
            ?ann2 :hasReference ?ref2 .
            ?ref2 a :ReferenceType2 .
        }
    }
}

Solution 2: groups into {}

Grouping each part into {}, as suggested by Stanislav, took 60s.

select distinct ?root where { 
    {
        ?root a :Root ;
              :hasnode* ?node .

        ?node a :Node ;
              :hasAnnotation ?ann .
        ?ann :hasReference ?ref .
        ?ref a :ReferenceType1 .
    }
    {
        ?root a :Root ;
              :hasnode* ?node2 .

        ?node2 a :Node ;
               :hasAnnotation ?ann2 .
        ?ann2 :hasReference ?ref2 .
        ?ref2 a :ReferenceType2 .
    }
}

In the first case, GraphDB's optimizer probably builds a more efficient query plan for my data (explanations are welcome).

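I used to think of SPARQL as a 'declarative' language, but performance seems to vary enormously depending on how the query is written. Coming from SQL, it seems to me that this variability is much larger than what happens in the relational world.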

However, reading this post, it seems I was not sufficiently aware of how SPARQL optimizers work. :)

Answered 2024-04-29T23:39:21+08:00
