How can I use a neo4j Cypher query to create a histogram of nodes bucketed by number of relationships?

I have a large set of nodes that match the following Cypher pattern:

(:Word)<-[:Searched]-(:Session)

I'd like to make a histogram of the number of Word nodes at each frequency of Searched relationships.

I want to produce this kind of chart:

Searches Words
0        100
1-5      200
6-10     150
11-15    50
16-20    25

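I'm just getting started with neo4j and I'm not sure how to approach this, or even whether there is a way to express it in Cypher. The closest I've gotten is counting the relationships and computing the average, minimum, and maximum.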

MATCH (n:Word) 
RETURN
DISTINCT labels(n),
count(*) AS NumofNodes,
avg(size((n)<-[:Searched]-())) AS AvgNumOfRelationships,
min(size((n)<-[:Searched]-())) AS MinNumOfRelationships,
max(size((n)<-[:Searched]-())) AS MaxNumOfRelationships

This is based on an example at: https://neo4j.com/developer/kb/how-do-i-produce-an-inventory-of-statistics-on-nodes-relationships-properties/

I've also seen use of the modulus operator for grouping to get buckets, though I'm not sure how to use it in reference to the count: Neo4j cypher time interval histogram query of time tree

Is there a "best" way to do this?

4 Answers

  • 1

    I was able to figure out a query that I think gets me the data I want:

    MATCH (n:Word) 
    WITH n, 5 AS bucketsize
    WITH (FLOOR(SIZE( (n)<-[:Searched]-() ) / bucketsize) * bucketsize) AS numRels
    RETURN numRels, COUNT(*)
    ORDER BY numRels ASC
    

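    It doesn't give me the separate zero row I wanted (words with no searches fall into the same bucket as words with 1 to 4 searches), but it seems to work. Hopefully someone else has a better solution.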
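
To see why the zero row disappears, the bucket arithmetic can be mimicked outside the database. The following is a minimal Python sketch, not Cypher; the function name `floor_buckets` and the sample counts are hypothetical and only illustrate what `FLOOR(count / bucketsize) * bucketsize` does to each count.

```python
from collections import Counter

def floor_buckets(counts, bucket_size=5):
    """Group counts by the lower edge of their bucket (count // size * size)."""
    tally = Counter((c // bucket_size) * bucket_size for c in counts)
    return dict(sorted(tally.items()))

# Hypothetical numbers of Searched relationships for seven Word nodes.
counts = [0, 0, 3, 4, 5, 9, 12]
print(floor_buckets(counts))  # {0: 4, 5: 2, 10: 1}
```

Counts 0 through 4 all map to bucket 0, so words with no searches cannot be told apart from words with a few searches unless zero is handled as a special case.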

  • 1

    The following should work:

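    WITH 5 AS gSize
    MATCH (w:Word)
    // OPTIONAL MATCH keeps words with no searches, so COUNT(s) is 0 for them
    OPTIONAL MATCH (w)<-[s:Searched]-()
    // Integer arithmetic rounds the count up to the next multiple of gSize
    WITH gSize, w, toInteger((COUNT(s) + (gSize-1))/gSize * gSize) AS m
    RETURN
      CASE m WHEN 0 THEN '0' ELSE (m-gSize+1)+'-'+m END AS range,
      COUNT(*) AS ct
    // Sort by the numeric start of each range; a plain string sort puts '11-15' before '6-10'
    ORDER BY toInteger(split(range, '-')[0]);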
    

    Using the sample data provided by @GaborSzarnyas, the output is:

    +-------------+
    | range  | ct |
    +-------------+
    | "0"    | 1  |
    | "1-5"  | 1  |
    | "6-10" | 1  |
    +-------------+
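
The round-up-and-label step in this answer can be sketched in plain Python to check the arithmetic. This is an illustration under assumptions, not the answerer's code: `range_label` and `histogram` are hypothetical names, and the input counts are made up. It also shows one way to order the rows, since sorting the labels as text would place "11-15" before "6-10".

```python
from collections import Counter

def range_label(count, size=5):
    """Round count up to a multiple of size and format the bucket as 'lo-hi'."""
    upper = (count + size - 1) // size * size
    return "0" if upper == 0 else f"{upper - size + 1}-{upper}"

def histogram(counts, size=5):
    """Return (label, count) rows ordered by the numeric start of each range."""
    tally = Counter(range_label(c, size) for c in counts)
    return sorted(tally.items(), key=lambda row: int(row[0].split("-")[0]))

rows = histogram([0, 3, 6, 11, 12])
print(rows)  # [('0', 1), ('1-5', 1), ('6-10', 1), ('11-15', 2)]
```

Adding `size - 1` before the integer division is what makes 1 through 5 share a bucket while 0 stays on its own.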
    
  • 1

    I created a simple example dataset with three words: w1 has no searches, w2 has 3 searches, and w3 has 6.

    CREATE (w1:Word {w: '1'})
    WITH count(*) AS dummy
    
    CREATE (w2:Word {w: '2'}) WITH w2
    UNWIND range(1, 3) AS i
    CREATE (w2)<-[:Searched]-(:Session)
    WITH count(*) AS dummy
    
    CREATE (w3:Word {w: '3'}) WITH w3
    UNWIND range(1, 6) AS i
    CREATE (w3)<-[:Searched]-(:Session)
    

    Here is how I would do it. First, let's create a list containing the upper limit of each bucket:

    RETURN [i IN range(0, 4) | i*5] AS upperLimits
    
    ╒══════════════╕
    │"upperLimits" │
    ╞══════════════╡
    │[0,5,10,15,20]│
    └──────────────┘
    

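    Second, use a list comprehension to select the elements of the list whose upper limit is large enough. The first of these is our bucket, so we pick it with the [0] list indexer. The rest is just computing the lower limit and ordering the rows: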

    WITH [i IN range(0, 4) | i*5] AS upperLimits
    MATCH (n:Word) 
    WITH upperLimits, ID(n) AS n, size((n)<-[:Searched]-()) AS numOfRelationships
    WITH
      [upperLimit IN upperLimits WHERE numOfRelationships <= upperLimit][0] AS upperLimit,
      count(n) AS count
    RETURN
      upperLimit - 4 AS lowerLimit,
      upperLimit,
      count
    ORDER BY lowerLimit
    

    This query gives the following results:

    ╒════════════╤════════════╤═══════╕
    │"lowerLimit"│"upperLimit"│"count"│
    ╞════════════╪════════════╪═══════╡
    │-4          │0           │1      │
    ├────────────┼────────────┼───────┤
    │1           │5           │1      │
    ├────────────┼────────────┼───────┤
    │6           │10          │1      │
    └────────────┴────────────┴───────┘
    

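    Potential improvements:

    (1) If the value of numOfRelationships is larger than the largest upper limit, the query above returns the first element of an empty list, which is null. To avoid this, either 1) set a sufficiently large upper limit, e.g.,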

    MATCH (n:Word) 
    WITH max(size((n)<-[:Searched]-())) AS maxNumberOfRelationShips
    WITH [i IN range(-1, maxNumberOfRelationShips/5+1) | {lower: i*5-4, upper: i*5}] AS limits
    RETURN *
    

    or 2) use a top bucket with "16 or larger" semantics, together with coalesce.

    (2) -4 as a lower limit is not very nice; we can use CASE to get rid of it.

    Putting all this together, we get:

    MATCH (n:Word) 
    WITH max(size((n)<-[:Searched]-())) AS maxNumberOfRelationShips
    WITH [i IN range(0, maxNumberOfRelationShips/5+1) | i*5] AS upperLimits
    MATCH (n:Word) 
    WITH upperLimits, ID(n) AS n, size((n)<-[:Searched]-()) AS numOfRelationships
    WITH
      [upperLimit IN upperLimits WHERE numOfRelationships <= upperLimit][0] AS upperLimit,
      count(n) AS count
    RETURN 
      CASE WHEN upperLimit - 4 < 0 THEN 0 ELSE upperLimit - 4 END AS lowerLimit,
      upperLimit,
      count
    ORDER BY lowerLimit
    

    The results are as follows:

    ╒════════════╤════════════╤═══════╕
    │"lowerLimit"│"upperLimit"│"count"│
    ╞════════════╪════════════╪═══════╡
    │0           │0           │1      │
    ├────────────┼────────────┼───────┤
    │1           │5           │1      │
    ├────────────┼────────────┼───────┤
    │6           │10          │1      │
    └────────────┴────────────┴───────┘
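
The selection logic of this answer (take the first upper limit that is large enough, fall back when none is, clamp the lower limit at 0) can be sketched in Python as follows. The names `pick_upper_limit` and `bucket` are hypothetical, and the open-ended top bucket is only one possible reading of the coalesce suggestion above.

```python
def pick_upper_limit(count, upper_limits):
    """Return the first limit >= count, or None when count exceeds every limit."""
    return next((u for u in upper_limits if count <= u), None)

def bucket(count, upper_limits, size=5):
    """Return (lower, upper) for count; upper is None for the open-ended top bucket."""
    upper = pick_upper_limit(count, upper_limits)
    if upper is None:
        # Analogous to coalescing a null bucket into "larger than the last limit".
        return (upper_limits[-1] + 1, None)
    # Clamp at 0 so the zero bucket reads (0, 0) rather than (-4, 0).
    return (max(upper - (size - 1), 0), upper)

upper_limits = [i * 5 for i in range(0, 5)]  # [0, 5, 10, 15, 20]
print([bucket(c, upper_limits) for c in (0, 3, 6, 25)])
# [(0, 0), (1, 5), (6, 10), (21, None)]
```

The `None` returned by `pick_upper_limit` plays the role of the null that Cypher yields when indexing an empty list.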
    
  • 2

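    What I usually do in this situation is rely on the fact that in neo4j, dividing an integer by an integer gives an integer. This simplifies the query. We add a special case for 0, and it all fits on one line.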

    WITH [0,1,5,7,9,11] as list
    UNWIND list as x
    WITH CASE WHEN x = 0 THEN -1 ELSE  (x / 5) * 5 END as results
    return results
    

    This returns

    -1,0,5,5,5,10

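    Given that you wanted 1-5 grouped together, this is not ideal (1-4 land in bucket 0 and 5-9 in bucket 5), but I think it is good enough.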
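
If the 1-5, 6-10 grouping matters, one possible variation is to subtract 1 before the integer division and add it back afterwards. The Python sketch below illustrates the arithmetic only; `bucket_start` is a hypothetical name, and the assumption is that all counts are non-negative, where Python's `//` and integer division in Cypher agree.

```python
def bucket_start(x, size=5):
    """Lower edge of the bucket holding x: 0 stays alone, then 1-5, 6-10, ..."""
    if x == 0:
        return 0
    # Shifting by 1 moves the bucket boundaries from 0,5,10 to 1,6,11.
    return (x - 1) // size * size + 1

starts = [bucket_start(x) for x in [0, 1, 5, 7, 9, 11]]
print(starts)  # [0, 1, 1, 6, 6, 11]
```

With the same input list as in the answer, 1 and 5 now share a bucket, as do 7 and 9.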
