
Invalid use of group function; trying to find Pearson correlation


I'm trying to figure out how to calculate the Pearson correlation coefficient using SQL. Here is the formula I'm using:
[image: Pearson correlation formula]
And this is the table I'm using:
[image: the connections table]
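
The formula image did not survive. Assuming it is the standard sample Pearson coefficient (which is what the first answer's query computes), it reads as follows, with x_i = first_score, y_i = second_score, and n the number of rows:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```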

Here is my query so far, but it gives me this error: Invalid use of group function

select first_id, second_id, movie_id, first_score, second_score,  count(*) as n, 
sum((first_score-avg(first_score))*(second_score-avg(second_score)))/
(
sqrt(sum(first_score-avg(first_score)))*
sqrt(sum(second_score-avg(second_score))))
as pearson
from connections
group by second_id

Thanks for your help.

2 Answers

  • 2

    Here is a query that does the calculation in the formula:

    select sum((first_score - avg_first_score)*(second_score - avg_second_score)) /
           (sqrt(sum(pow((first_score - avg_first_score), 2)))*
            sqrt(sum(pow((second_score - avg_second_score), 2)))
           ) as r      
    from connections c cross join
         (select avg(first_score) as avg_first_score, avg(second_score) as avg_second_score
          from connections
         ) const;
    

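    Your attempt has several problems. Among them: it nests avg() inside sum(), which MySQL rejects with "Invalid use of group function", and its denominator sums the raw deviations instead of their squares. This query precalculates the averages of both scores in a subquery, so no aggregate is nested, and then applies the formula almost as written.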

  • 0

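    From a purely syntactic point of view, there is a problem with your group by clause. It should list every non-aggregated column for it to work. It should be: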

    group by first_id, second_id, movie_id, first_score, second_score
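
The first answer returns a single coefficient for the whole table, and fixing the GROUP BY clause alone does not clear the error, because avg() is still nested inside sum(). If the goal is one coefficient per user pair, as the question's GROUP BY suggests, one option is to compute the per-pair averages in a derived table and join them back. This is a sketch under stated assumptions: a pair is identified by (first_id, second_id), the rows in the WITH clause are made-up sample data standing in for the real connections table (WITH needs MySQL 8.0+), and NULLIF is used so that a pair whose scores are all equal yields NULL instead of a division by zero.

```sql
-- Hypothetical sample rows standing in for the real connections table;
-- drop this WITH clause to run the query against the actual table.
WITH connections (first_id, second_id, movie_id, first_score, second_score) AS (
    SELECT 1, 2, 10, 1, 2 UNION ALL
    SELECT 1, 2, 11, 2, 4 UNION ALL
    SELECT 1, 2, 12, 3, 6 UNION ALL
    SELECT 1, 3, 10, 1, 3 UNION ALL
    SELECT 1, 3, 11, 2, 2 UNION ALL
    SELECT 1, 3, 12, 3, 1
)
SELECT c.first_id,
       c.second_id,
       COUNT(*) AS n,
       -- numerator: sum of products of deviations from the pair's own means;
       -- denominator: SQRT(a * b), equal to SQRT(a) * SQRT(b) for non-negative sums;
       -- NULLIF turns a zero denominator into NULL
       SUM((c.first_score - a.avg_first) * (c.second_score - a.avg_second))
         / NULLIF(SQRT(SUM((c.first_score - a.avg_first) * (c.first_score - a.avg_first))
                     * SUM((c.second_score - a.avg_second) * (c.second_score - a.avg_second))), 0)
         AS pearson
FROM connections AS c
JOIN (
    -- per-pair means, computed at their own query level so no aggregate is nested
    SELECT first_id, second_id,
           AVG(first_score)  AS avg_first,
           AVG(second_score) AS avg_second
    FROM connections
    GROUP BY first_id, second_id
) AS a
  ON a.first_id = c.first_id
 AND a.second_id = c.second_id
GROUP BY c.first_id, c.second_id
ORDER BY c.first_id, c.second_id;
```

With the sample rows, pair (1, 2) rises in lockstep and comes out at 1, while pair (1, 3) moves in opposite directions and comes out at -1. An alternative that needs no join is the one-pass form (n*Σxy - Σx*Σy) / sqrt((n*Σx² - (Σx)²) * (n*Σy² - (Σy)²)), which works with a plain GROUP BY first_id, second_id; it can lose precision when the values are large relative to their spread, which should not matter much for small integer ratings.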
    
