首页 文章

使用dplyr的困难时间 - n_distinct或spread - 使用子组

提问于
浏览
1

我想这个问题是重复的,但我找不到任何有效的答案,以简单而优雅的方式使用dplyr在group_by之后添加子组计数 . 如果此问题重复,请删除 . 如果你想要一个代码重现,我会这样做 . 请不要点击“否定” .

我曾尝试使用 spread ,但它没有用,之后,我试图按照说明here,一旦它有助于在数据框中按组计数唯一,但它不起作用 . 同样的解决方案是here,但输出很奇怪 .

我有什么:
enter image description here

我真正想要的(使用简单的代码......我想dplyr可以处理它而不必使用gather()),是为每个因子级别插入三个新列 .

enter image description here

我的代码:

descritivos %>% 
  group_by(sexo) %>% 
  summarise(n=n(),Idade_media = mean(idade, na.rm=T),
  idade_sd=sd(idade, na.rm=T),
              qtde_sexo = n(),
  Proporção_sexo = n()/nrow(.),
  Pontuação_media=mean(total),
  pontuacao_sd=sd(total), n_unique = n_distinct(Escolaridade))

有了这段代码,我几乎就在那里,但它复制了一些输出 .

descritivos %>% 
  group_by(sexo, Escolaridade) %>% 
  summarise(n=n(),Idade_media = mean(idade, na.rm=T),
  idade_sd=sd(idade, na.rm=T),
              qtde_sexo = n(),
  Proporção_sexo = n()/nrow(.),
  Pontuação_media=mean(total),
  pontuacao_sd=sd(total), n_unique = n_distinct(Escolaridade)) %>% spread(Escolaridade, n)
  spread(count(Escolaridade), n, fill=0)

enter image description here

这是一个可重现的代码:

library(tidyverse)
ds <- data.frame(sex=c(0,1), schooling=c("k12","high","college","university"), age=rnorm(mean=20,sd=2, n=40))

ds %>% group_by(sex, schooling) %>% 
  summarise(mean(age), n=n()) %>% spread(schooling, n)


ds %>% group_by(sex, schooling) %>% 
  summarise(n()) %>% t()

所需的输出:
enter image description here
非常感谢

最后编辑:

感谢@Akrun,我解决了我的问题 . 如果您有相同的,请遵循以下代码:

descritivos %>% 
  group_by(sexo) %>%
  group_by(Escolaridade, 
  Idade_media = mean(idade, na.rm=T),
  idade_sd=sd(idade, na.rm=T),
              qtde_sexo = n(),
  Proporção_sexo = n()/nrow(.),
  Pontuação_media=mean(total),
  pontuacao_sd=sd(total), add=TRUE) %>%
  summarise(n=n()) %>% 
  spread(Escolaridade, n)

或者此代码为可重现的代码:

ds %>% group_by(sex) %>% 
  group_by(schooling = paste0("school", schooling), Mean = mean(age), 
           ndist = n_distinct(schooling), add = TRUE) %>% summarise(n = n()) %>% 
  spread(schooling, n)

1 回答

  • 1

    我们可以在一个链中做到这一点

    ds %>% 
      group_by(sex) %>%
      group_by(schooling = paste0("school", schooling), Mean = mean(age), 
                    ndist = n_distinct(schooling), add = TRUE) %>% 
      summarise(n = n()) %>% 
      spread(schooling, n)
    

相关问题