
Data transformation: from dyadic data to observation data in R


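I have a (directed) dyadic dataset that looks like this (see below). What I want now is to keep only one observation per year. So in this case there would be only one observation for 1992 (AFG 1992) and one for 1993 (AFG 1993), with the other observations dropped. It does not matter which observation within a given year I keep (I am not interested in country2).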

country1   country2    year    X   X1
Afghanistan Colombia    1992    1   0.44
Afghanistan Venezuela   1992    1   0.45
Afghanistan Peru        1992    1   0.46
Afghanistan Brazil      1992    1   0.47
Afghanistan Bolivia     1992    1   0.48
Afghanistan Chile       1992    1   0.49
Afghanistan Argentina   1992    1   0.50
Afghanistan Uruguay     1993    0   0.51
Afghanistan USA         1993    0   0.52
Afghanistan Canada      1993    0   0.53
Afghanistan UK          1993    0   0.54
Afghanistan Netherlands 1993    0   0.55
Afghanistan Belgium     1993    0   0.56
Afghanistan Luxembourg  1993    0   0.57
Afghanistan France      1993    0   0.58

My attempt:

newdata<- data %>% 
  group_by(country1,year) %>%
  summarise() %>%
  select(unique.x=country1, unique.y=year)

This works, but how do I keep all the other variables from "data" in "newdata"? I can't think of a way of doing this (ideally a practical one). Any help?

Desired outcome

country1     year   X
Afghanistan  1992   1
Afghanistan  1993   0

dput(data)

structure(list(
  country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = "Afghanistan", class = "factor"),
  country2 = structure(c(8L, 33L, 24L, 5L, 4L, 7L, 1L, 32L, 31L, 6L, 30L,
    21L, 3L, 19L, 14L, 29L, 27L, 26L, 15L, 25L, 2L, 17L,
    10L, 18L, 13L, 28L, 23L, 11L, 9L, 16L, 12L, 20L, 22L),
    .Label = c("Argentina", "Austria", "Belgium", "Bolivia, Plurinational State of",
      "Brazil", "Canada", "Chile", "Colombia", "Cuba", "Czech Republic",
      "Denmark", "Dominican Republic", "Finland", "France", "Germany",
      "Guinea-Bissau", "Hungary", "Italy", "Luxembourg", "Mauritania",
      "Netherlands", "Niger", "Norway", "Peru", "Poland", "Portugal",
      "Spain", "Sweden", "Switzerland", "United Kingdom", "United States",
      "Uruguay", "Venezuela, Bolivarian Republic of"), class = "factor"),
  year = c(1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L,
    1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L,
    1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L, 1994L,
    1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L),
  X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
  X1 = c(0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5,
    0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58,
    0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66,
    0.67, 0.68, 0.69, 7.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76)),
  .Names = c("country1", "country2", "year", "X", "X1"),
  class = "data.frame", row.names = c(NA, -33L))

3 Answers

  • 0

    You can try:

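    library(dplyr)

    data %>%
        group_by(year) %>%
        top_n(1, X1) %>%            # row(s) with the largest X1 per year; ties return several rows
        select(country1, year, X)   # year is a grouping column, so dplyr keeps it either way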
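
If every original column should survive, one option is `distinct()` with `.keep_all = TRUE`, which keeps the first row it meets for each key. The sketch below is only illustrative: `data` here is a small stand-in built from a few of the question's rows, not the full dataset.

```r
library(dplyr)

# Small stand-in for the question's data (a subset of its rows).
data <- data.frame(
  country1 = "Afghanistan",
  country2 = c("Colombia", "Venezuela", "Peru", "Uruguay", "USA"),
  year     = c(1992L, 1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 1L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.46, 0.51, 0.52)
)

# Keep the first row for each country1-year pair, along with all other columns.
newdata <- data %>%
  distinct(country1, year, .keep_all = TRUE)

print(newdata)
```

Because the first row per pair is kept, the result depends on row order; sort beforehand if a particular row is preferred.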
    
  • 0

    newdata <- olddata[!duplicated(olddata$year),]

    answers the question as asked (one row per year).

    newdata <- olddata[!duplicated(paste(olddata$country1, olddata$year)),]

    gives you what you want (one row per combination of country1 and year).
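
The difference between the two only shows up once there is more than one country1. A minimal base-R sketch, assuming a small stand-in `olddata` with one hypothetical extra row (Albania) that is not in the question's data; it also passes the key columns to `duplicated()` directly instead of pasting them into a string:

```r
# Stand-in data: a few rows from the question plus one HYPOTHETICAL Albania row.
olddata <- data.frame(
  country1 = c(rep("Afghanistan", 5), "Albania"),
  country2 = c("Colombia", "Venezuela", "Peru", "Uruguay", "USA", "Peru"),
  year     = c(1992L, 1992L, 1992L, 1993L, 1993L, 1992L),
  X        = c(1L, 1L, 1L, 0L, 0L, 1L),
  X1       = c(0.44, 0.45, 0.46, 0.51, 0.52, 0.40)
)

# One row per year: the Albania 1992 row is dropped as a repeat of 1992.
by_year <- olddata[!duplicated(olddata$year), ]

# One row per country1-year pair: duplicated() on a data frame compares
# whole rows, so only the key columns are passed in.
key_cols <- c("country1", "year")
by_pair <- olddata[!duplicated(olddata[key_cols]), ]

print(by_year)
print(by_pair)
```

Both results keep every column of `olddata`, since rows are only filtered, never summarised.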

  • 1

    I don't really understand your question, but to get your desired output you can use:

    data %>% 
      group_by(country1, year) %>%
      summarise(X = mean(X))
    

    When you apply this to the whole data.frame, keep in mind that this code returns the mean of all X values for each unique combination of country1 and year.
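
To see why the mean matters, consider a case where X is not constant within a year. The sketch below uses hypothetical values (the question's sample has a constant X per year) and base R's `aggregate()` as a rough counterpart of the `group_by()`/`summarise()` call:

```r
# HYPOTHETICAL data in which X varies within 1992 (unlike the question's sample).
data <- data.frame(
  country1 = "Afghanistan",
  year     = c(1992L, 1992L, 1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 0L, 0L, 0L, 0L)
)

# Mean of X for each country1-year combination.
means <- aggregate(X ~ country1 + year, data = data, FUN = mean)

print(means)
```

Here the 1992 group averages to 0.5, a value that appears in no original row, so this approach only reproduces an observed X when X is constant within each group.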
