Data transformation: from dyadic data to observation data in R

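I have a (directed) dyadic dataset that looks like this (see below). What I want to do now is keep only one observation per year. So in this case there would be only one observation for 1992 (AFG 1992) and one for 1993 (AFG 1993), with the other observations dropped. It does not matter which observation within the same year I keep in the data (I am not interested in country2).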

country1   country2    year    X   X1
Afghanistan Colombia    1992    1   0.44
Afghanistan Venezuela   1992    1   0.45
Afghanistan Peru        1992    1   0.46
Afghanistan Brazil      1992    1   0.47
Afghanistan Bolivia     1992    1   0.48
Afghanistan Chile       1992    1   0.49
Afghanistan Argentina   1992    1   0.50
Afghanistan Uruguay     1993    0   0.51
Afghanistan USA         1993    0   0.52
Afghanistan Canada      1993    0   0.53
Afghanistan UK          1993    0   0.54
Afghanistan Netherlands 1993    0   0.55
Afghanistan Belgium     1993    0   0.56
Afghanistan Luxembourg  1993    0   0.57
Afghanistan France      1993    0   0.58

My attempt:

newdata<- data %>% 
  group_by(country1,year) %>%
  summarise() %>%
  select(unique.x=country1, unique.y=year)

This works, but how do I keep all the other variables from "data" in "newdata" (which I think would be more practical)? I can't think of a way to do this. Any help?

Desired outcome

country1     year   X
Afghanistan  1992   1
Afghanistan  1993   0

dput(data)
structure(list(country1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = "Afghanistan", class = "factor"),
    country2 = structure(c(8L, 33L, 24L, 5L, 4L, 7L, 1L, 32L, 31L, 6L, 30L,
    21L, 3L, 19L, 14L, 29L, 27L, 26L, 15L, 25L, 2L, 17L, 10L, 18L,
    13L, 28L, 23L, 11L, 9L, 16L, 12L, 20L, 22L), .Label = c("Argentina",
    "Austria", "Belgium", "Bolivia, Plurinational State of", "Brazil",
    "Canada", "Chile", "Colombia", "Cuba", "Czech Republic", "Denmark",
    "Dominican Republic", "Finland", "France", "Germany", "Guinea-Bissau",
    "Hungary", "Italy", "Luxembourg", "Mauritania", "Netherlands", "Niger",
    "Norway", "Peru", "Poland", "Portugal", "Spain", "Sweden", "Switzerland",
    "United Kingdom", "United States", "Uruguay",
    "Venezuela, Bolivarian Republic of"), class = "factor"),
    year = c(1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1993L, 1993L,
    1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1994L, 1994L, 1994L, 1994L,
    1994L, 1994L, 1994L, 1994L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L,
    1995L, 1995L, 1995L, 1995L),
    X = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    X1 = c(0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54,
    0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66,
    0.67, 0.68, 0.69, 7.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76)),
    .Names = c("country1", "country2", "year", "X", "X1"),
    class = "data.frame", row.names = c(NA, -33L))

3 Answers

You could try:

data %>%
    group_by(year) %>%
    top_n(1) %>%
    select(country1, X)
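
A variant that groups by both country1 and year and keeps every column may be closer to what the question asks for. The sketch below is not part of the original answer: it assumes dplyr 1.0.0 or later (for `slice_head()`) and uses a small hypothetical data frame `toy` in place of the real data. `distinct()` with `.keep_all = TRUE` is shown as an alternative that should give the same rows.

```r
library(dplyr)

# Hypothetical toy data standing in for the real dyadic data set
toy <- data.frame(
  country1 = "Afghanistan",
  country2 = c("Colombia", "Venezuela", "Peru", "Uruguay", "Canada"),
  year     = c(1992L, 1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 1L, 1L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.46, 0.51, 0.53),
  stringsAsFactors = FALSE
)

# Keep the first row of every country1/year group, with all columns intact
one_per_year <- toy %>%
  group_by(country1, year) %>%
  slice_head(n = 1) %>%
  ungroup()

# Same idea without grouping: drop rows whose country1/year pair
# has already been seen, keeping the remaining columns
one_per_year2 <- toy %>%
  distinct(country1, year, .keep_all = TRUE)
```

Both versions keep whichever row comes first within each group, which is fine here because the question states that the choice of row does not matter.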

Answered 2024-05-06T13:17:51+08:00

0

newdata <- olddata[!duplicated(olddata$year),]

answers the question as literally asked (one row per year), while

newdata <- olddata[!duplicated(paste(olddata$country1, olddata$year)),]

gives you what you actually want (one row per combination of country1 and year).
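
`duplicated()` also accepts a data frame, so the key does not have to be built with `paste()`. The following is a minimal sketch rather than the answerer's code; `olddata` here is a hypothetical stand-in with a second, made-up country1 value so that the difference between the two approaches is visible.

```r
# Hypothetical stand-in for the real data
olddata <- data.frame(
  country1 = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania", "Albania"),
  country2 = c("Colombia", "Peru", "Uruguay", "Colombia", "Peru"),
  year     = c(1992L, 1992L, 1993L, 1992L, 1992L),
  X        = c(1L, 1L, 0L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Deduplicating on year alone drops every Albania row,
# because 1992 was already seen for Afghanistan
by_year_only <- olddata[!duplicated(olddata$year), ]

# duplicated() on a two-column data frame flags rows whose country1/year
# combination has already appeared; negating it keeps the first of each pair
newdata <- olddata[!duplicated(olddata[, c("country1", "year")]), ]
```

All other columns are carried along unchanged because only rows are subset.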

Answered 2024-05-06T13:17:51+08:00
1
I don't really understand your question, but to get your desired output you can use:
```
data %>% 
  group_by(country1, year) %>%
  summarise(X = mean(X))
```
When you apply this to the whole data.frame, keep in mind that for each unique combination of country1 and year, this code returns the mean of all values of X.
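
If X can vary within a country1/year group, the mean is no longer one of the observed values. One possible alternative, sketched here under the assumption that dplyr 1.0.0 or later is installed, is to take the first value of each variable instead. The data frame `toy` is hypothetical and deliberately has X differing within 1992.

```r
library(dplyr)

# Hypothetical toy data in which X is not constant within 1992
toy <- data.frame(
  country1 = "Afghanistan",
  year     = c(1992L, 1992L, 1993L, 1993L),
  X        = c(1L, 0L, 0L, 0L),
  X1       = c(0.44, 0.45, 0.51, 0.52)
)

# Mean of X per country1/year: 0.5 for 1992, which is not an observed value
by_mean <- toy %>%
  group_by(country1, year) %>%
  summarise(X = mean(X), .groups = "drop")

# First observed value of each variable per country1/year
by_first <- toy %>%
  group_by(country1, year) %>%
  summarise(X = first(X), X1 = first(X1), .groups = "drop")
```

With `first()`, every variable that should survive has to be listed explicitly in `summarise()`.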
Answered 2024-05-06T13:17:51+08:00
