
Variable creation for an unbalanced dataset

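I have a dataset of observations with the columns ID, year, event_type and event_date. The number of observations per ID-year is unbalanced. Specifically, these are the outcomes of battles in conflict years. Each battle has a date and a type (its outcome).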

What I want to do is create a variable based on the number of events of a certain type within each ID-year subset. So:

by ID

by year

sum of event_type == x

I understand how to do this with a regular for loop, but I gather I should be using tapply(), since each ID has a different number of observations?

2 Answers

  • 2

    If I understand the question correctly, then:

    aggregate(event_type ~ ID + year, subset(df,event_type=="x"), length)
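
    To see what that call returns, here is a minimal sketch on a small hand-made data frame. The name `battles`, its values, and the renamed column `n_win` are assumptions made up for illustration; they are not from the question. Note that ID-year combinations with no matching event are absent from the result rather than reported as zero.

    ```r
    # Hypothetical toy data: two IDs with an unbalanced number of battles per ID-year.
    battles <- data.frame(
      ID         = c(1, 1, 1, 2, 2),
      year       = c(1900, 1900, 1901, 1900, 1900),
      event_type = c("win", "win", "lose", "win", "lose"),
      stringsAsFactors = FALSE
    )

    # Keep only the "win" rows, then count the rows left in each ID-year group.
    wins <- aggregate(event_type ~ ID + year,
                      data = subset(battles, event_type == "win"),
                      FUN = length)

    # Rename the count column (illustrative name).
    names(wins)[names(wins) == "event_type"] <- "n_win"
    wins
    ```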
    
  • 2
    library(plyr)
    df <- data.frame(ID = sample(11:20, 25, replace = TRUE),
                     year = sample(1900:1905, 25, replace = TRUE),
                     event_type = sample(c("win", "lose"), 25, replace = TRUE))
    
    # View the sample data sorted by ID and year (the draw is random, so your values will differ).
    arrange(df,ID,year)
      ID year event_type
    1  11 1901        win
    2  11 1904        win
    3  11 1910       lose
    4  12 1920       lose
    5  13 1900        win
    6  13 1905        win
    7  13 1906       lose
    8  13 1912        win
    9  13 1920       lose
    10 14 1906        win
    11 14 1918       lose
    12 14 1920        win
    13 15 1909        win
    14 15 1919        win
    15 16 1916        win
    16 16 1920       lose
    17 18 1901       lose
    18 18 1910       lose
    19 18 1912       lose
    20 18 1920        win
    21 19 1916        win
    22 19 1916        win
    23 19 1917       lose
    24 20 1901       lose
    25 20 1914       lose
    
    
    
    result <- ddply(df, .(ID, year, event_type), summarise, event_count = length(event_type))

    > result
       ID year event_type event_count
    1  11 1903        win           1
    2  11 1905       lose           1
    3  12 1903       lose           1
    4  12 1905        win           1
    5  13 1902        win           1
    6  13 1905       lose           1
    7  14 1903        win           1
    8  15 1901        win           2
    9  15 1903       lose           1
    10 15 1905        win           1
    11 16 1904        win           1
    12 17 1904       lose           1
    13 18 1900       lose           2
    14 18 1900        win           1
    15 18 1902       lose           1
    16 18 1904        win           1
    17 18 1905        win           1
    18 19 1901       lose           1
    19 19 1902        win           1
    20 19 1903       lose           1
    21 19 1903        win           1
    22 20 1901        win           1
    23 20 1904        win           1
    

    Say you only want to count wins and not losses; then something like:

    result <- ddply(subset(df,event_type=="win"), .(ID,year,event_type),summarise, event_count=length(event_type))
    
    > result
       ID year event_type event_count
    1  11 1903        win           1
    2  12 1905        win           1
    3  13 1902        win           1
    4  14 1903        win           1
    5  15 1901        win           2
    6  15 1905        win           1
    7  16 1904        win           1
    8  18 1900        win           1
    9  18 1904        win           1
    10 18 1905        win           1
    11 19 1902        win           1
    12 19 1903        win           1
    13 20 1901        win           1
    14 20 1904        win           1
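
Both answers return one row per group. If the goal is a new column on the original data, and since the question mentions tapply(), a base R sketch along the following lines may help. The data frame `battles` and the column name `n_win` are hypothetical, chosen only for illustration. ave() returns a vector aligned with the original rows, so ID-years without any win get 0. tapply() returns an ID-by-year table, with NA for ID-year combinations that never occur in the data.

```r
# Hypothetical toy data: two IDs with an unbalanced number of battles per ID-year.
battles <- data.frame(
  ID         = c(1, 1, 1, 2, 2),
  year       = c(1900, 1900, 1901, 1900, 1900),
  event_type = c("win", "win", "lose", "win", "lose"),
  stringsAsFactors = FALSE
)

# 1 for a win, 0 otherwise; summing this within a group counts the wins.
is_win <- as.integer(battles$event_type == "win")

# ave() applies sum() within each ID-year group and repeats the group total
# on every row of that group, so it can be stored directly as a column.
battles$n_win <- ave(is_win, battles$ID, battles$year, FUN = sum)

# tapply() gives the same counts as an ID x year table instead.
win_table <- tapply(is_win, list(battles$ID, battles$year), sum)

battles
win_table
```

Unlike the subset-then-count approach, this keeps every original row, which suits the case where the count is meant to sit alongside each observation.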
    
