R: How do I create new columns in a data frame using "countif" values from another data frame?

I have a data frame (df1) that looks like the one below. It shows the years in which a company was active in a particular market.

Company  Country  Year
A        Austria  2010
A        Germany  2010
A        Austria  2011
B        Italy    2010

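I also have a second data frame (df2), shown below. It lists every investment a company made in a given country and year, with the investment type encoded as dummy variables.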

Company  Country  Year  JointVenture  M&A  Greenfield
A        Austria  2010  1             0    0
A        Austria  2010  0             1    0
A        Austria  2010  1             0    0
...

My question is this: I want to add new columns to df1 containing a "countif" for each investment type shown in df2. For example, the new df1:

Company  Country  Year  Count.JointVenture  Count.M&A  Count.Greenfield
A        Austria  2010  2                   1          0
A        Germany  2010  ...........
A        Austria  2011
B        Italy    2010

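In addition, how can I add further columns to df1 that turn these counts into dummy variables (1 if the count is > 0, 0 if it is 0)?

Thanks, and sorry for such a basic question, but I could not find a suitable solution in the existing threads.

Cheers, Martin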

3 Answers

  • 1

    Use the aggregate() and ifelse() functions.

    # test data
    df <- data.frame(Company = rep("A", 3), 
                     Country = rep("Austria", 3), 
                     Year = rep(2010, 3), 
                     JointVenture = c(1,0,1), 
                     MnA = c(0,1,0), 
                     Greenfield = rep(0,3))
    # this is the new df
    counts <- aggregate(cbind(JointVenture, MnA, Greenfield)~Country+Company+Year, data = df, FUN = sum)
    
    # dummy
    counts$dummyJointVenture <- ifelse(counts$JointVenture > 0, 1, 0)
    counts$dummyMnA <- ifelse(counts$MnA > 0, 1, 0)
    counts$dummyGreenfield <- ifelse(counts$Greenfield > 0, 1, 0)
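
    The snippet above builds the counts but does not attach them to df1. A minimal sketch of that last step follows, assuming the "M&A" column is renamed MnA as in this answer and that a company/country/year with no rows in df2 should count as 0; the names types, count_cols and result are illustrative only.

    ```r
    # df1: company/market/year combinations from the question
    df1 <- data.frame(Company = c("A", "A", "A", "B"),
                      Country = c("Austria", "Germany", "Austria", "Italy"),
                      Year    = c(2010, 2010, 2011, 2010))
    # df2: one row per investment; "M&A" renamed to MnA (assumption, as in the answer)
    df2 <- data.frame(Company = rep("A", 3), Country = rep("Austria", 3),
                      Year = rep(2010, 3), JointVenture = c(1, 0, 1),
                      MnA = c(0, 1, 0), Greenfield = rep(0, 3))

    types <- c("JointVenture", "MnA", "Greenfield")
    count_cols <- paste0("Count.", types)

    # sum the 0/1 indicators per group: this is the "countif"
    counts <- aggregate(cbind(JointVenture, MnA, Greenfield) ~ Company + Country + Year,
                        data = df2, FUN = sum)
    names(counts)[match(types, names(counts))] <- count_cols

    # left join: keep every row of df1, even those without any investment
    result <- merge(df1, counts, by = c("Company", "Country", "Year"), all.x = TRUE)

    # unmatched rows come back as NA; treat them as zero investments (assumption)
    result[count_cols][is.na(result[count_cols])] <- 0

    # dummy columns: 1 if the count is > 0, otherwise 0
    result[paste0("Dummy.", types)] <- lapply(result[count_cols],
                                              function(x) as.integer(x > 0))
    ```

    Note that merge() sorts the result by the key columns, so the row order may differ from df1.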
    
  • 1

    Throwing my data.table attempt into the ring:

    library(data.table)  # provides fread(), setkey() and :=

    df <- fread("Company  Country  Year
                 A        Austria  2010
                 A        Germany  2010
                 A        Austria  2011
                 B        Italy    2010")
    
    df2 <- fread("Company  Country  Year  JointVenture  M&A  Greenfield
                  A        Austria  2010  1             0    0
                  A        Austria  2010  0             1    0
                  A        Austria  2010  1             0    0")
    
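    setkey(df2, Company, Country, Year)
    # overwrite each indicator with its group total, then drop the duplicate rows
    df2[,c("JointVenture", "M&A", "Greenfield") := .(sum(JointVenture), sum(`M&A`), sum(Greenfield)), by=.(Company, Country, Year)]
    # suffixes only applies to clashing non-key columns, so it does not rename anything here
    merge(x=df, y=unique(df2), by=c("Company", "Country", "Year"), all.x=TRUE, all.y=FALSE, suffixes = c("", "Count."))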
    

    The result:

       Company Country Year JointVenture M&A Greenfield
    1:       A Austria 2010            2   1          0
    2:       A Austria 2011           NA  NA         NA
    3:       A Germany 2010           NA  NA         NA
    4:       B   Italy 2010           NA  NA         NA
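
    The output above keeps the original column names and leaves NA where df2 has no matching rows. A possible variant that produces the Count. prefix, zero-fills the gaps and adds the dummies is sketched below; treating NA as 0 is an assumption about what the asker wants, and the names types, count_cols and res are illustrative only.

    ```r
    library(data.table)

    df1 <- data.table(Company = c("A", "A", "A", "B"),
                      Country = c("Austria", "Germany", "Austria", "Italy"),
                      Year    = c(2010L, 2010L, 2011L, 2010L))
    df2 <- data.table(Company = "A", Country = "Austria", Year = 2010L,
                      JointVenture = c(1L, 0L, 1L),
                      `M&A`        = c(0L, 1L, 0L),
                      Greenfield   = 0L)

    types <- c("JointVenture", "M&A", "Greenfield")
    count_cols <- paste0("Count.", types)

    # one row per company/country/year with the summed indicators
    counts <- df2[, lapply(.SD, sum), by = .(Company, Country, Year), .SDcols = types]
    setnames(counts, types, count_cols)

    # left join onto df1 (the result is sorted by the key columns)
    res <- merge(df1, counts, by = c("Company", "Country", "Year"), all.x = TRUE)

    # replace NA with 0 in place, column by column
    for (col in count_cols) {
      set(res, i = which(is.na(res[[col]])), j = col, value = 0L)
    }

    # dummy columns: 1 if the count is > 0, otherwise 0
    res[, paste0("Dummy.", types) := lapply(.SD, function(x) as.integer(x > 0)),
        .SDcols = count_cols]
    ```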
    
  • 2

    Using dplyr::summarise_each and full_join with Martin's data.

    library(data.table)  # provides fread()

    df <- fread("Company  Country  Year
                 A        Austria  2010
                 A        Germany  2010
                 A        Austria  2011
                 B        Italy    2010")
    
    df2 <- fread("Company  Country  Year  JointVenture  MA  Greenfield
                  A        Austria  2010  1             0    0
                  A        Austria  2010  0             1    0
                  A        Austria  2010  1             0    0")
    library(dplyr)
    # note: summarise_each() and funs() are deprecated in current dplyr releases
    df2 %>% 
      group_by(Company, Country, Year) %>%
      summarise_each(funs(sum), JointVenture:Greenfield) %>%
      full_join(df, by = c("Company", "Country", "Year")) -> df
    

    Edit: following input from @zacdav, replaced summarise with summarise_each and merge with full_join to stay within dplyr.
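
    Since summarise_each() and funs() are deprecated in newer dplyr releases, the same idea can be sketched with across(). This assumes dplyr 1.0.0 or later, the MA column name used in this answer, and that missing combinations should count as 0; counts and result are illustrative names.

    ```r
    library(dplyr)

    df1 <- data.frame(Company = c("A", "A", "A", "B"),
                      Country = c("Austria", "Germany", "Austria", "Italy"),
                      Year    = c(2010, 2010, 2011, 2010))
    df2 <- data.frame(Company = rep("A", 3), Country = rep("Austria", 3),
                      Year = rep(2010, 3), JointVenture = c(1, 0, 1),
                      MA = c(0, 1, 0), Greenfield = rep(0, 3))

    # sum each investment indicator per group and prefix the new columns
    counts <- df2 %>%
      group_by(Company, Country, Year) %>%
      summarise(across(JointVenture:Greenfield, sum, .names = "Count.{.col}"),
                .groups = "drop")

    result <- df1 %>%
      # left_join keeps every row of df1 in its original order
      left_join(counts, by = c("Company", "Country", "Year")) %>%
      # no matching investments: NA becomes 0 (assumption)
      mutate(across(starts_with("Count."), ~ coalesce(.x, 0))) %>%
      # dummies are named Dummy.Count.<type>: 1 if the count is > 0, otherwise 0
      mutate(across(starts_with("Count."), ~ as.integer(.x > 0),
                    .names = "Dummy.{.col}"))
    ```

    left_join is used here instead of full_join so that only the rows of df1 are kept, which matches the output the question asks for.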
