
Merge based on matching characters in R


For two example data frames:

df1 <- structure(list(name = c("Katie", "Eve", "James", "Alexander", 
"Mary", "Barrie", "Harry", "Sam"), postcode = c("CB12FR", "CB12FR", 
"NE34TR", "DH34RL", "PE46YH", "IL57DS", "IP43WR", "IL45TR")), .Names = c("name", 
"postcode"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-8L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("name", "postcode")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

df2 <- structure(list(name = c("Katie", "James", "Alexander", "Lucie", 
"Mary", "Barrie", "Claire", "Harry", "Clare", "Hannah", "Rob", 
"Eve", "Sarah"), postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", 
"PE46YH", "IL57DS", "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", 
"CB12FR", "IL45TR"), rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 
2L, 3L, 1L, 4L, 2L)), .Names = c("name", "postcode", "rating"
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-13L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character", 
"collector")), postcode = structure(list(), class = c("collector_character", 
"collector")), rating = structure(list(), class = c("collector_integer", 
"collector"))), .Names = c("name", "postcode", "rating")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

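I want to add an extra column to df1 that gives the rating from df2. Each postcode may have multiple ratings (which is why a straightforward merge does not work).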

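I only want to merge the two data frames where the postcode and the first 3 characters of the name are the same (provided these are unique in df1). For example, if there were a Katherine and a Katie with the same postcode, these would not be merged.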

I am happy to have blanks where nothing was merged.

Any ideas?

1 Answer

  • 2

    Wouldn't a simple join on multiple columns solve your problem? Something like:

    # Left join: keep every row of df1; rating is NA where df2 has no match
    df <- merge(x = df1, y = df2, by = c("name", "postcode"), all.x = TRUE)
    

    If the column names do not match, an alternative solution:

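    # Build a composite key from name and postcode in both data frames
    df1$key <- paste(df1$name, df1$postcode, sep = "_")
    df2$key <- paste(df2$name, df2$postcode, sep = "_")
    
    # Left join on the key; shared columns come back as name.x, name.y, etc.
    df <- merge(x = df1, y = df2, by = "key", all.x = TRUE)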
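
Both merges above require the full name to match, whereas the question asks for a match on the postcode plus the first 3 characters of the name. A minimal base R sketch of that variant might look like the following. The helper names `make_key` and `is_unique` are hypothetical, the data frames are rebuilt as plain data frames with the question's values, and skipping keys that are duplicated in df2 is an added assumption (the question only asks for uniqueness in df1).

```r
# The question's data, rebuilt as plain data frames with the same values.
df1 <- data.frame(
  name = c("Katie", "Eve", "James", "Alexander", "Mary", "Barrie", "Harry", "Sam"),
  postcode = c("CB12FR", "CB12FR", "NE34TR", "DH34RL", "PE46YH", "IL57DS",
               "IP43WR", "IL45TR"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("Katie", "James", "Alexander", "Lucie", "Mary", "Barrie", "Claire",
           "Harry", "Clare", "Hannah", "Rob", "Eve", "Sarah"),
  postcode = c("CB12FR", "NE34TR", "DH34RL", "DL56TH", "PE46YH", "IL57DS",
               "RE35TP", "IP43WQ", "BH35OP", "CB12FR", "DL56TH", "CB12FR",
               "IL45TR"),
  rating = c(1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 3L, 1L, 4L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: key = first 3 characters of the name + postcode.
make_key <- function(d) paste(substr(d$name, 1, 3), d$postcode, sep = "_")

# Hypothetical helper: TRUE for values that occur exactly once in k.
is_unique <- function(k) !(duplicated(k) | duplicated(k, fromLast = TRUE))

key1 <- make_key(df1)
key2 <- make_key(df2)

# A df1 row is usable only if its key is unique in df1 and matches a key
# that is unique in df2 (the df2 condition is an assumption, see above).
ok <- is_unique(key1) & key1 %in% key2[is_unique(key2)]

# match() gives the row of df2 for each df1 key, or NA when there is none.
rating <- df2$rating[match(key1, key2)]
rating[!ok] <- NA

result <- df1
result$rating <- rating
print(result)
```

With this data, Harry and Sam are left blank: Harry's postcode differs between the two data frames, and "Sam" and "Sarah" do not share their first 3 characters.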
    
