
Character replacement with gsub does not work inside a function

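I am trying to replace some unwanted characters in a data frame in R. According to "Replace multiple arguments with gsub", the gsub function should work fine in this situation, so I tried it this way.

The values in the first column of my data frame are as follows: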

La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The code is as follows:

callChangeCharacters <- function(results){
for(i in 1:nrow(results)){
    race <- results[i,1]
    race <- gsub("é","e",race)
    race <- gsub("â","a",race)
    race <- gsub("ó","o",race)
    race <- gsub("ž","z",race)
    race <- gsub("ú","u",race)
    race <- gsub("ø","o",race)
    race <- gsub("Å›","s",race)
    race <- gsub("Å‚","l",race)
    race <- gsub("ä‚","a",race)
    race <- gsub("è","e",race)
    race <- gsub("Ã","a",race)
    race <- gsub("Å","s",race)
    race <- gsub("Ä","c",race)
    race <- gsub("´","'",race)
    results[i,1] <- race
}
return(results)
}

If I run the code from inside the for loop directly, I get the expected result:

La Fleche Wallonne
Liege - Bastogne - Liege
Tour de Romandie
Giro d'Italia
Criterium du Dauphine

However, if I call the function, the result is different and the unwanted characters are not corrected:

> correctedDF <- callChangeCharacters(results)
> correctedDF
                                        V1
La Flèche Wallonne
Liège - Bastogne - Liège
Tour de Romandie
Giro d´Italia
Critérium du Dauphiné

The dput output of my results is shown below (this version of results is longer, but the problem is the same):

> dput(results)
structure(list(V1 = c("Santos Tour Down Under", "Paris - Nice", 
"Tirreno-Adriatico", "Milano-Sanremo", "Volta Ciclista a Catalunya", 
"E3 Prijs Vlaanderen - Harelbeke", "Gent - Wevelgem", "Ronde van Vlaanderen / Tour des Flandres", 
"Vuelta Ciclista al Pais Vasco", "Paris - Roubaix", "Amstel Gold Race", 
"La Flèche Wallonne", "Liège - Bastogne - Liège", "Tour de Romandie", 
"Giro d´Italia", "Critérium du Dauphiné", "Tour de Suisse", 
"Tour de France", "Tour de Pologne", NA, "Clasica Ciclista San Sebastian", 
"Eneco Tour", "Vuelta a España", "Vattenfall Cyclassics", "GP Ouest France - Plouay", 
"Grand Prix Cycliste de Québec", "Grand Prix Cycliste de Montréal", 
"Il Lombardia", "Tour of Beijing")), .Names = "V1", row.names = c(1L, 
1686L, 4601L, 6743L, 6943L, 9274L, 9473L, 9673L, 9880L, 11581L, 
11779L, 11978L, 12168L, 12367L, 14264L, 21957L, 24734L, 27727L, 
35542L, 37354L, 37470L, 37627L, 39885L, 47277L, 47441L, 47624L, 
47788L, 47952L, 48147L), class = "data.frame")

Any idea why it does not work inside the function?

Thanks in advance.

2 Answers

  • 0

    I had a similar problem because I loaded my code with the source function without setting its encoding argument to "utf-8".

    source("./code.R")
    

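    When I inspected the function that had been read in, I realized that source had altered some of the special characters, so the function did not work as expected. The solution was to set the encoding argument to "utf-8".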

    source("./code.R", encoding="utf-8")
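
A related way to sidestep the problem is to write the patterns as `\u` Unicode escapes, so that the script contains only ASCII and no longer depends on how `source` decodes the file. This is only a sketch: the helper name `strip_accents_escaped` and the sample vector are made up for illustration, only three of the question's replacements are shown, and the escapes assume the unwanted sequences are exactly the two-character ones that appear in the question.

```r
# Hypothetical helper (not from the question): every pattern is spelled
# with \u escapes, so this file is pure ASCII.
strip_accents_escaped <- function(x) {
  # U+00C3 U+00A9: the sequence the question replaces with "e" (acute)
  x <- gsub("\u00c3\u00a9", "e", x, fixed = TRUE)
  # U+00C3 U+00A8: the sequence the question replaces with "e" (grave)
  x <- gsub("\u00c3\u00a8", "e", x, fixed = TRUE)
  # U+00C2 U+00B4: the sequence the question replaces with an apostrophe
  x <- gsub("\u00c2\u00b4", "'", x, fixed = TRUE)
  x
}

# Sample values modelled on the question's data, also written with escapes
races <- c("La Fl\u00c3\u00a8che Wallonne",
           "Giro d\u00c2\u00b4Italia",
           "Crit\u00c3\u00a9rium du Dauphin\u00c3\u00a9")

cleaned <- strip_accents_escaped(races)
print(cleaned)
```

`fixed = TRUE` treats each pattern as a literal string rather than a regular expression, which is all that is needed here.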
    
  • 2

    Your code works. In addition, you should also replace ñ (see "Vuelta a España").

    The gsub function is vectorized, so you do not need a loop at all.

    cleanup <- function(race) {
        race <- gsub("é","e",race)
        race <- gsub("â","a",race)
        race <- gsub("ó","o",race)
        race <- gsub("ž","z",race)
        race <- gsub("ú","u",race)
        race <- gsub("ø","o",race)
        race <- gsub("Å›","s",race)
        race <- gsub("Å‚","l",race)
        race <- gsub("ä‚","a",race)
        race <- gsub("è","e",race)
        race <- gsub("Ã","a",race)
        race <- gsub("Å","s",race)
        race <- gsub("Ä","c",race)
        race <- gsub("´","'",race)
        return(race)
    }
    
    results$V1 <- cleanup(results$V1)
    

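    If there is only one column, why use a data.frame? Keeping a plain vector race would be more convenient.

    If you really want a function that works directly on results, there is still no need for a loop.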

    callChangeCharacters <- function(results) {
        results[,1] <- cleanup(results[,1])
        return(results)
    }
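
If the list of replacements keeps growing, one option is to drive the same vectorized gsub calls from a pair of lookup vectors instead of repeating the call by hand. The following is a minimal sketch, not part of the answer above: `from`, `to`, `clean_races`, and the small sample data frame are hypothetical, and the patterns are written as `\u` escapes for the two-character sequences seen in the question's data.

```r
# Hypothetical lookup vectors: from[i] is replaced by to[i], in this order.
from <- c("\u00c3\u00a9",  # sequence shown for e-acute in the question
          "\u00c3\u00a8",  # sequence shown for e-grave
          "\u00c3\u00b1",  # sequence shown in "Vuelta a Espa..a"
          "\u00c2\u00b4")  # sequence shown in "Giro d..Italia"
to   <- c("e", "e", "n", "'")

# Loops over the (short) lookup vectors, not over the rows of the data;
# each gsub call still processes the whole character vector at once.
clean_races <- function(x, from, to) {
  stopifnot(length(from) == length(to))
  for (i in seq_along(from)) {
    x <- gsub(from[i], to[i], x, fixed = TRUE)
  }
  x
}

# Small made-up data frame in the shape of the question's results
results <- data.frame(
  V1 = c("Vuelta a Espa\u00c3\u00b1a",
         "Li\u00c3\u00a8ge - Bastogne - Li\u00c3\u00a8ge",
         NA),
  stringsAsFactors = FALSE
)

results$V1 <- clean_races(results$V1, from, to)
print(results$V1)
```

Order matters with this approach: longer sequences should come before any shorter pattern they contain, which is why the original code replaces the two-character sequences before the lone "Ã".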
    
