
Error in R when subsetting a data frame and then using sapply

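I am trying to run a regression (lm) on groups of data (counties) in a data frame. However, I first want to filter that data frame (dat) to exclude groups that have too few data points. As long as I don't subset the data frame first, everything works: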

tmp1 <- with(dat, 
    by(dat, County,
        function(x) lm(formula = Y ~ A + B + C, data=x)))
sapply(tmp1, function(x) summary(x)$adj.r.squared)

I get back what I expect:

 Barrow  Carroll Cherokee  Clayton     Cobb   Dekalb  Douglas
0.00000      NaN  0.61952  0.69591  0.48092  0.61292  0.39335

However, when I subset the data frame first:

dat.counties <- aggregate(dat[,"County"], by=list(County), FUN=length)
good.counties <- as.matrix(subset(dat.counties, x > 20, select=Group.1))
dat.temp <- dat["County" %in% good.counties,]

and then run the same code:

tmp2 <- with(dat, 
by(dat, County,
    function(x) lm(formula = Y ~ A + B + C, data=x)))
sapply(tmp2, function(x) summary(x)$adj.r.squared)

I get the following error: "$ operator is invalid for atomic vectors". If I then run summary(tmp2), I see the following:

         Length Class  Mode
Barrow    0     -none- NULL
Carroll   0     -none- NULL
Cherokee 12     lm     list
Clayton  12     lm     list

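sapply is evidently choking on the objects of Class -none-. But those are the ones I excluded above! How can they still show up in my new data frame?!

Thanks for any insight.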

1 Answer

  • 1

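    Some parts of the code are unclear. You may have used attach on the dataset. There is also the issue, noted in @BrodieG's comment, of using the wrong object, dat, instead of dat.temp. As for the error, it is probably because the column County is a factor and its unused levels were not dropped after subsetting. You could try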

    dat.temp1 <- droplevels(dat.temp)
    tmp2 <- with(dat.temp1, 
          by(dat.temp1, County,
          function(x) lm(formula = Y ~ A + B + C, data=x)))
    sapply(tmp2, function(x) summary(x)$adj.r.squared)
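
    To illustrate why the excluded groups can still show up, here is a minimal sketch on a made-up data frame (toy, county and y are hypothetical names, not from the question). It assumes the grouping column is a factor, and shows that subsetting removes the rows but keeps the levels until droplevels() is called.

    ```r
    # Hypothetical toy data; `county` and `y` are illustrative names only.
    toy <- data.frame(
      county = factor(c("A", "A", "A", "B")),
      y      = c(1.2, 0.7, 3.1, 2.4)
    )

    # Keep only county "A": the single row for "B" is removed ...
    toy.sub <- toy[toy$county %in% "A", ]

    # ... but "B" is still a level of the factor, now with a count of zero,
    # so grouping by this factor still produces an (empty) group for "B".
    levels(toy.sub$county)
    table(toy.sub$county)

    # droplevels() removes the levels that no longer occur in the data.
    toy.clean <- droplevels(toy.sub)
    levels(toy.clean$county)
    ```

    Checking levels() or table() on the grouping column after subsetting is a quick way to see whether empty groups are still being carried along.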
    

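    Here is an example that reproduces the error

    set.seed(24)
    # stringsAsFactors = TRUE makes `state` a factor on R >= 4.0.0 as well,
    # where data.frame() no longer converts strings to factors by default
    d <- data.frame(
     state = rep(c('NY', 'CA','MD', 'ND'), c(10,10,6,7)),
     year = sample(1:10,33,replace=TRUE),
     response= rnorm(33),
     stringsAsFactors = TRUE
    )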
    
     tmp1 <- with(d, by(d, state, function(x) lm(formula=response~year, data=x)))
     sapply(tmp1, function(x) summary(x)$adj.r.squared)
     #       CA          MD          ND          NY 
     # 0.03701114 -0.04988296 -0.07817515 -0.11850038 
    
    d.states <- aggregate(d[,"state"], by=list(d[,'state']), FUN=length)
    good.states <- as.matrix(subset(d.states, x > 6, select=Group.1))
    d.sub <-  d[d$state %in% good.states[,1],]
    
    tmp2 <- with(d.sub, 
        by(d.sub, state,
          function(x) lm(formula = response~year, data=x)))
    sapply(tmp2, function(x) summary(x)$adj.r.squared)
    #Error in summary(x)$adj.r.squared : 
    # $ operator is invalid for atomic vectors
    

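    If you look at the second element of tmp2, the excluded state is still present, but as NULL. After dropping the unused levels, the same code runs without error: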

    tmp2[2]
     #$MD
     #NULL
    
    d.sub1 <- droplevels(d.sub)
    tmp2 <- with(d.sub1, 
          by(d.sub1, state,
              function(x) lm(formula = response~year, data=x)))
    sapply(tmp2, function(x) summary(x)$adj.r.squared)
    #       CA          ND          NY 
    # 0.03701114 -0.07817515 -0.11850038
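
    As an alternative sketch (not part of the original answer), the counting, filtering and fitting can also be written with table() and split(). The data and the names toy, counts, good and fits are hypothetical. Two points are worth noting: the row filter has to compare the column itself against the kept names, since an expression like "County" %in% good.counties only tests the literal string "County" and returns a single value; and split() accepts drop = TRUE, which discards factor levels that have no rows, so no NULL entries reach sapply.

    ```r
    set.seed(1)
    # Hypothetical data: three groups of unequal size, stored as a factor.
    toy <- data.frame(
      state    = factor(rep(c("CA", "MD", "NY"), c(10, 4, 10))),
      year     = rep_len(1:10, 24),
      response = rnorm(24)
    )

    # Count rows per group and keep the names of groups with more than 6 rows.
    counts <- table(toy$state)
    good   <- names(counts)[counts > 6]

    # Compare the column itself (not the string "state") against the kept names.
    toy.sub <- toy[toy$state %in% good, ]

    # drop = TRUE makes split() discard factor levels that have no rows left.
    fits <- lapply(split(toy.sub, toy.sub$state, drop = TRUE),
                   function(x) lm(response ~ year, data = x))

    # One adjusted R-squared per remaining group.
    adj.r2 <- sapply(fits, function(x) summary(x)$adj.r.squared)
    names(adj.r2)
    ```

    Calling droplevels() on the subset, as in the answer above, and this split(..., drop = TRUE) variant should give the same set of groups; which one to use is mostly a matter of taste.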
    
