
Looping over subsets in plm


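I can't seem to get this right. I have a dataset of 50 countries (1 to 50), with 15 years per country and about 20 variables per country. For now I am only testing one variable (OS) against my dependent variable (SMD). I want to do this by looping over the countries, so that I get output per country instead of one overall output.

I thought it would be wise to create a subset first (to be able to look at country 1 first; after that my loop should increment the country number and test country 2). I expected the regression at the bottom of my code to give the output for country 1 rather than an overall score for the whole dataset. But I keep getting these errors: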

> pdata <- plm.data(newdata, index=c("Country","Date"))
  series    are constants and have been removed
> pooling <- plm(Y ~ X, data=pdata, model= "pooling") 
  series Country, xRegion are constants and have been removed
  Error in model.matrix.pFormula(formula, data, rhs = 1, model = model,  : 
  NA in the individual index variable
> summary(pooling)
  Error in summary(pooling) : object 'pooling' not found

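I may be going about this all wrong, but I believe there is no point in programming the loop itself until this works. Any advice on solving my error, or on other ways to program the loop, is much appreciated.

My code: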

rm(list = ls())
mydata <- read.table(file = file.choose(), header = TRUE, dec = ",")
names(mydata)
attach(mydata)

Y <- cbind(SMD)
X <- cbind(OS)

newdata <- subset(mydata, Country %in% c(1))

newdata

pdata <- plm.data(newdata, index=c("Country","Date"))
pooling <- plm(Y ~ X, data=pdata, model= "pooling") 
summary(pooling)

Edit: a data sample of the first 2 countries that leads to the same error

dput(mydata)
structure(list(Region = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("NAF", "SAME"),
class = "factor"), Country = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Date = c(1995L, 1996L, 1997L, 1998L,
1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L, 2014L, 1995L, 1996L, 1997L, 1998L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L,
2012L, 2013L, 2014L), OS = structure(c(19L, 25L, 27L, 15L, 22L, 20L, 23L, 9L,
7L, 5L, 2L, 1L, 4L, 3L, 6L, 10L, 11L, 13L, 11L, 8L, 26L, 25L, 31L, 29L, 28L,
21L, 30L, 24L, 24L, 16L, 11L, 14L, 12L, 17L, 18L, 29L, 32L, 32L, 33L, 34L),
.Label = c("51.5", "52.2", "55.6", "56.4", "56.7", "57.7", "57.8", "58.3",
"59", "59.2", "59.6", "59.9", "60.2", "60.4", "61.1", "61.2", "62.2", "62.3",
"62.8", "63.2", "63.3", "63.8", "63.9", "64.2", "64.3", "64.5", "64.7",
"65.3", "65.5", "65.6", "66.4", "68", "69.6", "70.7"), class = "factor"),
SMD = structure(c(7L, 12L, 20L, 21L, 17L, 15L, 13L, 10L, 14L, 22L, 23L, 33L,
1L, 32L, 29L, 34L, 28L, 25L, NA, NA, 9L, 6L, 8L, 4L, 2L, 35L, 3L, 36L, 5L,
11L, 16L, 18L, 24L, 19L, 26L, 31L, 27L, 30L, NA, NA), .Label = c("100.3565662",
"13.44788845", "13.45858747", "13.56815534", "15.05892471", "17.63789658",
"18.04088718", "18.3101351", "19.34226196", "21.25530884", "21.54423145",
"23.75898948", "24.08770926", "26.39817342", "29.44079001", "31.40605191",
"34.46667996", "34.52913657", "35.66070947", "36.4419931", "39.16875621",
"44.0126137", "45.72949566", "49.13062679", "54.83730247", "56.87886311",
"59.80971583", "60.5658962", "69.20148901", "70.91362874", "72.64845214",
"73.97139238", "75.20140919", "76.18378138", "9.570435019", "9.867635305"),
class = "factor")), .Names = c("Region", "Country", "Date", "OS", "SMD"),
class = "data.frame", row.names = c(NA, -40L))

1 Answer

  • 0

    Are you sure you need to use plm? This will generate a list of summaries, one per country.

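    # Convert factors to numeric. Go through as.character() so the level
    # labels are parsed; as.numeric() alone returns the internal level codes.
    mydata$SMD <- as.numeric(as.character(mydata$SMD))
    mydata$OS  <- as.numeric(as.character(mydata$OS))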
    
    # Using lapply(...)
    smry <- lapply(unique(mydata$Country),
                   function(cntry)
                     summary(lm(SMD~OS,data=mydata[mydata$Country==cntry,])))
    # Same thing, using a for loop
    smry <- list()
    for (cntry in unique(mydata$Country)) {
      # Assign by country name so the list stays flat, one summary per country
      smry[[as.character(cntry)]] <-
        summary(lm(SMD~OS,data=mydata[mydata$Country==cntry,]))
    }
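
If the goal is a compact per-country table rather than a list of full summaries, one possible follow-up is to pull the coefficients out of each fit. The sketch below is only an illustration under assumptions: `toy` is made-up data (not the asker's dataset), and `fits`, `coefs`, and `coef_table` are hypothetical names introduced here.

```r
# Hypothetical toy panel: 3 countries, 5 years each (not the asker's data).
set.seed(1)
toy <- data.frame(
  Country = rep(1:3, each = 5),
  Date    = rep(2000:2004, times = 3),
  OS      = runif(15, 50, 70)
)
toy$SMD <- 2 + 0.5 * toy$OS + rnorm(15)

# One lm() fit per country; split() uses the country id as the list name.
fits <- lapply(split(toy, toy$Country), function(d) lm(SMD ~ OS, data = d))

# Stack the intercept and slope of each fit into one table.
coefs <- do.call(rbind, lapply(fits, coef))
coef_table <- data.frame(Country = names(fits), coefs,
                         row.names = NULL, check.names = FALSE)
print(coef_table)
```

Using `split()` instead of indexing by `unique(Country)` keeps the country identifier attached to each fit, which makes the results easier to match back to the data.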
    

    In your dataset, SMD and OS are factors, so they need to be converted to numeric first.
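
Converting a factor to numeric has a well-known pitfall that is worth seeing on a small case. The snippet below is a minimal sketch using a made-up factor `f` (a hypothetical stand-in for a column like OS); it contrasts the internal level codes with the values spelled out by the labels.

```r
# Toy factor mimicking a numeric column that was read in as a factor.
f <- factor(c("51.5", "70.7", "59"))

codes  <- as.numeric(f)                # internal level codes, not the data
values <- as.numeric(as.character(f))  # the numbers the labels spell out

print(codes)
print(values)
```

A plausible reason the columns became factors in the first place, assuming the file actually uses a period as the decimal separator, is the `dec = ","` argument to `read.table`; if that assumption holds, dropping it may avoid the conversion step altogether.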
