
Logical subscript too long

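I know this question has been asked before, but I have looked at all the answers and they address specific problems; I cannot find one that fits my particular situation.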

I entered the following in R. It works for the first example but not for the second, and I cannot understand why.

Setting up the data for the glm:

setwd("P:/STAT319")
ucb2<-read.table('Berkeley.PoissonTwo.txt',header=TRUE)
attach(ucb2)

ucb2 looks like this:

Count   Admit Department    Gender     
313 FALSE     A     Female     
512 TRUE      A     Female     
19  FALSE     A     Male       
89  TRUE      A     Male       
207 FALSE     B     Female     
353 TRUE      B     Female     
8   FALSE     B     Male       
17  TRUE      B     Male       
205 FALSE     C     Female     
120 TRUE      C     Female     
391 FALSE     C     Male       
202 TRUE      C     Male       
279 FALSE     D     Female     
138 TRUE      D     Female     
244 FALSE     D     Male       
131 TRUE      D     Male       
138 FALSE     E     Female     
53  TRUE      E     Female     
299 FALSE     E     Male       
94  TRUE      E     Male       
351 FALSE     F     Female     
22  TRUE      F     Female     
317 FALSE     F     Male       
24  TRUE      F     Male

Using a factor variable for Admit, in place of TRUE and FALSE:

Admit<-c(0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1)
fAdmit<-factor(Admit)
rAdmit<-factor(Admit,labels=c("FALSE","TRUE"))
glm2<-glm(Count~Admit+Department+Gender,family=poisson)
glm2

Setting up for leave-one-out validation:

library(car)
vif(glm2)
# GVIF Df GVIF^(1/(2*Df))
# Admit         1  1               1
# Department    1  5               1
# Gender        1  1               1
step(glm2)
# Start:  AIC=2272.73
# Count ~ Admit + Department + Gender
# 
# Df Deviance    AIC
# <none>            2097.7 2272.7
# - Department  5   2257.2 2422.2
# - Gender      1   2260.6 2433.6
# - Admit       1   2327.7 2500.8
# 
# Call:  glm(formula = Count ~ Admit + Department + Gender, family = poisson)
# 
# Coefficients:
#   (Intercept)        Admit  DepartmentB  DepartmentC  
# 5.82785     -0.45674     -0.46679     -0.01621  
# DepartmentD  DepartmentE  DepartmentF   GenderMale  
# -0.16384     -0.46850     -0.26752     -0.38287  

# Degrees of Freedom: 23 Total (i.e. Null);  16 Residual
# Null Deviance:        2650 
# Residual Deviance: 2098   AIC: 2273

library(ipred)
errorest(Count~Admit+Department+Gender,data=ucb2,model=glm,est.para=control.errorest(k=24))

# Call:
#   errorest.data.frame(formula = Count ~ Admit + Department + Gender, 
#                       data = ucb2, model = glm, est.para = control.errorest(k = 24))
# 
# 24-fold cross-validation estimator of root mean squared error
# 
# Root mean squared error:  180.5741

So the first example works with the data as shown. Now, to carry out the same study, we have to rearrange the data and fit a logistic regression:

ucb1<-read.table('Monday.Late.txt',header=TRUE)
attach(ucb1)
# The following object is masked _by_ .GlobalEnv:
#   
#   Admit

# The following objects are masked from ucb2:
#   
#   Admit, Department, Gender

y<-cbind(ucb1[,1],ucb1[,2])
glm1<-glm(y~Gender+Department,family=binomial)

The data look like this:

Admit   NotAdmit    Gender  Department     
512 313 female  a      
353 207 female  b      
120 205 female  c      
138 279 female  d      
53  138 female  e      
22  351 female  f      
89  19  male    a      
17  8   male    b      
202 391 male    c      
131 244 male    d      
94  299 male    e      
24  317 male    f

Setting this new data up for Leave One Out:

vif(glm1)
# GVIF Df GVIF^(1/(2*Df))
# Gender     1.384903  1        1.176819
# Department 1.384903  5        1.033099
step(glm1)
# Start:  AIC=103.14
# y ~ Gender + Department

# Df Deviance    AIC
# - Gender      1    21.74 102.68
# <none>             20.20 103.14
# - Department  5   783.61 856.55
# 
# Step:  AIC=102.68
# y ~ Department
# 
# Df Deviance    AIC
# <none>             21.74 102.68
# - Department  5   877.06 948.00
# 
# Call:  glm(formula = y ~ Department, family = binomial)
# 
# Coefficients:
#   (Intercept)  Departmentb  Departmentc  Departmentd  
# 0.59346     -0.05059     -1.20915     -1.25833  
# Departmente  Departmentf  
# -1.68296     -3.26911  
# 
# Degrees of Freedom: 11 Total (i.e. Null);  6 Residual
# Null Deviance:        877.1 
# Residual Deviance: 21.74  AIC: 102.7

So far so good, but this is where the problem appears:

errorest(y~Gender+Department,data=ucb1,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long

So why does this happen? I tried other values of k, and I am not sure what value k is supposed to take; I think it means the number of rows.

I then tried the same data, arranged differently:

ucb1a<-read.table('Berkeley.Rearranged.txt',header=TRUE)
attach(ucb1a)
ucb1a

This is a rearrangement of the previous data:

Admitted Not_Admit Depart Genders
1       512       313      A  Female
2        89        19      A    Male
3       353       207      B  Female
4        17         8      B    Male
5       120       205      C  Female
6       202       391      C    Male
7       138       279      D  Female
8       131       244      D    Male
9        53       138      E  Female
10       94       299      E    Male
11       22       351      F  Female
12       24       317      F    Male

Then:

y<-cbind(ucb1[,1],ucb1[,2])
glm1a<-glm(y~Genders+Depart,family=binomial)
vif(glm1a)
# GVIF Df GVIF^(1/(2*Df))
# Gender     1.384903  1        1.176819
# Department 1.384903  5        1.033099

step(glm1a)
# Start:  AIC=103.14
# y ~ Gender + Department
# 
# Df Deviance    AIC
# - Gender      1    21.74 102.68
# <none>             20.20 103.14
# - Department  5   783.61 856.55
# 
# Step:  AIC=102.68
# y ~ Department
# 
# Df Deviance    AIC
# <none>             21.74 102.68
# - Department  5   877.06 948.00
# 
# Call:  glm(formula = y ~ Department, family = binomial)
# 
# Coefficients:
#   (Intercept)  Departmentb  Departmentc  Departmentd  
# 0.59346     -0.05059     -1.20915     -1.25833  
# Departmente  Departmentf  
# -1.68296     -3.26911  
# 
# Degrees of Freedom: 11 Total (i.e. Null);  6 Residual
# Null Deviance:        877.1 
# Residual Deviance: 21.74  AIC: 102.7

Again, so far so good, but once again this happened:

errorest(y~Gender+Department,data=ucb1a,model=glm,est.para=control.errorest(k=12))
Error in xj[i, , drop = FALSE] : (subscript) logical subscript too long

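And believe me, I again tried other numbers for k, and I cannot understand why this goes wrong. So if anyone has any idea about this specific instance of the "(subscript) logical subscript too long" error, please reply.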

1 Answer

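    This problem occurs when your objects have different sizes. I think your problem comes from attach(), but I am not sure. Try the code without it, or try with() instead. As nicola pointed out, you should first check why you need attach() at all. Also, I am not sure what you are trying to achieve with it.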
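    To make the size mismatch concrete, here is a minimal base R sketch. It does not reproduce what errorest() does internally, which is not shown in the question. The names ucb1_demo, len_attached, len_with, idx and msg are hypothetical and exist only for illustration; the global Admit vector mirrors the 24-element one created in the question. It shows a global object masking an attached column of the same name, and the kind of subscript error that can follow when the longer object is used to index the shorter one.

```r
# Hypothetical stand-ins for the objects in the question (assumption: a
# shortened 6-row version of the admissions table is enough to show the idea).
Admit <- rep(c(0, 1), times = 12)             # global vector, length 24
ucb1_demo <- data.frame(
  Admit    = c(512, 353, 120, 138, 53, 22),   # column, length 6
  NotAdmit = c(313, 207, 205, 279, 138, 351)
)

# After attach(), the bare name Admit still resolves to the global vector,
# because the global environment comes first on the search path.
attach(ucb1_demo, warn.conflicts = FALSE)
len_attached <- length(Admit)
detach(ucb1_demo)

# with() evaluates inside the data frame, so the column is found instead.
len_with <- with(ucb1_demo, length(Admit))

# A logical index derived from the longer object cannot subset the
# shorter matrix; tryCatch() captures the resulting error message.
y   <- as.matrix(ucb1_demo)                   # 6 rows
idx <- Admit == 1                             # 24 logical values
msg <- tryCatch(
  { y[idx, , drop = FALSE]; "no error" },
  error = function(e) conditionMessage(e)
)

print(c(attached = len_attached, with = len_with))
print(msg)
```

    The two lengths differ because the same name refers to two different objects depending on where R looks it up, which is the situation the "masked _by_ .GlobalEnv" message in the question is warning about.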

    You can see the following in the function's help page, under "Good practice":

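    attach has the side effect of altering the search path and this can easily lead to the wrong object of a particular name being found. People do often forget to detach databases.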

    In interactive use, with is usually preferable to the use of attach/detach, unless what is a save()-produced file in which case attach() is a (safe) wrapper for load().
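    In the same spirit, here is a sketch of leave-one-out validation for the logistic model with no attach() at all. It is a hand-written loop, not ipred's errorest(), and it assumes the response can be built inside the formula from the Admit and NotAdmit columns so that every variable comes from the data argument. The names pred, obs and loo_rmse are hypothetical; the data values are copied from the question's second table.

```r
# Data copied from the question's second table (Monday.Late.txt).
ucb1 <- data.frame(
  Admit      = c(512, 353, 120, 138, 53, 22, 89, 17, 202, 131, 94, 24),
  NotAdmit   = c(313, 207, 205, 279, 138, 351, 19, 8, 391, 244, 299, 317),
  Gender     = factor(rep(c("female", "male"), each = 6)),
  Department = factor(rep(c("a", "b", "c", "d", "e", "f"), times = 2))
)

n    <- nrow(ucb1)
pred <- numeric(n)

for (i in seq_len(n)) {
  # Fit on all rows except i; the response and the predictors are both
  # taken from the same subset, so their lengths always agree.
  fit <- glm(cbind(Admit, NotAdmit) ~ Gender + Department,
             family = binomial, data = ucb1[-i, ])
  # Predicted admission probability for the held-out row.
  pred[i] <- predict(fit, newdata = ucb1[i, ], type = "response")
}

# Observed admission proportions, compared on the probability scale.
obs      <- with(ucb1, Admit / (Admit + NotAdmit))
loo_rmse <- sqrt(mean((obs - pred)^2))
print(loo_rmse)
```

    Note that this error is measured on admission proportions, so it is not comparable with the root mean squared error on counts that errorest() reported for the Poisson model.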
