I am running a logit model defined like this:
diversity_model <- glm(booking_bool ~ df$var_distance + df$var_price + df$var_prop_review_score +
    df$var_starrating + srch_hits + min_rating + max_rating + avg_rating + min_review +
    max_review + avg_review + min_loc_score + max_loc_score + avg_loc_score + avg_price + min_price +
    srch_booking_window + srch_adults_count + random_bool + prop_starrating + prop_review_score +
    prop_brand_bool + prop_location_score1 + prop_log_historical_price + position + promotion_flag +
    click_bool,
    family = binomial(link = "logit"), df)
Now, based on the fitted model, I would like to make a prediction using the following code:
new.ob = with(df, data.frame(var_distance = mean(var_distance), var_price = mean(var_price),
var_prop_review_score = mean(var_prop_review_score),
var_starrating = mean(var_starrating), srch_hits = mean(srch_hits),
min_rating = mean(min_rating),max_rating = mean(max_rating),
avg_rating = mean(avg_rating), min_review = mean(min_review),
max_review = mean(max_review), avg_review = mean(avg_review),
min_loc_score = mean(min_loc_score), max_loc_score = mean(max_loc_score),
avg_loc_score = mean(avg_loc_score), avg_price = mean(avg_price),
min_price = mean(min_price),
srch_booking_window = mean(srch_booking_window), srch_adults_count = mean(srch_adults_count),
random_bool = mean(random_bool), prop_starrating = mean(prop_starrating), prop_review_score = mean(prop_review_score, na.rm=TRUE),
prop_brand_bool = mean(prop_brand_bool), prop_location_score1 = mean(prop_location_score1),
prop_log_historical_price = mean(prop_log_historical_price), position = mean(position), promotion_flag = mean(promotion_flag),
click_bool = mean(click_bool)))
predict(diversity_model, newdata = new.ob, type = "response")
I get the following error message:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  variable lengths differ (found for 'srch_hits')
In addition: Warning message:
'newdata' had 1 row but variables found have 66766 rows
1 Answer
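In general, you should not use $ to access variables inside a model formula; use just the variable names and let R pull them from the data frame passed as the data argument. Terms written as df$var_distance are always looked up in the original df rather than in newdata, so predict() ends up combining 66766-row columns with the 1-row newdata, which is what produces the "variable lengths differ" error. For convenience, you can use . on the right-hand side of the formula, which means "all of the variables in the data frame except the response variable". Once the model is fitted this way, a prediction at the mean value of every variable (assuming all predictors are numeric) only needs a one-row data frame of column means passed as newdata.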
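A minimal sketch of both suggestions follows. It assumes a small simulated data frame in place of the asker's df: the toy columns, their values, and the helper name predictors are made up for illustration and are not the real data.

```r
# Hypothetical toy data standing in for the real df (illustrative columns only)
set.seed(1)
n <- 200
df <- data.frame(
  booking_bool = rbinom(n, 1, 0.3),
  var_price    = rnorm(n),
  srch_hits    = rpois(n, 5),
  position     = sample(1:40, n, replace = TRUE)
)

# Bare variable names only; "." expands to every column except the response
diversity_model <- glm(booking_bool ~ ., family = binomial(link = "logit"), data = df)

# One-row data frame holding the mean of each predictor (assumes all are numeric)
predictors <- setdiff(names(df), "booking_bool")
new.ob <- as.data.frame(lapply(df[predictors], mean, na.rm = TRUE))

# Predicted booking probability at the predictor means
p <- predict(diversity_model, newdata = new.ob, type = "response")
print(p)
```

Because the formula no longer contains df$, predict() takes every term from new.ob, so a single-row newdata yields a single predicted probability.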