Big problem with a genetic algorithm and support vector regression

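I am working on a forecasting function that uses a genetic algorithm to optimize the hyperparameters of a nu-SVR with an RBF kernel. The model includes lagged values of the dependent variable and lagged values of other regressors.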

Some notes on what I am doing

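The criterion the GA uses for fitness is minus the out-of-sample (OOS) mean squared error. The OOS errors are generated with an expanding-window forecasting exercise. If you hold out 30% of a training set of 100 observations, the GA estimates X many SVRs with the hyperparameters it picks, using observations 1 to (70-h+1), where h is the forecast horizon. It then forecasts observation 70 for every hyperparameter vector under consideration and computes the errors. The window is then extended to include observation (71-h+1) and the process repeats. Every hyperparameter vector the GA considers in a generation therefore yields 30 OOS forecast errors. The fitness of each hyperparameter vector is -mean(errors^2).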
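To make the indexing concrete, here is a minimal sketch of the expanding-window scheme. It follows the indexing of the TS_samples function given further down (training rows 1 to i-h, test row i). The values of n, frac and h, and the placeholder errors, are illustrative assumptions and not output from my model.

```r
# Illustrative values (assumptions for this sketch)
n    <- 100   # observations available
frac <- 0.3   # fraction held out for OOS evaluation
h    <- 2     # forecast horizon

tau       <- round((1 - frac) * n)  # first OOS observation
test_idx  <- tau:n                  # observations to be forecast
train_end <- test_idx - h           # last row of each training window

# One row per window: training rows train_start..train_end, test row 'test'
windows <- data.frame(train_start = 1, train_end = train_end, test = test_idx)
head(windows, 3)

# Fitness of one hyperparameter vector: minus the mean squared OOS error.
# 'errors' is a placeholder; in practice it holds one forecast error per window.
set.seed(1)
errors  <- rnorm(length(test_idx))
fitness <- -mean(errors^2)
fitness
```

Because the GA maximizes fitness, a smaller OOS mean squared error gives a larger (less negative) value.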

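I generate the subsamples for the expanding-window exercise with a custom function that returns a list of lists. In the example above, the list has 30 items. In each item, [[1]] is the subsample used for estimation and [[2]] contains the row of the observation matrix used to make the forecast and collect the error. In all cases, the first column contains the dependent variable and the other columns contain the regressors.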
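The structure can be illustrated on a toy matrix. This is only a sketch: toy and toy_samples are hypothetical names, and the two windows are chosen by hand for brevity.

```r
# Toy data: column 1 is the dependent variable, columns 2-3 are regressors
set.seed(1)
toy <- matrix(rnorm(30), nrow = 10, ncol = 3)

# Two expanding windows with h = 1: forecast rows 9 and 10
h <- 1
toy_samples <- lapply(9:10, function(i) list(
  toy[1:(i - h), , drop = FALSE],   # [[1]]: estimation subsample
  toy[i, , drop = FALSE]            # [[2]]: row used for the forecast
))

length(toy_samples)           # number of windows
dim(toy_samples[[1]][[1]])    # rows x columns of the first training set
toy_samples[[2]][[2]][, 1]    # realized dependent variable in the second window
```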

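The regression matrix is also built by a custom function, and custom functions compute the MSE and the model fitness for the GA. I use the package 'e1071' for the SVR and 'GA' for the genetic algorithm.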

The problem

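Yesterday, everything worked fine. I actually managed to run the genetic algorithm, get the optimal hyperparameters and produce forecasts. Today, I added a function that builds a suitable matrix for the 'predict' function, and managed to get forecasts using the optimal hyperparameters and the last observed values. Unfortunately, I did not save many copies of my work (I will never do that again), so I cannot simply go back to the working code.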

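However, I may have changed a line somewhere in one of my custom functions. What complicates the problem is that 'ga' calls 'fit_fct', which calls 'mse_fct', which calls 'svm'... If you run these functions on their own, they work fine with the objects I use (that is, fit_fct, mse_fct and all the other functions run by themselves). The problem lies in how 'ga' calls them.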
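One way to see what is actually passed down this chain is to wrap the fitness function so that it reports the candidate vector whenever that vector is non-finite or the inner call fails. The sketch below uses base R only. checked_fitness and toy_fit are hypothetical names introduced for illustration; they are not part of 'GA' or 'e1071'.

```r
# Wrap a fitness function so that bad inputs are reported, not silently passed on.
checked_fitness <- function(fit, ...) {
  function(x) {
    if (any(!is.finite(x))) {
      message("Non-finite candidate: ", paste(x, collapse = ", "))
      return(-Inf)  # assumption: score invalid candidates as worst possible
    }
    tryCatch(
      fit(x, ...),
      error = function(e) {
        message("Failed at x = ", paste(x, collapse = ", "),
                " : ", conditionMessage(e))
        -Inf
      }
    )
  }
}

# Toy stand-in for fit_fct, for demonstration only
toy_fit  <- function(x, target) -sum((x - target)^2)
safe_fit <- checked_fitness(toy_fit, target = c(1, 2, 3))

safe_fit(c(1, 2, 3))     # a valid candidate is evaluated normally
safe_fit(c(NaN, 2, 3))   # a NaN candidate is reported and scored -Inf
```

With the objects in this post, the wrapper would presumably be passed as fitness = checked_fitness(fit_fct, samples, formula_svm). Whether 'ga' copes well with -Inf fitness values is not verified here, so this is meant as a diagnostic rather than a fix.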

I get this error message:

' Error in svm.default(x, y, scale = scale, ..., na.action = na.action) : NA/NaN/Inf in foreign function call (arg 12) '

Now, here is the code to reproduce everything. First, all the functions:

# Packages used below
library(e1071)  # svm()
library(GA)     # ga()

# TS_validation -------------------------------------------
TS_samples <- function(data, frac, h){
 # Description: this function forms training and test samples
 # in an extended window POOS pattern. 
 #
 # INPUTS
 # Data contains both left hand and right hand side variables
 # frac is the fraction used for testing the model
 # h is the forecast horizon
 # 
 # OUTPUTS
 # list_TS is a list containing both training and test sets
 # organized as follows: list_TS[[sample]][[type]] where
 # type==1 is the training set and type==2 is the test set.

  bigt <- nrow(data)             # Number of observations
  tau  <- round((1-frac)*bigt)   # Start of OOS

  # Elements are accessed by position: [[1]] is train, [[2]] is test
  list_TS <- lapply(tau:bigt, function(i) list(
       train = data[1:(i-h),],
       test  = data[i,]
       ))
  return(list_TS)
} # ----------------------------------------------------- #
# MSE function for SVM ------------------------------------
mse_fct <- function(formula_svm, train, test, cost, gamma, nu){
  # Description: computes MSE for a nu-SVR using training
  # and test sets.
  #
  # NOTE: the variable to be forecast must be in the first column
  # Dependencies: e1071
  #
  # INPUTS
  # formula_svm is the structure of the model
  # train and test are training and test sets
  # cost, gamma and nu are the hyperparameters of the SVM
  #
  # OUTPUTS
  # mse (mean squared error)

  # A single test row arrives as a vector; turn it into a 1-row matrix
  test <- as.matrix(test)
  if (dim(test)[2] == 1){
    test <- t(test)
  }
  # TRAIN MODEL
  mdl <- svm(formula_svm, data=train,
             kernel='radial', type='nu-regression',
             cost=cost, gamma=gamma, nu=nu)
  # PREDICT AND COMPUTE MSE
  mse <- mean((predict(mdl, newdata=test) - test[,1])^2, na.rm=TRUE)
  return(mse)
} # ----------------------------------------------------- #
# Fitness function for GA ---------------------------------
fit_fct <- function(x, list_data, formula_svm){
  # Nu-SVR parameters
  cost  <- x[1]
  gamma <- x[2]
  nu    <- x[3]

  # Get MSE across all folds
  a <- list_data
  mse_value <- sapply(seq_along(a),
                   function(i) mse_fct(formula_svm = formula_svm,
                                       train=a[[i]][[1]], test=a[[i]][[2]],
                                       cost=cost, gamma=gamma, nu=nu))
  # Fitness measure is maximized, thus -mse
  fit <- -mean(mse_value)
  return(fit)
} # ----------------------------------------------------- #
# Function to make regression matrix ----------------------
make_reg_matrix <- function(y,factors,h,max_y,max_f){
  # Description: This function creates a regression matrix
  # containing the dependent variable, y, its lagged values
  # from lag h to max_y and lagged exogenous regressors,
  # from lag h to max_f
  #
  # NOTE: y and factors must have the same time dimension.
  #
  # OUTPUT
  # First column is dependent variable. All others are
  # regressors.

  if (max_y < h || max_f < h){
    stop('Lags must be greater than or equal to forecast horizons.')
  }

  bigt <- nrow(as.matrix(y))       # Time dimension
  bign <- ncol(as.matrix(factors)) # Number of factors

  lags <- sapply(h:max_y, function(i) c(array(NA,dim=i),y[1:(bigt-i)]))
  f <- do.call(cbind, lapply(h:max_f, function(i)
                         rbind(array(NA,dim=c(i,bign)),
                               factors[1:(bigt-i),]))
           )
  reg <- cbind(y,lags,f)
  colnames(reg) <- 1:ncol(reg)

  return(reg)
}
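To show what the lag construction in make_reg_matrix produces, here is a small self-contained sketch on a toy series. lag_by is a hypothetical helper written for this illustration; it pads the start of the series with NA in the same way as the function above.

```r
y <- 1:6            # toy series
n <- length(y)

# Lag y by i periods, padding the first i positions with NA
lag_by <- function(i) c(rep(NA, i), y[1:(n - i)])

lags <- sapply(2:3, lag_by)   # lags from h = 2 to max_y = 3
reg  <- cbind(y, lags)
reg

# Rows containing NA have to be dropped before estimation
complete <- reg[complete.cases(reg), , drop = FALSE]
nrow(complete)
```

The number of rows lost at the start equals the largest lag used, which is why the script that follows drops the leading rows before building the samples.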

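Now, here is code similar to the function I am writing. It seems simpler to track down the error this way, and it produces exactly the same problem, so if it gets solved here, it gets solved everywhere: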

set.seed(1234)
 y       <- as.matrix(rnorm(100))
 factors <- array(rnorm(200), dim=c(100,2))
 max_y <- 3
 max_f <- 3
 frac <- .3
 h <- 2

 # FORECASTING CODE

 # Range of hyperparameters
 min_theta <- c(abs(median(y)-3*sd(y)),  2^(-5), 0.1)
 max_theta <- c(abs(median(y)+3*sd(y)),  2^5, 0.7)

 # Make regressor matrix and last observed values
 train <- make_reg_matrix(y,factors,h,max_y,max_f)
 # Drop NAs (if any)
 drop <- max(sapply(seq_along(colnames(train)), 
                     function(i) sum(is.na(train[,i]))))
 train <- train[-c(1:drop),] # Dropping missing values
 # Add names to columns for formulas
 colnames(train) <- paste('V', colnames(train), sep='')

 # Make newdata for prediction (last observations)
 # NOTE: make_last() is not defined in the code shown in this post
 last  <- make_last(y,factors,h,max_y,max_f)

 # Make samples
 samples <- TS_samples(train,frac,h)

 # Genetic algorithm
 formula_svm <- as.formula(paste('V1 ~ 1 +',
                        paste(paste('V', 2:ncol(train), sep=''),
                              collapse='+')))

 results <- ga(type = "real-valued", fitness = fit_fct, samples,
               formula_svm,
               names = c('Cost', 'Gamma', 'Nu'), 
               min = min_theta, max = max_theta,
               popSize = 50, maxiter = 10, maxFitness=-0.002, 
               seed = 1071)
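Because the error mentions NA/NaN/Inf, one inexpensive check before calling 'ga' is that every lower bound is finite and strictly below its upper bound. This is only a possible cause, not a confirmed diagnosis. The sketch below uses made-up values for the median and standard deviation (m and s are assumptions) to show how taking abs() of both ends of an interval can invert them, and what base R's runif returns for inverted limits. Whether 'ga' draws its initial population this way has not been checked against the package source.

```r
m <- -0.4; s <- 1   # assumed median and sd, for illustration only

lower <- c(abs(m - 3 * s), 2^(-5), 0.1)
upper <- c(abs(m + 3 * s), 2^5,    0.7)

# Each bound pair must be finite and correctly ordered
ok <- is.finite(lower) & is.finite(upper) & lower < upper
data.frame(lower, upper, ok)

# Uniform draws with min > max yield NaN (R also issues a warning)
draw <- suppressWarnings(runif(1, min = lower[1], max = upper[1]))
is.nan(draw)

# A guard that could be placed before the call to ga
if (!all(ok)) {
  message("Check bounds for parameter(s): ", paste(which(!ok), collapse = ", "))
}
```

Applying the same check to min_theta and max_theta from the script above would show whether any of the three ranges is inverted.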

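You can try calling each function on its own (mse_fct, fit_fct, etc.) and see that they do run by themselves. It is only a problem when I call 'ga', which is strange, because all it does is pick hyperparameter vectors and call 'fit_fct' to evaluate their fitness...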

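I would be very grateful if anyone could come up with a solution, because I have absolutely no idea what is going on. Thanks in advance for your help.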
