我需要在SVM中调整C&Sigma时跟踪F1分数,例如以下代码跟踪准确度,我需要将其更改为F1-Score但我无法做到......
%# read some training data
[labels,data] = libsvmread('./heart_scale');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, ...
sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# now you can train you model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
我看过以下两个链接
Retraining after Cross Validation with libsvm
10 fold cross-validation in one-against-all SVM (using LibSVM)
我知道我必须首先在训练数据中找到最佳的C和gamma / sigma参数,然后使用这两个值进行LEAVE-ONE-OUT交叉验证分类实验,所以我现在想要的是先做一个网格 - 调整C&sigma的研究 . 请我更喜欢使用MATLAB-SVM而不是LIBSVM . 以下是我的LEAVE-ONE-OUT交叉验证分类代码 .
... clc
clear all
close all
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good=B;
input=good(:,1:12);
target=good(:,13);
CVO = cvpartition(target,'leaveout',1);
cp = classperf(target); %# init performance tracker
svmModel=[];
for i = 1:CVO.NumTestSets %# for each fold
trIdx = CVO.training(i);
teIdx = CVO.test(i);
%# train an SVM model over training instances
svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);
%# test using test instances
pred = svmclassify(svmModel, input(teIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, teIdx);
end
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
aF1 = ((f1P+f1N)/2);
我已经改变了代码,但我犯了一些错误,我收到了错误,
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good=B;
inpt=good(:,1:12);
target=good(:,13);
k=10;
cvFolds = crossvalind('Kfold', target, k); %# get indices of 10-fold CV
cp = classperf(target); %# init performance tracker
svmModel=[];
for i = 1:k
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx;
C = 0.1:0.1:1;
S = 0.1:0.1:1;
fscores = zeros(numel(C), numel(S)); %// Pre-allocation
for c = 1:numel(C)
for s = 1:numel(S)
vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL)(fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(c))),inpt(trainIdx,:),target(trainIdx));
fscores(c,s) = mean(vals);
end
end
end
[cbest, sbest] = find(fscores == max(fscores(:)));
C_final = C(cbest);
S_final = S(sbest);
.......
和功能.....
.....
function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)
svmModel = svmtrain(XTRAIN, YTRAIN, ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);
pred = svmclassify(svmModel, XVAL, 'Showplot',false);
cp = classperf(YVAL, pred)
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
fscore = ((f1P+f1N)/2);
end
2 回答
所以基本上你想要采用你的这一行:
将它放在一个改变
'BoxConstraint'
和'RBF_Sigma'
参数的循环中,然后使用crossval输出该迭代参数组合的f1分数 .您可以像在libsvm代码示例中一样使用单个for循环(即使用
meshgrid
和1:numel()
,这可能更快)或嵌套的for循环 . 我将使用嵌套循环,以便您有两种方法:现在我们只需要定义函数
fun
. 文档有关于fun
的说法:所以
fun
需要:输出一个单独的f分数
将X和Y的训练和测试集作为输入 . 请注意,这些都是实际训练集的子集!可以把它们想象成训练集的训练和验证SUBSET . 另请注意,crossval将为您分割这些设置!
在训练子集上训练分类器(使用循环中的当前
C
和S
参数)在测试(或验证相当)子集上运行新分类器
计算并输出性能指标(在您的情况下,您希望获得f1分数)
你会注意到
fun
可以't take any extra parameters which is why I'将它包装在一个匿名函数中,这样我们就可以传递当前的C
和S
值 . (即上面所有的@(...)(fun(...))
. 这只是一个把我们的六个参数fun
变成4参数的技巧crossval
要求的一个 .我找到了
target(trainIdx)
的唯一问题 . 它是一个行向量,所以我只用target(trainIdx)
替换了target(trainIdx)
这是一个列向量 .