首页 文章

Tensorflow KNN:我们如何分配K参数来定义KNN中的邻居数?

提问于
浏览
0

我已经开始在python tensorflow库上使用K-Nearest-Neighbors方法开发机器学习项目 . 我没有使用tensorflow工具的经验,所以我在github中找到了一些代码并为我的数据修改了它 .

我的数据集是这样的:

2,2,2,2,0,0,3
2,2,2,2,0,1,0
2,2,2,4,2,2,1
...
2,2,2,4,2,0,0

这是实际工作正常的代码:

import tensorflow as tf
import numpy as np

# Whole dataset => 1428 samples
dataset = 'car-eval-data-1.csv'
# samples for train, remaining for test
samples = 1300
reader = np.loadtxt(open(dataset, "rb"), delimiter=",", skiprows=1, dtype=np.int32)

train_x, train_y = reader[:samples,:5], reader[:samples,6]
test_x, test_y = reader[samples:, :5], reader[samples:, 6]

# Placeholder you can assign values in future. its kind of a variable
#  v = ("variable type",[None,4])  -- you can have multidimensional values here
training_values = tf.placeholder("float",[None,len(train_x[0])])
test_values     = tf.placeholder("float",[len(train_x[0])])

# MANHATTAN distance
distance = tf.abs(tf.reduce_sum(tf.square(tf.subtract(training_values,test_values)),reduction_indices=1))

prediction = tf.arg_min(distance, 0)
init = tf.global_variables_initializer()

accuracy = 0.0

with tf.Session() as sess:
    sess.run(init)
    # Looping through the test set to compare against the training set
    for i in range (len(test_x)):
        # Tensor flow method to get the prediction near to the test parameters in the training set.
        index_in_trainingset = sess.run(prediction, feed_dict={training_values:train_x,test_values:test_x[i]})    

        print("Test %d, and the prediction is %s, the real value is %s"%(i,train_y[index_in_trainingset],test_y[i]))
        if train_y[index_in_trainingset] == test_y[i]:
        # if prediction is right so accuracy increases.
            accuracy += 1. / len(test_x)

print('Accuracy -> ', accuracy * 100, ' %')

我唯一不理解的是,如果它是 KNN 方法,那么必须有一些 K parameter 定义 number of neighbors for predicting the label for each test sample .
我们如何分配K参数来调整代码的最近邻居数?
有没有办法修改此代码以使用K参数?

1 回答

  • 1

    你是对的,上面的例子没有选择K-Nearest邻居的规定 . 在下面的代码中,我添加了添加此类参数(knn_size)以及其他更正的功能

    import tensorflow as tf
    import numpy as np
    
    # Whole dataset => 1428 samples
    dataset = 'PATH_TO_DATASET_CSV'
    knn_size = 1
    # samples for train, remaining for test
    samples = 1300
    reader = np.loadtxt(open(dataset, "rb"), delimiter=",", skiprows=1, dtype=np.int32)
    
    train_x, train_y = reader[:samples,:6], reader[:samples,6]
    test_x, test_y = reader[samples:, :6], reader[samples:, 6]
    
    # Placeholder you can assign values in future. its kind of a variable
    #  v = ("variable type",[None,4])  -- you can have multidimensional values here
    training_values = tf.placeholder("float",[None, len(train_x[0])])
    test_values     = tf.placeholder("float",[len(train_x[0])])
    
    
    # MANHATTAN distance
    distance = tf.abs(tf.reduce_sum(tf.square(tf.subtract(training_values,test_values)),reduction_indices=1))
    
    # Here, we multiply the distance by -1 to reverse the magnitude of distances, i.e. the largest distance becomes the smallest distance
    # tf.nn.top_k returns the top k values and their indices, here k is controlled by the parameter knn_size 
    k_nearest_neighbour_values, k_nearest_neighbour_indices = tf.nn.top_k(tf.scalar_mul(-1,distance),k=knn_size)
    
    #Based on the indices we obtain from the previous step, we locate the exact class label set of the k closest matches in the training data
    best_training_labels = tf.gather(train_y,k_nearest_neighbour_indices)
    
    if knn_size==1:
        prediction = tf.squeeze(best_training_labels)
    else:
        # Now we make our prediction based on the class label that appears most frequently
        # tf.unique_with_counts() gives us all unique values that appear in a 1-D tensor along with their indices and counts 
        values, indices, counts = tf.unique_with_counts(best_training_labels)
        # This gives us the index of the class label that has repeated the most
        max_count_index = tf.argmax(counts,0)
        #Retrieve the required class label
        prediction = tf.gather(values,max_count_index)
    
    
    
    
    init = tf.global_variables_initializer()
    
    accuracy = 0.0
    
    with tf.Session() as sess:
        sess.run(init)
        # Looping through the test set to compare against the training set
        for i in range (len(test_x)):
    
    
            # Tensor flow method to get the prediction near to the test parameters in the training set.
            prediction_value = sess.run([prediction], feed_dict={training_values:train_x,test_values:test_x[i]})
    
            print("Test %d, and the prediction is %s, the real value is %s"%(i,prediction_value[0],test_y[i]))
            if prediction_value[0] == test_y[i]:
            # if prediction is right so accuracy increases.
                accuracy += 1. / len(test_x)
    
    print('Accuracy -> ', accuracy * 100, ' %')
    

相关问题