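The loss of my neural network on the training set keeps increasing, even though I am minimizing it with my optimizer. I am training a neural network for regression. It has 327 input values, 2 hidden layers (256 and 128 nodes), and an output layer with 3 values (the estimated reward for each of the 3 choices it can make).
I use ReLU on every layer except the output layer, and I use the Adam optimizer. To keep things simple I use a training set of only 10 examples (I have also tried larger datasets).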
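To make the architecture concrete, here is a minimal NumPy sketch of a single forward pass with the layer sizes described above (327 → 256 → 128 → 3, ReLU on the hidden layers, linear output). It only checks shapes; the random binary inputs and the seed are illustrative assumptions, not the actual training data.

```python
import numpy as np

# Layer sizes from the question: 327 inputs -> 256 -> 128 -> 3 outputs
sizes = [327, 256, 128, 3]

def init_params(rng):
    """Standard-normal weights and biases, mirroring tf.random_normal in the question."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        params.append((rng.standard_normal((n_in, n_out)), rng.standard_normal(n_out)))
    return params

def forward(x, params):
    """ReLU on the hidden layers, linear (no activation) on the output layer."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)
    W_out, b_out = params[-1]
    return h @ W_out + b_out  # one estimated reward per choice

rng = np.random.default_rng(0)
params = init_params(rng)
# Hypothetical batch of 10 integer-valued inputs, cast to float as tf.cast does
x = rng.integers(0, 2, size=(10, 327)).astype(np.float64)
y_hat = forward(x, params)
print(y_hat.shape)  # (10, 3)
```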
Neural network code:
```python
import tensorflow as tf
import numpy as np
from random import choice

training_input = []
training_labels = []

# Read input data
with open("mini_training_set.csv", "r") as f:
    for line in f:
        line_string = line.split(",")
        # Input data is of type int (list() so this also works on Python 3)
        training_input.append(list(map(int, line_string[:327])))
        # Label data is of type float
        training_labels.append(list(map(float, line_string[327:])))

# Parameters
learning_rate = 0.01
training_epochs = 2000
batch_size = 10  # Only one batch per epoch
display_step = 1  # Display values after each epoch

# Network Parameters
input = 327
hidden_1 = 256  # 1st hidden layer
hidden_2 = 128  # 2nd hidden layer
output = 3

# tf Graph input, output
tf_input = tf.placeholder(tf.int32, [None, input])
tf_output = tf.placeholder(tf.float32, [None, output])

# Store layers weights (initialize with normal distribution)
weights = {
    'h1': tf.Variable(tf.random_normal([input, hidden_1])),
    'h2': tf.Variable(tf.random_normal([hidden_1, hidden_2])),
    'out': tf.Variable(tf.random_normal([hidden_2, output]))
}
# Store layers biases
biases = {
    'b1': tf.Variable(tf.random_normal([hidden_1])),
    'b2': tf.Variable(tf.random_normal([hidden_2])),
    'out': tf.Variable(tf.random_normal([output]))
}

# Create model
def fully_conected_neural_net(x, weights, biases):
    # Fully connected hidden layer 1
    fc1 = tf.add(tf.matmul(tf.cast(x, tf.float32), weights['h1']), biases['b1'])
    fc1 = tf.nn.relu(fc1)
    # Fully connected hidden layer 2
    fc2 = tf.add(tf.matmul(fc1, weights['h2']), biases['b2'])
    fc2 = tf.nn.relu(fc2)
    # Output (linear, no activation)
    out = tf.add(tf.matmul(fc2, weights['out']), biases['out'])
    return out

# Construct model
model = fully_conected_neural_net(tf_input, weights, biases)

# Define loss and optimizer
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=model, labels=tf_output))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    batches_per_epoch = int(len(training_input) / batch_size)  # One in this case
    # Training cycle
    for epoch in range(training_epochs):
        avg_loss = 0.  # Loss per epoch
        # Loop over all batches (1 in this case)
        for i in range(batches_per_epoch):
            input_batch = training_input[i * batch_size:(i + 1) * batch_size]
            label_batch = training_labels[i * batch_size:(i + 1) * batch_size]
            # Run optimization and check loss
            _, batch_loss = sess.run([optimizer, loss], feed_dict={tf_input: input_batch, tf_output: label_batch})
            avg_loss += batch_loss / batches_per_epoch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch + 1), " loss={:.9f} ".format(avg_loss))
```
Output for some of the epochs:
```
("Epoch:","0001"," loss=0.138995901 ")
("Epoch:","0101"," loss=0.206539005 ")
("Epoch:","0200"," loss=2.097094059 ")
("Epoch:","0301"," loss=7.385912895 ")
("Epoch:","0400","loss=10.268421173 ")
("Epoch:","2000"," loss=725.435485840 ")
```
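Where does this behavior come from? Shouldn't the loss on the training set converge to zero (I know the model would then overfit with respect to the validation set, but it should at least do that)? Am I doing something wrong?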
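For reference when weighing the "should it converge to zero" question: a small pure-Python sketch of the per-example quantity that `tf.nn.softmax_cross_entropy_with_logits_v2` computes, -Σ y_i log softmax(z)_i, next to a mean squared error. The label vector `[0.5, 0.3, 0.2]` is a made-up reward vector, not taken from the training set. It shows that even at its optimum (softmax(z) equal to the labels) the cross-entropy equals the entropy of the labels, which is zero only for one-hot labels, whereas the squared error reaches zero when predictions equal targets.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(logits, labels):
    """-sum_i labels[i] * log(softmax(logits)[i]) for a single example."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(labels, probs))

def mse(pred, target):
    """Mean squared error for a single example."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

labels = [0.5, 0.3, 0.2]                     # hypothetical reward vector (sums to 1)
best_logits = [math.log(y) for y in labels]  # softmax(best_logits) == labels
ce_min = softmax_cross_entropy(best_logits, labels)
entropy = -sum(y * math.log(y) for y in labels)

print(round(ce_min, 4))     # 1.0297 (equals the entropy of the labels)
print(mse(labels, labels))  # 0.0
```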