In a previous post, we went through the TensorFlow code for a multilayer perceptron. Now we will discuss how we train the model with TensorFlow, specifically in a TensorFlow Session.
We will use Aymeric Damien’s implementation in this post. I recommend you skim through the code first and have the code open in a separate window. I have included the key portions of the code below.
Procedures within a TensorFlow Session
Let’s take a look at the portion of code that works inside a TensorFlow session first.
```python
# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=",
                  "{:.9f}".format(avg_cost))
    print("Optimization Finished!")

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
```
Within a session, we:
- Initialise variables
- For each epoch:
    - For each batch:
        - Run (1) the optimisation op (backprop) and (2) the cost op (to get the loss value)
        - Accumulate the running average cost for the epoch
    - Display the average cost for this epoch
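The loop structure above can be sketched in plain Python, with a stand-in `train_step` in place of `sess.run` so it runs without TensorFlow (the dataset size, batch size, and cost value here are made up for illustration):

```python
num_examples = 600        # assumed; the real MNIST training set has 55,000 examples
batch_size = 100
training_epochs = 3

def train_step(batch_index):
    # Stand-in for sess.run([optimizer, cost], ...): pretend every batch costs 0.5.
    return 0.5

for epoch in range(training_epochs):
    avg_cost = 0.0
    total_batch = num_examples // batch_size   # 6 batches per epoch
    for i in range(total_batch):
        c = train_step(i)                      # run backprop, get this batch's cost
        avg_cost += c / total_batch            # accumulate the mean over batches
    print("Epoch:", epoch + 1, "cost =", avg_cost)
```

Note that dividing each batch cost by `total_batch` as we go is just an incremental way of averaging the per-batch costs over the epoch.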

Slightly more graphical
Unpacking the optimisation and cost operations
The meat of this is in running the optimisation and cost operations. Let’s unpack that section. The code is
```python
_, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
```
This runs the optimiser and cost operations with the input x = batch_x, y = batch_y. We’re feeding the model the input, hence feed_dict.
Return values:
- `_` holds whatever running the optimizer returns. We don't actually need that output, because the optimiser updates the model's weights and biases in place, so by convention we assign it to the throwaway name `_`.
- `c` holds the cost (loss) value for this batch.
What’s happening when we run optimizer and cost? We go further back into the code to find:
```python
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
```
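`softmax_cross_entropy_with_logits` applies a softmax to the network's raw output scores (the logits) and computes the cross-entropy against the one-hot labels in one numerically stable op. A NumPy sketch of the same computation (illustrative only; TensorFlow's implementation differs in detail):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Shift by the row max before exponentiating, for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy: negative log-probability assigned to the true class.
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[0.0, 0.0]])   # two classes, equal scores
labels = np.array([[1.0, 0.0]])   # one-hot true class
loss = softmax_cross_entropy(logits, labels)
print(loss)  # [0.6931...], i.e. ln(2): the softmax assigns probability 1/2 to the true class
```

`tf.reduce_mean` then averages this per-example loss over the batch, giving the scalar `cost` that the Adam optimizer minimises.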
Looking further back, we can see where pred comes from:
```python
# Construct model
pred = multilayer_perceptron(x, weights, biases)
```
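In Damien's script, `multilayer_perceptron` builds the forward pass: two hidden layers with ReLU activations, then a linear output layer whose raw scores (logits) feed into the softmax cross-entropy above. A NumPy sketch of that forward pass (the random weights and the exact layer sizes here are for illustration):

```python
import numpy as np

def multilayer_perceptron(x, weights, biases):
    # Hidden layer 1: affine transform followed by ReLU.
    layer_1 = np.maximum(x @ weights["h1"] + biases["b1"], 0.0)
    # Hidden layer 2: same pattern.
    layer_2 = np.maximum(layer_1 @ weights["h2"] + biases["b2"], 0.0)
    # Output layer: linear (no activation) -- these are the logits.
    return layer_2 @ weights["out"] + biases["out"]

rng = np.random.default_rng(0)
# MNIST-style sizes: 784 input pixels, two hidden layers of 256, 10 classes.
weights = {"h1": rng.normal(size=(784, 256)),
           "h2": rng.normal(size=(256, 256)),
           "out": rng.normal(size=(256, 10))}
biases = {"b1": np.zeros(256), "b2": np.zeros(256), "out": np.zeros(10)}

pred = multilayer_perceptron(rng.normal(size=(5, 784)), weights, biases)
print(pred.shape)  # (5, 10): one row of 10 class logits per input example
```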
What does all this mean? We can summarise it in a diagram:
Further reading: