CME 323:
TensorFlow Tutorial
Bharath Ramsundar
Deep-Learning Package Zoo
● Torch
● Caffe
● Theano (Keras, Lasagne)
● cuDNN
● TensorFlow
● MXNet
● Etc.
Deep-Learning Package Design Choices
● Model specification: Configuration file (e.g. Caffe,
DistBelief, CNTK) versus programmatic generation (e.g.
(Py)Torch, Theano, Tensorflow)
● Static graphs (TensorFlow, Theano) versus dynamic graphs
(PyTorch, TensorFlow Eager)
What is TensorFlow?
● TensorFlow is a deep learning library
recently open-sourced by Google.
● Extremely popular (4th most popular
software project on GitHub; more popular
than React...)
● But what does it actually do?
○ TensorFlow provides primitives for
defining functions on tensors and
automatically computing their derivatives.
But what’s a Tensor?
● Formally, tensors are multilinear maps from vector spaces
to the real numbers (V a vector space, V* its dual space):
f : V* × ... × V* × V × ... × V → R
● A scalar is a tensor (f : R → R, f(e_1) = c)
● A vector is a tensor (f : R^n → R, f(e_i) = v_i)
● A matrix is a tensor (f : R^n × R^m → R, f(e_i, e_j) = A_ij)
● Common to have a fixed basis, so a tensor can be
represented as a multidimensional array of numbers (see the
short numpy sketch below).
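For example, a small numpy sketch (numpy used only because it reappears on
the next slide): with a fixed basis, rank-0 through rank-3 tensors are just
multidimensional arrays.

import numpy as np

scalar = np.array(3.0)            # rank-0 tensor: a single number
vector = np.array([1.0, 2.0])     # rank-1 tensor: a 1-d array
matrix = np.eye(2)                # rank-2 tensor: a 2-d array
cube   = np.zeros((2, 2, 2))      # rank-3 tensor: a 3-d array
print(scalar.ndim, vector.ndim, matrix.ndim, cube.ndim)   # 0 1 2 3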
TensorFlow vs. Numpy
● Few people make this comparison, but TensorFlow and
Numpy are quite similar. (Both are N-d array libraries!)
● Numpy has Ndarray support, but doesn’t offer methods to
create tensor functions and automatically compute
derivatives (+ no GPU support).
Simple Numpy Recap
In [23]: import numpy as np
In [24]: a = np.zeros((2,2)); b = np.ones((2,2))
In [25]: np.sum(b, axis=1)
Out[25]: array([ 2., 2.])
In [26]: a.shape
Out[26]: (2, 2)
In [27]: np.reshape(a, (1,4))
Out[27]: array([[ 0., 0., 0., 0.]])
Repeat in TensorFlow
In [31]: import tensorflow as tf
In [32]: tf.InteractiveSession()                        (more on Session soon)
In [33]: a = tf.zeros((2,2)); b = tf.ones((2,2))
In [34]: tf.reduce_sum(b, reduction_indices=1).eval()   (more on .eval() in a few slides)
Out[34]: array([ 2., 2.], dtype=float32)
In [35]: a.get_shape()                                  (TensorShape behaves like a python tuple)
Out[35]: TensorShape([Dimension(2), Dimension(2)])
In [36]: tf.reshape(a, (1, 4)).eval()
Out[36]: array([[ 0., 0., 0., 0.]], dtype=float32)
Numpy to TensorFlow Dictionary
Numpy                                        TensorFlow
a = np.zeros((2,2)); b = np.ones((2,2))      a = tf.zeros((2,2)); b = tf.ones((2,2))
np.sum(b, axis=1)                            tf.reduce_sum(b, reduction_indices=[1])
a.shape                                      a.get_shape()
np.reshape(a, (1,4))                         tf.reshape(a, (1,4))
b * 5 + 1                                    b * 5 + 1
np.dot(a, b)                                 tf.matmul(a, b)
a[0,0], a[:,0], a[0,:]                       a[0,0], a[:,0], a[0,:]
TensorFlow requires explicit evaluation!
In [37]: a = np.zeros((2,2))
In [38]: ta = tf.zeros((2,2))
In [39]: print(a)
[[ 0. 0.]
 [ 0. 0.]]
In [40]: print(ta)
Tensor("zeros_1:0", shape=(2, 2), dtype=float32)
In [41]: print(ta.eval())
[[ 0. 0.]
 [ 0. 0.]]
TensorFlow computations define a computation graph that has no numerical
value until evaluated! (TensorFlow Eager has begun to change this state of
affairs...)
TensorFlow Session Object (1)
● “A Session object encapsulates the environment in which
Tensor objects are evaluated” - TensorFlow Docs
In [20]: a = tf.constant(5.0)
In [21]: b = tf.constant(6.0)
In [22]: c = a * b
In [23]: with tf.Session() as sess:
   ....:     print(sess.run(c))
   ....:     print(c.eval())
   ....:
30.0
30.0
c.eval() is just syntactic sugar for sess.run(c) in the currently active
session!
TensorFlow Session Object (2)
● tf.InteractiveSession() is just convenient syntactic
sugar for keeping a default session open in ipython (see the
short sketch below).
● sess.run(c) is an example of a TensorFlow Fetch. Will
say more on this soon.
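A minimal sketch of that equivalence (TF 1.x API): once an InteractiveSession
is installed as the default session, tensor.eval() and sess.run(tensor) return
the same thing.

import tensorflow as tf

sess = tf.InteractiveSession()          # becomes the default session
c = tf.constant(5.0) * tf.constant(6.0)
print(c.eval())                         # 30.0, uses the default session
print(sess.run(c))                      # 30.0, the explicit equivalent
sess.close()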
Tensorflow Computation Graph
● “TensorFlow programs are usually structured into a
construction phase, that assembles a graph, and an
execution phase that uses a session to execute ops in the
graph.” - TensorFlow docs
● All computations add nodes to the global default graph (docs);
a small sketch of the two phases follows below.
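A minimal sketch of the two phases (TF 1.x API; the constants are arbitrary):
nothing is computed while the graph is assembled, and the Session does the work.

import tensorflow as tf

# Construction phase: these calls only add nodes to the default graph.
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b                                   # adds a multiply node; no value yet

print(c.graph is tf.get_default_graph())    # True: ops land in the default graph

# Execution phase: a Session evaluates the requested nodes.
with tf.Session() as sess:
    print(sess.run(c))                      # 6.0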
TensorFlow Variables (1)
● “When you train a model you use variables to hold and
update parameters. Variables are in-memory buffers
containing tensors” - TensorFlow Docs.
● All tensors we’ve used previously have been constant
tensors, not variables.
TensorFlow Variables (2)
In [32]: W1 = tf.ones((2,2))
In [33]: W2 = tf.Variable(tf.zeros((2,2)), name="weights")
In [34]: with tf.Session() as sess:
   ....:     print(sess.run(W1))
   ....:     sess.run(tf.initialize_all_variables())
   ....:     print(sess.run(W2))
   ....:
[[ 1. 1.]
 [ 1. 1.]]
[[ 0. 0.]
 [ 0. 0.]]
Note the initialization step tf.initialize_all_variables().
TensorFlow Variables (3)
● TensorFlow variables must be initialized before they have
values! Contrast with constant tensors.
Variable objects can be initialized from constants or random values.
In [38]: W = tf.Variable(tf.zeros((2,2)), name="weights")
In [39]: R = tf.Variable(tf.random_normal((2,2)), name="random_weights")
In [40]: with tf.Session() as sess:
   ....:     sess.run(tf.initialize_all_variables())   # initializes all variables with their specified values
   ....:     print(sess.run(W))
   ....:     print(sess.run(R))
   ....:
Updating Variable State
In [63]: state = tf.Variable(0, name="counter")
In [64]: new_value = tf.add(state, tf.constant(1))     (roughly: new_value = state + 1)
In [65]: update = tf.assign(state, new_value)          (roughly: state = new_value)
In [66]: with tf.Session() as sess:
   ....:     sess.run(tf.initialize_all_variables())
   ....:     print(sess.run(state))
   ....:     for _ in range(3):
   ....:         sess.run(update)
   ....:         print(sess.run(state))
   ....:
0
1
2
3
Roughly equivalent python:
state = 0
print(state)
for _ in range(3):
    state = state + 1
    print(state)
Fetching Variable State (1)
Calling sess.run(var) on a tf.Session() object retrieves its value. Can
retrieve multiple variables simultaneously with sess.run([var1, var2]).
(See Fetches in TF docs)
In [82]: input1 = tf.constant(3.0)
In [83]: input2 = tf.constant(2.0)
In [84]: input3 = tf.constant(5.0)
In [85]: intermed = tf.add(input2, input3)
In [86]: mul = tf.multiply(input1, intermed)
In [87]: with tf.Session() as sess:
   ....:     result = sess.run([mul, intermed])
   ....:     print(result)
   ....:
[21.0, 7.0]
Fetching Variable State (2)
Inputting Data
● All previous examples have manually defined tensors.
How can we input external data into TensorFlow?
● Simple solution: Import from Numpy:
In [93]: a = np.zeros((3,3))
In [94]: ta = tf.convert_to_tensor(a)
In [95]: with tf.Session() as sess:
   ....:     print(sess.run(ta))
....:
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
Placeholders and Feed Dictionaries (1)
● Inputting data with tf.convert_to_tensor() is
convenient, but doesn’t scale.
● Use tf.placeholder variables (dummy nodes that
provide entry points for data into the computation graph).
● A feed_dict is a python dictionary mapping from
tf.placeholder vars (or their names) to data (numpy
arrays, lists, etc.).
Placeholders and Feed Dictionaries (2)
In [96]: input1 = tf.placeholder(tf.float32)    (define tf.placeholder
In [97]: input2 = tf.placeholder(tf.float32)     objects for data entry)
In [98]: output = tf.multiply(input1, input2)
In [99]: with tf.Session() as sess:
   ....:     print(sess.run([output], feed_dict={input1:[7.], input2:[2.]}))
   ....:
[array([ 14.], dtype=float32)]
Fetch the value of output from the computation graph; feed data into the
computation graph via feed_dict.
Placeholders and Feed Dictionaries (3)
Variable Scope (1)
● Complicated TensorFlow models can have hundreds of
variables.
○ tf.variable_scope() provides simple name-spacing
to avoid clashes.
○ tf.get_variable() creates/accesses variables from
within a variable scope.
Variable Scope (2)
● Variable scope is a simple type of namespacing that adds
prefixes to variable names within scope
with tf.variable_scope("foo"):
with tf.variable_scope("bar"):
v = tf.get_variable("v", [1])
assert v.name == "foo/bar/v:0"
Variable Scope (3)
● Variable scopes control variable (re)use
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
tf.get_variable_scope().reuse_variables()
v1 = tf.get_variable("v", [1])
assert v1 == v
Understanding get_variable (1)
● Behavior depends on whether variable reuse enabled
● Case 1: reuse set to false
○ Create and return new variable
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
assert v.name == "foo/v:0"
Understanding get_variable (2)
● Case 2: Variable reuse set to true
○ Search for existing variable with given name. Raise
ValueError if none found.
with tf.variable_scope("foo"):
v = tf.get_variable("v", [1])
with tf.variable_scope("foo", reuse=True):
v1 = tf.get_variable("v", [1])
assert v1 == v
Ex: Linear Regression in TensorFlow (1)
import numpy as np
import matplotlib.pyplot as plt
import seaborn
# Define input data
X_data = np.arange(100, step=.1)
y_data = X_data + 20 * np.sin(X_data/10)
# Plot input data
plt.scatter(X_data, y_data)
Ex: Linear Regression in TensorFlow (2)
# Define data size and batch size
n_samples = 1000
batch_size = 100
# TensorFlow is finicky about shapes, so resize
X_data = np.reshape(X_data, (n_samples, 1))
y_data = np.reshape(y_data, (n_samples, 1))
# Define placeholders for input
X = tf.placeholder(tf.float32, shape=(batch_size, 1))
y = tf.placeholder(tf.float32, shape=(batch_size, 1))
Ex: Linear Regression in TensorFlow (3)
# Define variables to be learned (note: reuse=False here, so
# these variables are created anew)
with tf.variable_scope("linear-regression"):
    W = tf.get_variable("weights", (1, 1),
                        initializer=tf.random_normal_initializer())
    b = tf.get_variable("bias", (1,),
                        initializer=tf.constant_initializer(0.0))
    y_pred = tf.matmul(X, W) + b
    loss = tf.reduce_sum((y - y_pred)**2/n_samples)
Ex: Linear Regression in TensorFlow (4)
# Sample code to run one step of gradient descent
In [136]: opt = tf.train.AdamOptimizer()
In [137]: opt_operation = opt.minimize(loss)
In [138]: with tf.Session() as sess:
   .....:     sess.run(tf.initialize_all_variables())
   .....:     # Feed one batch (the placeholders expect shape (batch_size, 1))
   .....:     sess.run([opt_operation], feed_dict={X: X_data[:batch_size], y: y_data[:batch_size]})
   .....:
Note that TensorFlow scope is not python scope! The python variable loss is
still visible here.
But how does this actually work under the hood? Will return to TensorFlow
computation graphs and explain.
Ex: Linear Regression in TensorFlow (5)
# Sample code to run full gradient descent:
# Define optimizer operation
opt_operation = tf.train.AdamOptimizer().minimize(loss)
with tf.Session() as sess:
    # Initialize Variables in graph
    sess.run(tf.initialize_all_variables())
    # Gradient descent loop for 500 steps
    for _ in range(500):
        # Select random minibatch
        indices = np.random.choice(n_samples, batch_size)
        X_batch, y_batch = X_data[indices], y_data[indices]
        # Do gradient descent step
        _, loss_val = sess.run([opt_operation, loss], feed_dict={X: X_batch, y: y_batch})
Let's do a deeper graphical dive into this operation.
Ex: Linear Regression in TensorFlow (6)
Ex: Linear Regression in TensorFlow (7)
The learned model offers a nice fit to the data.
Concept: Auto-Differentiation
● The linear regression example computed an L2 loss for the model.
How can we fit the model to the data?
○ tf.train.Optimizer (e.g. tf.train.AdamOptimizer) creates an optimizer.
○ tf.train.Optimizer.minimize(loss, var_list) adds an
optimization operation to the computation graph (sketched below).
● Automatic differentiation computes gradients without user
input!
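For intuition, a rough sketch (TF 1.x API, reusing the loss tensor from the
regression example): minimize() is essentially compute_gradients() followed
by apply_gradients().

import tensorflow as tf

opt = tf.train.AdamOptimizer()
# Auto-differentiation produces a gradient for every trainable variable.
grads_and_vars = opt.compute_gradients(loss)
# Applying those gradients adds the parameter-update ops to the graph.
train_op = opt.apply_gradients(grads_and_vars)   # same effect as opt.minimize(loss)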
TensorFlow Gradient Computation
● TensorFlow nodes in computation graph have attached
gradient operations.
● Use backpropagation (using node-specific gradient ops) to
compute the required gradients for all variables in the graph; a small
tf.gradients sketch follows below.
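A minimal sketch (TF 1.x API; the function y = x^2 + 2x is an arbitrary
example) of requesting a symbolic gradient directly with tf.gradients, the
same machinery the optimizers use:

import tensorflow as tf

x = tf.Variable(3.0, name="x")
y = x * x + 2.0 * x                  # y = x^2 + 2x

# tf.gradients walks the graph backwards, chaining each node's gradient op.
dy_dx = tf.gradients(y, [x])[0]      # symbolic node for dy/dx = 2x + 2

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(dy_dx))           # 8.0 at x = 3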
TensorFlow Gotchas/Debugging (1)
● Convert tensors to numpy arrays and print them (see the small
sketch after this list).
● TensorFlow is fastidious about types and shapes. Check
that types/shapes of all tensors match.
● TensorFlow API is less mature than Numpy API. Many
advanced Numpy operations (e.g. complicated array
slicing) not supported yet!
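A small debugging sketch along those lines (TF 1.x API; the tensor here is a
made-up example): pull the tensor back into numpy and check its type, dtype,
and shape.

import tensorflow as tf

sess = tf.InteractiveSession()
t = tf.reshape(tf.range(6), (2, 3))     # a hypothetical tensor to inspect

arr = t.eval()                          # or sess.run(t); returns a numpy ndarray
print(type(arr), arr.dtype, arr.shape)  # check types/shapes match expectations
print(arr)
sess.close()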
TensorFlow Gotchas/Debugging (2)
● If you’re stuck, try making a pure Numpy implementation
of forward computation.
● Then look for the analog of each Numpy function in the
TensorFlow API.
● Use tf.InteractiveSession() to experiment in the shell.
Trial and error works!
● We didn’t cover it, but TensorFlow Eager is a great tool for
experimentation!
TensorBoard
● TensorFlow has some neat
built-in visualization tools
(TensorBoard).
● We encourage you to check it
out for your projects; a minimal logging sketch follows below.
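A minimal logging sketch (TF 1.x summary API; the log directory /tmp/tf_logs
is an arbitrary choice): write the graph plus one scalar summary, then point
TensorBoard at the directory.

import tensorflow as tf

x = tf.constant(3.0)
y = x * x
y_summary = tf.summary.scalar("y", y)            # scalar summary op

with tf.Session() as sess:
    # Writing the graph lets TensorBoard draw the computation graph.
    writer = tf.summary.FileWriter("/tmp/tf_logs", sess.graph)
    summary = sess.run(y_summary)
    writer.add_summary(summary, global_step=0)
    writer.close()

# Then launch the dashboard with:  tensorboard --logdir=/tmp/tf_logs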