Neural Programmer-Interpreters (ICLR 2016 Best Paper Award), Scott Reed & Nando de Freitas
Google DeepMind, London, UK (citations: 19)
Presented by Katy, 2016/10/14
Motivation • ML is ultimately about automating tasks, hoping that machines can do everything for humans • For example, I want the machine to make a cup of coffee for me
Motivation • Ancient way: write a full, highly detailed program specification to carry the task out • AI way: collect many training examples that capture the variability of the real world, then train a general learning machine on this large data set.
Motivation • But sometimes the dataset is not big enough, and the model doesn't generalize well. • NPI is an attempt to use neural methods to train machines to carry out simple tasks based on a small amount of training data.
NPI Goals • 1. Long-term prediction: Model potentially long sequences of actions by exploiting compositional structure. • 2. Continual learning: Learn new programs by composing previously learned programs, rather than from scratch. • 3. Data efficiency: Learn generalizable programs from a small number of example traces. • 4. Interpretability: By looking at NPI's generated commands, we can understand what it is doing at multiple levels of temporal abstraction.
Related Work • Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems, 2014. • Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural Turing Machines." arXiv preprint arXiv:1410.5401 (2014).
Sequence to sequence learning with neural networks
Neural Turing Machines http://cpmarkchang.logdown.com/posts/279710-neural-network-neural-turing-machine
Outline • NPI core module: how it works • Demos • Experiment • Conclusion
NPI core module • The NPI core is an LSTM network that acts as a router between programs, conditioned on the current state observation and the previous hidden state. • Input: a learnable program embedding, program arguments passed in by the calling program, and a feature representation of the environment. • Output: a key indicating which program to call next, arguments for that program, and a flag indicating whether the current program should terminate. (A minimal sketch of one core step follows below.)
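A minimal PyTorch sketch of one NPI core step. The module names (f_enc, f_lstm, f_end, f_prog, f_arg) and the key/embedding memories (M_key, M_prog) mirror the paper's notation, but the layer sizes, nonlinearities, and key lookup shown here are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class NPICore(nn.Module):
        # Sketch of the NPI core; sizes and nonlinearities are assumptions.
        def __init__(self, state_dim, arg_dim, prog_dim, key_dim, hidden_dim, n_progs):
            super().__init__()
            self.f_enc = nn.Linear(state_dim + arg_dim, hidden_dim)       # fuse env features and args
            self.f_lstm = nn.LSTMCell(hidden_dim + prog_dim, hidden_dim)  # the core LSTM
            self.f_end = nn.Linear(hidden_dim, 1)                         # termination probability r(t)
            self.f_prog = nn.Linear(hidden_dim, key_dim)                  # key of the next program to call
            self.f_arg = nn.Linear(hidden_dim, arg_dim)                   # arguments for the next program
            self.M_key = nn.Parameter(torch.randn(n_progs, key_dim))      # program key memory
            self.M_prog = nn.Parameter(torch.randn(n_progs, prog_dim))    # program embedding memory

        def step(self, env_feat, args, prog_emb, hc):
            s = torch.tanh(self.f_enc(torch.cat([env_feat, args], dim=-1)))
            h, c = self.f_lstm(torch.cat([s, prog_emb], dim=-1), hc)
            r = torch.sigmoid(self.f_end(h))                 # should the current program stop?
            key = self.f_prog(h)                             # look up the next program by key
            next_idx = (self.M_key @ key.squeeze(0)).argmax()
            return r, self.M_prog[next_idx], self.f_arg(h), (h, c)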
Adding Numbers Together
Bubble Sort
Car Rendering • Whatever the starting pose, the program should generate a trajectory of actions that delivers the camera to the target view, e.g. a frontal pose at 15° elevation.
NPI Model
How it Works
How it Works • e: environment observation • a: program arguments • p: embedded program vector • r(t): probability that the current program should terminate (the overall inference loop is sketched after these slides)
How it Works
How it Works
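The inference procedure can be summarized as a recursive loop, in the spirit of the paper's inference algorithm. This is only a sketch: npi.fresh_lstm_state, npi.encode, npi.step, is_primitive, and apply_action are hypothetical helper names used to illustrate the control flow.

    def run(npi, env, prog, args, stop_threshold=0.5):
        # Run one program to completion. Each invocation keeps its own LSTM
        # state, so recursing into a subprogram does not disturb the caller.
        h = npi.fresh_lstm_state()
        prob_end = 0.0
        while prob_end < stop_threshold:
            state = npi.encode(env, args)                 # domain-specific environment encoder
            r, next_prog, next_args, h = npi.step(state, args, prog, h)
            prob_end = float(r)
            if is_primitive(next_prog):                   # ACT: elementary change to the environment
                env = apply_action(env, next_args)
            else:                                         # otherwise recurse into the subprogram
                env = run(npi, env, next_prog, next_args, stop_threshold)
        return env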
Outline • NPI core module: how it works • Demos • Experiment • Conclusion
Adding Numbers • Environment: • Scratch pad with the two numbers to be added, a carry row, and an output row. • 4 read/write pointer locations. • Programs: • LEFT, RIGHT: programs that move a carry pointer left or right, respectively. • WRITE: a program that writes a specified value to the location of a specified pointer.
Adding Numbers • Actual trace of the addition program generated by the model on the problem shown to the left.
Adding Numbers • All output actions (primitive atomic actions that can be performed on the environment) are performed with a single instruction, ACT. (A toy decomposition sketch follows below.)
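To make the compositional structure concrete, here is a toy, runnable sketch of how addition can decompose into subprograms. The names ADD, ADD1, and CARRY follow the paper's program set and the WRITE/LEFT/RIGHT idea above; the Python scratch-pad bookkeeping is an illustrative assumption, not the learned trace.

    def add_on_scratch_pad(a_digits, b_digits):
        # Digits are given least-significant first, e.g. 123 -> [3, 2, 1].
        n = max(len(a_digits), len(b_digits)) + 1
        a = a_digits + [0] * (n - len(a_digits))
        b = b_digits + [0] * (n - len(b_digits))
        carry = [0] * n
        out = [0] * n

        def CARRY(col):                      # mark a carry one column to the left
            carry[col + 1] = 1               # (LEFT, WRITE 1, RIGHT in the paper's trace)
        def ADD1(col):                       # single-column addition
            s = a[col] + b[col] + carry[col]
            out[col] = s % 10                # WRITE the output digit
            if s >= 10:
                CARRY(col)
        def ADD():                           # top-level program: sweep over all columns
            for col in range(n):             # LSHIFT moves to the next column each step
                ADD1(col)

        ADD()
        return out

    print(add_on_scratch_pad([6, 9], [7, 2]))   # 96 + 27 = 123 -> digits [3, 2, 1]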
Adding Numbers Together
Bubble Sort • Environment: • Scratch pad with the array to be sorted. • Read/write pointers. (A toy decomposition sketch follows after these slides.)

Bubble Sort
Bubble Sort
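For concreteness, a toy, runnable sketch of the bubble sort decomposition. The program names (BUBBLESORT, BUBBLE, COMPSWAP) follow the paper; the plain-list environment and the loop bounds here are illustrative assumptions, not the learned trace.

    def bubblesort_trace(values):
        a = list(values)
        def COMPSWAP(i):                     # swap the two pointed-at elements if out of order
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
        def BUBBLE():                        # one left-to-right pass; RSHIFT moves the pointers
            for i in range(len(a) - 1):
                COMPSWAP(i)
        def BUBBLESORT():                    # repeat a pass, then RESET the pointers to the start
            for _ in range(len(a)):
                BUBBLE()
        BUBBLESORT()
        return a

    print(bubblesort_trace([3, 1, 2]))       # -> [1, 2, 3]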
Car Rendering • Environment: • Rendering of the car (pixels), with a CNN used as the feature encoder. • The current car pose is NOT provided. • Target angle and elevation coordinates.

Car Rendering
Car Rendering • Whatever the starting pose, the program should generate a trajectory of actions that delivers the camera to the target view, e.g. a frontal pose at 15° elevation.
GOTO
HGOTO • horizontal goto
LGOTO • left goto
ACT • rotate by 15 degrees
Give control back to LGOTO
The core realizes the horizontal rotation is not finished yet
Control returns to GOTO (an illustrative sketch of this call hierarchy follows below)
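A toy, runnable sketch of the call hierarchy suggested by the trace above (GOTO calling HGOTO, which calls LGOTO, which issues 15-degree ACTs). The 15-degree primitive and the GOTO/HGOTO/LGOTO names come from the slides; the VGOTO counterpart and the simple angle bookkeeping are assumptions made for illustration.

    def goto_target(azimuth, elevation, target_az, target_el, step=15):
        # Angles are assumed to be multiples of 15 degrees.
        trace = []
        def ACT(name):                        # primitive action: one 15-degree move
            trace.append(name)
        def LGOTO():                          # keep rotating left until the azimuth matches
            nonlocal azimuth
            while azimuth != target_az:
                ACT("ROTATE_LEFT_15")
                azimuth = (azimuth - step) % 360
        def HGOTO():                          # horizontal goto (only the "left" case shown)
            LGOTO()
        def VGOTO():                          # vertical goto (assumed symmetric counterpart)
            nonlocal elevation
            while elevation != target_el:
                ACT("ELEVATE_15")
                elevation = (elevation + step) % 360
        def GOTO():                           # top level: horizontal first, then vertical
            HGOTO()
            VGOTO()
        GOTO()
        return trace

    print(goto_target(azimuth=45, elevation=0, target_az=0, target_el=15))
    # -> ['ROTATE_LEFT_15', 'ROTATE_LEFT_15', 'ROTATE_LEFT_15', 'ELEVATE_15']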
Outline • NPI core module: how it works • Demos • Experiment • Conclusion
Experiments • Data Efficiency • Generalization • Learning new programs with a fixed NPI core
Data Efficiency - Sorting • Seq2Seq LSTM and NPI used the same number of layers and hidden units. • Trained on length-20 arrays of single-digit numbers. • NPI benefits from mining multiple subprogram examples per sorting instance. (Plot: accuracy vs. number of training examples.)
Generalization - Sorting • For each length from 2 up to 20, we provided 64 example bubble sort traces, for a total of 1216 examples. • Then we evaluated whether the network can learn to sort arrays longer than 20.
Generalization - Adding • Only trained on sequences of length up to 20.
Learning New Programs with a Fixed NPI Core • Example task: find the max of an array. • RJMP: move all pointers to the right by repeatedly calling the RSHIFT program. • MAX: call BUBBLESORT and then RJMP. • Expand program memory by adding 2 slots. Randomly initialize them, then learn by backpropagation with the NPI core and all other parameters fixed.
• 1. Randomly initialize new program vectors in memory • 2. Freeze core and other program vectors • 3. Backpropagate gradients to new program vectors
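A minimal PyTorch sketch of this procedure: only the two new rows of the key and embedding memories are trainable, while the core and all existing programs stay frozen. Here npi, new_program_traces, and trace_loss are hypothetical placeholders, not the paper's API.

    import torch

    for p in npi.parameters():
        p.requires_grad_(False)                        # freeze core + existing programs

    new_keys = torch.randn(2, npi.key_dim, requires_grad=True)    # slots for RJMP and MAX
    new_embs = torch.randn(2, npi.prog_dim, requires_grad=True)

    opt = torch.optim.Adam([new_keys, new_embs], lr=1e-3)
    for trace in new_program_traces:                   # execution traces of the new programs
        loss = trace_loss(npi, trace, new_keys, new_embs)
        opt.zero_grad()
        loss.backward()                                # gradients reach only the new slots
        opt.step()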
• "+ Max": performance after the addition of the MAX program to memory.
• "unseen": a test set with car models disjoint from the training set.
Outline • NPI core module: how it works • Demos • Experiment • Conclusion
Conclusion (1/2) • NPI is an RNN/LSTM-based sequence-to-sequence model that keeps track of calling programs while recursing into subprograms. • NPI generalizes well compared to sequence-to-sequence LSTMs. • A trained NPI with a fixed core can learn new tasks without forgetting the old ones.
Conclusion (2/2) • Provide far fewer examples, but with labels that contain richer information, allowing the model to learn compositional structure (it's like sending kids to school).
Further Discussion • Can the tasks help each other during training? • Can we share the environment encoder across tasks? • Any comments? Project page: http://www-personal.umich.edu/~reedscot/iclr_project.html
