Commit ab3e54c

Update ExecutionPlan design doc.
1 parent 617b8f6 commit ab3e54c


doc/design/program.md

Lines changed: 63 additions & 18 deletions
@@ -2,7 +2,7 @@
 
 ## Compile and Execution
 
-A PaddlePaddle program consists of three parts -- the first generates a `ProgramDesc` protobuf message that describes the program, the second optimizes this message using a C++ class `Optimizer` and generates an `ExecutionPlan` protobuf messages, and the third run the message using a C++ class `Executor`.
+A PaddlePaddle program consists of three parts -- the first generates a `ProgramDesc` protobuf message that describes the program, the second plans this message using a C++ class `Planner` and generates an `ExecutionPlan` protobuf message, and the third runs the plan using a C++ class `Executor`.
 
 A simple example PaddlePaddle program can be found in [graph.md](./graph.md):
 
@@ -15,7 +15,68 @@ optimize(cost)
 train(cost, reader=mnist.train())
 ```
 
-The first five lines of the following PaddlePaddle program generates, or, compiles, the `ProgramDesc` message. The last line optimizes and runs it.
+The first five lines of the above PaddlePaddle program generate, or
+compile, the `ProgramDesc` message. The last line runs it by
+generating the `ExecutionPlan` and sending it to the `Executor` for
+execution.
+
+
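To make the compile/plan/run split concrete, here is a minimal, hypothetical Python model of the flow. Every name in it is an illustrative stand-in: in the actual design `ProgramDesc` and `ExecutionPlan` are protobuf messages, and `Planner` and `Executor` are C++ classes.

```python
# A minimal, hypothetical model of the flow above; not the real API.
from dataclasses import dataclass, field

@dataclass
class ProgramDesc:                      # "what" to compute (user's program)
    ops: list = field(default_factory=list)

@dataclass
class ExecutionPlan:                    # "how" to compute it (per Executor)
    ops: list = field(default_factory=list)        # ops after planning
    placement: dict = field(default_factory=dict)  # op -> device

class Planner:
    def plan(self, desc: ProgramDesc, devices: list) -> ExecutionPlan:
        # Toy placement: run every op on the first available device.
        return ExecutionPlan(ops=list(desc.ops),
                             placement={op: devices[0] for op in desc.ops})

class Executor:
    def run(self, plan: ExecutionPlan) -> None:
        for op in plan.ops:
            print(f"run {op} on {plan.placement[op]}")

desc = ProgramDesc(ops=["mul", "add", "softmax"])      # "compile" step
Executor().run(Planner().plan(desc, devices=["CPU"]))  # plan + run step
```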
+<!-- The message will be the same regardless of which devices the program runs -->
+<!-- on: CPU/single GPU/multiple GPUs/multiple nodes. The `Planner` will -->
+<!-- take the `ProgramDesc` and the device information as the input, and -->
+<!-- output one `ExecutionPlan` per `Executor`. The `ExecutionPlan` will -->
+<!-- be different if the devices are different. -->
+
+### ProgramDesc
+
+The `ProgramDesc` describes the computation specified by the user, with
+the following requirements:
+
+1. It should be programming-language agnostic. Currently we have a
+   Python API that generates the `ProgramDesc`, but we could add
+   support for other languages later.
+
+1. It should **not** describe anything that is not specified by the
+   user. For example (see the sketch after this list):
+   1. The OPs for the backward pass added by PaddlePaddle.
+   1. Any optimizations to the program.
+   1. OP placement information that is not specified by the user.
+
+
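As an illustration of the second requirement, here is a hypothetical sketch of what a `ProgramDesc` might hold. The `OpDesc` fields are invented for this example; the point is what is *absent*: no backward OPs, no optimization passes, no device field.

```python
# Hypothetical sketch: the ProgramDesc records only what the user wrote.
from dataclasses import dataclass, field

@dataclass
class OpDesc:
    type: str          # e.g. "mul", "add"
    inputs: list
    outputs: list
    # Deliberately no `device` field: placement is not user-specified.

@dataclass
class ProgramDesc:
    ops: list = field(default_factory=list)

desc = ProgramDesc(ops=[
    OpDesc(type="mul", inputs=["X", "W"], outputs=["XW"]),
    OpDesc(type="add", inputs=["XW", "b"], outputs=["Out"]),
    # No "mul_grad"/"add_grad" here: the backward pass is added later by
    # the Planner, so it never appears in the ProgramDesc.
])
```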
+### ExecutionPlan
+
+The `ExecutionPlan` contains all the details of running the program,
+including which device each OP is placed on. One `Executor` could have
+multiple devices (e.g., CPU, GPUs), but it runs only one
+`ExecutionPlan`. In distributed training there will be `n`
+`ExecutionPlan`s for `n` `Executor`s, which jointly complete the
+`ProgramDesc` specified by the user.
+
+
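A hypothetical sketch of the distinction: the plans below carry the placement information that the `ProgramDesc` lacks, and distributed training produces one such plan per `Executor`. All op and device names are made up for illustration.

```python
# Hypothetical sketch: the ExecutionPlan records where every OP runs.
from dataclasses import dataclass

@dataclass
class ExecutionPlan:
    ops: list        # ops this executor runs, in order
    placement: dict  # op -> device, e.g. "GPU:0"

# One executor may own several devices, yet it still runs a single plan:
local_plan = ExecutionPlan(
    ops=["mul", "add", "add_grad", "mul_grad"],
    placement={"mul": "GPU:0", "add": "GPU:1",
               "add_grad": "GPU:1", "mul_grad": "GPU:0"})

# Distributed training: n plans for n executors (here n = 2); together
# they complete the user's ProgramDesc.
trainer_plan = ExecutionPlan(ops=["mul", "add", "send"],
                             placement={"mul": "GPU:0", "add": "GPU:0",
                                        "send": "CPU"})
pserver_plan = ExecutionPlan(ops=["recv", "sgd"],
                             placement={"recv": "CPU", "sgd": "CPU"})
```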
+### Planner
+
+The planner takes the `ProgramDesc` as input and outputs the
+`ExecutionPlan`; the steps (sketched after this list) are:
+
+1. Add necessary OPs that are not specified by the user to the
+   `ProgramDesc`. E.g., add the backward pass.
+
+1. Prune the unnecessary computations from the `ProgramDesc`.
+
+1. Transform the `ProgramDesc` given the available devices. E.g., add
+   data parallelism by splitting the input mini-batches and replicating
+   the OPs onto different GPUs.
+
+1. Generate the `ExecutionPlan` by placing each OP onto the available
+   devices; the placement information is written in the `ExecutionPlan`.
+
+1. In distributed training, split the `ExecutionPlan` into multiple
+   `ExecutionPlan`s and add send/recv OPs between them. For local
+   training, this step is not necessary since there is only one
+   executor.
+
+1. Send the `ExecutionPlan` to the executor for execution.
+
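Put together, the pipeline might look like the following toy sketch, in which a `ProgramDesc` is modeled as a plain list of op names and every helper is a trivial stand-in for the corresponding step above; the real `Planner` is a C++ class that rewrites protobuf messages, and the distributed split shown is only a caricature.

```python
# Toy, hypothetical model of the Planner pipeline (steps 1-6 above).

def add_backward_pass(ops):            # step 1: add OPs the user didn't write
    return ops + [op + "_grad" for op in reversed(ops)]

def prune(ops):                        # step 2: drop unneeded computation
    return [op for op in ops if op != "unused"]

def parallelize(ops, devices):         # step 3: e.g. data parallelism
    return ops                         # (kept as a no-op in this sketch)

def place_ops(ops, devices):           # step 4: record a device per OP
    return {op: devices[i % len(devices)] for i, op in enumerate(ops)}

def split_with_send_recv(plan):        # step 5: one plan per executor
    trainer = dict(plan, send="CPU")   # trainer computes, then sends grads
    pserver = {"recv": "CPU", "sgd": "CPU"}  # pserver applies the updates
    return [trainer, pserver]

def make_plans(ops, devices, is_distributed=False):
    ops = parallelize(prune(add_backward_pass(ops)), devices)
    placed = place_ops(ops, devices)
    return split_with_send_recv(placed) if is_distributed else [placed]

# Step 6 would ship each resulting plan to its executor.
print(make_plans(["mul", "add"], ["GPU:0", "GPU:1"], is_distributed=True))
```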
 
 ## Programs and Blocks
 
@@ -120,22 +181,6 @@ message AttrDesc {
 }
 ```
 
-## ProgramDesc and ExecutionPlan
-
-The goal of `ProgramDesc` is to describe **what** the user wants to calculate, and the goal of `ExecutionPlan` is to specify **how** to calculate it.
-
-For example, the `ExecutionPlan` has OP placement information to indicate which device the OP will run, but the `ProgramDesc` does not have this information since currently our Python API does not support manually pinning an OP onto a type of device (e.g., GPU or FPGA). On the other hand, the `ProgramDesc` should have information about if an OP belongs to an optimizer, this information is provided by the user and helps to place the OPs onto the parameter servers, but the `ExecutionPlan` does not have this information.
-
-### Optimizer
-
-The optimizer takes `ProgramDesc` as the input and outputs the `ExcutionPlan`, the steps are:
-1. Add the prgram in `ProgramDesc` and the coresponding backward pass program into the `ExecutionPlan`.
-1. Optimizes the program according to the avaiable devices.
-For example, add data parallelism by spliting the input mini-batches and replicating the OPs onto different GPUs. Note that even if the OPs are replicated on different GPUs, there is still only **one** execution plan. One executor runs and only runs one `ExecutionPlan`.
-1. Place each OP onto available devices, the placement information is written in the `ExecutionPlan`.
-1. In distributed training, split the `ExecutionPlan` into multiple `ExecutionPlans` and add send/recv OP between them. For local training, this step is not necessary since there is only one executor.
-1. Send the `ExecutionPlan` to the executor for execution.
-
 ## InferShape
 
 With this design, the InferShape function should take the following parameters:
