doc/design/program.md (+63 −18)
@@ -2,7 +2,7 @@
## Compile and Execution
-A PaddlePaddle program consists of three parts -- the first generates a `ProgramDesc` protobuf message that describes the program, the second optimizes this message using a C++ class `Optimizer` and generates an `ExecutionPlan` protobuf message, and the third runs the message using a C++ class `Executor`.
+A PaddlePaddle program consists of three parts -- the first generates a `ProgramDesc` protobuf message that describes the program, the second plans this message using a C++ class `Planner` and generates an `ExecutionPlan` protobuf message, and the third runs the plan using a C++ class `Executor`.
A simple example PaddlePaddle program can be found in [graph.md](./graph.md):
@@ -15,7 +15,68 @@ optimize(cost)
train(cost, reader=mnist.train())
```
-The first five lines of the following PaddlePaddle program generate, or compile, the `ProgramDesc` message. The last line optimizes and runs it.
+The first five lines of the following PaddlePaddle program generate, or compile, the `ProgramDesc` message. The last line runs it by generating the `ExecutionPlan` and sending it to the `Executor` for execution.
+<!-- The message will be the same regardless of which devices the program runs -->
+<!-- on: CPU/single GPU/multiple GPU/multiple nodes. The `Planner` will -->
+<!-- take the `ProgramDesc` and the device information as the input, and -->
+<!-- output one `ExecutionPlan` per `Executor`. The `ExecutionPlan` will -->
+<!-- be different if the devices are different. -->
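
To make the compile-plan-execute flow concrete, here is a minimal C++ sketch of how the three components could fit together. All class and method names are illustrative assumptions, not the actual PaddlePaddle API.

```cpp
// Illustrative sketch only -- names are assumptions, not PaddlePaddle's API.
#include <vector>

struct ProgramDesc {};    // "what" to compute; generated by the Python API
struct ExecutionPlan {};  // "how" to compute it; consumed by one Executor
struct DeviceInfo {};     // the devices available to one Executor

class Planner {
 public:
  // Takes the ProgramDesc and the device information, and produces one
  // ExecutionPlan per Executor.
  std::vector<ExecutionPlan> Plan(const ProgramDesc& program,
                                  const std::vector<DeviceInfo>& devices);
};

class Executor {
 public:
  void Run(const ExecutionPlan& plan);  // runs one, and only one, plan
};
```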
+### ProgramDesc

+The `ProgramDesc` describes the computation specified by the user, with the following requirements:

+1. It should be programming language agnostic. Currently we have a Python API that generates the `ProgramDesc`, but we could add support for other languages later.

+1. It should **not** describe anything that is not specified by the user. For example:
+   1. The OPs for the backward pass added by PaddlePaddle.
+   1. Any optimizations to the program.
+   1. OP placement information that is not specified by the user.
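
A hypothetical C++ rendering of these requirements may make them easier to see; the field names are assumptions, chosen to highlight what is deliberately left out:

```cpp
// Illustrative sketch -- field names are assumptions, not the real message.
#include <string>
#include <vector>

struct OpDesc {
  std::string type;                  // e.g. "mul"; only user-specified OPs,
  std::vector<std::string> inputs;   // never framework-added backward OPs
  std::vector<std::string> outputs;
  // Deliberately absent: a `device` field. The user cannot pin OPs to
  // devices, so placement belongs in the ExecutionPlan, not here.
};

struct ProgramDesc {
  std::vector<OpDesc> ops;  // the user's computation, unoptimized
};
```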
+### ExecutionPlan

+The `ExecutionPlan` contains all the details of running the program, including which device each OP is placed on. One `Executor` could have multiple devices (e.g., CPU, GPUs), but it runs only one `ExecutionPlan`. In distributed training there will be `n` `ExecutionPlan`s for `n` `Executor`s, which jointly complete the `ProgramDesc` specified by the user.
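
As a sketch, the placement could be recorded per OP, with one plan produced per executor in distributed training. The names below are assumptions:

```cpp
// Illustrative sketch -- names are assumptions, not the real message.
#include <string>
#include <vector>

struct PlannedOp {
  std::string type;
  std::string device;  // e.g. "CPU" or "GPU:0"; placement lives in the plan
};

struct ExecutionPlan {
  // Fully placed OPs, including any the Planner added (backward, send/recv).
  std::vector<PlannedOp> ops;
};

// Distributed training: n plans for n executors; together the n plans
// complete the user's ProgramDesc.
using DistributedPlan = std::vector<ExecutionPlan>;
```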
+### Planner

+The planner takes the `ProgramDesc` as input and outputs the `ExecutionPlan`. The steps are:

+1. Add necessary OPs that are not specified by the user to the `ProgramDesc`. E.g., add the backward pass.

+1. Prune the unnecessary computations from the `ProgramDesc`.

+1. Transform the `ProgramDesc` given the available devices. E.g., add data parallelism by splitting the input mini-batches and replicating the OPs onto different GPUs.

+1. Generate the `ExecutionPlan` by placing each OP onto the available devices; the placement information is written in the `ExecutionPlan`.

+1. In distributed training, split the `ExecutionPlan` into multiple `ExecutionPlan`s and add send/recv OPs between them. For local training, this step is not necessary since there is only one executor.

+1. Send the `ExecutionPlan` to the executor for execution.
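
Read as a pipeline, the steps above could look like the following C++ sketch, where every helper is a hypothetical placeholder for one numbered step:

```cpp
// Illustrative sketch -- every function is a hypothetical placeholder.
#include <vector>

struct ProgramDesc {};
struct ExecutionPlan {};
struct DeviceInfo {};

void AddBackwardPass(ProgramDesc*) {}  // step 1 (body elided)
void Prune(ProgramDesc*) {}            // step 2 (body elided)
void Transform(ProgramDesc*, const std::vector<DeviceInfo>&) {}  // step 3
ExecutionPlan Place(const ProgramDesc&,
                    const std::vector<DeviceInfo>&) { return {}; }  // step 4
std::vector<ExecutionPlan> SplitWithSendRecv(const ExecutionPlan& plan) {
  return {plan};  // step 5 (the real splitting and send/recv OPs elided)
}

std::vector<ExecutionPlan> Plan(ProgramDesc program,
                                const std::vector<DeviceInfo>& devices,
                                bool distributed) {
  AddBackwardPass(&program);
  Prune(&program);
  Transform(&program, devices);
  ExecutionPlan plan = Place(program, devices);
  // Step 6: the caller sends the returned plan(s) to the executor(s).
  if (distributed) return SplitWithSendRecv(plan);
  return {plan};
}
```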
## Programs and Blocks
@@ -120,22 +181,6 @@ message AttrDesc {
}
```
-## ProgramDesc and ExecutionPlan

-The goal of `ProgramDesc` is to describe **what** the user wants to calculate, and the goal of `ExecutionPlan` is to specify **how** to calculate it.

-For example, the `ExecutionPlan` has OP placement information to indicate which device each OP will run on, but the `ProgramDesc` does not have this information, since currently our Python API does not support manually pinning an OP onto a type of device (e.g., GPU or FPGA). On the other hand, the `ProgramDesc` should have information about whether an OP belongs to an optimizer; this information is provided by the user and helps to place the OPs onto the parameter servers, but the `ExecutionPlan` does not have this information.

-### Optimizer

-The optimizer takes the `ProgramDesc` as input and outputs the `ExecutionPlan`. The steps are:

-1. Add the program in `ProgramDesc` and the corresponding backward pass program into the `ExecutionPlan`.

-1. Optimize the program according to the available devices. For example, add data parallelism by splitting the input mini-batches and replicating the OPs onto different GPUs. Note that even if the OPs are replicated on different GPUs, there is still only **one** execution plan. One executor runs, and only runs, one `ExecutionPlan`.

-1. Place each OP onto the available devices; the placement information is written in the `ExecutionPlan`.

-1. In distributed training, split the `ExecutionPlan` into multiple `ExecutionPlan`s and add send/recv OPs between them. For local training, this step is not necessary since there is only one executor.

-1. Send the `ExecutionPlan` to the executor for execution.
## InferShape
With this design, the InferShape function should take the following parameters: