add VarDesc design #3835
Conversation
doc/design/var_desc.md Outdated
> create a variable with a tensor value.
>
> ```python
> a = Variable("X", shape=[784, 10], data_type=INT32, value=0)
> ```
INT32 -> pd.int32
done
doc/design/var_desc.md Outdated
> ## Background
>
> PaddlePaddle divides the description of neural network computation graph into two stages: compile time and runtime.
>
> The data structure to describe the compile time graph should be able to be serialized for distributing training. So we use proto message OpDesc to describe computation and VarDesc to describe data.
distributing training -> distributed training
doc/design/var_desc.md Outdated
> ```proto
> INT64 = 3;
> FP16 = 4;
> FP32 = 5;
> DOUBLE = 6
> ```
Unify float names, please. Either FP16, FP32, FP64, or half, float, double. Do not mix them together.
done
doc/design/var_desc.md Outdated
> ```proto
> }
>
> Type element_type = 1;
> repeated int dims = 2; // [UNK, UNK, 6000] is saved as [-1, -1, 6000]
> ```
A better example: `[UNK, 640, 480]` is saved as `[-1, 640, 480]`.
done
doc/design/var_desc.md Outdated
> ```proto
> Type element_type = 1;
> repeated int dims = 2; // [UNK, UNK, 6000] is saved as [-1, -1, 6000]
> optional int lod_level [default=0] = 3;
> repeated int32 int16_val = 4 [packed = true]; // INT16
> ```
LoDTensorDesc doesn't have values.
removed
doc/design/var_desc.md Outdated
> ```proto
> LOD_TENSOR = 6;
> }
>
> message Value {
> ```
VarDesc doesn't have a value.
removed
doc/design/var_desc.md Outdated
> ```proto
> INT64 = 3;
> FP16 = 4;
> FP32 = 5;
> DOUBLE = 6
> ```
DOUBLE -> FP64
done
doc/design/var_desc.md Outdated
> There is a class `Variable` in python to help create Variable.
>
> ```python
> class Variable(object):
> ```
The following example shows how a Variable is to be used in Python programs:

```python
def flatten_size(X, num_flatten_dims):
    # product of the last num_flatten_dims dimensions
    prod = 1
    for i in xrange(num_flatten_dims):
        prod = prod * X.dims[-i-1]
    return prod

def layer.fc(X, output_size, num_flatten_dims):
    W = tensor(elem_type=FP32, dims=[flatten_size(X, num_flatten_dims), output_size])
    b = tensor(elem_type=FP32, dims=[output_size])
    y = operator.fc(X, W, b)
    return y

x = var(dim=[-1, 640, 480])
y = layer.fc(x, output_size=100)
paddle.train(y, ...)
print(y)
```
```python
import VarDesc
import framework

class Var(object):
    def __init__(self, name, dims, type):
        self._name = name
        self.op = None
        _var_desc = VarDesc(name=name, dims=dims, data_type=type)
        self._var = framework.CreateVar(_var_desc)

    def dims(self):
        return self._var.dims()

    def type(self):
        return self._var.type()
```

The following example shows how a Variable is to be used in Python programs:
```python
import paddle as pd

def flatten_size(X, num_flatten_dims):
    # product of the last num_flatten_dims dimensions
    prod = 1
    for i in xrange(num_flatten_dims):
        prod = prod * X.dims[-i-1]
    return prod

def layer.fc(X, output_size, num_flatten_dims):
    W = Var(type=FP32, dims=[flatten_size(X, num_flatten_dims), output_size])
    b = Var(type=FP32, dims=[output_size])
    out = Var(type=FP32)
    y = operator.fc(X, W, b, output=out)  # fc will put the fc op into out
    pd.InferShape(y)
    return out

x = var(dim=[-1, 640, 480])
y = layer.fc(x, output_size=100)
z = layer.fc(y, output_size=200)
paddle.train(z, ...)
print(y)
```
Points we agreed on:
- Var is a Python-side class that wraps a C++-side VarDesc.
- In Eval(targets=[]), targets is an array of Vars.
- A Var needs to record the Op that generated it.
- Multiple Vars may share the same name but hold different Ops; they then share the same memory, but are distinguished when tracing dependencies (see the sketch after this list).
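A minimal Python sketch of those agreed points; the `VarDesc` stand-in, `Eval` body, and all names are illustrative assumptions, not the final API:

```python
import collections

# Hypothetical stand-in for the C++-side VarDesc discussed above.
VarDesc = collections.namedtuple("VarDesc", ["name", "dims", "data_type"])

class Var(object):
    """Python-side class that wraps a C++-side VarDesc."""
    def __init__(self, name, dims, data_type):
        self.var_desc = VarDesc(name=name, dims=dims, data_type=data_type)
        self.op = None  # the Op that generated this Var; set by the layer

def Eval(targets=[]):
    # targets is an array of Vars; trace each target's generating Op.
    # Two Vars may share one name (and thus memory) yet record
    # different Ops, so dependency tracing distinguishes them.
    return [(v.var_desc.name, v.op) for v in targets]

a = Var("w", dims=[10], data_type="fp32")
b = Var("w", dims=[10], data_type="fp32")  # same name, different generating Op
print(Eval(targets=[a, b]))
```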
doc/design/var_desc.md Outdated
> or create a Variable with a string value
>
> ```python
> a = Variable("X", data_type=pd.STRING, value="aa")
> ```
If Variable only contains the VarDesc*, we cannot implement Block.eval(targets=[]). A Variable alone cannot specify which operators should be run to produce it, since a Variable can be written by many operators.

For example, an SGD operator reads the weight tensor and gradient tensor of one parameter and writes the weight tensor back: the same weight tensor is both the input and the output of the SGD operator. That weight tensor is also written by a Load operator or a Random operator. If the user calls Block.eval(weight), which operators should be run?

So we should add a field to identify which operator generates that Variable. The implementation could be:
```python
class Var(object):
    def __init__(self):
        self.var_desc = ...  # the description of the variable
        self.op = ...        # the operator that generates this Var
```

So if we assume the Block is a linear list of operators, Block.eval could be:
```python
class Block(object):
    def __init__(self):
        self.ops = []  # a list of operators

    def eval(self, targets=[]):
        # index of the last op that writes any of the targets
        last_op_idx = get_last_op_in_block(self, targets)
        needed_var_names = set(get_var_names(targets))
        ops = self.ops[0: last_op_idx + 1]
        # walk backwards, keeping only ops whose outputs are needed
        sub_block = []
        for op in reversed(ops):
            if any(out in needed_var_names for out in op.outputs):
                needed_var_names.update(op.inputs)
                sub_block.append(op)
        sub_block = list(reversed(sub_block))
        sub_block.run()  # pseudocode: execute the pruned operator list
```
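To make the pruning concrete, here is a hedged toy run of that eval sketch on the SGD example above, with a minimal stand-in op type and hand-inlined helpers (none of this is the real framework API):

```python
import collections

Op = collections.namedtuple("Op", ["name", "inputs", "outputs"])

ops = [
    Op("load", inputs=[],                 outputs=["weight"]),
    Op("fc",   inputs=["x", "weight"],    outputs=["y"]),
    Op("sgd",  inputs=["weight", "grad"], outputs=["weight"]),
]

# eval(targets=["y"]) scans backwards from the last op that writes "y":
# keep "fc" (writes y), then keep "load" (writes fc's input "weight").
# "sgd" comes after "fc", so it lies outside ops[0:last_op_idx+1].
needed = {"y"}
sub_block = []
for op in reversed(ops[:2]):  # ops up to and including "fc"
    if any(out in needed for out in op.outputs):
        needed.update(op.inputs)
        sub_block.append(op)
print([op.name for op in reversed(sub_block)])  # ['load', 'fc']
```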
> ```proto
> message VarDesc {
>   required string name = 1;
> ```
Do we need to use required fields? https://stackoverflow.com/a/31814967/852385
Good point! proto3 even removes required.
But to stay compatible with the current code, I want to use required in this PR and open another PR to change all the proto files at once.
I see, thanks!
> 1. Computation graph should be able to be saved to a file.
> 1. In distributed training, the graph will be serialized and send to multiple workers.
>
> The computation graph is constructed by Data Node and Operation Node. The concept to represent them is in the table below.
What are the Nodes and Edges of the computation graph?
Data and operators are both Nodes; the Edges are the input/output relationships between data and operators, as the sketch below shows.
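A toy illustration of that bipartite structure, with made-up names rather than the real proto messages:

```python
# Variables and operators are both nodes; edges are input/output relations.
var_nodes = ["X", "W", "b", "Out"]
op_nodes = [{"type": "fc", "inputs": ["X", "W", "b"], "outputs": ["Out"]}]

# Edges derived from the op descriptions:
edges = []
for op in op_nodes:
    edges += [(v, op["type"]) for v in op["inputs"]]   # var -> op
    edges += [(op["type"], v) for v in op["outputs"]]  # op -> var
print(edges)  # [('X', 'fc'), ('W', 'fc'), ('b', 'fc'), ('fc', 'Out')]
```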
> PaddlePaddle use proto message to describe compile time graph for
>
> 1. Computation graph should be able to be saved to a file.
be saved to a file --> to be serialized
> In Python API, layer will take Variable as Input, and return Variable as Output. There should be a class `Variable` in python to help create and manage Variable.
>
> ```python
> image = Variable(dims=[-1, 640, 480])
> ```
-1 is not good for the user. Maybe UNK or BatchSize here.
I am not sure whether the code in this design doc is runnable. It seems many details will need to be considered during implementation.
Just LGTM for now, but this documentation should be treated as in flux.
> ```python
>         if initializer is not None:
>             AddInitialOperator(self, initializer)
>
>     def dims(self):
> ```
shape? It is the shape of a tensor, not dims.
> ```python
>     def dims(self):
>         return self._var.dims()
>
>     def data_type(self):
> ```
`type` in `__init__` but `data_type` here; we need a unified name for the data type.
> ```python
> # add an initialize Operator to block to init this Variable
>
> class Variable(object):
>     def __init__(self, name, dims, type, initializer):
> ```
Make name=None the default? No name is passed in the demo below. (A possible sketch follows.)
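A minimal sketch of that suggestion, with a hypothetical `unique_name` helper; the naming scheme and signature are illustrative only:

```python
import itertools

_name_counter = itertools.count()  # hypothetical global counter

def unique_name(prefix="var"):
    # generate names like var_0, var_1, ... when the user passes none
    return "%s_%d" % (prefix, next(_name_counter))

class Variable(object):
    def __init__(self, dims, type, initializer=None, name=None):
        # default to an auto-generated name when the caller passes none
        self.name = name if name is not None else unique_name()
        self.dims = dims
        self.type = type
```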
> ```python
> class Variable(object):
>     def __init__(self, name, dims, type, initializer):
>         self._block = get_default_block()
> ```
block and name do not need to be protected members; making them public is OK.
> ```python
>     return prod
>
> def layer.fc(X, output_size, num_flatten_dims):
>     W = Variable(pd.random_uniform(), type=FP32, dims=[flatten_size(X, num_flatten_dims), output_size])
> ```
FP32 is a strange name; what does it mean?
Using lowercase, full names would be clearer, for example, type=pd.float32. A possible spelling is sketched below.
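A hedged illustration of that naming idea, assuming numpy-style lowercase aliases were layered over the proto enum (nothing here is defined by this PR):

```python
# Hypothetical mapping from lowercase, numpy-style dtype names that a
# user-facing API could expose to the proto enum names in this design.
DTYPE_ALIASES = {
    "float16": "FP16",
    "float32": "FP32",
    "float64": "FP64",
}

def resolve_dtype(name):
    # 'pd.float32' would look this up instead of exposing FP32 directly
    return DTYPE_ALIASES[name]

print(resolve_dtype("float32"))  # FP32
```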
> ```python
>     pd.InferShape(y)
>     return out
>
> x = Variable(dims=[-1, 640, 480])
> ```
-1 -> None. None is clearer as a placeholder here; -1 looks vague. If -1 is OK, what would -2 or -200 mean here? (See the sketch below.)
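A small sketch of that suggestion: accept None in the Python API and translate it to -1 in the serialized dims. The helper name and translation rule are assumptions, not part of the PR:

```python
def to_proto_dims(dims):
    # None marks an unknown dimension in the Python API;
    # the serialized VarDesc stores it as -1.
    return [-1 if d is None else d for d in dims]

print(to_proto_dims([None, 640, 480]))  # [-1, 640, 480]
```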
> ```python
> y = layer.fc(x, output_size=100)
> z = layer.fc(y, output_size=200)
>
> paddle.eval(targets=[z], ...)
> ```
paddle.eval -> pd.eval
add VarDesc: #3776