- Notifications
You must be signed in to change notification settings - Fork 5.9k
MKLDNN design doc #3241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MKLDNN design doc #3241
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要补充下新方案的原因,解决了什么问题,目前阶段不解决什么问题。评论里的文字可以重新组织。
doc/design/mkldnn/README.MD Outdated
| | ||
| ## KeyPoints | ||
| | ||
| 为了更好的符合PaddlePaddle的代码风格,同时又尽可能少的牺牲MKLDNN的性能。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里需要详细写一下原因:可以放上 #3096 链接。然后总结下这个PR讨论的几点:
- 原来这个PR是怎么做的:
- 为了不牺牲MKLDNN的性能,是采用每个layer需要知道next_layer的方式。
- 为什么存在牺牲MKLDNN性能的地方,因为MKLDNN采用的内存排布和paddle的NCHW不一样,这样每个mkldnnlayer需要计算前先把paddle的格式转成mkldnn的,计算后再转成paddle的,有两次转换开销。
- 同样是第三方库,cudnn/cuda为什么不存在mkldnn这样的问题,也需要简单说一下。
- 原来这个PR和paddle的冲突:
- paddle中,无论是重构前的layer还是重构后的op,都不会知道next layer/op的信息。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯,你总结的已经非常好了!thx
doc/design/mkldnn/README.MD Outdated
| 4. 创建**MkldnnBase**,定义一些除了layer和memory相关的类和函数。包括MKLDNN会用到Stream和CpuEngine,和未来可能还会用到FPGAEngine等。 | ||
| 5. 在**Argument**里添加两个MkldnnMatrixPtr,取名为mkldnnValue和mkldnnGrad,用于存放MkldnnLayer会用到的memory buffer。 并且添加函数cvt(会修改为一个更加合适的函数名),用于处理"CPU device"和"MKLDNN device"之间memory的相互转化。 | ||
| 6. 在父类Layer中的**getOutput**函数中添加一段逻辑,用于判断`deviceId`,并针对device在MKLDNN和CPU之间不统一的情况,做一个前期转换。 也就是调用`Argument`的cvt函数把output统一到需要的device上。 | ||
| 7. 在原来的`FLAGS`中添加一个`use_mkldnn`的flag,用于选择是否使用MKLDNN的相关功能。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
需要再写一下新方案解决了问题:
- 解决了每次进入一个mkldnnlayer时,两次格式转换的性能开销问题。
- mkldnn有很多种数据格式,但对paddle程序透明(不对的地方请指正)。
新方案首先解决全部是mkldnnlayer的网络,暂不解决mkldnnlayer+普通layer的网络?即cvt函数只做一种形式的转换?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mkldnn有很多种数据格式,但对paddle程序透明(不对的地方请指正)。
这里的具体格式内容不准备写在文档中,会说下包含在MkldnnMatrix中。
新方案首先解决全部是mkldnnlayer的网络,暂不解决mkldnnlayer+普通layer的网络?即cvt函数只做一种形式的转换?
新方案就是都一起解决。只不过提PR的时候会先分开提。
doc/design/mkldnn/README.MD Outdated
| ### Layer | ||
| 所有的layer相关的C++代码,都会在按照PaddlePaddle的目录结构存放在 | ||
| `paddle\gserver\layers`中,文件名以*Mkldnn*开头。 | ||
| 并且有可能会在Layer.h和Layer.cpp里面添加少量的code,用宏定义`PADDLE_USE_MKLDNN`隔开。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
并且有可能会在Layer.h和Layer.cpp里面添加少量的code,用宏定义
PADDLE_USE_MKLDNN隔开。
我觉的这句话可以先删掉,是否能够添加取决于具体的PR里的逻辑逻辑。另外,我觉的是应该尽量避免在Layer.h/cpp里面引入需要用PADDLE_USE_MKLDNN隔开的代码。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同意,这部分其实是之前老的方案,其实如果可以,我们本来觉得是可以用删除符号划掉的。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不需要用删除符号,直接删了就行。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
doc/design/mkldnn/README.MD Outdated
| 会考虑添加部分逻辑在`benchmark\paddle\image\run.sh`,添加使用mkldnn的测试。 | ||
| | ||
| ### Others | ||
| 1. 如果在使用MKLDNN的情况下,会把CPU的Buffer对齐为64。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MKLDNN是需要用自己的allocator,还是可以用posix_memalign,然后修改这里?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
目前应该还没到需要自己的allocator的时候,是可以直接posix_memalign的。但是以后要是有FPGA的engine,就需要到时候再看了。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其实如果可以的话,我也希望直接把
CHECK_EQ(posix_memalign(&ptr, 32ul, size), 0);
里的32 换为64。
不过也可以做到别的地方。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
但是,我看到caffe2里面用的是dnnAllocateBuffer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
那个其实不是MKL-DNN,而是MKL大包里面的dnn,具体可以看这里
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
噢,我明白了,多谢~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
客气~
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
doc/design/mkldnn/README.MD Outdated
| ## Details | ||
| 我们把集成方案大致分为了如下几个方面。 | ||
| | ||
| ### Cmake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cmake => CMake
CMake这个产品的名字在 cmake.org 里有
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
doc/design/mkldnn/README.MD Outdated
| - [Others](#others) | ||
| - [KeyPoints](#keypoints) | ||
| | ||
| ## Overall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall => Overview
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
doc/design/mkldnn/README.MD Outdated
| Figure 1. PaddlePaddle on IA. | ||
| </div> | ||
| | ||
| ## Details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Details => Actions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
doc/design/mkldnn/README.MD Outdated
| 我们把集成方案大致分为了如下几个方面。 | ||
| | ||
| ### Cmake | ||
| 我们会在`CMakeLists.txt`中会添加`WITH_MKLDNN`的选项,当设置这个值为`ON`的时候会启用编译MKLDNN功能。同时会自动开启`OpenMP`用于提高MKLDNN的性能。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`OpenMP` => OpenMP There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks~
doc/design/mkldnn/README.MD Outdated
| ### Cmake | ||
| 我们会在`CMakeLists.txt`中会添加`WITH_MKLDNN`的选项,当设置这个值为`ON`的时候会启用编译MKLDNN功能。同时会自动开启`OpenMP`用于提高MKLDNN的性能。 | ||
| | ||
| 为了让PaddlePaddle更好的发挥MKLDNN的性能,我们还会引入`WITH_MKLML`的选项,用于选择是否用MKLDNN自带的MKLML的安装包。这个安装包可以独立于MKLDNN使用,但是建议在开启MKLDNN的同时也打开MKLML的开关,这样才能发挥最好的性能。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“为了。。。。”这个短句不是加一个CMake选项的原因。可以删了。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
doc/design/mkldnn/README.MD Outdated
| | ||
| 会在`v1_api_demo`目录下添加一个`mkldnn`的文件夹,里面放入一些用于mkldnn测试的demo脚本。 | ||
| | ||
| ### Benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmark => Benchmarking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, thx
doc/design/mkldnn/README.MD Outdated
| 1. 如果在使用MKLDNN的情况下,会把CPU的Buffer对齐为64。 | ||
| 2. 深入PaddlePaddle,寻找有没有其他可以优化的可能,进一步优化。比如可能会用`OpenMP`改进SGD的更新性能。 | ||
| | ||
| ## KeyPoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
英语里没有一个叫 KeyPoints 的单词。这个标题从下面的内容看应该叫 Design Concerns
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx
doc/design/mkldnn/README.MD Outdated
| | ||
| 1. 使用**deviceId_**。为了尽可能少的在父类Layer中添加变量或者函数,我们决定使用已有的`deviceId_`变量来区分layer的属性,定义`-2`为**MkldnnLayer**特有的设备ID。 | ||
| 2. 重写父类Layer的**init**函数,修改`deviceId_`为`-2`,代表这个layer是用于跑在MKLDNN的环境下。 | ||
| 3. 创建**MkldnnMatrix**,用于管理MKLDNN会用到的相关memory函数、接口以及会用的到格式信息。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MkldnnMatrix => MKLDNNMatrix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
是否真需要创建这个class?在目前正在重构中的PaddlePaddle会删掉Matrix这个class,以及所有相关的class,而是使用一个新的class 叫 Tensor。 Tensor只负责管理内存和访问元素,不管计算。这样不同的operators、layers可以调用不同的计算库(MLKDNN、CUDNN、等)来做计算。如果MKLDNN的功能通过加入一个Matrix-dervied class加入PaddlePaddle,一旦新版本在一两个月里上线,这些代码就要被删掉了。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MkldnnMatrix => MKLDNNMatrix
这里,我们本来的意思写code中名字,参考的是CpuMatrix,然后又想加粗,所以才会这样。我还是统一为code格式吧。
是否真需要创建这个class?在目前正在重构中的PaddlePaddle会删掉Matrix这个class,以及所有相关的class,而是使用一个新的class 叫 Tensor。
那到时候会有区分GPU的tensor还是CPU的tensor吗?还是Layer自己去区分?
因为MkldnnMatrix 的存在就是为了实现区分不同设备,以及负责设备之间buffer的转换。假如把MKL-DNN也看做是某种设备的话,这样PaddlePaddle总共有3种设备:MKL-DNN、CPU和GPU。
经过上次和几位Baidu同事的讨论,这个方法是在目前架构下比较可行的。
由于我们对新的框架不是很了解,不过我想原有的CpuMatrix和GpuMatrix那些类应该也会有相应的替代吧,或者,请问有没有新框架的说明,我们也好有一个提前准备。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
那到时候会有区分GPU的tensor还是CPU的tensor吗?还是Layer自己去区分?
Tensor自带了place信息,即Tensor本身可以区分GPU/CPU/其他设备。
doc/design/mkldnn/README.MD Outdated
| 1. 使用**deviceId_**。为了尽可能少的在父类Layer中添加变量或者函数,我们决定使用已有的`deviceId_`变量来区分layer的属性,定义`-2`为**MkldnnLayer**特有的设备ID。 | ||
| 2. 重写父类Layer的**init**函数,修改`deviceId_`为`-2`,代表这个layer是用于跑在MKLDNN的环境下。 | ||
| 3. 创建**MkldnnMatrix**,用于管理MKLDNN会用到的相关memory函数、接口以及会用的到格式信息。 | ||
| 4. 创建**MkldnnBase**,定义一些除了layer和memory相关的类和函数。包括MKLDNN会用到Stream和CpuEngine,和未来可能还会用到FPGAEngine等。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MkldnnBase => MKLDNN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同上,thx
doc/design/mkldnn/README.MD Outdated
| 1. 使用**deviceId_**。为了尽可能少的在父类Layer中添加变量或者函数,我们决定使用已有的`deviceId_`变量来区分layer的属性,定义`-2`为**MkldnnLayer**特有的设备ID。 | ||
| 2. 重写父类Layer的**init**函数,修改`deviceId_`为`-2`,代表这个layer是用于跑在MKLDNN的环境下。 | ||
| 3. 创建**MkldnnMatrix**,用于管理MKLDNN会用到的相关memory函数、接口以及会用的到格式信息。 | ||
| 4. 创建**MkldnnBase**,定义一些除了layer和memory相关的类和函数。包括MKLDNN会用到Stream和CpuEngine,和未来可能还会用到FPGAEngine等。 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CpuEngine => CPUEngine 英语里缩写要大写
下面的代码命名里有不少这样的问题。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 感谢 @wangkuiyi 的宝贵意见。 |
| 请merge最新的develop分支,来过teamcity |
| Done, thx |
resolve #3200