add MKLDNN_DEVICE #3712 (Merged)

Changes from all commits (13 commits, all by tensor-tang):
- 4d8992c  check format before set header format
- 462b9b1  update mkldnn tag v0.10
- 62e6dac  add MKLDNNMatrix files
- 4bffbd3  use MKLDNNMatrix in fc forward
- 4eecd0c  use MKLDNNMatrix in fc backward
- 48d87e5  pass test, support input CPU device
- 780c8d9  make downSpatial work, and remove hasSpatial_
- 4cc5783  enable reorder
- 98b7c67  add todo
- 2efac83  Merge remote-tracking branch 'upstream/develop' into merge
- fe51f72  fix cmake
- bfbd066  refine
- c5183ca  rename
The diff below covers the MKLDNNFcLayer implementation:

```diff
@@ -61,43 +61,42 @@ void MKLDNNFcLayer::convertWeightsFromPaddle() {
     return;
   }
 
-  // TODO(TJ): dst format should get from wgtVal_
-  int dstFmt = PARAM_FORMAT_MKLDNN_OI;
-  int srcFmt = weight_->getParameterPtr()->getHeaderFormat();
-  if (srcFmt == dstFmt) {
-    return;
-  }
-
-  // The weight_ is transposed from initial paddle weight
-  MatrixPtr paddleWgt = Matrix::create(
-      weight_->getW()->getData(), iLayerSize_, oc_, false, false);
-
-  // TODO(TJ): remove this print when do not need differ weights
-  std::ostringstream ostr;
-  paddleWgt->print(ostr);
-  VLOG(MKLDNN_ALL) << "Initial Weight from paddle: " << std::endl << ostr.str();
-
-  // The mkldnn weight is transposed from initial paddle matrix
-  MatrixPtr paddleWgtT;
-  paddleWgt->transpose(paddleWgtT, true);
-  weight_->getW()->copyFrom(*paddleWgtT);
-  weight_->getParameterPtr()->setHeaderFormat(dstFmt);
+  CHECK(wgtVal_) << "should have been initialized";
+  bool hasNoSpatial_ = ih_ == 1 && iw_ == 1;
+  auto targetDim = wgtVal_->getDims();
+  auto srcFmt = hasNoSpatial_ ? memory::format::io : memory::format::ihwo;
+  wgtVal_->reorderDataFrom(wgtVal_, srcFmt, targetDim);
   hasInitedWgt_ = true;
 }
 
 void MKLDNNFcLayer::convertWeightsToPaddle() {
-  MatrixPtr dnnWgt = weight_->getW();
-  MatrixPtr paddleWgt;
-  dnnWgt->transpose(paddleWgt, true);
-
-  // copy paddle weight and override on weight_
-  MatrixPtr dnnWgtT = Matrix::create(
-      dnnWgt->getData(), dnnWgt->getWidth(), dnnWgt->getHeight(), false, false);
-  dnnWgtT->copyFrom(*paddleWgt);
+  CHECK(wgtVal_) << "should have been initialized";
+  bool hasNoSpatial_ = ih_ == 1 && iw_ == 1;
+  auto targetDim = wgtVal_->getDims();
+  auto dstFmt = hasNoSpatial_ ? memory::format::io : memory::format::ihwo;
+  wgtVal_->reorderDataTo(wgtVal_, dstFmt, targetDim);
 }
 
+void MKLDNNFcLayer::convertOutputToOtherDevice() {
+  copyOutputInfoToOtherDevice();
+  // find other cpu device and reorder output to cpu device
+  int cnt = 0;
+  for (size_t i = 0; i < outputOtherDevice_.size(); i++) {
+    if (outputOtherDevice_[i].deviceId == CPU_DEVICE) {
+      // fc cpu output value do not need convert
+      // just share point
+      outputOtherDevice_[i].value = output_.value;
+      ++cnt;
+    }
+  }
+
+  if (cnt > 1) {
+    LOG(WARNING) << "should not have more than one CPU devie";
+  }
+}
+
 void MKLDNNFcLayer::reshape() {
-  const Argument& input = getInput(0);
+  const Argument& input = getInput(0, getPrev(0)->getDeviceId());
   int batchSize = input.getBatchSize();
   if (bs_ == batchSize) {
     return;
```
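The `reorderDataFrom` call above converts the weight from Paddle's transposed layout (io, or ihwo when spatial dimensions are present) into the layout the MKL-DNN primitive was created with, replacing the manual transpose-and-copy the hunk deletes. For reference, a minimal standalone sketch of such a layout conversion against the mkldnn v0.10 C++ API (the tag this PR pins); the `reorderWeights` helper and its in-place copy-back are illustrative, not the PR's actual implementation:

```cpp
#include <algorithm>
#include <vector>
#include "mkldnn.hpp"
using namespace mkldnn;

// Convert a 2-D weight buffer in place from io (Paddle's layout after
// transpose) to oi (the layout an inner-product primitive typically uses).
void reorderWeights(float* data, int oc, int ic, engine& eng) {
  memory::dims dims = {oc, ic};
  memory::desc srcMD(dims, memory::data_type::f32, memory::format::io);
  memory::desc dstMD(dims, memory::data_type::f32, memory::format::oi);
  std::vector<float> tmp(static_cast<size_t>(oc) * ic);
  memory src({srcMD, eng}, data);        // wraps the existing buffer
  memory dst({dstMD, eng}, tmp.data());  // reorder target
  std::vector<primitive> net{reorder(src, dst)};
  stream(stream::kind::eager).submit(net).wait();
  std::copy(tmp.begin(), tmp.end(), data);  // write converted data back
}
```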
```diff
@@ -111,10 +110,6 @@ void MKLDNNFcLayer::reshape() {
   if (iw_ == 0) {
     iw_ = 1;
   }
-  hasSpatial_ = true;
-  if (ih_ == 1 && iw_ == 1) {
-    hasSpatial_ = false;
-  }
   CHECK_EQ(iLayerSize_, inputLayers_[0]->getSize());
   ic_ = iLayerSize_ / (ih_ * iw_);
   CHECK_EQ(size_t(ic_ * ih_ * iw_), iLayerSize_) << "not divisible";
```
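This hunk drops the `hasSpatial_` bookkeeping; per commit 780c8d9, the equivalent logic moves into `MKLDNNMatrix::downSpatial()`, which is called in `resetFwd` below. A hedged sketch of the idea, with a hypothetical free function standing in for the member:

```cpp
#include "mkldnn.hpp"
using namespace mkldnn;

// When the spatial size is 1x1, a 4-D {n, c, 1, 1} memory describes the
// same buffer as a 2-D {n, c} one, so the dims and format can be
// collapsed instead of threading a hasSpatial_ flag through the layer.
memory::desc downSpatial(const memory::dims& dims, memory::format fmt) {
  bool noSpatial = dims.size() == 4 && dims[2] == 1 && dims[3] == 1;
  if (noSpatial && fmt == memory::format::nchw) {
    return memory::desc({dims[0], dims[1]}, memory::data_type::f32,
                        memory::format::nc);
  }
  if (noSpatial && fmt == memory::format::oihw) {
    return memory::desc({dims[0], dims[1]}, memory::data_type::f32,
                        memory::format::oi);
  }
  return memory::desc(dims, memory::data_type::f32, fmt);
}
```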
```diff
@@ -135,37 +130,53 @@ void MKLDNNFcLayer::reshape() {
 
 void MKLDNNFcLayer::resetFwd() {
   bool hasBias = biases_ && biases_->getW();
-  real* iData = getInputValue(0)->getData();
-  real* oData = getOutputValue()->getData();
-  real* wData = weight_->getW()->getData();
-  real* bData = hasBias ? biases_->getW()->getData() : NULL;
-
-  // TODO(TJ): below create should be covered in MkldnnMatrix
-  // create memory desc
-  memory::desc iMD = hasSpatial_ ? createMD({bs_, ic_, ih_, iw_}, format::nchw)
-                                 : createMD({bs_, ic_}, format::nc);
-  memory::desc wMD = hasSpatial_ ? createMD({oc_, ic_, ih_, iw_}, format::oihw)
-                                 : createMD({oc_, ic_}, format::oi);
-  memory::desc bMD = bData != NULL ? createMD({oc_}, format::x)
-                                   : createMD({}, format::format_undef);
-  memory::desc oMD = createMD({bs_, oc_}, format::nc);
-
-  // create memory primitive desc and memory self
-  inVal_.reset(new memory(memory::primitive_desc(iMD, engine_), iData));
-  wgtVal_.reset(new memory(memory::primitive_desc(wMD, engine_), wData));
-  outVal_.reset(new memory(memory::primitive_desc(oMD, engine_), oData));
+  const MatrixPtr& wgt = weight_->getW();
+  const MatrixPtr& bias = hasBias ? biases_->getW() : nullptr;
+  const MatrixPtr& out = output_.value;
+
+  if (inputIsOnlyMKLDNN()) {
+    const MatrixPtr& in = getInputValue(0);
+    inVal_ = std::dynamic_pointer_cast<MKLDNNMatrix>(in);
+    CHECK(inVal_) << "Input should be MKLDNNMatrix";
+  } else {
+    CHECK_EQ(getPrev(0)->getDeviceId(), CPU_DEVICE) << "Only support CPU yet";
+    const MatrixPtr& in = getInputValue(0, CPU_DEVICE);
+    inVal_ = MKLDNNMatrix::create(
+        in, memory::dims{bs_, ic_, ih_, iw_}, format::nchw, engine_);
+  }
+  inVal_->downSpatial();
+  wgtVal_ = MKLDNNMatrix::create(
+      wgt, memory::dims{oc_, ic_, ih_, iw_}, format::oihw, engine_);
+  wgtVal_->downSpatial();
+  biasVal_ =
+      hasBias ? MKLDNNMatrix::create(bias, {oc_}, format::x, engine_) : nullptr;
+  outVal_ = MKLDNNMatrix::create(out, {bs_, oc_}, format::nc, engine_);
+
+  // change original output value to mkldnn output value
+  output_.value = std::dynamic_pointer_cast<Matrix>(outVal_);
+  if (!outputIsOnlyMKLDNN()) {
+    convertOutputToOtherDevice();
+  }
 
   // create forward handle
   prop_kind pk = prop_kind::forward;
-  fc_fwd::desc fwdDesc = bData != NULL ? fc_fwd::desc(pk, iMD, wMD, bMD, oMD)
-                                       : fc_fwd::desc(pk, iMD, wMD, oMD);
+  fc_fwd::desc fwdDesc = hasBias ? fc_fwd::desc(pk,
+                                                inVal_->getMemoryDesc(),
+                                                wgtVal_->getMemoryDesc(),
+                                                biasVal_->getMemoryDesc(),
+                                                outVal_->getMemoryDesc())
+                                 : fc_fwd::desc(pk,
+                                                inVal_->getMemoryDesc(),
+                                                wgtVal_->getMemoryDesc(),
+                                                outVal_->getMemoryDesc());
   fc_fwd::primitive_desc fwdPD = fc_fwd::primitive_desc(fwdDesc, engine_);
 
-  if (bData != NULL) {
-    biasVal_.reset(new memory(memory::primitive_desc(bMD, engine_), bData));
+  if (hasBias) {
     fwd_.reset(new fc_fwd(fwdPD, *inVal_, *wgtVal_, *biasVal_, *outVal_));
   } else {
    fwd_.reset(new fc_fwd(fwdPD, *inVal_, *wgtVal_, *outVal_));
   }
+  printValueFormatFlow();
 
   pipelineFwd_.clear();
   pipelineFwd_.push_back(*fwd_);
 }
```

An inline review question was attached to the `output_.value = std::dynamic_pointer_cast<Matrix>(outVal_);` line: what does "original output value" refer to here, i.e. which format is it in?
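For reference, here is a self-contained sketch of the forward setup this hunk performs, assuming `fc_fwd` aliases `mkldnn::inner_product_forward` (mkldnn v0.10 C++ API); the shapes and names are illustrative only:

```cpp
#include <vector>
#include "mkldnn.hpp"
using namespace mkldnn;

int main() {
  engine eng(engine::cpu, 0);
  const int bs = 2, ic = 8, oc = 4;
  std::vector<float> in(bs * ic), wgt(oc * ic), bias(oc), out(bs * oc);

  // memory descs in the same formats the layer uses after downSpatial()
  memory::desc iMD({bs, ic}, memory::data_type::f32, memory::format::nc);
  memory::desc wMD({oc, ic}, memory::data_type::f32, memory::format::oi);
  memory::desc bMD({oc}, memory::data_type::f32, memory::format::x);
  memory::desc oMD({bs, oc}, memory::data_type::f32, memory::format::nc);

  memory inVal({iMD, eng}, in.data());
  memory wgtVal({wMD, eng}, wgt.data());
  memory biasVal({bMD, eng}, bias.data());
  memory outVal({oMD, eng}, out.data());

  // descriptor -> primitive_desc -> primitive, as in resetFwd()
  inner_product_forward::desc fwdDesc(prop_kind::forward, iMD, wMD, bMD, oMD);
  inner_product_forward::primitive_desc fwdPD(fwdDesc, eng);
  inner_product_forward fwd(fwdPD, inVal, wgtVal, biasVal, outVal);

  std::vector<primitive> pipeline{fwd};
  stream(stream::kind::eager).submit(pipeline).wait();
  return 0;
}
```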
```diff
@@ -175,45 +186,46 @@ void MKLDNNFcLayer::resetBwd() {
     return;
   }
   needResetBwd_ = false;
 
   bool hasBias = biases_ && biases_->getWGrad();
-  real* iData = getInputValue(0)->getData();
-  real* iDiff = getInputGrad(0) != nullptr ? getInputGrad(0)->getData() : NULL;
-  real* oDiff = getOutputGrad()->getData();
-  real* wDiff = weight_->getWGrad()->getData();
-  real* bDiff = hasBias ? biases_->getWGrad()->getData() : NULL;
 
   /// backward weight
-  // create memory desc for backward memory
-  memory::desc iMD = hasSpatial_ ? createMD({bs_, ic_, ih_, iw_}, format::nchw)
-                                 : createMD({bs_, ic_}, format::nc);
-  memory::desc wMD = hasSpatial_ ? createMD({oc_, ic_, ih_, iw_}, format::oihw)
-                                 : createMD({oc_, ic_}, format::oi);
-  memory::desc oMD = createMD({bs_, oc_}, format::nc);
-  memory::desc bMD = bDiff != NULL ? createMD({oc_}, format::x)
-                                   : createMD({}, format::format_undef);
-
-  if (inVal_) {
-    // update data
-    inVal_->set_data_handle(iData);
-  } else {
-    inVal_.reset(new memory(memory::primitive_desc(iMD, engine_), iData));
-  }
-
-  // create memory primitive desc and memory self
-  wgtGrad_.reset(new memory(memory::primitive_desc(wMD, engine_), wDiff));
-  outGrad_.reset(new memory(memory::primitive_desc(oMD, engine_), oDiff));
-
-  fc_fwd::desc fwdDesc = fc_fwd::desc(prop_kind::forward, iMD, wMD, oMD);
+  CHECK(inVal_) << "Should have input value";
+  const MatrixPtr& wgt = weight_->getWGrad();
+  const MatrixPtr& bias = hasBias ? biases_->getWGrad() : nullptr;
+
+  // TODO(TJ): merge outgrad
+  int device = outputIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE;
+  // for MKLDNN device:
+  // can not directly cast outputgrad to mkldnnmatrix,
+  // since each layer can not write the inputgrad to mkldnn inputgrad.
+  // So just create from matrix with outputvalue format.
+  // for CPU device:
+  // fc do not need to convert from cpu device since output is always nc format
+  // only need create from cpu device
+  const MatrixPtr& out = getOutput(device).grad;
+  outGrad_ = MKLDNNMatrix::create(out, outVal_->getPrimitiveDesc());
+  wgtGrad_ = MKLDNNMatrix::create(wgt, wgtVal_->getPrimitiveDesc());
+  biasGrad_ = hasBias ? MKLDNNMatrix::create(bias, biasVal_->getPrimitiveDesc())
+                      : nullptr;
+
+  // create memory primitive desc
+  fc_fwd::desc fwdDesc = fc_fwd::desc(prop_kind::forward,
+                                      inVal_->getMemoryDesc(),
+                                      wgtGrad_->getMemoryDesc(),
+                                      outGrad_->getMemoryDesc());
   fc_fwd::primitive_desc fwdPD = fc_fwd::primitive_desc(fwdDesc, engine_);
-  fc_bwdWgt::desc bwdWgtDesc = bDiff != NULL
-                                   ? fc_bwdWgt::desc(iMD, wMD, bMD, oMD)
-                                   : fc_bwdWgt::desc(iMD, wMD, oMD);
+  fc_bwdWgt::desc bwdWgtDesc = hasBias
+                                   ? fc_bwdWgt::desc(inVal_->getMemoryDesc(),
+                                                     wgtGrad_->getMemoryDesc(),
+                                                     biasGrad_->getMemoryDesc(),
+                                                     outGrad_->getMemoryDesc())
+                                   : fc_bwdWgt::desc(inVal_->getMemoryDesc(),
+                                                     wgtGrad_->getMemoryDesc(),
+                                                     outGrad_->getMemoryDesc());
   fc_bwdWgt::primitive_desc bwdWgtPD =
       fc_bwdWgt::primitive_desc(bwdWgtDesc, engine_, fwdPD);
 
-  if (bDiff != NULL) {
-    biasGrad_.reset(new memory(memory::primitive_desc(bMD, engine_), bDiff));
+  if (hasBias) {
     bwdWgt_.reset(
         new fc_bwdWgt(bwdWgtPD, *inVal_, *outGrad_, *wgtGrad_, *biasGrad_));
   } else {
```
```diff
@@ -223,15 +235,26 @@ void MKLDNNFcLayer::resetBwd() {
   pipelineBwd_.push_back(*bwdWgt_);
 
   /// backward data
-  if (iDiff == NULL) {
+  device = inputIsOnlyMKLDNN() ? MKLDNN_DEVICE : CPU_DEVICE;
+  const MatrixPtr& in = getInputGrad(0, device);
+  if (in == nullptr) {
     return;
   }
-  fc_bwdData::desc bwdDataDesc = fc_bwdData::desc(iMD, wMD, oMD);
+  if (getInput(0, device).getAllCount() > 1) {
+    // TODO(TJ): use outputMaps_ ways when merge outgrad done
+  } else {
+    inGrad_ = MKLDNNMatrix::create(in, inVal_->getPrimitiveDesc());
+  }
+
+  fc_bwdData::desc bwdDataDesc = fc_bwdData::desc(inVal_->getMemoryDesc(),
+                                                  wgtGrad_->getMemoryDesc(),
+                                                  outGrad_->getMemoryDesc());
   fc_bwdData::primitive_desc bwdDataPD =
       fc_bwdData::primitive_desc(bwdDataDesc, engine_, fwdPD);
-  inGrad_.reset(new memory(memory::primitive_desc(iMD, engine_), iDiff));
 
   CHECK(wgtVal_) << "Should have weight memory";
   bwdData_.reset(new fc_bwdData(bwdDataPD, *outGrad_, *wgtVal_, *inGrad_));
+  printGradFormatFlow();
   pipelineBwd_.push_back(*bwdData_);
 }
```
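A self-contained sketch of the backward wiring in `resetBwd()`, assuming `fc_bwdWgt`/`fc_bwdData` alias `mkldnn::inner_product_backward_weights`/`mkldnn::inner_product_backward_data` (v0.10 C++ API). Note that both backward primitive_descs take the forward primitive_desc as a hint, which is why the diff rebuilds `fwdPD` here; names and shapes are illustrative:

```cpp
#include <vector>
#include "mkldnn.hpp"
using namespace mkldnn;

int main() {
  engine eng(engine::cpu, 0);
  const int bs = 2, ic = 8, oc = 4;
  std::vector<float> in(bs * ic), wgt(oc * ic);
  std::vector<float> inG(bs * ic), wgtG(oc * ic), biasG(oc), outG(bs * oc);

  memory::desc iMD({bs, ic}, memory::data_type::f32, memory::format::nc);
  memory::desc wMD({oc, ic}, memory::data_type::f32, memory::format::oi);
  memory::desc bMD({oc}, memory::data_type::f32, memory::format::x);
  memory::desc oMD({bs, oc}, memory::data_type::f32, memory::format::nc);

  memory inVal({iMD, eng}, in.data()), wgtVal({wMD, eng}, wgt.data());
  memory inGrad({iMD, eng}, inG.data()), wgtGrad({wMD, eng}, wgtG.data());
  memory biasGrad({bMD, eng}, biasG.data()), outGrad({oMD, eng}, outG.data());

  // forward primitive_desc is required as a hint for the backward ones
  inner_product_forward::desc fwdDesc(prop_kind::forward, iMD, wMD, oMD);
  inner_product_forward::primitive_desc fwdPD(fwdDesc, eng);

  // d(weight), d(bias) from the input value and the output grad
  inner_product_backward_weights::desc bwdWgtDesc(iMD, wMD, bMD, oMD);
  inner_product_backward_weights::primitive_desc bwdWgtPD(bwdWgtDesc, eng, fwdPD);
  inner_product_backward_weights bwdWgt(bwdWgtPD, inVal, outGrad, wgtGrad, biasGrad);

  // d(input) from the weight value and the output grad
  inner_product_backward_data::desc bwdDataDesc(iMD, wMD, oMD);
  inner_product_backward_data::primitive_desc bwdDataPD(bwdDataDesc, eng, fwdPD);
  inner_product_backward_data bwdData(bwdDataPD, outGrad, wgtVal, inGrad);

  std::vector<primitive> pipeline{bwdWgt, bwdData};
  stream(stream::kind::eager).submit(pipeline).wait();
  return 0;
}
```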
```diff
@@ -241,11 +264,7 @@ void MKLDNNFcLayer::forward(PassType passType) {
 
   {
     REGISTER_TIMER_INFO("mkldnn_FwdTimer", getName().c_str());
-
-    // update input data
-    // since it might be changed if this is after data layer
-    real* iData = getInputValue(0)->getData();
-    inVal_->set_data_handle(iData);
+    syncInputValue();
 
     // just submit forward pipeline
     stream_->submit(pipelineFwd_);
```
```diff
@@ -267,10 +286,7 @@ void MKLDNNFcLayer::backward(const UpdateCallback& callback) {
     REGISTER_TIMER_INFO("mkldnn_bwdTimer", getName().c_str());
     resetBwd();
 
-    // update diff
-    real* oDiff = getOutputGrad()->getData();
-    outGrad_->set_data_handle(oDiff);
-
+    syncOutputGrad();
     // just sumbmit backward pipeline
     stream_->submit(pipelineBwd_);
   }
```
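`syncInputValue()` / `syncOutputGrad()` replace the inline `set_data_handle` updates deleted above. The gist, as a hedged sketch (the free function below is hypothetical; the PR's versions live on the MKLDNN layer and fetch the current buffers themselves):

```cpp
#include "mkldnn.hpp"
using namespace mkldnn;

// An mkldnn memory keeps a raw pointer to user data. Upstream layers
// (e.g. a data layer) may swap their buffer every batch, so the handle
// must be refreshed before each submit or the primitive reads stale data.
void syncHandle(memory& mem, void* currentData) {
  if (mem.get_data_handle() != currentData) {
    mem.set_data_handle(currentData);
  }
}
```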
Review comment: For MKLDNN layers, are there many places that need this value conversion? If most layers just share (`outputOtherDevice_[i].value = output_.value;`), put it in a base-class function and let the layers that need a real conversion implement their own; that would be cleaner. The check that there must not be more than one CPU device should also go into a base-class function. And why can't there be more than one?
Author reply (tensor-tang): It's not that most layers just do `outputOtherDevice_[i].value = output_.value;` — other layers need different handling. FC can share the value directly because its output is always in nc format, the same as Paddle's CPU-device format. Once more layers are added, this can be tidied up again. As for the single-CPU-device check: in theory I don't think more than one should ever appear, but I'm worried the current design may not cover every case (for example, whether RNN cases are affected), so it is only a warning. Even if there were several CPU devices, each would still share the value. This point is also specific to the FC layer.
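A toy illustration of the rule described here (hypothetical code, not from the PR): share the buffer when the MKLDNN output layout already matches the CPU layout; otherwise a reorder into a separate buffer would be required.

```cpp
#include <memory>

struct DeviceOutput {
  std::shared_ptr<float> value;
  bool isNcFormat;  // FC output is always nc, same as Paddle's CPU layout
};

void publishToCpu(const DeviceOutput& dnnOut, DeviceOutput& cpuOut) {
  if (dnnOut.isNcFormat) {
    cpuOut.value = dnnOut.value;  // layouts match: share the pointer
  } else {
    // layouts differ: allocate a CPU buffer and reorder (omitted here)
  }
}
```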
Reviewer follow-up: If the check is needed, it should go into MKLDNNLayer's convertOutputToOtherDevice; that can be done in the next PR.