Skip to content

Conversation

@chenyanlann
Copy link
Contributor

@chenyanlann chenyanlann commented Nov 12, 2021

PR types

New features

PR changes

APIs

Describe

为 Paddle Inference 添加 Java Apis

Task: #35977

Paddle Inference java API

Paddle Inference java API 基于 capi 和 jni 实现,需要您提前准备好C预测库。

安装(Linux)

1.下载C预测库

您可以选择直接下载paddle_inference_c预测库,或通过源码编译的方式安装,源码编译方式参考官网文档,注意这里cmake编译时打开-DON_INFER=ON,在编译目录下得到paddle_inference_c_install_dir

2.准备预测部署模型

下载 resnet50 模型后解压,得到 Paddle Combined 形式的模型。

wget https://paddle-inference-dist.bj.bcebos.com/Paddle-Inference-Demo/resnet50.tgz tar zxf resnet50.tgz #### 获得 resnet50 目录结构如下 resnet50/ ├── inference.pdmodel ├── inference.pdiparams └── inference.pdiparams.info 
3.准备预测执行目录
git clone github.com/paddlepaddle/paddle/paddle/fluid/inference/javaapi 
3. 编译动态链接库和jar包
在javaapi目录下执行 ./build.sh {c预测库目录} {jni头文件目录} {jni系统头文件目录} 以笔者的目录结构为例 ./build.sh /root/paddle_c/paddle_inference_c_2.2/paddle_inference_c/ /usr/lib/jvm/java-8-openjdk-amd64/include /usr/lib/jvm/java-8-openjdk-amd64/include/linux 执行完成后,会在当前目录下生成JavaInference.jar和libpaddle_inference.so
5.运行单测,验证
在javaapi目录下执行 ./test.sh {c预测库目录} {.pdmodel文件目录} {.pdiparams文件目录} 以笔者的目录结构为例 ./test.sh "/root/paddle_c/paddle_inference_c_2.2/paddle_inference_c" "/root/paddle_c/resnet50/inference.pdmodel" "/root/paddle_c/resnet50/inference.pdiparams" 

在Java中使用Paddle预测

首先创建预测配置

Config config = new Config(); config.setCppModel(model_file, params_file);

创建predictor

Predictor predictor = Predictor.createPaddlePredictor(config);

获取输入Tensor

String inNames = predictor.getInputNameById(0); Tensor inHandle = predictor.getInputHandle(inNames);

设置输入数据(假设只有一个输入)

inHandle.Reshape(4, new int[]{1, 3, 224, 224}); float[] inData = new float[1*3*224*224]; inHandle.CopyFromCpu(inData);

运行预测

predictor.Run();

获取输出Tensor

String outNames = predictor.getOutputNameById(0); Tensor outHandle = predictor.getOutputHandle(outNames); float[] outData = new float[outHandle.GetSize()]; outHandle.CopyToCpu(outData);
@chenyanlann
Copy link
Contributor Author

@TCChenlong
Copy link
Contributor

还在review中哈~

@DannyIsFunny
Copy link
Contributor

编译过程的中间产物需要从PR中删除: JavaInference.jarlibpaddle_inference.so Config.class 等等

@DannyIsFunny
Copy link
Contributor

PaddlePaddle 编译体系基于CMake工具构建、建议将build.sh中的 gcc 的编译流程修改为CMakeLists.txt
eg.

# CMakeLists.txt add_library(paddle_inference SHARED com_baidu_paddle_inference_Predictor.cpp com_baidu_paddle_inference_Config.cpp com_baidu_paddle_inference_Tensor.cpp) target_link_libraries(paddle_inference paddle_inference_c)
@DannyIsFunny
Copy link
Contributor

JAVA API 请 @winter-wang review一下

@DannyIsFunny
Copy link
Contributor

当前产出的java api动态库仍然需要动态依赖 C API的动态库、能否直接产出独立的JAVA API动态库

Copy link
Contributor

@winter-wang winter-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我在所有java涉及到的接口中没有看见对 PD_TensorDestroy函数和PD_PredictorDestroy函数的调用。请问在java接口使用过程中,底层的Tensor对象和Predictor对象如何回收尼?

const char* params_file = PD_ConfigGetProgFile(
reinterpret_cast<PD_Config*>(cppPaddleConfigPointer));
jstring paramsFile = env->NewStringUTF(params_file);
free(const_cast<char*>(params_file));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

此处不能显式free。 params_file指向的内存是底层std::string对应的内存。


JNIEXPORT jstring JNICALL Java_com_baidu_paddle_inference_Config_paramsFile(
JNIEnv* env, jobject obj, jlong cppPaddleConfigPointer) {
const char* params_file = PD_ConfigGetProgFile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C接口函数调用错误。此处应该调用 PD_ConfigGetParamsFile函数。

const char* prog_file = PD_ConfigGetProgFile(
reinterpret_cast<PD_Config*>(cppPaddleConfigPointer));
jstring progFile = env->NewStringUTF(prog_file);
free(const_cast<char*>(prog_file));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

此处不能显式free. prog_file对应的内存是底层std::string的内存。

const char* model_dir = PD_ConfigGetModelDir(
reinterpret_cast<PD_Config*>(cppPaddleConfigPointer));
jstring modelDir = env->NewStringUTF(model_dir);
free(const_cast<char*>(model_dir));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

此处不能显式free. model_dir对应的内存是底层std::string的内存

const char* prog_file = env->GetStringUTFChars(progFile, 0);
PD_ConfigSetProgFile(reinterpret_cast<PD_Config*>(cppPaddleConfigPointer),
prog_file);
free(const_cast<char*>(prog_file));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这种GetStringUTFChars得到的内存不能用free函数是释放吧?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

其它地方也类似。

@paddle-bot-old
Copy link

Sorry to inform you that 5cf5259's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

@CLAassistant
Copy link

CLAassistant commented Nov 26, 2021

CLA assistant check
All committers have signed the CLA.

@chenyanlann
Copy link
Contributor Author

@winter-wang 感谢,已修复

@chenyanlann
Copy link
Contributor Author

@DannyIsFunny 感谢,已删除中间产物,添加CMakeLists.txt

@paddle-bot-old
Copy link

paddle-bot-old bot commented Dec 7, 2021

Sorry to inform you that 34ec990's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.

winter-wang
winter-wang previously approved these changes Dec 10, 2021
Copy link
Contributor

@winter-wang winter-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Shixiaowei02
Copy link
Contributor

Shixiaowei02 commented Dec 13, 2021

@chenyanlann 您好!首先感谢您为飞桨贡献代码!

作为修改建议的最后一步,请将相关代码移入 experimental 名称空间或目录中,并参考已有的 C++ 部分 贡献 Java 接口文档和示例

用户接口是框架最重要的代码,我们将发起评审并正式合入。谢谢!

Copy link
Contributor

@Superjomn Superjomn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@winter-wang winter-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@raindrops2sea raindrops2sea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Superjomn Superjomn merged commit 238f3c8 into PaddlePaddle:develop Feb 10, 2022
Shixiaowei02 added a commit that referenced this pull request Feb 16, 2022
* 【Pten】Adjust the Empyt dev_api (#39143) * adjust the Empyt dev_api * fix merge conflict * fix sparse_utils_kernel * Fix code conflict of empty dev_api (#39430) * fix code conflict * clear cache * just try * [PluggableDevice] custom kernel supports multi cpp_dtype registering (#39385) * [PTen] Add standard kernel suffix set (#39404) * add standard_suffix_set_and_remove_reshape_with_xshape * revert reshape change * polish reduce name * [pten] update isnan registration (#39419) * update isnan registration * fix compile * [bf16] add bf16 kernel: dropout & reshape & slice (#39395) * add dropout * add reshape * add slice * refien slice unittest * refine slice unittest * add cpu bf16 kernel * [bf16] add bf16 kernel: squeeze & unsqueeze & stack (#39402) * add squeeze unsqueeze stack * add unittest * add cpu kernel * Modify the unsqueeze dimension of input data in conv1d NCL And NLC format (#38425) * optimize conv1d forward * add conv opt * Optimize memory copy * delete share data with * set num_filters=512 * add nlc optimize * Optimize num_filter=512 data on A100 and V100 * Fix the workspace_size size setting of filter * 【Pten】Refactor C++ API code-gen (#39408) * refactor C++ API code-gen * fix windows problem of C++ API * Refactored Python-C Attributes Parsing Functions (#39328) * Add _get_parameter method to Lamb optimizer (#39416) * add _get_parameter func to lamb * remove duplicate code * mkldnn layout issue fix (#39422) * mkldnn conv fix * definetion * fix compile error on jetson (#39441) * move Masked select to pten (#39193) * move masked select cpu kernel * add masked selected gpu kernel; test=develop * fix bugs; test=develop * bug fix; test=develop * bug fix; test=develop * add namespace to set mask array; test=develop * fix bug; test=develop * fix bugs; test=develop * fix ddim bug; test=develop * fix npu op bug; test=develop * fix xpu dependecy bug; test=develop * move kernel args to sig.cc; test=develop * 【PaddlePaddle Hackathon】31. Add Java frontend for Paddle Inference (#37162) * fix check error of ResetHolder (#39439) * Added python-c code generation for final state Eager Dygraph (#39233) * Removed debug info * Added automatic code generation for final state Eager Dygraph * Modified backward yaml * Added EagerUtils helper functions for final state CodeGen * Adjusted CMakeFiles to support compilation for final state auto generated codes * Added python-c code generation for final state Eager Dygraph * Fixed minor issue * Fixed yaml.load() method failure * Fixed minor issues * Refactored Python-C Attributes Parsing Functions * Fixed minor issue with Python-C AddFunctions * Fixed issues from merge * Fixed merge issues * change dtype of pooling mask to 'int32' for Paddle2ONNX (#39314) * change dtype of pooling mask to 'int32' for Paddle2ONNX * empty commit to rerun ci * fix format * share MemOptVarInfos of external variables into cinn_launch subgraph (#39209) * add a graph pass to share MemOptVarInfos of external variables into subgraph * update pass name * fix compile failed * add share_mem_opt_info_to_subgraph_pass test * share_mem_opt_info_to_subgraph_pass_test pass * modify some codes for better style and more robust * update cmake * [NPU] add reduce_min (#39019) [NPU] add reduce_min * [MLU] add mlu kernel for accuracy op (#39337) * [MLU] add mlu kernel for accuracy op * fix license format * fix error message * [Dy2St]Handle `a, b = paddle.shape(x)` in Static Analysis (#39245) * refine Assign * add UT * 【Pten】Auto-Generate InterMeta register (#39436) * fix code conflict * generate inter_meta register * clear cache * just try * add sign c++ api * polish some code * Support different dtypes of inputs for elementwise ops (#38859) * improve backward performance * support different dtypes for elementwise ops * Add profiler node tree implementation (#39316) * add event node implementation * modify profiler.stop interface * fix according to review * fix file mode * modify class method name in event_node.cc * modify LLONG_MAX to ULLONG_MAX * fix ci error * fix ci error * add print pten kernel tool (#39371) * test=document_fix;add print pten kernel tool * test=document_fix * test=document_fix * test=document_fix * test=document_fix * add print_pten_kernels tool * add print_pten_kernels tool * fix windows complie * notest,test=rocm_ci * add merge tool * add comments * [new-exec] set type of op-kernel op by place (#39458) * Add log for executor (#39459) * add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173. * add log for Executor Co-authored-by: liutiexing <liutiexing@google.com> * [Paddle Inference] support ernie quant model with interleaved (#39424) * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * 统一 ps 开发 - python (#39431) * delete gloo connect retry * the_one_ps dirs reconstruct * . * . * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * the one ps dirs modify * the one ps dirs modify * the one ps dirs modify * the one ps dirs modify * refactor ps optimize * refactor ps optimize * refactor ps optimize * . * . * . * . * . * . * refactor theoneps * the_one_ps * add ps pass unittest * add ps pass unittest * ps unitest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * add cpu_async_ps_mode test * add cpu_async_ps_mode test * add cpu_async_ps_mode test * ps unittest ready * ps unittest ready * solve dist_pass init conflict * solve import CommContext error * unittest ok * implement AllocateFrom * solve setup.py.in conflict * solve conflict * solve conflict * solve conflict * . * . * cpu-async-ps minimize test ok & gpu minimize test ok Co-authored-by: zkh2016 <zhangkaihuo@baidu.com> * [PTen] Move grad GetExpectedPtenKernelArgs into pten (#39418) * move grad get expected pten kernel args * fix reduce sum error * fix element_sub_grad failed * revert kernel judge change * fix compilation warning on mac (#39438) * get build time (#39368) * fix prelu trt convert (#39389) * Optimize bilinear interpolation foward (#39243) * bilinear_fw init * optimize code * pre-compute linear_interp input index * Optimize performance of softmax_bwd when axis!=-1 (#38609) * Optimize performance of softmax_bwd when axis!=-1 * fix * fix * fix * fix * [PTen] Remove pten core's dependency on fluid xxx_info.h (#39401) * ermove xxx_info include * fix namespace error * resolve conflict * skip xpu context in registry * fix macro error * resolve conflict * resolve conflict * revert xpu convert * remove trans to fluid place * remove useless headers * [Pten] move operators/math/math_function_* to pten/kernels/func (#39300) * move operators/math/math_function_* to pten/kernels/func * namespace from `paddle::operators::math` to `pten::funcs` * [MLU] add pool2d and pool2d_grad mlu kernel (#39453) * [MLU]support c_gen_cncl_id_op run on MLU device (#39336) Co-authored-by: zhangna <zhangna@cambricon.com> * [bf16] add bf16 kernel: transpose & unbind (#39457) * add transpose unbind * add unittest * refine transpose unittest * uniform_random op for mlu (#39450) * [MLU] add pool2d pytest (#39454) * Added shape (U)INT8/BF16/FP32 oneDNN kernel (#36033) * added shape oneDNN kernel * removed unnecessary import from test * added skipping tests for GPU * refactoring * refactored shape kernel * added tests in new framework * removed one line * minor change * added newline at EOF * added formatting * added attributes as extra * move memcpy.h into cc file (#39469) * Add TensorRT inspector into Paddle-TRT (#38362) * Fix add profiler node tree implementation cmake error (#39474) * add event node implementation * modify profiler.stop interface * fix according to review * fix file mode * modify class method name in event_node.cc * modify LLONG_MAX to ULLONG_MAX * fix ci error * fix ci error * fix dependency error * unify naming style (#39481) * [Pten] Generate Wrapped InferMeta by Yaml (#39482) * generate wrapped_infer_meta * add test for wrapped_infer_meta * Update test_meta_fn_utils.cc * change the dir of generated file Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: Chen Weihang <chenwhpro@163.com> * Adjusted python-level trace_op to accomodate final state Eager Dygraph (#39319) * Removed debug info * Added automatic code generation for final state Eager Dygraph * Modified backward yaml * Added EagerUtils helper functions for final state CodeGen * Adjusted CMakeFiles to support compilation for final state auto generated codes * Added python-c code generation for final state Eager Dygraph * Fixed minor issue * Fixed yaml.load() method failure * Fixed minor issues * Refactored Python-C Attributes Parsing Functions * Fixed minor issue with Python-C AddFunctions * Adjusted python-level trace_op to accomodate final state Eager Dygraph * Added Logs for final state Eager Dygraph * Fixed merge issues * Fixed minor issue * Fixed get_tensor method for EagerTensor (#39414) * Enabled Eager OpTest #1 * Enabled Eager OpTest #1 * Fixed get_tensor method for EagerTensor * [Approver Update] update check approver of qili93, test=document_fix (#39483) * [MLU] add mlu kernel for c_broadcast op (#39470) * update xpu test build script and fix get_test_cover_info, *test=kunlun (#39235) * fix gather_nd, *test=kunlun (#39283) * [pten] add split kernel (#39060) * add split kernel * add split kernel signature * fix split bug * modify MakePtenScalarArrayFromVarList * modify MakePtenScalarArrayFromVarList * fix split windows register error * add test case for split kernel * replace raw split kernel with pten kernel * fix makeScalar/ScalarArray bug * remove debug log * remove int64_t type in buildPtcontext * update by code review * fix split dev test failed * change DenseTensorMeta to MetaTensor * change split api code from auto gen to manual * split cuda kernel support bfloat16 type * fix conflict * rm raw split kernel * merge develop branch * change to pten::errors * new may of test cases, *test=kunlun (#39444) * new may of test cases, *test=kunlun * new may of test cases, *test=kunlun * new may of test cases, *test=kunlun * [PTen] Add HasAttr for ArgumentMappingContext (#39464) * add has_attr for arg map context * skip useless attr now * skip attr if not exists * fix typo * [ROCm] fix missing dcu kernel in operator.cmake, test=develop (#39480) Co-authored-by: zyfncg <zhangyunfei07@baidu.com> Co-authored-by: Aganlengzi <aganlengzi@gmail.com> Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: Leo Chen <chenqiuliang@baidu.com> Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com> Co-authored-by: crystal <62974595+Zjq9409@users.noreply.github.com> Co-authored-by: Zhanlue Yang <jim19930609@gmail.com> Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com> Co-authored-by: wenbin <wang3323032@qq.com> Co-authored-by: Wilber <jiweibo@baidu.com> Co-authored-by: hong <43953930+phlrain@users.noreply.github.com> Co-authored-by: chenyanlann <62465397+chenyanlann@users.noreply.github.com> Co-authored-by: Wei Shengyu <weisy11@163.com> Co-authored-by: TeFeng Chen <ctfeng66@163.com> Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com> Co-authored-by: fwenguang <95677191+fwenguang@users.noreply.github.com> Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com> Co-authored-by: Zhang Ting <zhangting_2017@163.com> Co-authored-by: chenjian <chenjian26@baidu.com> Co-authored-by: Shang Zhizhou <shangzhizhou@baidu.com> Co-authored-by: liutiexing <74819124+liutiexing@users.noreply.github.com> Co-authored-by: liutiexing <liutiexing@google.com> Co-authored-by: Wangzheee <634486483@qq.com> Co-authored-by: ziyoujiyi <73728031+ziyoujiyi@users.noreply.github.com> Co-authored-by: zkh2016 <zhangkaihuo@baidu.com> Co-authored-by: zhangchunle <clzhang_cauc@163.com> Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com> Co-authored-by: Lijunhui <1578034415@qq.com> Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com> Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com> Co-authored-by: zn <96479180+kangna-qi@users.noreply.github.com> Co-authored-by: zhangna <zhangna@cambricon.com> Co-authored-by: joeqiao12 <45232181+joeqiao12@users.noreply.github.com> Co-authored-by: jakpiase <jakpia21@gmail.com> Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com> Co-authored-by: Chen Weihang <chenwhpro@163.com> Co-authored-by: Qi Li <qili93@qq.com> Co-authored-by: maxhuiy <1508399706@qq.com> Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com> Co-authored-by: chentianyu03 <chentianyu03@baidu.com> Co-authored-by: helen88 <z8hanghuan@126.com>
winter-wang pushed a commit to winter-wang/Paddle that referenced this pull request Feb 16, 2022
* 【Pten】Adjust the Empyt dev_api (PaddlePaddle#39143) * adjust the Empyt dev_api * fix merge conflict * fix sparse_utils_kernel * Fix code conflict of empty dev_api (PaddlePaddle#39430) * fix code conflict * clear cache * just try * [PluggableDevice] custom kernel supports multi cpp_dtype registering (PaddlePaddle#39385) * [PTen] Add standard kernel suffix set (PaddlePaddle#39404) * add standard_suffix_set_and_remove_reshape_with_xshape * revert reshape change * polish reduce name * [pten] update isnan registration (PaddlePaddle#39419) * update isnan registration * fix compile * [bf16] add bf16 kernel: dropout & reshape & slice (PaddlePaddle#39395) * add dropout * add reshape * add slice * refien slice unittest * refine slice unittest * add cpu bf16 kernel * [bf16] add bf16 kernel: squeeze & unsqueeze & stack (PaddlePaddle#39402) * add squeeze unsqueeze stack * add unittest * add cpu kernel * Modify the unsqueeze dimension of input data in conv1d NCL And NLC format (PaddlePaddle#38425) * optimize conv1d forward * add conv opt * Optimize memory copy * delete share data with * set num_filters=512 * add nlc optimize * Optimize num_filter=512 data on A100 and V100 * Fix the workspace_size size setting of filter * 【Pten】Refactor C++ API code-gen (PaddlePaddle#39408) * refactor C++ API code-gen * fix windows problem of C++ API * Refactored Python-C Attributes Parsing Functions (PaddlePaddle#39328) * Add _get_parameter method to Lamb optimizer (PaddlePaddle#39416) * add _get_parameter func to lamb * remove duplicate code * mkldnn layout issue fix (PaddlePaddle#39422) * mkldnn conv fix * definetion * fix compile error on jetson (PaddlePaddle#39441) * move Masked select to pten (PaddlePaddle#39193) * move masked select cpu kernel * add masked selected gpu kernel; test=develop * fix bugs; test=develop * bug fix; test=develop * bug fix; test=develop * add namespace to set mask array; test=develop * fix bug; test=develop * fix bugs; test=develop * fix ddim bug; test=develop * fix npu op bug; test=develop * fix xpu dependecy bug; test=develop * move kernel args to sig.cc; test=develop * 【PaddlePaddle Hackathon】31. Add Java frontend for Paddle Inference (PaddlePaddle#37162) * fix check error of ResetHolder (PaddlePaddle#39439) * Added python-c code generation for final state Eager Dygraph (PaddlePaddle#39233) * Removed debug info * Added automatic code generation for final state Eager Dygraph * Modified backward yaml * Added EagerUtils helper functions for final state CodeGen * Adjusted CMakeFiles to support compilation for final state auto generated codes * Added python-c code generation for final state Eager Dygraph * Fixed minor issue * Fixed yaml.load() method failure * Fixed minor issues * Refactored Python-C Attributes Parsing Functions * Fixed minor issue with Python-C AddFunctions * Fixed issues from merge * Fixed merge issues * change dtype of pooling mask to 'int32' for Paddle2ONNX (PaddlePaddle#39314) * change dtype of pooling mask to 'int32' for Paddle2ONNX * empty commit to rerun ci * fix format * share MemOptVarInfos of external variables into cinn_launch subgraph (PaddlePaddle#39209) * add a graph pass to share MemOptVarInfos of external variables into subgraph * update pass name * fix compile failed * add share_mem_opt_info_to_subgraph_pass test * share_mem_opt_info_to_subgraph_pass_test pass * modify some codes for better style and more robust * update cmake * [NPU] add reduce_min (PaddlePaddle#39019) [NPU] add reduce_min * [MLU] add mlu kernel for accuracy op (PaddlePaddle#39337) * [MLU] add mlu kernel for accuracy op * fix license format * fix error message * [Dy2St]Handle `a, b = paddle.shape(x)` in Static Analysis (PaddlePaddle#39245) * refine Assign * add UT * 【Pten】Auto-Generate InterMeta register (PaddlePaddle#39436) * fix code conflict * generate inter_meta register * clear cache * just try * add sign c++ api * polish some code * Support different dtypes of inputs for elementwise ops (PaddlePaddle#38859) * improve backward performance * support different dtypes for elementwise ops * Add profiler node tree implementation (PaddlePaddle#39316) * add event node implementation * modify profiler.stop interface * fix according to review * fix file mode * modify class method name in event_node.cc * modify LLONG_MAX to ULLONG_MAX * fix ci error * fix ci error * add print pten kernel tool (PaddlePaddle#39371) * test=document_fix;add print pten kernel tool * test=document_fix * test=document_fix * test=document_fix * test=document_fix * add print_pten_kernels tool * add print_pten_kernels tool * fix windows complie * notest,test=rocm_ci * add merge tool * add comments * [new-exec] set type of op-kernel op by place (PaddlePaddle#39458) * Add log for executor (PaddlePaddle#39459) * add align for WorkQueue * add spinlock * merge develop * merge * Add EventsWaiter * Revert "Add EventsWaiter" This reverts commit e206173. * add log for Executor Co-authored-by: liutiexing <liutiexing@google.com> * [Paddle Inference] support ernie quant model with interleaved (PaddlePaddle#39424) * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * support ernie quant model with interleaved * 统一 ps 开发 - python (PaddlePaddle#39431) * delete gloo connect retry * the_one_ps dirs reconstruct * . * . * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * create the_one_ps dirs * the one ps dirs modify * the one ps dirs modify * the one ps dirs modify * the one ps dirs modify * refactor ps optimize * refactor ps optimize * refactor ps optimize * . * . * . * . * . * . * refactor theoneps * the_one_ps * add ps pass unittest * add ps pass unittest * ps unitest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * ps unittest frame * add cpu_async_ps_mode test * add cpu_async_ps_mode test * add cpu_async_ps_mode test * ps unittest ready * ps unittest ready * solve dist_pass init conflict * solve import CommContext error * unittest ok * implement AllocateFrom * solve setup.py.in conflict * solve conflict * solve conflict * solve conflict * . * . * cpu-async-ps minimize test ok & gpu minimize test ok Co-authored-by: zkh2016 <zhangkaihuo@baidu.com> * [PTen] Move grad GetExpectedPtenKernelArgs into pten (PaddlePaddle#39418) * move grad get expected pten kernel args * fix reduce sum error * fix element_sub_grad failed * revert kernel judge change * fix compilation warning on mac (PaddlePaddle#39438) * get build time (PaddlePaddle#39368) * fix prelu trt convert (PaddlePaddle#39389) * Optimize bilinear interpolation foward (PaddlePaddle#39243) * bilinear_fw init * optimize code * pre-compute linear_interp input index * Optimize performance of softmax_bwd when axis!=-1 (PaddlePaddle#38609) * Optimize performance of softmax_bwd when axis!=-1 * fix * fix * fix * fix * [PTen] Remove pten core's dependency on fluid xxx_info.h (PaddlePaddle#39401) * ermove xxx_info include * fix namespace error * resolve conflict * skip xpu context in registry * fix macro error * resolve conflict * resolve conflict * revert xpu convert * remove trans to fluid place * remove useless headers * [Pten] move operators/math/math_function_* to pten/kernels/func (PaddlePaddle#39300) * move operators/math/math_function_* to pten/kernels/func * namespace from `paddle::operators::math` to `pten::funcs` * [MLU] add pool2d and pool2d_grad mlu kernel (PaddlePaddle#39453) * [MLU]support c_gen_cncl_id_op run on MLU device (PaddlePaddle#39336) Co-authored-by: zhangna <zhangna@cambricon.com> * [bf16] add bf16 kernel: transpose & unbind (PaddlePaddle#39457) * add transpose unbind * add unittest * refine transpose unittest * uniform_random op for mlu (PaddlePaddle#39450) * [MLU] add pool2d pytest (PaddlePaddle#39454) * Added shape (U)INT8/BF16/FP32 oneDNN kernel (PaddlePaddle#36033) * added shape oneDNN kernel * removed unnecessary import from test * added skipping tests for GPU * refactoring * refactored shape kernel * added tests in new framework * removed one line * minor change * added newline at EOF * added formatting * added attributes as extra * move memcpy.h into cc file (PaddlePaddle#39469) * Add TensorRT inspector into Paddle-TRT (PaddlePaddle#38362) * Fix add profiler node tree implementation cmake error (PaddlePaddle#39474) * add event node implementation * modify profiler.stop interface * fix according to review * fix file mode * modify class method name in event_node.cc * modify LLONG_MAX to ULLONG_MAX * fix ci error * fix ci error * fix dependency error * unify naming style (PaddlePaddle#39481) * [Pten] Generate Wrapped InferMeta by Yaml (PaddlePaddle#39482) * generate wrapped_infer_meta * add test for wrapped_infer_meta * Update test_meta_fn_utils.cc * change the dir of generated file Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: Chen Weihang <chenwhpro@163.com> * Adjusted python-level trace_op to accomodate final state Eager Dygraph (PaddlePaddle#39319) * Removed debug info * Added automatic code generation for final state Eager Dygraph * Modified backward yaml * Added EagerUtils helper functions for final state CodeGen * Adjusted CMakeFiles to support compilation for final state auto generated codes * Added python-c code generation for final state Eager Dygraph * Fixed minor issue * Fixed yaml.load() method failure * Fixed minor issues * Refactored Python-C Attributes Parsing Functions * Fixed minor issue with Python-C AddFunctions * Adjusted python-level trace_op to accomodate final state Eager Dygraph * Added Logs for final state Eager Dygraph * Fixed merge issues * Fixed minor issue * Fixed get_tensor method for EagerTensor (PaddlePaddle#39414) * Enabled Eager OpTest PaddlePaddle#1 * Enabled Eager OpTest PaddlePaddle#1 * Fixed get_tensor method for EagerTensor * [Approver Update] update check approver of qili93, test=document_fix (PaddlePaddle#39483) * [MLU] add mlu kernel for c_broadcast op (PaddlePaddle#39470) * update xpu test build script and fix get_test_cover_info, *test=kunlun (PaddlePaddle#39235) * fix gather_nd, *test=kunlun (PaddlePaddle#39283) * [pten] add split kernel (PaddlePaddle#39060) * add split kernel * add split kernel signature * fix split bug * modify MakePtenScalarArrayFromVarList * modify MakePtenScalarArrayFromVarList * fix split windows register error * add test case for split kernel * replace raw split kernel with pten kernel * fix makeScalar/ScalarArray bug * remove debug log * remove int64_t type in buildPtcontext * update by code review * fix split dev test failed * change DenseTensorMeta to MetaTensor * change split api code from auto gen to manual * split cuda kernel support bfloat16 type * fix conflict * rm raw split kernel * merge develop branch * change to pten::errors * new may of test cases, *test=kunlun (PaddlePaddle#39444) * new may of test cases, *test=kunlun * new may of test cases, *test=kunlun * new may of test cases, *test=kunlun * [PTen] Add HasAttr for ArgumentMappingContext (PaddlePaddle#39464) * add has_attr for arg map context * skip useless attr now * skip attr if not exists * fix typo * [ROCm] fix missing dcu kernel in operator.cmake, test=develop (PaddlePaddle#39480) Co-authored-by: zyfncg <zhangyunfei07@baidu.com> Co-authored-by: Aganlengzi <aganlengzi@gmail.com> Co-authored-by: Chen Weihang <chenweihang@baidu.com> Co-authored-by: Leo Chen <chenqiuliang@baidu.com> Co-authored-by: zhangbo9674 <82555433+zhangbo9674@users.noreply.github.com> Co-authored-by: crystal <62974595+Zjq9409@users.noreply.github.com> Co-authored-by: Zhanlue Yang <jim19930609@gmail.com> Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com> Co-authored-by: wenbin <wang3323032@qq.com> Co-authored-by: Wilber <jiweibo@baidu.com> Co-authored-by: hong <43953930+phlrain@users.noreply.github.com> Co-authored-by: chenyanlann <62465397+chenyanlann@users.noreply.github.com> Co-authored-by: Wei Shengyu <weisy11@163.com> Co-authored-by: TeFeng Chen <ctfeng66@163.com> Co-authored-by: furnace <34057289+windstamp@users.noreply.github.com> Co-authored-by: fwenguang <95677191+fwenguang@users.noreply.github.com> Co-authored-by: 0x45f <23097963+0x45f@users.noreply.github.com> Co-authored-by: Zhang Ting <zhangting_2017@163.com> Co-authored-by: chenjian <chenjian26@baidu.com> Co-authored-by: Shang Zhizhou <shangzhizhou@baidu.com> Co-authored-by: liutiexing <74819124+liutiexing@users.noreply.github.com> Co-authored-by: liutiexing <liutiexing@google.com> Co-authored-by: Wangzheee <634486483@qq.com> Co-authored-by: ziyoujiyi <73728031+ziyoujiyi@users.noreply.github.com> Co-authored-by: zkh2016 <zhangkaihuo@baidu.com> Co-authored-by: zhangchunle <clzhang_cauc@163.com> Co-authored-by: JingZhuangzhuang <75348594+JZZ-NOTE@users.noreply.github.com> Co-authored-by: Lijunhui <1578034415@qq.com> Co-authored-by: Zhang Zheng <32410583+ZzSean@users.noreply.github.com> Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com> Co-authored-by: zn <96479180+kangna-qi@users.noreply.github.com> Co-authored-by: zhangna <zhangna@cambricon.com> Co-authored-by: joeqiao12 <45232181+joeqiao12@users.noreply.github.com> Co-authored-by: jakpiase <jakpia21@gmail.com> Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com> Co-authored-by: Chen Weihang <chenwhpro@163.com> Co-authored-by: Qi Li <qili93@qq.com> Co-authored-by: maxhuiy <1508399706@qq.com> Co-authored-by: TTerror <tangzhiyi11@users.noreply.github.com> Co-authored-by: chentianyu03 <chentianyu03@baidu.com> Co-authored-by: helen88 <z8hanghuan@126.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

9 participants