Integrating ONNX runtime (ORT) in Spark NLP 5.0.0 🎉 #13857
Merged
Overview
This pull request (PR) aims to enhance the capabilities of Spark NLP by introducing the integration of ONNX Runtime (ORT) for Java. This integration enables users to import Transformer and LLM (Large Language Model) models in ONNX format into Spark NLP. 🎉

In the upcoming release of Spark NLP 5.0.0, users can work with models in both TensorFlow and ONNX formats. However, our team's pretrained models will be provided in ONNX format by default. This choice is driven by the fact that ONNX models deliver significantly faster inference, with speedups ranging from 3x to 5x on CPUs, even without any optimization or quantization techniques.
The integration of ORT in Spark NLP empowers users to further enhance the performance of their models. When exporting models to ONNX, users can leverage built-in features provided by libraries such as `onnx-runtime`, `transformers`, `optimum`, and `pytorch`. These features include optimization and quantization capabilities, which come ready to use out of the box.

Initial Annotators/Features to support ONNX Runtime
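As a sketch of the export path mentioned above (assuming the `optimum` library with its ONNX Runtime extras is installed; the model id and output directory are illustrative, not part of this PR):

```python
# Hedged sketch: exporting a Hugging Face checkpoint to ONNX via optimum.
# The model id and output directory used in the usage note are assumptions.

def export_to_onnx(model_id: str, output_dir: str) -> None:
    """Export a Transformers checkpoint to an ONNX graph using optimum's ORT wrapper."""
    from optimum.onnxruntime import ORTModelForFeatureExtraction
    from transformers import AutoTokenizer

    # export=True converts the PyTorch weights to an ONNX graph on load
    model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Save the ONNX model and tokenizer side by side for a later import
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)

# Example (downloads the checkpoint, so run only with network access):
# export_to_onnx("bert-base-cased", "onnx_bert")
```

Optimization and quantization can then be applied on top of the exported graph with the same library's tooling.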
In the realm of Vector Databases, the quest for faster and more efficient embeddings models has become an imperative pursuit. Models like BERT, DistilBERT, and DeBERTa have revolutionized natural language processing tasks by capturing intricate semantic relationships between words. However, their computational demands and slow inference times pose significant challenges for Vector Databases.
In Vector Databases, the speed at which queries are processed and embeddings are retrieved directly impacts the overall performance and responsiveness of the system. As these databases store vast amounts of vectorized data, such as documents, sentences, or entities, swiftly retrieving relevant embeddings becomes paramount. It enables real-time applications like search engines, recommendation systems, sentiment analysis, and chat/instruct-like products similar to ChatGPT to deliver timely and accurate results, ensuring a seamless user experience.
With that in mind, we have started with the following annotators:
`BertEmbeddings`, `DistilBertEmbeddings`, and `DeBertaEmbeddings`. We will identify all the existing models for these annotators on our Models Hub, re-export them in ONNX format, and re-upload them under the same names to provide a seamless transition for our community starting with Spark NLP 5.0.0. (Everything will simply get faster with each release starting with Spark NLP 5.0.0 🚀)

Models converted to ONNX
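As a hedged sketch of how an exported ONNX model could be imported into one of these annotators (the annotator API follows the existing Spark NLP Python bindings; the export directory path is an illustrative assumption):

```python
# Hedged sketch: importing an ONNX-exported BERT model as a Spark NLP annotator.
# The export_dir value shown in the usage note is an assumption for illustration.

def load_onnx_bert(spark, export_dir: str):
    """Load an exported BERT model into a Spark NLP BertEmbeddings annotator."""
    from sparknlp.annotator import BertEmbeddings

    return (
        BertEmbeddings.loadSavedModel(export_dir, spark)
        .setInputCols(["document", "token"])
        .setOutputCol("embeddings")
    )

# Usage (requires a running Spark NLP session):
# import sparknlp
# spark = sparknlp.start()
# embeddings = load_onnx_bert(spark, "onnx_bert")
```

Models re-uploaded to the Models Hub would instead be fetched with the usual `BertEmbeddings.pretrained(...)` call, with no code changes required from users.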
Tested platforms