📢 Spark NLP 6.0.1: Introducing New State-of-the-Art Vision-Language Models and Enhanced Document Processing
We are pleased to announce the release of Spark NLP 6.0.1, which brings new vision features and continued enhancements. Upgrade to 6.0.1 to expand your NLP capabilities at scale across a wide range of tasks and take advantage of these additions and improvements!
We have also been adding blog posts that cover examples of our newest features. Check them out at Medium - Spark NLP!
🔥 Highlights
Added support for several new state-of-the-art vision-language models (VLMs), including Gemma 3, PaliGemma, PaliGemma 2, and SmolVLM.
Introduced new parameter options for the PDF Reader for enhanced document ingestion control.
🚀 New Features & Enhancements
New VLM Implementations
This release adds support for several cutting-edge VLMs, significantly expanding the range of tasks you can tackle with Spark NLP:
Gemma 3: The latest version of Google's lightweight, state-of-the-art open models. (link to notebook)
PaliGemma and PaliGemma 2: Integration of the original PaliGemma vision-language model by Google. This annotator can also read PaliGemma 2 models. (link to notebook)
SmolVLM: A small, fast, memory-efficient, and fully open-source 2B VLM. (link to notebook)
PDF Reader Enhancements
The PDF Reader now includes additional parameters that give users more flexible, controlled ingestion of PDF documents and improve handling of various PDF structures. (link to notebook)
You can now use the following parameters:
splitPage: identifies the correct number of pages
onlyPageNum: displays only the number of pages of the document
textStripper: controls output layout and formatting
sort: enables or disables sorting of lines
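As a rough illustration, the four parameters above can be gathered in one place before being handed to the reader. The sketch below is plain Python and does not call Spark NLP. The `PdfReaderOptions` class, the value types (booleans for `splitPage`, `onlyPageNum`, and `sort`; a string for `textStripper`), and the placeholder stripper name are all assumptions made for illustration; only the four parameter names come from this release. The linked notebook remains the reference for the actual setters and accepted values.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PdfReaderOptions:
    """Hypothetical container for the four new PDF Reader parameters.

    Only the field names come from the release notes; the types are assumed.
    """
    splitPage: bool      # assumed boolean switch
    onlyPageNum: bool    # assumed boolean switch
    textStripper: str    # assumed to be a stripper name; valid values not shown here
    sort: bool           # assumed boolean switch for line sorting

    def as_params(self) -> dict:
        """Return the options as a plain name -> value mapping."""
        return asdict(self)


options = PdfReaderOptions(
    splitPage=True,
    onlyPageNum=False,
    textStripper="<stripper-name>",  # placeholder, not a real value
    sort=False,
)
print(options.as_params())
```

Keeping the options in a single immutable object makes it easy to log or compare the exact ingestion settings used for a given batch of PDFs.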
🐛 Bug Fixes
This release also includes fixes for several issues:
Fixed a Python error in RoBERtaMultipleChoice that prevented annotators of this type from being loaded in Python
Fixed various typos and issues in our Jupyter notebook examples
❤️ Community Support
Slack: For live discussion with the Spark NLP community and the team
GitHub: Bug reports, feature requests, and contributions
Discussions: Engage with other community members, share ideas, and show off how you use Spark NLP!
⚙️ Installation
Python
Spark Packages
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x (Scala 2.12):
GPU
Apple Silicon
AArch64
Maven
spark-nlp on Apache Spark 3.0.x, 3.1.x, 3.2.x, 3.3.x, and 3.4.x:
spark-nlp-gpu:
spark-nlp-silicon:
spark-nlp-aarch64:
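The artifact names above differ only in their suffix, so the matching package coordinates can be derived mechanically. The following is a minimal sketch, assuming the group ID `com.johnsnowlabs.nlp` and a `_2.12` Scala suffix on each artifact, as used by earlier Spark NLP releases. The helper name `maven_coordinate` and the variant keys are hypothetical, and the resulting coordinates should be verified against Maven Central before use.

```python
# Assumed values: group ID and Scala suffix follow Spark NLP's usual Maven naming.
GROUP_ID = "com.johnsnowlabs.nlp"
SCALA_VERSION = "2.12"  # Scala version named in the Spark Packages section
RELEASE = "6.0.1"

# Artifact names as listed in the Maven section; the dictionary keys are illustrative.
ARTIFACTS = {
    "cpu": "spark-nlp",
    "gpu": "spark-nlp-gpu",
    "silicon": "spark-nlp-silicon",
    "aarch64": "spark-nlp-aarch64",
}


def maven_coordinate(variant: str, release: str = RELEASE) -> str:
    """Build a group:artifact_scala:version coordinate for one build variant."""
    artifact = ARTIFACTS[variant]
    return f"{GROUP_ID}:{artifact}_{SCALA_VERSION}:{release}"


if __name__ == "__main__":
    for variant in ARTIFACTS:
        print(f"{variant:8s} --packages {maven_coordinate(variant)}")
```

Each printed coordinate is the kind of value passed to `spark-shell`, `pyspark`, or `spark-submit` through the `--packages` option.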
FAT JARs
What's Changed
Full Changelog: 6.0.0...6.0.1
This discussion was created from the release Spark NLP 6.0.1: SmolVLM, PaliGemma 2, Gemma 3, PDF Reader enhancements.