
Commit 56512b0

prabod and Devin Ha authored

[SPARKNLP-1123] Introducing InternVL (#14578)

* add internvl scala api
* add internvl python api
* internvl docs
* update scala and python api for tests
* add notebook
* InternVL: minor python adjustments

Signed-off-by: Prabod Rathnayaka <prabod@rathnayaka.me>
Co-authored-by: Devin Ha <devin@trungducha.de>

1 parent b316a91 commit 56512b0

File tree

15 files changed: +2910 −1 lines changed

This file: 130 additions & 0 deletions
{%- capture title -%}
InternVLForMultiModal
{%- endcapture -%}

{%- capture description -%}
Visual Question Answering using InternVL.

InternVLForMultiModal can load InternVL vision models for visual question answering.
The model consists of a vision encoder, a text encoder, a text decoder, and a model merger:
the vision encoder encodes the input image, the text encoder encodes the input text,
the model merger merges the image and text embeddings, and the text decoder generates the answer.
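The encode-merge-decode flow described above can be sketched as follows. This is a conceptual illustration only: none of these helper functions exist in Spark NLP or InternVL, and the toy "embeddings" stand in for the real encoder outputs.

```python
# Conceptual sketch only (hypothetical helpers, not Spark NLP API):
# illustrates the encode -> merge -> decode flow of the model.
def encode_image(pixels):
    # stand-in for the vision encoder: one "feature" per pixel
    return [float(p) / 255.0 for p in pixels]

def encode_text(tokens):
    # stand-in for the text encoder: one "feature" per token
    return [float(len(t)) for t in tokens]

def merge_embeddings(image_emb, text_emb):
    # stand-in for the model merger: a toy concatenation
    return image_emb + text_emb

def decode(merged):
    # stand-in for the text decoder
    return f"answer conditioned on {len(merged)} merged features"

merged = merge_embeddings(encode_image([0, 255]), encode_text(["a", "cat"]))
print(decode(merged))  # answer conditioned on 4 merged features
```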

InternVL 2.5 is an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0,
maintaining its core model architecture while introducing significant enhancements in training and testing
strategies as well as data quality. Key features include:
- Large context window support
- Multilingual support
- Multimodal capabilities handling both text and image inputs
- Optimized for deployment with int4 quantization

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val visualQA = InternVLForMultiModal.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")
```

The default model is `"internvl2_5_1b_int4"` if no name is provided.

For available pretrained models please see the
[Models Hub](https://sparknlp.org/models?task=Question+Answering).

To see which models are compatible and how to import them, see
[Import Transformers into Spark NLP 🚀](https://github.com/JohnSnowLabs/spark-nlp/discussions/5669).
{%- endcapture -%}
37+
38+
{%- capture input_anno -%}
39+
IMAGE
40+
{%- endcapture -%}
41+
42+
{%- capture output_anno -%}
43+
DOCUMENT
44+
{%- endcapture -%}
45+
46+

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import lit

image_df = spark.read.format("image").load(path=images_path)  # Replace with your image path
test_df = image_df.withColumn("text", lit("<|im_start|><image>\nDescribe this image in detail.<|im_end|><|im_start|>assistant\n"))

imageAssembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

visualQAClassifier = InternVLForMultiModal.pretrained() \
    .setInputCols("image_assembler") \
    .setOutputCol("answer")

pipeline = Pipeline().setStages([
    imageAssembler,
    visualQAClassifier
])

result = pipeline.fit(test_df).transform(test_df)
result.select("image_assembler.origin", "answer.result").show(truncate=False)
{%- endcapture -%}
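The prompt string in the examples follows InternVL's chat template, with the question placed between the special tokens. A small helper like the following can assemble it; the helper name is illustrative only (it is not part of the Spark NLP API), and the special tokens are taken verbatim from the example above.

```python
# Hypothetical helper (not part of Spark NLP): builds the InternVL
# chat-template prompt used in the examples above.
def build_internvl_prompt(question: str) -> str:
    return (
        "<|im_start|><image>\n"
        + question
        + "<|im_end|><|im_start|>assistant\n"
    )

prompt = build_internvl_prompt("Describe this image in detail.")
print(prompt)
```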

{%- capture scala_example -%}
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val imageFolder = "path/to/your/images" // Replace with your image path

val imageDF: DataFrame = spark.read
  .format("image")
  .option("dropInvalid", value = true)
  .load(imageFolder)

val testDF: DataFrame = imageDF.withColumn("text", lit("<|im_start|><image>\nDescribe this image in detail.<|im_end|><|im_start|>assistant\n"))

val imageAssembler: ImageAssembler = new ImageAssembler()
  .setInputCol("image")
  .setOutputCol("image_assembler")

val visualQAClassifier = InternVLForMultiModal.pretrained()
  .setInputCols("image_assembler")
  .setOutputCol("answer")

val pipeline = new Pipeline().setStages(Array(
  imageAssembler,
  visualQAClassifier
))

val result = pipeline.fit(testDF).transform(testDF)

result.select("image_assembler.origin", "answer.result").show(false)
{%- endcapture -%}
108+
{%- capture api_link -%}
109+
[InternVLForMultiModal](/api/com/johnsnowlabs/nlp/annotators/cv/InternVLForMultiModal)
110+
{%- endcapture -%}
111+
112+
{%- capture python_api_link -%}
113+
[InternVLForMultiModal](/api/python/reference/autosummary/sparknlp/annotator/cv/internvl_for_multimodal/index.html#sparknlp.annotator.cv.internvl_for_multimodal.InternVLForMultiModal)
114+
{%- endcapture -%}
115+
116+
{%- capture source_link -%}
117+
[InternVLForMultiModal](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/cv/InternVLForMultiModal.scala)
118+
{%- endcapture -%}
119+
120+
{% include templates/anno_template.md
121+
title=title
122+
description=description
123+
input_anno=input_anno
124+
output_anno=output_anno
125+
python_example=python_example
126+
scala_example=scala_example
127+
api_link=api_link
128+
python_api_link=python_api_link
129+
source_link=source_link
130+
%}
