@prabod prabod commented May 16, 2025

Description

This pull request adds support for InternVLForMultiModal, a multimodal large language model used for visual question answering. The changes include a new annotator, utility functions for image preprocessing, and test cases that validate the functionality. The annotator can load models from the InternVL 2, 2.5, and 3 families.

Motivation and Context

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
@prabod prabod requested a review from DevinTDHa May 16, 2025 05:12
@prabod prabod self-assigned this May 16, 2025
@prabod prabod added the new-feature Introducing a new feature label May 16, 2025
@DevinTDHa DevinTDHa changed the base branch from master to release/602-release-candidate May 23, 2025 08:20

@DevinTDHa DevinTDHa left a comment

A few minor changes are needed in the Python part (configProtoBytes is not used); I'll make them during the merge. Other than that, it looks good to me!


outputAnnotatorType = AnnotatorType.DOCUMENT

configProtoBytes = Param(Params._dummy(),

configProtoBytes is not needed here.

image_assembler = ImageAssembler().setInputCol("image").setOutputCol("image_assembler")

imageClassifier = (InternVLForMultiModal \
.loadSavedModel("/mnt/research/Projects/ModelZoo/internVL/models/int4/OpenGVLab/InternVL2-1B", self.spark) \

We should change this to pretrained() for the master branch
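
As a rough illustration of the suggestion above, the test pipeline could select between `pretrained()` and `loadSavedModel(...)` along these lines. This is only a sketch: the helper name `build_internvl_pipeline`, the output column `"answer"`, and the `setInputCols`/`setOutputCol` wiring are assumptions for illustration, since the test code quoted in this review is truncated. It also assumes that `pretrained()` can be called without arguments to fetch a default model.

```python
def build_internvl_pipeline(model_path=None, spark=None):
    """Build an ImageAssembler -> InternVLForMultiModal pipeline (illustrative sketch).

    model_path: optional local export directory; if None, pretrained() is used.
    spark: SparkSession, only needed when loading from model_path.
    """
    # Imports are deferred so that defining this helper does not itself
    # require Spark NLP or PySpark to be installed.
    from pyspark.ml import Pipeline
    from sparknlp.base import ImageAssembler
    from sparknlp.annotator import InternVLForMultiModal

    image_assembler = (
        ImageAssembler()
        .setInputCol("image")
        .setOutputCol("image_assembler")
    )

    if model_path is None:
        # Variant suggested for the master branch (assumed default model).
        model = InternVLForMultiModal.pretrained()
    else:
        # Variant used in the PR's test, pointing at a locally exported model.
        model = InternVLForMultiModal.loadSavedModel(model_path, spark)

    # Column names below are hypothetical; adjust to the actual test.
    model = model.setInputCols("image_assembler").setOutputCol("answer")

    return Pipeline(stages=[image_assembler, model])
```

Calling `build_internvl_pipeline()` with no arguments would try to download a pretrained model, so it needs an active Spark NLP session and network access; passing a path and a session keeps the local-export behaviour from the original test.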

@DevinTDHa DevinTDHa merged commit 56512b0 into release/602-release-candidate May 23, 2025
4 checks passed
@DevinTDHa DevinTDHa mentioned this pull request May 23, 2025
10 tasks
DevinTDHa added a commit that referenced this pull request May 28, 2025
* add intervl scala api
* add internvl python api
* internvl docs
* update scala and python api for tests

Signed-off-by: Prabod Rathnayaka <prabod@rathnayaka.me>

* add notebook
* InternVL: minor python adjustments

---------

Signed-off-by: Prabod Rathnayaka <prabod@rathnayaka.me>
Co-authored-by: Devin Ha <devin@trungducha.de>
3 participants