@Tsundoku958 (Contributor) commented on Dec 9, 2025


Motivation

I noticed that the current lmdeploy does not use tensor parallelism for the embedding layer and lm_head, yet they consume nearly as much GPU memory as the linear layers. This PR adds support for tensor parallelism in the embedding layer.

Modification

  • Row-wise tensor parallelism for the embedding layer (see the sketch after this list).
  • Corresponding unit test files.
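
For reference, here is a minimal sketch of how row-wise (vocab-sharded) embedding tensor parallelism typically works. This is illustrative only, assuming an initialized torch.distributed process group; the class and variable names are mine, not the PR's actual implementation:

```python
# Illustrative sketch only: row-wise (vocab-sharded) embedding TP.
# Assumes torch.distributed has been initialized (e.g. via torchrun);
# gradient flow is ignored, since lmdeploy is an inference engine.
import torch
import torch.nn as nn
import torch.distributed as dist


class RowParallelEmbedding(nn.Module):
    """Each rank stores a contiguous slice of the vocab rows.

    A lookup masks out ids owned by other ranks, embeds the local ids,
    and an all-reduce sums the partial results across the TP group.
    """

    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        self.tp_size = dist.get_world_size()
        rank = dist.get_rank()
        assert num_embeddings % self.tp_size == 0, "pad vocab to a multiple of tp_size"
        shard = num_embeddings // self.tp_size
        self.vocab_start = rank * shard
        self.vocab_end = self.vocab_start + shard
        # Only 1/tp_size of the full embedding table lives on this rank.
        self.weight = nn.Parameter(torch.empty(shard, embedding_dim))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Ids outside [vocab_start, vocab_end) belong to other ranks.
        mask = (input_ids < self.vocab_start) | (input_ids >= self.vocab_end)
        local_ids = (input_ids - self.vocab_start).masked_fill(mask, 0)
        out = nn.functional.embedding(local_ids, self.weight)
        # Zero rows this rank does not own; the owning rank contributes them.
        out = out.masked_fill(mask.unsqueeze(-1), 0.0)
        if self.tp_size > 1:
            dist.all_reduce(out)  # sum partial embeddings across TP ranks
        return out
```

The same sharding applies to lm_head, where the partial logits per vocab shard would instead be gathered (or kept sharded for a fused sampling kernel), which is why the memory savings extend to both ends of the model.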

Perhaps TP (tensor parallelism) for the embedding layer and lm_head could be enabled by default in lmdeploy, or a new argument could be added to let users control whether embedding parallelism is enabled?
@grimoire @lvhan028
