
Conversation

@nighting0le01

This PR allows layer-wise exclusion and inclusion of params when using low-bit optimizers, which improves stability by allowing certain layers to run with 32-bit Adam. https://huggingface.co/docs/bitsandbytes/main/en/optimizers
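
For example, a minimal usage sketch (the exclude_low_bit_optim_params argument comes from this PR's diff; the optimizer import path and the toy model are assumptions and may not match the merged API):

import torch.nn as nn

# Assumed import path at the time of this PR; it may differ in later torchao releases.
from torchao.prototype.low_bit_optim import AdamW8bit

model = nn.Sequential(nn.Embedding(10_000, 256), nn.Linear(256, 10_000))

# Keep the (potentially unstable) embedding layer's optimizer state in 32-bit,
# while every other param uses 8-bit optimizer state.
optimizer = AdamW8bit(
    model.parameters(),
    lr=1e-3,
    exclude_low_bit_optim_params=list(model[0].parameters()),
)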

@pytorch-bot

pytorch-bot bot commented Nov 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit d0278ab with merge base 4f8021f:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

Hi @nighting0le01!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

@facebook-github-bot
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@facebook-github-bot added the CLA Signed label Nov 6, 2024
Collaborator

@gau-nernst gau-nernst left a comment

Thank you for the PR! Left some small comments.

Comment on lines +32 to +51
self.exclude_low_bit_optim_params_ids = set(
    id(p) for p in exclude_low_bit_optim_params
) if exclude_low_bit_optim_params else set()
Collaborator

I think you can hash tensor directly (it will use object id internally). PyTorch optimizer already hashes tensors when it uses params as keys in self.state.
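
As a minimal illustration of this suggestion (the model and variable names are made up for the example): torch.Tensor hashes by object identity, so the parameter tensors can go straight into a set, the same way torch.optim.Optimizer keys self.state by parameter.

import torch.nn as nn

model = nn.Linear(8, 8)

# Store the tensors themselves instead of set(id(p) for p in ...);
# torch.Tensor hashes by object identity, so this acts as an identity set.
exclude_low_bit_optim_params = {model.bias}

for name, p in model.named_parameters():
    keep_fp32_state = p in exclude_low_bit_optim_params  # identity membership test
    print(name, keep_fp32_state)  # weight -> False, bias -> True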

Collaborator

Can you try this? I don't think you need to use id(p) explicitly.

@nighting0le01
Author

@gau-nernst shall I also add a configurable min_8bit_size like https://github.com/bitsandbytes-foundation/bitsandbytes/blob/9568735b21b9325e4789d6a5004517f2287f47c8/bitsandbytes/optim/optimizer.py#L603

over here:

if p.numel() >= 4096 and p.numel() % self.block_size == 0:

@gau-nernst
Collaborator

@nighting0le01 Adding something like min_8bit_size should be good, though personally I don't know if having it is useful in any way (does anyone use it / does adjusting it help with stability?). If you still want to add it, maybe we can give it a more generic name, like min_size_for_low_bit, since we also have 4-bit and FP8.

Do you mind rebasing/merging from main and making sure the tests pass?
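
A sketch of what such a configurable threshold might look like (the min_size_for_low_bit name comes from the comment above; the function itself is illustrative, not the actual torchao code):

import torch

def use_low_bit_state(p: torch.Tensor, block_size: int,
                      min_size_for_low_bit: int = 4096) -> bool:
    # Small tensors (biases, norm params) save little memory when quantized,
    # so keep their optimizer state in full precision below the threshold.
    return p.numel() >= min_size_for_low_bit and p.numel() % block_size == 0

print(use_low_bit_state(torch.zeros(256), block_size=256))      # False: too small
print(use_low_bit_state(torch.zeros(4096, 4), block_size=256))  # True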

@nighting0le01
Author

hi @gau-nernst!

  1. Yes, I have rebased and confirmed all test cases pass.
  2. min_size_for_low_bit: the reason I propose this is to allow running layers with exploding gradients or other instability in 32-bit precision, with similar motivation to https://huggingface.co/docs/bitsandbytes/main/en/optimizers#optimize-unstable-parameters
  3. I can push it in another PR if you suggest.
@gau-nernst
Collaborator

gau-nernst commented Nov 7, 2024

  1. There are conflicts in your branch, hence I can't run the CI. Do you mind double-checking? (From the GitHub UI it shows test_low_bit_optim.py and adam.py have conflicts.) It seems like you rebased from an outdated main? The diff for this PR looks kinda strange (there are changes in unwanted places).
  2. From what I understand, min_size_for_low_bit (or the original min_8bit_size) is meant to skip small params that don't contribute much memory savings if we use low-bit optim state for them (e.g. biases, norm params). How would it improve exploding gradients or instability? Usually instability appears in the embedding layer or LM head, I think (correct me if I'm wrong), which are large params but receive (somewhat) sparse gradients. In other words, how does increasing (or decreasing) the threshold help to improve stability? With this PR, users can already select which specific params they want to keep optim state for in the original precision.
@nighting0le01 force-pushed the asahni/low_bit_optim_layerwise branch from 8d7f968 to a0dc6a9 on December 5, 2024 14:28
@nighting0le01
Author

@gau-nernst hi, sorry, I was OOO for the last month. Can you please run CI/CD now? I verified the test cases pass locally.

@gau-nernst I also ran ruff check now.

@nighting0le01
Author

nighting0le01 commented Dec 5, 2024

Can you please add the topic: new feature label, @gau-nernst? Or any other topic label that is relevant.

@gau-nernst added the topic: new feature label Dec 5, 2024
Collaborator

@gau-nernst gau-nernst left a comment

Ruff lint is still failing. Can you double-check?

The failing CUDA nightly job seems to be unrelated.

Requested some changes because some of the code has been changed since you last opened this PR. Lmk if you have any questions.

Comment on lines +35 to +39
from torchao.utils import (
    TORCH_VERSION_AT_LEAST_2_3,
    TORCH_VERSION_AT_LEAST_2_4,
    TORCH_VERSION_AT_LEAST_2_6,
)
Collaborator

Don't reimport these

loss1, loss2, msg=lambda msg: f"Iteration {idx}. {msg}"
)

@pytest.mark.skipif(not TORCH_VERSION_AT_LEAST_2_3, reason="requires PyTorch >= 2.3")
Collaborator

In our CI, min PyTorch version is 2.3. We don't need to check >=2.3 anymore. You can remove this line


# follow bitsandbytes, only quantize tensors >= 4096 values
- if local_p.numel() >= 4096 and local_p.numel() % self.block_size == 0:
+ if p.numel() >= 4096 and p.numel() % self.block_size == 0 and id(p) not in self.exclude_low_bit_optim_params_ids:
Collaborator

You should keep using local_p here for FSDP to work correctly (the check on divisibility should be done on the local tensor, not the full tensor).

Furthermore, the check id(p) not in self.exclude_low_bit_optim_params_ids should be done before this. Just short-circuit it, e.g. if p in self.exclude_low_bit_optim_params: return torch.zeros_like(p).
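
A hedged sketch of the restructuring suggested here (the function and helper names are illustrative, not the torchao source):

import torch

def new_optim_state(p, local_p, exclude_set, block_size, make_quantized_state):
    # 1) Short-circuit excluded params: keep their optimizer state in fp32.
    if p in exclude_set:
        return torch.zeros_like(p)
    # 2) Follow bitsandbytes: only quantize tensors with >= 4096 values,
    #    checking size/divisibility on local_p (the local FSDP shard).
    if local_p.numel() >= 4096 and local_p.numel() % block_size == 0:
        return make_quantized_state(local_p)
    # 3) Fall back to a plain fp32 buffer for everything else.
    return torch.zeros_like(p)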

@jcaip
Contributor

jcaip commented Mar 19, 2025

cc @nighting0le01 are you still planning on working on this?

@nighting0le01
Author

nighting0le01 commented Mar 19, 2025 via email

