bitsandbytes/optim/adamw.py (6 additions, 12 deletions)
@@ -26,7 +26,7 @@ def __init__(
         Base AdamW optimizer.
 
         Arguments:
-            params (`torch.tensor`):
+            params (`torch.Tensor`):
                 The input parameters to optimize.
             lr (`float`, defaults to 1e-3):
                 The learning rate.
@@ -87,7 +87,7 @@ def __init__(
         8-bit AdamW optimizer.
 
         Arguments:
-            params (`torch.tensor`):
+            params (`torch.Tensor`):
                 The input parameters to optimize.
             lr (`float`, defaults to 1e-3):
                 The learning rate.
@@ -159,7 +159,7 @@ def __init__(
         32-bit AdamW optimizer.
 
         Arguments:
-            params (`torch.tensor`):
+            params (`torch.Tensor`):
                 The input parameters to optimize.
             lr (`float`, defaults to 1e-3):
                 The learning rate.
@@ -219,7 +219,7 @@ def __init__(
         Paged AdamW optimizer.
 
         Arguments:
-            params (`torch.tensor`):
+            params (`torch.Tensor`):
                 The input parameters to optimize.
             lr (`float`, defaults to 1e-3):
                 The learning rate.
@@ -241,8 +241,6 @@ def __init__(
                 Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
             block_wise (`bool`, defaults to `True`):
                 Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
-            is_paged (`bool`, defaults to `False`):
-                Whether the optimizer is a paged optimizer or not.
         """
         super().__init__(
             "adam",
@@ -279,7 +277,7 @@ def __init__(
         Paged 8-bit AdamW optimizer.
 
         Arguments:
-            params (`torch.tensor`):
+            params (`torch.Tensor`):
                 The input parameters to optimize.
             lr (`float`, defaults to 1e-3):
                 The learning rate.
@@ -303,8 +301,6 @@ def __init__(
                 Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
             block_wise (`bool`, defaults to `True`):
                 Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
-            is_paged (`bool`, defaults to `False`):
-                Whether the optimizer is a paged optimizer or not.
         """
         # Validate unsupported parameters
         if amsgrad:
@@ -350,7 +346,7 @@ def __init__(
         Paged 32-bit AdamW optimizer.
 
         Arguments:
-            params (`torch.tensor`):
+            params (`torch.Tensor`):
                 The input parameters to optimize.
             lr (`float`, defaults to 1e-3):
                 The learning rate.
@@ -372,8 +368,6 @@ def __init__(
                 Adapts clipping threshold automatically by tracking the last 100 gradient norms and clipping the gradient at a certain percentile to improve stability.
             block_wise (`bool`, defaults to `True`):
                 Whether to independently quantize each block of tensors to reduce outlier effects and improve stability.
-            is_paged (`bool`, defaults to `False`):
-                Whether the optimizer is a paged optimizer or not.
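
For context, here is a minimal usage sketch of the paged 8-bit AdamW optimizer whose docstrings this diff touches. It assumes bitsandbytes is installed and a CUDA device is available; the toy model, data, and hyperparameter values are illustrative only and are not part of the diff.

    # Minimal sketch, assuming bitsandbytes is installed and a CUDA device is available.
    # The model and training step below are purely illustrative.
    import torch
    import bitsandbytes as bnb

    model = torch.nn.Linear(1024, 1024).cuda()

    # Arguments mirror the docstrings edited above: `params` takes torch.Tensor
    # parameters, and stability behavior is controlled by `percentile_clipping`
    # and `block_wise`.
    optimizer = bnb.optim.PagedAdamW8bit(
        model.parameters(),
        lr=1e-3,
        percentile_clipping=100,  # track the last 100 gradient norms for adaptive clipping
        block_wise=True,          # quantize optimizer state block-wise to reduce outlier effects
    )

    inputs = torch.randn(8, 1024, device="cuda")
    loss = model(inputs).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()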