Conversation

@jlamypoirier (Contributor) commented Jan 27, 2023

Fixes #21344. See that issue for more details.

@HuggingFaceDocBuilderDev commented Jan 27, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger (Collaborator) commented Jan 30, 2023

Thanks for working on this! Does the new implementation in PyTorch produce the exact same results as gelu_fast? If that is the case, I would prefer we just replace the current gelu_fast with this when PyTorch is 1.12 or above.

@jlamypoirier (Contributor, Author) replied

> Thanks for working on this! Does the new implementation in PyTorch produce the exact same results as gelu_fast? If that is the case, I would prefer we just replace the current gelu_fast with this when PyTorch is 1.12 or above.

The results are similar, but there are still rounding errors; see my analysis in the related issue #21344. I would also be in favor of replacing the existing implementation / using it as the default, but that would introduce small numerical differences in some models. Is that a problem?
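
For illustration, a minimal sketch (not from the PR) of the kind of comparison being discussed, assuming PyTorch >= 1.12 for the `approximate="tanh"` argument:

```python
import math

import torch

x = torch.randn(100_000, dtype=torch.float32)

# Tanh approximation written out in Python, as in transformers' gelu_new
# (gelu_fast rearranges the same expression with a pre-baked constant)
manual = 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

# PyTorch's built-in fused version of the same formula
fused = torch.nn.functional.gelu(x, approximate="tanh")

# The two agree up to float32 rounding, but not bit for bit
print((manual - fused).abs().max())
```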

@sgugger (Collaborator) commented Jan 30, 2023

Ah yes, the difference is quite significant sadly, so this will probably introduce a difference that is too big :-/
So let's go with a new activation. Maybe gelu_pytorch is a better name?

@jlamypoirier (Contributor, Author) replied

> Ah yes, the difference is quite significant sadly, so this will probably introduce a difference that is too big :-/ So let's go with a new activation. Maybe gelu_pytorch is a better name?

Wouldn't that cause confusion with the default PyTorch implementation? That one is currently named "gelu" (and there is also one named "gelu_python").

Also should I add an explicit pytorch version check?

@sgugger (Collaborator) commented Jan 31, 2023

Ok for the name then. For the version check, you will need to create a function that returns the instance of GELU and raises an import error if the PyTorch version is too low, then put that function in the mapping.
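
A hedged sketch of the factory-function approach described here (the function name and error message are illustrative, not taken from the final diff):

```python
import torch
from packaging import version


def gelu_pytorch():
    # Fail loudly when the installed torch predates the fused tanh
    # approximation (the `approximate` argument was added in 1.12.0)
    if version.parse(torch.__version__) < version.parse("1.12.0"):
        raise ImportError(
            f"torch>=1.12.0 is required for this activation, but torch=={torch.__version__} is installed."
        )
    return torch.nn.GELU(approximate="tanh")


# The factory would then be registered in the activation mapping, e.g.:
# ACT2FN["gelu_pytorch"] = gelu_pytorch
```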

@jlamypoirier (Contributor, Author) replied

> Ok for the name then. For the version check, you will need to create a function that returns the instance of GELU and raises an import error if the PyTorch version is too low, then put that function in the mapping.

Made a class to match the other activations, raising a NotImplementedError (I don't think an ImportError is best here, since the function exists in earlier versions). Also added it to test_get_activation.
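
Roughly the kind of check this adds to test_get_activation (a sketch only; the registry key is assumed from the name discussed above, not from the diff):

```python
import torch
from transformers.activations import get_activation

# The new key should resolve to a module that matches PyTorch's fused kernel
act = get_activation("gelu_pytorch")  # key name assumed for illustration
x = torch.randn(8)
torch.testing.assert_close(act(x), torch.nn.functional.gelu(x, approximate="tanh"))
```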

@jlamypoirier marked this pull request as ready for review on January 31, 2023, 21:20
@sgugger (Collaborator) left a comment

Thanks for iterating! I just have one last comment on the error raised.

```python
def __init__(self):
    super().__init__()
    if version.parse(torch.__version__) < version.parse("1.12.0"):
        raise NotImplementedError(
```
@sgugger (Collaborator) suggested a change:

```diff
-        raise NotImplementedError(
+        raise ImportError(
```
@jlamypoirier (Contributor, Author) replied
All fixed, this should be ready to merge once the tests pass.
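
For reference, a sketch of how the activation class plausibly looks once the suggested change is applied (the class name and docstring are assumptions, not copied from the diff):

```python
import torch
from packaging import version
from torch import nn


class PytorchGELUTanh(nn.Module):
    """
    Tanh approximation of GELU, delegating to PyTorch's fused native
    implementation (available from torch 1.12.0 onward).
    """

    def __init__(self):
        super().__init__()
        if version.parse(torch.__version__) < version.parse("1.12.0"):
            raise ImportError(
                f"torch>=1.12.0 is required to use this activation, but torch=={torch.__version__} is installed."
            )

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return nn.functional.gelu(input, approximate="tanh")
```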

@sgugger (Collaborator) commented Feb 2, 2023

Failure is unrelated so merging. Thanks again for your contribution!

@sgugger merged commit e006ab5 into huggingface:main on Feb 2, 2023
@jlamypoirier deleted the gelu_new_python branch on February 2, 2023, 23:37