[FRONTEND] Complete rewrite of the runtime #644
| @ptillet Hi, regarding lowering the CPU overhead: can this PR handle ops with dynamic shapes? And if so, how? |
| I am not sure I understand the question. Triton kernels specialize (and recompile) on int arguments only when they are equal to 1 or a multiple of 16. If a frontend maps shapes to int arguments, then things won't get recompiled every time the shapes change. |
| Do you mean that you actually cache the result for every shape seen so far? If there is a new shape, does Triton still need to compile? My question is related to some language models, where some kernels could have variable-length inputs. |
| This is not what I said. Triton maintains only three versions of each int argument: any value, multiple of 16, and equal to 1. |
| what if they are not equal to 1 or a multiple of 16? |
| Then it's the third version: they're unannotated int32 arguments. |
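The three-way specialization described in this thread can be sketched in plain Python. This is only an illustration of the idea, not Triton's actual runtime code: `int_specialization`, `cache_key`, `launch`, and the `compiled` dict are hypothetical names, and the real cache key also depends on things not modeled here (argument dtypes, constants, and so on).

```python
# Hypothetical sketch: classify each int argument into one of three classes,
# and key the compiled-kernel cache on the classes rather than the raw values.

def int_specialization(value: int) -> str:
    """Return the specialization class of an int argument."""
    if value == 1:
        return "eq_1"
    if value % 16 == 0:
        return "multiple_of_16"
    return "any"  # unannotated int32 argument


def cache_key(kernel_name: str, int_args: tuple) -> tuple:
    """Build a cache key from the kernel name and the argument classes."""
    return (kernel_name,) + tuple(int_specialization(v) for v in int_args)


compiled = {}  # stand-in for the just-in-time cache


def launch(kernel_name: str, *int_args: int) -> bool:
    """Pretend to launch a kernel; return True on a cache hit."""
    key = cache_key(kernel_name, int_args)
    hit = key in compiled
    if not hit:
        compiled[key] = f"binary for {key}"  # stand-in for compilation
    return hit
```

Under this sketch, launching with sequence lengths 1000 and then 1001 compiles once (both fall in the "any" class), and 1024 followed by 2048 also compiles once (both are multiples of 16), so variable-length inputs trigger at most three variants per int argument.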
This PR completely rewrites the runtime of Triton to make it leaner and to clearly separate the compilation step from the just-in-time cache logic. This should substantially reduce launch overhead, and also pave the way for even lower overhead in the future, as support for type annotations is added and users start explicitly leveraging the <500ns C entry point that triton.compile now provides when specialization hints are known.