
@ptillet
Collaborator

@ptillet ptillet commented Sep 11, 2022

This PR completely rewrites the Triton runtime to make it leaner and to clearly separate the compilation step from the just-in-time cache logic. This should substantially reduce launch overhead, and also pave the way for even lower overhead in the future as support for type annotations is added and users start explicitly leveraging the <500ns C entry point that triton.compile now provides when specialization hints are known.

@ptillet ptillet merged commit 4a77dfb into master Sep 18, 2022
@ptillet ptillet deleted the phil/new-runtime branch September 18, 2022 15:51
@Young768

@ptillet Hi, regarding lowering the CPU overhead: can this PR handle ops with dynamic shapes? If so, how?

@ptillet
Collaborator Author

ptillet commented Oct 12, 2022

I am not sure I understand the question. Triton kernels are specialized (and recompiled) on int arguments only when those arguments are equal to 1 or a multiple of 16. If a frontend maps shapes to int arguments, then things won't get recompiled every time the shapes change.

@Young768

Do you mean that you actually cache the result for every shape seen so far? If there is a new shape, does Triton still need to compile? My question is related to some language models, where some kernels can have variable-length inputs.

@ptillet
Collaborator Author

ptillet commented Oct 12, 2022

This is not what I said. Triton maintains only three versions for each int argument: any value, multiple of 16, and equal to 1.

@Young768

What if they are not equal to 1 or a multiple of 16?

@ptillet
Collaborator Author

ptillet commented Oct 12, 2022

Then it's the third version: they're unannotated int32 arguments.
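
Editor's illustration of the exchange above: a minimal Python sketch of how three versions per int argument keep recompilation bounded when shapes vary. This is a toy model, not Triton's actual runtime code. `specialization_class` and `ToyJITCache` are hypothetical names introduced here, and the "compile" step is only a counter.

```python
from typing import Dict, Tuple


def specialization_class(value: int) -> str:
    """Map an int argument to one of the three versions named in the thread.

    Hypothetical helper: the labels are made up for this sketch.
    """
    if value == 1:
        return "eq_1"
    if value % 16 == 0:
        return "multiple_of_16"
    return "any"  # the "third version": a plain, unannotated int


class ToyJITCache:
    """Toy cache keyed by the specialization class of each int argument."""

    def __init__(self) -> None:
        self.compiled: Dict[Tuple[str, ...], str] = {}
        self.compile_count = 0

    def launch(self, *int_args: int) -> str:
        # The key depends on the class of each value, not on the value itself.
        key = tuple(specialization_class(v) for v in int_args)
        if key not in self.compiled:
            # Stand-in for a real compilation step.
            self.compile_count += 1
            self.compiled[key] = f"binary{key}"
        return self.compiled[key]


cache = ToyJITCache()
# Variable sequence lengths, as in the language-model case raised above.
for seq_len in (1, 7, 16, 33, 48, 100, 128):
    cache.launch(seq_len)

print(cache.compile_count)  # 3: one per class, not one per distinct length
```

In this model, a kernel with a single int shape argument is compiled at most three times regardless of how many distinct lengths are seen. With several int arguments the bound is three per argument, multiplied together.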

pingzhuu pushed a commit to siliconflow/triton that referenced this pull request Apr 2, 2024
This PR completely rewrites the runtime of Triton to be more lean and clearly separate the compilation step from the just-in-time caching logic. This should substantially reduce launch overhead.
ZzEeKkAa pushed a commit to ZzEeKkAa/triton that referenced this pull request Aug 5, 2024