
Conversation

dmahan93 (Contributor) commented on Sep 4, 2024

  • Only tested with BF16 ZeRO-2 (ported from a separate repo, so beware copy/paste errors)
  • Made the assumption that SP ranks should have preference over TP ranks for placement on the same node in the topology (see the first sketch after this list), but happy to revisit that
  • Only supports the zigzag convention of ring attention (see the second sketch after this list)
  • The attention-mask changes are very important: they let you do 128k context on one node even without ring attention
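
A minimal sketch of the topology assumption from the second bullet: sequence-parallel (SP) ranks vary fastest in the rank ordering, so consecutive global ranks, which launchers typically place on the same node, share an SP group before a TP group. The helper and its names are illustrative, not the actual code in this PR.

```python
# Illustrative only: map a global rank to (dp, tp, sp) coordinates with SP
# innermost, so SP peers land on adjacent ranks (i.e. the same node).
def rank_to_coords(rank: int, sp_size: int, tp_size: int) -> tuple[int, int, int]:
    sp = rank % sp_size                      # varies fastest: same-node peers
    tp = (rank // sp_size) % tp_size         # next: tensor parallel
    dp = rank // (sp_size * tp_size)         # varies slowest: data parallel
    return dp, tp, sp

# e.g. with sp_size=4, tp_size=2, ranks 0-3 form one SP group on one node.
```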
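And a minimal sketch of the zigzag sharding convention from the third bullet (the function name and the assumption that the sequence is dim 1 are mine, not the PR's actual helper): the sequence is split into 2 * world_size chunks, and rank i keeps chunk i plus its mirror chunk from the end, so every rank gets an equal share of the causal-attention work.

```python
import torch

def zigzag_shard(x: torch.Tensor, rank: int, world_size: int) -> torch.Tensor:
    """Give rank i chunks i and (2 * world_size - 1 - i) of the sequence.
    Pairing an early chunk with a late one balances causal-attention FLOPs,
    since later positions attend to more keys than earlier ones."""
    chunks = x.chunk(2 * world_size, dim=1)  # x: [batch, seq, ...]
    return torch.cat([chunks[rank], chunks[2 * world_size - 1 - rank]], dim=1)
```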
dmahan93 (Contributor, Author) commented on Sep 4, 2024

@Quentin-Anthony There are probably merge conflicts and other issues, and it's very untested, but it sounded like it would be better to have something up than to wait around, so it's a bit rushed. Forgive any annoying issues 😅
