Conversation

@jlamypoirier (Collaborator) commented Jan 10, 2024

The Mistral model adapted to run Starcoder 2:

  • Use layer norm (RMSNorm still available as an option)
  • Use standard MLP (gated MLP still available as an option)
  • Add back biases (optional)
  • Change the (default?) tokenizer class
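A minimal numpy sketch of the first two toggles (layer norm vs. RMSNorm, standard vs. gated MLP). Function names and the `use_rms` / `w_gate` switches are illustrative, not the PR's actual code:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Standard LayerNorm (the Starcoder 2 default here): zero-center, then scale by std.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x, eps=1e-5):
    # RMSNorm (the Mistral default, kept as an option): scale only, no mean subtraction.
    return x / np.sqrt((x * x).mean(axis=-1, keepdims=True) + eps)

def normalize(x, use_rms=False):
    # Hypothetical config switch mirroring "RMS still available as option".
    return rms_norm(x) if use_rms else layer_norm(x)

def gelu(x):
    # tanh approximation of GELU.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def silu(x):
    return x / (1 + np.exp(-x))

def mlp(x, w_up, w_down, w_gate=None):
    # Standard MLP (Starcoder 2): down(gelu(up(x))).
    # Gated MLP (Mistral, kept as an option): down(silu(gate(x)) * up(x)).
    if w_gate is None:
        return gelu(x @ w_up) @ w_down
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```

Both paths keep the same input/output shapes, so the choice can live purely in the model config.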

Missing for Starcoder 1 (do we want to support it?):

  • Absolute position embeddings

Other notes:

  • Has fewer entries in the modeling auto mappings than gpt_bigcode (3 instead of 6); probably doesn't matter
  • Uses repeat for the KV cache in flash attention; might not be necessary

Still got a bunch of minor things to do (see todos)
