
fffffgggg54
Contributor

CvT as described in https://arxiv.org/abs/2103.15808

Swin-era hierarchical transformer. This is a from-scratch reimplementation that is cleaner than the original (https://github.com/microsoft/CvT/tree/main), exposes most module cfgs as kwargs, and uses sdpa in timm style. WIP with a barebones test for now: the weight remap succeeds, but the activations are incorrect, and the error seems to come, at least in part, from the BatchNorm layers.
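For readers unfamiliar with the weight remap step, here is a minimal sketch of renaming checkpoint keys and checking them against the target model's keys. The rename rules and key names are hypothetical and are not taken from this PR or from the reference checkpoint; a real `state_dict` would map names to tensors rather than the placeholder values used here.

```python
import re

# Hypothetical rename rules (regex pattern -> replacement). The real CvT
# checkpoint keys and the target module names are assumptions for illustration.
RENAME_RULES = [
    (r"^stage(\d+)\.", r"stages.\1."),
    (r"\.attn\.proj_q\.", r".attn.q."),
]


def remap_state_dict(state_dict, rules=RENAME_RULES):
    """Return a new dict with every key rewritten by the rules, applied in order."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in rules:
            new_key = re.sub(pattern, replacement, new_key)
        if new_key in remapped:
            raise KeyError(f"two source keys map to {new_key!r}")
        remapped[new_key] = value
    return remapped


def check_keys(remapped, expected_keys):
    """Return (missing, unexpected) key lists relative to the target model."""
    got, want = set(remapped), set(expected_keys)
    return sorted(want - got), sorted(got - want)
```

A clean key check only shows that the names line up. It says nothing about whether the activations match, which is the situation described above.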

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@fffffgggg54
Contributor Author

Validation for cvt-13 lines up with the paper (81.678 top-1 for me). Activations are off from the reference impl by minute amounts (the MSE of the logits for 1 sample is on the order of 1e-10). The initial problems had to do with the norm before attn and the attn residual when there is a cls_token. I'll most likely finish this off later today.
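As a rough illustration of the logit comparison mentioned above, here is a minimal sketch that computes the MSE between two logit vectors and checks that the top-1 prediction agrees. It uses plain Python lists so it runs without any dependencies; the function names and the sample values are assumptions, and in practice the inputs would be the logits of the two models for the same image.

```python
def mse(ref_logits, new_logits):
    """Mean squared error between two equally sized logit vectors."""
    if len(ref_logits) != len(new_logits):
        raise ValueError("logit vectors must have the same length")
    total = sum((r - n) ** 2 for r, n in zip(ref_logits, new_logits))
    return total / len(ref_logits)


def top1(logits):
    """Index of the largest logit, i.e. the predicted class."""
    return max(range(len(logits)), key=logits.__getitem__)


# Made-up logits for one sample: the reimplementation differs by a tiny amount.
ref = [1.0, 2.0, 3.0]
new = [1.0, 2.0, 3.00001]
error = mse(ref, new)
same_prediction = top1(ref) == top1(new)
```

An MSE this small usually leaves the top-1 prediction unchanged, but the two checks are independent, so it helps to report both.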

@Smartappli

@fffffgggg54 is this still in progress?

@fffffgggg54
Contributor Author

> @fffffgggg54 is this still in progress ?

Yes. I am a bit busy and traveling right now, but I am still working on getting the validation to line up with the paper. It has been a pain to try and find which part of the model deviates from the reference impl.
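One way to narrow down where two implementations start to disagree is to record intermediate activations from both, in forward order, and report the first layer whose difference exceeds a tolerance. The sketch below assumes the activations have already been captured (for example with forward hooks) and flattened into lists of floats under matching layer names. The layer names, values, and tolerance are hypothetical.

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat activations."""
    return max(abs(x - y) for x, y in zip(a, b))


def first_divergence(ref_acts, new_acts, tol=1e-5):
    """Return (layer_name, diff) for the first layer exceeding tol, else None.

    Both arguments are dicts keyed by layer name and inserted in forward order
    (Python dicts preserve insertion order).
    """
    for name, ref in ref_acts.items():
        new = new_acts[name]
        if len(ref) != len(new):
            raise ValueError(f"shape mismatch at {name!r}")
        diff = max_abs_diff(ref, new)
        if diff > tol:
            return name, diff
    return None


# Made-up activations: the two models agree on the stem and diverge at a norm.
ref_acts = {"stem": [0.1, 0.2], "stage0.norm": [1.0, 2.0], "stage0.attn": [3.0, 4.0]}
new_acts = {"stem": [0.1, 0.2], "stage0.norm": [1.0, 2.5], "stage0.attn": [9.0, 4.0]}
culprit = first_divergence(ref_acts, new_acts)
```

Layers after the first divergence will generally differ too, so only the earliest mismatch is worth inspecting.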

@fffffgggg54
Contributor Author

fffffgggg54 commented Jun 28, 2024

Some updates @rwightman @Smartappli: I have added the reference implementation to a branch on my fork and compared validation performance. There are some slight numerical deviations, and the top-1 is off by an insignificant amount. After digging into the reference repo's validation setup, I changed the validation configuration so that the crop settings match what the authors used. The throughput of my implementation is ~10% higher than the reference impl's on win10/pt2.2.0/fp32; fused attn was not available. There is still a bit of cleanup to do (head, torchscript?, stem), but this was the last technical hurdle I was hung up on.
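For context on how a throughput comparison like this can be made, here is a minimal timing sketch. It is not the benchmark used for the number above: `measure_throughput`, `relative_gain`, and the toy `step` function are assumptions, and a real measurement would call each model's forward pass on the same batch.

```python
import time


def measure_throughput(step, batch, batch_size, warmup=3, iters=10):
    """Return samples per second for step(batch), after a few warmup calls."""
    for _ in range(warmup):
        step(batch)  # warmup runs are not timed
    start = time.perf_counter()
    for _ in range(iters):
        step(batch)
    elapsed = time.perf_counter() - start
    return batch_size * iters / elapsed


def relative_gain(new_throughput, ref_throughput):
    """Fractional speedup of the new implementation over the reference."""
    return new_throughput / ref_throughput - 1.0


# Toy stand-in for a forward pass so the sketch runs without a model.
def step(batch):
    return sum(x * x for x in batch)


samples_per_sec = measure_throughput(step, list(range(10_000)), batch_size=32)
```

When timing on a GPU, the device has to be synchronized before reading the clock (for example with `torch.cuda.synchronize()`), otherwise asynchronous kernel launches make the numbers look better than they are.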

@fffffgggg54 fffffgggg54 marked this pull request as ready for review June 28, 2024 12:14