Hello, great work. I encountered a problem in the core code:
  File "/tmp/pycharm_project_858/m_llava/model/multimodal_projector/builder.py", line 112, in forward
    key = self.ln_k_1(self.k_proj_1(x_multi)).permute(1, 0, 2)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (576x1024 and 4096x1024)
I am using clip-vit-large-patch14-336, so the encoded tensor should have shape (bs, 576, 1024). Its last dimension, 1024, does not match the 4096 in the error above. Why is there a mismatch?
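
To make the mismatch concrete, here is a minimal sketch of how I read the shapes in the error message. It assumes that k_proj_1 is a linear layer with in_features=4096 and out_features=1024, and that the batch size is 1; both are my guesses from the error text, not something I confirmed in builder.py. The helper names are hypothetical and the code does not use PyTorch itself.

```python
def linear_matmul_shapes(input_shape, in_features, out_features):
    """Return the (mat1, mat2) shapes a linear layer would try to multiply.

    mat1 is the input with all leading dimensions flattened into rows.
    mat2 is the transposed weight, i.e. (in_features, out_features).
    """
    rows = 1
    for dim in input_shape[:-1]:
        rows *= dim
    mat1 = (rows, input_shape[-1])
    mat2 = (in_features, out_features)
    return mat1, mat2


def is_compatible(mat1, mat2):
    """A matrix product needs mat1's columns to equal mat2's rows."""
    return mat1[1] == mat2[0]


# Assumed values: batch size 1, CLIP features of width 1024,
# and a projection that expects 4096 input features.
mat1, mat2 = linear_matmul_shapes((1, 576, 1024), in_features=4096, out_features=1024)
print(mat1, mat2, is_compatible(mat1, mat2))
```

Under these assumptions the sketch reports the same shapes as the error, (576, 1024) and (4096, 1024), so the layer seems to expect 4096-dimensional input while x_multi carries 1024-dimensional features.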