Conversation

@zpcore zpcore commented Feb 25, 2025

Port scan and host offloading for the Llama model, based on @tengyifei's prototypes in 1 and 2.
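
For reference, host offloading here means parking the activations saved for backward in host memory and bringing them back during the backward pass. Below is a minimal conceptual sketch using stock PyTorch's `torch.autograd.graph.save_on_cpu`; it only illustrates the idea and is not necessarily the mechanism the prototype uses on XLA.

```python
# Conceptual sketch of host offloading: tensors saved for backward are packed
# to CPU during forward and copied back when backward needs them. Uses the
# stock PyTorch hook save_on_cpu purely as an illustration; the XLA prototype
# may implement offloading differently.
import torch
from torch.autograd.graph import save_on_cpu

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)
x = torch.randn(8, 1024, requires_grad=True)

with save_on_cpu(pin_memory=True):
    y = model(x)       # activations saved for backward are kept on the host
y.sum().backward()     # saved tensors are copied back as backward consumes them
```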

The sharding schema in torchprime/torch_xla_models/configs/model/scaling/llama-fsdp.yaml also works well with the scan code.
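
A rough sketch of the scan pattern over the decoder stack, assuming torch_xla's experimental scan helper with a `jax.lax.scan`-style `fn(carry, x) -> (carry, y)` contract (the module path, signature, and placeholder layer math below are assumptions and may differ from what this port actually wires up):

```python
# Sketch: run one traced "layer step" over parameters stacked along a leading
# layer axis, so the compiler loops over layers instead of unrolling the graph.
# The decoder layer is reduced to a single matmul as a placeholder.
import torch
import torch_xla.core.xla_model as xm
from torch_xla.experimental.scan import scan  # assumed module path

device = xm.xla_device()
num_layers, hidden_size, batch, seq_len = 4, 256, 2, 128

def layer_step(hidden, layer_weight):
    # Placeholder for a real decoder layer (attention + MLP).
    new_hidden = torch.tanh(hidden @ layer_weight)
    return new_hidden, new_hidden  # (next carry, per-layer output)

stacked_weights = torch.randn(num_layers, hidden_size, hidden_size, device=device)
hidden_states = torch.randn(batch, seq_len, hidden_size, device=device)

final_hidden, per_layer_outputs = scan(layer_step, hidden_states, stacked_weights)
```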

Currently there is a NaN issue when we use scan with the flash attention kernel, related to pytorch/xla#8734. We need to resolve that issue before this produces correct output.

attention_bias: false
flash_attention: true
rope_theta: 500000.0
scan_decoder_layers: true

@zpcore zpcore Feb 26, 2025

Move this to the default yaml file.

zpcore commented Mar 14, 2025

@tengyifei has been actively working on #148 and related PRs to formally bring up the scan and host offloading features. Closing this one for now.

@zpcore zpcore closed this Mar 14, 2025