Skip to content

Conversation

@rpsilva-aws
Copy link
Collaborator

@rpsilva-aws rpsilva-aws commented Apr 16, 2025

We currently rely on the user to enable GetAliasWithBufferDonorConfig, so that the explicitly marked tensors for donation can be included in the buffer donor indices. However, it is unnecessary to have this constraint during step barrier, where the graph is expected to break, requiring the donation to happen (force_ltc_data) for all live tensors. Instead, we choose to default explicit donation for the step barrier (mark step), and only make it optional, depending on the buffer donor config, for other graph synchronizations (e.g. _xla_sync_live_tensors or _xla_warm_up_cache), where it requires an opt-in by the user to donate at that point (knowingly that the tensors won't be accessible anymore).

Originally, the explicit donation config was mostly to decide whether to run LTC or explicit donations. Now that these can be used simultaneously, there is no added benefit in requiring the option, at least for the step barrier.

@rpsilva-aws rpsilva-aws force-pushed the rpsilva_default_donors branch from 5b75598 to c092a6f Compare April 16, 2025 01:27
@rpsilva-aws rpsilva-aws force-pushed the rpsilva_default_donors branch from c092a6f to 6535efe Compare April 16, 2025 06:58
@rpsilva-aws rpsilva-aws requested a review from tengyifei April 17, 2025 01:19
@rpsilva-aws rpsilva-aws marked this pull request as ready for review April 17, 2025 01:20
@rpsilva-aws
Copy link
Collaborator Author

Hey @tengyifei, do you see concerns with this? I also aligned with @mcuiaws, who had done the some of the refactorings on the code path.

@qihqi qihqi merged commit e2f2852 into pytorch:master Apr 18, 2025
44 of 46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants