benchmarks/torchbench_model: some benchmarks fail to load and kill experiment_runner's main process #6207

@cota

🐛 Bug

In dfcf306e7, "Apply precision config env vars in the root process" (#6152),
we started running load_benchmark() from experiment_runner's
main process. Unfortunately, load_benchmark() for
some models exits the calling process, which makes
experiment_runner itself exit prematurely.
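The failure mode can be illustrated with a minimal sketch. It assumes the loader exits through sys.exit(), which raises SystemExit; a hard exit such as os._exit() could not be intercepted this way. The names load_benchmark_stub and try_load are hypothetical and are not part of experiment_runner.

```python
import sys


def load_benchmark_stub(name):
    # Hypothetical stand-in for a model loader that exits instead of raising.
    if name == "bad_model":
        sys.exit(2)
    return {"name": name}


def try_load(name):
    """Return the loaded benchmark, or None if the loader tried to exit."""
    try:
        return load_benchmark_stub(name)
    except SystemExit as e:
        # SystemExit derives from BaseException, so `except Exception`
        # would not catch it and the caller would still terminate.
        print(f"loader for {name} tried to exit with code {e.code}",
              file=sys.stderr)
        return None


results = {name: try_load(name) for name in ["good_model", "bad_model"]}
```

With a guard like this, the calling process survives the bad loader and can record the failure for that model only.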

To Reproduce

Run any of the benchmarks added to the deny list in #6199 under XLA. For example:

python xla/benchmarks/experiment_runner.py --dynamo=openxla --dynamo=openxla_eval --xla=PJRT --test=eval --test=train --accelerator=cuda --output-dirname=/tmp/pix2pix --repeat=5 --print-subprocess --suite-name=torchbench --filter='^pytorch_CycleGAN_and_pix2pix$' --log-level=debug ; echo $? 

Note: pytorch_CycleGAN_and_pix2pix also fails early under inductor.

Expected behavior

The command above should print exit code 0 whether or not the benchmark fails to run. Instead, it prints 2.
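One way to get this behavior is to keep anything that may exit inside a child process, so the parent only observes an exit status. The sketch below is an illustration under that assumption, not the runner's actual code: run_isolated is a hypothetical helper, and the child snippets stand in for a benchmark load.

```python
import subprocess
import sys


def run_isolated(child_code):
    # Run the snippet in a fresh interpreter and return its exit code.
    proc = subprocess.run([sys.executable, "-c", child_code])
    return proc.returncode


# Hypothetical stand-in for a benchmark whose loader exits with status 2.
failing = "import sys; sys.exit(2)"
# Stand-in for a benchmark that loads cleanly.
passing = "pass"

codes = [run_isolated(failing), run_isolated(passing)]
# The parent records the failures and carries on instead of exiting itself.
failed = [c for c in codes if c != 0]
```

The parent reaches the last line regardless of how the children terminate, so it can still finish with exit code 0.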

Environment

  • Reproducible on XLA backend [CPU/TPU]: GPU
  • torch_xla version: dfcf306 and later.
