
Commit 691f8ca

msaroufim authored and pytorchmergebot committed
Faster build instructions in CONTRIBUTING.md (pytorch#109900)
Discovered this as I was building PyTorch on a fresh g5.4x instance on AWS; building flash attention was bricking my machine:

```
Building wheel torch-2.2.0a0+gitd0c8e82
-- Building version 2.2.0a0+gitd0c8e82
cmake --build . --target install --config Release
[1/748] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
FAILED: caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
/opt/conda/envs/torchbench/bin/ccache /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -DAT_PER_OPERATOR_HEADERS -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DTORCH_CUDA_BUILD_MAIN_LIB -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_CUDA -DUSE_DISTRIBUTED -DUSE_EXPERIMENTAL_CUDNN_V8_API -DUSE_EXTERNAL_MZCRC -DUSE_FLASH_ATTENTION -DUSE_MEM_EFF_ATTENTION -DUSE_NCCL -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cuda_EXPORTS -I/home/ubuntu/pytorch/build/aten/src -I/home/ubuntu/pytorch/aten/src -I/home/ubuntu/pytorch/build -I/home/ubuntu/pytorch -I/home/ubuntu/pytorch/cmake/../third_party/benchmark/include -I/home/ubuntu/pytorch/third_party/onnx -I/home/ubuntu/pytorch/build/third_party/onnx -I/home/ubuntu/pytorch/third_party/foxi -I/home/ubuntu/pytorch/build/third_party/foxi -I/home/ubuntu/pytorch/aten/src/THC -I/home/ubuntu/pytorch/aten/src/ATen/cuda -I/home/ubuntu/pytorch/aten/src/ATen/../../../third_party/cutlass/include -I/home/ubuntu/pytorch/build/caffe2/aten/src -I/home/ubuntu/pytorch/aten/src/ATen/.. -I/home/ubuntu/pytorch/build/nccl/include -I/home/ubuntu/pytorch/c10/cuda/../.. -I/home/ubuntu/pytorch/c10/.. -I/home/ubuntu/pytorch/third_party/tensorpipe -I/home/ubuntu/pytorch/build/third_party/tensorpipe -I/home/ubuntu/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/ubuntu/pytorch/torch/csrc/api -I/home/ubuntu/pytorch/torch/csrc/api/include -isystem /home/ubuntu/pytorch/build/third_party/gloo -isystem /home/ubuntu/pytorch/cmake/../third_party/gloo -isystem /home/ubuntu/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/ubuntu/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/ubuntu/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/ubuntu/pytorch/third_party/protobuf/src -isystem /home/ubuntu/pytorch/third_party/gemmlowp -isystem /home/ubuntu/pytorch/third_party/neon2sse -isystem /home/ubuntu/pytorch/third_party/XNNPACK/include -isystem /home/ubuntu/pytorch/third_party/ittapi/include -isystem /home/ubuntu/pytorch/cmake/../third_party/eigen -isystem /usr/local/cuda/include -isystem /home/ubuntu/pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /home/ubuntu/pytorch/third_party/ideep/include -isystem /home/ubuntu/pytorch/cmake/../third_party/cudnn_frontend/include -D_GLIBCXX_USE_CXX11_ABI=1 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_86,code=sm_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets --expt-extended-lambda -DCUB_WRAPPED_NAMESPACE=at_cuda_detail -DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -O3 -DNDEBUG -std=c++17 -Xcompiler=-fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Xcompiler=-Wall,-Wextra,-Wno-unused-parameter,-Wno-unused-function,-Wno-unused-result,-Wno-missing-field-initializers,-Wno-unknown-pragmas,-Wno-type-limits,-Wno-array-bounds,-Wno-unknown-pragmas,-Wno-strict-overflow,-Wno-strict-aliasing,-Wno-missing-braces,-Wno-maybe-uninitialized -MD -MT caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o -MF caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o.d -x cu -c /home/ubuntu/pytorch/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu -o caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/transformers/cuda/flash_attn/kernels/flash_bwd_hdim96_fp16_sm80.cu.o
Killed
```

Pull Request resolved: pytorch#109900
Approved by: https://github.com/drisspg
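As a hedged illustration of the workaround this commit documents: the two flags come from the diff below, and `python setup.py develop` is the usual source-build command from CONTRIBUTING.md.

```bash
# Sketch: skip compiling the flash attention and memory-efficient attention
# kernels, the translation units that were getting the build OOM-killed above.
USE_FLASH_ATTENTION=0 USE_MEM_EFF_ATTENTION=0 python setup.py develop
```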
1 parent 8ed08e5 commit 691f8ca


CONTRIBUTING.md

Lines changed: 4 additions & 1 deletion
```diff
@@ -669,7 +669,7 @@ only interested in a specific component.
 - Don't need Caffe2? Pass `BUILD_CAFFE2=0` to disable Caffe2 build.
 
 On the initial build, you can also speed things up with the environment
-variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `BUILD_TEST`, `USE_FBGEMM`, `USE_NNPACK` and `USE_QNNPACK`.
+variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `USE_FLASH_ATTENTION`, `USE_MEM_EFF_ATTENTION`, `BUILD_TEST`, `USE_FBGEMM`, `USE_NNPACK` and `USE_QNNPACK`.
 
 - `DEBUG=1` will enable debug builds (-g -O0)
 - `REL_WITH_DEB_INFO=1` will enable debug symbols with optimizations (-g -O3)
@@ -681,6 +681,7 @@ variables `DEBUG`, `USE_DISTRIBUTED`, `USE_MKLDNN`, `USE_CUDA`, `BUILD_TEST`, `U
 - `USE_NNPACK=0` will disable compiling with NNPACK.
 - `USE_QNNPACK=0` will disable QNNPACK build (quantized 8-bit operators).
 - `USE_XNNPACK=0` will disable compiling with XNNPACK.
+- `USE_FLASH_ATTENTION=0` and `USE_MEM_EFF_ATTENTION=0` will disable compiling the flash attention and memory-efficient attention kernels, respectively.
 
 For example:
 
@@ -712,6 +713,8 @@ with `pip install ninja`. If PyTorch was already built, you will need
 to run `python setup.py clean` once after installing ninja for builds to
 succeed.
 
+Note: Make sure to use a machine with a large number of CPU cores; this will significantly reduce your build times.
+
 #### Use CCache
 
 Even when dependencies are tracked with file modification, there are many
```
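For reference on the `Use CCache` section whose heading appears in the diff context above (its body is outside this diff), here is a minimal sketch of one common way to put ccache in front of the host and CUDA compilers. The `CMAKE_<LANG>_COMPILER_LAUNCHER` variables are standard CMake, not something this commit adds.

```bash
# Sketch only, assuming ccache is installed and on PATH. CMake initializes
# its per-language compiler-launcher settings from these environment
# variables, so the subsequent build wraps gcc/g++/nvcc with ccache.
export CMAKE_C_COMPILER_LAUNCHER=ccache
export CMAKE_CXX_COMPILER_LAUNCHER=ccache
export CMAKE_CUDA_COMPILER_LAUNCHER=ccache
python setup.py develop
```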
