Description
We have the following code in our https://github.com/zama-ai/tfhe-rs project, at commit f1c21888a762ddf9de017ae52dc120c141ec9c02, in tfhe/docs/how_to/compress.md from line 44 onwards:
```rust
use tfhe::prelude::*;
use tfhe::{
    generate_keys, set_server_key, ClientKey, CompressedServerKey, ConfigBuilder, FheUint8,
};

fn main() {
    let config = ConfigBuilder::all_disabled()
        .enable_default_integers()
        .build();

    let cks = ClientKey::generate(config);
    let compressed_sks = CompressedServerKey::new(&cks);

    println!(
        "compressed size  : {}",
        bincode::serialize(&compressed_sks).unwrap().len()
    );

    let sks = compressed_sks.decompress();

    println!(
        "decompressed size: {}",
        bincode::serialize(&sks).unwrap().len()
    );

    set_server_key(sks);

    let clear_a = 12u8;
    let a = FheUint8::try_encrypt(clear_a, &cks).unwrap();

    let c = a + 234u8;
    let decrypted: u8 = c.decrypt(&cks);
    assert_eq!(decrypted, clear_a.wrapping_add(234));
}
```
I expected to see this happen: running the doctest with the following command should work (note that we modify the release profile to have `lto = "fat"` enabled):
```
RUSTFLAGS="-C target-cpu=native" cargo +nightly-2023-10-17 test --profile release --doc --features=aarch64-unix,boolean,shortint,integer,internal-keycache -p tfhe -- test_user_docs::how_to_compress
```
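The profile modification mentioned above amounts to something like this in Cargo.toml (a sketch; the actual profile in the repo may carry additional settings):

```toml
# Sketch of the release profile override described above; the `lto`
# line is the relevant change, other settings may differ in the repo.
[profile.release]
lto = "fat"
```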
Instead, this happened: the program crashes. Compiling the same code as a standalone example with the same cargo configuration produces an executable that works, and turning LTO off also yields a doctest that runs properly, indicating that LTO is at fault, or at least part of the problem, when combined with doctests.
It has been happening randomly for doctests across many Rust versions, but we could not identify what the issue was. It looks like enabling LTO creates a miscompilation where a value that is provably 0 (it is never modified by the code) is asserted to be != 0, crashing the program; sometimes different things error out, as if the program were reading from the wrong location on the stack. The assertion in question is at https://github.com/zama-ai/tfhe-rs/blob/f1c21888a762ddf9de017ae52dc120c141ec9c02/tfhe/src/core_crypto/algorithms/ggsw_encryption.rs#L551
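For context, the failing check is of this shape (a simplified sketch of the linked line; the exact assertion message appears in the backtrace below):

```rust
// Sketch of the check at ggsw_encryption.rs:551: `ciphertext_modulus`
// is never modified beforehand, so this should trivially hold for the
// parameters in use, yet the LTO doctest build trips it.
assert!(ciphertext_modulus.is_compatible_with_native_modulus());
```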
Unfortunately, we are not able to minimize this issue at the moment, as it does not happen reliably across doctests.
Meta
`rustc --version --verbose`:

```
rustc 1.75.0-nightly (49691b1f7 2023-10-16)
binary: rustc
commit-hash: 49691b1f70d71dd7b8349c332b7f277ee527bf08
commit-date: 2023-10-16
host: aarch64-apple-darwin
release: 1.75.0-nightly
LLVM version: 17.0.2
```
Unfortunately, nightly (which we used to recover the doctest binaries via `RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins"`) only exhibits the crash for the parallel version of an encryption algorithm used with rayon (on current stable we can also get the crash with a serial algorithm, but there we don't seem to be able to recover the doctest binary).
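Concretely, the recovery invocation looks like this (a sketch combining the flags above with the earlier test command; `doctestbins` is the output directory we passed):

```
# Persist the compiled doctest binaries so they can be inspected;
# same features and profile as the failing run above.
RUSTDOCFLAGS="-Z unstable-options --persist-doctests doctestbins" \
RUSTFLAGS="-C target-cpu=native" \
cargo +nightly-2023-10-17 test --profile release --doc \
    --features=aarch64-unix,boolean,shortint,integer,internal-keycache \
    -p tfhe -- test_user_docs::how_to_compress
```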
doctest_miscompile.zip
The archive contains the `objdump --disassemble` output for the code compiled as an example (running fine) and for the code compiled as a doctest exhibiting the miscompilation. If needed I can provide the binaries, but I would understand if nobody wants to run a binary coming from a bug report.
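For reference, the disassemblies were produced with invocations along these lines (binary paths are placeholders, not the exact ones we used):

```
# placeholder paths; one dump per binary for side-by-side comparison
objdump --disassemble path/to/example-binary > example.objdump
objdump --disassemble path/to/doctest-binary > doctest.objdump
```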
```
objdump --version
Apple LLVM version 14.0.3 (clang-1403.0.22.14.1)
  Optimized build.
  Default target: arm64-apple-darwin22.5.0
  Host CPU: apple-m1

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
```
Here is a snippet of a backtrace with two threads erroring on two different issues (while the same code compiled as an example runs without problems).
Backtrace
```
thread '<unnamed>' panicked at tfhe/src/core_crypto/algorithms/ggsw_encryption.rs:551:5:
assertion failed: ciphertext_modulus.is_compatible_with_native_modulus()
stack backtrace:
   0: 0x102712f6c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h06ea57ce7b13512d
   1: 0x10268b4f8 - core::fmt::write::h4d15d254ca20c331
   2: 0x1026c6a68 - std::io::Write::write_fmt::hfdc8b2852a9a03fa
   3: 0x102715ea0 - std::sys_common::backtrace::print::h139bbaa51f48014c
   4: 0x102715a08 - std::panicking::default_hook::{{closure}}::hbbb7d85a61092397
   5: 0x1027157cc - std::panicking::default_hook::hb0db088803baef11
   6: 0x102717234 - std::panicking::rust_panic_with_hook::h78dc274574606137
   7: 0x102716da8 - std::panicking::begin_panic_handler::{{closure}}::h2905be29dbe9281c
   8: 0x102716c88 - std::sys_common::backtrace::__rust_end_short_backtrace::h2a15f4fd2d64df91
   9: 0x102716c7c - _rust_begin_unwind
  10: 0x1027fe624 - core::panicking::panic_fmt::hd8e61ff6f38230f9
  11: 0x1027fe7b0 - core::panicking::panic::h4a945e52b5fb1050
  12: 0x1027990bc - tfhe::core_crypto::algorithms::glwe_encryption::encrypt_seeded_glwe_ciphertext_assign_with_existing_generator::hb32b93df2aa13c6e
  13: 0x1027d8d44 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h6b9d6bce496a26b2
  14: 0x10277099c - rayon::iter::plumbing::Producer::fold_with::h3252c105ae5580f0
  15: 0x10278c92c - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
  16: 0x10271ff70 - rayon_core::join::join_context::{{closure}}::h7ecf44f403b2e94c
  17: 0x102729d00 - rayon_core::registry::in_worker::hb2d005d9f62ec9b8
  18: 0x10278c918 - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
  19: 0x102792d0c - <<rayon::iter::map::Map<I,F> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB,F> as rayon::iter::plumbing::ProducerCallback<T>>::callback::h282ea6fb42ca6c2b
  20: 0x10276aaa0 - <<rayon::iter::zip::Zip<A,B> as rayon::iter::IndexedParallelIterator>::with_producer::CallbackB<CB,A> as rayon::iter::plumbing::ProducerCallback<ITEM>>::callback::h6c6ab19b4791d17e
  21: 0x1027dcc88 - <<rayon::iter::enumerate::Enumerate<I> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB> as rayon::iter::plumbing::ProducerCallback<I>>::callback::h62504345ff3d393a
  22: 0x10278f38c - rayon::iter::plumbing::bridge::h142cac5b932df279
  23: 0x1027de84c - rayon::iter::plumbing::Producer::fold_with::hda6c429fb67861a6
  24: 0x10278b204 - rayon::iter::plumbing::bridge_producer_consumer::helper::ha97da0be53d3520b
  25: 0x1027930fc - <<rayon::iter::map::Map<I,F> as rayon::iter::IndexedParallelIterator>::with_producer::Callback<CB,F> as rayon::iter::plumbing::ProducerCallback<T>>::callback::h5caece096ea77aa2
  26: 0x102768cdc - <<rayon::iter::zip::Zip<A,B> as rayon::iter::IndexedParallelIterator>::with_producer::CallbackA<CB,B> as rayon::iter::plumbing::ProducerCallback<ITEM>>::callback::h9c59859a5ada9da8
  27: 0x102790548 - rayon::iter::plumbing::bridge::h691ef483cd06a966
  28: 0x1027d896c - tfhe::core_crypto::algorithms::ggsw_encryption::par_encrypt_constant_seeded_ggsw_ciphertext_with_existing_generator::h1092854bcdddc1c5
  29: 0x1027d8540 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h58460779da245a1d
  30: 0x102771604 - rayon::iter::plumbing::Producer::fold_with::h5c2dab692eefc651
  31: 0x10278a424 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  32: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  33: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  34: 0x10271ec34 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  35: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  36: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  37: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  38: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  39: 0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
  40: 0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  41: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  42: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  43: 0x10271eac8 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  44: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  45: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  46: 0x1027306d4 - rayon_core::join::join_context::{{closure}}::h7ecf44f403b2e94c
  47: 0x102750400 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::h5752c5eaefb098bd
  48: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  49: 0x1026a9300 - rayon_core::registry::ThreadBuilder::run::h03f0186f2f91b865
  50: 0x1026b1ee4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf857650a9dcd5e44
  51: 0x1026ac8c8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heab0ff5ef27f89d0
  52: 0x1027183c4 - std::sys::unix::thread::Thread::new::thread_start::h2ab8753089ede7d0
  53: 0x19832bfa8 - __pthread_joiner_wake

thread '<unnamed>' panicked at /rustc/49691b1f70d71dd7b8349c332b7f277ee527bf08/library/core/src/num/mod.rs:1166:5:
attempt to calculate the remainder with a divisor of zero
stack backtrace:
   0: 0x102712f6c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h06ea57ce7b13512d
   1: 0x10268b4f8 - core::fmt::write::h4d15d254ca20c331
   2: 0x1026c6a68 - std::io::Write::write_fmt::hfdc8b2852a9a03fa
   3: 0x102715ea0 - std::sys_common::backtrace::print::h139bbaa51f48014c
   4: 0x102715a08 - std::panicking::default_hook::{{closure}}::hbbb7d85a61092397
   5: 0x1027157cc - std::panicking::default_hook::hb0db088803baef11
   6: 0x102717234 - std::panicking::rust_panic_with_hook::h78dc274574606137
   7: 0x102716da8 - std::panicking::begin_panic_handler::{{closure}}::h2905be29dbe9281c
   8: 0x102716c88 - std::sys_common::backtrace::__rust_end_short_backtrace::h2a15f4fd2d64df91
   9: 0x102716c7c - _rust_begin_unwind
  10: 0x1027fe624 - core::panicking::panic_fmt::hd8e61ff6f38230f9
  11: 0x1027fe7b0 - core::panicking::panic::h4a945e52b5fb1050
  12: 0x1027990bc - tfhe::core_crypto::algorithms::glwe_encryption::encrypt_seeded_glwe_ciphertext_assign_with_existing_generator::hb32b93df2aa13c6e
  13: 0x1027d8d44 - <rayon::iter::for_each::ForEachConsumer<F> as rayon::iter::plumbing::Folder<T>>::consume_iter::h6b9d6bce496a26b2
  14: 0x10277099c - rayon::iter::plumbing::Producer::fold_with::h3252c105ae5580f0
  15: 0x10278c92c - rayon::iter::plumbing::bridge_producer_consumer::helper::h516df06807eeed76
  16: 0x102756c50 - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::hb4b2cce923b187bc
  17: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  18: 0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
  19: 0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  20: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  21: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  22: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  23: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  24: 0x10280004c - rayon_core::join::join_recover_from_panic::hac430d1fb14e684b
  25: 0x10271eb10 - rayon_core::join::join_context::{{closure}}::h6ff07f0ad22d988f
  26: 0x1027292dc - rayon_core::registry::in_worker::h72ac659d0872c7bc
  27: 0x10278a410 - rayon::iter::plumbing::bridge_producer_consumer::helper::hd7e30ce6b8c8fdf8
  28: 0x102759bec - <rayon_core::job::StackJob<L,F,R> as rayon_core::job::Job>::execute::he14a52c10f982320
  29: 0x1027ff980 - rayon_core::registry::WorkerThread::wait_until_cold::hadf889fe03869109
  30: 0x1026a9300 - rayon_core::registry::ThreadBuilder::run::h03f0186f2f91b865
  31: 0x1026b1ee4 - std::sys_common::backtrace::__rust_begin_short_backtrace::hf857650a9dcd5e44
  32: 0x1026ac8c8 - core::ops::function::FnOnce::call_once{{vtable.shim}}::heab0ff5ef27f89d0
  33: 0x1027183c4 - std::sys::unix::thread::Thread::new::thread_start::h2ab8753089ede7d0
  34: 0x19832bfa8 - __pthread_joiner_wake
```
We have also seen some flaky doctests on x86_64 and could not narrow down the issue; we have turned off LTO for all of our doctests for now and will monitor how things evolve. The reason we suspect an issue on x86_64 as well is that M1 builds have been running with LTO off for months and have never exhibited the flaky doctests we saw on x86_64. However, given that the compiled code in that case is significantly different (intrinsics usage being one factor), we can't yet be sure a similar issue is happening on x86_64.
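For anyone hitting something similar: the stopgap we applied amounts to running doctests without LTO, e.g. via a dedicated Cargo profile (a sketch; the profile name is hypothetical):

```toml
# Hypothetical profile: same as release but with LTO disabled, since
# LTO + doctests is the failing combination in our case.
[profile.release-lto-off]
inherits = "release"
lto = "off"
```

Doctests then run with `--profile release-lto-off` in place of `--profile release`.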
Cheers