Skip to content

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented May 10, 2024

Division is not yet working, but all others were straigtforward to add so I split them out of #587.

This includes some bigger changes that are split per commit:

  • Split Int into Int and MinInt so we can use traits with bigint types
  • Add 256-bit bigint types for widening operations on 128-bit integers
  • Refactor test macros so that systems without implementations test against rustc_apfloat rather than just being skipped. I needed this so I can actually debug.
  • Add implementations and tests for the new operations
  • Change powerpc symbol names to match what LLVM emits, from https://gcc.gnu.org/wiki/Ieee128PowerPC
@tgross35 tgross35 force-pushed the f16-f128-intrinsics-min branch 7 times, most recently from 2e9bb45 to 7871c81 Compare May 11, 2024 05:43
@tgross35
Copy link
Contributor Author

tgross35 commented May 11, 2024

Powerpc seems to be hitting a SIGILL on stxvd2x vs34,0,r9. From https://bugzilla.redhat.com/show_bug.cgi?id=1045384 and https://bugzilla.redhat.com/show_bug.cgi?id=1002077 it seems like this may be a limitation of qemu, but information is scarce. I may just need to use the apfloat fallback.

Also looks like symbol names may be different, I need to double check https://gcc.gnu.org/wiki/Ieee128PowerPC

@tgross35
Copy link
Contributor Author

tgross35 commented May 11, 2024

Ok, some more info. Compiling C for 32-bit powerpc seems to always emit __gcc_qadd regardless of -mlong-double flag. When forced to use __addkf3 it segfaults as expected because of the stxvd2x instruction. I think we are okay just ignoring this one as a likely limitation of qemu.

Compiling C for powerpc64 seems to use an ifunc resolver for __addkf3, which goes to __addkf3_hw for the single-instruction xsaddqp. Compiling Rust seems to do the exact same thing but I am getting different results, __addtf3(0x00000000000000000000000000000000, 0x00000000000000000000000000000001): std: 0x00000000000000000000000000000000, builtins: 0x00000000000000000000000000000001. I get the same results when calling __addkf3 directly in C edit: no I don't. Maybe the operation format of xsaddqp needs to be set somehow to choose between ppc doubledouble and ieee f128?

@tgross35
Copy link
Contributor Author

tgross35 commented May 12, 2024

Some asm for anyone who understands it better:

Assembly generated from Rust (incorrect result)
0000000000010d00 <.add_entry>:  10d00: 7c 08 02 a6 mflr r0  10d04: f8 21 ff 91 stdu r1,-112(r1)  10d08: f8 01 00 80 std r0,128(r1)  10d0c: 4b ff c6 95 bl d3a0 <00000143.plt_call.__addkf3>  10d10: e8 41 00 28 ld r2,40(r1)  10d14: 38 21 00 70 addi r1,r1,112  10d18: e8 01 00 10 ld r0,16(r1)  10d1c: 7c 08 03 a6 mtlr r0  10d20: 4e 80 00 20 blr 000000000000d3a0 <00000143.plt_call.__addkf3>:  d3a0: f8 41 00 28 std r2,40(r1)  d3a4: 3d 62 ff ff addis r11,r2,-1  d3a8: e9 8b 7f 58 ld r12,32600(r11)  d3ac: 7d 89 03 a6 mtctr r12  d3b0: e8 4b 7f 60 ld r2,32608(r11)  d3b4: 4e 80 04 20 bctr 0000000000056790 <.__addkf3_resolve>:  56790: 81 2d 8f 9c lwz r9,-28772(r13)  56794: 75 29 00 40 andis. r9,r9,64  56798: 41 82 00 18 beq 567b0 <.__addkf3_resolve+0x20>  5679c: e8 62 80 10 ld r3,-32752(r2)  567a0: 4e 80 00 20 blr  567a4: 60 00 00 00 nop  567a8: 60 00 00 00 nop  567ac: 60 42 00 00 ori r2,r2,0  567b0: e8 62 80 18 ld r3,-32744(r2)  567b4: 4e 80 00 20 blr  ...  567c4: 60 00 00 00 nop  567c8: 60 00 00 00 nop  567cc: 60 42 00 00 ori r2,r2,0 000000000005e6e0 <.__addkf3_hw>:  5e6e0: fc 42 18 08 xsaddqp v2,v2,v3  5e6e4: 4e 80 00 20 blr  ...  5e6f4: 60 00 00 00 nop  5e6f8: 60 00 00 00 nop  5e6fc: 60 00 00 00 nop 0000000000057050 <.__addkf3_sw>:  57050: fb 41 ff d0 std r26,-48(r1)  57054: fb 61 ff d8 std r27,-40(r1)  57058: fb 81 ff e0 std r28,-32(r1)  5705c: fb a1 ff e8 std r29,-24(r1)  57060: fb c1 ff f0 std r30,-16(r1)  57064: fb e1 ff f8 std r31,-8(r1)  57068: f8 21 ff 41 stdu r1,-192(r1)  5706c: 39 21 00 70 addi r9,r1,112  57070: 39 41 00 70 addi r10,r1,112  // ... full sw implementation
Assembly generated from C (correct result)
0000000010000af8 <.add_entry>:  10000af8: 7c 08 02 a6 mflr r0  10000afc: f8 01 00 10 std r0,16(r1)  10000b00: fb e1 ff f8 std r31,-8(r1)  10000b04: f8 21 ff 81 stdu r1,-128(r1)  10000b08: 7c 3f 0b 78 mr r31,r1  10000b0c: 39 20 00 30 li r9,48  10000b10: 39 5f 00 80 addi r10,r31,128  10000b14: 7c 4a 4f 99 stxvd2x vs34,r10,r9  10000b18: 39 20 00 40 li r9,64  10000b1c: 39 5f 00 80 addi r10,r31,128  10000b20: 7c 6a 4f 99 stxvd2x vs35,r10,r9  10000b24: 39 20 00 40 li r9,64  10000b28: 39 5f 00 80 addi r10,r31,128  10000b2c: 7c 6a 4e 99 lxvd2x vs35,r10,r9  10000b30: 39 20 00 30 li r9,48  10000b34: 39 5f 00 80 addi r10,r31,128  10000b38: 7c 4a 4e 99 lxvd2x vs34,r10,r9  10000b3c: 4b ff fb a5 bl 100006e0 <00000019.plt_call.__addkf3>  10000b40: e8 41 00 28 ld r2,40(r1)  10000b44: f0 02 14 96 xxmr vs0,vs34  10000b48: f0 40 04 91 xxmr vs34,vs0  10000b4c: 38 3f 00 80 addi r1,r31,128  10000b50: e8 01 00 10 ld r0,16(r1)  10000b54: 7c 08 03 a6 mtlr r0  10000b58: eb e1 ff f8 ld r31,-8(r1)  10000b5c: 4e 80 00 20 blr  10000b60: 00 00 00 00 .long 0x0  10000b64: 00 00 00 01 .long 0x1  10000b68: 80 01 00 01 lwz r0,1(r1) 0000000010000c40 <.__addkf3_resolve>:  10000c40: 81 2d 8f 9c lwz r9,-28772(r13)  10000c44: 75 29 00 40 andis. r9,r9,64  10000c48: 41 82 00 18 beq 10000c60 <.__addkf3_resolve+0x20>  10000c4c: e8 62 80 10 ld r3,-32752(r2)  10000c50: 4e 80 00 20 blr  10000c54: 60 00 00 00 nop  10000c58: 60 00 00 00 nop  10000c5c: 60 42 00 00 ori r2,r2,0  10000c60: e8 62 80 18 ld r3,-32744(r2)  10000c64: 4e 80 00 20 blr  ...  10000c74: 60 00 00 00 nop  10000c78: 60 00 00 00 nop  10000c7c: 60 42 00 00 ori r2,r2,0 0000000010008b90 <.__addkf3_hw>:  10008b90: fc 42 18 08 xsaddqp v2,v2,v3  10008b94: 4e 80 00 20 blr  ...  10008ba4: 60 00 00 00 nop  10008ba8: 60 00 00 00 nop  10008bac: 60 00 00 00 nop 0000000010001500 <.__addkf3_sw>:  10001500: fb 41 ff d0 std r26,-48(r1)  10001504: fb 61 ff d8 std r27,-40(r1)  10001508: fb 81 ff e0 std r28,-32(r1)  1000150c: fb a1 ff e8 std r29,-24(r1)  10001510: fb c1 ff f0 std r30,-16(r1)  10001514: fb e1 ff f8 std r31,-8(r1)  10001518: f8 21 ff 41 stdu r1,-192(r1)  1000151c: 39 21 00 70 addi r9,r1,112  10001520: 39 41 00 70 addi r10,r1,112  // ...
Rust code
#![feature(f128)] #[no_mangle] #[inline(never)] fn add_entry(a: f128, b: f128) -> f128 { a + b } fn main() { let a = f128::from_bits(0x0); let b = f128::from_bits(0x1); dbg!(a, b); let c = add_entry(a, b); dbg!(c); }
C code
#include <stdio.h> #include <stdlib.h> #include <inttypes.h> #define _Float128 __float128 typedef struct { #if __BYTE_ORDER == __LITTLE_ENDIAN uint64_t lower, upper; #elif __BYTE_ORDER == __BIG_ENDIAN uint64_t upper, lower; #else #error missing endian check #endif } __attribute__((aligned(_Alignof(_Float128)))) u128; _Float128 __addkf3(_Float128, _Float128); void f128_print(_Float128 val) { u128 ival = *((u128 *)(&val)); printf("%#018" PRIx64 "%016" PRIx64 " %lf\n", ival.upper, ival.lower, (double)val); } _Float128 new_f128(uint64_t upper, uint64_t lower) { u128 val = { .lower = lower, .upper = upper }; return *((_Float128 *)(&val)); } _Float128 add_entry(_Float128 a, _Float128 b) { #ifdef USE_ADDKF3 return __addkf3(a, b); #else return a + b; #endif } int main() { _Float128 a = new_f128(0x0000000000000000, 0x0000000000000000); _Float128 b = new_f128(0x0000000000000000, 0x0000000000000001); f128_print(a); f128_print(b); _Float128 c = add_entry(a, b); f128_print(c); return 0; }
@tgross35 tgross35 force-pushed the f16-f128-intrinsics-min branch 11 times, most recently from 5119b1a to 3133d8f Compare May 12, 2024 22:25
@tgross35
Copy link
Contributor Author

I don't have any further insight on powerpc64, so I'll just disable testing against the system for now unless you have any suggestions @Amanieu. We're still testing against apfloat so our compiler-builtins is correct, but using the system symbols may cause precision issues for users. I think we can probably just address this more after everything merges.

This PR is probably best reviewed by-commit.

@tgross35 tgross35 marked this pull request as ready for review May 12, 2024 22:58
@Amanieu
Copy link
Member

Amanieu commented May 13, 2024

@lu-zero Maybe you have some insight into the powerpc issues?

@tgross35
Copy link
Contributor Author

Discussed some more at https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/f128.20system.20libraries.20noncompliant.20platforms/near/438486364, @ecnelises is going to take a look at some point. This PR doesn't break anything new so I don't think it is blocked, opened rust-lang/rust#125109 to track the issue further.

@lu-zero
Copy link

lu-zero commented May 14, 2024

@Amanieu I can build and try on real hardware. We can ask for real runners, I think.

@tgross35
Copy link
Contributor Author

tgross35 commented May 14, 2024

@lu-zero could you try the code at #606 (comment) on powerpc64? I put more about how I am building at rust-lang/rust#125109

Less pressing but it would also be nice if you have a way to confirm that stxvd2x does not sigill on real powerpc-* targets, as in #606 (comment)

@lu-zero
Copy link

lu-zero commented May 14, 2024

so this is what I'm getting.

$ cargo run Compiling test_f128 v0.1.0 (/home/lu_zero/test_f128) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s Running `target/debug/test_f128` [src/main.rs:12:5] a = 0x00000000000000000000000000000000 [src/main.rs:12:5] b = 0x00000000000000000000000000000001 [src/main.rs:14:5] c = 0x00000000000000000000000000000001 
@tgross35
Copy link
Contributor Author

Well that is interesting. That is 64-bit, correct?

@lu-zero
Copy link

lu-zero commented May 14, 2024

Yes, tested on power9, used as ppc64le

@tgross35
Copy link
Contributor Author

LE seems to work fine in qemu, the green test in CI isn't using the fallback. Only powerpc- and powerpc64- are causing the issues https://github.com/rust-lang/compiler-builtins/blob/3133d8f555580d93645eb763f94ec4cb59ac5880/testcrate/build.rs

@lu-zero
Copy link

lu-zero commented May 14, 2024

Sadly less and less people care about BE and it is fairly annoying to set up a BE environment =/

@tgross35
Copy link
Contributor Author

Ah, well thanks for taking a look! I'll dig a little bit more but I think we are probably okay to not worry about it unless rust-lang/rust unit f128 unit tests have failures (we currently can't test much without this)

@lu-zero
Copy link

lu-zero commented May 14, 2024

Either qemu gets fixed for BE or we'll need a BE runner.

@tgross35
Copy link
Contributor Author

I don't think the ppc64 BE is a qemu issue since the C version works fine. 32-bit hopefully should be just qemu.

Agreed that native runners wouldn't be a bad thing.

tgross35 added 8 commits May 15, 2024 07:19
`MinInt` contains the basic methods that are only needed by integers involved in widening operations, i.e. big integers. `Int` retains all other operations and convenience methods.
Change float test macros to fall back to testing against `rustc_apfloat` when system implementations are not available, rather than just skipping tests. This allows for easier debugging where operations may not be supported.
@tgross35 tgross35 force-pushed the f16-f128-intrinsics-min branch from 3133d8f to 6a847ab Compare May 15, 2024 12:19
@Amanieu
Copy link
Member

Amanieu commented May 16, 2024

This took a while to fully review, but LGTM!

@Amanieu Amanieu merged commit 449643f into rust-lang:master May 16, 2024
@tgross35
Copy link
Contributor Author

Thanks for taking a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants