Add addition, subtraction, multiplication, and compare operations for `f128` #606

tgross35 · 2024-05-10T23:43:41Z

Division is not yet working, but all the others were straightforward to add, so I split them out of #587.

This includes some bigger changes that are split per commit:

Split Int into Int and MinInt so we can use traits with bigint types
Add 256-bit bigint types for widening operations on 128-bit integers
Refactor test macros so that systems without implementations test against rustc_apfloat rather than just being skipped. I needed this so I can actually debug.
Add implementations and tests for the new operations
Change powerpc symbol names to match what LLVM emits, from https://gcc.gnu.org/wiki/Ieee128PowerPC
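To illustrate why the first two commits go together: an `f128` significand is 113 bits wide (112 stored bits plus the implicit bit), so the product of two significands does not fit in a `u128` and needs a 256-bit result. The following is a minimal sketch under stated assumptions, not the code from this PR. The trait members, the `U256` struct, and `widening_mul` are hypothetical names chosen for illustration; only the trait names `MinInt` and `Int` come from the commit list above.

```rust
/// Minimal interface shared by every integer used in widening arithmetic,
/// including big integers. Hypothetical shape, not the crate's actual trait.
trait MinInt: Copy + PartialEq + std::fmt::Debug {
    const BITS: u32;
    const ZERO: Self;
}

/// Extra operations that only the primitive integers provide in this sketch.
trait Int: MinInt {
    fn leading_zeros(self) -> u32;
}

/// Hypothetical 256-bit unsigned integer made of two 128-bit halves.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U256 {
    hi: u128,
    lo: u128,
}

impl MinInt for u128 {
    const BITS: u32 = 128;
    const ZERO: Self = 0;
}

impl Int for u128 {
    fn leading_zeros(self) -> u32 {
        // Resolves to the inherent `u128::leading_zeros`, not this trait method.
        u128::leading_zeros(self)
    }
}

// The big integer implements only the minimal trait.
impl MinInt for U256 {
    const BITS: u32 = 256;
    const ZERO: Self = U256 { hi: 0, lo: 0 };
}

/// Full 128 x 128 -> 256-bit product, built from 64-bit limbs.
fn widening_mul(a: u128, b: u128) -> U256 {
    const MASK: u128 = (1u128 << 64) - 1;
    let (a_hi, a_lo) = (a >> 64, a & MASK);
    let (b_hi, b_lo) = (b >> 64, b & MASK);

    // Four 64 x 64 -> 128-bit partial products; none can overflow a u128.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Sum of everything that lands in bits 64..128, keeping the carry.
    let mid = (ll >> 64) + (lh & MASK) + (hl & MASK);

    // Bits of `mid` above the low 64 are shifted out here and added to `hi`.
    let lo = (ll & MASK) | (mid << 64);
    let hi = hh + (lh >> 64) + (hl >> 64) + (mid >> 64);
    U256 { hi, lo }
}
```

Generic code that only needs the width or the zero value can be bounded on `MinInt` and accept both `u128` and `U256`, while code that needs `leading_zeros` asks for `Int`. For example, `widening_mul(1u128 << 127, 2)` produces a value whose `hi` half is 1 and whose `lo` half is 0.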

tgross35 · 2024-05-11T08:39:14Z

Powerpc seems to be hitting a SIGILL on stxvd2x vs34,0,r9. From https://bugzilla.redhat.com/show_bug.cgi?id=1045384 and https://bugzilla.redhat.com/show_bug.cgi?id=1002077 it seems like this may be a limitation of qemu, but information is scarce. I may just need to use the apfloat fallback.

Also looks like symbol names may be different, I need to double check https://gcc.gnu.org/wiki/Ieee128PowerPC

tgross35 · 2024-05-11T09:50:03Z

Ok, some more info. Compiling C for 32-bit powerpc seems to always emit __gcc_qadd regardless of -mlong-double flag. When forced to use __addkf3 it segfaults as expected because of the stxvd2x instruction. I think we are okay just ignoring this one as a likely limitation of qemu.

Compiling C for powerpc64 seems to use an ifunc resolver for __addkf3, which goes to __addkf3_hw for the single-instruction xsaddqp. Compiling Rust seems to do the exact same thing, but I am getting different results: __addtf3(0x00000000000000000000000000000000, 0x00000000000000000000000000000001): std: 0x00000000000000000000000000000000, builtins: 0x00000000000000000000000000000001. ~~I get the same results when calling __addkf3 directly in C~~ edit: no I don't. Maybe the operation format of xsaddqp needs to be set somehow to choose between ppc doubledouble and IEEE f128?
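For readers decoding the bit patterns in that failing case: in IEEE 754 binary128, bit 127 is the sign, the next 15 bits are the biased exponent, and the low 112 bits are the trailing significand. So `0x…00` is +0 and `0x…01` is the smallest positive subnormal, and their exact sum is `0x…01`, which is the value reported for `builtins`. The sketch below decodes those fields on stable Rust using `u128` bit patterns rather than the unstable `f128` type; `F128Parts` and `decode_f128_bits` are hypothetical helper names, not part of any crate discussed here.

```rust
/// Fields of an IEEE 754 binary128 value, decoded from its bit pattern.
/// Hypothetical helper type for illustration only.
#[derive(Debug, PartialEq, Eq)]
struct F128Parts {
    sign: bool,
    /// 15-bit biased exponent.
    exponent: u16,
    /// 112-bit trailing significand (without the implicit bit).
    significand: u128,
}

impl F128Parts {
    /// Subnormals have a zero exponent field and a nonzero significand.
    fn is_subnormal(&self) -> bool {
        self.exponent == 0 && self.significand != 0
    }
}

/// Split a 128-bit pattern into sign, exponent, and significand fields.
fn decode_f128_bits(bits: u128) -> F128Parts {
    const SIG_BITS: u32 = 112;
    const SIG_MASK: u128 = (1u128 << SIG_BITS) - 1;
    F128Parts {
        sign: (bits >> 127) != 0,
        exponent: ((bits >> SIG_BITS) & 0x7fff) as u16,
        significand: bits & SIG_MASK,
    }
}
```

Calling `decode_f128_bits(0x1)` yields a positive value with exponent field 0 and significand 1, and `is_subnormal()` returns `true` for it.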

tgross35 · 2024-05-12T10:23:29Z

Some asm for anyone who understands it better:

Assembly generated from Rust (incorrect result)

    0000000000010d00 <.add_entry>:
      10d00: 7c 08 02 a6 mflr r0
      10d04: f8 21 ff 91 stdu r1,-112(r1)
      10d08: f8 01 00 80 std r0,128(r1)
      10d0c: 4b ff c6 95 bl d3a0 <00000143.plt_call.__addkf3>
      10d10: e8 41 00 28 ld r2,40(r1)
      10d14: 38 21 00 70 addi r1,r1,112
      10d18: e8 01 00 10 ld r0,16(r1)
      10d1c: 7c 08 03 a6 mtlr r0
      10d20: 4e 80 00 20 blr

    000000000000d3a0 <00000143.plt_call.__addkf3>:
      d3a0: f8 41 00 28 std r2,40(r1)
      d3a4: 3d 62 ff ff addis r11,r2,-1
      d3a8: e9 8b 7f 58 ld r12,32600(r11)
      d3ac: 7d 89 03 a6 mtctr r12
      d3b0: e8 4b 7f 60 ld r2,32608(r11)
      d3b4: 4e 80 04 20 bctr

    0000000000056790 <.__addkf3_resolve>:
      56790: 81 2d 8f 9c lwz r9,-28772(r13)
      56794: 75 29 00 40 andis. r9,r9,64
      56798: 41 82 00 18 beq 567b0 <.__addkf3_resolve+0x20>
      5679c: e8 62 80 10 ld r3,-32752(r2)
      567a0: 4e 80 00 20 blr
      567a4: 60 00 00 00 nop
      567a8: 60 00 00 00 nop
      567ac: 60 42 00 00 ori r2,r2,0
      567b0: e8 62 80 18 ld r3,-32744(r2)
      567b4: 4e 80 00 20 blr
      ...
      567c4: 60 00 00 00 nop
      567c8: 60 00 00 00 nop
      567cc: 60 42 00 00 ori r2,r2,0

    000000000005e6e0 <.__addkf3_hw>:
      5e6e0: fc 42 18 08 xsaddqp v2,v2,v3
      5e6e4: 4e 80 00 20 blr
      ...
      5e6f4: 60 00 00 00 nop
      5e6f8: 60 00 00 00 nop
      5e6fc: 60 00 00 00 nop

    0000000000057050 <.__addkf3_sw>:
      57050: fb 41 ff d0 std r26,-48(r1)
      57054: fb 61 ff d8 std r27,-40(r1)
      57058: fb 81 ff e0 std r28,-32(r1)
      5705c: fb a1 ff e8 std r29,-24(r1)
      57060: fb c1 ff f0 std r30,-16(r1)
      57064: fb e1 ff f8 std r31,-8(r1)
      57068: f8 21 ff 41 stdu r1,-192(r1)
      5706c: 39 21 00 70 addi r9,r1,112
      57070: 39 41 00 70 addi r10,r1,112
      // ... full sw implementation

Assembly generated from C (correct result)

    0000000010000af8 <.add_entry>:
      10000af8: 7c 08 02 a6 mflr r0
      10000afc: f8 01 00 10 std r0,16(r1)
      10000b00: fb e1 ff f8 std r31,-8(r1)
      10000b04: f8 21 ff 81 stdu r1,-128(r1)
      10000b08: 7c 3f 0b 78 mr r31,r1
      10000b0c: 39 20 00 30 li r9,48
      10000b10: 39 5f 00 80 addi r10,r31,128
      10000b14: 7c 4a 4f 99 stxvd2x vs34,r10,r9
      10000b18: 39 20 00 40 li r9,64
      10000b1c: 39 5f 00 80 addi r10,r31,128
      10000b20: 7c 6a 4f 99 stxvd2x vs35,r10,r9
      10000b24: 39 20 00 40 li r9,64
      10000b28: 39 5f 00 80 addi r10,r31,128
      10000b2c: 7c 6a 4e 99 lxvd2x vs35,r10,r9
      10000b30: 39 20 00 30 li r9,48
      10000b34: 39 5f 00 80 addi r10,r31,128
      10000b38: 7c 4a 4e 99 lxvd2x vs34,r10,r9
      10000b3c: 4b ff fb a5 bl 100006e0 <00000019.plt_call.__addkf3>
      10000b40: e8 41 00 28 ld r2,40(r1)
      10000b44: f0 02 14 96 xxmr vs0,vs34
      10000b48: f0 40 04 91 xxmr vs34,vs0
      10000b4c: 38 3f 00 80 addi r1,r31,128
      10000b50: e8 01 00 10 ld r0,16(r1)
      10000b54: 7c 08 03 a6 mtlr r0
      10000b58: eb e1 ff f8 ld r31,-8(r1)
      10000b5c: 4e 80 00 20 blr
      10000b60: 00 00 00 00 .long 0x0
      10000b64: 00 00 00 01 .long 0x1
      10000b68: 80 01 00 01 lwz r0,1(r1)

    0000000010000c40 <.__addkf3_resolve>:
      10000c40: 81 2d 8f 9c lwz r9,-28772(r13)
      10000c44: 75 29 00 40 andis. r9,r9,64
      10000c48: 41 82 00 18 beq 10000c60 <.__addkf3_resolve+0x20>
      10000c4c: e8 62 80 10 ld r3,-32752(r2)
      10000c50: 4e 80 00 20 blr
      10000c54: 60 00 00 00 nop
      10000c58: 60 00 00 00 nop
      10000c5c: 60 42 00 00 ori r2,r2,0
      10000c60: e8 62 80 18 ld r3,-32744(r2)
      10000c64: 4e 80 00 20 blr
      ...
      10000c74: 60 00 00 00 nop
      10000c78: 60 00 00 00 nop
      10000c7c: 60 42 00 00 ori r2,r2,0

    0000000010008b90 <.__addkf3_hw>:
      10008b90: fc 42 18 08 xsaddqp v2,v2,v3
      10008b94: 4e 80 00 20 blr
      ...
      10008ba4: 60 00 00 00 nop
      10008ba8: 60 00 00 00 nop
      10008bac: 60 00 00 00 nop

    0000000010001500 <.__addkf3_sw>:
      10001500: fb 41 ff d0 std r26,-48(r1)
      10001504: fb 61 ff d8 std r27,-40(r1)
      10001508: fb 81 ff e0 std r28,-32(r1)
      1000150c: fb a1 ff e8 std r29,-24(r1)
      10001510: fb c1 ff f0 std r30,-16(r1)
      10001514: fb e1 ff f8 std r31,-8(r1)
      10001518: f8 21 ff 41 stdu r1,-192(r1)
      1000151c: 39 21 00 70 addi r9,r1,112
      10001520: 39 41 00 70 addi r10,r1,112
      // ...

Rust code

    #![feature(f128)]

    #[no_mangle]
    #[inline(never)]
    fn add_entry(a: f128, b: f128) -> f128 {
        a + b
    }

    fn main() {
        let a = f128::from_bits(0x0);
        let b = f128::from_bits(0x1);
        dbg!(a, b);
        let c = add_entry(a, b);
        dbg!(c);
    }

C code

    #include <stdio.h>
    #include <stdlib.h>
    #include <inttypes.h>

    #define _Float128 __float128

    typedef struct {
    #if __BYTE_ORDER == __LITTLE_ENDIAN
        uint64_t lower, upper;
    #elif __BYTE_ORDER == __BIG_ENDIAN
        uint64_t upper, lower;
    #else
    #error missing endian check
    #endif
    } __attribute__((aligned(_Alignof(_Float128)))) u128;

    _Float128 __addkf3(_Float128, _Float128);

    void f128_print(_Float128 val) {
        u128 ival = *((u128 *)(&val));
        printf("%#018" PRIx64 "%016" PRIx64 " %lf\n", ival.upper, ival.lower, (double)val);
    }

    _Float128 new_f128(uint64_t upper, uint64_t lower) {
        u128 val = { .lower = lower, .upper = upper };
        return *((_Float128 *)(&val));
    }

    _Float128 add_entry(_Float128 a, _Float128 b) {
    #ifdef USE_ADDKF3
        return __addkf3(a, b);
    #else
        return a + b;
    #endif
    }

    int main() {
        _Float128 a = new_f128(0x0000000000000000, 0x0000000000000000);
        _Float128 b = new_f128(0x0000000000000000, 0x0000000000000001);
        f128_print(a);
        f128_print(b);
        _Float128 c = add_entry(a, b);
        f128_print(c);
        return 0;
    }

tgross35 · 2024-05-12T22:58:51Z

I don't have any further insight on powerpc64, so I'll just disable testing against the system for now unless you have any suggestions @Amanieu. We're still testing against apfloat so our compiler-builtins is correct, but using the system symbols may cause precision issues for users. I think we can probably just address this more after everything merges.

This PR is probably best reviewed by-commit.

Amanieu · 2024-05-13T10:16:53Z

@lu-zero Maybe you have some insight into the powerpc issues?

tgross35 · 2024-05-14T07:28:20Z

Discussed some more at https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/f128.20system.20libraries.20noncompliant.20platforms/near/438486364, @ecnelises is going to take a look at some point. This PR doesn't break anything new so I don't think it is blocked, opened rust-lang/rust#125109 to track the issue further.

lu-zero · 2024-05-14T07:31:42Z

@Amanieu I can build and try on real hardware. We can ask for real runners, I think.

tgross35 · 2024-05-14T07:39:31Z

@lu-zero could you try the code at #606 (comment) on powerpc64? I put more about how I am building at rust-lang/rust#125109

Less pressing but it would also be nice if you have a way to confirm that stxvd2x does not sigill on real powerpc-* targets, as in #606 (comment)

lu-zero · 2024-05-14T07:56:37Z

so this is what I'm getting.

    $ cargo run
       Compiling test_f128 v0.1.0 (/home/lu_zero/test_f128)
        Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
         Running `target/debug/test_f128`
    [src/main.rs:12:5] a = 0x00000000000000000000000000000000
    [src/main.rs:12:5] b = 0x00000000000000000000000000000001
    [src/main.rs:14:5] c = 0x00000000000000000000000000000001

tgross35 · 2024-05-14T08:01:49Z

Well that is interesting. That is 64-bit, correct?

lu-zero · 2024-05-14T08:12:09Z

Yes, tested on power9, used as ppc64le

tgross35 · 2024-05-14T08:19:41Z

LE seems to work fine in qemu, the green test in CI isn't using the fallback. Only powerpc- and powerpc64- are causing the issues https://github.com/rust-lang/compiler-builtins/blob/3133d8f555580d93645eb763f94ec4cb59ac5880/testcrate/build.rs

lu-zero · 2024-05-14T08:25:36Z

Sadly, fewer and fewer people care about BE, and it is fairly annoying to set up a BE environment =/

tgross35 · 2024-05-14T08:33:57Z

Ah, well thanks for taking a look! I'll dig a little bit more, but I think we are probably okay to not worry about it unless the rust-lang/rust f128 unit tests have failures (we currently can't test much without this).

lu-zero · 2024-05-14T08:39:51Z

Either qemu gets fixed for BE or we'll need a BE runner.

tgross35 · 2024-05-14T08:50:27Z

I don't think the ppc64 BE is a qemu issue since the C version works fine. 32-bit hopefully should be just qemu.

Agreed that native runners wouldn't be a bad thing.

Amanieu · 2024-05-16T14:08:04Z

This took a while to fully review, but LGTM!

tgross35 · 2024-05-16T17:47:05Z

Thanks for taking a look!

tgross35 force-pushed the f16-f128-intrinsics-min branch 7 times, most recently from 2e9bb45 to 7871c81, May 11, 2024 05:43

tgross35 force-pushed the f16-f128-intrinsics-min branch 11 times, most recently from 5119b1a to 3133d8f, May 12, 2024 22:25

tgross35 marked this pull request as ready for review May 12, 2024 22:58

This was referenced May 13, 2024

Add missing functions for f16 and f128 #587

Closed

f128 symbols on powerpc64 give inaccurate results rust-lang/rust#125109

Open

This was referenced May 14, 2024

Tracking Issue for f16 and f128 float types rust-lang/rust#116909

Open

Add more intrinsics #611

Closed

tgross35 added 8 commits May 15, 2024 07:19

Split Int into Int and MinInt

9c6fcb5

`MinInt` contains the basic methods that are only needed by integers involved in widening operations, i.e. big integers. `Int` retains all other operations and convenience methods.

Add i256 and u256 bigint types

2868c26

Refactor float test macros to have a fallback

77faba1

Change float test macros to fall back to testing against `rustc_apfloat` when system implementations are not available, rather than just skipping tests. This allows for easier debugging where operations may not be supported.

Enable no-fail-fast for more usable test output

255c9f3

Implement f128 addition and subtraction

c8cc819

Implement f128 multiplication

58ad317

Implement f128 comparison

9bea196

Correct f128 extend and truncate symbol names on powerpc

6a847ab

PowerPC uses `kf` instead of `tf`: <https://gcc.gnu.org/wiki/Ieee128PowerPC>
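As a rough illustration of the naming difference in that last commit, the sketch below picks the mode suffix from the target architecture. It is a hypothetical helper, not how the crate selects symbols: `f128_symbol` and its string parameters are invented for this example, and it covers only the three-operand arithmetic names. `__addtf3` and `__addkf3` appear earlier in this thread; `__subkf3` and `__mulkf3` are assumed to follow the same pattern as described on the linked GCC wiki page.

```rust
/// Build the libgcc-style symbol name for a binary `f128` operation.
///
/// `op` is expected to be "add", "sub", or "mul", and `arch` is a target
/// architecture string such as "x86_64" or "powerpc64". Both parameters are
/// assumptions made for this sketch.
fn f128_symbol(op: &str, arch: &str) -> String {
    // PowerPC uses the `kf` mode for IEEE binary128; other targets use `tf`.
    let mode = if arch.starts_with("powerpc") { "kf" } else { "tf" };
    format!("__{op}{mode}3")
}
```

With this helper, `f128_symbol("add", "powerpc64")` returns `__addkf3`, while `f128_symbol("add", "x86_64")` returns `__addtf3`.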

tgross35 force-pushed the f16-f128-intrinsics-min branch from 3133d8f to 6a847ab, May 15, 2024 12:19

Amanieu merged commit 449643f into rust-lang:master May 16, 2024

tgross35 deleted the f16-f128-intrinsics-min branch May 16, 2024 17:47

This was referenced May 17, 2024

Unable to find long double symbols on aarch64-apple-darwin #567

Closed

[PowerPC] SIGILL in PPCTargetLowering for powerpc-unknown-linux-gnu with pwr9 target feature llvm/llvm-project#92233

Closed

tgross35 mentioned this pull request Aug 26, 2024

release/19.x: [PowerPC] Respect endianness when bitcasting to fp128 (#95931) llvm/llvm-project#105623

Merged
