
@Wunkolo
Contributor

@Wunkolo Wunkolo commented May 15, 2024

Implements a 64-bit ARM backend that emits a64 instructions using oaknut.

image

image

Depends on #2258 and xenia-project/FFmpeg#8

Addresses #2002

Tested on a ThinkPad X13s and uses unit tests from #1348 as well. There is currently an Armv8.1-A requirement due to the use of some of the newer atomic instructions such as CASAL.

Wunkolo added 9 commits April 27, 2024 16:45
Separates the `Windows` platform into `Windows-x86_64` and `Windows-ARM64`. Adds `--arch` argument to `build`. Removes x64 backend on non-x64 targets.
Marked as TODO for now
Adding the `a64` backend will be a different PR. For now it's stubbed to the null backend to allow the main executable to open without failing initialization.
This value is currently returning `0` on ARM machines and throws an exception.
@Wunkolo
Contributor Author

Wunkolo commented May 23, 2024

Debugger, instruction-stepping, call-stack unwinding, etc. have been implemented as well:
image

@Wunkolo
Contributor Author

Wunkolo commented May 28, 2024

Latest iteration running Beautiful Katamari and Geometry Wars. Still some minor issues but serving gameplay now.

kata.mp4
geo.wars.mp4
@Wunkolo
Contributor Author

Wunkolo commented May 29, 2024

No longer requires Armv8.1. Instructions are emitted with an Armv8.0-A baseline, and features such as FP16 and LSE are detected before they are used (and exposed in the feature-mask config, similar to x64).
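
To illustrate the detect-then-mask idea, here is a minimal sketch. The names `kFeatureLSE`, `kFeatureFP16`, `EffectiveFeatures` and `SelectCas` are hypothetical and not taken from this PR, and how the detected bits are obtained (OS query or system registers) is deliberately left out.

```cpp
#include <cstdint>

// Hypothetical feature bits, for illustration only.
enum A64Feature : uint64_t {
  kFeatureLSE = 1ull << 0,   // Armv8.1 atomics such as CASAL
  kFeatureFP16 = 1ull << 1,  // half-precision arithmetic
};

// A feature is usable only if the CPU reports it AND the user-facing
// feature mask leaves it enabled.
constexpr uint64_t EffectiveFeatures(uint64_t detected, uint64_t config_mask) {
  return detected & config_mask;
}

enum class CasStrategy {
  kLseCasal,       // single CASAL instruction
  kExclusiveLoop,  // Armv8.0 baseline: load-exclusive/store-exclusive retry loop
};

// Picks how a compare-and-swap would be emitted for the given feature set.
constexpr CasStrategy SelectCas(uint64_t features) {
  return (features & kFeatureLSE) ? CasStrategy::kLseCasal
                                  : CasStrategy::kExclusiveLoop;
}
```

With this shape, clearing a bit in the config mask forces the baseline path even on hardware that supports the extension, which is handy for testing both code paths on one machine.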

@Wunkolo Wunkolo force-pushed the arm64-backend branch 3 times, most recently from 4fa2462 to 54790a4 Compare June 8, 2024 21:34
@Wunkolo Wunkolo mentioned this pull request Jun 12, 2024
Wunkolo added 11 commits June 23, 2024 13:48
Addresses a build issue that seems to occur now that xenia-app is not getting SDL2 through one of its submodules.
Adds the new `xenia-cpu-backend-a64` build-target with linkage following the x64 backend.
Header-only library for emitting arm64v8 instructions. Enables C++20 only for the a64 backend for now
Mostly element-accessors
First pass framework that gets emitted ARM code executing. Based on the x64 backend, implements an ARM64 JIT backend.
This just reverses the bytes of each 32-bit value rather than reversing the whole vector.
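
A small scalar model of the difference, as a sketch only (the helper names are made up for this example and are not the backend's code). Reversing bytes within each 32-bit lane is what AArch64 `REV32` does on a `.16B` arrangement; reversing the whole vector is a different operation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec128 = std::array<uint8_t, 16>;

// Reverses the bytes inside each of the four 32-bit lanes.
inline Vec128 ByteSwapLanes32(const Vec128& v) {
  Vec128 out{};
  for (std::size_t lane = 0; lane < 4; ++lane) {
    for (std::size_t b = 0; b < 4; ++b) {
      out[lane * 4 + b] = v[lane * 4 + (3 - b)];
    }
  }
  return out;
}

// Reverses all 16 bytes; shown only for contrast.
inline Vec128 ReverseWholeVector(const Vec128& v) {
  Vec128 out{};
  for (std::size_t i = 0; i < 16; ++i) {
    out[i] = v[15 - i];
  }
  return out;
}
```

For an input of bytes 0 through 15, the per-lane swap starts with 3, 2, 1, 0, 7, 6, 5, 4, while the whole-vector reverse starts with 15, 14, 13, 12.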
Wunkolo added 17 commits June 23, 2024 14:00
Indices and non-const tables were using the same scratch-register
Uses the `CNTFRQ` and `CNTVCT` system registers as a raw clock source. On my ThinkPad X13s, the raw clock source returns a tick frequency of 19,200,000 while the platform clock source (`QueryPerformanceFrequency`) returns 10,000,000. Almost double the resolution of the platform clock!
Missed some during the first pass. Now the config files will mention a64 differences.
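
For a sense of what those two frequencies mean per tick, a minimal conversion sketch. `TicksToNanos` is a hypothetical helper written for this example; reading the counter itself is not shown.

```cpp
#include <cstdint>

constexpr uint64_t kNanosPerSecond = 1'000'000'000ull;

// Converts a raw tick count to nanoseconds for a counter running at
// frequency_hz. Whole seconds and the remainder are converted separately so
// the multiplication does not overflow for large tick counts (this assumes
// frequency_hz is well below ~1.8e10).
constexpr uint64_t TicksToNanos(uint64_t ticks, uint64_t frequency_hz) {
  const uint64_t whole_seconds = ticks / frequency_hz;
  const uint64_t remainder = ticks % frequency_hz;
  return whole_seconds * kNanosPerSecond +
         remainder * kNanosPerSecond / frequency_hz;
}
```

At 19,200,000 Hz one tick is about 52 ns, while at 10,000,000 Hz one tick is exactly 100 ns.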
Read directly from the ZR in the case that we are just storing a 64- or 32-bit zero.
This directly maps to the QC bit in the FPSR. Just have to make sure that the saturating instruction is the very last instruction (which is currently the case for stuff like VECTOR_ADD and such).
The 64-bit case uses a particular replicated 8-bit immediate, so something else will have to handle that. This catches a lot of cases without having to touch memory. Does not catch cases of `1.0` (0x3f800000).
`FMOV` encodes an 8-bit floating-point immediate that can be used to accelerate the loading of certain constant floating-point values with magnitudes between 0.125 and 31.0 (either sign). A lot of immediates such as -1.0, 1.0, 0.5, etc. fall within this range and this code gets lots of hits in my testing. This is much cheaper than loading a 32/64-bit value into W0/X0 and moving it into an FP register.
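
As a sketch of the encodability check for single precision (`EncodeFmovImm8` is a hypothetical name; the helper in this PR may look different): the 8-bit immediate holds the sign, a 3-bit exponent and a 4-bit fraction, so a float qualifies only if its low 19 fraction bits are zero and its exponent has the form `NOT(b) bbbbb xx`.

```cpp
#include <cstdint>
#include <cstring>
#include <optional>

// Returns the FMOV imm8 encoding for `value`, or nullopt if the value cannot
// be expressed as an 8-bit floating-point immediate.
inline std::optional<uint8_t> EncodeFmovImm8(float value) {
  uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));

  // Only the top 4 fraction bits may be set.
  if ((bits & 0x7FFFFu) != 0) return std::nullopt;

  // Bits 29..25 must all be equal...
  const uint32_t replicated = (bits >> 25) & 0x1Fu;
  if (replicated != 0 && replicated != 0x1Fu) return std::nullopt;

  // ...and bit 30 must be their inverse.
  const uint32_t bit30 = (bits >> 30) & 1u;
  const uint32_t bit29 = (bits >> 29) & 1u;
  if (bit30 == bit29) return std::nullopt;

  // imm8 = sign : bit 25 : bits 24..19
  return static_cast<uint8_t>(((bits >> 24) & 0x80u) | ((bits >> 19) & 0x7Fu));
}
```

Under this check `1.0f` encodes as `0x70` and `0.5f` as `0x60`, while `0.0f` and `32.0f` are rejected and need another path.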
Uses LSE when available, but provides an armv8.0 baseline implementation.
Removes all comments relating to x64 implementation details
`dc civac` causes an illegal-instruction exception on Windows-ARM. This is likely a security measure against cache attacks. On Linux this instruction is trapped into an EL1 kernel function. Windows does not seem to have any user-mode cache-maintenance instructions available for the data cache (only the instruction cache, via `FlushInstructionCache`). The closest thing we can do for now is a full data memory-barrier with `dsb ish`. Prefetches are implemented using `prfm pldl1keep, ...`.
Out-of-bound shift-values are handled as modulo-element-size
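
A scalar model of that behavior for 32-bit lanes, as a sketch (`VectorShl32` is a made-up name for this example): each lane's shift count is reduced modulo the element size before shifting, so a count of 32 behaves like 0 and 33 like 1.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Per-lane left shift of four 32-bit lanes; counts wrap modulo 32.
inline std::array<uint32_t, 4> VectorShl32(
    const std::array<uint32_t, 4>& value,
    const std::array<uint32_t, 4>& shift) {
  std::array<uint32_t, 4> out{};
  for (std::size_t i = 0; i < 4; ++i) {
    out[i] = value[i] << (shift[i] & 31u);  // mask keeps the count in 0..31
  }
  return out;
}
```

The explicit mask also keeps the C++ itself well defined, since shifting a 32-bit value by 32 or more is undefined behavior.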
The emitter doesn't actually hold onto executable code; it just generates the assembly data into a buffer for the currently-resolving function before placing it into a code cache. When code gets pushed into the code cache, it can just be copied from a `std::vector` and reset. The code cache itself maintains the actual executable memory, stack-unwinding code and such. This also fixes a bunch of erroneous relative-addressing glitches where relative addresses were calculated from the address of the unused CodeBlock rather than being position-independent. `MOVP2R` in particular was generating different instructions depending on its distance from the code block, when it should always just use `MOV` and not do any relative-address calculations, since we cannot predict what the program counter will be when the instruction actually runs. Oaknut probably needs a "position independent" policy or mode so that it avoids PC-relative instructions.
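
To make the position-independence point concrete, a sketch of materializing a 64-bit constant with `MOVZ`/`MOVK` only, written straight into a `std::vector`. `EmitMovImm64` is a hypothetical helper for this example and is not oaknut's API or this PR's `MOVP2R`; the point is that the emitted words depend only on the value and register, never on where the buffer ends up.

```cpp
#include <cstdint>
#include <vector>

// Appends MOVZ/MOVK instructions that load `value` into Xd (rd in 0..30).
// No PC-relative instruction is used, so the bytes can be copied anywhere.
inline void EmitMovImm64(std::vector<uint32_t>& code, uint32_t rd,
                         uint64_t value) {
  bool first = true;
  for (uint32_t hw = 0; hw < 4; ++hw) {
    const uint32_t imm16 =
        static_cast<uint32_t>((value >> (hw * 16)) & 0xFFFFu);
    // Skip zero chunks, but make sure at least one instruction is emitted.
    if (imm16 == 0 && !(first && hw == 3)) continue;
    const uint32_t opcode = first ? 0xD2800000u   // MOVZ Xd, #imm16, LSL #(16*hw)
                                  : 0xF2800000u;  // MOVK Xd, #imm16, LSL #(16*hw)
    code.push_back(opcode | (hw << 21) | (imm16 << 5) | (rd & 0x1Fu));
    first = false;
  }
}
```

A pointer-sized constant takes at most four instructions this way, which is the cost of not relying on the distance between the code and its target.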
These `MOV`->`DUP` splats can just be a singular `MOVI` instruction
Byte-sized constants can utilize the `MOVI` instructions. This makes many cases such as zero-splats much faster since this encodes as just a register-rename (similar to `xor` on x64).
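
A sketch of the kind of check that decides whether a 128-bit constant qualifies for a byte-wise `MOVI` (`AsByteSplat` is a hypothetical name, not the PR's function): the constant is a candidate when all 16 bytes are identical.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Returns the repeated byte if every byte of the constant is the same,
// otherwise nullopt (the constant then needs a different load strategy).
inline std::optional<uint8_t> AsByteSplat(
    const std::array<uint8_t, 16>& constant) {
  for (uint8_t byte : constant) {
    if (byte != constant[0]) return std::nullopt;
  }
  return constant[0];
}
```

An all-zero vector returns 0 here, which is the zero-splat case mentioned above.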
Moves the `FMOV` constant functions into `a64_util` so it is available to other translation units. Optimize constant-splats with conditional use of `MOVI` and `FMOV`.
The last `FADDP` writes into an `S` register, which automatically masks all the other lanes to zero.
The `SUB` instruction can only encode a 12-bit immediate, optionally shifted left by 12 (that is, values that fit within `0xFFF` or `0xFFF000`). In the case that the stack size is greater than `0xFFF`, just align the stack size up to `0x1000` to keep the bottom 12 bits clear.
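
The rounding rule can be sketched as follows. `IsAddSubImmediate` and `AdjustStackSize` are hypothetical names for this example, and it assumes stack sizes never exceed `0xFFF000`.

```cpp
#include <cstdint>

// True if v fits an ADD/SUB immediate: 12 bits, optionally shifted left by 12.
constexpr bool IsAddSubImmediate(uint64_t v) {
  return (v & ~0xFFFull) == 0 || (v & ~0xFFF000ull) == 0;
}

// Sizes that already fit in 12 bits are kept; larger sizes are rounded up to
// a multiple of 0x1000 so the low 12 bits are clear and the shifted form fits.
constexpr uint64_t AdjustStackSize(uint64_t size) {
  return size > 0xFFFull ? (size + 0xFFFull) & ~0xFFFull : size;
}
```

For example, a frame of `0x1001` bytes is not encodable as-is and becomes `0x2000`, at the cost of some unused stack.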
@talynone

Any progress on this possible?

@Wunkolo
Contributor Author

Wunkolo commented Nov 27, 2024

At this point this is pretty much ready for review and merging, but it depends on #2258 and xenia-project/FFmpeg#8 being merged and the submodules being updated in this repo and maybe some additional testing with more games. Though, this repo is somewhat inactive these days it seems. The last PR was merged several months ago.

@franklypaladin

> Though, this repo is somewhat inactive these days it seems. The last PR was merged several months ago.

You should put this PR on xenia-canary; it's very much active.

@A1eNaz

A1eNaz commented Feb 23, 2025

@Wunkolo any plans of starting a PR on https://github.com/xenia-canary/xenia-canary ?

@Wunkolo
Contributor Author

Wunkolo commented Feb 24, 2025

> @Wunkolo any plans of starting a PR on https://github.com/xenia-canary/xenia-canary ?

This is gonna sound really weird and sudden but for reasons I cannot really elaborate on right now, someone else will likely have to make the PR on canary. I kinda wish I did that from the start now 😓.
I give anyone permission to use this code and its related changes so long as it has some attribution. Maybe there will be a time where I can elaborate why but I don't think I can right now.

@mrdc

mrdc commented Jun 6, 2025

@Wunkolo Hi! I'm trying to merge your ARM64 PRs into Xenia-canary and it doesn't look bad at the moment: the build fails, but the main issues are in memory.cc:

D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(96,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(97,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(98,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(99,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(168,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(169,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(173,5): error C3861: '_movdir64b': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(199,7): error C2653: 'amd64': is not a class or namespace name [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(199,14): error C3861: 'GetFeatureFlags': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(199,34): error C2653: 'amd64': is not a class or namespace name [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(199,41): error C2065: 'kX64EmitMovdir64M': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(202,14): error C2653: 'amd64': is not a class or namespace name [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(202,21): error C3861: 'GetFeatureFlags': identifier not found [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(202,41): error C2653: 'amd64': is not a class or namespace name [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(202,48): error C2065: 'kX64FastRepMovs': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
mutex.cc
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(476,32): error C2131: expression did not evaluate to a constant [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(477,7): a non-constant (sub-)expression was encountered
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(510,32): error C2131: expression did not evaluate to a constant [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(511,7): a non-constant (sub-)expression was encountered
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(542,32): error C2131: expression did not evaluate to a constant [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\memory.cc(543,7): a non-constant (sub-)expression was encountered

and platform_amd64.cc:

D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(53,5): error C2065: 'kX64EmitAVX2': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(54,5): error C2065: 'kX64EmitFMA': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(55,5): error C2065: 'kX64EmitLZCNT': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(56,5): error C2065: 'kX64EmitBMI1': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(57,5): error C2065: 'kX64EmitBMI2': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(58,5): error C2065: 'kX64EmitMovbe': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(59,5): error C2065: 'kX64EmitGFNI': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(60,5): error C2065: 'kX64EmitAVX512F': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(61,5): error C2065: 'kX64EmitAVX512VL': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(62,5): error C2065: 'kX64EmitAVX512BW': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(63,5): error C2065: 'kX64EmitAVX512DQ': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(64,5): error C2065: 'kX64EmitAVX512VBMI': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(65,5): error C2065: 'kX64EmitPrefetchW': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(79,25): error C2065: 'kX64EmitPrefetchW': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(82,40): error C2065: 'kX64EmitLZCNT': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(82,58): error C2065: 'kX64EmitLZCNT': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(83,27): error C2065: 'kX64EmitLZCNT': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(88,40): error C2065: 'kX64EmitFMA4': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(88,57): error C2065: 'kX64EmitFMA4': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(89,27): error C2065: 'kX64EmitFMA4': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(93,40): error C2065: 'kX64EmitTBM': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(93,56): error C2065: 'kX64EmitTBM': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(94,27): error C2065: 'kX64EmitTBM': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(98,40): error C2065: 'kX64EmitXOP': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(98,56): error C2065: 'kX64EmitXOP': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(99,27): error C2065: 'kX64EmitXOP': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(109,25): error C2065: 'kX64FastJrcx': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(110,25): error C2065: 'kX64FastLoop': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(114,27): error C2065: 'kX64FlagsIndependentVars': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(124,38): error C2065: 'kX64EmitMovdir64M': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(125,25): error C2065: 'kX64EmitMovdir64M': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(127,62): error C2065: 'kX64FastRepMovs': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]
D:\a\xenia-canary-arm64\xenia-canary-arm64\src\xenia\base\platform_amd64.cc(128,25): error C2065: 'kX64FastRepMovs': undeclared identifier [D:\a\xenia-canary-arm64\xenia-canary-arm64\build\xenia-base.vcxproj]

The build fails because the x64 code is excluded when compiling for ARM64, while other code still depends on it.
The rest I've merged and things look good :D
The main reason for this is to make Xenia one step closer to macOS ARM64.
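
One common way to resolve this class of error is to keep x64-only code behind an architecture guard and have callers go through a portable entry point. A minimal sketch, where `EXAMPLE_ARCH_AMD64`, `CopyBlockX64` and `CopyBlock` are hypothetical names for this example and not identifiers from Xenia or Canary:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical macro for this example; a real codebase would use its own
// architecture macro.
#if defined(_M_X64) || defined(__x86_64__)
#define EXAMPLE_ARCH_AMD64 1
#else
#define EXAMPLE_ARCH_AMD64 0
#endif

#if EXAMPLE_ARCH_AMD64
// x64-only helper. On a real x64 build this is where intrinsics such as
// _movdir64b could be used; plain memcpy stands in for them here.
inline void CopyBlockX64(void* dst, const void* src, std::size_t size) {
  std::memcpy(dst, src, size);
}
#endif

// Portable entry point: callers never name the x64 helper directly, so the
// file still compiles when the x64 code is excluded.
inline void CopyBlock(void* dst, const void* src, std::size_t size) {
#if EXAMPLE_ARCH_AMD64
  CopyBlockX64(dst, src, size);
#else
  std::memcpy(dst, src, size);
#endif
}
```

The same pattern would apply to the feature-flag checks: the `kX64...` constants are only referenced inside the guarded region, and the ARM64 build takes the fallback branch.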

rtissera added a commit to rtissera/xenia-canary that referenced this pull request Jun 20, 2025
@ashumish-QCOM

Hi team,

Thanks for all the great work on ARM64 support!

I’ve been following both this PR and #2259 closely as I’m interested in getting Xenia running on a Snapdragon-based Windows ARM64 device. I haven’t tested the build yet, but I’m tracking progress and would love to try it out once it’s closer to integration or available in Canary.

Is there any recent update or plan to port these changes to the Canary branch?

Appreciate all the effort; happy to help however I can.

Regards,
Ashutosh


6 participants