@alexsusu alexsusu commented Oct 28, 2025

Connex is an established, almost 30-year-old family of scalable research vector processors (see, for example, http://users.dcae.pub.ro/~gstefan/2ndLevel/connex.html) with between 32 and 4096 lanes; the lane count is easily changed at synthesis time.
A very interesting feature is that the Connex family of processors has a local banked vector memory (each lane has its own local memory), which achieves 1-cycle latency for direct and indirect loads and stores. Because every lane accesses its own bank in parallel, the aggregate memory bandwidth is very high.
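
As a rough functional illustration of the banked vector memory described above, the sketch below models each lane as owning a private bank and shows the difference between a direct load (all lanes read the same row) and an indirect load (each lane supplies its own row index). The names `BankedMemory`, `kLanes` and `kRows`, and the tiny sizes, are hypothetical and chosen only for this example; the sketch does not model timing or the actual Connex-S ISA.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes for illustration only (real Connex has 32 to 4096 lanes).
constexpr std::size_t kLanes = 4;
constexpr std::size_t kRows = 8;

using Vec = std::array<int16_t, kLanes>;  // one 16-bit element per lane

// Functional model: bank[lane][row] is the private memory of one lane.
struct BankedMemory {
  std::array<std::array<int16_t, kRows>, kLanes> bank{};

  // Direct store: every lane writes its element to the same row.
  void storeDirect(std::size_t row, const Vec &v) {
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      bank[lane][row % kRows] = v[lane];
  }

  // Direct load: every lane reads the same row of its own bank.
  Vec loadDirect(std::size_t row) const {
    Vec out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      out[lane] = bank[lane][row % kRows];
    return out;
  }

  // Indirect load: each lane reads the row given by its own index element.
  Vec loadIndirect(const Vec &rowPerLane) const {
    Vec out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      std::size_t row = static_cast<uint16_t>(rowPerLane[lane]) % kRows;
      out[lane] = bank[lane][row];
    }
    return out;
  }
};
```

In this model no lane ever touches another lane's bank, which is the property that lets all lanes access memory in the same cycle.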

The Connex-S vector processor from the Connex family has 16-bit signed integer Execution Units in each lane, making it low-power. It efficiently emulates (by inlining the emulation subroutines in the instruction selection pass) 32-bit int and IEEE 754-2008 compliant 16-bit floating point (Clang type _Float16, C for ARM __fp16, LLVM IR half type), and, in part, IEEE 754 compliant 32-bit floating point. The emulation subroutines are in the lib/Target/Connex/Select*_OpincaaCodeGen.h files, which are included in the ConnexISelDAGToDAG.cpp module, in the ConnexDAGToDAGISel::Select() method. These emulation subroutines can be easily adjusted, for example to increase performance by sacrificing f16 accuracy; drop me an email to ask how to do it.
The Connex-S vector processor does not currently support 64-bit floating point or 64-bit integer types.
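
To give a feel for what emulating a wider integer type on 16-bit units involves, here is a minimal sketch of a 32-bit addition built from two 16-bit additions and a carry. It illustrates the general technique only and is not the code in the Select*_OpincaaCodeGen.h files; the names `Pair16`, `split32`, `join32` and `add32_via16` are hypothetical, and unsigned halves are used here to keep the C++ well defined.

```cpp
#include <cstdint>

// Hypothetical representation: a 32-bit value held as two 16-bit halves.
struct Pair16 {
  uint16_t lo;
  uint16_t hi;
};

Pair16 split32(uint32_t v) {
  return {static_cast<uint16_t>(v & 0xFFFFu), static_cast<uint16_t>(v >> 16)};
}

uint32_t join32(Pair16 p) {
  return (static_cast<uint32_t>(p.hi) << 16) | p.lo;
}

// 32-bit add using only 16-bit results: add the low halves, detect the
// carry-out, then add the high halves plus that carry.
Pair16 add32_via16(Pair16 a, Pair16 b) {
  uint16_t lo = static_cast<uint16_t>(a.lo + b.lo);  // wraps modulo 2^16
  uint16_t carry = (lo < a.lo) ? 1 : 0;              // wrapped => carry-out
  uint16_t hi = static_cast<uint16_t>(a.hi + b.hi + carry);
  return {lo, hi};
}
```

For example, `join32(add32_via16(split32(0x0001FFFF), split32(1)))` yields `0x00020000`: the low halves wrap to zero and the carry propagates into the high half.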

The back end targets only the Connex-S processor, used as an accelerator. The working compiler is described at https://dl.acm.org/doi/10.1145/3406536 and at https://sites.google.com/site/connextools/ .

Note that currently our back end targets only our Connex-S OPINCAA assembler (very easy to learn and use) available at https://github.com/alexsusu/opincaa .
The Connex-S OPINCAA assembler allows running code that is agnostic to both the Connex-S vector length and the host (CPU).

The ISA of the Connex-S vector processor is available in ConnexISA.pdf (you can find it at https://sites.google.com/site/connextools/).
The Connex-S vector processor also has an open source C++ simulator that comes together with the OPINCAA assembler.
The mailing list for the Connex-S processor and tools is: https://groups.google.com/forum/#!forum/connex-tools .

An interesting feature is that, in order to support recovering the original source (C) code from the instruction selection pass' SelectionDAG, we require adding a simple data structure in include/llvm/CodeGen/SelectionDAG.h (and helper methods in related files) that associates each SDValue with the LLVM IR Value object it was translated from:
DenseMap<const Value*, SDValue> *crtNodeMapPtr
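
A simplified, LLVM-free sketch of how such a map could be filled and then queried in the reverse direction is shown below. Everything in it is a stand-in for illustration: `Value` and `SDValue` are toy structs rather than the LLVM classes, `std::unordered_map` replaces `DenseMap`, and `NodeMap`, `record` and `findSource` are hypothetical names, not the helper methods added by this patch.

```cpp
#include <string>
#include <unordered_map>

// Toy stand-ins for llvm::Value and llvm::SDValue (assumptions for this sketch).
struct Value {
  std::string name;
};
struct SDValue {
  int nodeId;
};

// Keyed the same way as crtNodeMapPtr: IR value -> DAG node.
class NodeMap {
  std::unordered_map<const Value *, SDValue> map_;

public:
  // Called while the DAG is built: remember which node an IR value produced.
  void record(const Value *v, SDValue n) { map_[v] = n; }

  // Reverse lookup used when going from a DAG node back toward the source:
  // scan for the IR value whose recorded node matches.
  const Value *findSource(SDValue n) const {
    for (const auto &kv : map_)
      if (kv.second.nodeId == n.nodeId)
        return kv.first;
    return nullptr;  // node was not created directly from an IR value
  }
};
```

The linear scan keeps the sketch short; a second map keyed by node would be the obvious choice if reverse lookups were frequent.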

The Connex-S back end is 5 years old. We published 2 academic papers on it at ACM TECS and a CGO workshop (https://dl.acm.org/citation.cfm?id=3306166) . However, we are still adding features to the back end.

Small note: the Connex back end is rather small and builds fast (in ~3-5 mins, single-threaded on a decent machine in 2019).

An important point: I think the test/MC/Connex folder should not be populated for this patch, because the Connex back end can generate only assembly code, which must then be consumed by the special OPINCAA assembler, and that assembler is not integrated in LLVM. Other back ends do a similar thing, for example the NVPTX back end, which doesn't support object file generation. The Connex back end doesn't support object file generation either.
The eBPF+Connex-S processor has the same ABI as the eBPF processor it extends, except that Connex-S natively supports only 16-bit integers and can access the banked vector memory only by line (so Connex-S can't perform unaligned accesses).

The Connex processor is currently implemented in FPGA, but has also been implemented in silicon:

an older version for HDTV: Gheorghe M. Stefan, "The CA1024: A Massively Parallel Processor for Cost-Effective HDTV", 2006 (http://users.dcae.pub.ro/~gstefan/2ndLevel/images/connex_v4.ppt)
M. Malita and Gheorghe M. Stefan, "Map-scan Node Accelerator for Big-data"
Gheorghe M. Stefan and Mihaela Malita, "Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation"

Committing first, as a separate patch, the changes to llvm/CMakeLists.txt, llvm/include/llvm/TargetParser/Triple.h and llvm/unittests/TargetParser/TripleTest.cpp.

@github-actions
Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.
