Connex is an established, almost 30-year-old family of scalable research vector processors (see, for example, http://users.dcae.pub.ro/~gstefan/2ndLevel/connex.html), with a number of lanes between 32 and 4096 that is easily changeable at synthesis time.
A very interesting feature is that the Connex family of processors has a banked local vector memory (each lane has its own local memory) that achieves 1-cycle latency for both direct and indirect loads and stores, which gives the processor very high memory bandwidth.
The Connex-S vector processor from the Connex family has 16-bit signed integer Execution Units in each lane, making it low-power. It efficiently emulates (by inlining the emulation subroutines in the instruction selection pass) 32-bit integers and IEEE 754-2008 compliant 16-bit floating point (Clang type _Float16, C for ARM __fp16, LLVM IR half type), and, in part, IEEE 754 compliant 32-bit floating point. The emulation subroutines are in the lib/Target/Connex/Select*_OpincaaCodeGen.h files, which are included in the ConnexISelDAGToDAG.cpp module, inside the ConnexDAGToDAGISel::Select() method. These emulation subroutines can easily be adjusted, for example to increase performance by sacrificing f16 accuracy - drop me an email if you want to know how to do it.
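To give an idea of how this inlining looks, here is a minimal sketch only; the dispatch structure and the SelectFAdd16_OpincaaCodeGen.h file name are illustrative assumptions, not the actual patch contents:

```cpp
// Illustrative sketch, not the actual backend sources: the generated emulation
// code is textually included inside the instruction-selection entry point.
void ConnexDAGToDAGISel::Select(SDNode *Node) {
  switch (Node->getOpcode()) {
  case ISD::FADD:
    if (Node->getSimpleValueType(0) == MVT::f16) {
      // Expand the f16 add into the sequence of 16-bit integer Connex-S
      // instructions produced offline with the OPINCAA code generator.
      #include "SelectFAdd16_OpincaaCodeGen.h" // hypothetical file name
      return;
    }
    break;
  // ... other emulated operations (i32 arithmetic, f32, ...) handled similarly
  default:
    break;
  }
  // Fall back to the TableGen-generated selector for natively supported ops.
  SelectCode(Node);
}
```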
The Connex-S vector processor does not currently support 64-bit floating-point or 64-bit integer types.
The back end targets only the Connex-S processor, used as an accelerator. The working compiler is described at https://dl.acm.org/doi/10.1145/3406536 and at https://sites.google.com/site/connextools/ .
Note that currently our back end targets only our Connex-S OPINCAA assembler (very easy to learn and use), available at https://github.com/alexsusu/opincaa .
The Connex-S OPINCAA assembler allows running code that is agnostic of the Connex-S vector length and of the host (CPU).
The ISA of the Connex-S vector processor is available in ConnexISA.pdf (you can find it at https://sites.google.com/site/connextools/).
The Connex-S vector processor also has an open-source C++ simulator, which comes together with the OPINCAA assembler.
The mailing list for the Connex-S processor and tools is: https://groups.google.com/forum/#!forum/connex-tools .
An interesting feature is that, in order to support recovering from the Instruction Selection pass' SelectionDAG back to the original source (C) code, we add a simple data structure in include/llvm/CodeGen/SelectionDAG.h (and helper methods in related files) that records, for each LLVM IR Value, the SDValue it was translated into:
```cpp
DenseMap<const Value *, SDValue> *crtNodeMapPtr;
```
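For illustration only, here is a sketch of how such a map could be populated; it assumes crtNodeMapPtr is reachable through the SelectionDAG object and that the hook is placed in SelectionDAGBuilder::setValue(), the point where an IR Value gets associated with the SDValue built for it. The real patch may hook this differently.

```cpp
// Hypothetical sketch: record the IR Value -> SDValue correspondence while
// SelectionDAGBuilder lowers the IR, so later passes can map SelectionDAG
// nodes back to the original source-level values.
void SelectionDAGBuilder::setValue(const Value *V, SDValue NewN) {
  SDValue &N = NodeMap[V];
  assert(!N.getNode() && "Already set a value for this node!");
  N = NewN;

  if (DAG.crtNodeMapPtr)            // only when a client asked for the mapping
    (*DAG.crtNodeMapPtr)[V] = NewN; // e.g. the Connex-S back end
}
```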
The Connex-S back end is 5 years old. We have published two academic papers on it, in ACM TECS and at a CGO workshop (https://dl.acm.org/citation.cfm?id=3306166). However, we are still adding features to the back end.
Small note: the Connex back end is rather small and builds fast (in roughly 3-5 minutes, single-threaded, on a decent machine in 2019).
One important point: I think the test/MC/Connex folder should not be populated for this patch, because the Connex back end generates only assembly code meant to be consumed by the external OPINCAA assembler, which is not integrated into LLVM. Other back ends do something similar, for example NVPTX, which does not support object file generation; the Connex back end does not support object file generation either.
The eBPF+Connex-S processor has the same ABI as the eBPF processor it extends, except that Connex-S natively supports only 16-bit integers and can access the banked vector memory only one line at a time (so Connex-S cannot perform unaligned accesses).
The Connex processor is currently implemented on FPGA, but it has also been implemented in silicon:
- an older version for HDTV: Gheorghe M. Stefan, "The CA1024: A Massively Parallel Processor for Cost-Effective HDTV", 2006 (http://users.dcae.pub.ro/~gstefan/2ndLevel/images/connex_v4.ppt)
- Mihaela Malita and Gheorghe M. Stefan, "Map-scan Node Accelerator for Big-data"
- Gheorghe M. Stefan and Mihaela Malita, "Can One-Chip Parallel Computing Be Liberated From Ad Hoc Solutions? A Computation Model Based Approach and Its Implementation"
Committing first, as a separate patch, the changes to llvm/CMakeLists.txt, llvm/include/llvm/TargetParser/Triple.h, and llvm/unittests/TargetParser/TripleTest.cpp.