Name	Name	Last commit message	Last commit date
Latest commit History 17 Commits
src	src
.gitignore	.gitignore
CMakeLists.txt	CMakeLists.txt
LICENSE	LICENSE
README.md	README.md
fx.bat	fx.bat

Introduction

This is a C++ port of Phil Karn's Viterbi decoder which can be found here.

Modifications include:

Runtime definable constraint length and code rate
Runtime definable initial state values and renormalisation threshold
Reusability of branch table for better memory usage
Intrinsic support for arbitary constraint lengths if they meet the alignment requirements

NOTE: The provided implementation is not aiming to be the fastest there is. It is designed to be relatively portable and easy to add into a C++ project. Changes are encouraged if your usage requirements demand more performance.

Intrinsics support

The following intrinsic implementations exist:

16bit error metrics and soft decision values
8bit error metrics and soft decision values

Type	Requirements	Speedup
Scalar	K >= 2	1x
SSE - 16bit	K >= 5	8x
AVX - 16bit	K >= 6	16x
SSE - 8bit	K >= 6	16x
AVX - 8bit	K >= 7	32x

Further support

Using the 16bit and 8bit based intrinsics implementations as a guide you can make your own intrinsics implementation that uses a different level of quantization.

You can port the x86 intrinsics implementation to ARM processors since the vector instructions are similiar to NEON.

Custom modifications

The library uses helper classes which can be replaced with your changes. These include:

Parity check table
Aligned malloc wrapper
Basic functions like min/max/clamp

Benchmarks

Benchmarks were run using run_benchmarks.cpp. The input length was adjusted so that the scalar code would take a few seconds. The benchmark aims to measure the relative speedup using SSE and AVX vectorised intrinsics code.

Values are excluded from the table if the convolutional code cannot be vectorised with intrinsics.

The 16bit and 8bit scalar decoders take similar amounts of time. This means we can compare the speedup from using 8bit and 16bit vectorisations.

Setup

./run_benchmark.exe -c <id> -M <mode> -L <input_length> -T <total_runs>
Benchmark was executed on an Intel i5-7200U connected to battery power and kept at 3.1GHz.
16bit performance was measured with soft decision decoding
8bit performance was measured with hard decision decoding
The code that is executed in the same with soft and hard decision decoding. Only the soft decision values and renormalisation threshold are different.
No noise was added to the encoded symbols since it is difficult to apply the same amount of noise to a soft decision value and a hard decision value.
Reasons why a comparison between 16bit and 8bit soft decision decoding is difficult:
- Renormalisation threshold varies depending on choice of soft decision values, which makes it usage dependent.
- Difficult to apply the same dB of noise to a 16bit and 8bit soft value due to extreme quantisation.

16bit performance

ID	Name	K	R	SSE	AVX	Input size	Total runs
				8	16
0	Generic	3	2	----	----	----	----
1	Generic	5	2	6.5	----	8192	1000
2	Voyager	7	2	9.5	13.4	4096	1000
3	LTE	7	3	9.6	14.8	4096	1000
4	DAB Radio	7	4	9.3	15.6	2048	1000
5	CDMA IS-95A	9	2	11.1	18.5	2048	1000
6	CDMA 2000	9	4	10.6	18.7	2048	1000
7	Cassini	15	6	11.3	19.4	256	100

8bit performance

ID	Name	K	R	SSE	AVX	Input size	Total runs
				16	32
0	Generic	3	2	----	----	----	----
1	Generic	5	2	----	----	8192	1000
2	Voyager	7	2	15.4	21.0	4096	1000
3	LTE	7	3	16.1	21.9	4096	1000
4	DAB Radio	7	4	16.4	22.6	4096	1000
5	CDMA IS-95A	9	2	19.2	30.2	2048	1000
6	CDMA 2000	9	4	20.1	29.0	2048	1000
7	Cassini	15	6	20.2	31.3	256	100

Analysis of performance

Both the 16bit and 8bit codes perform better when the contraint length increases
The 8bit AVX code doesn't scale as well as the 16bit AVX code relative to its SSE counterpart.
Even with the substandard scaling of 8bit AVX it is still a major speedup.
Running this code on newer CPUs indicates greater performance gains when using SSE or AVX vectorisation for both 8bit and 16bit.

Additional notes

The implementations check if the parameters are valid using assert statements. This is for performance reasons as you usually use known parameters. You will need to add your own runtime checks if you are compiling without asserts.
The implementations are not ideal for performance since they make some tradeoffs. These include:
- Using unsigned integer types for error metrics for larger range of error values after renormalisation.
- Using saturated arithmetic in vectorised code to prevent unwanted overflows/underflows. This incurs up to a 33% speed penalty due to CPI increasing from 0.33 to 0.5 when moving from modular arithmetic.
- Keeping track of absolute error by accumulating bias adjustments when renormalising.
- Dynamic allocation for branch table and error metrics using runtime parameters.
Unsigned 8bit error metrics have severe limitations. These include:
- Limited range of soft decision values to avoid overflowing past the renormalisation threshold.
- Higher code rates (such as Cassini) will quickly reach the renormalisation threshold and deteriorate in accuracy. This is because the maximum error for each branch is a multiple of the code rate.
If you are only interested in hard decision decoding then using 8bit error metrics is the better choice
- Use hard decision values [-1,+1] of type int8_t
- Due to the small maximum branch error of 2*R we can set the renormalisation threshold to be quite high.
- renormalisation_threshold = UINT8_MAX - (2*R*10)
- This gives similar levels of accuracy to 16bit soft decision decoding but with up to 2x performance due to the usage of 8bit values.
Depending on your usage requirements changes to the library are absolutely encouraged
Additionally check out Phil Karn's fine tuned assembly code here for the best possible performance
This code is not heavily tested and your mileage may vary. This was written for personal usage.

Example code

Name	Description
run_benchmark	Runs benchmark to compare performance between vectorisations
run_decoder	Runs any arbitary decoder
run_punctured_decoder	Implementation of DAB radio punctured decoding

Useful alternatives

Spiral project aims to auto-generate high performance code for any input parameters
ka9q/libfec is Phil Karn's original C implementation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Introduction

Intrinsics support

Further support

Custom modifications

Benchmarks

Setup

16bit performance

8bit performance

Analysis of performance

Additional notes

Example code

Useful alternatives

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

williamyang98/ViterbiDecoderCpp

Folders and files

Latest commit

History

Repository files navigation

Introduction

Intrinsics support

Further support

Custom modifications

Benchmarks

Setup

16bit performance

8bit performance

Analysis of performance

Additional notes

Example code

Useful alternatives

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages