This is a proof-of-concept binary search implementation that entirely avoids branching, i.e. if-then-else. It accomplishes this with a mix of bit twiddling, jump tables, and access to the x86 BSR instruction through a compiler builtin.
Branch mispredictions can be expensive, especially on architectures with long pipelines. This approach avoids branching entirely, at the expense of code size and more overall instructions.
Additionally, because the same code is run regardless of input, this approach can be extended to have SIMD/vector support, parallelizing searches across the same (or different) data.
At a high level, the search works as follows:
- compute
log2(length) switchon the result- Taking advantage of case fall-through, run a modified binary search update
log2(length)times. - Interpret the values of the search window to get the result
This is an optimization. The can work correctly by running log2(sizeof(length)) (e.g. 32, 64) times.
A standard binary search implicitly computes this empirically, essentially dividing length by 2 until it's 0. Because this implementation uses a jump table to avoid a loop condition, it needs to be known before the search.
unsigned int bits = 64 - __builtin_clzl(size);This is functionally the same as the loop in an iterative binary search implementation—it runs update_search_window log2(length) times.
Without bits % 64, the compiler adds a check to make sure there's a case for bits.
switch (bits % 64) { case 63: update_search_window(array, &start, &end, key); case 62: update_search_window(array, &start, &end, key); case 61: update_search_window(array, &start, &end, key); ...For the most part, the update function looks similar to a standard binary search, but instead of returning when the result is found, the window remains the same allowing the function to be run again.
The second difference is more in implementation than function. Rather than using an if-then-else construct to updated start or end, both are updated regardless, with bit masks used to select the next value.
__attribute__((always_inline)) static inline void update_search_window( const int32_t *array, ptrdiff_t *start, ptrdiff_t *end, const int32_t key) { ptrdiff_t median = (*start + *end) / 2; int32_t value = array[median]; uintptr_t start_mask = -(key < value); uintptr_t end_mask = -(key > value); *start = (median & ~start_mask) | (*start & start_mask); *end = ((median & ~end_mask) | (*end & end_mask)) + (~start_mask & ~end_mask & 1); }Like a recursive binary search's base case, the result is determined by comparing the key being looked up the current window. Like in the window update function, bitmasks are used to avoid branching.
Unlike bsearch(), but like Java's Arrays.binarySearch(), this returns -insertion_point - 1 when key isn't found.
int is_match_mask = -(array[start] == key); int is_after_start = -(array[start] < key); int is_before_end = -(array[end] > key); return (is_empty_mask & -1) | (is_non_empty_mask & ( (is_match_mask & start) | (~is_match_mask & ( (is_after_start & (-end - 1)) | (~is_after_start & (is_before_end & (-start - 1)))))));