Beating Floating Point at its Own Game: Posit Arithmetic
John L. Gustafson
Visiting Scientist at A*STAR and Professor at National University of Singapore
The “Memory Wall”
•  64-bit floating-point multiply-add: 0.2 nanojoules
•  Read 64 bits from memory (DRAM): 12 nanojoules
The issue is communication, not computation: run time, parts cost, electric power.
What if we could cut communication in half by doubling information per bit?
Decades of asking “How do you know your answer is correct?”
•  (Laughter) “What do you mean?”
•  “Well, we used double precision.”
•  “Oh. We always get that answer.”
•  “Other people get that answer.”
Precision versus Accuracy
[Figure: two images of the same subject, one rendered with 150,000 pixels and one with 432 pixels.]
Metrics for Number Systems
•  Relative Error = |(correct x – computed x) / correct x|
•  Dynamic range = log10(maxreal / minreal) decades
•  Decimal Accuracy = –log10(Relative Error)
•  Percentage of operations that are exact (closure under + – × ÷ √ etc.)
•  Average accuracy loss when inexact
•  Entropy per bit (maximize information)
•  Accuracy benchmarks: simple formulas, linear equation solving, math kernels…
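For concreteness, here are the first two metrics in a few lines of Python. This is a sketch written for this transcript; the function names are ours, for illustration only.

```python
import math

def decimal_accuracy(correct: float, computed: float) -> float:
    """Decimals of accuracy: -log10 of the relative error."""
    rel_err = abs((correct - computed) / correct)
    return math.inf if rel_err == 0 else -math.log10(rel_err)

def dynamic_range(maxreal: float, minreal: float) -> float:
    """Dynamic range in decades."""
    return math.log10(maxreal / minreal)

# 3.14159 as an approximation to pi is good to about 6.1 decimals
print(decimal_accuracy(math.pi, 3.14159))
```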
COMPUTERS  THEN
COMPUTERS  NOW
ARITHMETIC  THEN
ARITHMETIC  NOW
Analogy: Printing Technology
[Figure: printing a page, 1970: 30 sec; 2017: 30 sec.]
Challenges for the Existing Arithmetic
•  No guarantee of repeatable or portable behavior (!)
•  Insufficient 32-bit accuracy forces wasteful use of 64-bit types
•  Fails to obey the laws of algebra (associative, distributive)
•  Poor handling of overflow, underflow, and Not-a-Number results
•  Dynamic ranges are too large, stealing accuracy needed for workloads
•  Rounding errors are invisible, hazardous, and costly to debug
•  Computations are unstable when parallelized
IEEE Standard Floats are a storage-inefficient, 1980s-era design.
Why worry about floating point?
Find the scalar product a · b:
a = (3.2e8, 1, –1, 8.0e7)
b = (4.0e7, 1, –1, –1.6e8)
Single precision, 32 bits: a · b = 0
Double precision, 64 bits: a · b = 0
Correct answer: a · b = 2
Note: all values are integers that can be expressed exactly in the IEEE 754 Standard floating-point format (single or double precision).
Most linear algebra is unstable with floats!
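This failure reproduces on any IEEE 754 machine. A NumPy sketch; the exact-integer check at the end is our addition, not part of the slide:

```python
import numpy as np

a = np.array([3.2e8, 1.0, -1.0, 8.0e7])
b = np.array([4.0e7, 1.0, -1.0, -1.6e8])

print(np.dot(a.astype(np.float32), b.astype(np.float32)))  # 0.0 in 32 bits
print(np.dot(a, b))                                        # 0.0 in 64 bits, too
# exact integer arithmetic recovers the correct answer
print(sum(int(x) * int(y) for x, y in zip(a, b)))          # 2
```

The huge products ±1.28e16 swallow the ±1 terms before cancelling, so the +2 is lost at either precision.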
What’s wrong with IEEE 754? A start:
•  No guarantee of identical results across systems
•  It’s a guideline, not a standard
•  Breaks the laws of algebra: a + (b + c) ≠ (a + b) + c and a·(b + c) ≠ a·b + a·c
•  Overflow to infinity creates infinite relative error
IEEE floats are weapons of math destruction.
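The associativity failure is a one-liner; the values below are our own illustration:

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0: rounding swallows the 1.0, so grouping changes the answer
```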
What else is wrong with IEEE 754?
•  Exponents usually take too many bits
•  Accuracy is flat across a vast range, then falls off a cliff
•  Subnormal numbers are a headache (“gradual underflow”)
•  Divides are messy and slow
•  Wasted bit patterns: “negative zero,” too many NaN values
Do we really need 9,007,199,254,740,990 numbers to indicate something is Not a Number?
Floats: Designed by a 1980s Committee
[Cartoon: the sign, exponent, and mantissa fields of an IEEE float, surrounded by committee speech bubbles:]
“The ‘hidden bit’ is always 1… unless it’s not. Gradual underflow!”
“Sign-magnitude integer? 2’s complement? 1’s complement? Aw heck, let’s just say you subtract 126.”
“But subtract 127 if it’s a gradual underflow case!”
“Wait, this makes comparing two floats really complicated. It’s not like comparing integers!”
“If a result is Not a Number, we’ll make it a number. That’s logical, right? Call it ‘NaN’. Let’s have millions of NaN values instead of numerical values.”
“I know. Let’s say that the square root of ‘negative zero’ is also negative zero!”
“The sign bit doesn’t apply to zero. Except… sometimes.”
“And the reciprocal of negative zero can be negative infinity! Cool. Except… uh-oh… infinity equals negative infinity?”
“My company wants to use guard bits so we can say our answers are better. The Standard better allow that!”
“My company has fused multiply-adds. Let’s put that in the standard so everyone else has to redesign to catch up with us.”
“Is that German guy still telling us we need an exact dot product? Get him out of here.”
“If the exponent is all 1s, let’s say that infinity is when the mantissa is all 0s.”
“And if a number gets too big, we’ll just round it to infinity.”
“We can stick lots of flags in the processor. Someday, languages will support them.”
“How should we round? Down? Up? Toward zero? To nearest? I’ve got it. Let’s put all four ways in there. Then we don’t have to pick.”
“Let’s have more exponent bits than we really need. That’ll save transistors.”
“Good idea. Those transistors are so expensive!”
Contrasting Calculation “Esthetics”
•  IEEE Standard (1985) floats, f = n × 2^m (m, n integers). Rounded: cheap, uncertain, “good enough.”
•  Intervals [f1, f2]: all x such that f1 ≤ x ≤ f2. Rigorous: more work, certain, mathematical.
If you mix the two esthetics, you end up satisfying neither.
“I need the hardware to protect me from NaN and overflow in my code.”
“Really? And do you keep debugger mode turned on in your production software?”
Posits use the Projective Reals
•  Posits map reals to standard signed integers.
•  Can be as small as 2 bits and still be useful!
•  This eliminates “negative zero” and other IEEE float issues.
Mapping to the Projective Reals
Example with nbits = 3, es = 1. The value at 45° is always useed, where useed = 2^(2^es).
If the bit string is negative as a 2’s-complement integer, set the sign to – and negate the integer.
Rules for inserting new points
•  Between ±maxpos and ±∞, scale up by useed. (New regime bit)
•  Between 0 and ±minpos, scale down by useed. (New regime bit)
•  Between 2^m and 2^n where n – m ≥ 2, insert 2^((m+n)/2). (New exponent bit)
[Ring diagram for nbits = 4; maxpos grows to useed^2.]
At nbits = 5, fraction bits appear.
•  Between x and y where y ≤ 2x, insert (x + y)/2. (New fraction bit)
•  Existing values stay put as trailing bits are added.
Appending bits increases accuracy east and west, and dynamic range north and south!
[Ring diagram for nbits = 5; maxpos grows to useed^3.]
Posit Arithmetic: Beating floats at their own game
•  Fixed size, nbits.
•  es = exponent size = 0, 1, 2, … bits.
•  es is also the number of times you square 2 to get useed = 2^(2^es): 2, 4, 16, 256, 65536, …
Posit Format Example
Here, es = 3. [Bit-field diagram: sign bit, regime bits, exponent bits, fraction bits; the example pattern decodes to 3.55⋯×10^-6.]
Float-like circuitry is all that is needed (integer add, integer multiply, shifts to scale by 2^k).
Posits do not overflow. There is no NaN. Relative error ≤ 1.
Simpler, faster circuits than IEEE 754.
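To make the decoding rules concrete, here is a minimal Python sketch written for this transcript (decode_posit is our name, not a library function). It follows the sign / regime / exponent / fraction procedure the slides describe, reading truncated exponent bits as zero:

```python
def decode_posit(bits: int, nbits: int, es: int) -> float:
    """Decode an nbits-wide posit bit pattern (a sketch, not a library)."""
    mask = (1 << nbits) - 1
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return float("inf")                  # the single +/-infinity pattern
    sign = -1 if bits >> (nbits - 1) else 1
    if sign < 0:
        bits = -bits & mask                  # 2's-complement negate, then decode
    rest = (bits << 1) & mask                # drop the sign bit
    first = rest >> (nbits - 1)              # leading regime bit
    run = 0
    while run < nbits - 1 and (rest >> (nbits - 1 - run)) & 1 == first:
        run += 1
    k = run - 1 if first else -run           # regime contributes useed**k
    rest = (rest << (run + 1)) & mask        # drop regime bits and terminator
    exp = rest >> (nbits - es) if es else 0  # next es bits (zero if cut off)
    rest = (rest << es) & mask
    frac = 1 + rest / (1 << nbits)           # remaining bits: 1.fff... fraction
    useed = 2 ** (2 ** es)
    return sign * useed ** k * 2 ** exp * frac
```

For example, decode_posit(0b011, nbits=3, es=1) returns 4.0, the useed value at 45° on the 3-bit ring shown earlier.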
What reals should we seek to represent?
•  Studies show very rare use of values outside 10^-13 to 10^13.
•  The Central Limit Theorem says exponents distribute as a bell curve.
[Graph: typical bell-shaped distribution of real values used in computations, on an axis from 10^-40 to 10^40.]
IEEE floats have about 7 decimals of accuracy, flat except on the left
•  This shows 32-bit float accuracy: a dynamic range of 83 decades.
•  For 64-bit floats, the exponent range is even sillier: 631 decades.
•  Is flat accuracy over a huge range really what we need?
[Graph: flat IEEE float accuracy overlaid on the bell-shaped distribution of values, axis 10^-40 to 10^40.]
Posits provide concise tapered accuracy
•  Posits have the same or better accuracy on the vast majority of calculations, yet have greater dynamic range.
•  This is only one major advantage of posits over floats.
[Graph: posit accuracy curve overlaid on the float curve and the value distribution; posit accuracy is equal or superior where most values occur.]
Posits: Designed by Mathematics
[Ring diagram: all 32 values of a 5-bit posit, from 0 through ±1/64, ±1/16, ±1/8, ±1/4, ±3/8, ±1/2, ±3/4, ±1, ±3/2, ±2, ±3, ±4, ±8, ±16, ±64, to ±∞. Diagram annotations: negation symmetry, reciprocal symmetry.]
•  1-to-1 map of binary integers to ordered real numbers
•  Appending bits gives an isomorphic increase in precision and dynamic range, automatically
•  No “negative zero”
•  No bit patterns wasted on “NaN”
•  Simpler circuitry, less chip area
•  No hidden and unused flags
•  More information per bit
•  As reproducible as integer math; no “hidden scratchpad” work
•  Obeys mathematical laws
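As a cross-check, the decoder sketched above reproduces the non-negative half of this ring when nbits = 5 and es = 1, the parameters the ring values imply:

```python
# the 16 bit patterns with sign bit 0, in ascending order
ring = [decode_posit(bits, nbits=5, es=1) for bits in range(16)]
print(ring)
# [0.0, 0.015625, 0.0625, 0.125, 0.25, 0.375, 0.5, 0.75,
#  1.0, 1.5, 2.0, 3.0, 4.0, 8.0, 16.0, 64.0]
# i.e. 0, 1/64, 1/16, 1/8, 1/4, 3/8, 1/2, 3/4, 1, 3/2, 2, 3, 4, 8, 16, 64
```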
Posits vs. Floats: a metrics-based study
•  Compare quarter-precision IEEE-style floats
•  Sign bit, 4 exponent bits, 3 fraction bits
•  smallsubnormal = 1/512; maxfloat = 240
•  Dynamic range of five orders of magnitude
•  Two bit patterns that mean zero
•  Fourteen bit patterns that mean “Not a Number” (NaN)
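Since an 8-bit format has only 256 bit patterns, these claims can be checked exhaustively. A sketch (decode_quarter is our name for the quarter-precision decoder):

```python
import math

def decode_quarter(bits: int) -> float:
    """Decode an 8-bit IEEE-style float: 1 sign, 4 exponent, 3 fraction bits."""
    sign = -1.0 if bits & 0x80 else 1.0
    e = (bits >> 3) & 0xF
    f = bits & 0x7
    if e == 0xF:
        return math.nan if f else sign * math.inf
    if e == 0:
        return sign * f / 8 * 2.0 ** -6         # subnormal: exponent 1 - bias 7
    return sign * (1 + f / 8) * 2.0 ** (e - 7)  # normal, bias 7

values = [decode_quarter(b) for b in range(256)]
finite = [v for v in values if math.isfinite(v) and v != 0]
print(max(finite))                          # 240.0 (maxfloat)
print(min(abs(v) for v in finite))          # 0.001953125 = 1/512 (smallsubnormal)
print(sum(math.isnan(v) for v in values))   # 14 NaN bit patterns
print(sum(v == 0 for v in values))          # 2 zeros (+0 and -0)
```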
Float accuracy tapers only on the left
•  Min: 0.52 decimals
•  Avg: 1.40 decimals
•  Max: 1.55 decimals
Graph shows decimals of accuracy from minfloat to maxfloat.
Posit accuracy tapers on both sides
•  Min: 0.22 decimals
•  Avg: 1.46 decimals
•  Max: 1.86 decimals
Graph shows decimals of accuracy from minpos to maxpos. But posits cover seven orders of magnitude, not five.
Side by side:
•  Posits: Min 0.22, Avg 1.46, Max 1.86 decimals
•  Floats: Min 0.52, Avg 1.40, Max 1.55 decimals
What do these remind you of?
Both graphs at once
[Overlay of the two accuracy graphs, ⇦ Posits and ⇦ Floats, with the region where most calculations occur marked.]
Matching float dynamic ranges
[Table matching posit sizes and es settings to the dynamic ranges of standard float sizes.]
Note: Isaac Yonemoto has shown that 8-bit posits suffice for neural network training, with es = 0.
8-bit posits for fast neural nets
Sigmoid functions take 1 cycle with posits, versus dozens of cycles with float math libraries. (Observation by I. Yonemoto)
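As described in the posit literature, the es = 0 sigmoid shortcut is pure bit manipulation: flip the most significant bit and shift right two places; the result is itself a posit close to 1/(1 + e^(-x)). A sketch, reusing decode_posit from the format slide to show the decoded value:

```python
import math

def fast_sigmoid_posit8(bits: int) -> float:
    """Approximate 1/(1 + exp(-x)) for an 8-bit, es = 0 posit:
    flip the sign bit, then shift right two places."""
    return decode_posit(((bits ^ 0x80) & 0xFF) >> 2, nbits=8, es=0)

x_bits = 0b01000000                 # the 8-bit, es = 0 posit for x = 1.0
print(fast_sigmoid_posit8(x_bits))  # 0.75
print(1 / (1 + math.exp(-1.0)))     # 0.7310..., for comparison
```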
ROUND 1: Unary Operations
1/x, √x, x^2, log2(x), 2^x
Closure under Reciprocation, 1/x
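Reusing decode_quarter from the study slide, closure under 1/x for the 8-bit float format can be counted directly. A sketch; the float64 division is exact whenever the true reciprocal is representable in 8 bits, so set membership is a fair test:

```python
import math

vals = {decode_quarter(b) for b in range(256)}   # from the earlier sketch
finite = [v for v in vals if math.isfinite(v) and v != 0]
exact = sum((1.0 / v) in vals for v in finite)
print(f"{exact} of {len(finite)} nonzero finite values have exact reciprocals")
```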
Closure under Square Root, √x
Closure under Squaring, x^2
Closure under log2(x)
Closure under 2^x
ROUND 2: Two-Argument Operations
x + y, x × y, x ÷ y
Addition Closure Plot: Floats
•  18.533% exact
•  70.190% inexact
•  0.635% overflow
•  10.641% NaN
Inexact results are magenta; the larger the error, the brighter the color.
Addition can overflow, but cannot underflow.
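These percentages come from enumerating all 256 × 256 operand pairs. A rough sketch of that tally, again reusing decode_quarter; we treat finite sums at or beyond 248 (the round-to-nearest overflow threshold for this format) as overflow, and sums involving ±∞ as exact:

```python
import math

quarters = [decode_quarter(b) for b in range(256)]  # from the earlier sketch
vals = set(quarters)
counts = {"exact": 0, "inexact": 0, "overflow": 0, "NaN": 0}
for x in quarters:
    for y in quarters:
        s = x + y                  # float64 adds these operands exactly
        if math.isnan(s):
            counts["NaN"] += 1     # NaN operand, or inf + (-inf)
        elif s in vals:
            counts["exact"] += 1   # representable (or stays at +/-inf)
        elif abs(s) >= 248:
            counts["overflow"] += 1
        else:
            counts["inexact"] += 1
for k, v in counts.items():
    print(f"{k}: {100 * v / 65536:.3f}%")
```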
Addition Closure Plot: Posits
•  25.005% exact
•  74.994% inexact
•  0.000% overflow
•  0.002% NaN
Only one case is a NaN: ±∞ + ±∞. With posits, a NaN interrupts the calculation. (An optional mode uses ±∞ as a quiet NaN.)
All decimal losses, sorted
Multiplication Closure Plot: Floats 22.272% exact 51.233% inexact 3.345% underflow 12.500% overflow 10.651% NaN Floats score their first win: more exact products than posits… but at a terrible cost!
Multiplication Closure Plot: Posits
•  18.002% exact
•  81.995% inexact
•  0.000% underflow
•  0.000% overflow
•  0.003% NaN
Only two cases produce a NaN: ±∞ × 0 and 0 × ±∞.
The sorted losses tell the real story
ROUND 3: Higher-Precision Operations
32-bit formula evaluation
LINPACK solved perfectly with… 16 bits!
Accuracy on a 32-Bit Budget
Compute $\left(\dfrac{27/10 - e}{\pi - (\sqrt{2} + \sqrt{3})}\right)^{67/16} = 302.8827196\ldots$ with ≤ 32 bits per number.

Number Type                Dynamic Range   Answer          Error
IEEE 32-bit float          83 decades      302.912⋯        0.0297⋯
32-bit posits, no fusing   144 decades     302.8823⋯       0.00040⋯
32-bit posits, fused ops   144 decades     302.882713⋯     0.0000063⋯

Posits beat floats at both dynamic range and accuracy.
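We cannot run the posit rows without a posit library, but the float side reproduces readily. A NumPy sketch comparing 32- and 64-bit evaluation (operation order may shift the last digits slightly):

```python
import numpy as np

def formula(f):
    # (27/10 - e) / (pi - (sqrt(2) + sqrt(3))), raised to the 67/16 power
    num = f(27) / f(10) - f(np.e)
    den = f(np.pi) - (np.sqrt(f(2)) + np.sqrt(f(3)))
    return (num / den) ** (f(67) / f(16))

print(formula(np.float32))  # ~302.9: only about 4 good digits from 32-bit floats
print(formula(np.float64))  # 302.8827196..., the reference value
```

The heavy cancellation in both numerator and denominator is what makes this a punishing accuracy test.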
LINPACK: Solve Ax = b, 16-bit posits versus 16-bit floats
•  A is a 100 × 100 dense matrix; random entries A_ij in (0, 1)
•  b chosen so x should be exactly all 1s
•  Classic LINPACK method: LU factorization with partial pivoting; refinement using the residual is allowed

IEEE 16-bit floats: dynamic range 12 decades; maximum error 0.011; decimal accuracy 1.96
16-bit posits: dynamic range 16 decades; maximum error NONE; decimal accuracy ∞

Work funded in part by DARPA under contract BAA 16-39.
LINPACK: 64-bit floats versus 16-bit posits

64-bit IEEE floats:
1.0000000000000124344978758017532527446746826171875
0.9999999999999837907438404727145098149776458740234375
1.0000000000000193178806284777238033711910247802734375
0.99999999999998501198916756038670428097248077392578125
0.9999999999999911182158029987476766109466552734375
0.99999999999999900079927783735911361873149871826171875
⋮

16-bit posits:
1
1
1
1
1
1
⋮
Building posit chips: The race is on
•  Like IEEE floats, but simpler and less area (!)
•  REX Computing shipping a posit-based multiprocessor to A*STAR by 31 August 2017
•  Posit Research Inc. forming to fill out the hardware-software-application stack
•  Looks ideal for GPUs and Deep Learning; more arithmetic per chip
•  Interested companies: Google, IBM, Intel, Samsung, Nvidia, and dozens of others
•  LLNL confirmed superior posit performance on their proxy codes, LULESH and Euler2D
•  Consortium for Next-Generation Arithmetic is organizing now. Meetings at SC’17 and SA’18.
[Block diagram: posit adder, including the regime shifter.]
32-bit precision may suffice now!
•  Early computers used 36-bit floats.
•  IBM System/360 went to 32 bits. It wasn’t quite enough.
•  What if 32-bit posits could replace 64-bit floats for big-data workloads?
•  A potential 2× shortcut to exascale. Or more.
Summary
[The 5-bit posit ring diagram again, as on the “Designed by Mathematics” slide.]
•  Better accuracy with fewer bits
•  Consistent, portable results
•  Automatic control of rounding errors
•  Clean, mathematical design
•  Reduces energy, power, bandwidth, storage, and programming costs
•  Potentially halves costs for abundant-data challenges
For More Information
•  http://www.posithub.org
•  http://www.johngustafson.net/pdfs/BeatingFloatingPoint-superfriversion.pdf
•  https://www.youtube.com/watch?v=aP0Y1uAA-2Y
•  https://github.com/interplanetary-robot/SigmoidNumbers