Beating Floating Point at its Own Game: Posit Arithmetic
John L. Gustafson
Visiting Scientist at A*STAR and Professor at National University of Singapore
The “Memory Wall”
•  64-bit floating-point multiply-add: 0.2 nanojoules
•  Read 64 bits from memory (DRAM): 12 nanojoules
The issue is communication, not computation: run time, parts cost, electric power.
What if we could cut communication in half by doubling information per bit?
Decades of asking “How do you know your answer is correct?”
•  (Laughter) “What do you mean?”
•  “Well, we used double precision.”
•  “Oh. We always get that answer.”
•  “Other people get that answer.”
Precision versus Accuracy
[Figure: two images of the same subject, one rendered with 150,000 pixels and one with 432 pixels.]
Metrics for Number Systems
•  Relative Error = |(correct x – computed x) / correct x|
•  Dynamic range = log10(maxreal / minreal) decades
•  Decimal Accuracy = –log10(Relative Error)
•  Percentage of operations that are exact (closure under + – × ÷ √ etc.)
•  Average accuracy loss when inexact
•  Entropy per bit (maximize information)
•  Accuracy benchmarks: simple formulas, linear equation solving, math kernels…
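For concreteness, here are the first two metrics in a few lines of Python. This is a sketch written for this transcript; the function names are ours, for illustration only.

```python
import math

def decimal_accuracy(correct: float, computed: float) -> float:
    """Decimals of accuracy: -log10 of the relative error."""
    rel_err = abs((correct - computed) / correct)
    return math.inf if rel_err == 0 else -math.log10(rel_err)

def dynamic_range(maxreal: float, minreal: float) -> float:
    """Dynamic range in decades."""
    return math.log10(maxreal / minreal)

# 3.14159 as an approximation to pi is good to about 6.1 decimals
print(decimal_accuracy(math.pi, 3.14159))
```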
COMPUTERS  THEN
COMPUTERS  NOW
ARITHMETIC  THEN
ARITHMETIC  NOW
Analogy: Printing Technology
[Figure: printing a page, 1970: 30 sec; 2017: 30 sec.]
Challenges for the Existing Arithmetic
•  No guarantee of repeatable or portable behavior (!)
•  Insufficient 32-bit accuracy forces wasteful use of 64-bit types
•  Fails to obey the laws of algebra (associative, distributive)
•  Poor handling of overflow, underflow, and Not-a-Number results
•  Dynamic ranges are too large, stealing accuracy needed for workloads
•  Rounding errors are invisible, hazardous, and costly to debug
•  Computations are unstable when parallelized
IEEE Standard Floats are a storage-inefficient, 1980s-era design.
Why worry about floating point?
Find the scalar product a · b:
a = (3.2e8, 1, –1, 8.0e7)
b = (4.0e7, 1, –1, –1.6e8)
Single precision, 32 bits: a · b = 0
Double precision, 64 bits: a · b = 0
Correct answer: a · b = 2
Note: all values are integers that can be expressed exactly in the IEEE 754 Standard floating-point format (single or double precision).
Most linear algebra is unstable with floats!
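This failure reproduces on any IEEE 754 machine. A NumPy sketch; the exact-integer check at the end is our addition, not part of the slide:

```python
import numpy as np

a = np.array([3.2e8, 1.0, -1.0, 8.0e7])
b = np.array([4.0e7, 1.0, -1.0, -1.6e8])

print(np.dot(a.astype(np.float32), b.astype(np.float32)))  # 0.0 in 32 bits
print(np.dot(a, b))                                        # 0.0 in 64 bits, too
# exact integer arithmetic recovers the correct answer
print(sum(int(x) * int(y) for x, y in zip(a, b)))          # 2
```

The huge products ±1.28e16 swallow the ±1 terms before cancelling, so the +2 is lost at either precision.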
What’s wrong with IEEE 754? A start:
•  No guarantee of identical results across systems
•  It’s a guideline, not a standard
•  Breaks the laws of algebra: a + (b + c) ≠ (a + b) + c and a·(b + c) ≠ a·b + a·c
•  Overflow to infinity creates infinite relative error
IEEE floats are weapons of math destruction.
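The associativity failure is a one-liner; the values below are our own illustration:

```python
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0: rounding swallows the 1.0, so grouping changes the answer
```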
What else is wrong with IEEE 754?
•  Exponents usually take too many bits
•  Accuracy is flat across a vast range, then falls off a cliff
•  Subnormal numbers are a headache (“gradual underflow”)
•  Divides are messy and slow
•  Wasted bit patterns: “negative zero,” too many NaN values
Do we really need 9,007,199,254,740,990 numbers to indicate something is Not a Number?
Floats: Designed by a 1980s Committee
[Cartoon: the sign, exponent, and mantissa fields of an IEEE float, surrounded by committee speech bubbles:]
“The ‘hidden bit’ is always 1… unless it’s not. Gradual underflow!”
“Sign-magnitude integer? 2’s complement? 1’s complement? Aw heck, let’s just say you subtract 126.”
“But subtract 127 if it’s a gradual underflow case!”
“Wait, this makes comparing two floats really complicated. It’s not like comparing integers!”
“If a result is Not a Number, we’ll make it a number. That’s logical, right? Call it ‘NaN’. Let’s have millions of NaN values instead of numerical values.”
“I know. Let’s say that the square root of ‘negative zero’ is also negative zero!”
“The sign bit doesn’t apply to zero. Except… sometimes.”
“And the reciprocal of negative zero can be negative infinity! Cool. Except… uh-oh… infinity equals negative infinity?”
“My company wants to use guard bits so we can say our answers are better. The Standard better allow that!”
“My company has fused multiply-adds. Let’s put that in the standard so everyone else has to redesign to catch up with us.”
“Is that German guy still telling us we need an exact dot product? Get him out of here.”
“If the exponent is all 1s, let’s say that infinity is when the mantissa is all 0s.”
“And if a number gets too big, we’ll just round it to infinity.”
“We can stick lots of flags in the processor. Someday, languages will support them.”
“How should we round? Down? Up? Toward zero? To nearest? I’ve got it. Let’s put all four ways in there. Then we don’t have to pick.”
“Let’s have more exponent bits than we really need. That’ll save transistors.”
“Good idea. Those transistors are so expensive!”
Contrasting Calculation “Esthetics”
•  IEEE Standard (1985) floats, f = n × 2^m (m, n integers). Rounded: cheap, uncertain, “good enough.”
•  Intervals [f1, f2]: all x such that f1 ≤ x ≤ f2. Rigorous: more work, certain, mathematical.
If you mix the two esthetics, you end up satisfying neither.
“I need the hardware to protect me from NaN and overflow in my code.”
“Really? And do you keep debugger mode turned on in your production software?”
Posits use the Projective Reals
•  Posits map reals to standard signed integers.
•  Can be as small as 2 bits and still be useful!
•  This eliminates “negative zero” and other IEEE float issues.
Mapping to the Projective Reals
Example with nbits = 3, es = 1. The value at 45° is always useed, where useed = 2^(2^es).
If the bit string is negative as a 2’s-complement integer, set the sign to – and negate the integer.
Rules for inserting new points
•  Between ±maxpos and ±∞, scale up by useed. (New regime bit)
•  Between 0 and ±minpos, scale down by useed. (New regime bit)
•  Between 2^m and 2^n where n – m ≥ 2, insert 2^((m+n)/2). (New exponent bit)
[Ring diagram for nbits = 4; maxpos grows to useed^2.]
At nbits = 5, fraction bits appear.
•  Between x and y where y ≤ 2x, insert (x + y)/2. (New fraction bit)
•  Existing values stay put as trailing bits are added.
Appending bits increases accuracy east and west, and dynamic range north and south!
[Ring diagram for nbits = 5; maxpos grows to useed^3.]
Posit Arithmetic: Beating floats at their own game
•  Fixed size, nbits.
•  es = exponent size = 0, 1, 2, … bits.
•  es is also the number of times you square 2 to get useed = 2^(2^es): 2, 4, 16, 256, 65536, …
Posit Format Example
Here, es = 3. [Bit-field diagram: sign bit, regime bits, exponent bits, fraction bits; the example pattern decodes to 3.55⋯×10^-6.]
Float-like circuitry is all that is needed (integer add, integer multiply, shifts to scale by 2^k).
Posits do not overflow. There is no NaN. Relative error ≤ 1.
Simpler, faster circuits than IEEE 754.
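To make the decoding rules concrete, here is a minimal Python sketch written for this transcript (decode_posit is our name, not a library function). It follows the sign / regime / exponent / fraction procedure the slides describe, reading truncated exponent bits as zero:

```python
def decode_posit(bits: int, nbits: int, es: int) -> float:
    """Decode an nbits-wide posit bit pattern (a sketch, not a library)."""
    mask = (1 << nbits) - 1
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return float("inf")                  # the single +/-infinity pattern
    sign = -1 if bits >> (nbits - 1) else 1
    if sign < 0:
        bits = -bits & mask                  # 2's-complement negate, then decode
    rest = (bits << 1) & mask                # drop the sign bit
    first = rest >> (nbits - 1)              # leading regime bit
    run = 0
    while run < nbits - 1 and (rest >> (nbits - 1 - run)) & 1 == first:
        run += 1
    k = run - 1 if first else -run           # regime contributes useed**k
    rest = (rest << (run + 1)) & mask        # drop regime bits and terminator
    exp = rest >> (nbits - es) if es else 0  # next es bits (zero if cut off)
    rest = (rest << es) & mask
    frac = 1 + rest / (1 << nbits)           # remaining bits: 1.fff... fraction
    useed = 2 ** (2 ** es)
    return sign * useed ** k * 2 ** exp * frac
```

For example, decode_posit(0b011, nbits=3, es=1) returns 4.0, the useed value at 45° on the 3-bit ring shown earlier.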
What reals should we seek to represent?
•  Studies show very rare use of values outside 10^-13 to 10^13.
•  The Central Limit Theorem says exponents distribute as a bell curve.
[Graph: typical bell-shaped distribution of real values used in computations, on an axis from 10^-40 to 10^40.]
IEEE floats have about 7 decimals of accuracy, flat except on the left
•  This shows 32-bit float accuracy: a dynamic range of 83 decades.
•  For 64-bit floats, the exponent range is even sillier: 631 decades.
•  Is flat accuracy over a huge range really what we need?
[Graph: flat IEEE float accuracy overlaid on the bell-shaped distribution of values, axis 10^-40 to 10^40.]
Posits provide concise tapered accuracy
•  Posits have the same or better accuracy on the vast majority of calculations, yet have greater dynamic range.
•  This is only one major advantage of posits over floats.
[Graph: posit accuracy curve overlaid on the float curve and the value distribution; posit accuracy is equal or superior where most values occur.]
Posits: Designed by Mathematics
[Ring diagram: all 32 values of a 5-bit posit, from 0 through ±1/64, ±1/16, ±1/8, ±1/4, ±3/8, ±1/2, ±3/4, ±1, ±3/2, ±2, ±3, ±4, ±8, ±16, ±64, to ±∞. Diagram annotations: negation symmetry, reciprocal symmetry.]
•  1-to-1 map of binary integers to ordered real numbers
•  Appending bits gives an isomorphic increase in precision and dynamic range, automatically
•  No “negative zero”
•  No bit patterns wasted on “NaN”
•  Simpler circuitry, less chip area
•  No hidden and unused flags
•  More information per bit
•  As reproducible as integer math; no “hidden scratchpad” work
•  Obeys mathematical laws
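As a cross-check, the decoder sketched above reproduces the non-negative half of this ring when nbits = 5 and es = 1, the parameters the ring values imply:

```python
# the 16 bit patterns with sign bit 0, in ascending order
ring = [decode_posit(bits, nbits=5, es=1) for bits in range(16)]
print(ring)
# [0.0, 0.015625, 0.0625, 0.125, 0.25, 0.375, 0.5, 0.75,
#  1.0, 1.5, 2.0, 3.0, 4.0, 8.0, 16.0, 64.0]
# i.e. 0, 1/64, 1/16, 1/8, 1/4, 3/8, 1/2, 3/4, 1, 3/2, 2, 3, 4, 8, 16, 64
```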
Posits vs. Floats: a metrics-based study
•  Compare quarter-precision IEEE-style floats
•  Sign bit, 4 exponent bits, 3 fraction bits
•  smallsubnormal = 1/512; maxfloat = 240
•  Dynamic range of five orders of magnitude
•  Two bit patterns that mean zero
•  Fourteen bit patterns that mean “Not a Number” (NaN)
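Since an 8-bit format has only 256 bit patterns, these claims can be checked exhaustively. A sketch (decode_quarter is our name for the quarter-precision decoder):

```python
import math

def decode_quarter(bits: int) -> float:
    """Decode an 8-bit IEEE-style float: 1 sign, 4 exponent, 3 fraction bits."""
    sign = -1.0 if bits & 0x80 else 1.0
    e = (bits >> 3) & 0xF
    f = bits & 0x7
    if e == 0xF:
        return math.nan if f else sign * math.inf
    if e == 0:
        return sign * f / 8 * 2.0 ** -6         # subnormal: exponent 1 - bias 7
    return sign * (1 + f / 8) * 2.0 ** (e - 7)  # normal, bias 7

values = [decode_quarter(b) for b in range(256)]
finite = [v for v in values if math.isfinite(v) and v != 0]
print(max(finite))                          # 240.0 (maxfloat)
print(min(abs(v) for v in finite))          # 0.001953125 = 1/512 (smallsubnormal)
print(sum(math.isnan(v) for v in values))   # 14 NaN bit patterns
print(sum(v == 0 for v in values))          # 2 zeros (+0 and -0)
```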
Float accuracy tapers only on the left
•  Min: 0.52 decimals
•  Avg: 1.40 decimals
•  Max: 1.55 decimals
Graph shows decimals of accuracy from minfloat to maxfloat.
Posit accuracy tapers on both sides
•  Min: 0.22 decimals
•  Avg: 1.46 decimals
•  Max: 1.86 decimals
Graph shows decimals of accuracy from minpos to maxpos. But posits cover seven orders of magnitude, not five.
Side by side:
•  Posits: Min 0.22, Avg 1.46, Max 1.86 decimals
•  Floats: Min 0.52, Avg 1.40, Max 1.55 decimals
What do these remind you of?
Both graphs at once
[Overlay of the two accuracy graphs, ⇦ Posits and ⇦ Floats, with the region where most calculations occur marked.]
Matching float dynamic ranges
[Table matching posit sizes and es settings to the dynamic ranges of standard float sizes.]
Note: Isaac Yonemoto has shown that 8-bit posits suffice for neural network training, with es = 0.
8-bit posits for fast neural nets
Sigmoid functions take 1 cycle with posits, versus dozens of cycles with float math libraries. (Observation by I. Yonemoto)
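As described in the posit literature, the es = 0 sigmoid shortcut is pure bit manipulation: flip the most significant bit and shift right two places; the result is itself a posit close to 1/(1 + e^(-x)). A sketch, reusing decode_posit from the format slide to show the decoded value:

```python
import math

def fast_sigmoid_posit8(bits: int) -> float:
    """Approximate 1/(1 + exp(-x)) for an 8-bit, es = 0 posit:
    flip the sign bit, then shift right two places."""
    return decode_posit(((bits ^ 0x80) & 0xFF) >> 2, nbits=8, es=0)

x_bits = 0b01000000                 # the 8-bit, es = 0 posit for x = 1.0
print(fast_sigmoid_posit8(x_bits))  # 0.75
print(1 / (1 + math.exp(-1.0)))     # 0.7310..., for comparison
```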
ROUND 1: Unary Operations
1/x, √x, x^2, log2(x), 2^x
Closure under Reciprocation, 1/x
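Reusing decode_quarter from the study slide, closure under 1/x for the 8-bit float format can be counted directly. A sketch; the float64 division is exact whenever the true reciprocal is representable in 8 bits, so set membership is a fair test:

```python
import math

vals = {decode_quarter(b) for b in range(256)}   # from the earlier sketch
finite = [v for v in vals if math.isfinite(v) and v != 0]
exact = sum((1.0 / v) in vals for v in finite)
print(f"{exact} of {len(finite)} nonzero finite values have exact reciprocals")
```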
Closure under Square Root, √x
Closure under Squaring, x^2
Closure under log2(x)
Closure under 2^x
ROUND 2: Two-Argument Operations
x + y, x × y, x ÷ y
Addition Closure Plot: Floats
•  18.533% exact
•  70.190% inexact
•  0.635% overflow
•  10.641% NaN
Inexact results are magenta; the larger the error, the brighter the color.
Addition can overflow, but cannot underflow.
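These percentages come from enumerating all 256 × 256 operand pairs. A rough sketch of that tally, again reusing decode_quarter; we treat finite sums at or beyond 248 (the round-to-nearest overflow threshold for this format) as overflow, and sums involving ±∞ as exact:

```python
import math

quarters = [decode_quarter(b) for b in range(256)]  # from the earlier sketch
vals = set(quarters)
counts = {"exact": 0, "inexact": 0, "overflow": 0, "NaN": 0}
for x in quarters:
    for y in quarters:
        s = x + y                  # float64 adds these operands exactly
        if math.isnan(s):
            counts["NaN"] += 1     # NaN operand, or inf + (-inf)
        elif s in vals:
            counts["exact"] += 1   # representable (or stays at +/-inf)
        elif abs(s) >= 248:
            counts["overflow"] += 1
        else:
            counts["inexact"] += 1
for k, v in counts.items():
    print(f"{k}: {100 * v / 65536:.3f}%")
```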
Addition Closure Plot: Posits
•  25.005% exact
•  74.994% inexact
•  0.000% overflow
•  0.002% NaN
Only one case is a NaN: ±∞ + ±∞. With posits, a NaN interrupts the calculation. (An optional mode uses ±∞ as a quiet NaN.)
All decimal losses, sorted
Multiplication Closure Plot: Floats 22.272% exact 51.233% inexact 3.345% underflow 12.500% overflow 10.651% NaN Floats score their first win: more exact products than posits… but at a terrible cost!
Multiplication Closure Plot: Posits
•  18.002% exact
•  81.995% inexact
•  0.000% underflow
•  0.000% overflow
•  0.003% NaN
Only two cases produce a NaN: ±∞ × 0 and 0 × ±∞.
The sorted losses tell the real story
ROUND 3: Higher-Precision Operations
32-bit formula evaluation
LINPACK solved perfectly with… 16 bits!
Accuracy on a 32-Bit Budget
Compute $\left(\dfrac{27/10 - e}{\pi - (\sqrt{2} + \sqrt{3})}\right)^{67/16} = 302.8827196\ldots$ with ≤ 32 bits per number.

Number Type                Dynamic Range   Answer          Error
IEEE 32-bit float          83 decades      302.912⋯        0.0297⋯
32-bit posits, no fusing   144 decades     302.8823⋯       0.00040⋯
32-bit posits, fused ops   144 decades     302.882713⋯     0.0000063⋯

Posits beat floats at both dynamic range and accuracy.
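We cannot run the posit rows without a posit library, but the float side reproduces readily. A NumPy sketch comparing 32- and 64-bit evaluation (operation order may shift the last digits slightly):

```python
import numpy as np

def formula(f):
    # (27/10 - e) / (pi - (sqrt(2) + sqrt(3))), raised to the 67/16 power
    num = f(27) / f(10) - f(np.e)
    den = f(np.pi) - (np.sqrt(f(2)) + np.sqrt(f(3)))
    return (num / den) ** (f(67) / f(16))

print(formula(np.float32))  # ~302.9: only about 4 good digits from 32-bit floats
print(formula(np.float64))  # 302.8827196..., the reference value
```

The heavy cancellation in both numerator and denominator is what makes this a punishing accuracy test.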
LINPACK: Solve Ax = b, 16-bit posits versus 16-bit floats
•  A is a 100 × 100 dense matrix; random entries A_ij in (0, 1)
•  b chosen so x should be exactly all 1s
•  Classic LINPACK method: LU factorization with partial pivoting; refinement using the residual is allowed

IEEE 16-bit floats: dynamic range 12 decades; maximum error 0.011; decimal accuracy 1.96
16-bit posits: dynamic range 16 decades; maximum error NONE; decimal accuracy ∞

Work funded in part by DARPA under contract BAA 16-39.
LINPACK: 64-bit floats versus 16-bit posits

64-bit IEEE floats:
1.0000000000000124344978758017532527446746826171875
0.9999999999999837907438404727145098149776458740234375
1.0000000000000193178806284777238033711910247802734375
0.99999999999998501198916756038670428097248077392578125
0.9999999999999911182158029987476766109466552734375
0.99999999999999900079927783735911361873149871826171875
⋮

16-bit posits:
1
1
1
1
1
1
⋮
Building posit chips: The race is on
•  Like IEEE floats, but simpler and less area (!)
•  REX Computing shipping a posit-based multiprocessor to A*STAR by 31 August 2017
•  Posit Research Inc. forming to fill out the hardware-software-application stack
•  Looks ideal for GPUs and Deep Learning; more arithmetic per chip
•  Interested companies: Google, IBM, Intel, Samsung, Nvidia, and dozens of others
•  LLNL confirmed superior posit performance on their proxy codes, LULESH and Euler2D
•  Consortium for Next-Generation Arithmetic is organizing now. Meetings at SC’17 and SA’18.
[Block diagram: posit adder, including the regime shifter.]
32-bit precision may suffice now!
•  Early computers used 36-bit floats.
•  IBM System/360 went to 32 bits. It wasn’t quite enough.
•  What if 32-bit posits could replace 64-bit floats for big-data workloads?
•  A potential 2× shortcut to exascale. Or more.
Summary
[The 5-bit posit ring diagram again, as on the “Designed by Mathematics” slide.]
•  Better accuracy with fewer bits
•  Consistent, portable results
•  Automatic control of rounding errors
•  Clean, mathematical design
•  Reduces energy, power, bandwidth, storage, and programming costs
•  Potentially halves costs for abundant-data challenges
For More Information
•  http://www.posithub.org
•  http://www.johngustafson.net/pdfs/BeatingFloatingPoint-superfriversion.pdf
•  https://www.youtube.com/watch?v=aP0Y1uAA-2Y
•  https://github.com/interplanetary-robot/SigmoidNumbers