1 Using FPGA in Embedded Devices Andriy Smolskyy Consultant, Engineering 29.03.2017
2 What is an FPGA?
3 History of Programmable Logic • Transistor-Transistor Logic - TTL • Programmable Array Logic - PAL • Programmable Logic Device - PLD • Complex PLD - CPLD • FPGA • ASIC
4 Digital Design with TTL Logic • Truth table
5 Digital Design with TTL Logic • Truth table • Karnaugh map
6 Digital Design with TTL Logic • Truth table • Karnaugh map • Logic expression
7 Digital Design with TTL Logic • Truth table • Karnaugh map • Logic expression • Final implementation
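The truth table and the logic expression are not preserved in this transcript, so the flow can only be illustrated with a stand-in. The sketch below assumes a hypothetical 3-input majority function (not taken from the slides), stores its truth table, and checks that the sum-of-products expression one would read off a Karnaugh map agrees with the table for every input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example function (not from the slides): 3-input majority.
   The truth table is indexed by (a << 2 | b << 1 | c). */
static const bool truth_table[8] = {0, 0, 0, 1, 0, 1, 1, 1};

/* Minimized sum-of-products from the Karnaugh map: f = ab + ac + bc. */
bool majority_sop(bool a, bool b, bool c) {
    return (a && b) || (a && c) || (b && c);
}

/* Exhaustively compare the expression with the table; aborts on mismatch. */
void verify_sop(void) {
    for (unsigned i = 0; i < 8; i++) {
        bool a = (i >> 2) & 1, b = (i >> 1) & 1, c = i & 1;
        assert(majority_sop(a, b, c) == truth_table[i]);
    }
}
```

In a TTL design each AND and OR in the expression would become a discrete gate, which is why minimizing the expression first reduces the chip count.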
8 Programmable Array Logic (PAL) • Logic gates and registers are fixed • Programmable sum-of-products array and output control • Implementation advantages: - Fewer devices required - Lower cost - Power savings - Simpler to test and debug - Design security (prevents reverse engineering) - In-system reprogrammability (in some cases)
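A programmable sum-of-products array can be modelled in software as a list of AND terms feeding one fixed OR. The sketch below is an illustration only: the number of product terms, the 8-bit input width, and the names `pal_term` and `pal_eval` are assumptions, not the structure of any particular PAL device.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAL_TERMS 4  /* product terms per output; the size is an assumption */

/* One programmable product term over up to 8 inputs. */
typedef struct {
    uint8_t use_true;   /* bit i set: input i must be 1 */
    uint8_t use_compl;  /* bit i set: input i must be 0 */
    bool    enabled;    /* unprogrammed terms contribute nothing */
} pal_term;

/* Evaluate one output: a fixed OR over the programmable AND terms. */
bool pal_eval(const pal_term terms[PAL_TERMS], uint8_t inputs) {
    for (int t = 0; t < PAL_TERMS; t++) {
        if (!terms[t].enabled) continue;
        bool true_ok  = (inputs & terms[t].use_true) == terms[t].use_true;
        bool compl_ok = (inputs & terms[t].use_compl) == 0;
        if (true_ok && compl_ok) return true;
    }
    return false;
}
```

Programming two terms, `{0x1, 0x2, true}` and `{0x2, 0x1, true}`, yields the XOR of inputs 0 and 1; changing the function means changing only this table, not the wiring.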
9 From PAL to Programmable Logic Device (PLD) • Arrange multiple PAL arrays in a single device
10 From PLD to Complex PLD (CPLD) • Combine multiple PLDs in a single device with programmable interconnect and I/O • Implementation advantages: - Ample amounts of logic and advanced configurable I/Os - Programmable routing - Instant on - Non-volatile configuration - Reprogrammable
11 Interconnection Problem: Routing Takes Too Much Space • Global routing • Row & column routing
12 FPGA LUT and LAB • Look-up table (LUT) inputs are mux select lines • FPGA logic array blocks (LABs) are made up of logic elements (LEs) instead of product terms and macrocells • Solves the interconnection problem
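To make "LUT inputs are mux select lines" concrete, a LUT can be modelled as a word of configuration bits from which the inputs select one. The sketch below assumes a 4-input LUT (the slide does not state a size), and the names `lut4_eval`, `lut4_program` and `xor4` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 4-input LUT holds 16 configuration bits. The inputs pick one of them,
   just like the select lines of a 16:1 multiplexer. */
bool lut4_eval(uint16_t config, unsigned inputs) {
    assert(inputs < 16);
    return (config >> inputs) & 1u;
}

/* Build the configuration word for any 4-input function by enumerating it.
   Bit 0 of the index is input a, bit 3 is input d. */
uint16_t lut4_program(bool (*f)(bool, bool, bool, bool)) {
    uint16_t config = 0;
    for (unsigned i = 0; i < 16; i++) {
        if (f(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1))
            config |= (uint16_t)(1u << i);
    }
    return config;
}

/* Example function (hypothetical): 4-input XOR, i.e. odd parity. */
bool xor4(bool a, bool b, bool c, bool d) { return a ^ b ^ c ^ d; }
```

Any 4-input function costs the same single LUT, whereas in a product-term architecture a function like XOR needs many terms.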
13 Field Programmable Gate Array (FPGA) • LABs arranged in an array • Programmable interconnect • Interconnect may span all or part of the array • Implementation advantages: - Easier to create complex functions through LE cascading - Integration of ready-made functions and IP blocks: PLLs, memory, arithmetic - High density, high performance - Fast programming
14 FPGA vs ASIC • FPGA pros: - Fast time to market: easy to develop a new device with specific logic or interfaces - Easy to upgrade device logic and fix hardware bugs - Suits specialized devices: reconfigurable DSP, digital filters • FPGA cons: - Must be programmed at power-on - Hard to achieve 100% device utilization • ASIC pros: - Higher performance: consumes less power and can run at higher clock speeds - Cheaper in mass production - No configuration required at power-on - Smaller chip size • ASIC cons: - Additional expenses in design preparation - Impossible to fix hardware bugs
15 Software and hardware development aspects
16 System on Chip (SoC) + FPGA
17 Software and hardware development aspects • In general, FPGA-generated controllers are similar to a microcontroller's peripheral devices • The FPGA must be programmed at every start, so its controllers might not be ready at system start • Take care with DMA, MMU, virtual memory and caching operations • In some designs the FPGA can control CPU peripheral devices
18 FPGA design development • Verilog • VHDL • Visual development
19 FPGA design development • Core IP - SDRAM controllers - Ethernet PHY, custom transceiver PHY - PCIe PHY - SDI, DisplayPort • Megafunctions - PLL - I/O - Custom logic blocks
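Because FPGA-hosted controllers may not exist yet when software starts, a driver typically checks that configuration has finished before touching them. The sketch below is a minimal illustration under stated assumptions: the status register, its `FPGA_STATUS_READY` bit, and the function name are hypothetical, and a real system would obtain the register address from its platform's memory map.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPGA_STATUS_READY 0x1u  /* hypothetical "configuration done" bit */

/* Poll a status register until the ready bit is set or the attempts run out.
   `volatile` forces a fresh read on every poll, which matters for
   memory-mapped hardware registers. */
bool fpga_wait_ready(const volatile uint32_t *status, unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & FPGA_STATUS_READY) return true;
    }
    return false;
}
```

The bounded loop lets the caller report a failure instead of hanging if the FPGA never comes up. The `volatile` qualifier addresses only compiler optimizations, so buffers shared with FPGA DMA still need the platform's cache maintenance.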
20 High speed data processing: OpenCL in FPGA
21 A simple CPU
22 Load immediate value into register
23 Load memory value into register
24 Store register value into memory
25 Add two registers, store result in register
26 A simple program: Mem[100] += 42 * Mem[101]
CPU instructions:
    R0 ← Load Mem[100]
    R1 ← Load Mem[101]
    R2 ← Load #42
    R2 ← Mul R1, R2
    R0 ← Add R2, R0
    Store R0 → Mem[100]
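The six instructions can be traced one for one in C. This is only a sketch: the three local variables stand in for the registers, and the memory size `MEM_WORDS` is an arbitrary assumption.

```c
#include <stdint.h>

enum { MEM_WORDS = 128 };  /* memory size is an arbitrary assumption */

/* Execute the slide's six-instruction program on a toy register file. */
void run_program(int32_t mem[MEM_WORDS]) {
    int32_t r0, r1, r2;
    r0 = mem[100];   /* R0 <- Load Mem[100]  */
    r1 = mem[101];   /* R1 <- Load Mem[101]  */
    r2 = 42;         /* R2 <- Load #42       */
    r2 = r1 * r2;    /* R2 <- Mul R1, R2     */
    r0 = r2 + r0;    /* R0 <- Add R2, R0     */
    mem[100] = r0;   /* Store R0 -> Mem[100] */
}
```

With Mem[100] = 5 and Mem[101] = 2, the program leaves 89 in Mem[100]. On a CPU each line occupies the shared fetch, ALU and load/store hardware for its own time slot.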
27 Single CPU activity, step by step (axis: time)
28 Unroll the CPU hardware… (axis: space)
29 … and specialize by position 1. Instructions are fixed. Remove “Fetch”
30 … and specialize 1. Instructions are fixed. Remove “Fetch” 2. Remove unused ALU operations
31 … and specialize 1. Instructions are fixed. Remove “Fetch” 2. Remove unused ALU operations 3. Remove unused Load / Store
32 … and specialize 1. Instructions are fixed. Remove “Fetch” 2. Remove unused ALU operations 3. Remove unused Load / Store 4. Wire up registers properly. And propagate state.
33 … and specialize 1. Instructions are fixed. Remove “Fetch” 2. Remove unused ALU operations 3. Remove unused Load / Store 4. Wire up registers properly. And propagate state 5. Remove dead data
34 … and specialize 1. Instructions are fixed. Remove “Fetch” 2. Remove unused ALU operations 3. Remove unused Load / Store 4. Wire up registers properly. And propagate state 5. Remove dead data 6. Reschedule!
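After these six steps, what remains of the program Mem[100] += 42 * Mem[101] is essentially one constant multiplier feeding one adder. The sketch below models that datapath as a two-stage pipeline advanced one clock tick per call. The split into exactly two stages and the names `pipe_reg` and `datapath_tick` are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pipeline register between the multiplier stage and the adder stage. */
typedef struct {
    int32_t prod;    /* 42 * Mem[101], computed in stage 1 */
    int32_t addend;  /* Mem[100] value travelling alongside the product */
    bool    valid;   /* false until the first operands have entered */
} pipe_reg;

/* One clock tick. Stage 2 (adder) consumes the register contents while
   stage 1 (multiplier) refills it with the next operands. Returns true and
   writes *out when a result leaves the pipeline. */
bool datapath_tick(pipe_reg *p, int32_t m100, int32_t m101, int32_t *out) {
    bool have_result = p->valid;
    if (have_result) *out = p->addend + p->prod;  /* adder stage */
    p->prod   = 42 * m101;                        /* multiplier stage */
    p->addend = m100;
    p->valid  = true;
    return have_result;
}
```

There is no instruction fetch, no general-purpose ALU and no register file left. After a one-tick fill delay the pipeline accepts new operands and delivers a result on every tick.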
35 FPGA datapath = Your algorithm, in silicon • Build exactly what you need: - Operations - Data widths - Memory size, configuration • Efficiency: - Throughput - Latency - Power
36 OpenCL → FPGA • Host + accelerator programming model • Sequential host program on a microprocessor • Function offload onto a highly parallel accelerator device
Host code (user application):
    main() {
        read_data( … );
        manipulate( … );
        clEnqueueWriteBuffer( … );
        clEnqueueNDRangeKernel(…,sum,…);
        clEnqueueReadBuffer( … );
        display_result( … );
    }
FPGA design (algorithm):
    __kernel void sum(__global float *a, __global float *b, __global float *y) {
        int gid = get_global_id(0);
        y[gid] = a[gid] + b[gid];
    }
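The work-item model can be tried without an OpenCL runtime by emulating the launch in plain C. In this sketch `sum_work_item` and `launch_sum` are hypothetical stand-ins, not OpenCL API calls: the first plays the role of the kernel body for one `gid`, the second plays the role of the NDRange launch.

```c
#include <stddef.h>

/* Plain-C stand-in for the kernel body: one call handles one work-item. */
void sum_work_item(size_t gid, const float *a, const float *b, float *y) {
    y[gid] = a[gid] + b[gid];
}

/* Hypothetical stand-in for an NDRange launch: visit every work-item in
   turn. A real device is free to run them in parallel or as a pipeline,
   because no work-item depends on another. */
void launch_sum(size_t global_size, const float *a, const float *b, float *y) {
    for (size_t gid = 0; gid < global_size; gid++)
        sum_work_item(gid, a, b, y);
}
```

Calling `launch_sum(3, a, b, y)` with `a = {1, 2, 3}` and `b = {10, 20, 30}` fills `y` with `{11, 22, 33}`. The buffer write and read calls in the host code correspond to copying `a`, `b` and `y` between host and device memory.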
37 Loop Pipelining • Analyze any dependencies between iterations • Schedule these operations • Launch the next iteration as soon as possible
    float array[M];
    for (int i=0; i < n*numSets; i++) {
        for (int j=0; j < M-1; j++)
            array[j] = array[j+1];
        array[M-1] = a[i];
        /* At this point, we can launch the next iteration */
        for (int j=0; j < M; j++)
            answer[i] += array[j] * coefs[j];
    }
38 Loop Pipelining Example • No loop pipelining • With loop pipelining: looks almost like parallel thread execution
39 Digital Filter [Figure: 8-tap filter. The input x(n) passes through a chain of seven z^-1 delay elements; the eight taps are multiplied by coefficients C0 to C7 and the products are summed to give y(n).]
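The slide's loop is a fragment, so here is a self-contained variant that compiles as ordinary C. It is a sketch under assumptions: `M` is fixed at 4 taps, the bound `n*numSets` is replaced by a single length `n`, and the result is built in a local accumulator rather than with `answer[i] +=`, so `answer` need not be zeroed first.

```c
#include <stddef.h>

#define M 4  /* number of taps; the slide leaves M unspecified */

/* Streaming filter as on the slide: shift the window, insert the new
   sample, then accumulate. `answer` must have room for n floats. */
void fir_shift_register(const float *a, size_t n,
                        const float coefs[M], float *answer) {
    float array[M] = {0};                 /* sliding window, starts empty */
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < M - 1; j++)   /* shift: oldest sample drops out */
            array[j] = array[j + 1];
        array[M - 1] = a[i];              /* newest sample enters */
        /* The window is final here. Nothing below writes data that the
           next iteration reads, so a pipelining compiler may start it. */
        float acc = 0.0f;
        for (int j = 0; j < M; j++)
            acc += array[j] * coefs[j];
        answer[i] = acc;
    }
}
```

The only value carried from one iteration to the next is the window `array`, which is why the multiply-accumulate work of consecutive iterations can overlap in hardware.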
40 Q&A • FPGA in Embedded Devices
41 Thank you Andriy Smolskyy Consultant, Engineering andriy.smolskyy@globallogic.com +380-67-701-8637
