SYSTEM DESIGN USING HDL (ECE43)
Textbook: Digital Systems Design Using Verilog, Charles Roth, Lizy Kurian John, Byeong Kil Lee, 1st Edition, 2016, Cengage Learning
Textbook sections covered, by module:
Module 1: 2.1, 2.2, 2.3 - 2.8, 2.11, 2.13 - 2.15
Module 2: 2.9, 2.10, 2.12, 2.16 - 2.19, 8.1, 8.2
Module 3: 3.1 - 3.4, 5.1, 5.2.1, 5.3
Module 4: 4.1 - 4.5, 4.8, 4.6, 4.7, 4.9, 4.11
Module 5: 6.1 - 6.5, 6.7 - 6.12
INTRODUCTION TO PROGRAMMABLE LOGIC DEVICES
Brief overview of Programmable Logic Devices
Need for programmable logic devices:
• A significant amount of functionality can be implemented in one physical chip.
• The need for multiple off-the-shelf devices is removed.
• Reprogramming is easy, which increases the ability to change the design.
• The design is easier to change in case of errors or changes in the design specifications.
Classification of programmable logic:
• Factory programmable devices
  - ROM (Read Only Memory)
  - MPGA (Mask Programmable Gate Array)
• Field programmable devices
  - SPLD (Simple Programmable Logic Device): PROM (Programmable Read Only Memory), PLA (Programmable Logic Array), PAL (Programmable Array Logic), GAL (Generic Array Logic)
  - CPLD (Complex Programmable Logic Device)
  - FPGA (Field Programmable Gate Array)
• Factory Programmable Devices: Generic devices that are programmed at the factory to meet the customer's requirements. Programming can be done only once. Examples: ROM, MPGA.
• ROM: Primarily meant for memory, but can also be used to implement combinational circuits.
• MPGA: Also called gate arrays, these have been a popular technology for creating ASICs.
• Field Programmable Devices: Devices that are programmed by the user, rather than in the factory.
Factor | SPLD | CPLD | FPGA
Density | Low (few hundred gates) | Low to medium (500 to 12,000 gates) | Medium to high (3000 to 5,000,000 gates)
Timing | Predictable | Predictable | Unpredictable
Cost | Low | Low to medium | Medium to high
Major vendors (with device families) | Lattice (GAL16LV8, GAL22V10), Cypress (PALCE16V8), AMD (22V10) | Xilinx (CoolRunner, XC9500), Altera (MAX) | Xilinx (Kintex, Artix, Virtex, Spartan), Altera (Stratix, Cyclone, Arria), Lattice (Mach, ECP), Microsemi (Axcelerator, Fusion)
• PLA: It consists of a programmable AND array and a programmable OR array.
• PAL: It is a special case of PLA, in which the OR array is fixed and only the AND array is programmable. It can also contain flip-flops.
• Earlier programmable devices were only one-time programmable (OTP, PROM); later, the advent of ultraviolet-erasable and electrically erasable technology gradually led to re-programmable logic devices.
CMOS Electrically Erasable PLDs:
• They contain macroblocks with arrays of gates, flip-flops, multiplexers, or other standard building blocks.
• PLAs, PALs, GALs & PLDs are collectively referred to as SPLDs.
GAL (Generic Array Logic):
• Lattice Semiconductor created similar devices with easy reprogrammability, and called this line of devices GALs.
ROM (Read Only Memory)
• ROM consists of an array of semiconductor devices that are interconnected to store an array of binary data. • Data stored in ROM can be read out when required, but cannot be changed under normal operating conditions. • Output pattern stored in ROM is called a word. • Each input serves as address, which selects one of the stored words as output.
• The size of a ROM is given as 2^n × m, where "n" is the number of input (address) lines and "m" is the number of output lines, i.e., the width of each stored word.
• For example, a ROM with 3 input lines and 4 output lines has a size of 2^3 × 4 = 8 words × 4 bits.
• In the following example, when ABC=010, F0F1F2F3=0111.
• ROM consists of a decoder and a memory array. When a pattern of 1's and 0's is applied as input to the decoder, exactly one of its outputs becomes 1, which in turn selects the corresponding stored word from the array.
• Types of ROM:
  - Mask programmable ROM
  - PROM (user programmable)
  - EPROM (UV erasure)
  - EEPROM (electrically erasable)
  - Flash memory
• Mask programmable ROM: The data array is permanently stored during manufacture, by selectively including or omitting the switching elements in the cross-point switch matrix. Special masks are used for this purpose, which makes it an expensive process.
• PROM: One-time user programmable (fuse / antifuse).
• EPROM: The programmer uses voltage pulses to store electric charges in the memory array locations. UV light is used to erase all of the stored data.
• EEPROM: Uses electrical pulses for erasure of data. It can typically be reprogrammed only 100 to 1000 times.
• Flash memories: They have built-in programming and erasure capabilities, so data can be written while in-circuit, without needing a separate programmer.
• A ROM can implement any combinational circuit by storing the outputs for all of the input combinations. Hence, this method is also called the look-up table (LUT) method.
Ex-1: Implement a 2-bit adder using ROM.
Solution:
• Input: two 2-bit numbers (4 address lines). Output: a 3-bit sum.
• It can be implemented with a 16 × 3 ROM.
Data to be stored in memory: 0, 1, 2, 3, 1, 2, 3, 4, 2, 3, 4, 5, 3, 4, 5, 6
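The stored data for Ex-1 can be reproduced with a short Python sketch. The address layout (first operand in the two upper address bits, second operand in the two lower bits) and the names `adder_rom` and `rom_add` are assumptions made for illustration; the slides only list the contents.

```python
def adder_rom(bits=2):
    """Return the ROM contents for a bits-wide adder.

    Assumed address layout: {A, B}, with A in the upper address bits.
    Each stored word is simply A + B.
    """
    size = 1 << bits
    return [a + b for a in range(size) for b in range(size)]


ROM = adder_rom()


def rom_add(a, b):
    """Add two 2-bit numbers by a single ROM look-up."""
    return ROM[(a << 2) | b]


print(ROM)            # 16 words, each fits in 3 bits
print(rom_add(3, 2))  # look-up instead of arithmetic
```

Since the largest sum is 3 + 3 = 6, every word fits in 3 bits, which is why a 16 × 3 ROM is sufficient.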
Ex-2: Compute the size of the ROM needed to implement an 8:3 priority encoder.
Solution: With 8 input lines, there will be 2^8 = 256 entries in the ROM. Size of the ROM: 2^8 × 4, i.e., 256 words × 4 bits.
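Ex-3: Implement the following state machine, a BCD to Excess-3 code converter, using ROM.
State table (PS = present state, NS = next state, Z = output):
PS | NS (X=0) | NS (X=1) | Z (X=0) | Z (X=1)
S0 | S1 | S2 | 1 | 0
S1 | S3 | S4 | 1 | 0
S2 | S4 | S4 | 0 | 1
S3 | S5 | S5 | 0 | 1
S4 | S5 | S6 | 1 | 0
S5 | S0 | S0 | 0 | 1
S6 | S0 | - | 1 | -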
• Sequential circuit is designed using ROM and flip-flops. • ROM is used to realize the output functions and the next state equations. • The state of the circuit is stored in a register of D flip-flops, and fed back to the input of the ROM. • To realize the given Mealy machine, a ROM and 3 flip-flops are necessary.
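• The ROM will generate the next state equations and the output Z, from the present state and the input X.
• Q1, Q2, Q3 and X are connected to the address lines, with X connected to the LSB.
• Contents of ROM, in hexadecimal for addresses 0 to F (assuming the straight binary state assignment S0 = 000 through S6 = 110, each word ordered as Q1+ Q2+ Q3+ Z, and don't-care entries stored as 0): 3, 4, 7, 8, 8, 9, A, B, B, C, 0, 1, 1, 0, 0, 0.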
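The ROM words for Ex-3 can be derived mechanically from the state table, which is a useful cross-check on any hand-written list. The sketch below is in Python and rests on assumptions that the slides do not state explicitly: a straight binary state assignment (S0 = 000 ... S6 = 110), each word packed as Q1+ Q2+ Q3+ Z, don't-care entries stored as 0, and the BCD digit applied serially with the least significant bit first. The names `TABLE`, `build_rom` and `convert` are illustrative.

```python
# State table of Ex-3: state -> (entry for X=0, entry for X=1),
# where each entry is (next_state, Z) and None marks a don't care.
TABLE = {
    0: ((1, 1), (2, 0)),
    1: ((3, 1), (4, 0)),
    2: ((4, 0), (4, 1)),
    3: ((5, 0), (5, 1)),
    4: ((5, 1), (6, 0)),
    5: ((0, 0), (0, 1)),
    6: ((0, 1), None),
}


def build_rom():
    """Address = Q1 Q2 Q3 X (X is the LSB); word = Q1+ Q2+ Q3+ Z."""
    rom = []
    for address in range(16):
        state, x = address >> 1, address & 1
        entry = TABLE.get(state, (None, None))[x]
        rom.append(0 if entry is None else (entry[0] << 1) | entry[1])
    return rom


ROM = build_rom()


def convert(digit):
    """Feed a BCD digit LSB first through the ROM plus a 3-bit state register."""
    state, result = 0, 0
    for i in range(4):
        x = (digit >> i) & 1
        word = ROM[(state << 1) | x]
        state, z = word >> 1, word & 1   # register update and Mealy output
        result |= z << i
    return result


print(" ".join(format(word, "X") for word in ROM))
print([convert(d) for d in range(10)])
```

Running the converter over all ten BCD digits and comparing each result with digit + 3 checks both the state table and the derived ROM words in one pass.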
Programmable Logic Array (PLA)
• A PLA with n input lines and m output lines can realize m functions of n variables.
• Compared to a ROM, an AND array is used instead of the decoder, to realize the required product terms.
• The OR array then sums these product terms to form the outputs.
Ex-4: Using PLA, realize the following functions:
F0 = Σm(0,1,4,6) = A'B' + AC'
F1 = Σm(2,3,4,6,7) = B + AC'
F2 = Σm(0,1,2,6) = A'B' + BC'
F3 = Σm(2,3,5,6,7) = AC + B
Solution: There are 3 inputs: A, B & C. There are 5 distinct product terms in the 4 outputs (A'B', AC', B, BC', AC). Unlike ROM, in a PLA implementation, the product terms can be shared among the functions.
• Instead of AND-OR logic, a PLA may use NOR-NOR logic.
• A 2-input NOR gate can be built using nMOS transistors.
• NOR-NOR with inverters at input and output = AND-OR.
Ex-5: Using PLA, realize the following functions:
F1 = Σm(2,3,5,7,8,9,10,11,13,15)
F2 = Σm(2,3,5,6,7,10,11,14,15)
F3 = Σm(6,7,8,9,13,14,15)
Solution: After minimization, the simplified functions are:
F1 = bd + b'c + ab'
F2 = c + a'bd
F3 = bc + ab'c' + abd
Here, the PLA requires 8 different product terms.
• To reduce the number of rows in the PLA, these functions can be reorganized using K-maps so that product terms are shared:
F1 = a'bd + abd + b'c + ab'c'
F2 = b'c + bc + a'bd
F3 = bc + ab'c' + abd
There are now only 5 different product terms, and the PLA table has only 5 rows.
• In a PLA, the number of terms in each individual equation is not important, because the size of the PLA depends on the total number of distinct product terms (rows), not on the number of terms within any one equation.
• To reduce the number of rows in a PLA, the Espresso algorithm can be used instead of K-maps. It is a complex logic minimization algorithm that is used in VLSI synthesis.
• With F1 = a'bd+abd+b'c+ab'c', F2 = b'c+bc+a'bd and F3 = bc+ab'c'+abd, the PLA implementation has 4 inputs, 5 product terms & 3 outputs.
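The two realizations in Ex-5 can be checked against the minterm lists, and their PLA row counts compared, with a small Python sketch. The term syntax (a letter optionally followed by ' for a complemented literal, with a as the most significant variable) and the names `parse`, `minterms` and `pla_rows` are assumptions made for illustration.

```python
from itertools import product

SPEC = {
    "F1": {2, 3, 5, 7, 8, 9, 10, 11, 13, 15},
    "F2": {2, 3, 5, 6, 7, 10, 11, 14, 15},
    "F3": {6, 7, 8, 9, 13, 14, 15},
}
MINIMIZED = {"F1": ["bd", "b'c", "ab'"], "F2": ["c", "a'bd"],
             "F3": ["bc", "ab'c'", "abd"]}
SHARED = {"F1": ["a'bd", "abd", "b'c", "ab'c'"], "F2": ["b'c", "bc", "a'bd"],
          "F3": ["bc", "ab'c'", "abd"]}


def parse(term):
    """Turn "a'bd" into [('a', 0), ('b', 1), ('d', 1)]."""
    literals, i = [], 0
    while i < len(term):
        complemented = i + 1 < len(term) and term[i + 1] == "'"
        literals.append((term[i], 0 if complemented else 1))
        i += 2 if complemented else 1
    return literals


def minterms(terms, names="abcd"):
    """Set of minterm numbers covered by a sum of product terms."""
    covered = set()
    for index, bits in enumerate(product((0, 1), repeat=len(names))):
        value = dict(zip(names, bits))
        if any(all(value[v] == want for v, want in parse(t)) for t in terms):
            covered.add(index)
    return covered


def pla_rows(realization):
    """Each distinct product term needs one row of the AND array."""
    return len({t for terms in realization.values() for t in terms})


for name, wanted in SPEC.items():
    print(name, minterms(MINIMIZED[name]) == wanted, minterms(SHARED[name]) == wanted)
print(pla_rows(MINIMIZED), pla_rows(SHARED))
```

Both realizations cover exactly the specified minterms; they differ only in how many distinct product terms, and therefore PLA rows, they need.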
Programmable Array Logic (PAL)
• It is a special case of PLA, in which the AND array is programmable and the OR array is fixed.
• For this reason, a PAL is less expensive than a PLA, and is easier to program as well.
• The following figure represents a segment of an un-programmed PAL, along with the input buffers, each of which provides two outputs (the true and the complemented form of the input).
Ex-6: Implement I1I2' + I1'I2.
Solution:
• As the OR gates cannot be programmed, AND terms cannot be shared among two or more OR gates.
• Typical PALs have 10 to 20 inputs and 2 to 10 outputs, with 2 to 8 AND gates driving each OR gate.
Ex-7: Implement a full-adder using PAL. Solution: SUM = X'Y'Cin+X'YCin'+XY'Cin'+XYCin COUT = XY+YCin+XCin
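As a quick check of the two sum-of-products equations in Ex-7, the Python sketch below evaluates them for all eight input combinations and compares the result with ordinary addition. The function name `full_adder` is illustrative only.

```python
from itertools import product


def full_adder(x, y, cin):
    """Evaluate the SUM and COUT equations exactly as written in Ex-7."""
    nx, ny, ncin = 1 - x, 1 - y, 1 - cin   # complemented inputs
    total = (nx & ny & cin) | (nx & y & ncin) | (x & ny & ncin) | (x & y & cin)
    cout = (x & y) | (y & cin) | (x & cin)
    return total, cout


for x, y, cin in product((0, 1), repeat=3):
    print(x, y, cin, full_adder(x, y, cin))
```

SUM uses four product terms and COUT uses three, so each output fits within the 2 to 8 AND gates per OR gate that the slides quote for typical PALs.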
• PALs that also contain D flip-flops were made available; these are called "sequential PALs".
Ex-8: Implement Q+ = D = A'BQ' + AB'Q.
Solution:
PLD/GAL (Programmable Logic Device / Generic Array Logic)
• PALs and PLAs are good for implementing small circuits, but the early devices are not re-programmable.
• When they are made erasable/reprogrammable, by incorporating Flash memory, such PALs are often referred to as PLDs/GALs.
• An example is the 22CEV10, a CMOS electrically erasable PLD that can realize both combinational and sequential circuits.
• The 22CEV10 contains:
  - 12 dedicated input pins
  - 10 pins that can be programmed as input / output
  - A programmable AND array (8 to 16 AND gates feeding each OR gate)
  - 10 OR gates, each of which drives an output macrocell
  - 10 D flip-flops, with asynchronous reset and synchronous preset
  - In each macrocell: the D flip-flop, multiplexers, and additional programmability at the output
• 22CEV10 => 22 pins, out of which 10 are bidirectional.
• Each macrocell has 2 programmable interconnect bits: S1 & S0.
• When a particular bit is programmed, it is connected to 0 V.
• Erasing that bit disconnects it from 0 V, and it floats at logic-1.
S1 | S0 | Output
0 | 0 | D flip-flop output
0 | 1 | D flip-flop output, inverted
1 | 0 | OR output
1 | 1 | OR output, inverted
• CAD programs are available for PAL/PLD programming. These programs accept logic equations, truth tables, state graphs or state tables as inputs.
• They automatically generate the required bit patterns, which are downloaded into a PLD programmer that creates the necessary connections in the device.
• PALASM (PAL Assembler) from MMI & AMD, and ABEL (Advanced Boolean Expression Language) from Data I/O, are two popular languages used for this purpose.
CPLD (Complex Programmable Logic Device)
• A CPLD is a programmable IC that is equivalent to several PLDs on the same silicon chip. Typically, a CPLD comprises 500 to 10,000 logic gates.
• It consists of a number of PAL-like logic blocks, along with a programmable interconnect. The interconnect matrix is implemented using a crossbar switch. Although this is expensive, it results in predictable timing.
• CPLDs are electrically erasable and reprogrammable, and hence are sometimes referred to as EPLDs (Erasable Programmable Logic Devices).
• Typically, a CPLD contains a number of macrocells that are grouped into function blocks.
• Each macrocell contains a flip-flop and an OR gate, and has its inputs connected to an AND gate array.
• The major manufacturers of CPLDs are Xilinx, Altera, Lattice, Cypress and Atmel.
AN EXAMPLE XILINX COOLRUNNER (XCR3046XL)
• This CPLD has 4 function blocks, and each block has 16 associated macrocells. A function block is a programmable AND-OR array, which is configured as a PLA. • Each macrocell contains a flip-flop and additional multiplexers, that route the signals from the function blocks to the I/O blocks or to the interconnect array. • The interconnect array selects signals from the macrocell outputs and the I/O blocks, and connects them back to function blocks. Thus, a signal generated from any function block can be used as an input to any other function block.
Ex-9: Implement a Mealy sequential machine with 2 inputs and 2 outputs.
• First, the two D inputs for the flip-flops have to be generated.
• Then, the two outputs (Z1, Z2) have to be generated, by utilizing the flip-flop outputs.
• Hence, four macrocells are required for the implementation of the required Mealy machine.
Ex-10: Implement a parallel adder with accumulator.
• The accumulator register needs one flip-flop for each bit.
• Each bit position must also generate the sum and the carry corresponding to that bit.
• Hence, each bit of the adder requires two macrocells: one for the sum and the accumulator flip-flop, and the other for the carry.
FPGA (Field Programmable Gate Array)
• FPGAs contain an array of identical logic blocks with programmable interconnections.
• The user can program the function realized by each logic block, and can flexibly program the connections between the blocks.
MPGA versus FPGA
Advantages of FPGA | Disadvantages of FPGA
The time-to-market of an FPGA product is much shorter. | FPGAs are less dense than MPGAs.
With an FPGA, it is easier to correct mistakes in the design. | FPGAs are slower, due to the RC delay in the programmable points.
The prototyping cost is much reduced with the usage of an FPGA. | Interconnect delays in FPGAs are unpredictable.
At low volumes, FPGAs are cheaper than MPGAs. | The programming overhead is much higher, because of the resources required for it.
Compared to a CPLD, the major advantage of an FPGA is its highly flexible programmable interconnect. For the same reason, its major disadvantage is the unpredictable interconnect delay.
FPGA typically contains three programmable elements: 1. Programmable logic blocks (Configurable Logic Blocks) 2. Programmable routing resources 3. Programmable I/O blocks
• Programmable logic blocks
  - These are built from muxes, LUTs, and AND-OR arrays.
  - Programming refers to: a) changing the contents of the LUT, b) changing the I/O signals of the muxes, c) selecting or not selecting particular gates in the AND-OR arrays.
• Programmable interconnect
  - For making or breaking specific connections.
  - For connecting various blocks in the chip to each other.
  - For connecting specific I/O pins to specific logic blocks.
• Programmable I/O blocks
  - I/O pads can be programmed as input, output or bidirectional.
  - They can also be programmed as inverting, non-inverting, tri-state, slew rate adjustable, passive pull-up, etc.
Architectures of FPGA
• Based on the topology in which the logic blocks and the interconnect resources are distributed inside the chip, four basic FPGA architectures have been on the market since the 1980s:
  - Matrix based architecture
  - Row based architecture
  - Hierarchical PLD architecture
  - Sea-of-gates architecture
• Modern FPGAs also contain special purpose blocks, including microprocessors.
1. Matrix based architecture (e.g., most Xilinx FPGAs)
• This architecture is also called a "symmetrical array". It contains 8 × 8 arrays in smaller chips, and 100 × 100 or larger arrays in larger chips.
• Routing is called two-dimensional channeled routing, since routing resources are available in both the horizontal and the vertical directions.
2. Row based architecture (e.g., some Microsemi FPGAs) • The logic blocks are organized into rows, and hence, there are rows of logic blocks, and rows of routing resources. • Routing is called one-dimensional channeled routing, as the routing resources are channeled between the rows.
3. Hierarchical PLD architecture (e.g., Altera APEX20, APEX II) • At the lower level, the FPGAs contain clusters of logic blocks with localized resources for interconnection. • At the higher level, the global interconnect is used for interconnection between the clusters of logic blocks.
4. Sea-of-gates architecture (e.g., Microsemi Fusion) • FPGAs contain a large number of gates, and there is an interconnect superimposed on the sea-of-gates. • There are other terminologies such as sea-of-cells or sea-of-tiles, to indicate the topology with a large number of logic blocks.
FPGA Programming Technologies
• The term "programming technology" denotes the technology by which the programmability in an FPGA is achieved, especially for the programmable interconnect.
• Some of the techniques are:
  - SRAM programming technology
  - EPROM / EEPROM / Flash programming technology
  - Antifuse programming technology
SRAM Programming Technology • As in the case of ROM, an SRAM can be used to store the “configuration bits” for interconnection, in an LUT. • e.g., Sixteen SRAM cells can implement any function of four variables. • The programmable interconnect can be achieved by SRAM, in the following two ways: • Pass transistor is used for connecting two points • Routing matrices are implemented by using mux
Disadvantages of SRAM Programming Technology
1. Six transistors are required for every SRAM cell.
  • e.g., if an FPGA has 1 million programmable points, 6 million transistors are required for achieving this programmability.
2. Since SRAM is volatile, all the contents are lost during power failure. This is a serious drawback when an FPGA is used in the final product.
  • As a solution, an EPROM can be used as a "boot ROM" to store the configuration bits, and its contents can be transferred to the SRAM whenever power is restored.
Advantages of SRAM Programming Technology
1. As SRAM can be rewritten any number of times, new contents can be written again and again, thus providing flexibility during prototyping.
2. The fabrication steps for manufacturing SRAM are the same as those for manufacturing other logic cells.
EPROM / EEPROM / Flash Programming Technology • Instead of SRAM, EPROM cells are used to control the programmable interconnections. Each EPROM cell contains a MOSFET, which has two gates: Control gate and Floating gate. • The drain of the transistor can be connected to VDD by means of a pull-up resistor. When a high voltage (10 - 13 V) is applied to the control gate, electrons get injected into the floating gate, and the transistor turns OFF. • The electrons remain trapped at the floating gate. The trapped negative charges can be removed, by exposing the EPROM to UV light.
Disadvantages of EPROM Programming Technology
1. EPROM is slower than SRAM, because of the dual-gate structure.
2. During manufacturing, EPROMs require more processing steps than SRAMs.
3. EPROM based switches have high ON-resistance, and also have high static power consumption.
4. For erasure, the EPROM chip has to be physically removed from the PCB.
EEPROM and Flash
• EEPROM is similar to EPROM, but the gate charge can be removed electrically. Hence, the chip need not be removed from the PCB for erasure.
• The memory cells can be selectively erased and rewritten, and this does not require any additional equipment.
• Flash is a form of EEPROM in which a block of cells can be erased at once, by applying a large voltage at the control gate, which pulls the trapped electrons off.
• Each Flash cell can store multiple bits of information: the number of trapped electrons determines the amount of current flow, which is sensed when the cell is read.
• For writing, Flash is faster than EEPROM, but slower than SRAM.
Antifuse Programming Technology
• An antifuse programming element changes from high resistance (open - OFF) to low resistance (closed - ON) when a high voltage is applied.
• Antifuses are built from dielectric layers between N+ diffusion and polysilicon layers, or from amorphous silicon between metal layers.
Advantages:
• Compared to MOSFET based switches, the area consumed by an antifuse is smaller.
• Antifuse based connections are faster than those of SRAM / EPROM technologies.
Disadvantages:
• The antifuse connection is one-time programmable (OTP).
• Because of this, the design cannot be changed once the device has been programmed.
Comparison of FPGA Programming Technologies
Programming technology | Storage | Programmability | Area overhead | Resistance | Capacitance
SRAM | Volatile | In-circuit reprogrammable | Large | Medium to high | High
EPROM | Non-volatile | Out-of-circuit reprogrammable | Small | High | High
EEPROM / Flash | Non-volatile | In-circuit reprogrammable | Medium to large | High | High
Antifuse | Non-volatile | Not reprogrammable | Small | Low | Low
I. Programmable logic block architectures
• Manufacturers use different names to denote their logic blocks:
  - Xilinx calls them Configurable Logic Blocks (CLBs).
  - Microsemi calls them VersaTiles.
  - Altera calls them Logic Elements (LEs), and a group of LEs is called a Logic Array Block (LAB).
• Mainly two types of logic blocks are used in FPGAs:
  1. LUT based programmable logic blocks.
  2. Mux based programmable logic blocks.
• A Look-Up Table contains memory cells along with a multiplexer.
• The output for each input combination is stored in the memory cells.
• The input combination is applied to the control (select) inputs of the multiplexer.
• For a 2-variable function, 4 memory cells and a 4:1 mux are required.
• For an n-input function, 2^n memory cells and a 2^n:1 mux are required.
1. LUT (Look-Up Table) based Programmable logic block • Each block contains two LUT4 and two flip-flops. • The LUT4 can generate any one function of 4 variables. • The flip-flop has chip enable, set and reset inputs. • A multiplexer is used to select in between the combinational and the latched version of the LUT4 output. • The multiplexer is controlled by a bit stored in memory.
Ex-11: Implement the function F1 = A'B'C + A'BC' + AB, using LUT.
• Choosing X1 as LSB and X4 as MSB (with C on X1, B on X2 and A on X3), the X4 input need not be used, as F1 uses only 3 variables.
• To store the contents in the LUT, the truth table of the function has to be constructed.
• From the truth table, the contents of the LUT to implement the function F1 will be {0,1,1,0,0,0,1,1}.
• As the LUT4 contains 16 memory cells, it is better to store the same 8 bits in the other half as well, so that the output is correct irrespective of the status of X4.
• Thus, the contents of the LUT are {0,1,1,0,0,0,1,1,0,1,1,0,0,0,1,1}.
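The LUT contents in Ex-11 follow directly from enumerating the truth table. The Python sketch below does this for a 4-input LUT; the pin mapping (C on X1, B on X2, A on X3, X4 unused) matches the choice of X1 as LSB described above, while the names `f1`, `lut4_contents` and `lut_read` are assumptions made for illustration.

```python
def f1(a, b, c):
    """F1 = A'B'C + A'BC' + AB."""
    return ((1 - a) & (1 - b) & c) | ((1 - a) & b & (1 - c)) | (a & b)


def lut4_contents(function):
    """One stored bit per address X4 X3 X2 X1, with X1 as the LSB.

    X4 is ignored by the function, so the 8-bit pattern repeats in both halves.
    """
    contents = []
    for address in range(16):
        x1, x2, x3 = address & 1, (address >> 1) & 1, (address >> 2) & 1
        contents.append(function(a=x3, b=x2, c=x1))
    return contents


LUT = lut4_contents(f1)


def lut_read(x4, x3, x2, x1):
    """The inputs act as mux select lines that pick one memory cell."""
    return LUT[(x4 << 3) | (x3 << 2) | (x2 << 1) | x1]


print(LUT)
```

Because the table simply stores every output bit, no minimization of F1 is needed before programming the LUT.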
2. Multiplexer based Programmable logic block • With LUT, it is not necessary to minimize the function, as the number of terms in the function is not important (all o/p bits need to be stored). • But LUT requires storage space. To save it, multiplexers along with basic gates, can be used.
Ex-12: Implement the function F1 = A'B'C + A'BC' + AB, using mux.
• As there are 3 variables, we can choose a 4:1 mux, which has 2 select lines.
• The truth table can be constructed so as to define the output in terms of C.
• The mux select lines can be A & B, and the mux input lines can be connected in accordance with the last column of the truth table.
A | B | C | F1 | Mux input in terms of C
0 | 0 | 0 | 0 | C
0 | 0 | 1 | 1 | C
0 | 1 | 0 | 1 | C'
0 | 1 | 1 | 0 | C'
1 | 0 | 0 | 0 | 0
1 | 0 | 1 | 0 | 0
1 | 1 | 0 | 1 | 1
1 | 1 | 1 | 1 | 1
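The mux realization of Ex-12 can be modeled in a few lines of Python: A and B form the select value, and the four data inputs are C, C', 0 and 1 as read from the last column of the table. The names `mux4` and `f1_mux` are illustrative.

```python
def mux4(select, inputs):
    """4:1 multiplexer: pass the data input chosen by the 2-bit select value."""
    return inputs[select]


def f1_mux(a, b, c):
    """F1 = A'B'C + A'BC' + AB realized with one 4:1 mux.

    Select lines are A (MSB) and B (LSB); data inputs are
    I0 = C, I1 = C', I2 = 0, I3 = 1.
    """
    data_inputs = (c, 1 - c, 0, 1)
    return mux4((a << 1) | b, data_inputs)


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, f1_mux(a, b, c))
```

Compared with the 16 storage cells of a LUT4, this realization needs only the mux and one inverter for C', which is the storage saving the mux based block aims at.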
II. Programmable interconnect
1. General purpose interconnect
• The interconnect between the logic blocks should provide flexible connections among the rows and columns (e.g., row-column, row-row, column-column).
• A completely non-blocking switch matrix is very expensive. For example, in a 4 × 4 matrix, only 4 of the 16 switches are utilized at any point of time.
• To reduce the number of multiple connections for a single route, the crosspoint can be configured as a 6-way switch. However, this crosspoint is more complicated than the earlier one.
(Figures: crosspoint switch matrix; 6-way switch)
2. Direct interconnect
• To reduce the delay in the switch matrix, many FPGAs provide direct connections between the logic blocks, by means of dedicated switches.
(Figures: direct interconnect to 4 neighbors; special connections to 8 neighbors)
3. Global interconnect lines
• For high fan-out and low-skew clock distribution, FPGAs provide routing lines that span the entire width & height of the device.
• When the clock is distributed to a few million gates in the chip, the delay in the wires causes the clock edges to arrive at different times at different parts of the chip. This is called "clock skew", and it needs to be minimized for the circuitry on the chip to function correctly.
Interconnects in row-based FPGAs
• The interconnects discussed previously apply to the matrix-based architecture, which has symmetrical arrays.
• The row-based architecture is one-dimensional, so it has arrays of switches in the routing channels, which are situated between the rows of logic blocks.
• When three connections x, y & z are required (the example nets in the figure), they can be routed in 2 ways:
  i) Non-segmented (full length track, faster)
  ii) Segmented (reduced resources, slower)
III. Programmable I/O blocks
• I/O blocks on modern FPGAs allow the use of a pin as true or inverted, direct or latched, input or output, and so on.
• The I/O options can be selected by means of the configuration memory cells, indicated in the figure as "M".
• The inversion is performed using an XOR gate and one memory bit.
• The direction of the pin is decided using a tri-state buffer, and its control can be selected as active high or active low, using another memory bit.
• Similarly, the rate of change of the output (slew rate) and the pull-up option (open drain, built-in resistor) can be configured using the memory cells (SRAM, EEPROM / Flash, antifuse).
Dedicated Specialized Components in FPGA
1. Dedicated memory: The embedded RAM can be used to implement the memory needs of the circuit that is being designed.
2. Dedicated arithmetic units: A custom implementation of adders and multipliers inside the FPGA is smaller and faster than an equivalent built from the general-purpose logic blocks of the FPGA.
3. DSP blocks: To support DSP applications, the vendors provide hardware inside the FPGA for encryption/decryption, FFTs, FIR filters, IIR filters, compression/decompression, and so forth.
4. Embedded processors: This is a hybrid solution in which part of the design runs on a programmable processor (high flexibility), and the remaining part is implemented in hardware (better performance).
5. Content addressable memory: This is a special kind of memory in which the content, and not the address, is used to search the memory.
Applications of FPGA
1. Rapid Prototyping
• As FPGAs contain 5 million or more gates, many large real-world systems can be prototyped very quickly using a single FPGA.
• If a single FPGA does not suffice, multiple FPGAs can be interconnected to realize larger systems, by plugging the boards into a backplane.
2. Final Products in Medium Speed Systems
• Circuits realized using FPGAs typically operate in the range of 150-200 MHz. If this speed is sufficient, FPGAs can be used in the final product, and not only in the prototype.
• If enhancements to the system are required in the final product, they can be delivered as software updates, rather than as hardware changes.
3. Glue Logic
• This is digital circuitry that works as an interface between two different logic modules.
• Using SRAM FPGAs, new interface logic can be implemented on the same FPGA.
4. Hardware Accelerators / Coprocessors
• For a software application, an FPGA can be used as a coprocessor that implements a key kernel, and thus accelerates the application.
• Examples of such applications are pattern matching, computer architecture simulators, emulator boards, hardware testing boards, and so on.
Design Flow for FPGA
1. Create a behavioral, RTL or structural model of the design using HDL.
2. Simulate and debug the design.
3. Synthesize the design, targeting the desired device.
4. Run a mapping of the design, which breaks the logic diagram into pieces that fit into the CLBs.
5. Run the place-and-route program, to place the logic blocks in the FPGA and to route the interconnections.
6. Run a program that generates the bit pattern necessary to program the FPGA.
7. Download the bit pattern into the configuration cells and test the operation of the FPGA.
(In the flow diagram, steps 1 & 2, steps 3, 4 & 5, and steps 6 & 7 are grouped together.)
STATE MACHINE CHARTS
• A "State Machine" is used to control a digital system that carries out a step-by-step procedure or an algorithm.
• A "State Diagram" or "State Graph" is used to specify the operation of such a state machine.
• A "State Machine Chart" (SM chart) is an alternative to the state diagram, and has the following advantages:
  - It offers an easier understanding of the digital system.
  - It automatically satisfies the conditions of the state graph (exactly one true transition from a state at any time, and a unique definition of the next state for every input combination).
  - It leads directly to a hardware realization of the system.
• An SM chart contains 3 principal components, as shown: the state box, the decision box and the conditional output box.
• An SM chart is constructed from SM blocks, where each SM block describes the machine operation during one state.
• Therefore, each SM block contains exactly one state box, together with the decision boxes and conditional output boxes that are associated with that particular state.
• Thus, an SM block has exactly one entrance path, and one or more exit paths.
• A path through an SM block from entrance to exit is called as “link path”. • In an SM block, when the system enters that state, the outputs in the state box become true. • e.g., when state S1 is entered, Z1 & Z2 become 1. If X1 = 0, then Z3 & Z4 also become 1. If X2 = 0, then the machine goes to the next state via exit path 1. During this condition, Z5 remains at 0. • If X1 = 1, then Z3 & Z4 remain at 0, and if X3 = 0, then Z5 becomes 1, and the machine goes to the next state via exit path 3.
• A given SM block can be drawn in different forms, as shown in the figure. • Here, Z1 = A + BC. As this is a combinational circuit, there is only one state, and there is no state change. • The second SM chart allows for individual testing of input variables, and the function is, Z1 = A + A'BC, which is the same.
Rules for constructing an SM block 1. For every valid combination of input variables, exactly one exit path must be defined. 2. Within an SM block, no internal feedback is allowed. 3. SM block can be drawn either in a serial form or in a parallel form. Both are equivalent, as all the tests take place within one clock time.
A given state graph can be converted into an equivalent SM chart, as shown. This state graph has 3 Moore outputs (Za, Zb, Zc) and 2 Mealy outputs (Z1, Z2). Hence, the Moore outputs will appear in state boxes and Mealy outputs will appear in conditional output boxes. Each SM block will have only one decision box, as there is only one input variable to be tested.
Example: Derivation of SM chart for a Binary multiplier • Abbreviations: St = Start, Sh = Shift, Ad = Add, M = current multiplier bit, K = completion signal. • If M = 1, the multiplicand is added to the contents of accumulator, followed by a right shift. If M = 0, then the addition is skipped, and only the right shift occurs. • Conversion of the SM chart into Verilog code is a straightforward process. • “case” statement can be used to specify each state, and “if” statement can be used for the conditional output boxes.
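The add-and-shift procedure and its state-by-state control can be sketched as a small behavioral model. The sketch below is written in Python rather than Verilog, purely to illustrate the "one case per state, one if per decision box" structure. Several details are assumptions not given in the text above: 4-bit operands, a single accumulator whose lower half initially holds the multiplier, a shift counter that produces K, and the four states S0 to S3 taken from the state transition table for the multiplier control that appears later in this section. The function name `multiply` is illustrative.

```python
def multiply(mcand, mplier, bits=4):
    """Simulate the multiplier control state by state; return (product, trace)."""
    mask = (1 << bits) - 1
    assert 0 <= mcand <= mask and 0 <= mplier <= mask
    state, acc, count, trace = "S0", 0, 0, []
    st = True                         # St: start request, assumed already asserted
    while True:
        trace.append(state)
        m = acc & 1                   # M: current multiplier bit
        k = count == bits - 1         # K: the shift in this state is the last one
        if state == "S0":
            if st:                    # Load: multiplier into the lower half
                acc, count, state = mplier, 0, "S1"
        elif state == "S1":
            if m:                     # Ad: add multiplicand to the upper half
                acc += mcand << bits
                state = "S2"
            else:                     # Sh: addition skipped, right shift only
                acc >>= 1
                count += 1
                state = "S3" if k else "S1"
        elif state == "S2":           # Sh: right shift that follows an add
            acc >>= 1
            count += 1
            state = "S3" if k else "S1"
        else:                         # S3: Done, the product is in acc
            return acc, trace


product_value, states = multiply(13, 11)
print(product_value, states)
```

Each `if`/`elif` branch plays the role of one `case` item, and the nested `if` statements correspond to the decision boxes on M and K.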
Verilog code for the Binary multiplier
Realization of SM charts
Example-1:
• As there are 3 states, the state assignments can be 00, 01 & 11.
• Taking these values as A & B, the outputs are Za = A'B', Zb = A'B, Zc = AB, Z1 = ABX', Z2 = ABX.
• From link paths 2 & 3, the next state of A can be written as A+ = A'BX + ABX.
• From link paths 1, 2 & 3, the next state of B can be written as B+ = A'B'X + A'BX + ABX.
Procedure for deriving the next state equation of a flip-flop Q
1. Perform state assignment for all of the states.
2. Write the output equations directly from the SM chart.
3. Identify all the states in which Q = 1.
4. For each of these states, find all the link paths that lead into the state.
5. For each link path, find a term that is equal to 1 exactly when that link path is followed.
6. Form the expression for Q+ by ORing all of these terms.
7. Realize Q+ using a D flip-flop and a combinational circuit.
Example-2:
• As there are 4 states, the state assignments can be 00, 01, 10 & 11, respectively for S0, S1, S2 & S3.
• Load = A'B'St, Ad = A'BM, Sh = A'BM' + AB'
• A is true in S2 & S3. Hence, A+ = A'BM + A'BM'K + AB'K
• B is true in S1 & S3. Hence, B+ = A'B'St + A'BM'K' + AB'K' + A'BM'K + AB'K, which simplifies to B+ = A'B'St + A'BM' + AB'
State transition table for multiplier control
State | A B | St M K | A+ B+ | Load Sh Ad Done
S0 | 0 0 | 0 - - | 0 0 | 0 0 0 0
S0 | 0 0 | 1 - - | 0 1 | 1 0 0 0
S1 | 0 1 | - 0 0 | 0 1 | 0 1 0 0
S1 | 0 1 | - 0 1 | 1 1 | 0 1 0 0
S1 | 0 1 | - 1 - | 1 0 | 0 0 1 0
S2 | 1 0 | - - 0 | 0 1 | 0 1 0 0
S2 | 1 0 | - - 1 | 1 1 | 0 1 0 0
S3 | 1 1 | - - - | 0 0 | 0 0 0 1
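The next state and output equations of Example-2 can be checked against the state transition table by expanding every don't-care entry. The Python sketch below does so; Done = AB is read off the table (Done is 1 only in S3 = 11) rather than from a stated equation, and the names `control`, `ROWS`, `expand` and `mismatches` are illustrative.

```python
from itertools import product


def control(a, b, st, m, k):
    """Evaluate A+, B+, Load, Sh, Ad and Done from the present state and inputs."""
    na, nb, nm = 1 - a, 1 - b, 1 - m
    a_next = (na & b & m) | (na & b & nm & k) | (a & nb & k)
    b_next = (na & nb & st) | (na & b & nm) | (a & nb)
    load = na & nb & st
    sh = (na & b & nm) | (a & nb)
    ad = na & b & m
    done = a & b                      # assumption: taken from the table rows for S3
    return a_next, b_next, load, sh, ad, done


# (A B, St M K with '-' as don't care, expected A+ B+ Load Sh Ad Done)
ROWS = [
    ("00", "0--", (0, 0, 0, 0, 0, 0)),
    ("00", "1--", (0, 1, 1, 0, 0, 0)),
    ("01", "-00", (0, 1, 0, 1, 0, 0)),
    ("01", "-01", (1, 1, 0, 1, 0, 0)),
    ("01", "-1-", (1, 0, 0, 0, 1, 0)),
    ("10", "--0", (0, 1, 0, 1, 0, 0)),
    ("10", "--1", (1, 1, 0, 1, 0, 0)),
    ("11", "---", (0, 0, 0, 0, 0, 1)),
]


def expand(pattern):
    """Yield every 0/1 assignment that matches a pattern containing '-'."""
    return product(*[(0, 1) if ch == "-" else (int(ch),) for ch in pattern])


def mismatches():
    """List every (state, St, M, K) where the equations disagree with the table."""
    bad = []
    for state, inputs, expected in ROWS:
        a, b = int(state[0]), int(state[1])
        for st, m, k in expand(inputs):
            if control(a, b, st, m, k) != expected:
                bad.append((state, st, m, k))
    return bad


print(mismatches())
```

An empty list means the equations and the table agree for all 32 combinations of A, B, St, M and K.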
System design using HDL - Module 3
