Velammal Engineering College Department of Computer Science and Engineering Welcome… Dr.S.Gunasundari Mr. A. Arockia Abins & Ms. R. Amirthavalli, CSE, Velammal Engineering College
Subject Code / Name: 19IT202T / Computer Architecture
Syllabus – Unit V UNIT-V MEMORY & I/O SYSTEMS Memory Hierarchy – memory technologies – Cache Memory – Performance Considerations – Virtual Memory, TLBs – Accessing I/O devices – Interrupts – Direct Memory Access – Bus Structure – Bus operation.
Text Books ● Book 1: ● Name: Computer Organization and Design: The Hardware/Software Interface ● Authors: David A. Patterson and John L. Hennessy ● Publisher: Morgan Kaufmann / Elsevier ● Edition: Fifth Edition, 2014 ● Book 2: ● Name: Computer Organization and Embedded Systems ● Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian ● Publisher: Tata McGraw Hill ● Edition: Sixth Edition, 2012
The Memory System
Memory ● Computer memory is the storage space in the computer system where both data and program instructions are stored.
Memory size ● bit – ‘0’ or ‘1’ ● 8 bits – 1 byte
Traditional Architecture Figure: connection of the memory to the processor. The processor's MAR places a k-bit address on the address bus (up to 2^k addressable locations), and the MDR exchanges data over the n-bit data bus (word length = n bits); control lines (R/W, MFC, etc.) coordinate the transfer.
Basic Concepts ● The maximum size of the memory that can be used in any computer is determined by the addressing scheme. 16-bit addresses → 2^16 = 64K memory locations. ● Most modern computers are byte addressable. Figure: byte and word addresses for a memory with 32-bit words (word addresses 0, 4, 8, …, 2^k - 4): (a) big-endian assignment – lower byte addresses hold the more significant bytes of a word; (b) little-endian assignment – lower byte addresses hold the less significant bytes.
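The two byte-address assignments can be checked quickly in software. A minimal sketch in Python (the example word value is assumed for illustration):

```python
import struct

# Pack the 32-bit word 0x12345678 under each byte ordering and
# inspect which byte ends up at the lowest address (index 0).
word = 0x12345678

big = struct.pack(">I", word)     # big-endian: MSB at the lowest address
little = struct.pack("<I", word)  # little-endian: LSB at the lowest address

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```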
Internal organization of memory chips
Memory hierarchy
Memory hierarchy ● A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
Speed, Size, and Cost Figure: memory hierarchy – processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory. Going down the hierarchy, size and access time increase; going up, speed and cost per bit increase.
● Check whether the following statements are true or not: ● Most of the cost of the memory hierarchy is at the highest level. ● Most of the capacity of the memory hierarchy is at the lowest level.
Basic Terms in Memory 1. Hit: If the data requested by the processor appears in some block in the upper level, this is called a hit. 2. Miss: If the data is not found in the upper level, the request is called a miss. The lower level in the hierarchy is then accessed to retrieve the block containing the requested data. 3. Hit rate (hit ratio): The fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy. 4. Miss rate: The fraction of memory accesses not found in the upper level. 5. Hit time: The time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss. 6. Miss penalty: The time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to the processor.
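These terms combine into the standard average memory access time (AMAT) formula: hit time + miss rate × miss penalty. A small sketch (the cycle counts are assumed example values, not from the slides):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty.
print(amat(1, 0.05, 100))  # about 6 cycles per access on average
```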
Memory Technologies Four primary technologies: 1. Main memory -- DRAM 2. Cache memory -- SRAM 3. Flash memory -- nonvolatile memory, used as the secondary memory in personal mobile devices 4. Secondary memory -- magnetic disk
SRAM Cell ◼ Two transistor inverters are cross-connected to implement a basic flip-flop. ◼ The cell is connected to one word line and two bit lines by transistors T1 and T2. ◼ When the word line is at ground level, the transistors are turned off and the latch retains its state. ◼ Read operation: to read the state of the SRAM cell, the word line is activated to close switches T1 and T2. Sense/Write circuits at the bottom monitor the state of bit lines b and b′. Figure: an SRAM cell (cross-coupled inverters X and Y, transistors T1 and T2, word line, bit lines b and b′).
SRAM Technology ● The circuits are capable of retaining their state as long as power is applied. ● value is stored on a pair of inverting gates ● very fast but takes up more space than DRAM (4 to 6 transistors) ● SRAMs are said to be volatile memories because their contents are lost when power is interrupted. ● Static RAMs can be accessed very quickly
DRAMs ● Static RAMs are fast, but they take more area and are more expensive. ● Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their state indefinitely – they need to be refreshed periodically. ● The value is stored as a charge on a capacitor (and must be refreshed). ● Very small, but slower than SRAM (by a factor of 5 to 10). Figure: a single-transistor dynamic memory cell (transistor T, capacitor C, word line, bit line).
◼ Static RAMs (SRAMs): ▪ Consist of circuits that are capable of retaining their state as long as the power is applied. ▪ Volatile memories, because their contents are lost when power is interrupted. ▪ Access times of static RAMs are in the range of few nanoseconds. ▪ However, the cost is usually high. ◼ Dynamic RAMs (DRAMs): ▪ Do not retain their state indefinitely. ▪ Contents must be periodically refreshed. ▪ Contents may be refreshed while accessing them for reading.
Questions ◼ What is meant by memory hierarchy? ◼ What factors should be considered in a memory hierarchy? ◼ Define hit and miss. ◼ Compare SRAM and DRAM.
Latency & Bandwidth ◼ Memory latency is the time it takes to transfer a word of data to or from memory ◼ Memory bandwidth is the number of bits or bytes that can be transferred in one second.
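Latency and bandwidth together determine how long a block transfer takes: a fixed latency to reach the first word, plus streaming time for the rest. A minimal sketch (all numbers are assumed for illustration):

```python
def transfer_time(latency_s, size_bytes, bandwidth_bytes_per_s):
    # Total time = latency to deliver the first word + time to stream
    # the block at the sustained bandwidth.
    return latency_s + size_bytes / bandwidth_bytes_per_s

# Assumed: 50 ns latency, 64-byte cache block, 8 GB/s memory bandwidth.
t = transfer_time(50e-9, 64, 8e9)
print(t)  # about 58 ns (5.8e-08 s)
```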
Flash Memory ▪ Has similar approach to EEPROM. ▪ Read the contents of a single cell, but write the contents of an entire block of cells. ▪ Flash devices have greater density. ▪ Higher capacity and low storage cost per bit. ▪ Power consumption of flash memory is very low, making it attractive for use in equipment that is battery-driven. ▪ Single flash chips are not sufficiently large, so larger memory modules are implemented using flash cards and flash drives.
Secondary Memory ● Permanent memory or Non-volatile memory ● Used to store programs and data on a long-term basis ● Storage capacity : very high (ex: 1 TB) ● Access speed : very slow
Organization of Data on a Disk Figure: organization of one surface of a disk, showing sector 0 of track 0, sector 0 of track 1, and sector 3 of track n.
Secondary Memory
Magnetic Disk Structure ● The surface is organized into tracks ● Tracks are divided into sectors
Disk Access Head in position above a track
Disk Access Rotation is counter-clockwise
Disk Access – Read About to read blue sector
Disk Access – Read After reading the blue sector
Disk Access – Read After BLUE read Red request scheduled next
Disk Access – Seek After BLUE read Seek for RED Seek to red’s track
Disk Access – Rotational Latency After BLUE read Seek for RED Rotational latency Wait for red sector to rotate around
Disk Access – Read After BLUE read Seek for RED Rotational latency After RED read Complete read of red
Disk Access – Service Time Components After BLUE read Seek for RED Rotational latency After RED read Seek Rotational Latency Data Transfer
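The three service-time components above can be combined into a simple access-time estimate. A sketch with assumed drive parameters (the seek time, RPM, transfer rate, and sector size are illustrative, not from the slides):

```python
def disk_access_time(seek_ms, rpm, transfer_mb_s, sector_kb):
    # Average rotational latency = time for half a revolution.
    rotational_ms = 0.5 * (60_000 / rpm)
    # Data transfer time for one sector at the sustained transfer rate.
    transfer_ms = (sector_kb / 1024) / transfer_mb_s * 1000
    return seek_ms + rotational_ms + transfer_ms

# Assumed: 4 ms average seek, 7200 RPM, 100 MB/s, 4 KB sector.
t = disk_access_time(4, 7200, 100, 4)
print(t)  # roughly 8.2 ms, dominated by seek and rotational latency
```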
Speed, Size, and Cost ◼ A big challenge in the design of a computer system is to provide a sufficiently large memory, with a reasonable speed at an affordable cost. ◼ Static RAM: ▪ Very fast, but expensive, because a basic SRAM cell has a complex circuit making it impossible to pack a large number of cells onto a single chip. ◼ Dynamic RAM: ▪ Simpler basic cell circuit, hence are much less expensive, but significantly slower than SRAMs. ◼ Magnetic disks: ▪ Storage provided by DRAMs is higher than SRAMs, but is still less than what is necessary. ▪ Secondary storage such as magnetic disks provide a large amount of storage, but is much slower than DRAMs.
Cache Memories
Cache ● What is cache memory? ● Why do we need it? ● Locality of reference (very important) - temporal - spatial ● Cache block – cache line ● A set of contiguous address locations of some size
What is cache memory? ● The cache memory is a small-sized high speed volatile memory that provides high speed data access to a processor. ● The cache memory stores the frequently used computer programs, applications and the program data.
Locality of Reference ◼ Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently. ▪ These instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly. ▪ This is called “locality of reference”. ◼ Temporal locality of reference: ▪ A recently executed instruction is likely to be executed again very soon. ◼ Spatial locality of reference: ▪ Instructions with addresses close to a recently executed instruction are likely to be executed soon.
Cache ● Replacement algorithm ● Hit / miss ● Write-through / Write-back ● Load-through Figure: use of a cache memory (Processor ↔ Cache ↔ Main memory).
Cache hit • If the data is in the cache it is called a Read or Write hit. • Read hit: ▪ The data is obtained from the cache. • Write hit: ▪ Cache has a replica of the contents of the main memory. ▪ Contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol. ▪ Update the contents of the cache, and mark it as updated by setting a bit known as the dirty bit or modified bit. The contents of the main memory are updated when this block is replaced. This is write-back or copy-back protocol.
Cache miss • If the data is not present in the cache, then a Read miss or Write miss occurs. • Read miss: ▪ The block of words containing the requested word is transferred from the memory. ▪ After the block is transferred, the desired word is forwarded to the processor. ▪ The desired word may also be forwarded to the processor as soon as it is transferred, without waiting for the entire block to be transferred. This is called load-through or early-restart. • Write miss: ▪ If the write-through protocol is used, then the contents of the main memory are updated directly. ▪ If the write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is then overwritten with the new information.
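The difference between the two write policies can be seen by counting main-memory writes. A minimal single-block model in Python (the class names and the ten-write workload are assumed for illustration):

```python
# Write-through: main memory is updated on every write hit.
class WriteThroughBlock:
    def __init__(self):
        self.mem_writes = 0
    def write(self, value):
        self.value = value
        self.mem_writes += 1       # cache and memory updated together

# Write-back: writes only set the dirty bit; memory is updated on eviction.
class WriteBackBlock:
    def __init__(self):
        self.mem_writes = 0
        self.dirty = False
    def write(self, value):
        self.value = value
        self.dirty = True          # mark the block as modified
    def evict(self):
        if self.dirty:             # copy back only if modified
            self.mem_writes += 1
            self.dirty = False

wt, wb = WriteThroughBlock(), WriteBackBlock()
for v in range(10):                # ten write hits to the same block
    wt.write(v)
    wb.write(v)
wb.evict()                         # the block is eventually replaced
print(wt.mem_writes, wb.mem_writes)  # 10 1
```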
Mapping functions ◼ Mapping functions determine how memory blocks are placed in the cache. ◼ A simple processor example: ▪ Cache consisting of 128 blocks of 16 words each. ▪ Total size of cache is 2048 (2K) words. ▪ Main memory is addressable by a 16-bit address. ▪ Main memory has 64K words. ▪ Main memory has 4K blocks of 16 words each. ◼ Three mapping functions: ▪ Direct mapping ▪ Associative mapping ▪ Set-associative mapping.
Direct Mapping Figure 5.15. Direct-mapped cache. The 16-bit main memory address is divided into three fields: Tag (5 bits) | Block (7 bits) | Word (4 bits). ● Word (4 bits): selects one of the 16 words in a block (16 = 2^4). ● Block (7 bits): points to a particular block in the cache (128 = 2^7). ● Tag (5 bits): compared with the tag bits associated with that cache location, to identify which of the 32 main memory blocks that map there is currently resident (4096/128 = 32). ● Block j of main memory maps onto block (j modulo 128) of the cache.
Direct Mapping Example: main memory address 11101 1111111 1100 (Tag 5 | Block 7 | Word 4). ● Tag: 11101 ● Block: 1111111 = 127, the 127th block of the cache ● Word: 1100 = 12, the 12th word of the 127th block in the cache
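The field extraction for this direct-mapped example can be expressed with shifts and masks. A small sketch in Python (the function name is assumed):

```python
# Split a 16-bit address into Tag (5) | Block (7) | Word (4) fields,
# as used by the direct-mapped cache in the example.
def split_direct(addr):
    word = addr & 0xF            # low 4 bits: word within the block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block index
    tag = addr >> 11             # top 5 bits: tag
    return tag, block, word

addr = 0b1110111111111100       # the slide's example address
print(split_direct(addr))       # (29, 127, 12): tag 11101, block 127, word 12
```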
Associative Mapping Figure 5.16. Associative-mapped cache. The 16-bit main memory address is divided into two fields: Tag (12 bits) | Word (4 bits). ● Word (4 bits): selects one of the 16 words in a block (16 = 2^4). ● Tag (12 bits): identifies which of the 4096 main memory blocks is resident in a cache block (4096 = 2^12). A memory block can be placed in any cache block.
Associative Mapping Example: main memory address 111011111111 1100 (Tag 12 | Word 4). ● Tag: 111011111111 ● Word: 1100 = 12, the 12th word of a block in the cache
Set-Associative Mapping Figure 5.17. Set-associative-mapped cache with two blocks per set. The 16-bit main memory address is divided into three fields: Tag (6 bits) | Set (6 bits) | Word (4 bits). ● Word (4 bits): selects one of the 16 words in a block (16 = 2^4). ● Set (6 bits): points to a particular set in the cache (128/2 = 64 = 2^6). ● Tag (6 bits): checked against the tags of the blocks in that set to determine whether the desired block is present (4096/64 = 64 = 2^6).
Set-Associative Mapping Example: main memory address 111011 111111 1100 (Tag 6 | Set 6 | Word 4). ● Tag: 111011 ● Set: 111111 = 63, the 63rd set of the cache ● Word: 1100 = 12, the 12th word of the matching block in the 63rd set
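The same 16-bit address splits differently under set-associative mapping. A companion sketch (the function name is assumed):

```python
# Split a 16-bit address into Tag (6) | Set (6) | Word (4) fields,
# as used by the two-way set-associative cache in the example.
def split_set_assoc(addr):
    word = addr & 0xF             # low 4 bits: word within the block
    set_idx = (addr >> 4) & 0x3F  # next 6 bits: set index (64 sets)
    tag = addr >> 10              # top 6 bits: tag
    return tag, set_idx, word

addr = 0b1110111111111100        # the slide's example address
print(split_set_assoc(addr))     # (59, 63, 12): tag 111011, set 63, word 12
```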
Replacement Algorithms ● When the cache is full, it is difficult to determine which block to evict. ● A common choice is to replace the Least Recently Used (LRU) block. ● The cache controller tracks references to all blocks as computation proceeds. ● The tracking counters are incremented or cleared as hits and misses occur.
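LRU can be sketched with an ordered map that tracks recency of use instead of per-block counters. A minimal model (the capacity and access sequence are assumed for illustration):

```python
from collections import OrderedDict

# Minimal LRU cache model: insertion order tracks recency of use.
class LRUCache:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.blocks = OrderedDict()
    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # hit: now most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[block] = True
        return "miss"

cache = LRUCache()
print([cache.access(b) for b in [1, 2, 3, 4, 1, 5, 2]])
# ['miss', 'miss', 'miss', 'miss', 'hit', 'miss', 'miss']
```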
● List the mapping functions. ● What is meant by cache memory?
Virtual Memories
Overview ● Physical main memory is not as large as the address space spanned by an address issued by the processor. 2^32 = 4 GB, 2^64 = … ● When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices. ● Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques. ● Virtual addresses are translated into physical addresses.
Virtual Memory ◼ Virtual memory is an architectural solution to increase the effective size of the memory system.
Overview Figure: virtual-memory organization, with the Memory Management Unit (MMU) performing address translation between the processor and the memory.
Address Translation ● All programs and data are composed of fixed- length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory. ● Page cannot be too small or too large. ● The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage – similar to cache.
Address Translation Figure 5.27. Virtual-memory address translation. ● The virtual address from the processor is split into a virtual page number and an offset. ● The page table base register holds the page table address; adding the virtual page number to it selects an entry in the page table. ● The entry contains control bits and the page frame number. ● The page frame number, combined with the offset, forms the physical address in main memory.
Address Translation ● The page table information is used by the MMU for every access, so it is supposed to be with the MMU. ● However, since MMU is on the processor chip and the page table is rather large, only small portion of it, which consists of the page table entries that correspond to the most recently accessed pages, can be accommodated within the MMU. ● Translation Lookaside Buffer (TLB)
TLB Figure 5.28. Use of an associative-mapped TLB. ● The virtual page number from the processor is compared (=?) with the virtual page numbers held in the TLB. ● On a hit, the corresponding page frame number is combined with the offset to form the physical address in main memory. ● On a miss, the page table in main memory must be consulted.
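The TLB lookup with its page-table fallback can be sketched as a small dictionary-based model (the page-table contents and the 4 KB page size are assumed for illustration):

```python
# Virtual-to-physical translation with a tiny TLB backed by a page table.
# 4 KB pages -> 12-bit offset; all table contents are assumed values.
PAGE_BITS = 12
OFFSET_MASK = (1 << PAGE_BITS) - 1

page_table = {0x12: 0x7A, 0x13: 0x7B}   # virtual page number -> page frame
tlb = {}                                 # recently used translations

def translate(vaddr):
    vpn, offset = vaddr >> PAGE_BITS, vaddr & OFFSET_MASK
    if vpn in tlb:                       # TLB hit: no page-table access
        frame = tlb[vpn]
    else:                                # TLB miss: consult the page table
        frame = page_table[vpn]          # (a page fault would arise here
        tlb[vpn] = frame                 #  if the page were not resident)
    return (frame << PAGE_BITS) | offset

print(hex(translate(0x12ABC)))  # 0x7aabc
```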
Address translation (contd..) ◼ What happens if a program generates an access to a page that is not in the main memory? ◼ In this case, a page fault is said to occur. ▪ The whole page must be brought into the main memory from the disk before execution can proceed. ◼ When the MMU detects a page fault, the following actions occur: ▪ The MMU asks the operating system to intervene by raising an exception. ▪ Processing of the active task which caused the page fault is interrupted. ▪ Control is transferred to the operating system. ▪ The operating system copies the requested page from secondary storage to the main memory. ▪ Once the page is copied, control is returned to the task which was interrupted.
Address translation (contd..) ◼ When a new page is to be brought into the main memory from secondary storage, the main memory may be full. ▪ Some page from the main memory must be replaced with this new page. ◼ How to choose which page to replace? ▪ This is similar to the replacement that occurs when the cache is full. ▪ The principle of locality of reference (?) can also be applied here. ▪ A replacement strategy similar to LRU can be applied. ◼ Since the size of the main memory is relatively larger compared to cache, a relatively large amount of programs and data can be held in the main memory. ▪ Minimizes the frequency of transfers between secondary storage and main memory.
Cache & Virtual Memory ◼ Cache memory: ▪ Introduced to bridge the speed gap between the processor and the main memory. ▪ Implemented in hardware. ◼ Virtual memory: ▪ Introduced to bridge the speed gap between the main memory and secondary storage. ▪ Implemented in part by software.
INPUT / OUTPUT ORGANIZATION
Accessing I/O Devices
Accessing I/O devices Bus I/O device 1 I/O device n Processor Memory • Multiple I/O devices may be connected to the processor and the memory via a bus. • Bus consists of three sets of lines to carry address, data and control signals. • Each I/O device is assigned an unique address. • To access an I/O device, the processor places the address on the address lines. • The device recognizes the address, and responds to the control signals.
Accessing I/O devices (contd..)  I/O devices and the memory may share the same address space:  Memory-mapped I/O.  Any machine instruction that can access memory can be used to transfer data to or from an I/O device.  Simpler software.  I/O devices and the memory may have different address spaces:  Special instructions to transfer data to and from I/O devices.  I/O devices may have to deal with fewer address lines.  I/O address lines need not be physically separate from memory address lines.  In fact, address lines may be shared between I/O devices and memory, with a control signal to indicate whether it is a memory address or an I/O address. 68
Accessing I/O devices (contd..) I/O interface decoder Address Data and status registers Control circuits Input device Bus Address lines Data lines Control lines • I/O device is connected to the bus using an I/O interface circuit which has: - Address decoder, control circuit, and data and status registers. • Address decoder decodes the address placed on the address lines thus enabling the device to recognize its address. • Data register holds the data being transferred to or from the processor. • Status register holds information necessary for the operation of the I/O device. • Data and status registers are connected to the data lines, and have unique addresses. • I/O interface circuit coordinates I/O transfers.
Accessing I/O devices (contd..)  Recall that the rate of transfer to and from I/O devices is slower than the speed of the processor. This creates the need for mechanisms to synchronize data transfers between them.  Program-controlled I/O:  Processor repeatedly monitors a status flag to achieve the necessary synchronization.  Processor polls the I/O device.  Two other mechanisms used for synchronizing data transfers between the processor and memory:  Interrupts.  Direct Memory Access.
Interrupts
Interrupts • In program-controlled I/O, when the processor continuously monitors the status of the device, it does not perform any useful tasks. • An alternate approach would be for the I/O device to alert the processor when it becomes ready. o Do so by sending a hardware signal called an interrupt to the processor. o At least one of the bus control lines, called an interrupt-request line is dedicated for this purpose. • Processor can perform other useful tasks while it is waiting for the device to be ready.
Interrupt • An interrupt is an event that causes the execution of one program to be suspended and the execution of another program to begin.
Interrupts (contd..) Figure: transfer of control through the use of interrupts. Program 1 is executing instruction i when an interrupt occurs; control transfers to the interrupt-service routine, and after it completes, execution resumes at i+1. • The processor is executing the instruction located at address i when an interrupt occurs. • The routine executed in response to an interrupt request is called the interrupt-service routine. • When an interrupt occurs, control must be transferred to the interrupt-service routine. • But before transferring control, the current contents of the PC (i+1) must be saved in a known location. • This will enable the return-from-interrupt instruction to resume execution at i+1. • The return address, i.e. the contents of the PC, is usually stored on the processor stack.
Example • Some computations + print • Two subroutines: COMPUTE and PRINT • The printer accepts only one line of text at a time. • Try to overlap printing and computation.  COMPUTE produces the first n lines of text;  PRINT sends the first line to the printer; then PRINT is suspended; COMPUTE continues to perform other computations;  After the printer finishes printing the first line, it sends an interrupt-request signal to the processor;  In response, the processor interrupts execution of COMPUTE and transfers control to PRINT to send the next line;  COMPUTE resumes;  …
Handling Multiple Devices • How can the processor recognize the device requesting an interrupt? • Given that different devices are likely to require different interrupt-service routines, how can the processor obtain the starting address of the appropriate routine in each case? • (Vectored interrupts) • Should a device be allowed to interrupt the processor while another interrupt is being serviced? • (Interrupt nesting) • How should two or more simultaneous interrupt requests be handled? • (Daisy-chain)
Vectored Interrupts • The device requesting an interrupt may identify itself directly to the processor. o The device can do so by sending a special code (4 to 8 bits) to the processor over the bus. o The code supplied by the device may represent a part of the starting address of the interrupt-service routine. o The remainder of the starting address is obtained by the processor based on other information, such as the range of memory addresses where interrupt-service routines are located. • Usually the location pointed to by the interrupting device is used to store the starting address of the interrupt-service routine.
Interrupt Nesting Figure: implementation of interrupt priority using individual interrupt-request (INTR1 … INTRp) and interrupt-acknowledge (INTA1 … INTAp) lines for devices 1 … p. • Each device has a separate interrupt-request and interrupt-acknowledge line. • Each interrupt-request line is assigned a different priority level. • Interrupt requests received over these lines are sent to a priority arbitration circuit in the processor. • If the interrupt request has a higher priority level than the priority of the processor, then the request is accepted.
Interrupts (contd..) Figure: interrupt-request and interrupt-acknowledge connections for devices 1 … n. Polling scheme: • The processor polls the status registers of the I/O devices to determine which device is requesting an interrupt. • In this case the priority is determined by the order in which the devices are polled. • The first device with its status bit set to 1 is the device whose interrupt request is accepted. Daisy chain scheme: • Devices are connected to form a daisy chain. • Devices share the interrupt-request line, and the interrupt-acknowledge line is connected to form a daisy chain. • When devices raise an interrupt request, the interrupt-request line is activated. • The processor, in response, activates interrupt-acknowledge. • It is received by device 1; if device 1 does not need service, it passes the signal to device 2. • The device that is electrically closest to the processor has the highest priority.
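The priority rule of the daisy chain can be sketched as a simple propagation model (the device ordering and function name are assumed for illustration):

```python
# Daisy-chain acknowledgement: INTA enters at the device electrically
# closest to the processor; the first device with a pending request
# absorbs it, which gives closer devices higher priority.
def daisy_chain_ack(requests):
    """requests: pending-request flags, ordered by proximity to the processor."""
    for i, pending in enumerate(requests):
        if pending:
            return i           # this device is serviced; INTA stops here
    return None                # no device was requesting

# Devices 2 and 3 both request; device 2 (closer) wins.
print(daisy_chain_ack([False, True, True]))  # 1 (index of the second device)
```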
Interrupts (contd..) • When I/O devices are organized into a priority structure, each device has its own interrupt-request and interrupt-acknowledge line. • When I/O devices are organized in a daisy-chain fashion, the devices share an interrupt-request line, and the interrupt-acknowledge signal propagates through the devices. • A combination of the priority structure and the daisy-chain scheme can also be used. Figure: devices organized into groups, connected by lines INTR1 … INTRp and INTA1 … INTAp to a priority arbitration circuit in the processor. • Devices are organized into groups. • Each group is assigned a different priority level. • All the devices within a single group share an interrupt-request line, and are connected to form a daisy chain.
Exceptions  Interrupts caused by interrupt-requests sent by I/O devices.  Interrupts could be used in many other situations where the execution of one program needs to be suspended and execution of another program needs to be started.  In general, the term exception is used to refer to any event that causes an interruption.  Interrupt-requests from I/O devices is one type of an exception.  Other types of exceptions are:  Recovery from errors  Debugging  Privilege exception
Direct Memory Access
Direct Memory Access (contd..)  Direct Memory Access (DMA):  A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.  Control unit which performs these transfers is a part of the I/O device’s interface circuit. This control unit is called as a DMA controller.  DMA controller performs functions that would be normally carried out by the processor:  For each word, it provides the memory address and all the control signals.  To transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.
Direct Memory Access (contd..)  DMA controller can transfer a block of data from an external device to the processor, without any intervention from the processor.  However, the operation of the DMA controller must be under the control of a program executed by the processor. That is, the processor must initiate the DMA transfer.  To initiate the DMA transfer, the processor informs the DMA controller of:  Starting address,  Number of words in the block.  Direction of transfer (I/O device to the memory, or memory to the I/O device).  Once the DMA controller completes the DMA transfer, it informs the processor by raising an interrupt signal.
Direct Memory Access Figure: use of DMA controllers in a computer system. The system bus connects the processor, main memory, keyboard, printer, a DMA controller for the network interface, and a disk/DMA controller with two disks. • A DMA controller connects a high-speed network interface to the computer bus. • The disk controller, which controls two disks, also has DMA capability; it provides two DMA channels. • It can perform two independent DMA operations, as if each disk had its own DMA controller. The registers that store the memory address, word count, and status and control information are duplicated.
Direct Memory Access (contd..)  Processor and DMA controllers have to use the bus in an interwoven fashion to access the memory.  DMA devices are given higher priority than the processor to access the bus.  Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.  Processor originates most memory access cycles on the bus.  DMA controller can be said to “steal” memory access cycles from the bus. This interweaving technique is called as “cycle stealing”.  An alternate approach is the provide a DMA controller an exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block or burst mode.
Bus arbitration  Processor and DMA controllers both need to initiate data transfers on the bus and access main memory.  The device that is allowed to initiate transfers on the bus at any given time is called the bus master.  When the current bus master relinquishes its status as the bus master, another device can acquire this status.  The process by which the next device to become the bus master is selected and bus mastership is transferred to it is called bus arbitration.  Centralized arbitration:  A single bus arbiter performs the arbitration.  Distributed arbitration:  All devices participate in the selection of the next bus master.
Centralized Bus Arbitration Figure: a simple arrangement for centralized bus arbitration using a daisy chain. The processor and DMA controllers 1 and 2 share the BR (bus request) and BBSY (bus busy) lines; the bus-grant signal propagates from the processor as BG1, then BG2.
Centralized Bus Arbitration (cont.,) • The bus arbiter may be the processor or a separate unit connected to the bus. • Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers. • A DMA controller requests control of the bus by asserting the Bus Request (BR) line. • In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free. • The BG1 signal is connected to all DMA controllers in a daisy-chain fashion. • When the BBSY signal is 0, the bus is busy; when BBSY becomes 1, the DMA controller that asserted BR can acquire control of the bus.
Centralized arbitration (contd..) Figure: sequence of signals during transfer of bus mastership. • DMA controller 2 asserts the BR signal. • The processor asserts the BG1 signal, which propagates through DMA controller 1 to DMA controller 2. • The processor relinquishes control of the bus by setting BBSY to 1; DMA controller 2 then becomes the bus master.
Distributed arbitration  All devices waiting to use the bus share the responsibility of carrying out the arbitration process.  Arbitration process does not depend on a central arbiter and hence distributed arbitration has higher reliability.  Each device is assigned a 4-bit ID number.  All the devices are connected using 5 lines, 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.  To request the bus a device:  Asserts the Start-Arbitration signal.  Places its 4-bit ID number on the arbitration lines.  The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
Distributed arbitration
Distributed arbitration(Contd.,) • Arbitration process: o Each device compares the pattern that appears on the arbitration lines to its own ID, starting with MSB. o If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. o The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
Distributed arbitration (contd..) • Device A has the ID 5 and wants to request the bus: - It transmits the pattern 0101 on the arbitration lines. • Device B has the ID 6 and wants to request the bus: - It transmits the pattern 0110 on the arbitration lines. • The pattern that appears on the arbitration lines is the logical OR of the two patterns: - Pattern 0111 appears on the arbitration lines. Arbitration process: • Each device compares the pattern that appears on the arbitration lines to its own ID, starting with the MSB. • If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. • Device A compares its pattern 0101 to the pattern 0111 on the lines. • It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the arbitration lines. • The pattern that now appears on the arbitration lines is the logical OR of 0100 and 0110, which is 0110. • This pattern is the same as the device ID of B, and hence B has won the arbitration.
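The wired-OR behaviour of the arbitration lines and the drop-out rule can be simulated in a few lines. A sketch (the function name is assumed; 4-bit IDs as in the example):

```python
# One round of distributed arbitration: the lines carry the bitwise OR
# of all competing IDs; a device withdraws at the bit position where it
# sees a 1 on the line while driving a 0. The highest ID wins.
def arbitrate(ids, width=4):
    contenders = list(ids)
    for bit in range(width - 1, -1, -1):
        line = 0
        for d in contenders:               # wired-OR of the driven bits
            line |= (d >> bit) & 1
        contenders = [d for d in contenders
                      if ((d >> bit) & 1) == line]   # losers withdraw
    return max(contenders)

print(arbitrate([5, 6]))  # 6 -- device B wins, matching the example
```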
Buses
Buses • Processor, main memory, and I/O devices are interconnected by means of a bus. • Bus provides a communication path for the transfer of data. o Bus also includes lines to support interrupts and arbitration. • A bus protocol is the set of rules that govern the behavior of various devices connected to the bus, as to when to place information on the bus, when to assert control signals, etc.
Buses (contd..)  Bus lines may be grouped into three types:  Data  Address  Control  Control signals specify:  Whether it is a read or a write operation.  Required size of the data, when several operand sizes (byte, word, long word) are possible.  Timing information to indicate when the processor and I/O devices may place data or receive data from the bus.  Schemes for timing of data transfers over a bus can be classified into:  Synchronous,  Asynchronous.
Synchronous bus Figure: timing of a transfer on a synchronous bus – the bus clock defines equal-length bus cycles.
Synchronous bus (contd..) Figure: timing of an input (Read) transfer on a synchronous bus, with events at times t0, t1, and t2. • At time t0, the master places the device address and command on the bus, and indicates that it is a Read operation. • The addressed slave places the data on the data lines. • The master “strobes” the data on the data lines into its input buffer, for a Read operation. • In case of a Write operation, the master places the data on the bus along with the address and commands at time t0. • The slave strobes the data into its input buffer at time t2.
Synchronous bus (contd..) • Once the master places the device address and command on the bus, it takes time for this information to propagate to the devices: o This time depends on the physical and electrical characteristics of the bus. • Also, all the devices have to be given enough time to decode the address and control signals, so that the addressed slave can place data on the bus. • Width of the pulse t1 - t0 depends on: o Maximum propagation delay between two devices connected to the bus. o Time taken by all the devices to decode the address and control signals, so that the addressed slave can respond at time t1.
Synchronous bus (contd..) • At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its input buffer if it’s a Read operation. o “Strobe” means to capture the values of the data and store them into a buffer. • When data are to be loaded into a storage buffer register, the data should be available for a period longer than the setup time of the device. • Width of the pulse t2 - t1 should be longer than: o Maximum propagation time of the bus plus o Set up time of the input buffer register of the master.
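The two pulse-width constraints described above can be combined into a lower bound on the bus clock period. This is a rough sketch; the function name and the example delay values are purely illustrative:

```python
def min_bus_cycle_ns(prop_delay, decode_time, setup_time):
    """Lower bound on a synchronous bus cycle (t2 - t0), in ns.

    t1 - t0 must cover the worst-case propagation delay between two
    devices plus the time the devices need to decode the address and
    control signals.  t2 - t1 must cover the propagation delay back to
    the master plus the setup time of its input buffer register.
    """
    low_phase = prop_delay + decode_time    # minimum t1 - t0
    high_phase = prop_delay + setup_time    # minimum t2 - t1
    return low_phase + high_phase
```

For example, with 4 ns worst-case propagation, 6 ns decode time, and a 2 ns setup time, the bus cycle cannot be shorter than 16 ns.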
Synchronous bus (contd..)
A more detailed timing diagram shows each signal twice, as seen by the master and as seen by the slave (tAM: address and command appear on the bus; tAS: address and command reach the slave; tDS: data appears on the bus; tDM: data reaches the master):
• Signals do not appear on the bus as soon as they are placed on it, due to the propagation delay in the interface circuits.
• Signals reach the devices after a propagation delay that depends on the characteristics of the bus.
• Data must remain on the bus for some time after t2, equal to the hold time of the buffer.
Synchronous bus (contd..)
• The data transfer has to be completed within one clock cycle.
o The clock period t2 - t0 must be such that the longest propagation delay on the bus and the slowest device interface can be accommodated.
o This forces all the devices to operate at the speed of the slowest device.
• The processor just assumes that the data are available at t2 in the case of a Read operation, or are read by the device in the case of a Write operation.
o What if the device has actually failed, and never responds?
Synchronous bus (contd..)
• Most buses have control signals that carry a response from the slave.
• These control signals serve two purposes:
o They inform the master that the slave has recognized the address, and is ready to participate in a data transfer operation.
o They enable the master to adjust the duration of the data transfer operation to suit the speed of the participating slave.
• A high-frequency bus clock is used:
o A data transfer then spans several clock cycles, instead of just one clock cycle as in the earlier case.
Synchronous bus (contd..)
A transfer spanning several clock cycles (the figure shows the Clock, Address, Command, Data, and Slave-ready signals over cycles 1 to 4):
• The address and command requesting a Read operation appear on the bus.
• The slave places the data on the bus and asserts the Slave-ready signal.
• The master strobes the data into its input buffer.
• Clock changes are seen by all the devices at the same time.
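The multi-cycle transfer can be sketched clock by clock. The name `read_with_slave_ready`, its latency parameter, and the timeout are all hypothetical; the timeout stands in for the failure detection that a plain one-cycle bus lacks:

```python
def read_with_slave_ready(slave_latency, timeout=16):
    """Clock-by-clock sketch of a multi-cycle synchronous Read.

    In cycle 1 the master puts the address and command on the bus.
    The slave asserts Slave-ready in the cycle its data is valid
    (slave_latency cycles later); the master strobes the data in that
    same cycle.  Returns the cycle in which the data was strobed, or
    raises if the slave never responds (a failed or missing device).
    """
    for cycle in range(1, timeout + 1):
        slave_ready = cycle >= 1 + slave_latency   # slave's response
        if slave_ready:
            return cycle                           # master strobes data
    raise TimeoutError("no Slave-ready: device missing or failed")
```

A slave that needs two extra cycles yields a strobe in cycle 3, as in the figure; a dead device now produces a detectable timeout instead of silently corrupting the transfer.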
Asynchronous bus
• Data transfers on the bus are controlled by a handshake between the master and the slave.
• The common clock of the synchronous bus is replaced by two timing control lines:
o Master-ready,
o Slave-ready.
• The Master-ready signal is asserted by the master to indicate to the slave that it is ready to participate in a data transfer.
• The Slave-ready signal is asserted by the slave in response to Master-ready, and it indicates to the master that the slave is ready to participate in a data transfer.
Asynchronous bus (contd..)
• Data transfer using the handshake protocol:
o The master places the address and command information on the bus.
o It asserts the Master-ready signal to indicate to the slaves that the address and command information has been placed on the bus.
o All devices on the bus decode the address.
o The addressed slave performs the required operation, and informs the processor it has done so by asserting the Slave-ready signal.
o The master removes all the signals from the bus once Slave-ready is asserted.
o If the operation is a Read operation, the master also strobes the data into its input buffer.
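The handshake steps above can be written out as a timeline in which each edge is triggered by the arrival of the previous one, which is what lets the transfer adapt to the slave's speed automatically. The delay values and the function name are illustrative assumptions, not figures from the slides:

```python
def handshake_read(prop=5, slave_delay=8):
    """Timeline (ns) of one asynchronous Read with full handshake.

    Each event is caused by the previous event plus a delay: `prop` is
    the one-way bus propagation delay, `slave_delay` the time the slave
    needs to decode the address and fetch the data.
    """
    t = {}
    t["t0 address on bus"] = 0
    t["t1 Master-ready asserted"] = t["t0 address on bus"] + 1
    # Master-ready propagates to the slave, which decodes and responds.
    t["t2 Slave-ready asserted"] = (t["t1 Master-ready asserted"]
                                    + prop + slave_delay)
    # Slave-ready propagates back; master strobes data, drops Master-ready.
    t["t3 data strobed by master"] = t["t2 Slave-ready asserted"] + prop
    t["t4 Master-ready dropped"] = t["t3 data strobed by master"]
    # The falling edge reaches the slave, which releases the bus.
    t["t5 bus released by slave"] = t["t4 Master-ready dropped"] + prop
    return t
```

Doubling `slave_delay` simply pushes every later event out by the same amount; no clock period has to be re-engineered for the slower device.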
Asynchronous bus (contd..)
Handshake timing for one bus cycle (Master-ready, Slave-ready, address/command, and data signals, with times t0 to t5 marked):
• t0 - The master places the address and command information on the bus.
• t1 - The master asserts the Master-ready signal. Master-ready is asserted at t1 instead of t0 to allow time for the address and command to settle on all the bus lines.
• t2 - The addressed slave places the data on the bus and asserts the Slave-ready signal.
• t3 - The Slave-ready signal arrives at the master, which strobes the data into its input buffer.
• t4 - The master removes the address and command information and drops Master-ready.
• t5 - The slave receives the transition of the Master-ready signal from 1 to 0. It removes the data and the Slave-ready signal from the bus.
Asynchronous vs. Synchronous bus
• Advantages of the asynchronous bus:
o It eliminates the need for clock synchronization between the sender and the receiver.
o It can accommodate varying delays automatically, using the Slave-ready signal.
• Disadvantages of the asynchronous bus:
o The data transfer rate with full handshake is limited by two round-trip delays.
o A data transfer over a synchronous bus involves only one round-trip delay, and hence a synchronous bus can achieve faster rates.
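The one versus two round-trip comparison can be made concrete with a rough per-word cost model; the delay values and the function name are illustrative assumptions:

```python
def transfer_time(prop, device_delay, handshake=False):
    """Rough per-word bus time, in ns.

    A synchronous transfer costs about one round trip on the bus,
    while a full handshake costs about two (Master-ready out,
    Slave-ready back, Master-ready dropped, Slave-ready dropped),
    plus the device's own response delay in either case.
    """
    round_trips = 2 if handshake else 1
    return round_trips * 2 * prop + device_delay
```

With a 5 ns one-way propagation delay and an 8 ns device delay, the synchronous transfer takes about 18 ns per word against roughly 28 ns for the full handshake, showing why the handshake's flexibility costs bandwidth.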
Thank You…

  • 14.
    ● Check whetherthe following statements are true not? ● Most of the cost of the memory hierarchy is at the highest level. ● Most of the capacity of the memory hierarchy is at the lowest level.
  • 15.
    Basic Terms inMemory 1.HIT : If the data requested by the processor appears in some block in the upper level, this is called a hit . 2. Miss : If the data is not found in the upper level, the request is called a miss. The lower level in the hierarchy is then accessed to retrieve the block containing the requested data. 3. HIT RATE OR HIT RATIO : It is the fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy. 4.MISS RATE : It is the fraction of memory accesses not found in the upper level. 5. HIT TIME : It is the time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss. 6. MISS PENALTY : It is the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to
  • 16.
    Memory Technologies Four primaryTechnologies: 1. Main Memory -- DRAM 2. Cache Memory -- SRAM 3. Flash Memory -- nonvolatile memory is the secondary memory in Personal Mobile Devices 4.Secondary Memory -- Magnetic Disk
  • 17.
    SRAM Cell ◼ Twotransistor inverters are cross connected to implement a basic flip-flop. ◼ The cell is connected to one word line and two bits lines by transistors T1 and T2 ◼ When word line is at ground level, the transistors are turned off and the latch retains its state ◼ Read operation: In order to read state of SRAM cell, the word line is activated to close switches T1 and T2. Sense/Write circuits at the bottom monitor the state of b and b’ Y X Word line Bit lines b T 2 T1 b ′
  • 18.
    SRAM Technology ● Thecircuits are capable of retaining their state as long as power is applied. ● value is stored on a pair of inverting gates ● very fast but takes up more space than DRAM (4 to 6 transistors) ● SRAMs are said to be volatile memories because their contents are lost when power is interrupted. ● Static RAMs can be accessed very quickly
  • 19.
    DRAMs ● Static RAMsare fast, but they cost more area and are more expensive. ● Dynamic RAMs (DRAMs) are cheap and area efficient, but they can not retain their state indefinitely – need to be periodically refreshed. ● value is stored as a charge on capacitor (must be refreshed) ● very small but slower than SRAM (factor of 5 to 10) single-transistor dynamic memory cell T C Word line Bit line
  • 20.
    ◼ Static RAMs(SRAMs): ▪ Consist of circuits that are capable of retaining their state as long as the power is applied. ▪ Volatile memories, because their contents are lost when power is interrupted. ▪ Access times of static RAMs are in the range of few nanoseconds. ▪ However, the cost is usually high. ◼ Dynamic RAMs (DRAMs): ▪ Do not retain their state indefinitely. ▪ Contents must be periodically refreshed. ▪ Contents may be refreshed while accessing them for reading.
  • 21.
    Questions ◼ What ismean by memory hierarchy? ◼ What are the factors to be considered in memory hierarchy? ◼ Define hit and miss. ◼ Comparison between SRAM and DRAM.
  • 22.
    Latency & Bandwidth ◼Memory latency is the time it takes to transfer a word of data to or from memory ◼ Memory bandwidth is the number of bits or bytes that can be transferred in one second.
  • 23.
    Flash Memory ▪ Hassimilar approach to EEPROM. ▪ Read the contents of a single cell, but write the contents of an entire block of cells. ▪ Flash devices have greater density. ▪ Higher capacity and low storage cost per bit. ▪ Power consumption of flash memory is very low, making it attractive for use in equipment that is battery-driven. ▪ Single flash chips are not sufficiently large, so larger memory modules are implemented using flash cards and flash drives.
  • 24.
    Secondary Memory ● Permanentmemory or Non-volatile memory ● Used to store programs and data on a long-term basis ● Storage capacity : very high (ex: 1 TB) ● Access speed : very slow
  • 25.
    Organization of Dataon a Disk Sector 0, track 0 Sector 3, track n Organization of one surface of a disk. Sector 0, track 1
  • 26.
  • 27.
    Tracks divided intosectors Magnetic Disk Structure Surface organized into tracks
  • 28.
    Disk Access Head inposition above a track
  • 29.
    Disk Access Rotation iscounter-clockwise
  • 30.
    Disk Access –Read About to read blue sector
  • 31.
    Disk Access –Read After BLUE read After reading blue sector
  • 32.
    Disk Access –Read After BLUE read Red request scheduled next
  • 33.
    Disk Access –Seek After BLUE read Seek for RED Seek to red’s track
  • 34.
    Disk Access –Rotational Latency After BLUE read Seek for RED Rotational latency Wait for red sector to rotate around
  • 35.
    Disk Access –Read After BLUE read Seek for RED Rotational latency After RED read Complete read of red
  • 36.
    Disk Access –Service Time Components After BLUE read Seek for RED Rotational latency After RED read Seek Rotational Latency Data Transfer
  • 37.
    Speed, Size, andCost ◼ A big challenge in the design of a computer system is to provide a sufficiently large memory, with a reasonable speed at an affordable cost. ◼ Static RAM: ▪ Very fast, but expensive, because a basic SRAM cell has a complex circuit making it impossible to pack a large number of cells onto a single chip. ◼ Dynamic RAM: ▪ Simpler basic cell circuit, hence are much less expensive, but significantly slower than SRAMs. ◼ Magnetic disks: ▪ Storage provided by DRAMs is higher than SRAMs, but is still less than what is necessary. ▪ Secondary storage such as magnetic disks provide a large amount of storage, but is much slower than DRAMs.
  • 38.
  • 39.
    Cache ● What iscache memory? ● Why we need it? ● Locality of reference (very important) - temporal - spatial ● Cache block – cache line ● A set of contiguous address locations of some size
  • 40.
    What is cachememory? ● The cache memory is a small-sized high speed volatile memory that provides high speed data access to a processor. ● The cache memory stores the frequently used computer programs, applications and the program data.
  • 41.
    Locality of Reference ◼Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently. ▪ These instructions may be the ones in a loop, nested loop or few procedures calling each other repeatedly. ▪ This is called “locality of reference”. ◼ Temporal locality of reference: ▪ Recently executed instruction is likely to be executed again very soon. ◼ Spatial locality of reference: ▪ Instructions with addresses close to a recently instruction are likely to be executed soon.
  • 42.
    Cache ● Replacement algorithm ●Hit / miss ● Write-through / Write-back ● Load through Use of a cache memory. Cache Main memory Processor
  • 43.
    Cache hit • Ifthe data is in the cache it is called a Read or Write hit. • Read hit: ▪ The data is obtained from the cache. • Write hit: ▪ Cache has a replica of the contents of the main memory. ▪ Contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol. ▪ Update the contents of the cache, and mark it as updated by setting a bit known as the dirty bit or modified bit. The contents of the main memory are updated when this block is replaced. This is write-back or copy-back protocol.
  • 44.
    Cache miss • Ifthe data is not present in the cache, then a Read miss or Write miss occurs. • Read miss: ▪ Block of words containing this requested word is transferred from the memory. ▪ After the block is transferred, the desired word is forwarded to the processor. ▪ The desired word may also be forwarded to the processor as soon as it is transferred without waiting for the entire block to be transferred. This is called load-through or early-restart. • Write-miss: ▪ Write-through protocol is used, then the contents of the main memory are updated directly. ▪ If write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is overwritten with new information.
  • 45.
    Mapping functions ◼ Mappingfunctions determine how memory blocks are placed in the cache. ◼ A simple processor example: ▪ Cache consisting of 128 blocks of 16 words each. ▪ Total size of cache is 2048 (2K) words. ▪ Main memory is addressable by a 16-bit address. ▪ Main memory has 64K words. ▪ Main memory has 4K blocks of 16 words each. ◼ Three mapping functions: ▪ Direct mapping ▪ Associative mapping ▪ Set-associative mapping.
  • 46.
    Direct Mapping ta g ta g ta g Cache Main memor y Block 0 Block1 Block 127 Block 128 Block 129 Block 255 Block 256 Block 257 Block 4095 Block 0 Block 1 Block 127 7 4 Main memory address Tag Block Wor d Figure 5.15. Direct-mapped cache. 5 4: one of 16 words. (each block has 16=24 words) 7: points to a particular block in the cache (128=27 ) 5: 5 tag bits are compared with the tag bits associated with its location in the cache. Identify which of the 32 blocks that are resident in the cache (4096/128). Block j of main memory maps onto block j modulo 128 of the cache
  • 47.
    Direct Mapping ● Tag:11101 ● Block: 1111111=127, in the 127th block of the cache ● Word:1100=12, the 12th word of the 127th block in the cache 7 4 Main memory address Tag Block Wor d 5 11101,1111111,1100
  • 48.
    Associative Mapping 4 ta g ta g ta g Cache Main memor y Block 0 Block1 Block i Block 4095 Block 0 Block 1 Block 127 12 Main memory address Figure 5.16. Associative-mapped cache. Tag Wor d 4: one of 16 words. (each block has 16=24 words) 12: 12 tag bits Identify which of the 4096 blocks that are resident in the cache 4096=212 .
  • 49.
    Associative Mapping ● Tag:111011111111 ● Word:1100=12, the 12th word of a block in the cache 111011111111,1100 4 12 Main memory address Tag Wor d
  • 50.
    Set-Associative Mapping ta g ta g ta g Cache Main memor y Block 0 Block1 Block 63 Block 64 Block 65 Block 127 Block 128 Block 129 Block 4095 Block 0 Block 1 Block 126 ta g ta g Block 2 Block 3 ta g Block 127 Main memory address 6 6 4 Tag Se t Wor d Set 0 Set 1 Set 63 Figure 5.17. Set-associative-mapped cache with two blocks per set. 4: one of 16 words. (each block has 16=24 words) 6: points to a particular set in the cache (128/2=64=26 ) 6: 6 tag bits is used to check if the desired block is present (4096/64=26 ).
  • 51.
    Set-Associative Mapping ● Tag:111011 ● Set: 111111=63, in the 63th set of the cache ● Word:1100=12, the 12th word of the 63th set in the cache Main memory address 6 6 4 Tag Se t Wor d 111011,111111,1100
  • 52.
    Replacement Algorithms ● Difficultto determine which blocks to kick out ● Least Recently Used (LRU) block ● The cache controller tracks references to all blocks as computation proceeds. ● Increase / clear track counters when a hit/miss occurs
  • 53.
    ● List outthe mapping functions. ● What is mean by cache memory?
  • 54.
  • 55.
    Overview ● Physical mainmemory is not as large as the address space spanned by an address issued by the processor. 232 = 4 GB, 264 = … ● When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices. ● Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques. ● Virtual addresses will be translated into physical addresses.
  • 56.
    Virtual Memory ◼ Virtualmemory is an architectural solution to increase the effective size of the memory system.
  • 57.
  • 58.
    Address Translation ● Allprograms and data are composed of fixed- length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory. ● Page cannot be too small or too large. ● The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage – similar to cache.
  • 59.
    Address Translation Page frame Virtual addressfrom processor in memory Offset Offset Virtual page number Page table address Page table base register Figure 5.27. Virtual-memory address translation. Contro l bit s Physical address in main memory PAGE TABLE Page frame +
  • 60.
    Address Translation ● Thepage table information is used by the MMU for every access, so it is supposed to be with the MMU. ● However, since MMU is on the processor chip and the page table is rather large, only small portion of it, which consists of the page table entries that correspond to the most recently accessed pages, can be accommodated within the MMU. ● Translation Lookaside Buffer (TLB)
  • 61.
    TLB Figure 5.28. Useof an associative-mapped TLB. No Yes Hit Miss Virtual address from processor TLB Offset Virtual page number number Virtual page Page frame in memory Control bits Offset Physical address in main memory Page frame =?
  • 62.
    Address translation (contd..) ◼What happens if a program generates an access to a page that is not in the main memory? ◼ In this case, a page fault is said to occur. ▪ Whole page must be brought into the main memory from the disk, before the execution can proceed. ◼ Upon detecting a page fault by the MMU, following actions occur: ▪ MMU asks the operating system to intervene by raising an exception. ▪ Processing of the active task which caused the page fault is interrupted. ▪ Control is transferred to the operating system. ▪ Operating system copies the requested page from secondary storage to the main memory. ▪ Once the page is copied, control is returned to the task which was interrupted. 62
  • 63.
    Address translation (contd..) ◼When a new page is to be brought into the main memory from secondary storage, the main memory may be full. ▪ Some page from the main memory must be replaced with this new page. ◼ How to choose which page to replace? ▪ This is similar to the replacement that occurs when the cache is full. ▪ The principle of locality of reference (?) can also be applied here. ▪ A replacement strategy similar to LRU can be applied. ◼ Since the size of the main memory is relatively larger compared to cache, a relatively large amount of programs and data can be held in the main memory. ▪ Minimizes the frequency of transfers between secondary storage and main memory. 63
  • 64.
    Cache & VirtualMemory ◼ Cache memory: ▪ Introduced to bridge the speed gap between the processor and the main memory. ▪ Implemented in hardware. ◼ Virtual memory: ▪ Introduced to bridge the speed gap between the main memory and secondary storage. ▪ Implemented in part by software. 64
  • 65.
  • 66.
  • 67.
    Accessing I/O devices Bus I/Odevice 1 I/O device n Processor Memory • Multiple I/O devices may be connected to the processor and the memory via a bus. • Bus consists of three sets of lines to carry address, data and control signals. • Each I/O device is assigned an unique address. • To access an I/O device, the processor places the address on the address lines. • The device recognizes the address, and responds to the control signals.
  • 68.
    Accessing I/O devices (contd..) I/O devices and the memory may share the same address space:  Memory-mapped I/O.  Any machine instruction that can access memory can be used to transfer data to or from an I/O device.  Simpler software.  I/O devices and the memory may have different address spaces:  Special instructions to transfer data to and from I/O devices.  I/O devices may have to deal with fewer address lines.  I/O address lines need not be physically separate from memory address lines.  In fact, address lines may be shared between I/O devices and memory, with a control signal to indicate whether it is a memory address or an I/O address. 68
  • 69.
    Accessing I/O devices (contd..) I/O interface decoder AddressData and status registers Control circuits Input device Bus Address lines Data lines Control lines • I/O device is connected to the bus using an I/O interface circuit which has: - Address decoder, control circuit, and data and status registers. • Address decoder decodes the address placed on the address lines thus enabling the device to recognize its address. • Data register holds the data being transferred to or from the processor. • Status register holds information necessary for the operation of the I/O device. • Data and status registers are connected to the data lines, and have unique addresses. • I/O interface circuit coordinates I/O transfers.
  • 70.
    Accessing I/O devices (contd..) Recall that the rate of transfer to and from I/O devices is slower than the speed of the processor. This creates the need for mechanisms to synchronize data transfers between them.  Program-controlled I/O:  Processor repeatedly monitors a status flag to achieve the necessary synchronization.  Processor polls the I/O device.  Two other mechanisms used for synchronizing data transfers between the processor and memory:  Interrupts.  Direct Memory Access.
  • 71.
  • 72.
    Interrupts • In program-controlledI/O, when the processor continuously monitors the status of the device, it does not perform any useful tasks. • An alternate approach would be for the I/O device to alert the processor when it becomes ready. o Do so by sending a hardware signal called an interrupt to the processor. o At least one of the bus control lines, called an interrupt-request line is dedicated for this purpose. • Processor can perform other useful tasks while it is waiting for the device to be ready.
  • 73.
    Interrupt • An interruptis an event that causes the execution of one program to be suspended and execution of another program to be begin.
  • 74.
    Interrupts (contd..) Interrupt Serviceroutine Program 1 here Interrupt occurs M i 2 1 i 1 + • Processor is executing the instruction located at address i when an interrupt occurs. • Routine executed in response to an interrupt request is called the interrupt-service routine. • When an interrupt occurs, control must be transferred to the interrupt service routine. • But before transferring control, the current contents of the PC (i+1), must be saved in a known location. • This will enable the return-from-interrupt instruction to resume execution at i+1. • Return address, or the contents of the PC are usually stored on the processor stack.
  • 75.
    Example • Some computations+ print • Two subroutines: COMPUTE and PRINT • The printer accepts only one line of text at a time. • Try to overlap printing and computation.  COMPUTE produces first n lines of text;  PRINT sends the first line to the printer; then PRINT is suspended; COMPUTE continues to perform other computations;  After the printer finishes printing the first line, it send an interrupt-request signal to the processor;  In response, the processor interrupts execution of COMPUTE and transfers control to PRINT to send the next line;  COMPUTE resumes;  …
  • 76.
    Handling Multiple Devices • Howcan the processor recognize the device requesting an interrupt? • Given that different devices are likely to require different interrupt-service routines, how can the processor obtain the starting address of the appropriate routine in each case? • (Vectored interrupts) • Should a device be allowed to interrupt the processor while another interrupt is being serviced? • (Interrupt nesting) • How should two or more simultaneous interrupt requests be handled? • (Daisy-chain)
  • 77.
    Vectored Interrupts • Thedevice requesting an interrupt may identify itself directly to the processor. o Device can do so by sending a special code (4 to 8 bits) the processor over the bus. o Code supplied by the device may represent a part of the starting address of the interrupt-service routine. o The remainder of the starting address is obtained by the processor based on other information such as the range of memory addresses where interrupt service routines are located. • Usually the location pointed to by the interrupting device is used to store the starting address of the interrupt-service routine.
  • 78.
    Interrupt Nesting Priority arbitration Device1 Device 2 Device p Processor INTA1 INTR1 INTRp INTAp • Each device has a separate interrupt-request and interrupt-acknowledge line. • Each interrupt-request line is assigned a different priority level. • Interrupt requests received over these lines are sent to a priority arbitration circuit in the processor. • If the interrupt request has a higher priority level than the priority of the processor, then the request is accepted.
  • 79.
    Interrupts (contd..) Processor Device 2 INTR INTA Devicen Device 1 Polling scheme: • If the processor uses a polling mechanism to poll the status registers of I/O devices to determine which device is requesting an interrupt. • In this case the priority is determined by the order in which the devices are polled. • The first device with status bit set to 1 is the device whose interrupt request is accepted. Daisy chain scheme: • Devices are connected to form a daisy chain. • Devices share the interrupt-request line, and interrupt-acknowledge line is connected to form a daisy chain. • When devices raise an interrupt request, the interrupt-request line is activated. • The processor in response activates interrupt-acknowledge. • Received by device 1, if device 1 does not need service, it passes the signal to device 2. • Device that is electrically closest to the processor has the highest priority.
  • 80.
    Interrupts (contd..) • WhenI/O devices were organized into a priority structure, each device had its own interrupt-request and interrupt-acknowledge line. • When I/O devices were organized in a daisy chain fashion, the devices shared an interrupt-request line, and the interrupt-acknowledge propagated through the devices. • A combination of priority structure and daisy chain scheme can also used. Device Device circuit Priority arbitration Processor Device Device INTR1 INTR p INTA1 INTAp • Devices are organized into groups. • Each group is assigned a different priority level. • All the devices within a single group share an interrupt-request line, and are connected to form a daisy chain.
  • 81.
    Exceptions  Interrupts causedby interrupt-requests sent by I/O devices.  Interrupts could be used in many other situations where the execution of one program needs to be suspended and execution of another program needs to be started.  In general, the term exception is used to refer to any event that causes an interruption.  Interrupt-requests from I/O devices is one type of an exception.  Other types of exceptions are:  Recovery from errors  Debugging  Privilege exception
  • 82.
  • 83.
    Direct Memory Access (contd..) Direct Memory Access (DMA):  A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.  Control unit which performs these transfers is a part of the I/O device’s interface circuit. This control unit is called as a DMA controller.  DMA controller performs functions that would be normally carried out by the processor:  For each word, it provides the memory address and all the control signals.  To transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.
  • 84.
    Direct Memory Access (contd..) DMA controller can transfer a block of data from an external device to the processor, without any intervention from the processor.  However, the operation of the DMA controller must be under the control of a program executed by the processor. That is, the processor must initiate the DMA transfer.  To initiate the DMA transfer, the processor informs the DMA controller of:  Starting address,  Number of words in the block.  Direction of transfer (I/O device to the memory, or memory to the I/O device).  Once the DMA controller completes the DMA transfer, it informs the processor by raising an interrupt signal.
  • 85.
    Direct Memory Access memory Processor Systembus Main Keyboard Disk/DMA controller Printer DMA controller Disk Disk • DMA controller connects a high-speed network to the computer bus. • Disk controller, which controls two disks also has DMA capability. It provides two DMA channels. • It can perform two independent DMA operations, as if each disk has its own DMA controller. The registers to store the memory address, word count and status and control information are duplicated. Network Interface
  • 86.
    Direct Memory Access (contd..) Processor and DMA controllers have to use the bus in an interwoven fashion to access the memory.  DMA devices are given higher priority than the processor to access the bus.  Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.  Processor originates most memory access cycles on the bus.  DMA controller can be said to “steal” memory access cycles from the bus. This interweaving technique is called as “cycle stealing”.  An alternate approach is the provide a DMA controller an exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block or burst mode.
  • 87.
    Bus arbitration  Processorand DMA controllers both need to initiate data transfers on the bus and access main memory.  The device that is allowed to initiate transfers on the bus at any given time is called the bus master.  When the current bus master relinquishes its status as the bus master, another device can acquire this status.  The process by which the next device to become the bus master is selected and bus mastership is transferred to it is called bus arbitration.  Centralized arbitration:  A single bus arbiter performs the arbitration.  Distributed arbitration:  All devices participate in the selection of the next bus master.
  • 88.
  • 89.
    Centralized Bus Arbitration (contd..) • The bus arbiter may be the processor or a separate unit connected to the bus. • Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers. • A DMA controller requests control of the bus by asserting the Bus Request (BR) line. • In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free. • The BG1 signal is connected to all DMA controllers in a daisy-chain fashion. • When the BBSY signal is 0, it indicates that the bus is busy. When BBSY becomes 1, the DMA controller that asserted BR can acquire control of the bus.
  • 90.
    Centralized arbitration (contd..) [Timing diagram: BR, BG1, BG2, and BBSY signals; bus mastership passes from the processor to DMA controller 2 and back to the processor.] • DMA controller 2 asserts the BR signal. • The processor asserts the BG1 signal. • The BG1 signal propagates through the daisy chain to DMA controller 2. • The processor relinquishes control of the bus by setting BBSY to 1.
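The daisy-chained grant can be sketched as a short function. This is a minimal Python sketch; modeling the chain as a list of request flags ordered from the processor outward is an assumption for illustration, not a detail from the source.

```python
# Daisy-chained Bus-Grant: BG1 enters the controller closest to the
# processor and is passed along the chain until it reaches a controller
# that has asserted BR; that controller captures the grant.
def daisy_chain_grant(requests):
    """requests[i] is True if DMA controller i (index 0 = closest to the
    processor) has asserted BR. Returns the index that captures the grant,
    or None if no controller is requesting."""
    for i, asserted_br in enumerate(requests):
        if asserted_br:
            return i      # this controller keeps BG and does not pass it on
    return None
```

A side effect of the wiring is a fixed priority: a controller nearer the processor always wins over one further down the chain.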
  • 91.
    Distributed arbitration  Alldevices waiting to use the bus share the responsibility of carrying out the arbitration process.  Arbitration process does not depend on a central arbiter and hence distributed arbitration has higher reliability.  Each device is assigned a 4-bit ID number.  All the devices are connected using 5 lines, 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.  To request the bus a device:  Asserts the Start-Arbitration signal.  Places its 4-bit ID number on the arbitration lines.  The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
  • 92.
  • 93.
    Distributed arbitration (contd..) • Arbitration process: o Each device compares the pattern that appears on the arbitration lines to its own ID, starting with the MSB. o If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. o The pattern that then appears on the arbitration lines is the logical OR of the IDs still being driven.
  • 94.
    Distributed arbitration (contd..) • Device A has the ID 5 and wants to request the bus: - It transmits the pattern 0101 on the arbitration lines. • Device B has the ID 6 and wants to request the bus: - It transmits the pattern 0110 on the arbitration lines. • The pattern that appears on the arbitration lines is the logical OR of the two patterns: - Pattern 0111 appears on the arbitration lines. Arbitration process: • Each device compares the pattern on the arbitration lines to its own ID, starting with the MSB. • If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. • Device A compares its ID pattern 0101 to the pattern 0111 on the lines. • It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the arbitration lines. • The pattern that now appears on the arbitration lines is the logical OR of 0100 and 0110, which is 0110. • This pattern is the same as the device ID of B, and hence B has won the arbitration.
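The bit-by-bit comparison in the example above can be simulated directly. This is a minimal Python sketch; modeling the wired-OR (open-collector) arbitration lines with `any` is an assumption about how to represent the electrical OR, not part of the source.

```python
# Distributed arbitration over 4 wired-OR lines: every requesting device
# drives its ID; a device that sees a 1 on a line where its own bit is 0
# withdraws from that bit position downward. The highest ID always wins.
def arbitrate(ids, width=4):
    contenders = set(ids)
    for bit in range(width - 1, -1, -1):
        line = any((d >> bit) & 1 for d in contenders)   # wired-OR of this bit
        if line:
            # devices driving 0 here detect a difference and drop out
            contenders = {d for d in contenders if (d >> bit) & 1}
    (winner,) = contenders      # exactly one device remains (IDs are unique)
    return winner
```

Running it on the slide's example, `arbitrate([5, 6])` reproduces the outcome that device B (ID 6) wins.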
  • 95.
  • 96.
    Buses • The processor, main memory, and I/O devices are interconnected by means of a bus. • The bus provides a communication path for the transfer of data. o The bus also includes lines to support interrupts and arbitration. • A bus protocol is the set of rules that govern the behavior of the various devices connected to the bus: when to place information on the bus, when to assert control signals, etc.
  • 97.
    Buses (contd..)  Buslines may be grouped into three types:  Data  Address  Control  Control signals specify:  Whether it is a read or a write operation.  Required size of the data, when several operand sizes (byte, word, long word) are possible.  Timing information to indicate when the processor and I/O devices may place data or receive data from the bus.  Schemes for timing of data transfers over a bus can be classified into:  Synchronous,  Asynchronous.
  • 98.
  • 99.
    Synchronous bus (contd..) [Timing diagram: bus clock, address and command, and data lines over one bus cycle, marked t0, t1, t2.] • At t0 the master places the device address and command on the bus, and indicates that it is a Read operation. • The addressed slave places data on the data lines. • At t2 the master “strobes” the data on the data lines into its input buffer, for a Read operation. • In case of a Write operation, the master places the data on the bus along with the address and command at time t0. • The slave strobes the data into its input buffer at time t2.
  • 100.
    Synchronous bus (contd..) • Once the master places the device address and command on the bus, it takes time for this information to propagate to the devices: o This time depends on the physical and electrical characteristics of the bus. • Also, all the devices have to be given enough time to decode the address and control signals, so that the addressed slave can place data on the bus. • The width of the pulse t1 - t0 depends on: o The maximum propagation delay between two devices connected to the bus. o The time taken by all the devices to decode the address and control signals, so that the addressed slave can respond at time t1.
  • 101.
    Synchronous bus (contd..) • At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its input buffer if it is a Read operation. o To “strobe” means to capture the values of the data and store them into a buffer. • When data are to be loaded into a storage buffer register, the data should be available for a period longer than the setup time of the device. • The width of the pulse t2 - t1 should be longer than: o The maximum propagation time of the bus, plus o The setup time of the input buffer register of the master.
  • 102.
    Synchronous bus (contd..) [Timing diagram: the address/command and data signals as seen by the master and as seen by the slave, with delays tAM, tAS, tDS, tDM marked between t0, t1, and t2.] • Signals do not appear on the bus as soon as they are placed on it, due to the propagation delay in the interface circuits. • Signals reach the devices after a propagation delay that depends on the characteristics of the bus. • Data must remain on the bus for some time after t2, equal to the hold time of the buffer. • tAM: address and command appear on the bus. tAS: address and command reach the slave. tDS: data appears on the bus. tDM: data reaches the master.
  • 103.
    Synchronous bus (contd..) • The data transfer has to be completed within one clock cycle. o The clock period t2 - t0 must be such that the longest propagation delay on the bus and the slowest device interface are accommodated. o This forces all the devices to operate at the speed of the slowest device. • The processor simply assumes that the data are available at t2 in case of a Read operation, or are read by the device in case of a Write operation. o What if the device has actually failed, and never really responded?
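The constraints on t1 - t0 and t2 - t1 described above can be combined into a small worked calculation of the minimum clock period. This is a Python sketch; all the delay figures are hypothetical, chosen only to illustrate the formula.

```python
# Minimum synchronous-bus clock period (all figures in ns, hypothetical).
bus_propagation = 4   # worst-case propagation delay between two devices
decode_time     = 6   # time for the slowest device to decode address/control
setup_time      = 2   # setup time of the master's input buffer register

# t1 - t0 must cover address/command propagation plus decoding, so that the
# addressed slave can respond at t1; t2 - t1 must cover propagation of the
# data back across the bus plus the master's setup time.
t1_minus_t0 = bus_propagation + decode_time
t2_minus_t1 = bus_propagation + setup_time
clock_period = t1_minus_t0 + t2_minus_t1   # t2 - t0
```

With these figures the bus cannot be clocked faster than one transfer per 16 ns, even if most devices are much quicker: the slowest device sets the pace.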
  • 104.
    Synchronous bus (contd..) • Most buses have control signals to represent a response from the slave. • These control signals serve two purposes: o They inform the master that the slave has recognized the address, and is ready to participate in a data transfer operation. o They enable the master to adjust the duration of the data transfer operation based on the speed of the participating slaves. • When a high-frequency bus clock is used: o A data transfer spans several clock cycles instead of just one clock cycle, as in the earlier case.
  • 105.
    Synchronous bus (contd..) [Timing diagram: clock cycles 1-4 with the address, command, data, and Slave-ready lines.] • The address and command requesting a Read operation appear on the bus. • The slave places the data on the bus and asserts the Slave-ready signal. • The master strobes the data into its input buffer. • Clock changes are seen by all the devices at the same time.
  • 106.
    Asynchronous bus  Datatransfers on the bus is controlled by a handshake between the master and the slave.  Common clock in the synchronous bus case is replaced by two timing control lines:  Master-ready,  Slave-ready.  Master-ready signal is asserted by the master to indicate to the slave that it is ready to participate in a data transfer.  Slave-ready signal is asserted by the slave in response to the master-ready from the master, and it indicates to the master that the slave is ready to participate in a data transfer.
  • 107.
    Asynchronous bus (contd..) • Data transfer using the handshake protocol: o The master places the address and command information on the bus. o It asserts the Master-ready signal to indicate to the slaves that the address and command information has been placed on the bus. o All devices on the bus decode the address. o The addressed slave performs the required operation, and informs the processor that it has done so by asserting the Slave-ready signal. o The master removes all the signals from the bus once Slave-ready is asserted. o If the operation is a Read operation, the master also strobes the data into its input buffer.
  • 108.
    Asynchronous bus (contd..) [Timing diagram: address and command, Master-ready, data, and Slave-ready lines over one bus cycle, marked t0 to t5.] • t0 - The master places the address and command information on the bus. • t1 - The master asserts the Master-ready signal. Master-ready is asserted at t1 instead of t0 to allow for bus skew. • t2 - The addressed slave places the data on the bus and asserts the Slave-ready signal. • t3 - The Slave-ready signal arrives at the master. • t4 - The master removes the address and command information. • t5 - The slave receives the 1-to-0 transition of the Master-ready signal, and removes the data and the Slave-ready signal from the bus.
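The t0-t5 handshake above can be traced step by step in code. This is a minimal Python sketch of the event ordering only; the bus is modeled as a dictionary and the event names are illustrative, not signal names from the source.

```python
# Trace of a full-handshake Read: each step appends an event in causal order.
def handshake_read(data_word):
    """Returns (events, latched): the ordered signal transitions and the
    data word the master strobed into its input buffer."""
    events, bus = [], {}
    bus["address"] = "addr+cmd"            # t0: master drives address/command
    events.append("master_ready_up")       # t1: master asserts Master-ready
    bus["data"] = data_word                # t2: addressed slave drives data...
    events.append("slave_ready_up")        # ...and asserts Slave-ready
    latched = bus["data"]                  # t3: master strobes the data
    events.append("master_strobes")
    del bus["address"]                     # t4: master removes address/command
    events.append("master_ready_down")     #     and drops Master-ready
    del bus["data"]                        # t5: slave sees the drop, removes
    events.append("slave_ready_down")      #     data and Slave-ready
    return events, latched

events, latched = handshake_read(0xAB)
```

The point of the handshake is visible in the ordering: the master never strobes before Slave-ready, and the slave never releases the bus before Master-ready drops.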
  • 109.
    Asynchronous vs. Synchronous bus • Advantages of the asynchronous bus: o It eliminates the need for synchronization between the sender and the receiver. o It can accommodate varying delays automatically, using the Slave-ready signal. • Disadvantages of the asynchronous bus: o The data transfer rate with a full handshake is limited by two round-trip delays. o A data transfer using a synchronous bus involves only one round-trip delay, and hence a synchronous bus can achieve faster rates.
  • 110.

Editor's Notes

  • #83 This alternative approach is called direct memory access. DMA uses a special control unit which is provided to transfer a block of data directly between an I/O device and the main memory without continuous intervention by the processor. The control unit which performs these transfers without the intervention of the processor is part of the I/O device’s interface circuit, and this controller is called the DMA controller. The DMA controller performs functions that would normally be performed by the processor. The processor would have to provide a memory address and all the control signals; so the DMA controller likewise provides the memory address where the data is going to be stored, along with the necessary control signals. When a block of data needs to be transferred, the DMA controller also has to increment the memory addresses and keep track of the number of words that have been transferred.
  • #84 A DMA controller can be used to transfer a block of data from an external device to the memory, without requiring any help from the processor. As a result, the processor is free to execute other programs. However, the DMA controller performs the task of transferring data to or from an I/O device for a program that is being executed by the processor. That is, the DMA controller does not and should not have the capability to determine when a data transfer operation should take place. The processor must initiate the DMA transfer of data when it is required by the program being executed. When the processor determines that the program being executed requires a DMA transfer, it informs the DMA controller, which sits in the interface circuit of the device, of three things: the starting address of the memory location, the number of words that need to be transferred, and the direction of transfer, that is, whether the data needs to be transferred from the I/O device to the memory or from the memory to the I/O device. After initiating the DMA transfer, the processor suspends the program that initiated the transfer and continues with the execution of some other program. The program whose execution is suspended is said to be in the blocked state.
  • #85 Let us consider a memory organization with two DMA controllers. In this organization, a DMA controller is used to connect a high-speed network to the computer bus. In addition, the disk controller, which controls two disks, also has DMA capability. The disk controller provides two DMA channels, and it can perform two independent DMA operations, as if each disk had its own DMA controller. Each DMA channel has three registers: one to store the memory address, one to store the word count, and one to store the status and control information. There are two copies of these three registers in order to perform independent DMA operations; that is, the registers are duplicated.
  • #86 The processor also has to transfer data to and from the main memory, and the DMA controller is responsible for transferring data between the I/O device and the main memory. Both the processor and the DMA controller have to use the external bus to talk to the main memory. Usually, DMA controllers are given higher priority than the processor to access the bus. We also need to decide the priority among the different DMA devices that may need to use the bus. Among these devices, high priority is given to high-speed peripherals such as a disk or a graphics display device. Usually, the processor originates most cycles on the bus, and the DMA controller can be said to steal memory access cycles from the bus. Thus, the processor and the DMA controller use the bus in an interwoven fashion. This interweaving technique is called cycle stealing. An alternate approach would be to give DMA controllers exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block mode or the burst mode of operation.
  • #87 The processor and the DMA controllers both need to initiate data transfers on the bus and access main memory. The process of using the bus to perform a data transfer operation is called the initiation of a transfer operation. At any point in time, only one device is allowed to initiate transfers on the bus; the device that is allowed to do so at any given time is called the bus master. When the current bus master releases control of the bus, another device can acquire the status of bus master. How does one determine which device will be the next bus master? Note that there may be several DMA controllers, plus the processor, which require access to the bus. The process by which the next device to become the bus master is selected, and bus mastership is transferred to it, is called bus arbitration. There are two types of bus arbitration: centralized arbitration and distributed arbitration. In centralized arbitration, a single bus arbiter performs the arbitration, whereas in distributed arbitration all devices that need to initiate data transfers on the bus participate in the selection of the next bus master.
  • #97 Recall that one device plays the role of a master. The device that initiates a data transfer on the bus by issuing read or write control signals is called the master. The device that is being addressed by the master is called a slave or a target.
  • #105 The Slave-ready signal is an acknowledgement from the slave to the master to confirm that valid data has been sent. Depending on when the Slave-ready signal is asserted, the duration of the data transfer can change.