Module 4: Input/Output Organization
I/O access can be categorized into three broad methods: Programmed I/O, Interrupt-driven I/O, and Direct Memory Access (DMA).
1. Programmed I/O
o Process:
1. The CPU sends a request to the I/O device to start a data transfer.
2. The CPU repeatedly checks (polls) the device's status until the device is ready.
3. Once the device is ready, the CPU reads from or writes to the device.
o Advantages:
Simple to implement and requires no special interrupt or DMA hardware.
o Disadvantages:
Wastes CPU cycles, especially when the device is slow or not ready.
Example: Reading a character from a keyboard where the CPU continuously checks if the
key has been pressed.
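As a rough sketch (in Python, with purely illustrative names and a simulated device), the busy-wait polling loop described above might look like this; the `polls` counter makes the wasted CPU cycles visible:

```python
# Minimal sketch of programmed (polled) I/O with a simulated device
# that becomes ready only after several status checks.

class SimulatedDevice:
    def __init__(self, data, ready_after):
        self._data = data
        self._polls_left = ready_after  # device is ready after N status checks

    def is_ready(self):
        if self._polls_left > 0:
            self._polls_left -= 1       # each poll costs a wasted CPU "cycle"
            return False
        return True

    def read(self):
        return self._data

def polled_read(device):
    polls = 0
    while not device.is_ready():        # busy-wait: the CPU does nothing useful
        polls += 1
    return device.read(), polls

dev = SimulatedDevice(data=ord('A'), ready_after=3)
value, wasted_polls = polled_read(dev)  # value = 65, wasted_polls = 3
```

The key point is the `while` loop: until the device reports ready, the CPU spins, which is exactly the inefficiency that interrupt-driven I/O and DMA avoid.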
2. Interrupt-driven I/O
o Process:
1. The CPU issues an I/O command to the device and continues with other work.
2. When the device is ready, it sends an interrupt signal to the CPU.
3. The CPU interrupts its current task, saves its state, and begins processing the I/O operation.
4. Once the data transfer is complete, the CPU resumes its previous task.
o Advantages:
Allows the CPU to perform other tasks until the device signals that it
is ready.
o Disadvantages:
Interrupt handling adds overhead, since the CPU must save and later restore its state for every interrupt.
Example: A printer sending an interrupt to notify the CPU that it has finished printing a
page.
3. Direct Memory Access (DMA)
o Process:
1. The CPU initializes the DMA controller by specifying the source and
destination addresses in memory and the number of data bytes to
transfer.
2. The DMA controller takes control of the system bus and transfers
the data directly between the I/O device and memory.
3. The DMA controller signals the CPU once the transfer is complete,
typically using an interrupt to notify the CPU.
o Advantages:
Frees up the CPU to perform other tasks while the DMA controller
handles the data transfer.
o Disadvantages:
DMA transfers may cause contention for system bus access when
both the CPU and DMA controller need the bus simultaneously.
Example: Transferring a large block of data from a hard drive to memory without
involving the CPU.
I/O devices can be classified into several types, based on the type of data they handle
and their communication method:
2. Output Devices: These devices receive data from the computer to produce
output.
o Examples: Monitors, printers, speakers.
3. Storage Devices: These devices store data for later retrieval.
o Examples: Hard drives, solid-state drives (SSDs), USB flash drives, optical
discs.
1. Serial Ports: These ports transfer data one bit at a time. They are often used for
communication with external devices like modems or printers.
2. Parallel Ports: These ports transfer multiple bits simultaneously. They are
generally used for devices like printers and scanners.
3. USB (Universal Serial Bus): USB is a modern interface used for connecting
many types of devices (input devices, external storage, etc.). It supports both
data transfer and power delivery.
o Example: USB flash drives, external hard drives, keyboards.
I/O Control
I/O devices are controlled by the system's I/O controller, which is responsible for
managing data transfer between devices and the CPU/memory. This can be done through
several techniques:
Memory-Mapped I/O: In this method, I/O devices are treated like memory
locations. The CPU uses standard memory instructions to interact with I/O devices.
Port-Mapped I/O (I/O Mapped I/O): In this approach, special instructions are
used to access I/O devices, and they are not treated as part of the regular
memory space.
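The memory-mapped approach can be sketched as follows (a Python simulation with made-up register addresses, not real hardware): device registers occupy ordinary addresses, so the same load/store operations work for both RAM and devices.

```python
# Sketch of memory-mapped I/O: device registers live at ordinary
# memory addresses, so the CPU uses plain load/store operations.
# The register addresses below are illustrative only.

MEMORY_SIZE = 256
DEVICE_DATA_REG = 0xF0      # device data register mapped into memory
DEVICE_STATUS_REG = 0xF1    # device status register mapped into memory

memory = [0] * MEMORY_SIZE

def store(addr, value):     # one instruction works for RAM and for devices
    memory[addr] = value

def load(addr):
    return memory[addr]

# "Device" side: mark itself ready and expose a byte of input.
store(DEVICE_STATUS_REG, 1)
store(DEVICE_DATA_REG, 0x41)

# "CPU" side: ordinary loads read the device's registers.
data = None
if load(DEVICE_STATUS_REG) == 1:
    data = load(DEVICE_DATA_REG)   # data = 0x41
```

Under port-mapped I/O, by contrast, the `load`/`store` above would be replaced by dedicated instructions (such as x86 `IN`/`OUT`) operating on a separate I/O address space.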
1. Synchronous I/O: The CPU and I/O device work in sync, meaning the CPU waits
for the device to complete the transfer before continuing with its task.
2. Asynchronous I/O: The CPU issues the I/O request and continues with other
work; the device signals (for example, via an interrupt) when the transfer is
complete.
Interrupts
Interrupts help the CPU focus on other tasks without having to constantly check for
conditions that might require attention (such as I/O devices being ready for data
transfer). Instead, when a device requires attention or an event occurs, it interrupts the
CPU's current operations, prompting the system to handle the event immediately.
Types of Interrupts
1. Hardware Interrupts
2. Software Interrupts
3. External Interrupts
When an interrupt occurs, the CPU must interrupt its current execution and handle the
event. The basic steps in interrupt handling are as follows:
1. Interrupt Request:
o The I/O device or software triggers an interrupt. This request is sent to the
CPU via a dedicated line or bus, signaling that an interrupt has occurred.
2. Interrupt Acknowledgment:
o The CPU completes its current instruction and acknowledges the interrupt,
signaling the device that the request has been received.
3. Context Saving:
o Before the CPU can handle the interrupt, it saves the context of the current
process (i.e., the contents of the registers and program counter) to ensure
it can resume execution after the interrupt is processed.
4. Interrupt Vector:
o The CPU jumps to the address of the ISR specified by the interrupt vector.
5. ISR Execution:
o The ISR is a special function that handles the interrupt, such as reading
data from an I/O device, handling errors, or performing specific system
calls.
6. Context Restoration:
o After the ISR finishes handling the interrupt, the CPU restores the saved
context (i.e., the previous state of registers, program counter, etc.) to
resume execution of the interrupted process.
7. Return:
o The CPU returns to the process it was executing before the interrupt,
continuing from the point at which it was interrupted.
Interrupt Vectors
Interrupt vectors are used to manage the mapping of interrupt types to their
corresponding service routines. An interrupt vector table holds the addresses of all
ISRs. When an interrupt occurs, the CPU consults this table to find the appropriate ISR
and handles the event accordingly.
The interrupt vector table is typically located at a fixed memory address, and
the CPU uses the interrupt number (or interrupt request level) to find the correct
address in the table.
The structure of an interrupt vector table may depend on the architecture of the
CPU or system. For example, in x86 systems, the interrupt vector table is
typically located at the beginning of memory.
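The table lookup can be sketched as a mapping from interrupt number to service routine (a Python illustration; the interrupt numbers and handler names are made up, not taken from any real architecture):

```python
# Sketch of an interrupt vector table: index by interrupt number,
# look up the ISR, and "jump" to it. Numbers and ISRs are illustrative.

def keyboard_isr():
    return "read scancode from keyboard"

def timer_isr():
    return "update system tick"

# Vector table: interrupt number -> service routine (here, a function
# object stands in for the ISR's memory address).
vector_table = {
    0x21: keyboard_isr,
    0x08: timer_isr,
}

def dispatch(interrupt_number):
    isr = vector_table[interrupt_number]   # CPU consults the table...
    return isr()                           # ...and transfers control to the ISR

result = dispatch(0x21)   # runs keyboard_isr
```

On real hardware the table holds raw ISR addresses at a fixed location and the CPU indexes it directly, but the lookup-then-jump structure is the same.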
1. Interrupt Priority:
o When several interrupts are pending at once, the CPU services them in order of
priority, handling the most urgent events first.
2. Interrupt Masking:
o The CPU can temporarily disable (mask) selected interrupts, for example while
executing a critical section, so that they are serviced later.
Nested Interrupts
In some systems, while handling one interrupt, another interrupt may occur. Nested
interrupts refer to the ability of a system to handle an interrupt while another interrupt
is still being processed. For example, if a higher-priority interrupt occurs while the CPU is
handling a lower-priority interrupt, the higher-priority interrupt can preempt the current
ISR.
To manage nested interrupts, the CPU typically saves the state of the current ISR,
handles the higher-priority interrupt, and then resumes processing the lower-
priority interrupt.
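The preemption behavior can be sketched as follows (a Python illustration; the priorities, device names, and event sequence are invented, and the call stack stands in for the saved ISR state):

```python
# Sketch of nested interrupts: a higher-priority request preempts
# the ISR currently running. Lower number = higher priority.

log = []

def handle(interrupt, pending):
    priority, name = interrupt
    log.append(f"enter {name}")
    # While this ISR runs, a higher-priority interrupt may arrive
    # and preempt it; the call stack saves this ISR's state.
    while pending and pending[0][0] < priority:
        handle(pending.pop(0), pending)
    log.append(f"exit {name}")

# A low-priority disk interrupt (priority 2) is being serviced when
# a high-priority timer interrupt (priority 0) arrives.
pending = [(0, "timer")]
handle((2, "disk"), pending)
# log: ['enter disk', 'enter timer', 'exit timer', 'exit disk']
```

The log shows the nesting: the disk ISR is entered first, the timer ISR runs to completion inside it, and only then does the disk ISR finish.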
Interrupt-Driven I/O
Interrupt-driven I/O offers a significant advantage over polling because it allows the CPU
to perform other tasks instead of continuously checking I/O devices. With interrupt-driven
I/O, the device interrupts the CPU when it is ready to transfer data, making the system
more efficient.
Interrupt-driven I/O is especially useful for devices like keyboards, mice, and
network interfaces, where events are sporadic and may not need constant
monitoring.
Advantages of Interrupts
1. Efficiency: Interrupts allow the CPU to perform useful work while waiting for I/O
operations, rather than being wasted on polling or waiting for device readiness.
2. Multitasking: Interrupts allow for better multitasking because the CPU can be
interrupted to handle higher-priority tasks while still working on lower-priority
tasks in the background.
3. Reduced CPU Usage: Interrupts reduce the need for the CPU to waste cycles
checking for device statuses or performing unnecessary checks.
Disadvantages of Interrupts
1. Overhead: Every interrupt requires saving and later restoring the CPU state,
which adds processing overhead.
2. Complexity: Interrupt-driven systems are harder to design and debug than
simple polling loops.
Processor Examples
In computer systems, a processor (also known as the central processing unit or CPU)
is the core component responsible for executing instructions and performing
computations. Different processor architectures have been developed over time, each
with its unique features, design philosophies, and use cases. Below are some common
processor examples that are widely used or historically significant in computer
organization.
CISC processors are designed with a rich instruction set that allows each instruction to
perform multiple operations. These processors can express complex operations in a
single instruction, although such an instruction may take multiple clock cycles to
execute.
Architecture: The x86 architecture is one of the most well-known and widely
used CISC processor architectures. It has been a dominant architecture for
personal computers, laptops, and servers.
Instruction Set: The x86 instruction set includes a wide variety of instructions,
including complex operations that may involve multiple memory accesses or
involve arithmetic operations on multiple data types.
Key Features:
o Variable-length instructions.
RISC processors, in contrast to CISC, are designed with a smaller set of simple
instructions that can be executed very quickly. Each instruction typically performs a
single operation, and RISC processors emphasize high performance with a focus on
simple instructions that execute in one clock cycle.
Instruction Set: ARM's instruction set is simplified and highly optimized for
performance. ARM processors typically execute most instructions in a single cycle.
Use Cases: Smartphones, tablets, embedded devices, IoT devices, and some
servers.
Key Features:
o Scalable cores with various versions for different device needs (e.g., ARM
Cortex-A, Cortex-M, Cortex-R).
o Power-efficient performance.
3. Hybrid Processors
Hybrid processors combine features of both RISC and CISC architectures. These
processors often allow software developers to use the benefits of a simpler, RISC-like
instruction set while still retaining compatibility with complex instructions when needed.
Architecture: AMD’s Ryzen processors are based on the x86-64 architecture but
include innovations similar to those found in RISC architectures, such as a
simplified pipeline and efficient branch prediction.
Key Features:
o Modern CPU features such as Precision Boost, which adjusts the clock
speed based on workload.
4. Specialized Processors
Some processors are designed for specific purposes and may not follow the typical RISC
or CISC patterns. These specialized processors are optimized for particular tasks such as
graphics processing, neural network computations, or networking.
Architecture: GPUs like those from NVIDIA's CUDA series are specialized
processors designed for parallel processing tasks, particularly useful in graphics
rendering, scientific simulations, and machine learning applications.
Key Features:
o Thousands of simple cores optimized for massively parallel workloads.
5. Vector Processors
Vector processors are designed to handle vector operations, where a single instruction
operates on multiple data points simultaneously. These processors are highly optimized
for tasks that involve large arrays or matrices of data.
Architecture: Cray vector processors are designed for scientific and engineering
computations. The Cray-1, one of the earliest vector processors, revolutionized
high-performance computing.
Key Features:
o Single instructions that operate on entire vectors of data.
Direct Memory Access (DMA) is a method used in computer systems that allows
peripherals (such as hard drives, sound cards, network interfaces, etc.) to communicate
directly with the system's memory, bypassing the central processing unit (CPU). This
process helps improve the system's efficiency and performance by reducing CPU load
during large data transfers.
DMA is particularly useful for high-speed data transfer applications, such as data
streaming, file transfers, and high-volume input/output operations, where the CPU does
not need to be involved in each transfer of data.
Key Concepts in DMA
1. DMA Controller
o The DMA controller manages the timing, initiation, and completion of the
transfer. It is responsible for accessing memory addresses, determining the
direction of data flow, and handling any necessary interrupts once the data
transfer is complete.
transfer. It is responsible for accessing memory addresses, determining the
direction of data flow, and handling any necessary interrupts once the data
transfer is complete.
2. DMA Channel
o A DMA channel is a dedicated path through which a particular device
communicates with the DMA controller, allowing several devices to share one
controller.
3. DMA Transfer Types: DMA supports different types of data transfers, each
serving specific needs:
o Burst Transfer: In burst mode, the DMA controller transfers all the data in
a single sequence, without releasing the system bus until the transfer is
complete. This is fast but may prevent other devices from using the bus
during the transfer.
o Cycle Stealing: In cycle-stealing mode, the DMA controller steals one bus
cycle at a time to transfer a piece of data. The CPU and DMA controller
take turns using the system bus, allowing the CPU to continue its
operations while DMA transfers occur.
o Demand Transfer: In demand mode, the DMA controller waits for the
device to request access to memory, and then it proceeds with the transfer
when the device is ready.
4. DMA Process: The DMA process typically involves the following steps:
o The CPU initializes the DMA controller by setting the memory address, the
data size, and the direction of the transfer (from memory to device or from
device to memory).
o The DMA controller takes control of the system bus and begins the
transfer without involving the CPU.
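The steps above can be sketched in code (a Python simulation with illustrative names; a real DMA controller is hardware programmed through registers, not a class):

```python
# Sketch of the DMA process: the CPU programs the controller, then
# the controller moves data between a simulated device buffer and
# memory without further CPU involvement, and signals completion.

memory = [0] * 16
device_buffer = [10, 20, 30, 40]

class DMAController:
    def program(self, dest_addr, count, on_complete):
        # Step 1: the CPU sets the destination address, the transfer
        # size, and (implicitly here) the direction of transfer.
        self.dest = dest_addr
        self.count = count
        self.on_complete = on_complete

    def run(self):
        # Step 2: the controller drives the bus and copies the data
        # itself; the CPU is not involved in the per-word transfers.
        for i in range(self.count):
            memory[self.dest + i] = device_buffer[i]
        # Step 3: signal completion, typically via an interrupt.
        self.on_complete()

completed = []
dma = DMAController()
dma.program(dest_addr=4, count=4, on_complete=lambda: completed.append(True))
dma.run()
# memory[4:8] is now [10, 20, 30, 40] and the "interrupt" has fired.
```

The `on_complete` callback plays the role of the completion interrupt described in step 3: the CPU learns the data is ready without ever polling.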
Advantages of DMA
1. Reduced CPU Load: DMA offloads data transfer tasks from the CPU, freeing the
processor to focus on other tasks, thus improving overall system performance.
2. Faster Data Transfer: DMA allows for faster data transfer compared to CPU-
driven I/O operations, especially for high-speed devices or large data sets.
3. Efficient Data Handling: DMA can handle large amounts of data, such as
audio/video streaming or large file transfers, more efficiently than conventional
I/O methods.
Disadvantages of DMA
1. Hardware Complexity: A dedicated DMA controller adds cost and design
complexity to the system.
2. Bus Contention: DMA uses the system bus, which could lead to contention
between the CPU and other peripherals if multiple devices attempt to use the bus
simultaneously.
In a typical DMA system, the DMA controller will issue an interrupt to the CPU once the
data transfer is complete. This allows the CPU to be notified that the transfer has finished
and that the data is now available for processing.
The interrupt generated at the end of a DMA operation ensures that the CPU is
aware of the transfer's completion. The interrupt serves to avoid the need for
continuous polling by the CPU, thus improving system efficiency.
1. Disk I/O: When reading from or writing to a hard drive or solid-state drive, DMA
allows for faster transfer of large blocks of data between the storage device and
the system’s main memory.
2. Networking: Network interfaces often use DMA to move packets of data directly
into memory buffers, reducing the CPU’s involvement in the transfer process and
enabling faster network data processing.
3. Peripherals: Many devices, such as printers and scanners, use DMA for efficient
data transfer to/from memory, enabling faster and more efficient I/O operations.
Buses:
A bus typically consists of multiple parallel lines or conductors, where each line serves a
specific purpose, such as data transfer, addressing, or control. The efficient operation of
these buses is crucial to the overall performance and organization of the system.
Types of Buses
There are several types of buses in a computer system, each designed for a specific
purpose:
1. Data Bus
o The data bus is responsible for carrying the actual data being transferred
between the components of the computer system, such as between the
CPU and memory, or between the CPU and I/O devices.
o The width of the data bus (measured in bits) determines how much data
can be transferred at once. A wider data bus allows for faster data transfer.
2. Address Bus
o The address bus carries the addresses of memory locations or I/O ports
that the CPU wants to read from or write to.
3. Control Bus
o The control bus carries control signals that dictate the operation of the
system and coordinate the activities of the CPU, memory, and I/O devices.
o The control bus is typically unidirectional, with the CPU or other controlling
components sending signals to direct the actions of other devices.
Bus Architecture
1. Single Bus Architecture
o A single bus architecture uses one shared bus for all communication between
the CPU, memory, and I/O devices; it is simple but can become a bottleneck.
2. Multiple Bus Architecture
o Multiple bus architectures use more than one bus to handle different
types of data transfer simultaneously. For example, there may be separate
buses for data, addresses, and control signals, or multiple data buses for
different components (e.g., one bus for the CPU and memory, another for
I/O devices).
3. System Bus
o The system bus connects the major components of the computer (CPU, memory,
and I/O), typically combining the data, address, and control buses.
1. Synchronous Bus
o In a synchronous bus, all transfers are coordinated by a shared clock signal.
o Disadvantages: The bus speed is limited by the clock speed, and the
system can become inefficient if components operate at different speeds.
2. Asynchronous Bus
o In an asynchronous bus, transfers are coordinated by handshaking signals
between sender and receiver rather than a common clock, allowing devices of
different speeds to communicate.
Bus Standards
Various bus standards and protocols have been developed to define the physical and
logical characteristics of the buses. Some notable standards include:
1. PCI Express (PCIe)
o PCIe is the successor to PCI and is used for high-speed data transfer in
modern systems, particularly for graphics cards, storage devices, and
other high-performance peripherals.
o It uses a serial bus architecture instead of a parallel one, which allows for
faster and more efficient data transfer.
2. SATA (Serial ATA)
o SATA is a bus interface used for connecting hard drives, SSDs, and optical
drives to the computer. It replaced the older Parallel ATA (PATA) interface,
offering faster data transfer rates and improved power efficiency.
Bus Arbitration
In systems with multiple devices connected to the same bus, bus arbitration is used to
manage access to the bus and ensure that only one device communicates at any given
time. There are two common types of arbitration:
1. Centralized Arbitration
o A single arbiter (such as the CPU or a dedicated bus controller) receives all
bus requests and decides which device is granted access.
2. Distributed Arbitration
o All devices participate in the arbitration process and determine among
themselves, without a central arbiter, which device gets the bus.
Interface Circuits
There are various types of interface circuits, depending on the type of devices they
connect and the signals they manage. Some of the common types of interface circuits
include:
o Serial Interface: Used for devices like printers or modems, where data is
sent one bit at a time over a single channel (e.g., RS-232).
o Parallel Interface: Used for devices like printers and hard drives, where
multiple bits of data are transferred simultaneously over multiple channels
(e.g., IEEE 1284).
o Memory interface circuits are responsible for connecting the CPU to the
memory subsystem (RAM or ROM). These circuits manage the flow of data
between the processor and memory, ensuring that the correct memory
address is accessed and that data is read or written as needed.
o These circuits often include features like bus arbitration (to decide which
device can access the bus) and bus control signals (to manage the
timing and direction of data flow).
o The DMA controller, which is often part of the DMA interface circuit,
manages the memory addresses, data size, and direction of data transfer.
1. Signal Conversion
Interface circuits are responsible for converting data signals between different
formats. For example, the CPU communicates using parallel signals, while many
peripheral devices use serial signals. Interface circuits convert the signal types
and data formats to ensure compatibility between the CPU and external devices.
2. Voltage Level Conversion
Different devices and subsystems may operate at different voltage levels.
Interface circuits perform voltage level conversion to ensure that signals are
compatible between devices with different voltage requirements.
3. Data Synchronization
Since different subsystems and devices may operate at different speeds, interface
circuits are responsible for synchronizing data transfers. This ensures that data is
transferred correctly, even if the sending and receiving devices do not operate in
lockstep.
4. Buffering
Interface circuits often include buffers to temporarily store data while waiting for it
to be processed. For example, when data is transferred from memory to an I/O
device, the interface circuit may buffer the data to prevent data loss if the
receiving device is temporarily unavailable.
5. Interrupt Handling
Interface circuits may include mechanisms for handling interrupts, which are
signals sent by I/O devices to notify the CPU of the need for attention (e.g., when
input is available or when a task is completed). The interface circuits help manage
the communication between the device and the CPU during the interrupt handling
process.
o PCI interface circuits manage the routing of data between the CPU and the
devices, ensuring that data flows correctly and efficiently.
o SPI interface circuits manage data transmission and reception between the
devices, ensuring proper synchronization.
1. Speed Mismatch
o Peripheral devices and the CPU may operate at different speeds, leading to
potential performance bottlenecks. Interface circuits must handle these
speed mismatches, often by buffering data or using methods like DMA
(Direct Memory Access) for efficient data transfer.
2. Signal Integrity
o At high speeds, interface circuits must preserve signal quality against noise,
crosstalk, and attenuation.
3. Power Consumption
o Interface circuits should minimize power draw, which is especially important
in battery-powered devices.
4. Compatibility
o Interfaces must conform to established standards so that devices from
different vendors can interoperate.
In computer organization, I/O interfaces allow communication between the CPU and
peripheral devices, enabling data exchange. Standard I/O interfaces define protocols,
connectors, and electrical characteristics that ensure compatibility between devices and
subsystems. These interfaces are essential for enabling communication between
computers and external devices like keyboards, mice, printers, storage devices, network
cards, etc.
Serial communication is a method where data is transmitted one bit at a time over a
single channel or wire.
RS-232 (Serial Port)
o Data Transfer: It transfers data one bit at a time over a single wire,
typically at speeds from 9600 to 115200 baud.
o Features:
o Data Transfer: USB supports data rates from 1.5 Mbps (USB 1.0) up to 40
Gbps (USB 4.0).
o Features:
o Features:
Smaller connectors and cables than the older Parallel ATA (PATA)
standard.
o Data Transfer: Data rates vary depending on the version of PCI used
(e.g., PCI 2.0 supports up to 533 MB/s, while PCIe 3.0 supports roughly
1 GB/s per lane, or about 16 GB/s for a 16-lane link).
o Features:
o Common Usage: PCI and PCIe are primarily used for connecting high-
performance peripherals like GPUs, network cards, and storage controllers.
Ethernet
o Description: Ethernet is the most widely used LAN (Local Area Network)
technology for computer networking. It uses a bus or star topology to
connect computers and peripheral devices over short to medium distances
(within a building or campus).
o Features:
o Data Transfer: Wi-Fi supports data transfer speeds up to 9.6 Gbps (Wi-Fi
6E).
o Features:
Operates over the 2.4 GHz and 5 GHz frequency bands (and 6 GHz
for Wi-Fi 6E).
5. Memory Interfaces
o Data Transfer: DIMMs typically operate with data rates up to 25.6 GB/s in
DDR4 memory, and DDR5 offers even faster speeds.
o Features:
A key interface for high-performance systems.
Bluetooth
o Description: Bluetooth is a short-range wireless interface operating in the
2.4 GHz band, commonly used for connecting peripherals such as headsets,
keyboards, and mice.
o Features:
Low power consumption, suitable for battery-powered devices.
1. Data Transfer Speed: Different I/O interfaces offer varying speeds, from low-
speed interfaces like RS-232 to high-speed interfaces like PCIe and Ethernet. The
interface should be chosen based on the data transfer needs of the system.
2. Compatibility: It's important to ensure that the I/O interface is compatible with
both the system's hardware and the connected peripheral device. This includes
matching the physical connectors, voltage levels, and data formats.
3. Bus Width: The number of bits transferred per clock cycle (e.g., 8 bits for
parallel, 1 bit for serial) determines the bandwidth and performance of the
interface.
4. Signal Integrity and Noise Immunity: The design of the interface must
minimize signal degradation over long distances and through interference,
especially for high-speed interfaces like SATA, PCIe, and Ethernet.
A hash table is a data structure that offers an efficient way to store and retrieve data
using a key-value mapping. The concept of hashing is used to quickly locate a data
record given its search key. Hash tables are widely used in computer science and
software engineering, primarily because they provide constant-time average time
complexity for operations like insertion, deletion, and lookup.
Hashing is the process of converting an input (or 'key') into a fixed-size integer, which is
used as an index in a hash table. The key is processed through a hash function, and the
resulting value is referred to as a hash code. This hash code determines the position (or
index) where the corresponding value will be stored in the table.
Hash Function: A hash function is a function that takes an input (or key) and
returns an integer value, which is used as the index in a hash table. A good hash
function should uniformly distribute the keys across the hash table, minimizing
collisions.
Collisions: A collision occurs when two different keys produce the same hash
value. Since the hash table uses the hash value as an index, two different keys
being mapped to the same index leads to a conflict. Efficient handling of collisions
is crucial for maintaining the performance of a hash table.
Array: The underlying data structure is an array, where each slot is a bucket that
can store one or more entries (key-value pairs).
Bucket: Each bucket in the hash table holds one or more records. If a bucket can
hold only one entry, a collision overwrites or displaces the existing entry; more
commonly, collisions are handled by storing multiple entries in the same bucket
(for example, in a linked list).
1. Insertion:
o The key is passed through the hash function to calculate its hash code, and
the key-value pair is stored at the resulting index.
2. Search:
o The key is hashed in the same way, and the bucket at the resulting index is
examined for the matching entry.
3. Deletion:
o The key is hashed to find the index of the value, and the entry at that index
is removed.
Since hash functions can sometimes generate the same index for different keys, we need
strategies to handle collisions. Two common techniques for handling collisions are
chaining and open addressing.
1. Chaining:
o Each bucket stores a linked list (or another dynamic structure) of all entries
that hash to that index.
o When a collision occurs, the new element is simply added to the list at that
bucket.
2. Open Addressing:
o All entries are stored directly in the table itself; on a collision, a probe
sequence is followed to find another open slot.
o Linear Probing: The next slot is checked sequentially (i.e., the next index
is tried).
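Both strategies can be sketched in a few lines (a Python illustration using hash(key) = key % size; the table sizes and keys are chosen only to force collisions):

```python
# Minimal sketches of the two collision-handling strategies:
# chaining (lists per bucket) and open addressing (linear probing).

class ChainedHashTable:
    def __init__(self, size=10):
        self.buckets = [[] for _ in range(size)]

    def insert(self, key, value):
        bucket = self.buckets[key % len(self.buckets)]
        for pair in bucket:
            if pair[0] == key:          # key already present: update it
                pair[1] = value
                return
        bucket.append([key, value])     # collision: append to the chain

    def search(self, key):
        for k, v in self.buckets[key % len(self.buckets)]:
            if k == key:
                return v
        return None

class LinearProbingHashTable:
    def __init__(self, size=10):
        self.slots = [None] * size

    def insert(self, key, value):
        i = key % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)   # probe the next slot
        self.slots[i] = (key, value)

    def search(self, key):
        i = key % len(self.slots)
        while self.slots[i] is not None:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None

ct = ChainedHashTable()
pt = LinearProbingHashTable()
for k in (12, 22, 32):                  # all three hash to index 2
    ct.insert(k, k * 10)
    pt.insert(k, k * 10)
# ct keeps 12, 22, 32 chained in bucket 2;
# pt places them in slots 2, 3, and 4 via probing.
```

Note that the probing version never stores more elements than it has slots, while the chained version degrades gracefully by growing its lists.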
Load Factor
The load factor of a hash table is a measure of how full the table is. It is calculated as
the ratio of the number of elements (n) to the size of the hash table (m):
load factor α = n / m
A higher load factor means more collisions and slower performance. Typically, hash
tables resize (rehash) when the load factor exceeds a certain threshold, ensuring that the
average time complexity for operations remains constant.
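The resize-on-threshold behavior can be sketched as follows (a Python illustration; the initial size of 4 and threshold of 0.75 are arbitrary choices, not prescribed values):

```python
# Sketch of load-factor-driven resizing: when n/m exceeds a
# threshold, the table doubles in size and every key is rehashed.

class ResizingHashTable:
    def __init__(self, size=4, max_load=0.75):
        self.buckets = [[] for _ in range(size)]
        self.n = 0
        self.max_load = max_load

    def load_factor(self):
        return self.n / len(self.buckets)       # alpha = n / m

    def insert(self, key):
        self.buckets[key % len(self.buckets)].append(key)
        self.n += 1
        if self.load_factor() > self.max_load:
            self._rehash()

    def _rehash(self):
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for key in bucket:                  # redistribute under the new size
                self.buckets[key % len(self.buckets)].append(key)

t = ResizingHashTable()
for k in range(1, 8):
    t.insert(k)
# The table has doubled twice (4 -> 8 -> 16), keeping the load factor low.
```

Each rehash is O(n), but since it doubles the capacity it happens rarely enough that the amortized cost per insertion stays constant.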
Efficiency of Hash Tables
Average-case time complexity:
o Search: O(1)
o Insert: O(1)
o Delete: O(1)
Worst case:
o If the hash function causes many collisions, all entries might end up in the
same bucket, leading to a time complexity of O(n). However, with a good
hash function and a proper collision handling strategy, the average time
complexity is usually O(1).
Advantages of Hash Tables
1. Fast Access: Hash tables provide very fast access to data with average-case
constant time complexity (O(1)).
2. Efficient Memory Use: Hash tables can be resized dynamically based on the
load factor, ensuring efficient memory usage.
Applications of Hash Tables
1. Database Indexing: Hash tables are used in databases for indexing and quick
data retrieval.
2. Caches: They are used in caching algorithms (like the LRU Cache) to quickly
access recently used items.
3. Sets and Maps: Hash tables provide efficient implementations for data
structures like sets (collections of unique items) and maps (associative arrays or
dictionaries).
4. Symbol Tables: In compilers, hash tables are used for symbol tables to store
variable/function names and their respective information.
1. Hash Function:
o Assume a hash table of size 10 with hash(key) = key % 10.
2. Insertion:
o Insert the following keys into the table: 12, 22, 32, 42.
o For key 12: hash(12) = 12 % 10 = 2, so store 12 at index 2.
Index Values
2 12 → 22 → 32 → 42
3. Search:
Static Hashing
Static hashing is a type of hashing where the size of the hash table remains fixed
during its lifetime. The number of slots or buckets in the hash table is predetermined,
and it does not change dynamically as data is inserted or deleted. In static hashing, once
the hash table is created, the number of slots remains constant, and the table is not
resized or rehashed. This method is simple and efficient for scenarios where the number
of records to be stored is known in advance or doesn’t change significantly over time.
1. Fixed Size:
o The number of buckets (or slots) in the hash table is fixed. Once the table
is initialized, it cannot grow or shrink based on the number of elements
inserted.
2. Hash Function:
o A hash function is used to map keys to an index within the fixed-size table.
The function is typically based on the key, such as hash(key) = key %
table_size, where table_size is the fixed size of the hash table.
3. Collision Handling:
o Since the number of buckets is fixed, collisions can still occur when two
different keys hash to the same index. These collisions are handled using
various techniques such as chaining or open addressing.
4. No Dynamic Resizing:
o The table is never resized or rehashed; if the data set outgrows it, collisions
increase and performance degrades.
To handle collisions in static hashing, various techniques are used. The most common
ones are:
1. Chaining
In chaining, each bucket of the hash table is implemented as a linked list (or
another dynamic data structure like a tree). When a collision occurs, the new
element is added to the list in the corresponding bucket.
Advantages:
o Easy to implement.
Disadvantages:
o Requires extra memory for the linked lists, and long chains degrade lookup
performance toward O(n).
2. Open Addressing
In open addressing, all elements are stored directly in the hash table. When a
collision occurs, a probing technique is used to find the next available slot in the
table.
o Linear Probing: Check the following slots one by one until an empty slot is
found.
o Quadratic Probing: Probe at quadratically increasing offsets from the
original index.
o Double Hashing: Use a second hash function to calculate the step size for
probing.
Advantages:
o No extra memory for linked lists or additional structures.
Disadvantages:
o Clustering of occupied slots can degrade performance as the table fills, and
the table can never hold more elements than it has slots.
1. Simplicity:
o Static hashing is easy to implement and does not require complex data
structures or algorithms.
2. Predictable Performance:
o Since the hash table size is fixed, the performance is predictable if the data
set is known beforehand and the load factor is controlled.
1. Limited Scalability:
o The main drawback of static hashing is that the size of the table cannot
grow dynamically, so the system can either waste memory (if the table is
too large) or suffer from a high number of collisions (if the table is too
small).
2. Performance Degradation:
o If the number of records grows significantly beyond the size of the table,
many collisions will occur, degrading the performance of the hash table,
especially in open addressing schemes.
3. Resizing Challenges:
o Growing a static table requires creating a new table and rehashing every
existing record, which is expensive and disruptive.
Let’s consider a simple static hash table of size 5 and a hash function hash(key) = key %
5.
Insertion:
Index Values
0 5 → 10
2 12 → 22 → 7
Search:
Deletion:
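The worked example above can be run directly (a Python sketch assuming chaining and the keys implied by the table, namely 5, 10, 12, 22, and 7):

```python
# Static hash table of fixed size 5 with hash(key) = key % 5,
# using chaining for collisions.

TABLE_SIZE = 5                      # fixed for the table's lifetime
table = [[] for _ in range(TABLE_SIZE)]

def insert(key):
    table[key % TABLE_SIZE].append(key)

def search(key):
    return key in table[key % TABLE_SIZE]

def delete(key):
    bucket = table[key % TABLE_SIZE]
    if key in bucket:
        bucket.remove(key)

for k in (5, 10, 12, 22, 7):
    insert(k)

# Index 0 holds 5 -> 10; index 2 holds 12 -> 22 -> 7, matching the table.
delete(22)
# After deletion, index 2 holds 12 -> 7.
```

Note that no matter how many keys are inserted, the table stays at 5 buckets; that fixed size is exactly what distinguishes static hashing from the dynamic schemes below.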
Dynamic Hashing
Dynamic hashing is a method used in hash table implementations where the size of the
hash table can grow or shrink dynamically as data is inserted or deleted. Unlike static
hashing, where the size of the hash table is fixed, dynamic hashing allows the table to
adjust its size to maintain optimal performance, minimizing collisions and ensuring
efficient use of memory.
Dynamic hashing is particularly useful when the size of the dataset is not known in
advance, and the number of records may change over time. It helps handle issues like
high load factors and collisions by resizing the hash table and redistributing the records.
1. Dynamic Resizing:
o Dynamic hashing allows the hash table to expand when the number of
elements exceeds the capacity of the table, and it can shrink when the
table is underutilized. The goal is to maintain an appropriate load factor
and avoid performance degradation due to high collision rates.
2. Bucket Splitting:
o When a bucket in a hash table overflows (i.e., when more entries hash to
the same index), dynamic hashing splits the bucket into two smaller
buckets. The entries in the overflowing bucket are then redistributed
across the two new buckets. This process allows the table to handle more
entries without increasing the number of collisions.
3. Directory:
o Some dynamic hashing schemes (such as extendible hashing) maintain a
directory of pointers to buckets; the directory grows or shrinks as buckets
split or merge.
1. Extendible Hashing
Extendible hashing is one of the most common forms of dynamic hashing. In extendible
hashing, the hash table uses a directory that can expand or shrink dynamically as the
number of records increases or decreases. The idea is to keep the hash function simple
and use a directory that maps keys to buckets.
Directory: The directory stores pointers to the hash buckets and is indexed by
the first few bits of the hash value.
Splitting Buckets: When a bucket overflows, the directory is expanded, and the
bucket is split into two. This process is done by using more bits of the hash key,
effectively increasing the number of buckets.
Directory Doubling: If the number of buckets exceeds the current directory size,
the directory is doubled to accommodate the new buckets.
Example:
Suppose you have a hash function that generates a 4-bit hash. Initially, the
directory might have only 2^1 = 2 entries (with 1 bit used to index the buckets).
Each time the directory doubles, one additional bit of the hash is used to index
the buckets and redistribute entries.
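A simplified sketch of this scheme follows (a Python illustration, not a production implementation; the bucket capacity of 2 is an arbitrary choice, and low-order hash bits index the directory):

```python
# Simplified extendible hashing: the directory is indexed by the
# low-order bits of the key, and doubles when a full bucket must
# split past the current global depth.

BUCKET_CAPACITY = 2

class Bucket:
    def __init__(self, depth):
        self.depth = depth          # local depth: bits this bucket distinguishes
        self.keys = []

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.directory = [Bucket(1), Bucket(1)]

    def _index(self, key):
        return key & ((1 << self.global_depth) - 1)   # low-order bits of the key

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < BUCKET_CAPACITY:
            bucket.keys.append(key)
            return
        if bucket.depth == self.global_depth:         # directory must double
            self.directory += self.directory          # entries alias old buckets
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)                              # retry after the split

    def _split(self, bucket):
        bucket.depth += 1
        new = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        # Directory entries whose new distinguishing bit is 1 now
        # point at the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and (i >> (bucket.depth - 1)) & 1:
                self.directory[i] = new
        for k in old_keys:                            # redistribute the old keys
            self.directory[self._index(k)].keys.append(k)

h = ExtendibleHash()
for k in (0, 1, 2, 3, 4):
    h.insert(k)
# Inserting 4 overflows the even bucket, doubling the directory
# from 2 entries (1 bit) to 4 entries (2 bits) and splitting it.
```

Only the overflowing bucket's keys are redistributed at each split; the rest of the table is untouched, which is what keeps resizing incremental.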
Advantages:
Grows and shrinks gracefully without rehashing the entire table at once.
Disadvantages:
Requires a directory structure, which consumes additional memory.
2. Linear Hashing
Linear hashing is another form of dynamic hashing. Unlike extendible hashing, linear
hashing incrementally increases the size of the hash table in a more controlled way.
Splitting Buckets: Rather than doubling the directory size, linear hashing
gradually splits the buckets by incrementally adding new buckets to the hash
table. When a bucket overflows, the next bucket is split, and the records are
redistributed across the hash table.
Advantages:
Does not require the directory structure used in extendible hashing, making it
simpler to implement.
Can handle overflow efficiently without requiring frequent rehashing of the entire
table.
Disadvantages:
May lead to uneven distribution of keys during transitions from one bucket to the
next.
1. Insertion:
o Insert a key into the hash table using the hash function.
o If the corresponding bucket overflows, the bucket is split, and the hash
table might need to grow or adjust its structure (e.g., directory doubling in
extendible hashing or linear bucket splitting in linear hashing).
2. Search:
o Hash the key and examine the bucket that the current table structure maps
it to.
3. Deletion:
o Delete a key by finding the corresponding bucket and removing the key.
o If necessary, the hash table structure might be adjusted to shrink (in the
case of linear hashing, or directory shrinking in extendible hashing).
1. Scalability:
o Dynamic hashing allows the table to grow and shrink as needed, which is
ideal for situations where the size of the data is unpredictable or changes
frequently.
2. Reduced Collisions:
o The ability to expand the hash table as needed reduces the likelihood of
collisions, which helps maintain fast performance.
3. Improved Performance:
o By avoiding the need for frequent rehashing of the entire table (as in static
hashing), dynamic hashing helps maintain near-constant time complexity
for operations like insertions, deletions, and lookups.
1. Memory Overhead:
o Directory structures and spare buckets consume additional memory.
2. Complexity:
o Splitting buckets and managing a growing directory is more complex to
implement than static hashing.
3. Gradual Resizing:
o During resizing, the key distribution can be temporarily uneven, and
individual insertions that trigger splits take longer than ordinary ones.
Consider a hash table of size 4, where each entry in the directory points to a bucket. The
hash function is hash(key) = key % 4.
1. Initially, the directory has 4 entries: two bits are used to index the 4 buckets.
o Directory: 00 → Bucket 0, 01 → Bucket 1, 10 → Bucket 2, 11 → Bucket 3
After splitting: