2.1 Architecture
Parallel DBMS architecture aims to improve performance and scalability by distributing the workload across multiple processors and storage devices. There are three primary types of parallel database architectures:
• Shared Memory Architecture:
  • Description: Multiple processors share a single memory space.
  • Advantages: Low communication overhead, easy to program.
  • Disadvantages: Limited scalability due to memory contention and bus bandwidth limitations.
  • Example: Symmetric Multiprocessing (SMP) systems.
• Shared Disk Architecture:
  • Description: Multiple processors each have their own private memory but share common disk storage.
  • Advantages: Better scalability than shared memory; fault tolerance (if one node fails, the surviving nodes can still reach all data on the shared disks).
  • Disadvantages: Potential bottleneck at the disk subsystem; more complex management (for example, coordinating locks and caches across nodes).
  • Example: Clustered systems like Oracle RAC (Real Application Clusters).
• Shared Nothing Architecture:
  • Description: Each processor has its own private memory and disk storage; nodes communicate only over the network.
  • Advantages: High scalability, no single point of contention, fault isolation (a failed node affects only its own partition, which can be protected by replication).
  • Disadvantages: Higher communication overhead, complex to implement.
  • Example: Massively Parallel Processing (MPP) systems such as Teradata.
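The shared-nothing idea can be made concrete with a small sketch. The class names (`Node`, `SharedNothingCluster`) and the modulo routing rule are assumptions made for illustration, not part of any real DBMS: each node owns a private store, and a row is only ever read or written on the node that owns its key.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One processor with its own private memory/disk (modelled as a dict)."""
    node_id: int
    rows: dict = field(default_factory=dict)


class SharedNothingCluster:
    """Hypothetical cluster: no storage is shared between nodes."""

    def __init__(self, n_nodes: int):
        self.nodes = [Node(i) for i in range(n_nodes)]

    def owner(self, key: int) -> Node:
        # Assumed routing rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def put(self, key: int, value: str) -> None:
        # The row is stored only on the owning node.
        self.owner(key).rows[key] = value

    def get(self, key: int):
        # A lookup touches exactly one node.
        return self.owner(key).rows.get(key)


cluster = SharedNothingCluster(3)
for k in range(9):
    cluster.put(k, f"row-{k}")
print([len(node.rows) for node in cluster.nodes])
```

Because every key maps to exactly one node, adding nodes adds both storage and processing capacity, which is where the scalability of this architecture comes from. The price is that any operation spanning several keys may need data shipped between nodes.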
2.2 Query Evaluation
In a parallel DBMS, query evaluation involves dividing a query into smaller tasks that can be executed concurrently. The key components of query evaluation in a parallel DBMS are:
• Query Parsing:
  • Function: Translates the SQL query into an internal representation, typically a query tree or query graph.
  • Purpose: To understand the structure and components of the query.
• Query Decomposition:
  • Function: Breaks down the query into sub-queries or operations that can be executed in parallel.
  • Purpose: To enable parallel execution by dividing the workload.
• Parallel Execution:
  • Function: Distributes the sub-queries or operations across multiple processors for concurrent execution.
  • Purpose: To reduce the overall query execution time by leveraging multiple processors.
• Result Merging:
  • Function: Combines the results of the parallel operations into a final result set.
  • Purpose: To provide a complete and accurate query result to the user.
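The decomposition, parallel execution, and merging stages can be sketched as follows for a simple selection (roughly `SELECT * FROM t WHERE amount > 100`). Parsing is skipped, and the table contents, the `PARTITIONS` layout, and all function names are hypothetical. Threads are used only to show the structure; in CPython they do not give a real CPU speedup for pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table, already split into three horizontal partitions.
PARTITIONS = [
    [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}],
    [{"id": 3, "amount": 200}, {"id": 4, "amount": 75}],
    [{"id": 5, "amount": 120}],
]


def decompose(threshold):
    """Query decomposition: one sub-task per partition."""
    return [(part, threshold) for part in PARTITIONS]


def run_subquery(task):
    """Sub-query: apply the selection predicate to a single partition."""
    part, threshold = task
    return [row for row in part if row["amount"] > threshold]


def evaluate(threshold):
    tasks = decompose(threshold)
    # Parallel execution: each sub-task runs on its own worker.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        partial_results = list(pool.map(run_subquery, tasks))
    # Result merging: concatenate the partial results into one result set.
    return [row for part in partial_results for row in part]


result = evaluate(100)
print([row["id"] for row in result])  # → [2, 3, 5]
```

For a plain selection the merge step is a concatenation; operations such as aggregation or sorting need a more specific merge.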
2.3 Query Optimization
Parallel query optimization aims to minimize query execution time by efficiently utilizing available resources. Key techniques include:
• Cost-Based Optimization:
  • Description: Estimates the cost of different execution plans based on factors such as CPU, I/O, and communication costs.
  • Purpose: To select the most efficient execution plan for the query.
  • Example: An optimizer might choose a parallel hash join over a nested loop join for large tables.
• Partitioning:
  • Description: Divides data into smaller segments that can be processed independently.
  • Purpose: To enable parallel processing by distributing data across multiple processors.
  • Example: Horizontal partitioning of a table into ranges of rows.
• Pipelining:
  • Description: Allows the output of one operation to be used as the input for another operation without intermediate storage.
  • Purpose: To improve performance by reducing the need to store intermediate results.
  • Example: Streaming results from a selection operation directly to an aggregation operation.
• Parallel Join Algorithms:
  • Description: Techniques such as parallel hash join and parallel nested loop join are used to join tables in parallel.
  • Purpose: To speed up join operations by distributing the workload across multiple processors.
  • Example: Using a parallel hash join to join two large tables based on a common key.
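Partitioning and parallel joins fit together naturally. The following is a minimal sketch of a partitioned parallel hash join, assuming integer join keys and a modulo partitioning rule; the function names and the sample `customers` and `orders` tables are invented for illustration. Rows with equal keys always land in the same partition, so each worker can join its pair of partitions without looking at any other.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Hash-partition rows into n buckets on the join key (integer keys assumed)."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        buckets[row[key] % n].append(row)
    return buckets


def local_hash_join(build_rows, probe_rows, key):
    """Build/probe hash join on one pair of partitions."""
    table = {}
    for row in build_rows:                      # build phase
        table.setdefault(row[key], []).append(row)
    return [{**b, **p}                          # probe phase
            for p in probe_rows
            for b in table.get(p[key], [])]


def parallel_hash_join(left, right, key, n_workers=4):
    left_parts = partition(left, key, n_workers)
    right_parts = partition(right, key, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(local_hash_join, left_parts, right_parts,
                           [key] * n_workers)
        # Merge: concatenate the per-partition join results.
        return [row for part in results for row in part]


customers = [{"cust_id": 1, "name": "Ann"},
             {"cust_id": 2, "name": "Bob"},
             {"cust_id": 3, "name": "Cy"}]
orders = [{"order_id": 10, "cust_id": 1},
          {"order_id": 11, "cust_id": 3},
          {"order_id": 12, "cust_id": 1},
          {"order_id": 13, "cust_id": 4}]

joined = parallel_hash_join(customers, orders, "cust_id")
print(sorted(row["order_id"] for row in joined))  # → [10, 11, 12]
```

Order 13 is dropped because no customer has `cust_id` 4, as expected for an inner join. The order of rows in the output depends on the partitioning, so callers that need a particular order should sort explicitly.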
2.4 Parallelizing Individual Operations
A parallel DBMS can parallelize individual operations to improve performance. Key operations include:
• Scans:
  • Description: Dividing data across multiple disks, allowing each processor to scan its assigned portion concurrently.
  • Purpose: To speed up full table scans and improve I/O throughput.
  • Example: Parallel scan of a large table with each disk containing a segment of the table.
• Joins:
  • Description: Using parallel join algorithms to distribute the join operation across multiple processors.
  • Purpose: To reduce the time required for join operations on large tables.
  • Example: Parallel hash join where each processor handles a subset of the data.
• Aggregations:
  • Description: Performing aggregation operations on data segments in parallel, then combining the partial results.
  • Purpose: To improve the efficiency of aggregate functions such as SUM, COUNT, and AVG.
  • Example: Parallel computation of total sales by summing sales figures on each processor and then adding the partial sums.
• Sorts:
  • Description: Dividing data into segments, sorting each segment in parallel, and then merging the sorted segments.
  • Purpose: To reduce the time required for sorting large datasets.
  • Example: Parallel sorting of a large customer table by name.
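Aggregation and sorting both follow a "work locally, then combine" pattern, which the sketch below illustrates. The function names and sample data are hypothetical, and the segment list is assumed to be non-empty. Note that an average is derived from the combined sum and count, since averaging the per-segment averages would give the wrong answer when segments differ in size.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(segment):
    """Local aggregation: each worker returns (sum, count) for its segment."""
    return sum(segment), len(segment)


def parallel_sum_count_avg(segments):
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(partial_aggregate, segments))
    # Combine step: add up the partial sums and partial counts.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count, (total / count if count else None)


def parallel_sort(segments):
    # Sort each segment independently, then merge the sorted runs.
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        sorted_runs = list(pool.map(sorted, segments))
    return list(heapq.merge(*sorted_runs))


sales_segments = [[100, 250], [75, 125, 50], [400]]
total, count, avg = parallel_sum_count_avg(sales_segments)
print(total, count)  # → 1000 6

name_segments = [["Mia", "Bob"], ["Zoe", "Ann"], ["Eve"]]
print(parallel_sort(name_segments))  # → ['Ann', 'Bob', 'Eve', 'Mia', 'Zoe']
```

The final merge in `parallel_sort` runs on a single worker here; a system that range-partitions the data first can avoid that step, because each sorted partition already covers a disjoint key range.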

