Query processing-and-optimization

Query Processing and Optimization

Basic Concepts 2 • Query Processing – activities involved in retrieving data from the database: – SQL query translation into low-level language implementing relational algebra – Query execution • Query Optimization – selection of an efficient query execution plan

Relational Algebra • Relational algebra defines basic operations on relation instances • Results of operations are also relation instances 4

Basic Operations • Unary algebra operations: – Selection – Projection • Binary algebra operations: – Union – Set difference – Cross-product 5

Additional Operations • Can be expressed through 5 basic operations: – Join – Intersection – Division 6

Selection σcriterion (I) where criterion – selection condition, and I- an instance of a relation. • Result: – the same schema – A subset of tuples from the instance I • Criterion: conjunction (AND) and disjunction (OR) • Comparison operators: <,<=,=,≠,>=,> 7

Projection • Vertical subset of input relation instance • The schema of the result : – is determined by the list of desired fields – types of fields are inherited Πa1,a2,…,am (I), where a1,a2,…,am – desired fields from the relation with the instance I 8

Binary Operations • Union-compatible relations: – The same number of fields – Corresponding fields have the same domains • Union of 2 relations • Intersection of 2 relations • Set-difference • Cross-product – does not require union- compatibility Marina G. Erechtchoukova 9

Joins • Join is defined as cross-product followed by selections • Based on the conditions, joins are classified: – Theta-joins – Natural joins – Other… 10

Theta Join RCond S = σCond (R x S) Where Cond – refers to the attributes of both relations R and S in the form of comparison expressions with operators: <,<=,=,≠,>=,> 11

Relational Algebra Expressions • The result of a relational operation is a relation instance • Relational algebra expression combines relation instances using relational algebra operations • Relational algebra expression produces the result of a query 12

Simple SQL Query SELECT select-list  πselect-list FROM from-list  Cross Product WHERE qualification;  σqualification 13

Conceptual Evaluation Strategy for Simple Query • Compute the cross-product of tables in from- list • Delete those rows which fail the qualification condition • Delete all columns that do not appear in the select-list • If DISTINCT clause is specified, eliminate duplicate rows. 14

Nested Queries • Query block: – Single SELECT_FROM_WHERE expression – May include GROUP BY and HAVING • Query block – basic unit that is translated into RA expression and optimized • SQL query is decomposed into query blocks 15

Different Processing Strategies • Algorithms implementing basic relational algebra operations • Algorithms implementing additional relational algebra operations • Example: Find the students who have marks higher than 75 and are younger than 23 16

Query Decomposition • Analysis – Relational algebra tree • Normalization • Semantic analysis • Simplification • Query restructuring 17

Analysis • Analyze query using compiler techniques • Verify that relations and attributes exist • Verify that operations are appropriate for object type • Transform the query into some internal representation 18

Relational Algebra Tree • Leaf nodes are created for each base relation. • Non-leaf nodes are created for each intermediate relation produced by RA operation. • Root of the tree represents query result. • Sequence is directed from leaves to root. 19

Relational Algebra Tree (Cont…) 20 Root Intermediate operations Intermediate operations Leaves …

Criterion Normalization • Conjunctive normal form – a sequence of boolean expressions connected by conjunction (AND): – Each expression contains terms of comparison operators connected by disjunctions (OR) • Disjunctive normal form – a sequence of boolean expressions connected by disjunction (OR): – Each expression contains terms of comparison operators connected by conjunction (AND) 21

Criterion Normalization (Cont…) • Arbitrary complex qualification condition can be converted into one of the normal forms • Algorithms for computation: – CNF – only tuples that satisfy all expressions – DNF – tuples that are the result of union of tuples that satisfy the exprssions 22

Semantic Analysis • Applied to normalized queries • Rejects contradictory queries: – Qualification condition cannot be satisfied by any tuple • Rejects incorrectly formulated queries: – Condition components do not contribute to generation of the result. 23

Relation Connection Graph • Conjunctive queries without negation • Each node corresponds to a base relation and the result • An edge between two nodes is created: – If there a join – If a node is a source for projection. • If the graph is not connected, the query is incorrectly formulated 24

Simplification • Eliminates redundancy in qualification • Queries against views: – Access privileges – Redundancy in qualification • Transform query to equivalent efficiently computed form • Main tool – rules of boolean algebra 25

Queries against Views • View resolution: – View select-list is translated into corresponding select-list in the view defining query – From-list of the query is modified to hold the names of base tables – Qualifications from WHERE clause are combined – GROUP BY and HAVING clauses are modified 26

Rules of Boolean Algebra ptruep pfalsep falsefalsep ppp ppp ≡∧ ≡∨ ≡∧ ≡∨ ≡∧ )( )( pqpp pqpp truepp falsepp truetruep ≡∧∨ ≡∨∧ ≡¬∨ ≡¬∧ ≡∨ )( )( )( )( 27

Query Restructuring • Rewriting a query using relational algebra operations • Modifying relational algebra expression to provide more efficient implementation 28

Query Optimization • Optimization criteria: – Reduce total execution time of the query: • Minimize the sum of the execution times of all individual operations • Reduce the number of disk accesses – Reduce response time of the query: • Maximize parallel operations • Dynamic vs. static optimization 29

Heuristic Approach • Heuristic - problem-solving by experimental methods • Applying general rules to choose the most appropriate internal query representation • Based on transformation rules for relational algebra operations 30

Transformation Rules • Cascade of selection operations: • Commutativity of selection operations • Sequence of projection operations where )...( )(... NML R LNML ∩∩⊂ ∏=∏∏∏ )))((()( RR rqprqp σσσσ =∧∧ 31 ))(())(( RR pqqp σσσσ =

Transformation Rules (Cont…) • Commutativity of selection and projection where p involves only attributes from {A1,…,Am} • Commutativity of binary operations ; ; ; ))(())(( ,...,,..., 11 RR mm AAppAA ∏=∏ σσ 32 RSSR RSSR pp ×=× =  RSSR RSSR ∪=∪ ∩=∩

Transformation Rules (Cont…) • Commutativity of selection and theta join • Commutativity of projection and theta join Where A1contains only attributes from R and A2- only attributes from S SRRR rprp  ))(()( σσ = 33 )()()( 2121 SRSR ArArAA ∏∏=∏ ∪ 

Transformation Rules (Cont…) • Commutativity of projection and union • Associativity of binary operations 34 )()()( SRSR LLL ∏∪∏=∪∏ ).()( );()( );()( );()( TSRTSR TSRTSR TRSTRR TRSTSR ××=×× = ∩∩=∩∩ ∪∪=∪∪ 

Heirustic Rules • Perform selection as early as possible • Combine Cross product with a subsequent selection • Rearrange base relations so that the most restrictive selection is executed first. • Perform projection as early as possible • Compute common expressions once. 35

Cost Estimation Components • Cost of access to secondary storage • Storage cost – cost of storing intermediate results • Computation cost • Memory usage cost – usage of RAM buffers 36

Cost Estimation for Relational Algebra Expressions • Formulae for cost estimation of each operation • Estimation of relational algebra expression • Choosing the expression with the lowest cost 37

Cost Estimation in Query Optimization • Based on relational algebra tree • For each node in the tree the estimation is to be done for: – the cost of performing the operation; – the size of the result of the operation; – whether the result is sorted. 38

Database Statistics for a Relation • Cardinality of relation instance • Block (of tuples) – page • Number of blocks required to store a relation (data) • Blocking factor – number of tuples in one block • Number of blocks required to store an index 39

Database Statistics for an Attribute of a Relation • The number of distinct values • Possible minimum and maximum values • Selection cardinality of an attribute: – For equality condition on the attribute – For inequality condition on the attribute 40

Algorithms for Relational Algebra Operations Implementation • Linear search • Binary search • Sort-merge • External sorting • Hashing 41

File Organization • The physical arrangement of data in a file into records and blocks (pages) on secondary storage • Storing and retrieving data depends on the file organization 42

Heap Files • Unordered files • Records are placed in the file in the same order as they are inserted • If there is insufficient space in the last block, a new block is added. • Records are retrieved based on scan 43

Ordered Files • Files sorted on the values of the ordering fields • Ordering key – ordering fields with unique constraint • Under certain conditions records can be retrieved based on binary search 44

Hash Files • Records are randomly distributed across the available space • To store a record the address of the block (page) is calculated by Hash function • Blocks are kept at about 80% occupancy • To retrieve the data all blocks are scanned which is about 1.25 times more than for heap files 45

Indexes • A data structure that allows the DBMS to locate particular records • Index files are not required but very helpful • Index files can be ordered by the values of indexing fields 46

Retrieval Algorithms • Files without indexes: – Records are selected by scanning data files • Indexed files: – Matching selection condition – Records are selected by scanning index files and finding corresponding blocks in data files 47

Search Space • Collection of possible execution strategies for a query • Strategies can use: – Different join ordering – Different selection methods – Different join methods • Enumeration algorithm – an algorithm to determine an optimal strategy from the search space 48

Pipelining • Materialization - saving intermediate results in a temporary table • Pipelining – submitting the results of one operation to another operation without creating a temporary table • A pipeline is implemented for each join operation • Requires specific algorithms 49

Linear Trees • In a linear tree at least one child of a join node is a base relation • Left-deep tree – the right child of each join node is a base relation • Right-deep tree – the left child of each join node is a base relation • Bushy tree – non-linear tree 50

Left-Deep Tree • Supports fully pipelined strategies • Advantage: – Reduces search space • Disadvantage: – Excludes alternative strategies which may be of a lower cost 51

Query Optimization in Oracle • Rule-based optimizer – Specify the goal in init.ora file OPTIMIZER_MODE = RULE • Cost-based optimizer – Specify the goal in init.ora file OPTIMIZER_MODE = CHOOSE 52

Rule-Based Optimizer • 15 rules are ranked • RowID describes the physical location of the record • RowID is associated with table indeces • Access path for a table only chosen if statement contains a predicate or other construct that makes that access path available. 53

Cost-Based Optimizer • Statistics: – ANALYZE - command to generates statistics – PL/SQL package DBMS_STAT • Hints – To access full table – To use a rule – To use a certain index – … 54

Example • SELECT /*+ full(student) */ sname FROM student WHERE Y_of_B = 1983; 55

Query processing-and-optimization

In this document

More Related Content

What's hot

Viewers also liked

Similar to Query processing-and-optimization

More from WBUTTUTORIALS

Recently uploaded

Query processing-and-optimization