The document discusses query processing and optimization. It describes the basic concepts including query processing, query optimization, and the phases of query processing. It also explains relational algebra operations like selection, projection, joins, and additional operations. The document then covers topics like query decomposition, analysis, normalization, simplification, and restructuring during query optimization. It discusses cost estimation and algorithms for implementing relational algebra operations and file organization.
Basic Concepts 2 • QueryProcessing – activities involved in retrieving data from the database: – SQL query translation into low-level language implementing relational algebra – Query execution • Query Optimization – selection of an efficient query execution plan
Selection σcriterion (I) where criterion –selection condition, and I- an instance of a relation. • Result: – the same schema – A subset of tuples from the instance I • Criterion: conjunction (AND) and disjunction (OR) • Comparison operators: <,<=,=,≠,>=,> 7
8.
Projection • Vertical subsetof input relation instance • The schema of the result : – is determined by the list of desired fields – types of fields are inherited Πa1,a2,…,am (I), where a1,a2,…,am – desired fields from the relation with the instance I 8
9.
Binary Operations • Union-compatiblerelations: – The same number of fields – Corresponding fields have the same domains • Union of 2 relations • Intersection of 2 relations • Set-difference • Cross-product – does not require union- compatibility Marina G. Erechtchoukova 9
10.
Joins • Join isdefined as cross-product followed by selections • Based on the conditions, joins are classified: – Theta-joins – Natural joins – Other… 10
11.
Theta Join RCond S =σCond (R x S) Where Cond – refers to the attributes of both relations R and S in the form of comparison expressions with operators: <,<=,=,≠,>=,> 11
12.
Relational Algebra Expressions •The result of a relational operation is a relation instance • Relational algebra expression combines relation instances using relational algebra operations • Relational algebra expression produces the result of a query 12
13.
Simple SQL Query SELECTselect-list πselect-list FROM from-list Cross Product WHERE qualification; σqualification 13
14.
Conceptual Evaluation Strategyfor Simple Query • Compute the cross-product of tables in from- list • Delete those rows which fail the qualification condition • Delete all columns that do not appear in the select-list • If DISTINCT clause is specified, eliminate duplicate rows. 14
15.
Nested Queries • Queryblock: – Single SELECT_FROM_WHERE expression – May include GROUP BY and HAVING • Query block – basic unit that is translated into RA expression and optimized • SQL query is decomposed into query blocks 15
16.
Different Processing Strategies •Algorithms implementing basic relational algebra operations • Algorithms implementing additional relational algebra operations • Example: Find the students who have marks higher than 75 and are younger than 23 16
Analysis • Analyze queryusing compiler techniques • Verify that relations and attributes exist • Verify that operations are appropriate for object type • Transform the query into some internal representation 18
19.
Relational Algebra Tree •Leaf nodes are created for each base relation. • Non-leaf nodes are created for each intermediate relation produced by RA operation. • Root of the tree represents query result. • Sequence is directed from leaves to root. 19
Criterion Normalization • Conjunctivenormal form – a sequence of boolean expressions connected by conjunction (AND): – Each expression contains terms of comparison operators connected by disjunctions (OR) • Disjunctive normal form – a sequence of boolean expressions connected by disjunction (OR): – Each expression contains terms of comparison operators connected by conjunction (AND) 21
22.
Criterion Normalization (Cont…) •Arbitrary complex qualification condition can be converted into one of the normal forms • Algorithms for computation: – CNF – only tuples that satisfy all expressions – DNF – tuples that are the result of union of tuples that satisfy the exprssions 22
23.
Semantic Analysis • Appliedto normalized queries • Rejects contradictory queries: – Qualification condition cannot be satisfied by any tuple • Rejects incorrectly formulated queries: – Condition components do not contribute to generation of the result. 23
24.
Relation Connection Graph •Conjunctive queries without negation • Each node corresponds to a base relation and the result • An edge between two nodes is created: – If there a join – If a node is a source for projection. • If the graph is not connected, the query is incorrectly formulated 24
25.
Simplification • Eliminates redundancyin qualification • Queries against views: – Access privileges – Redundancy in qualification • Transform query to equivalent efficiently computed form • Main tool – rules of boolean algebra 25
26.
Queries against Views •View resolution: – View select-list is translated into corresponding select-list in the view defining query – From-list of the query is modified to hold the names of base tables – Qualifications from WHERE clause are combined – GROUP BY and HAVING clauses are modified 26
Query Restructuring • Rewritinga query using relational algebra operations • Modifying relational algebra expression to provide more efficient implementation 28
29.
Query Optimization • Optimizationcriteria: – Reduce total execution time of the query: • Minimize the sum of the execution times of all individual operations • Reduce the number of disk accesses – Reduce response time of the query: • Maximize parallel operations • Dynamic vs. static optimization 29
30.
Heuristic Approach • Heuristic- problem-solving by experimental methods • Applying general rules to choose the most appropriate internal query representation • Based on transformation rules for relational algebra operations 30
31.
Transformation Rules • Cascadeof selection operations: • Commutativity of selection operations • Sequence of projection operations where )...( )(... NML R LNML ∩∩⊂ ∏=∏∏∏ )))((()( RR rqprqp σσσσ =∧∧ 31 ))(())(( RR pqqp σσσσ =
32.
Transformation Rules (Cont…) •Commutativity of selection and projection where p involves only attributes from {A1,…,Am} • Commutativity of binary operations ; ; ; ))(())(( ,...,,..., 11 RR mm AAppAA ∏=∏ σσ 32 RSSR RSSR pp ×=× = RSSR RSSR ∪=∪ ∩=∩
33.
Transformation Rules (Cont…) •Commutativity of selection and theta join • Commutativity of projection and theta join Where A1contains only attributes from R and A2- only attributes from S SRRR rprp ))(()( σσ = 33 )()()( 2121 SRSR ArArAA ∏∏=∏ ∪
34.
Transformation Rules (Cont…) •Commutativity of projection and union • Associativity of binary operations 34 )()()( SRSR LLL ∏∪∏=∪∏ ).()( );()( );()( );()( TSRTSR TSRTSR TRSTRR TRSTSR ××=×× = ∩∩=∩∩ ∪∪=∪∪
35.
Heirustic Rules • Performselection as early as possible • Combine Cross product with a subsequent selection • Rearrange base relations so that the most restrictive selection is executed first. • Perform projection as early as possible • Compute common expressions once. 35
36.
Cost Estimation Components •Cost of access to secondary storage • Storage cost – cost of storing intermediate results • Computation cost • Memory usage cost – usage of RAM buffers 36
37.
Cost Estimation forRelational Algebra Expressions • Formulae for cost estimation of each operation • Estimation of relational algebra expression • Choosing the expression with the lowest cost 37
38.
Cost Estimation inQuery Optimization • Based on relational algebra tree • For each node in the tree the estimation is to be done for: – the cost of performing the operation; – the size of the result of the operation; – whether the result is sorted. 38
39.
Database Statistics fora Relation • Cardinality of relation instance • Block (of tuples) – page • Number of blocks required to store a relation (data) • Blocking factor – number of tuples in one block • Number of blocks required to store an index 39
40.
Database Statistics foran Attribute of a Relation • The number of distinct values • Possible minimum and maximum values • Selection cardinality of an attribute: – For equality condition on the attribute – For inequality condition on the attribute 40
41.
Algorithms for RelationalAlgebra Operations Implementation • Linear search • Binary search • Sort-merge • External sorting • Hashing 41
42.
File Organization • Thephysical arrangement of data in a file into records and blocks (pages) on secondary storage • Storing and retrieving data depends on the file organization 42
43.
Heap Files • Unorderedfiles • Records are placed in the file in the same order as they are inserted • If there is insufficient space in the last block, a new block is added. • Records are retrieved based on scan 43
44.
Ordered Files • Filessorted on the values of the ordering fields • Ordering key – ordering fields with unique constraint • Under certain conditions records can be retrieved based on binary search 44
45.
Hash Files • Recordsare randomly distributed across the available space • To store a record the address of the block (page) is calculated by Hash function • Blocks are kept at about 80% occupancy • To retrieve the data all blocks are scanned which is about 1.25 times more than for heap files 45
46.
Indexes • A datastructure that allows the DBMS to locate particular records • Index files are not required but very helpful • Index files can be ordered by the values of indexing fields 46
47.
Retrieval Algorithms • Fileswithout indexes: – Records are selected by scanning data files • Indexed files: – Matching selection condition – Records are selected by scanning index files and finding corresponding blocks in data files 47
48.
Search Space • Collectionof possible execution strategies for a query • Strategies can use: – Different join ordering – Different selection methods – Different join methods • Enumeration algorithm – an algorithm to determine an optimal strategy from the search space 48
49.
Pipelining • Materialization -saving intermediate results in a temporary table • Pipelining – submitting the results of one operation to another operation without creating a temporary table • A pipeline is implemented for each join operation • Requires specific algorithms 49
50.
Linear Trees • Ina linear tree at least one child of a join node is a base relation • Left-deep tree – the right child of each join node is a base relation • Right-deep tree – the left child of each join node is a base relation • Bushy tree – non-linear tree 50
51.
Left-Deep Tree • Supportsfully pipelined strategies • Advantage: – Reduces search space • Disadvantage: – Excludes alternative strategies which may be of a lower cost 51
52.
Query Optimization inOracle • Rule-based optimizer – Specify the goal in init.ora file OPTIMIZER_MODE = RULE • Cost-based optimizer – Specify the goal in init.ora file OPTIMIZER_MODE = CHOOSE 52
53.
Rule-Based Optimizer • 15rules are ranked • RowID describes the physical location of the record • RowID is associated with table indeces • Access path for a table only chosen if statement contains a predicate or other construct that makes that access path available. 53
54.
Cost-Based Optimizer • Statistics: –ANALYZE - command to generates statistics – PL/SQL package DBMS_STAT • Hints – To access full table – To use a rule – To use a certain index – … 54