: How do I tell? A: SELECT * FROM T WHERE g.STIntersects(@x) = 1
SELECT * FROM T WITH(INDEX(T_g_idx)) WHERE g.STIntersects(@x) = 1
Plan choice is cost-based QO uses various information, including cardinality EXEC sp_executesql SELECT DECLARE*@x geometry = 'POINT (0 0)' SELECT N'SELECT * FROM T * WHERE FROM T FROM TT.g.STIntersects('POINT (0 0)') = 1 WHERE T.g.STIntersects(@x) = 1', WHERE T.g.STIntersects(@x) = 1 N'@x geometry', N'POINT (0 0)' When can we estimate cardinality? Variables: never Literals: not for spatial since they are not literals under the covers Parameters: yes, but cached, so first call matters
C B D A B A B D A Primary Filter Secondary Filter E (Index lookup) (Original predicate) In general, split predicates in two Primary filter finds all candidates, possibly with false positives (but never false negatives) Secondary filter removes false positives The index provides our primary filter Original predicate is our secondary filter Some tweaks to this scheme Sometimes possible to skip secondary filter
Secondary Filter IndexingFilter Primary Phase 1 2 15 16 1. 4 3 14 13 5 8 9 12 3. 6 7 10 11 2. 5. 4. Apply actual on the spatial 3. Intersecting grids method 2. Identify a gridfor query on 1. Overlay gridsCLR spatialobject(s) identifies candidates to find matches object to store in index
/4/2/3/1 / (“cell 0”) Deepest-cell Optimization: Only keep the lowest level cell in index Covering Optimization: Only record higher level cells when all lower cells are completely covered by the object Cell-per-object Optimization: User restricts max number of cells per object
0 – cell at least touches the object (but not 1 or 2) Spatial Reference ID 1 – guarantee thatto be encoding Varbinary(5) the same to produce Have object partially covers cell 15 columns and 2 – object covers cell id 895 byte limitation of grid cell match Prim_key geography Prim_key cell_id srid cell_attr 1 0x00007 42 0 1 g1 3 0x00007 42 1 2 g2 3 0x0000A 42 2 3 g3 3 0x0000B 42 0 3 0x0000C 42 1 Base Table T 1 0x0000D 42 0 2 0x00014 42 1 CREATE SPATIAL INDEX sixd Internal Table for sixd ON T(geography)
Create index example GEOMETRY: CREATE SPATIAL INDEX sixd ON spatial_table(geom_column) WITH (BOUNDING_BOX = (0, 0, 500, 500), GRIDS = (LOW, LOW, MEDIUM, HIGH), CELLS_PER_OBJECT = 20) Create index example GEOGRAPHY: CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column) USING GEOGRAPHY_GRID WITH (GRIDS = (LOW, LOW, MEDIUM, HIGH), CELLS_PER_OBJECT = 20) NEW IN SQL Server 2012 (equivalent to default creation): CREATE SPATIAL INDEX sixd ON spatial_table(geom_column) USING GEOGRAPHY_AUTO_GRID WITH (CELLS_PER_OBJECT = 20) 14 Use ALTER and DROP INDEX for maintenance.
Geometry: STIntersects() = 1 • • STOverlaps() = 1 • STEquals()= 1 • STTouches() = 1 • • STWithin() = 1 • STContains() = 1 • STDistance() < val • STDistance() <= val Nearest Neighbor Filter() = 1
Itself Ancestors Descendants 7 7.2 7.2.4 Spatial Index S
Optional Sort Remove dup ranges T(@g) Ranges Spatial Index Seek
Fully contained cells Partially contained cells
Optimal value (theoretical) is somewhere between two extremes Default values: Time needed to 512 - Geometry AUTO grid process false positives 768 - Geography AUTO grid 1024 - SELECT * FROM table t WITH MANUAL grids (SPATIAL_WINDOW_MAX_CELLS=256) WHERE t.geom.STIntersects(@window)=1;
Give me the closest 5 Italian restaurants SQL Server 2008/2008 R2: table scan SQL Server 2012: uses spatial index SELECT TOP(5) * FROM Restaurants r WHERE r.type = ‘Italian’ AND r.pos.STDistance(@me) IS NOT NULL ORDER BY r.pos.STDistance(@me)
Find the closest 50 business points to a specific location (out of 22 million in total)
sp_spatial_help_geometry_histogram sp_spatial_help_geography_histogram Used for spatial data and index analysis
Arguments Parameter Type Description @tabname nvarchar(776) the name of the table for which the index has been specified @indexname sysname the index name to be investigated @verboseoutput tinyint 0 core set of properties is reported 1 all properties are being reported @query_sample geometry A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window. PropName: nvarchar(256) PropValue: sql_variant
Parameter Type Description @tabname nvarchar(776) the name of the table for which the index has been specified @indexname sysname the index name to be investigated @verboseoutput tinyint 0 core set of properties is reported 1 all properties are being reported @query_sample geography A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window. @xml_output xml This is an output parameter that contains the returned properties in an XML fragment
Property Type Description Base_Table_Rows Bigint All Number of rows in the base table Index properties - All index properties: bounding box, grid densities, cell per object Total_Primary_Index_R Bigint All Number of rows in the index ows Total_Primary_Index_P Bigint All Number of pages in the index ages Total_Number_Of_Obje Bigint Core Indicates whether the representative query sample falls outside of the ctCells_In_Level0_For_ bounding box of the geometry index and into the root cell (level 0 cell). This is QuerySample either 0 (not in level 0 cell) or 1. If it is in the level 0 cell, then the investigated index is not an appropriate index for the query sample. Total_Number_Of_Obje Bigint Core Number of cell instances of indexed objects that are tessellated in level 0. For ctCells_In_Level0_In_I geometry indexes, this will happen if the bounding box of the index is smaller ndex than the data domain. A high number of objects in level 0 may require a costly application of secondary filters if the query window falls partially outside the bounding box. If the query window falls inside the bounding box, having a high number of objects in level 0 may actually improve the performance.
Property Type Description Number_Of_Rows_Selected_By_Primary_ bigint Core P = Number of rows selected by the primary filter. Filter Number_Of_Rows_Selected_By_Internal_ bigint Core S = Number of rows selected by the internal filter. For Filter these rows, the secondary filter is not called. Number_Of_Times_Secondary_Filter_Is_ bigint Core Number of times the secondary filter is called. Called Percentage_Of_Rows_NotSelected_By_Pri float Core Suppose there are N rows in the base table, suppose P mary_Filter are selected by the primary filter. This is (N-P)/N as percentage. Percentage_Of_Primary_Filter_Rows_Sele float Core This is S/P as a percentage. The higher the percentage, cted_By_Internal_Filter the better is the index in avoiding the more expensive secondary filter. Number_Of_Rows_Output bigint Core O=Number of rows output by the query. Internal_Filter_Efficiency float Core This is S/O as a percentage. Primary_Filter_Efficiency float Core This is O/P as a percentage. The higher the efficiency is, the less false positives have to be processed by the secondary filter.
http://social.technet.microsoft.com/wiki/contents/articles/updated-spatial-features-in-the-sql- azure-q4-2011-service-release.aspx Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1

SQLBits X SQL Server 2012 Spatial Indexing

  • 2.
    : How doI tell? A: SELECT * FROM T WHERE g.STIntersects(@x) = 1
  • 3.
    SELECT * FROM TWITH(INDEX(T_g_idx)) WHERE g.STIntersects(@x) = 1
  • 4.
    Plan choice iscost-based QO uses various information, including cardinality EXEC sp_executesql SELECT DECLARE*@x geometry = 'POINT (0 0)' SELECT N'SELECT * FROM T * WHERE FROM T FROM TT.g.STIntersects('POINT (0 0)') = 1 WHERE T.g.STIntersects(@x) = 1', WHERE T.g.STIntersects(@x) = 1 N'@x geometry', N'POINT (0 0)' When can we estimate cardinality? Variables: never Literals: not for spatial since they are not literals under the covers Parameters: yes, but cached, so first call matters
  • 5.
    C B D A B A B D A Primary Filter Secondary Filter E (Index lookup) (Original predicate) In general, split predicates in two Primary filter finds all candidates, possibly with false positives (but never false negatives) Secondary filter removes false positives The index provides our primary filter Original predicate is our secondary filter Some tweaks to this scheme Sometimes possible to skip secondary filter
  • 8.
    Secondary Filter IndexingFilter Primary Phase 1 2 15 16 1. 4 3 14 13 5 8 9 12 3. 6 7 10 11 2. 5. 4. Apply actual on the spatial 3. Intersecting grids method 2. Identify a gridfor query on 1. Overlay gridsCLR spatialobject(s) identifies candidates to find matches object to store in index
  • 10.
    /4/2/3/1 / (“cell 0”) Deepest-cell Optimization: Only keep the lowest level cell in index Covering Optimization: Only record higher level cells when all lower cells are completely covered by the object Cell-per-object Optimization: User restricts max number of cells per object
  • 11.
    0 – cellat least touches the object (but not 1 or 2) Spatial Reference ID 1 – guarantee thatto be encoding Varbinary(5) the same to produce Have object partially covers cell 15 columns and 2 – object covers cell id 895 byte limitation of grid cell match Prim_key geography Prim_key cell_id srid cell_attr 1 0x00007 42 0 1 g1 3 0x00007 42 1 2 g2 3 0x0000A 42 2 3 g3 3 0x0000B 42 0 3 0x0000C 42 1 Base Table T 1 0x0000D 42 0 2 0x00014 42 1 CREATE SPATIAL INDEX sixd Internal Table for sixd ON T(geography)
  • 14.
    Create index exampleGEOMETRY: CREATE SPATIAL INDEX sixd ON spatial_table(geom_column) WITH (BOUNDING_BOX = (0, 0, 500, 500), GRIDS = (LOW, LOW, MEDIUM, HIGH), CELLS_PER_OBJECT = 20) Create index example GEOGRAPHY: CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column) USING GEOGRAPHY_GRID WITH (GRIDS = (LOW, LOW, MEDIUM, HIGH), CELLS_PER_OBJECT = 20) NEW IN SQL Server 2012 (equivalent to default creation): CREATE SPATIAL INDEX sixd ON spatial_table(geom_column) USING GEOGRAPHY_AUTO_GRID WITH (CELLS_PER_OBJECT = 20) 14 Use ALTER and DROP INDEX for maintenance.
  • 16.
    Geometry: STIntersects() = 1 • • STOverlaps() = 1 • STEquals()= 1 • STTouches() = 1 • • STWithin() = 1 • STContains() = 1 • STDistance() < val • STDistance() <= val Nearest Neighbor Filter() = 1
  • 19.
    Itself Ancestors Descendants 7 7.2 7.2.4 Spatial Index S
  • 20.
    Optional Sort Remove dup ranges T(@g) Ranges Spatial Index Seek
  • 21.
    Fully contained cells Partially contained cells
  • 22.
    Optimal value (theoretical)is somewhere between two extremes Default values: Time needed to 512 - Geometry AUTO grid process false positives 768 - Geography AUTO grid 1024 - SELECT * FROM table t WITH MANUAL grids (SPATIAL_WINDOW_MAX_CELLS=256) WHERE t.geom.STIntersects(@window)=1;
  • 28.
    Give me theclosest 5 Italian restaurants SQL Server 2008/2008 R2: table scan SQL Server 2012: uses spatial index SELECT TOP(5) * FROM Restaurants r WHERE r.type = ‘Italian’ AND r.pos.STDistance(@me) IS NOT NULL ORDER BY r.pos.STDistance(@me)
  • 30.
    Find the closest50 business points to a specific location (out of 22 million in total)
  • 37.
  • 39.
    Arguments Parameter Type Description @tabname nvarchar(776) the name of the table for which the index has been specified @indexname sysname the index name to be investigated @verboseoutput tinyint 0 core set of properties is reported 1 all properties are being reported @query_sample geometry A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window. PropName: nvarchar(256) PropValue: sql_variant
  • 40.
    Parameter Type Description @tabname nvarchar(776) the name of the table for which the index has been specified @indexname sysname the index name to be investigated @verboseoutput tinyint 0 core set of properties is reported 1 all properties are being reported @query_sample geography A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window. @xml_output xml This is an output parameter that contains the returned properties in an XML fragment
  • 41.
    Property Type Description Base_Table_Rows Bigint All Number of rows in the base table Index properties - All index properties: bounding box, grid densities, cell per object Total_Primary_Index_R Bigint All Number of rows in the index ows Total_Primary_Index_P Bigint All Number of pages in the index ages Total_Number_Of_Obje Bigint Core Indicates whether the representative query sample falls outside of the ctCells_In_Level0_For_ bounding box of the geometry index and into the root cell (level 0 cell). This is QuerySample either 0 (not in level 0 cell) or 1. If it is in the level 0 cell, then the investigated index is not an appropriate index for the query sample. Total_Number_Of_Obje Bigint Core Number of cell instances of indexed objects that are tessellated in level 0. For ctCells_In_Level0_In_I geometry indexes, this will happen if the bounding box of the index is smaller ndex than the data domain. A high number of objects in level 0 may require a costly application of secondary filters if the query window falls partially outside the bounding box. If the query window falls inside the bounding box, having a high number of objects in level 0 may actually improve the performance.
  • 42.
    Property Type Description Number_Of_Rows_Selected_By_Primary_ bigint Core P = Number of rows selected by the primary filter. Filter Number_Of_Rows_Selected_By_Internal_ bigint Core S = Number of rows selected by the internal filter. For Filter these rows, the secondary filter is not called. Number_Of_Times_Secondary_Filter_Is_ bigint Core Number of times the secondary filter is called. Called Percentage_Of_Rows_NotSelected_By_Pri float Core Suppose there are N rows in the base table, suppose P mary_Filter are selected by the primary filter. This is (N-P)/N as percentage. Percentage_Of_Primary_Filter_Rows_Sele float Core This is S/P as a percentage. The higher the percentage, cted_By_Internal_Filter the better is the index in avoiding the more expensive secondary filter. Number_Of_Rows_Output bigint Core O=Number of rows output by the query. Internal_Filter_Efficiency float Core This is S/O as a percentage. Primary_Filter_Efficiency float Core This is O/P as a percentage. The higher the efficiency is, the less false positives have to be processed by the secondary filter.
  • 46.
    http://social.technet.microsoft.com/wiki/contents/articles/updated-spatial-features-in-the-sql- azure-q4-2011-service-release.aspx Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1

Editor's Notes

  • #12 ADD USING Syntax to show new tessellation scheme
  • #19 Procedure:Construct 4 points/ranges for each cell in TRemove duplicatesSort (optionally)Seek
  • #20 Clustering imposes ordering on index
  • #21 Procedure:Construct 4 points/ranges for each cell in TRemove duplicatesSort (optionally)Seek
  • #22 TB
  • #39 ADD Tesselation
  • #45 Experimentation: For instance, consider this dataset: US Highways.  In this dataset some of the LineStrings are quite long (over 2000 miles) and others are quite short (400 meters or less). For optimal performance, the following two indexes were roughly equivalent:Geography Index: MEDIUM, MEDIUM, MEDIUM, MEDIUM 1024Geometry Index: LOW, LOW, LOW, LOW 1024