Cosmos DB for DBAs & DEVs Niko Neugebauer – Consultant @ OH22
Niko Neugebauer • Professional focus: Data Platform (especially from Microsoft); Columnstore blogger (110+) at http://www.nikoport.com/columnstore; creator of CISL – Columnstore Indexes Script Library (https://github.com/NikoNeugebauer/CSIL) • Community: led the first international SQLSaturday; PASS User Group Leader; TUGA Non-Profit Association Leader • Speaker: Niko speaks regularly at events such as PASS Summit, SQLRally, SQLBits, and SQLSaturday events around the world. • /in/webcaravela/ • @NikoNeugebauer
CAP Theorem – old wisdom: pick just 2! • Consistency • Availability • Partition tolerance
So close, so far ...
CosmosDB: The new Wisdom
Agenda • What is CosmosDB ? • Why CosmosDB ? • How CosmosDB ? • Use CosmosDB • CosmosDB for Developers • CosmosDB for DBAs
CosmosDB: What is it ?
What is CosmosDB • Azure Cosmos DB is Microsoft's globally distributed, multi-model database. • With the click of a button, Azure Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure's geographic regions. • It offers throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs), something no other database service can offer.
What is CosmosDB
Data Models in CosmosDB • The database engine operates on an atom-record-sequence (ARS) based type system. All data models are translated to ARS. • APIs and wire protocols are supported via extensible modules. Currently supported data models: • Documents, Graphs, Key-Value, Column-Family
API (30-11-2017) • DocumentDB API • SQL-like API • MongoDB API • Table API • Graph API (TinkerPop, Gremlin/Groovy) • Cassandra API • Spark • Geospatial support • more will be coming!
A word on Table API vs Azure Table Storage comparison
• Latency: Table Storage: fast. Cosmos Table API: single-digit millisecond latency.
• Throughput: Table Storage: variable, scalable up to 20,000 operations/second. Cosmos Table API: highly scalable with dedicated reserved throughput per table, up to 10 million operations/sec.
• Global Distribution: Table Storage: single region. Cosmos Table API: turnkey global distribution.
• Indexing: Table Storage: only a primary index on PartitionKey and RowKey. Cosmos Table API: automatic and complete indexing on all properties, no index management (LOL).
• Query: Table Storage: query execution uses the index for the primary key, and scans otherwise. Cosmos Table API: queries can take advantage of automatic indexing on properties for fast query times.
• Consistency: Table Storage: strong in the primary region, eventual in the secondary region. Cosmos Table API: 5 well-defined consistency levels.
Resource Model
CosmosDB: Partitioning
CosmosDB Partitioning
CosmosDB Partitioning
Partitioning • Implemented at the tenant level (Collection, Graph, Table) • A resource partition is a resource-governed primitive that is limited to a subset of keys. • Partitions can be split, merged, etc.
Partitioning Best Practices - Select a PartitionKey that gives the best data distribution - Use a location-aware partition key for the best access locality - Select a PartitionKey that can serve as a transaction scope - Don't use raw timestamps for write-heavy workloads. Use time ranges (hour, day, week, month, year) for even data distribution.
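The time-range advice above can be sketched in a few lines. This is only an illustration: the function name, the "device id plus time bucket" key layout and the sample device id are assumptions made for the example, not something prescribed by Cosmos DB or by the slides.

```python
from datetime import datetime, timezone


def bucketed_partition_key(device_id: str, ts: datetime, granularity: str = "hour") -> str:
    """Build a partition key from an id plus a coarse time bucket.

    Hypothetical helper: combining an id with the bucket is an assumption
    of this sketch, chosen so that writes spread over many key values.
    """
    ts = ts.astimezone(timezone.utc)  # normalise so every writer buckets identically
    if granularity == "week":
        iso_year, iso_week, _ = ts.isocalendar()
        bucket = f"{iso_year}-W{iso_week:02d}"
    else:
        formats = {"year": "%Y", "month": "%Y-%m", "day": "%Y-%m-%d", "hour": "%Y-%m-%dT%H"}
        if granularity not in formats:
            raise ValueError(f"unsupported granularity: {granularity}")
        bucket = ts.strftime(formats[granularity])
    return f"{device_id}:{bucket}"


sample = datetime(2017, 11, 30, 14, 25, tzinfo=timezone.utc)
hour_key = bucketed_partition_key("sensor-42", sample)
week_key = bucketed_partition_key("sensor-42", sample, "week")
```

All documents written by one device within the same hour share a key, so they can still act as one transaction scope, while the key changes over time instead of growing a single hot partition.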
CosmosDB: Why
Why create CosmosDB? • Traditional relational databases were designed in the 1970s-80s • Data is growing (petabytes, exabytes, etc.) • Think about Internet scale and distributed systems • Provide API choices Think about: • Availability • Performance • Costs
CosmosDB: the focus on performance
Latency by percentile:
• Reads (1KB): 50th < 2ms, 99th < 10ms
• Indexed writes (1KB): 50th < 6ms, 99th < 15ms
▪ Globally distributed with reads and writes served from/to local region ▪ Write-optimised, latch-free engine designed for SSD ▪ Synchronous/Asynchronous automatic indexing
Azure Cosmos DB • Azure Cosmos DB is fully schema agnostic. • Uses JSON to describe the supported data models • Automatic indexing of all ingested content • Resource Governed, write-optimised engine • Online Index operations
Core pieces of CosmosDB Architecture • Global distribution • Resource Governance • Schema-agnostic service
Consistency Levels (and there are 5 of them): • You pick a stronger consistency level such as strong or bounded staleness for your account because a critical path in your e-commerce/LOB application needs the guarantee. • But for some less critical operations (like a reporting dashboard query), you would choose a weaker consistency level because it consumes only half the throughput. • The current offering for the consistency levels is: Strong / Bounded Staleness / Session / Consistent Prefix / Eventual
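The "half the throughput" remark can be turned into a rough cost model. The sketch below assumes, as the slide suggests, that a read at Strong or Bounded Staleness costs twice what the same read costs at the three weaker levels; the function name and the sample numbers are made up for illustration.

```python
# Levels the slide calls "stronger"; reads at these levels are assumed to cost double.
STRONGER_LEVELS = {"Strong", "Bounded Staleness"}
WEAKER_LEVELS = {"Session", "Consistent Prefix", "Eventual"}


def read_cost_ru(weak_read_ru: float, level: str) -> float:
    """Estimate the RU charge of a read, given its charge at a weaker level."""
    if level in STRONGER_LEVELS:
        return 2 * weak_read_ru
    if level in WEAKER_LEVELS:
        return weak_read_ru
    raise ValueError(f"unknown consistency level: {level}")


# A dashboard issuing 1,000 reads of 1 RU each (illustrative numbers only).
dashboard_strong = 1000 * read_cost_ru(1.0, "Strong")
dashboard_eventual = 1000 * read_cost_ru(1.0, "Eventual")
```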
Consistency Levels in 1 Picture:
Default Consistency Levels: • Strong - Linearizability. Reads are guaranteed to return the most recent version of an item. • Bounded Staleness - Consistent prefix. Reads lag behind writes by at most k versions (prefixes) or a time interval t. • Session - Consistent prefix. Monotonic reads, monotonic writes, read-your-writes and write-follows-reads within a single client session. • Consistent Prefix - Updates returned are some prefix of all the updates, with no gaps. If you applied sequential transactions, the previous ones are available on request. • Eventual - Reads may be returned out of order.
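To make "prefix with no gaps" and "lag of at most k versions" concrete, here is a small sketch that checks a sequence of observed values against the sequence of writes. It is a toy model of the guarantees, not how the service implements them; the function names are hypothetical.

```python
from typing import Sequence


def is_consistent_prefix(writes: Sequence[str], observed: Sequence[str]) -> bool:
    """True if the reader saw the first n writes, in order, with no gaps."""
    return list(observed) == list(writes[:len(observed)])


def within_bounded_staleness(writes: Sequence[str], observed: Sequence[str], k: int) -> bool:
    """True if the reader saw a consistent prefix that lags by at most k versions."""
    return is_consistent_prefix(writes, observed) and len(writes) - len(observed) <= k


writes = ["A", "B", "C", "D"]
saw_prefix = is_consistent_prefix(writes, ["A", "B"])       # prefix, no gaps
saw_gap = is_consistent_prefix(writes, ["A", "C"])          # "B" is missing
fresh_enough = within_bounded_staleness(writes, ["A", "B", "C"], k=1)
too_stale = within_bounded_staleness(writes, ["A"], k=1)
```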
Indexing & Consistency Levels:
• Indexing mode Consistent: reads select from strong, bounded staleness, session, consistent prefix, or eventual; queries select from strong, bounded staleness, session, or eventual
• Indexing mode Lazy: reads select from strong, bounded staleness, session, consistent prefix, or eventual; queries are eventual
• Indexing mode None: reads select from strong, bounded staleness, session, consistent prefix, or eventual; queries are eventual
Throughput • RU – Request Unit • % Memory / % CPU / % IOPS just like for Azure SQLDB • READ / INSERT / UPSERT / DELETE / QUERY operations • QUERY = Scans + Index Lookups + Query Complexity + Instruction Cost • Everything is calculated by Azure ML
Throughput • RU/sec – Request Units per second • 400 RU/sec – 10,000 RU/sec (Collections) • 2,500 RU/sec – Unlimited? RU/sec (Partitioned Collections) • Min Increase / Decrease is 100 RU/sec
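As a sketch of how the limits above interact, the helper below rounds a requested throughput up to the next 100 RU/sec step and clamps it to the single-collection range quoted on the slide (400 to 10,000 RU/sec). The limits are the slide's figures as of 30-11-2017 and the function name is hypothetical.

```python
def normalize_throughput(requested_ru: int, minimum: int = 400,
                         maximum: int = 10_000, step: int = 100) -> int:
    """Round up to the next provisioning step, then clamp to the allowed range."""
    if requested_ru < 0:
        raise ValueError("throughput cannot be negative")
    rounded = -(-requested_ru // step) * step  # ceiling division without floats
    return max(minimum, min(maximum, rounded))


provisioned = normalize_throughput(450)
```

A request for 450 RU/sec therefore ends up provisioned as 500 RU/sec, and anything below the minimum is raised to 400 RU/sec.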
Scaling Cosmos DB Up & Out • Scale Up – Increase the number of RUs • Scale Out – Increase the number of partitions for your collections/graphs/tables
Stored Procs, User-Defined Functions, Triggers, etc • Server-side JavaScript programming • Procedural Logic • Atomic Transactions • Batching • Pre-Compilation • Encapsulation
Stored Procs for CosmosDB
User-Defined Functions
Triggers (validation and Node.js registration)
Stored Procedures using JavaScript API: DO NOT!
Azure Functions are supported
Real Life Problems • Data Quality (Data Types Casting, Missing Connections) • Complex Questions (joins)
CosmosDB: Behind the Scenes
CosmosDB • Introduction (Availability (Ring 0), Consistency, 5 9s, PaaS, Scaling) • Blah • Stored Procedures • UDFs • Triggers
At the Data Centre • Solid State Drives storage (SSD) • Fusion IO 160GB Drives • Fast Private Network Connections
Move to CosmosDB
Azure CosmosDB Data Migration Tool • Allows you to migrate your data into CosmosDB • Supports a range of sources • Does not support GraphDB ... yet
CosmosDB: Developers
CosmosDB Query Playground • https://www.documentdb.com/sql/demo
Try CosmosDB for free (need an Azure account): • https://azure.microsoft.com/en-us/try/cosmosdb/
CosmosDB in Azure Storage Explorer
Azure Cosmos DB Emulator Software requirements: • Windows Server 2012 R2, Windows Server 2016, or Windows 10 Minimum Hardware requirements: • 2 GB RAM • 10 GB available hard disk space
CosmosDB: DBAs DBA as in DCT = Data Care Taker
Indexing Example:
Indexing Policy Modes • Consistent – queries follow the same consistency level as specified for point reads (i.e. strong, bounded staleness, session or eventual). The index is updated synchronously as part of the document update. The workload target is "write quickly, query immediately". • Lazy – To allow maximum document ingestion throughput, an Azure Cosmos DB collection can be configured with lazy indexing, meaning queries are eventually consistent. The index is updated asynchronously when the collection is quiescent. • None – A collection marked with an index mode of "None" has no index associated with it. This is commonly used if Azure Cosmos DB is utilized as key-value storage and documents are accessed only by their ID property.
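Indexing Policy Modes: query consistency by account consistency level
• Strong: Consistent indexing: Strong; Lazy indexing: Eventual
• Bounded Staleness: Consistent indexing: Bounded Staleness; Lazy indexing: Eventual
• Session: Consistent indexing: Session; Lazy indexing: Eventual
• Eventual: Consistent indexing: Eventual; Lazy indexing: Eventual
Indexing Policy Modes with EnableScanInQuery: query consistency by account consistency level
• Strong: Consistent indexing: Strong; Lazy indexing: Eventual; No indexing: Strong
• Bounded Staleness: Consistent indexing: Bounded Staleness; Lazy indexing: Eventual; No indexing: Bounded Staleness
• Session: Consistent indexing: Session; Lazy indexing: Eventual; No indexing: Session
• Eventual: Consistent indexing: Eventual; Lazy indexing: Eventual; No indexing: Eventual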
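The two tables boil down to a short rule, sketched below as a lookup function. It only restates the slides: the function name is hypothetical, and treating a query against an unindexed collection without EnableScanInQuery as an error is an assumption of this sketch.

```python
LEVELS = ("Strong", "Bounded Staleness", "Session", "Eventual")


def query_consistency(account_level: str, indexing_mode: str,
                      enable_scan_in_query: bool = False) -> str:
    """Return the consistency a query gets for a given account level and indexing mode."""
    if account_level not in LEVELS:
        raise ValueError(f"unknown consistency level: {account_level}")
    if indexing_mode == "Consistent":
        return account_level        # the index is updated synchronously
    if indexing_mode == "Lazy":
        return "Eventual"           # the index catches up asynchronously
    if indexing_mode == "None":
        if not enable_scan_in_query:
            # Assumption: with no index, a query is only served as a scan.
            raise ValueError("no index: queries need EnableScanInQuery")
        return account_level        # a scan reads the documents directly
    raise ValueError(f"unknown indexing mode: {indexing_mode}")


lazy_strong = query_consistency("Strong", "Lazy")
scan_session = query_consistency("Session", "None", enable_scan_in_query=True)
```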
Indexing Paths
• / : Default path for the collection. Recursive.
• /name/? : Hash or Range indexes for predicates and sorts
• /name/* : Index path for all paths under the specified label (multiple levels down)
• /name/[]/prop/? : Index path required to serve iteration and JOIN queries against arrays of objects like [{prop: "a"}, {prop: "b"}]
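A simplified model of the two wildcards may help: "?" stands for the value stored exactly at a path, "*" for everything beneath it. The matcher below is a sketch under that reading, not the service's actual matching logic. In particular, letting "/name/*" also cover a scalar stored directly at "/name" is an assumption, and the function name is hypothetical.

```python
def covers(index_path: str, value_path: str) -> bool:
    """Check whether an index path covers the path of a value in a document.

    value_path is the path to a value, e.g. "/name/first"; array elements
    are written as "[]", e.g. "/name/[]/prop".
    """
    if index_path.endswith("/*"):
        prefix = index_path[:-1]                 # "/name/*" -> "/name/"
        # Assumption: the label itself and everything below it are covered.
        return (value_path + "/").startswith(prefix)
    if index_path.endswith("/?"):
        return value_path == index_path[:-2]     # "/name/?" -> "/name"
    raise ValueError("index paths in this sketch must end in /? or /*")


exact = covers("/name/?", "/name")
nested_under_exact = covers("/name/?", "/name/first")
nested_under_wildcard = covers("/name/*", "/name/first")
array_prop = covers("/name/[]/prop/?", "/name/[]/prop")
```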
Index Types, Kinds & Precisions DataTypes: • String • Number • Point • Polygon • LineString
Index Types, Kinds & Precisions Index Types: • Hash – Hash indexes, think Hekaton (hash indexes). Supports equality and JOIN queries; for most queries the default precision of 3 bytes is sufficient. DataType can be String or Number. • Range – Range indexes, think Hekaton (Bw-tree). Supports equality and range queries (<, >, <=, >=, !=) and ORDER BY queries. DataType can be String or Number. • Spatial – Spatial indexes for Points, Polygons & LineStrings. Supports efficient spatial queries (within and distance).
Index Precision Lets you trade off index storage overhead against query performance. For numbers, Microsoft recommends using the default precision of -1 ("maximum"). Notice that numbers are 8 bytes in JSON. Picking a smaller precision (1-7) means collisions and hence more RU consumption. For string ranges, which can be of arbitrary length, the index precision can impact the performance of range search queries and the index storage required. The precision can be specified between 1 and 100. Important: if you need sorting on the results (ORDER BY), you must specify the precision of 100.
Index Inclusion / Exclusion
{
  "includedPaths": [
    {
      "path": "/mainContent/*",
      "indexes": [
        { "kind": "Hash", "dataType": "String", "precision": 20 }
      ]
    }
  ],
  "excludedPaths": [
    { "path": "/nonIndexedContent/*" }
  ]
}
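One way to avoid typos in such a policy is to build it in code and check it before sending it anywhere. The sketch below mirrors the slide's policy as a Python dictionary and runs a few sanity checks taken from the previous slides (allowed kinds and data types, paths ending in /? or /*). The validator is a hypothetical helper, not part of any Cosmos DB SDK.

```python
import json

ALLOWED_KINDS = {"Hash", "Range", "Spatial"}
ALLOWED_DATA_TYPES = {"String", "Number", "Point", "Polygon", "LineString"}


def validate_policy(policy: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    entries = policy.get("includedPaths", []) + policy.get("excludedPaths", [])
    for entry in entries:
        path = entry.get("path", "")
        if not path.startswith("/") or not path.endswith(("/?", "/*")):
            problems.append(f"suspicious path: {path!r}")
        for index in entry.get("indexes", []):
            if index.get("kind") not in ALLOWED_KINDS:
                problems.append(f"unknown kind on {path}: {index.get('kind')!r}")
            if index.get("dataType") not in ALLOWED_DATA_TYPES:
                problems.append(f"unknown dataType on {path}: {index.get('dataType')!r}")
    return problems


policy = {
    "includedPaths": [
        {
            "path": "/mainContent/*",
            "indexes": [{"kind": "Hash", "dataType": "String", "precision": 20}],
        }
    ],
    "excludedPaths": [{"path": "/nonIndexedContent/*"}],
}

problems = validate_policy(policy)
policy_json = json.dumps(policy, indent=2)  # the JSON text to paste or submit
```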
Indexing Policy Changes – What for? • When importing bulk data: use lazy indexing mode for faster writes, then switch to consistent indexing for regular operation. • To reduce the throughput consumed by writes as well as the storage space used, by hand-selecting the properties to be indexed and changing them over time, or by varying the index precision of individual properties. • To use new indexing features on your current DocumentDB collections, like ORDER BY and string range queries, which require the newly introduced string range index kind.
Indexing Policy Changes - how ?
CosmosDB: Backups
Backups for the CosmosDB:
Backup for DBAs: • Approximately every 4 hours a backup is taken (to Azure Blob Storage) • At least 2 backups are stored at all times • If you lose your data, you need to contact Azure Support within 8 hours • Backup retention: 30 days for deleted partitions/databases • If you want to maintain your own snapshots, you can use the export to JSON option in the Azure Cosmos DB Data Migration tool to schedule additional backups.
Backup for DBAs – read carefully: • As soon as corruption is detected, the user should delete the corrupted container (collection/graph/table) so that backups are protected from being overwritten with corrupted data. Source: https://docs.microsoft.com/en-us/azure/cosmos-db/online-backup-and-restore
Backup for DBAs – the alternative: • Extract JSON files of your databases/collections/graphs with the help of the Azure Cosmos DB Data Migration Tool
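The 8-hour figure follows from the first two bullets: with a backup roughly every 4 hours and only 2 kept, both retained snapshots may already contain the damaged data 2 × 4 = 8 hours after a corruption. A minimal sketch of that arithmetic, with a hypothetical function name and the slide's approximate numbers as defaults:

```python
from datetime import datetime, timedelta


def latest_safe_contact_time(corruption_time: datetime,
                             backup_interval: timedelta = timedelta(hours=4),
                             retained_backups: int = 2) -> datetime:
    """Time after which every retained backup may already include the corruption."""
    return corruption_time + retained_backups * backup_interval


corrupted_at = datetime(2017, 11, 30, 10, 0)
deadline = latest_safe_contact_time(corrupted_at)
```

Because the interval is only approximate, treat the result as an upper bound and react earlier where possible.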
CosmosDB: Failovers
Global Distribution aka Geo-Replication aka Regional Failover
Global Distribution aka Geo-Replication aka Regional Failover
Global Distribution aka Geo-Replication aka Regional Failover
Manual Failover
Manual Failover Scenarios: • Follow the clock model: If your applications have predictable traffic patterns based on the time of day, you can periodically move the write status to the most active geographic region for that time of day. • Service update: Some globally distributed application deployments reroute traffic to a different region via a traffic manager during a planned service update. Such deployments can now use manual failover to keep the write status in the region that will carry the active traffic during the service update window. • Business Continuity and Disaster Recovery (BCDR) and High Availability and Disaster Recovery (HADR) drills: Most enterprise applications include business continuity tests as part of their development and release process. BCDR and HADR testing is often an important step in compliance certifications and in guaranteeing service availability in the case of regional outages. You can test the BCDR readiness of your applications that use Cosmos DB for storage by triggering a manual failover of your Cosmos DB account and/or adding and removing a region dynamically.
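The follow-the-clock model can be sketched as a schedule lookup: given the current UTC hour, decide which region should hold the write status. The schedule, the region names and the function name below are purely illustrative, and the sketch only chooses a region; triggering the actual failover is left out.

```python
from typing import Sequence, Tuple


def write_region_for_hour(hour_utc: int, schedule: Sequence[Tuple[int, str]]) -> str:
    """Pick the write region whose window (start hour, region) contains hour_utc."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    ordered = sorted(schedule)
    # Before the first start hour, the last window of the previous day still applies.
    current = ordered[-1][1]
    for start_hour, region in ordered:
        if hour_utc >= start_hour:
            current = region
    return current


# Hypothetical schedule: follow business hours around the globe.
schedule = [(0, "Southeast Asia"), (8, "West Europe"), (16, "West US")]
morning_region = write_region_for_hour(9, schedule)
night_region = write_region_for_hour(23, schedule)
```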
Global Distribution aka Geo-Replication aka Regional Failover • Configuration • First, deploy your application in multiple regions • To ensure low-latency access from every region in which your application is deployed, configure the corresponding preferred regions list for each region via one of the supported SDKs.
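How a preferred regions list is used can be sketched as follows: the client walks the list in order and reads from the first region that is currently available. This is a simplified model of the behaviour, not SDK code. The function name and region names are hypothetical, and falling back to the write region when no preferred region is available is an assumption of the sketch.

```python
from typing import Iterable, Sequence


def pick_read_region(preferred: Sequence[str], available: Iterable[str],
                     write_region: str) -> str:
    """Return the first preferred region that is available, else the write region."""
    available_set = set(available)
    for region in preferred:
        if region in available_set:
            return region
    return write_region  # assumption: last-resort fallback


# An application instance deployed in West Europe lists its closest regions first.
preferred = ["West Europe", "North Europe", "East US"]
normal = pick_read_region(preferred, ["West Europe", "North Europe", "East US"], "East US")
during_outage = pick_read_region(preferred, ["North Europe", "East US"], "East US")
```

Each deployment of the application would carry its own list, ordered by proximity, which is why the slide asks for one preferred regions list per region.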
GraphDB
GraphDB • Based on Apache TinkerPop (open source) • Supporting the Gremlin & Groovy (how much?) languages
GraphDB - possibilities • Querying across graph collections - not supported right now • Duplicate Edges detection • Duplicate Vertex detection • Betweenness Centrality • Eigenvector (PageRank) • Recommendation (as Products in SSAS) • ...
GraphDB Gremlin querying
• g.V().count(); // Count the vertices (documents)
• g.V().hasLabel('person').has('age', gt(40)); // People aged over 40
• g.V().hasLabel('person').values('firstName'); // List people's first names
Under the hood, the query
• g.V().hasLabel('Azure')
transforms into
• {"query":"SELECT N_2 FROM Node N_2 WHERE (IS_DEFINED(N_2._isEdge) = false AND (N_2.label = 'Azure'))"}
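For readers new to Gremlin, the three traversals can be mimicked over a plain in-memory list to show what each step filters. The sample vertices are invented for the illustration, and the code does not talk to Cosmos DB or TinkerPop at all.

```python
# Invented sample data: each vertex is a dictionary with a label and properties.
vertices = [
    {"label": "person", "firstName": "Ana", "age": 45},
    {"label": "person", "firstName": "Rui", "age": 31},
    {"label": "Azure", "name": "Cosmos DB"},
]

# g.V().count()
vertex_count = len(vertices)

# g.V().hasLabel('person').has('age', gt(40))
people_over_40 = [
    v for v in vertices
    if v["label"] == "person" and v.get("age", 0) > 40
]

# g.V().hasLabel('person').values('firstName')
first_names = [
    v["firstName"] for v in vertices
    if v["label"] == "person" and "firstName" in v
]
```

Each chained Gremlin step narrows the stream of vertices produced by the previous one, which is what the list comprehensions spell out.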
GraphDB Migrations • Neo4J: https://github.com/bsherwin/neo2cosmos • Migration Tool (soon)
Data Migration Tool: • https://www.microsoft.com/en-us/download/details.aspx?id=46436
Limitations: • Returning large amounts of data • No support for GROUP BY (SQL API)
PowerBI • Via Spark - https://github.com/Azure/azure-cosmosdb-spark/wiki/Configuring-Power-BI-Direct-Query-to-Azure-Cosmos-DB-via-Apache-Spark-(HDI)
Geospatial • Working with geospatial and GeoJSON location data in Azure Cosmos DB: https://docs.microsoft.com/en-us/azure/cosmos-db/geospatial • Azure Cosmos DB: Expanded geospatial support, including automatic indexing of Polygon and LineString objects: https://azure.microsoft.com/en-us/updates/documentdb-expanded-geospatial-support-including-automatic-indexing-of-polygons-and-lines/
CosmosDB Links • Data Migration Tool: https://www.microsoft.com/en-us/download/details.aspx?id=46436 • Tunable data consistency levels in Azure Cosmos DB: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels • Use the Azure Cosmos DB Emulator for local development and testing: https://docs.microsoft.com/en-us/azure/cosmos-db/local-emulator • Indexing Policies: https://docs.microsoft.com/en-us/azure/cosmos-db/indexing-policies
CosmosDB Links • Gremlin Console: http://tinkerpop.apache.org/docs/current/tutorials/the-gremlin-console/
Questions?
Thank you for participating
Up next: Database Console Commands, Rodrigo Crespi, SQL Server specialist
