Distributed RDBMS Data Distribution Policy: Part 3
Changing Your Data Distribution Policy
October 2014

Data Distribution Policy: Part 3
Distributed RDBMSs provide many scalability, availability, and performance advantages. This presentation takes a deeper look at distributed RDBMS efficiency over the long haul as application usage patterns, user requirements, and workloads change. The presentation discusses:
• The three stages of your data distribution policy’s lifecycle.
• Adapting the distributed RDBMS to match application changes.
• Ensuring that your distributed relational database is flexible and elastic enough to accommodate endless growth and change.

Why Is a Distributed Relational Database Good?
Distributed relational databases are a perfect match for Cloud computing models and distributed Cloud infrastructure. They are the way forward for delivering web-scale applications while keeping ACID properties.
• Social apps
• Games
• Many concurrent users
• High transaction throughput
• Very large data volumes

What Is a Data Distribution Policy? – Recap
A data distribution policy describes the rules under which data is distributed across a distributed RDBMS (a virtual database made up of many database instances, or “shards”). A good data distribution policy aims to:
1. Maintain full relational database integrity
2. Distribute workloads in an even and predictable manner
3. Minimize the number of joins across the array of database instances
4. Align with workflow and application usage patterns
5. Yield database scalability

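As a rough illustration of goal 2, the sketch below routes rows to shards by hashing a distribution key and then counts how many keys land on each shard. The function names, the `customer-N` key format, and the shard count of 4 are assumptions made for this example; the presentation does not prescribe a hashing scheme.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shard_count: int) -> int:
    """Map a distribution key to a shard index with a stable hash.

    sha256 is used instead of Python's built-in hash() so the mapping
    is the same across processes and restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribution(keys, shard_count: int) -> list:
    """Return the number of keys assigned to each shard, in shard order."""
    counts = Counter(shard_for(k, shard_count) for k in keys)
    return [counts.get(i, 0) for i in range(shard_count)]


# Hypothetical workload: 10,000 customer keys spread over 4 shards.
counts = distribution((f"customer-{i}" for i in range(10_000)), 4)
print(counts)
```

A roughly equal count per shard suggests the key spreads data evenly. It says nothing about whether the workload is even, which also depends on how often each key is accessed.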
“Change, nothing stays the same…”
...shouted ’80s rock band Van Halen proudly in their song “Unchained”. Just as music fashions change, databases must adapt to follow new usage patterns. Unexpected influxes of data or transactions are difficult to predict. You may have anticipated and planned for only a specific amount of growth and capacity. But what if you underestimate your success?

Taking Additional Data into Consideration
Imagine your application is taking off with great success. Sounds like good news, right? However, it might be hard on your database, as business success can generate significantly more transactions, concurrent users, and data, all of which need to be accommodated. These types of situations are hard to predict and occur on a daily basis.

Adapting to Change
If you already have a distributed RDBMS, the original data distribution policy was (hopefully) created based on specific application usage patterns and workflows (see Part 2 of this presentation series). Over time, application workflows and application usage patterns can change. This can lead to database hotspots, bottlenecks, and database clusters that are overloaded compared to other clusters.

Data Distribution Policy Lifecycle
There are three main situations to accommodate during a data distribution policy’s lifecycle:
1. Changing Demand and Traffic Loads
2. Changing Application Usage
3. New Product Capabilities
The answer to these changes is typically the same: “rebalance” the distributed database.
Distribution policy management through lifecycle changes is a key issue to test in any distributed RDBMS technology.

Rebalancing the Distributed Database
In the past, changing a data distribution policy was hard. Manually changing sharding code within an application was the frontline battle zone of changing data distribution. Today, software like ScaleBase can accommodate all these changes for you, quickly and with minimal disruption to live systems.

Scenario 1: Adapting to Changing Demand and Traffic Loads
Data distribution policies should always be designed so that data that is frequently accessed together is aggregated into the same database instance (or shard), as this provides the greatest efficiency and scalability benefits. Data distributions are built according to anticipated traffic predictions (both reads and writes), but traffic loads change.

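One way to keep data that is accessed together on the same shard is to route related tables by a shared distribution key. The sketch below is a minimal illustration under assumed names: the tables (`customers`, `orders`, `order_items`), the `customer_id` column, and the shard count are hypothetical and not taken from the presentation.

```python
import hashlib

# Hypothetical policy: every table in the "customer" family is
# distributed by the same column, so a customer's rows are co-located.
DISTRIBUTION_KEYS = {
    "customers": "customer_id",
    "orders": "customer_id",
    "order_items": "customer_id",
}


def shard_for(value, shard_count: int) -> int:
    """Stable hash of the distribution key value to a shard index."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(table: str, row: dict, shard_count: int = 4) -> int:
    """Pick the shard for a row using the table's distribution key."""
    key_column = DISTRIBUTION_KEYS[table]
    return shard_for(row[key_column], shard_count)


customer_shard = route("customers", {"customer_id": 42, "name": "Ada"})
order_shard = route("orders", {"order_id": 7, "customer_id": 42})
print(customer_shard == order_shard)  # True: both rows share a shard
```

Because both rows hash the same `customer_id`, a join between a customer and that customer's orders can be answered by a single database instance.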
Scenario 1: Adapting to Changing Demand and Traffic Loads – Typical Challenges
1. Be aware of changes in workload patterns and understand their impact on your distributed relational database.
2. A specific application function’s sudden popularity, or changes in your business environment, can lead to usage spikes and transaction bottlenecks from increased demand and unexpected transaction patterns.
3. If workloads appear where the distribution policy was not optimized, new and unplanned operations may cause more costly execution paths that result in sub-par performance and scalability.
* Automated threshold alerting and other monitoring can help you stay ahead of peaks and bottlenecks, so look for these facilities in any distributed solution you choose.

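Threshold alerting of the kind mentioned above can be sketched as a comparison of each shard's load against the average. The function name, the shard labels, the sample numbers, and the 1.5x threshold are all assumptions for illustration; a real monitoring facility would use its own metrics and rules.

```python
def find_hotspots(load_by_shard: dict, threshold: float = 1.5) -> list:
    """Return shards whose load exceeds `threshold` times the mean load.

    `load_by_shard` maps a shard name to a load figure for one
    monitoring window, for example transactions per minute.
    """
    if not load_by_shard:
        return []
    mean = sum(load_by_shard.values()) / len(load_by_shard)
    if mean == 0:
        return []
    return sorted(
        shard for shard, load in load_by_shard.items()
        if load > threshold * mean
    )


# Hypothetical sample: one shard carries far more traffic than the rest.
sample = {"shard-0": 1000, "shard-1": 1100, "shard-2": 4200, "shard-3": 900}
print(find_hotspots(sample))  # ['shard-2']
```

Here the mean is 1,800, so only shards above 2,700 are flagged. A flagged shard is a candidate for rebalancing, not proof that the policy is wrong.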
Scenario 2: Adapting to Application Usage Changes
Over time, it’s quite common for application usage to change. When this happens:
1. The system’s new behavior patterns need to be understood in order to make appropriate changes and optimizations.
2. Adapting to change is typically where do-it-yourself, home-grown sharding fails.
3. Rewriting the custom application code that did the initial data distribution to provide a new data distribution can lead to errors that are easy to make, hard to uncover, and hard to recover from.
4. Identifying the distribution policy changes required to optimally rebalance workloads around new application usage patterns needs some of the analysis described in Part 2.

Scenario 3: Adapting to New Product Capabilities
The final challenge comes when modifying an application to add new capabilities to your product or service.
1. Updated business requirements can necessitate different functions to integrate new solutions with existing systems, extending the current application and database to accommodate relevant new business needs.
2. Old-fashioned do-it-yourself hardcoding of the distribution policy eliminates flexibility and often does not allow changes to be made, turning an implementation attempt into a very complicated and daunting task.

Scenario 3: Adapting to New Product Capabilities (Continued)
You can’t stop your business while you’re rebalancing the distributed database with data placement changes. Rewriting application data redistribution code creates yet another challenge: implementing a change while keeping existing data and operations intact. As a result, in many cases companies have opted to rebuild their system from scratch instead of attempting to make modifications. If data distribution logic is built into the actual application, it can be very hard to make system modifications on the fly. This is costly and ultimately results in maintenance nightmares, performance degradation, and downtime. This is not good!

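To see why data placement changes are disruptive, it can help to compute a move plan before touching any data. The sketch below assumes a naive home-grown policy (`key % shard_count`, a common do-it-yourself choice, not something the presentation specifies) and lists every key that would have to move when a fifth shard is added.

```python
def plan_rebalance(keys, old_policy, new_policy) -> list:
    """List (key, source_shard, target_shard) for every key that moves."""
    moves = []
    for key in keys:
        src, dst = old_policy(key), new_policy(key)
        if src != dst:
            moves.append((key, src, dst))
    return moves


def modulo_policy(shard_count: int):
    """Hypothetical hard-coded policy: integer key modulo shard count."""
    return lambda key: key % shard_count


# Growing from 4 to 5 shards with 20 sample keys.
moves = plan_rebalance(range(20), modulo_policy(4), modulo_policy(5))
print(len(moves))  # 16 of the 20 keys change shard
```

In this example 16 of 20 keys change shards, which is the kind of bulk data movement that has to happen while the application stays online.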
What Can You Do?
To simplify the management of the data distribution policy that underlies your distributed RDBMS, you MUST make a strict separation between your application and where the database distribution policy is defined, managed, and maintained.
• If you’re a startup building a new app, or if you have an existing app that needs to scale for growth, you want to “hit the ground running” (again, to quote Van Halen).
ScaleBase software was created to handle changes like the ones previously mentioned, providing customers with the peace of mind they need to grow successfully, in any manner, at any rate, and to any scale.

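The separation described above can be sketched by keeping the policy as data that lives outside the application, with the application only asking a router where a row belongs. The JSON layout, the `Router` class, and the shard names below are hypothetical and are not ScaleBase's configuration format.

```python
import json
from bisect import bisect_right

# Hypothetical policy document, stored and maintained outside the app.
# Each range covers keys below its "upper" bound; the last is open-ended.
POLICY_JSON = """
{
  "table": "orders",
  "key": "customer_id",
  "ranges": [
    {"upper": 1000, "shard": "shard-a"},
    {"upper": 5000, "shard": "shard-b"},
    {"upper": null, "shard": "shard-c"}
  ]
}
"""


class Router:
    """Routes rows using a range policy loaded from configuration."""

    def __init__(self, policy: dict):
        self.key = policy["key"]
        ranges = policy["ranges"]
        # Exclusive upper bounds of every range except the open-ended last.
        self.bounds = [r["upper"] for r in ranges[:-1]]
        self.shards = [r["shard"] for r in ranges]

    def shard_for(self, row: dict) -> str:
        return self.shards[bisect_right(self.bounds, row[self.key])]


router = Router(json.loads(POLICY_JSON))
print(router.shard_for({"customer_id": 999}))   # shard-a
print(router.shard_for({"customer_id": 1000}))  # shard-b
print(router.shard_for({"customer_id": 7500}))  # shard-c
```

With this split, changing a range boundary means editing the policy document and moving the affected rows. The application code that calls `shard_for` stays the same.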
ScaleBase Software
• ScaleBase is a distributed database built on MySQL and optimized for the cloud. It deploys in minutes so your database can handle an unlimited number of users, humongous volumes of data, and faster transactions.
• It dynamically optimizes workloads and availability by logically distributing data across public, private, and geo-distributed clouds.

Try ScaleBase Today
ScaleBase software is available for free:
• ScaleBase Website
• Amazon Marketplace
• Rackspace Marketplace
• IBM Cloud marketplace
• ScaleBase’s free online Analysis Genie service
An AWS Marketplace Guide and an AWS Getting Started Tutorial are available from the documentation section of the ScaleBase website.
Contact ScaleBase: sales@scalebase.com

Data Distribution Policy: Part 1 and 2
Data Distribution Policy Part 1:
• What a data distribution policy is
• The challenges faced when data is distributed via sharding
• What defines a good data distribution policy
• The best way to distribute data for your application and workload
Data Distribution Policy Part 2:
• The different approaches to data distribution
• How to create your own data distribution policy, whether you are scaling an existing application or creating a new app
• How ScaleBase can help you create your policy