How to Improve NOSQL Database Performance

Explore top LinkedIn content from expert professionals.

Summary

Improving NoSQL database performance means making your data systems run faster and smoother, especially when handling large amounts of information. NoSQL databases store and retrieve data without relying on traditional table structures, so smart data organization and strategic querying are crucial for boosting speed and reliability.

  • Rethink partition strategy: Design your data partitions using a mix of attributes, such as user ID or region, rather than relying solely on timestamps, so that writes don't pile up on any single partition.
  • Create covering indexes: Include all relevant query columns in your indexes so the database can answer queries directly from these indexes and skip unnecessary lookups.
  • Compress and group data: Store your data in compressed formats and physically group similar records together, which helps speed up queries by reducing disk read times.
Summarized by AI based on LinkedIn member posts
  • John Kutay, Data & AI Engineering Leader (9,650 followers)

    If you’re clustering or partitioning your data on timestamp-based keys, especially in systems like BigQuery or Snowflake, this diagram should look familiar 👇

    Hotspots in partitioned databases are one of those things you don’t notice until your write performance nosedives. When I work with teams building time-series datasets or event logs, one of the most common pitfalls I see is sequential writes to a single partition. Timestamp as a partition key sounds intuitive (and easy), but here’s what actually happens:

    🔹 Writes start hitting a narrow window of partitions (like t1–t2 in this example)
    🔹 That partition becomes a hotspot, overloaded with inserts
    🔹 Meanwhile, surrounding partitions (t0–t1, t2–t3) sit nearly idle
    🔹 Performance drops, latency increases, and in some systems throughput throttling or even write failures kick in

    This is why choosing the right clustering/partitioning strategy is so critical. A few things that’ve worked well for us:

    ✅ Add high-cardinality attributes (like user_id, region, device) to the partitioning scheme
    ✅ Randomize write distribution if real-time access isn’t required (e.g., hash bucketing)
    ✅ Use ingestion time or write time sparingly, only when access patterns make sense
    ✅ Monitor partition skew early and often; tools like system views and query plans help!

    Partitioning should balance read performance and write throughput. Optimizing for just one leads to trouble. If you're building on time-series data, don’t sleep on this. The write patterns you define today can make or break your infra six months from now. #dataengineering
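To make the hash-bucketing suggestion above concrete, here is a minimal Python sketch of a composite partition key that combines a coarse date component with a hash bucket derived from a high-cardinality attribute. The bucket count, key format, and `partition_key` helper are assumptions for illustration only; they are not tied to BigQuery, Snowflake, or any particular database.

```python
import hashlib
from datetime import datetime, timezone

NUM_BUCKETS = 32  # assumed bucket count; tune to write volume and per-partition limits

def partition_key(user_id: str, event_time: datetime) -> str:
    """Build a composite partition key: a coarse date component plus a hash
    bucket of a high-cardinality attribute, so concurrent writes spread
    across NUM_BUCKETS partitions instead of piling onto the newest one."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{event_time:%Y-%m-%d}#bucket={bucket:02d}"

# Two users writing at the same moment land in different partitions.
now = datetime.now(timezone.utc)
print(partition_key("user-123", now))
print(partition_key("user-456", now))
```

The trade-off is on the read side: a query for one day now fans out across all buckets for that day, exchanging a slightly wider read for an even write distribution.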

  • Raul Junco, Simplifying System Design (123,956 followers)

    Don’t index just filters. Index what you need.

    If you index only your WHERE columns, you leave performance on the table. One of the most effective yet overlooked techniques is covering indexes. Unlike standard indexes that only help filter rows, covering indexes include all columns required for a query, which reduces query execution time by eliminating the need to access the main table.

    Why covering indexes?
    • By including all required columns, the query can be resolved entirely from the index, avoiding table lookups.
    • They can speed up join queries by reducing access to the base table.

    Columns to include:
    • WHERE: filters rows.
    • SELECT: data to retrieve.
    • ORDER BY: sorting columns.

    Steps to create covering indexes:
    1. Use execution plans to identify queries that perform frequent table lookups.
    2. Focus on columns in WHERE, SELECT, and ORDER BY.
    3. Don’t create multiple indexes with overlapping columns unnecessarily.

    Covering indexes are not free:
    • Each insert, update, or delete operation must update the index, which can slow down write-heavy workloads.
    • Covering indexes consume more disk space.

    Covering indexes are a powerful tool for database performance, especially for read-heavy applications. While they can increase write costs, the trade-off is often worth it for the dramatic speedups in query performance. Every table lookup wastes precious time. Fix it!
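As a small, self-contained illustration of the pattern above, the sketch below uses SQLite through Python's standard library, because its query planner reports when a query is answered entirely from an index. The `orders` table, its columns, and the index name are invented for the example; the same idea applies to any engine that supports composite or covering indexes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    status TEXT,
    total REAL,
    created_at TEXT)""")

# Covering index: the WHERE column first (customer_id), then the ORDER BY
# column (created_at), then the remaining SELECTed column (total), so the
# query below never needs to touch the base table.
con.execute("CREATE INDEX idx_orders_cover ON orders (customer_id, created_at, total)")

plan = con.execute("""EXPLAIN QUERY PLAN
    SELECT created_at, total FROM orders
    WHERE customer_id = ? ORDER BY created_at""", (42,)).fetchall()
print(plan)  # expect 'USING COVERING INDEX idx_orders_cover' in the plan output
```

Dropping `total` from the index would force a lookup back into the table for every matching row, which is exactly the cost a covering index is meant to remove.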

  • Aliaksandr Valialkin, Founder and CTO at @VictoriaMetrics (3,760 followers)

    There is a common misconception that the performance of a heavy query in databases with hundreds of terabytes of data can be improved by adding more CPU and RAM. This is true only while the data accessed by the query fits in the OS page cache (whose size is proportional to the available RAM) and the same (or similar) queries are executed repeatedly, so they can read the data from the page cache instead of from persistent storage. If the query needs to read hundreds of terabytes of data, that data cannot fit in RAM on typical hosts. This means the performance of such queries is limited by disk read speed, and it cannot be improved by adding more RAM and CPU.

    Which techniques exist for speeding up heavy queries that need to read a lot of data?

    1. Compression. It is better to spend additional CPU time decompressing data stored on disk than to wait much longer for the uncompressed data to be read from disk. For example, the typical compression ratio for real production logs is 10x-50x. This allows speeding up heavy queries by 10x-50x compared to storing the data on disk in uncompressed form.

    2. Physically grouping and sorting similar rows close to each other, and compressing blocks of such rows. This increases the compression ratio compared to storing and compressing rows without additional grouping and sorting.

    3. Physically storing per-column data in distinct locations (files). This is known as column-oriented storage. The query then reads data only for the referenced columns, while skipping the data for the rest of the columns.

    4. Using time-based partitioning, bloom filters, min-max indexes and coarse-grained indexes to skip reading data blocks that contain no rows needed for the query.

    These techniques can increase heavy query performance by 1000x and more on systems where the bottleneck is disk read IO bandwidth. All of them are used automatically by VictoriaLogs to speed up heavy queries over hundreds of terabytes of logs.
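The toy sketch below illustrates techniques 1, 2, and 4 from the post above with Python's standard library: rows are sorted so similar records sit together, packed into fixed-size blocks that are compressed with zlib, and each block keeps a min/max timestamp so a query can skip whole blocks without reading them. It is a simplified illustration under those assumptions, not a description of how VictoriaLogs or any production system is implemented.

```python
import json
import random
import zlib

BLOCK_SIZE = 10_000  # rows per compressed block (arbitrary for the example)

# Synthetic log rows: sequential timestamps, a handful of services.
rows = [{"ts": 1_700_000_000 + i,
         "service": random.choice(["api", "db", "cache"]),
         "msg": "request handled"} for i in range(100_000)]

def build_blocks(rows):
    # Sort so similar rows are adjacent, which improves per-block compression.
    rows = sorted(rows, key=lambda r: (r["service"], r["ts"]))
    blocks = []
    for i in range(0, len(rows), BLOCK_SIZE):
        chunk = rows[i:i + BLOCK_SIZE]
        raw = "\n".join(json.dumps(r) for r in chunk).encode()
        blocks.append({
            "min_ts": min(r["ts"] for r in chunk),  # coarse min-max index
            "max_ts": max(r["ts"] for r in chunk),
            "data": zlib.compress(raw),             # compressed block
        })
    return blocks

def query(blocks, ts_from, ts_to):
    hits = []
    for b in blocks:
        if b["max_ts"] < ts_from or b["min_ts"] > ts_to:
            continue  # skip the block without decompressing or scanning it
        for line in zlib.decompress(b["data"]).splitlines():
            r = json.loads(line)
            if ts_from <= r["ts"] <= ts_to:
                hits.append(r)
    return hits

blocks = build_blocks(rows)
raw_size = sum(len(json.dumps(r)) + 1 for r in rows)
packed_size = sum(len(b["data"]) for b in blocks)
print(f"compression ratio: {raw_size / packed_size:.1f}x")
print("rows in range:", len(query(blocks, 1_700_000_100, 1_700_000_200)))
```

On real data the ratios and skip rates depend heavily on how repetitive the rows are and how narrow the query's time range is, but the mechanics (sort, compress per block, keep coarse per-block metadata) are the same ones the post describes.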
