
Conversation


@k-raina k-raina commented Aug 21, 2025

What

Key Features:

  • MetricsCollector: Real-time performance metrics collection with latency tracking, memory monitoring, and throughput analysis
  • MetricsBounds: Configurable performance thresholds with automatic validation
  • Enhanced Tests: All existing ducktape tests now include integrated benchmark metrics
  • Rich Reporting: Detailed performance reports with P50/P95/P99 latencies, memory usage, and batch efficiency

Metrics Collected:

  • Throughput: Send/delivery rates (msg/s, MB/s) with realistic bounds (1k+ msg/s)
  • Latency: P50/P95/P99 percentiles using Python's statistics.quantiles()
  • Memory: Peak usage and growth tracking via psutil
  • Efficiency: Messages per poll, buffer utilization, per-topic/partition breakdowns
  • Reliability: Success/error rates with comprehensive validation

Files Added:

  • tests/ducktape/benchmark_metrics.py - Complete benchmark framework

Files Modified:

  • tests/ducktape/test_producer.py - Enhanced all tests with integrated metrics
  • tests/ducktape/README.md - Updated documentation

Checklist

  • Contains customer facing changes? Including API/behavior changes
    • No breaking changes - all existing tests enhanced with metrics, not replaced
  • Did you add sufficient unit test and/or integration test coverage for this PR?
    • Yes - all existing ducktape tests now include comprehensive metrics validation
    • Validated with 348k+ msg/s throughput and sub-100ms P95 latency

References

Test & Review

```shell
# Run enhanced ducktape tests with integrated benchmarks
./tests/ducktape/run_ducktape_test.py
```
@Copilot Copilot AI review requested due to automatic review settings August 21, 2025 12:44
@k-raina k-raina requested review from a team and MSeal as code owners August 21, 2025 12:44
@confluent-cla-assistant

🎉 All Contributor License Agreements have been signed. Ready to merge.
Please push an empty commit if you would like to re-run the checks to verify CLA status for all contributors.


@Copilot Copilot AI left a comment


Pull Request Overview

This PR adds a comprehensive benchmark framework for Kafka producer testing in the ducktape test suite. The framework provides real-time performance metrics collection, validation against configurable bounds, and detailed reporting capabilities.

  • Implements a complete MetricsCollector system with latency tracking, memory monitoring, and throughput analysis
  • Enhances all existing ducktape tests with integrated benchmark metrics without breaking changes
  • Adds configurable performance bounds validation with realistic thresholds (1k+ msg/s throughput)

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
|------|-------------|
| tests/ducktape/benchmark_metrics.py | New comprehensive benchmark framework with MetricsCollector, MetricsBounds, and reporting utilities |
| tests/ducktape/test_producer.py | Enhanced all producer tests with integrated metrics collection and validation |
| tests/ducktape/README.md | Updated documentation to reflect new metrics capabilities and additional psutil dependency |



```python
        # Use quantiles for P95, P99 (more accurate than custom implementation)
        try:
            quantiles = statistics.quantiles(self.delivery_latencies, n=100)
```

Copilot AI Aug 21, 2025


Computing quantiles with n=100 for every summary is computationally expensive. Consider using a more efficient approach like numpy.percentile or caching the sorted data.
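One way to follow this suggestion without adding a numpy dependency is to sort once and interpolate only the percentiles actually needed. The helper below is a hypothetical sketch that matches numpy.percentile's default linear-interpolation method:

```python
import math

def percentile(sorted_data, p):
    """Linear-interpolation percentile over pre-sorted data (0 <= p <= 100)."""
    if not sorted_data:
        raise ValueError("no data")
    # Fractional rank into the sorted list (numpy's default 'linear' method)
    k = (len(sorted_data) - 1) * (p / 100.0)
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_data[int(k)]
    return sorted_data[lo] * (hi - k) + sorted_data[hi] * (k - lo)

# Sort once, then reuse the sorted list for every percentile queried
latencies = sorted([12.0, 3.0, 7.0, 25.0, 9.0])
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
```

Caching the sorted list and computing just P50/P95/P99 is O(n log n) once per summary instead of deriving all 99 cut points.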


@sonarqube-confluent

This comment has been minimized.


Contributor

@MSeal MSeal left a comment


Minor comments. I was debating whether we should use something like locust for this; it might be worth switching to down the road, but you kind of have to hack it to do any non-RESTful patterns for testing, e.g. https://github.com/SvenskaSpel/locust-plugins/blob/master/examples/kafka_ex.py

```python
        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
            # Handle edge cases where process might not exist or be accessible
            return None
        except Exception:
```
Contributor


I would not catch generic Exception here and just let it boil up to be remediated
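The suggested narrowing might look like the sketch below. It swaps psutil for a stdlib /proc read purely for illustration; the point is the same either way: catch only the specific, expected errors and let everything else propagate so genuine bugs surface in test runs.

```python
def read_proc_rss_kb(pid):
    """Return the process's resident set size in kB, or None if it is gone.

    Only the expected "process vanished or inaccessible" errors are
    swallowed; any other exception propagates to the caller.
    """
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except (FileNotFoundError, PermissionError, ProcessLookupError):
        # Normal, expected condition: the process no longer exists
        return None
    return None
```

With psutil the equivalent tuple is `(psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess)`, as in the snippet above, with the trailing `except Exception` removed.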

```python
            return None


class MetricsBounds:
```
Contributor


Maybe add a TODO: load from config file?

Member Author


Implemented in commit eb493bb

```python
                latency_ms = (time.time() - send_times[msg_key]) * 1000
                del send_times[msg_key]  # Clean up
            else:
                latency_ms = 5.0  # Default latency if timing info not available
```
Contributor


maybe better to just set to 0 or None
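A sketch of the None-based alternative: record missing timings as None and exclude them from the statistics rather than injecting a fabricated default that would skew percentiles. Function and key names here are illustrative:

```python
import statistics

def summarize_latencies(latencies_ms):
    """Summarize latencies, excluding messages that had no timing info."""
    known = [ms for ms in latencies_ms if ms is not None]
    if not known:
        return {"count": 0, "untimed": len(latencies_ms)}
    return {
        "count": len(known),
        # Messages delivered without a recorded send time
        "untimed": len(latencies_ms) - len(known),
        "mean_ms": statistics.fmean(known),
    }

summary = summarize_latencies([4.2, None, 3.8, None, 5.0])
```

Reporting the untimed count separately keeps the gap visible without distorting the latency distribution.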

@MSeal
Contributor

MSeal commented Aug 24, 2025

Let's touch up small things, get a merge then iterate / change things if we want later. I want to get this into the history so we can build abstractions above for simpler test definitions and swap the implementation details as needed / remove conflicts on future PRs

@k-raina k-raina requested a review from MSeal August 25, 2025 16:34

Member

@fangnx fangnx left a comment


Great work :) Just left some questions

```python
        }

        return {
            # Basic metrics
```
Member


Is the basic vs enhanced classification coming from some other source (e.g. some client benchmarking guides)? The list LGTM but just curious :)

Member Author


There were no guidelines I referred to; separating basic and enhanced metrics just made sense for ease of review.

Later on we can further divide these metrics in code/comments as "latency", "throughput", "message delivery", etc.

Member


Makes sense!


## Configuration

Performance bounds are loaded from a JSON config file. By default, it loads `benchmark_bounds.json`, but you can override this with the `BENCHMARK_BOUNDS_CONFIG` environment variable:
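A minimal sketch of that lookup order follows; the key names and default values are assumptions for illustration, not the framework's actual schema:

```python
import json
import os

# Illustrative defaults, applied when a key is absent from the config file
DEFAULT_BOUNDS = {
    "min_send_rate_msgs_per_sec": 1000.0,
    "max_p95_latency_ms": 100.0,
}

def load_bounds(default_path="benchmark_bounds.json"):
    """Load bounds from JSON; BENCHMARK_BOUNDS_CONFIG overrides the path."""
    path = os.environ.get("BENCHMARK_BOUNDS_CONFIG", default_path)
    try:
        with open(path) as f:
            overrides = json.load(f)
    except FileNotFoundError:
        overrides = {}
    # File values win over defaults, key by key
    return {**DEFAULT_BOUNDS, **overrides}
```

Usage: `BENCHMARK_BOUNDS_CONFIG=/tmp/strict_bounds.json ./tests/ducktape/run_ducktape_test.py` would pick up the alternate file.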
Member


Is the default benchmark_bounds.json going to be added in the next PR?

Member Author


Added bounds in latest commit d0f4793


@k-raina k-raina requested a review from fangnx August 26, 2025 08:45
@sonarqube-confluent

Passed

Analysis Details

5 Issues

  • Bug 0 Bugs
  • Vulnerability 0 Vulnerabilities
  • Code Smell 5 Code Smells

Coverage and Duplications

  • Coverage No coverage information (66.40% Estimated after merge)
  • Duplications No duplication information (5.60% Estimated after merge)

Project ID: confluent-kafka-python

View in SonarQube

Contributor

@MSeal MSeal left a comment


Thanks for addressing minor comments. Let's move anything additional that's not a fix of a glaring issue to future PRs to unblock the history

@k-raina k-raina merged commit 858e77c into master Aug 27, 2025
3 checks passed
@k-raina k-raina deleted the kraina-add-benchmark-famework branch August 27, 2025 10:43
MSeal added a commit that referenced this pull request Oct 2, 2025
* Asyncio Producer and Consumer implementation with same API as the sync ones
* Add unit tests for asyncio producer + consumer (#2036)
* update
* update
* fix producer and add consumer uts
* refactor
* rename
* Add Benchmark Framework for ducktape (#2030)
* Integrate Schema Registry with ducktape load tests (#2027)
* basic sr test
* more tests
* update
* update
* lint
* Add ducktape benchmark tests for consumer (sync + async) (#2045)
* draft
* update and cleanup
* add perf comments, add batch_size param to consume test, lint fix
* linter fix
* Fix linter issues in consumer testing code (#2051)
* Add remaining functions missing in async producer & consumer (#2050)
* Add semaphore block for ducktape tests (#2037)
* Add semaphore block for ducktape tests
* Increase kafka start timeout
* Increase kafka start timeout
* Increase kafka start timeout
* Add logs to debug pipeline
* Start kafka in kraft mode
* Fix directory failures
* Fix directory failures
* Fix directory failures
* templatise path
* Fix ductape run
* Fix kafka broker listner
* Fix ducktape version error
* Cleanup
* Fix bound voilation should fail tests
* Now expand bounds for success
* Add schema registry instance
* Update Schema Registry hostname
* Update Schema Registry hostname
* Update Schema Registry hostname
* Fix for linux CI environment
* Address minor feedback
* Fix semaphore
* Minor fix after rebase
* Add more async consumer unit & integration tests (#2052)
* basic rebalance test
* rebalance tests
* refactor and linter fix
* feebdack
* refactor and cleanup
* update
* remove jit imports
* Add produce batch api to producer (#2047)
* Add integration tests for transactions (#2056)
* add tests
* cleanup and linter fix
* remove jit import
* refactor
* cleanup
* minot rlinter
* Update AsyncIO producer architecture to improve performance (#2044)
* Fix helper function name to avoid ducktape test discovery
* Integrate schema registry with producer sync/async performance test + clean up the old SR test (#2063)
* Add comprehensive producer benchmark tests with Schema Registry support
  - Updated message serialization to use comprehensive structure with all protobuf fields
  - Implemented proper strategy pattern for sync/async serializers
  - Added Schema Registry authentication configuration
  - Fixed JSON serialization issues (schema title, async serializer initialization)
  - Added performance validation with configurable JSON validation
  - Enhanced producer strategies with comprehensive Avro, JSON, and Protobuf support
* remove
* remove confusing msg
* Minor: Producer close calls flush() (#2066)
* Integrate schema registry with consumer sync/async performance test (#2059)
* update
* remove auth
* cleanup and ensure same msg size
* more cleanup
* Add comprehensive producer benchmark tests with Schema Registry support
  - Updated message serialization to use comprehensive structure with all protobuf fields
  - Implemented proper strategy pattern for sync/async serializers
  - Added Schema Registry authentication configuration
  - Fixed JSON serialization issues (schema title, async serializer initialization)
  - Added performance validation with configurable JSON validation
  - Enhanced producer strategies with comprehensive Avro, JSON, and Protobuf support
* update
* Group messages by topic partition before passing to produce_batch API (#2069)
* Merge master to async (#2068)
* Pre release (#2067)
* Attempting to add python versioning to read from project toml and setting beta flag
* Updated docs to read project toml version as well
* Updated to read from c file for now. Updaed docs and fixed bad AI code
* NPI-7572: Add content for AsyncIO Python client (#2070)
* Updates for AsyncIO and other improvements
* Add updates based on asyncio blog
* Add SR updates relatd to AsyncIO
* Reorganize content, remove redundancy, and improve content
* Edits to diagram and other content
* Add why to use this client in both readme files
* Improve CHANGELOG title
* Add release dates to versions in CHANGELOG
* Add release dates back to v2.4.0
* Edits based on feedback
* AsyncIO: Only clear messages from buffer if executor passed (#2071)
* Fix async producer transaction behavior + add transactional produce benchmark test (#2072)
* update
* linter fix
* Fix the async transaction behavior related to flush() (#2073)
* fix
* linter
* more linter fix
* linter and add link
* Removed very old librdkafka version checks
* Resolved admin import conflict issue
* Fix test_version unit test (#2079)
* Fix broken tests (#2077)
* fix tests
* fix linter
* Removed set operation from test (Co-authored-by: Matthew Seal <mseal@confluent.io>)
* Async fix buffer cleanup (#2078)
* Fix buffer cleanup logic
* Add tests
* fix linter
* Remove SR key
* Removed incorrect assert
* Change ducktape tests to install more dependencies
* Fix semaphore for producer ducktape tests + clean up files that should've been removed (#2081)
* update
* use warning for producer validate
* remove unnecessary assert

Co-authored-by: Emanuele Sabellico <esabellico@confluent.io>
Co-authored-by: Matthew Seal <mseal@confluent.io>
Co-authored-by: Kaushik Raina <103954755+k-raina@users.noreply.github.com>
Co-authored-by: Matthew Seal <mseal007@gmail.com>
Co-authored-by: Steve Bang <sbang@confluent.io>
