feat: As a user, I want error ratio-based circuit breaking in api-breaker plugin, so that I can have more intelligent circuit breaking based on error rates instead of just failure counts
Description
Currently, the api-breaker plugin supports only failure count-based circuit breaking (the unhealthy-count policy), which trips the circuit breaker when the number of consecutive failures reaches a threshold. This approach does not suit every scenario, particularly when traffic patterns vary.
I would like to propose adding an error ratio-based circuit breaking policy (unhealthy-ratio) that trips the circuit breaker based on the error rate within a sliding time window, so that circuit breaking adapts to traffic volume.
Motivation
Current Limitations
- The existing failure count-based approach only considers consecutive failures
- It doesn't account for the overall error rate in relation to total requests
- May be too sensitive during low traffic periods or not sensitive enough during high traffic periods
Benefits of Error Ratio-based Circuit Breaking
- More accurate representation of service health by considering error rate rather than just failure count
- Better handling of varying traffic patterns
- Configurable sliding time window for flexible error rate calculation
- Support for circuit breaker states: CLOSED, OPEN, and HALF_OPEN
Proposed Solution
Add a new policy parameter to the api-breaker plugin with two options:
- `unhealthy-count` (default, existing behavior)
- `unhealthy-ratio` (new error ratio-based policy)
New Configuration Parameters for unhealthy-ratio Policy
| Parameter | Type | Default | Description |
|---|---|---|---|
| `policy` | string | `"unhealthy-count"` | Circuit breaker policy |
| `unhealthy.error_ratio` | number | 0.5 | Error rate threshold (0-1) to trigger circuit breaker |
| `unhealthy.min_request_threshold` | integer | 10 | Minimum requests needed before evaluating error rate |
| `unhealthy.sliding_window_size` | integer | 300 | Sliding window size in seconds for error rate calculation |
| `unhealthy.permitted_number_of_calls_in_half_open_state` | integer | 3 | Number of permitted calls in half-open state |
| `healthy.success_ratio` | number | 0.6 | Success rate threshold to close circuit breaker from half-open state |
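
To make the defaults in the table concrete, here is a minimal sketch of how a configuration might be resolved. It is written in Python purely for illustration (the plugin itself is Lua), and `resolve_breaker_conf` and `RATIO_DEFAULTS` are hypothetical names, not part of APISIX. The `[0, 1]` range check on `success_ratio` is an assumption; the table only states that range for `error_ratio`.

```python
import copy

# Defaults copied from the parameter table in this proposal.
RATIO_DEFAULTS = {
    "unhealthy": {
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3,
    },
    "healthy": {"success_ratio": 0.6},
}
POLICIES = ("unhealthy-count", "unhealthy-ratio")


def resolve_breaker_conf(conf):
    """Return a copy of conf with the proposed defaults filled in (hypothetical helper)."""
    resolved = copy.deepcopy(conf)
    # A missing policy falls back to the existing count-based behavior.
    resolved.setdefault("policy", "unhealthy-count")
    if resolved["policy"] not in POLICIES:
        raise ValueError(f"unknown policy: {resolved['policy']}")
    if resolved["policy"] == "unhealthy-count":
        return resolved  # ratio parameters are not applied to the old policy

    # Fill in ratio-policy defaults without overwriting user-supplied values.
    for section, defaults in RATIO_DEFAULTS.items():
        resolved[section] = {**defaults, **resolved.get(section, {})}

    # Range checks: error_ratio is documented as 0-1; success_ratio is assumed to be.
    for section, key in (("unhealthy", "error_ratio"), ("healthy", "success_ratio")):
        value = resolved[section][key]
        if not 0 <= value <= 1:
            raise ValueError(f"{section}.{key} must be within [0, 1], got {value}")
    return resolved
```

With this sketch, an empty configuration resolves to `policy = "unhealthy-count"`, and a configuration that sets only `"policy": "unhealthy-ratio"` picks up every default from the table.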
Example Configuration
```json
{
  "plugins": {
    "api-breaker": {
      "break_response_code": 503,
      "policy": "unhealthy-ratio",
      "max_breaker_sec": 60,
      "unhealthy": {
        "http_statuses": [500, 502, 503, 504],
        "error_ratio": 0.5,
        "min_request_threshold": 10,
        "sliding_window_size": 300,
        "permitted_number_of_calls_in_half_open_state": 3
      },
      "healthy": {
        "http_statuses": [200, 201, 202],
        "success_ratio": 0.6
      }
    }
  }
}
```

Implementation Details
Circuit Breaker States
- CLOSED: Normal request forwarding
- OPEN: Direct circuit breaker response without forwarding requests
- HALF_OPEN: Limited requests allowed to test service recovery
Algorithm
- Track requests and errors within a sliding time window
- When request count ≥ `min_request_threshold` and error rate ≥ `error_ratio`, open the circuit breaker
- After `max_breaker_sec`, transition to half-open state
- In half-open state, allow up to `permitted_number_of_calls_in_half_open_state` requests
- If enough of those requests succeed (success rate ≥ `healthy.success_ratio`), close the circuit breaker; otherwise, reopen it
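
The states and steps above can be sketched as a small state machine. This is an illustrative Python model, not the actual Lua implementation: the class `RatioBreaker`, its method names, and the injectable `clock` are hypothetical. It also makes assumptions the proposal does not spell out: `max_breaker_sec` is treated as a fixed open duration, the half-open verdict is taken once all permitted calls have reported, and the window is cleared on every state change.

```python
import time
from collections import deque

CLOSED, OPEN, HALF_OPEN = "CLOSED", "OPEN", "HALF_OPEN"


class RatioBreaker:
    """Illustrative error-ratio breaker; parameter names follow the proposal."""

    def __init__(self, error_ratio=0.5, min_request_threshold=10,
                 sliding_window_size=300, max_breaker_sec=60,
                 permitted_calls=3, success_ratio=0.6, clock=time.monotonic):
        self.error_ratio = error_ratio
        self.min_request_threshold = min_request_threshold
        self.window = sliding_window_size
        self.max_breaker_sec = max_breaker_sec
        self.permitted_calls = permitted_calls
        self.success_ratio = success_ratio
        self.clock = clock
        self.state = CLOSED
        self.events = deque()      # (timestamp, is_error) pairs inside the window
        self.opened_at = 0.0
        self.probes_admitted = 0   # calls let through while half-open
        self.probe_results = []    # is_error flags reported by those calls

    def allow(self):
        """Return True if the request may be forwarded upstream."""
        now = self.clock()
        if self.state == OPEN:
            if now - self.opened_at < self.max_breaker_sec:
                return False
            # Open period elapsed: start probing the upstream.
            self.state = HALF_OPEN
            self.probes_admitted = 0
            self.probe_results = []
        if self.state == HALF_OPEN:
            if self.probes_admitted >= self.permitted_calls:
                return False
            self.probes_admitted += 1
        return True

    def record(self, is_error):
        """Report the outcome of a forwarded request."""
        now = self.clock()
        if self.state == OPEN:
            return  # late response from before the breaker opened; ignored
        if self.state == HALF_OPEN:
            self.probe_results.append(is_error)
            if len(self.probe_results) >= self.permitted_calls:
                successes = self.probe_results.count(False)
                if successes / len(self.probe_results) >= self.success_ratio:
                    self._transition(CLOSED, now)
                else:
                    self._transition(OPEN, now)
            return
        # CLOSED: update the sliding window, then evaluate the error ratio.
        self.events.append((now, is_error))
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
        total = len(self.events)
        errors = sum(1 for _, failed in self.events if failed)
        if total >= self.min_request_threshold and errors / total >= self.error_ratio:
            self._transition(OPEN, now)

    def _transition(self, state, now):
        self.state = state
        self.events.clear()
        if state == OPEN:
            self.opened_at = now
```

A caller would invoke `allow()` before forwarding a request and `record(is_error)` once the upstream status is known, where `is_error` would be derived from `unhealthy.http_statuses`. Passing a fake `clock` makes the transitions easy to exercise in tests without sleeping.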
Backward Compatibility
This enhancement is fully backward compatible:
- Existing configurations continue to work without changes
- Default `policy` is `"unhealthy-count"` (existing behavior)
- No breaking changes to existing APIs
Testing
Comprehensive test coverage will be provided including:
- Schema validation tests for new parameters
- Functional tests for error ratio calculation
- Circuit breaker state transition tests
- Integration tests with various traffic patterns
- Backward compatibility tests
Use Cases
- High-traffic services: Better handling of error spikes in high-volume scenarios
- Variable traffic patterns: Adaptive behavior for services with fluctuating request rates
- Microservices architectures: More precise circuit breaking for service mesh environments
- SLA-based circuit breaking: Configure circuit breaker based on acceptable error rates
Files to be Modified
- `apisix/plugins/api-breaker.lua` - Core plugin logic
- `t/plugin/api-breaker.t` - Test cases (new test file for ratio-based tests)
- `docs/en/latest/plugins/api-breaker.md` - English documentation
- `docs/zh/latest/plugins/api-breaker.md` - Chinese documentation
Additional Information
This feature has been implemented and tested locally. I'm ready to submit a PR with:
- Complete implementation of the error ratio-based circuit breaking
- Comprehensive test suite following APISIX testing standards
- Updated documentation in both English and Chinese
- Backward compatibility preservation
Would appreciate feedback on this proposal and guidance on the contribution process.