@codeflash-ai codeflash-ai bot commented Oct 23, 2025

📄 213% (2.13x) speedup for filter_sensitive_headers in src/deepgram/extensions/core/telemetry_events.py

⏱️ Runtime: 3.64 milliseconds → 1.16 milliseconds (best of 305 runs)

📝 Explanation and details

The optimization replaces any(key_lower.startswith(prefix) for prefix in sensitive_prefixes) with the more efficient key_lower.startswith(sensitive_prefixes).

Key Change:

  • Direct tuple prefix checking: Python's str.startswith() method natively accepts a tuple of prefixes, eliminating the need for a generator expression and any() function call.
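The change can be sketched as follows. The prefix tuple here is hypothetical, since the actual contents of `sensitive_prefixes` in telemetry_events.py are not shown in this PR description; the point is only that both forms return the same answer, and that `str.startswith()` requires a tuple (a list raises `TypeError`).

```python
# Hypothetical prefixes for illustration only; the real `sensitive_prefixes`
# in telemetry_events.py is not shown in this PR description.
SENSITIVE_PREFIXES = ("x-api-key", "x-auth", "sec-")


def is_sensitive_before(key: str) -> bool:
    """Original form: one startswith() call per prefix, driven by any()."""
    key_lower = key.lower()
    return any(key_lower.startswith(prefix) for prefix in SENSITIVE_PREFIXES)


def is_sensitive_after(key: str) -> bool:
    """Optimized form: a single startswith() call with the whole tuple."""
    return key.lower().startswith(SENSITIVE_PREFIXES)


samples = ["X-API-Key", "X-Auth-Token", "Sec-Fetch-Site", "Content-Type", "", "My-X-Auth"]
results = [(is_sensitive_before(k), is_sensitive_after(k)) for k in samples]

# startswith() accepts a str or a tuple of str; a list is rejected.
try:
    "x-auth".startswith(list(SENSITIVE_PREFIXES))
    list_rejected = False
except TypeError:
    list_rejected = True
```

If `sensitive_prefixes` were defined as a list or set in the real module, it would need to be converted to a tuple once, outside the loop, for this form to work.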

Why This is Faster:

  • Eliminates generator overhead: The original code creates a generator object and iterates through it with any(), which involves Python's iterator protocol overhead
  • Reduces function calls: Instead of multiple startswith() calls wrapped in any(), there's a single startswith() call that handles the tuple internally in optimized C code
  • Better memory efficiency: No temporary generator object creation
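The overhead described above can be observed with a small `timeit` comparison. This is a rough sketch, not the benchmark Codeflash ran: the prefixes and header names are made up for illustration, and absolute timings will vary by machine and Python version.

```python
import timeit

# Hypothetical prefixes and keys; not taken from the real module.
PREFIXES = ("x-api-key", "x-auth", "sec-")
KEYS = [f"safe-header-{i}" for i in range(1000)]


def with_any():
    # A generator object is created and consumed by any() for every key.
    return [k for k in KEYS if not any(k.startswith(p) for p in PREFIXES)]


def with_tuple():
    # One C-level call per key; the tuple is iterated inside startswith().
    return [k for k in KEYS if not k.startswith(PREFIXES)]


# Take the best of a few repeats to reduce noise.
t_any = min(timeit.repeat(with_any, number=20, repeat=3))
t_tuple = min(timeit.repeat(with_tuple, number=20, repeat=3))
print(f"any():   {t_any:.4f}s")
print(f"tuple:   {t_tuple:.4f}s")
```

Keys that match no prefix are the worst case for the `any()` form, because every prefix is tried before the generator is exhausted.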

Performance Impact:
The line profiler shows the prefix checking line (if key_lower.startswith...) dropped from 65.3% of total runtime (13.57ms) to 23.6% (2.12ms) - a ~6.4x improvement on this specific line. This optimization is particularly effective for:

  • Large header sets: Test cases with 500-1000 headers show 200-350% speedups
  • Mixed sensitive/safe headers: Small cases with both types also benefit (45-90% faster)
  • Frequent prefix matching: When many headers match sensitive prefixes, the reduced overhead compounds

The overall 213% speedup demonstrates how optimizing the most expensive operation (prefix checking) in a tight loop can dramatically improve performance across all test scenarios.
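The reported figures can be reproduced from the timings quoted in this report. A minimal sketch, assuming the percentage is computed as (original - optimized) / optimized; that formula is consistent with the numbers here but is not stated in the report itself:

```python
# Timings copied from this report.
original_ms = 3.64
optimized_ms = 1.16

# How many times faster the optimized code runs (about 3.14x).
time_ratio = original_ms / optimized_ms

# Assumed definition of "% faster": extra work done per unit time,
# relative to the optimized runtime (about 213.8 with these rounded inputs).
speedup_pct = (original_ms - optimized_ms) / optimized_ms * 100

# Line-profiler times for the prefix-checking line (about 6.4x).
line_ratio = 13.57 / 2.12

# The same formula applied to one per-test timing (487μs -> 156μs).
per_test_pct = (487 - 156) / 156 * 100

print(f"{time_ratio:.2f}x as fast, {speedup_pct:.1f}% faster")
print(f"prefix-check line: {line_ratio:.1f}x")
print(f"per-test example: {per_test_pct:.0f}% faster")
```

Under this definition, "213% faster" means the optimized function takes a little under a third of the original time.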

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 46 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
```python
from __future__ import annotations

from typing import Dict, Mapping

# imports
import pytest
from deepgram.extensions.core.telemetry_events import filter_sensitive_headers

# unit tests

# 1. Basic Test Cases

def test_empty_headers_none():
    """Test that None input returns None."""
    codeflash_output = filter_sensitive_headers(None) # 360ns -> 330ns (9.09% faster)

def test_empty_headers_dict():
    """Test that empty dict input returns None."""
    codeflash_output = filter_sensitive_headers({}) # 359ns -> 348ns (3.16% faster)

def test_only_safe_headers():
    """Test that safe headers are preserved."""
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "pytest",
        "Accept": "application/xml"
    }
    expected = {
        "Content-Type": "application/json",
        "User-Agent": "pytest",
        "Accept": "application/xml"
    }
    codeflash_output = filter_sensitive_headers(headers) # 4.25μs -> 2.62μs (61.8% faster)

def test_only_sensitive_headers():
    """Test that only sensitive headers are removed, resulting in None."""
    headers = {
        "Authorization": "Bearer xyz",
        "Cookie": "sessionid=abc",
        "X-API-Key": "12345",
        "Set-Cookie": "foo=bar",
        "x-auth-token": "token"
    }
    codeflash_output = filter_sensitive_headers(headers) # 2.01μs -> 1.93μs (4.09% faster)

def test_mixed_headers():
    """Test that sensitive headers are removed and safe ones are kept."""
    headers = {
        "Authorization": "Bearer xyz",
        "Content-Type": "application/json",
        "User-Agent": "pytest",
        "Cookie": "sessionid=abc"
    }
    expected = {"Content-Type": "application/json", "User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.72μs -> 2.43μs (53.3% faster)

def test_case_insensitivity():
    """Test that header filtering is case-insensitive."""
    headers = {
        "authorization": "Bearer xyz",
        "AUTHORIZATION": "Bearer abc",
        "Content-Type": "application/json",
        "cookie": "sessionid=abc",
        "Set-Cookie": "foo=bar",
        "X-API-KEY": "12345",
        "x-auth-token": "token",
        "User-Agent": "pytest"
    }
    expected = {"Content-Type": "application/json", "User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.99μs -> 2.64μs (51.1% faster)

def test_safe_headers_with_similar_names():
    """Test that headers with similar names but not matching sensitive rules are kept."""
    headers = {
        "Authorization-Info": "not sensitive",
        "Cookie-Policy": "strict",
        "X-API-Keys": "multiple",
        "X-Auths": "many",
        "Set-Cookier": "should stay",
        "Content-Type": "application/json"
    }
    expected = {
        "Authorization-Info": "not sensitive",
        "Cookie-Policy": "strict",
        "X-API-Keys": "multiple",
        "X-Auths": "many",
        "Set-Cookier": "should stay",
        "Content-Type": "application/json"
    }
    codeflash_output = filter_sensitive_headers(headers) # 5.58μs -> 2.90μs (92.4% faster)

# 2. Edge Test Cases

def test_header_with_empty_string_key_and_value():
    """Test that an empty string key is kept (not sensitive)."""
    headers = {"": "", "Content-Type": "application/json"}
    expected = {"": "", "Content-Type": "application/json"}
    codeflash_output = filter_sensitive_headers(headers) # 3.18μs -> 2.01μs (57.9% faster)

def test_header_with_none_value():
    """Test that None values are converted to 'None' string."""
    headers = {"Content-Type": None, "User-Agent": "pytest"}
    expected = {"Content-Type": "None", "User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.36μs -> 2.02μs (66.4% faster)

def test_header_with_integer_value():
    """Test that integer values are converted to string."""
    headers = {"Content-Length": 123, "User-Agent": "pytest"}
    expected = {"Content-Length": "123", "User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.20μs -> 2.11μs (51.8% faster)

def test_sensitive_header_with_whitespace():
    """Test that sensitive headers with leading/trailing whitespace are not filtered (since not exact match)."""
    headers = {
        " Authorization ": "Bearer xyz",
        "Cookie ": "sessionid=abc",
        "X-API-Key ": "12345",
        "Content-Type": "application/json"
    }
    expected = {
        " Authorization ": "Bearer xyz",
        "Cookie ": "sessionid=abc",
        "X-API-Key ": "12345",
        "Content-Type": "application/json"
    }
    codeflash_output = filter_sensitive_headers(headers) # 4.87μs -> 2.45μs (98.9% faster)

def test_sensitive_header_with_prefix_and_suffix():
    """Test that headers starting with sensitive prefixes are filtered, even with suffix."""
    headers = {
        "Authorization-Extra": "should be filtered",
        "Sec-WebSocket-Key": "should be filtered",
        "Cookie-Policy": "should be kept",
        "X-API-Key-Secondary": "should be filtered",
        "X-Auth-Extra": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {"Cookie-Policy": "should be kept", "User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 5.31μs -> 2.79μs (90.6% faster)

def test_sensitive_header_substring_not_filtered():
    """Test that headers containing sensitive substrings but not as prefix are kept."""
    headers = {
        "My-Authorization": "not filtered",
        "Api-Cookie": "not filtered",
        "Key-X-API": "not filtered",
        "Auth-X": "not filtered",
        "User-Agent": "pytest"
    }
    expected = {
        "My-Authorization": "not filtered",
        "Api-Cookie": "not filtered",
        "Key-X-API": "not filtered",
        "Auth-X": "not filtered",
        "User-Agent": "pytest"
    }
    codeflash_output = filter_sensitive_headers(headers) # 4.85μs -> 2.72μs (78.3% faster)

def test_sensitive_header_bearer():
    """Test that 'bearer' is filtered as a sensitive header."""
    headers = {"Bearer": "token", "User-Agent": "pytest"}
    expected = {"User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 2.64μs -> 1.85μs (42.7% faster)

def test_sensitive_header_set_cookie():
    """Test that 'Set-Cookie' is filtered as a sensitive header."""
    headers = {"Set-Cookie": "foo=bar", "User-Agent": "pytest"}
    expected = {"User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 2.57μs -> 1.78μs (44.5% faster)

def test_sensitive_header_sec_prefix():
    """Test that headers starting with 'sec-' are filtered."""
    headers = {
        "Sec-WebSocket-Key": "should be filtered",
        "Sec-Fetch-Site": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {"User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.76μs -> 2.14μs (75.5% faster)

def test_sensitive_header_x_auth_prefix():
    """Test that headers starting with 'x-auth' are filtered."""
    headers = {
        "X-Auth-Token": "should be filtered",
        "X-Auth-Extra": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {"User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.68μs -> 2.01μs (83.1% faster)

def test_sensitive_header_x_api_key_prefix():
    """Test that headers starting with 'x-api-key' are filtered."""
    headers = {
        "X-API-Key": "should be filtered",
        "X-API-Key-Secondary": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {"User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.50μs -> 2.08μs (68.1% faster)

def test_sensitive_header_cookie_prefix():
    """Test that headers starting with 'cookie' are filtered."""
    headers = {
        "Cookie": "should be filtered",
        "Cookie-Policy": "should be filtered",
        "CookieExtra": "should be filtered",
        "User-Agent": "pytest"
    }
    expected = {"User-Agent": "pytest"}
    codeflash_output = filter_sensitive_headers(headers) # 3.87μs -> 2.28μs (70.1% faster)

# 3. Large Scale Test Cases

def test_large_number_of_safe_headers():
    """Test performance and correctness with 1000 safe headers."""
    headers = {f"Safe-Header-{i}": f"value-{i}" for i in range(1000)}
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(1000)}
    codeflash_output = filter_sensitive_headers(headers) # 487μs -> 156μs (212% faster)

def test_large_number_of_sensitive_headers():
    """Test that all sensitive headers are filtered out from a large set."""
    sensitive_names = [
        "Authorization", "Cookie", "Set-Cookie", "X-API-Key", "X-Auth-Token", "Bearer",
        "Sec-WebSocket-Key", "Sec-Fetch-Site", "X-Auth-Extra", "X-API-Key-Secondary"
    ]
    headers = {name + f"-{i}": f"value-{i}" for i, name in enumerate(sensitive_names)}
    # Only those with sensitive prefix will be filtered, but those with suffix are not always filtered unless prefix matches.
    # Let's add some that are exact matches too
    for name in sensitive_names:
        headers[name] = "should be filtered"
    # Add some safe headers
    for i in range(10):
        headers[f"Safe-Header-{i}"] = f"value-{i}"
    # Only safe headers should remain
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(10)}
    codeflash_output = filter_sensitive_headers(headers) # 15.7μs -> 6.71μs (134% faster)

def test_large_mixed_headers():
    """Test with 500 safe and 500 sensitive headers mixed together."""
    headers = {}
    # Add 500 safe headers
    for i in range(500):
        headers[f"Safe-Header-{i}"] = f"value-{i}"
    # Add 500 sensitive headers (using sensitive prefixes and exact names)
    for i in range(250):
        headers[f"Authorization-{i}"] = f"sensitive-{i}"
        headers[f"X-API-Key-{i}"] = f"sensitive-{i}"
    for i in range(250):
        headers["Cookie"] = "should be filtered"
        headers["Set-Cookie"] = "should be filtered"
    # Only safe headers should remain
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(500)}
    codeflash_output = filter_sensitive_headers(headers) # 428μs -> 130μs (228% faster)

def test_large_headers_all_filtered():
    """Test that all headers are filtered when all are sensitive."""
    headers = {}
    for i in range(1000):
        headers[f"Authorization-{i}"] = f"value-{i}"
    codeflash_output = filter_sensitive_headers(headers) # 289μs -> 97.1μs (198% faster)

def test_large_headers_all_safe():
    """Test that all headers are kept when none are sensitive."""
    headers = {}
    for i in range(1000):
        headers[f"Safe-Header-{i}"] = f"value-{i}"
    expected = {f"Safe-Header-{i}": f"value-{i}" for i in range(1000)}
    codeflash_output = filter_sensitive_headers(headers) # 487μs -> 154μs (216% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from collections import OrderedDict
# function to test
from typing import Dict, Mapping

# imports
import pytest
from deepgram.extensions.core.telemetry_events import filter_sensitive_headers

# unit tests

# --------------------------
# BASIC TEST CASES
# --------------------------

def test_none_input_returns_none():
    # Should return None if input is None
    codeflash_output = filter_sensitive_headers(None) # 347ns -> 354ns (1.98% slower)

def test_empty_dict_returns_none():
    # Should return None if input is an empty dict
    codeflash_output = filter_sensitive_headers({}) # 344ns -> 347ns (0.865% slower)

def test_no_sensitive_headers_returns_all():
    # No sensitive headers, all should be returned as strings
    headers = {'Content-Type': 'application/json', 'Accept': 'text/html'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.95μs -> 2.47μs (60.2% faster)

def test_sensitive_header_removed():
    # Sensitive header should be removed
    headers = {'Authorization': 'secret', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.90μs -> 2.06μs (41.0% faster)

def test_case_insensitive_sensitive_header():
    # Should match sensitive headers regardless of case
    headers = {'AUTHORIZATION': 'secret', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.83μs -> 2.05μs (38.3% faster)

def test_sensitive_prefix_header_removed():
    # Should remove headers with sensitive prefixes (case-insensitive)
    headers = {'X-Api-Key': 'abc', 'X-Auth-Token': 'def', 'Content-Type': 'text/plain'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.89μs -> 2.06μs (40.1% faster)

def test_multiple_sensitive_and_non_sensitive():
    # Only non-sensitive headers should remain
    headers = {
        'Authorization': 'secret',
        'Cookie': 'yum',
        'Set-Cookie': 'id=1',
        'Content-Type': 'application/json',
        'Accept': 'text/html',
        'X-Api-Key': 'hidden'
    }
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.88μs -> 2.67μs (45.1% faster)

def test_value_conversion_to_string():
    # All values should be converted to strings
    headers = {'Content-Length': 123, 'Accept': True}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.54μs -> 2.28μs (55.5% faster)

# --------------------------
# EDGE TEST CASES
# --------------------------

def test_sensitive_header_within_word_not_removed():
    # Only headers that start with the sensitive prefix should be removed, not those containing it
    headers = {'My-Authorization-Header': 'public', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.33μs -> 2.06μs (61.8% faster)

def test_sensitive_header_with_spaces():
    # Headers with spaces should not be matched as sensitive
    headers = {' Authorization ': 'secret', 'Content-Type': 'application/json'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.24μs -> 1.94μs (67.1% faster)

def test_sensitive_header_with_mixed_case_and_prefix():
    # Should match prefix regardless of case
    headers = {'x-AUTH-token': 'abc', 'Accept': 'yes'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 2.56μs -> 1.88μs (35.9% faster)

def test_sensitive_header_as_substring_only():
    # Should not remove headers where sensitive word is only a substring, not prefix
    headers = {'my-cookie-jar': 'open', 'Content-Type': 'text/html'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.22μs -> 2.06μs (56.5% faster)

def test_all_sensitive_headers_returns_none():
    # If all headers are sensitive, should return None
    headers = {
        'Authorization': 'secret',
        'Cookie': 'yum',
        'Set-Cookie': 'id=1',
        'X-Api-Key': 'hidden',
        'X-Auth': 'token'
    }
    codeflash_output = filter_sensitive_headers(headers) # 3.46μs -> 2.07μs (66.9% faster)

def test_ordered_dict_input():
    # Should accept OrderedDict and preserve keys
    headers = OrderedDict([('Accept', 'a'), ('Authorization', 'b'), ('X-Api-Key', 'c')])
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.30μs -> 2.48μs (33.0% faster)

def test_header_key_is_empty_string():
    # Empty string as a key should not be filtered
    headers = {'': 'empty', 'Authorization': 'secret'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.74μs -> 2.60μs (43.5% faster)

def test_header_value_is_none():
    # None values should be converted to string 'None'
    headers = {'Accept': None, 'Authorization': 'secret'}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 3.20μs -> 2.23μs (43.1% faster)

# --------------------------
# LARGE SCALE TEST CASES
# --------------------------

def test_large_number_of_non_sensitive_headers():
    # Should handle a large number of headers efficiently
    headers = {f'Header-{i}': f'value-{i}' for i in range(500)}
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 250μs -> 81.6μs (207% faster)

def test_large_number_of_sensitive_headers():
    # Should filter out all sensitive headers in a large set
    headers = {f'Authorization-{i}': 'secret' for i in range(500)}
    # None of these should be filtered out, as they do not match exactly or by prefix (except for those with prefix)
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 147μs -> 50.3μs (192% faster)

def test_mixed_large_sensitive_and_non_sensitive_headers():
    # Mix of sensitive and non-sensitive headers
    headers = {f'Header-{i}': f'value-{i}' for i in range(500)}
    headers.update({f'Authorization-{i}': 'secret' for i in range(500)})
    headers.update({f'X-Api-Key-{i}': 'key' for i in range(200)})
    headers.update({'Accept': 'all'})
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 476μs -> 148μs (221% faster)
    # Only non-sensitive headers should remain
    expected = {f'Header-{i}': f'value-{i}' for i in range(500)}
    expected['Accept'] = 'all'

def test_large_headers_with_varied_cases():
    # Test case-insensitivity with large input
    headers = {f'X-API-KEY-{i}': 'key' for i in range(500)}
    headers.update({f'header-{i}': 'value' for i in range(500)})
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 459μs -> 130μs (251% faster)
    # All 'X-API-KEY-{i}' should be filtered out
    expected = {f'header-{i}': 'value' for i in range(500)}

def test_large_input_all_filtered_out():
    # All headers are sensitive, should return None
    headers = {f'X-Auth-{i}': 'token' for i in range(500)}
    codeflash_output = filter_sensitive_headers(headers) # 237μs -> 54.5μs (335% faster)

def test_large_input_all_non_sensitive_with_some_empty_keys():
    # Large input, some keys are empty strings
    headers = {f'Header-{i}': f'value-{i}' for i in range(499)}
    headers[''] = 'empty'
    codeflash_output = filter_sensitive_headers(headers); result = codeflash_output # 243μs -> 78.3μs (211% faster)
    expected = {f'Header-{i}': f'value-{i}' for i in range(499)}
    expected[''] = 'empty'

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from deepgram.extensions.core.telemetry_events import filter_sensitive_headers

def test_filter_sensitive_headers():
    filter_sensitive_headers({'': ''})

def test_filter_sensitive_headers_2():
    filter_sensitive_headers({})
```
🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_d0k9fm5y/tmpdeqw_shz/test_concolic_coverage.py::test_filter_sensitive_headers | 2.55μs | 1.76μs | 45.0%✅ |
| codeflash_concolic_d0k9fm5y/tmpdeqw_shz/test_concolic_coverage.py::test_filter_sensitive_headers_2 | 365ns | 378ns | -3.44%⚠️ |

To edit these changes, run git checkout codeflash/optimize-filter_sensitive_headers-mh2vh80f, make your changes, and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 23, 2025 03:38
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Oct 23, 2025