
Conversation

alvin-r (Contributor) commented Mar 19, 2025

PR Type

Enhancement, Tests


Description

  • Added codeflash_trace decorator for function instrumentation.

  • Introduced benchmark tracing with SQLite storage and replay tests.

  • Integrated a pytest plugin and CLI/config support for benchmarking.

  • Enhanced the optimizer pipeline to use benchmark and replay timing data.
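
A minimal usage sketch of the decorator described above. This is a hypothetical example function, and it assumes codeflash_trace is importable from codeflash.benchmarking.codeflash_trace, matching the module layout referenced in the tests quoted later in this thread:

```python
# Hypothetical sketch: decorating a function so that, during a pytest
# benchmark run, each call's pickled arguments and timings are recorded
# into the SQLite trace file used to generate replay tests.
# Assumption: codeflash_trace.py lives under codeflash.benchmarking.
from codeflash.benchmarking.codeflash_trace import codeflash_trace


@codeflash_trace
def sorter(arr: list[int]) -> list[int]:
    # Plain bubble sort; the decorator only captures data when the
    # benchmarking plugin enables CODEFLASH_BENCHMARKING.
    for i in range(len(arr)):
        for j in range(len(arr) - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr
```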


Changes walkthrough 📝

Relevant files

Formatting (1 file)
  • bubble_sort.py: Normalize return statement in bubble sort function (+1/-1)

Enhancement (16 files)
  • bubble_sort_codeflash_trace.py: Added traced bubble sort functions and class methods (+46/-0)
  • bubble_sort_multithread.py: Introduced multithreaded sorter using traced bubble sort (+23/-0)
  • process_and_bubble_sort.py: Added computation and pairwise products with sorter call (+28/-0)
  • process_and_bubble_sort_codeflash_trace.py: Added traced process and sort function variant (+28/-0)
  • benchmark_database_utils.py: New module for managing benchmark trace data via SQLite (+296/-0)
  • codeflash_trace.py: Introduced codeflash_trace decorator implementation (+80/-0)
  • instrument_codeflash_trace.py: Added transformer to instrument functions with codeflash_trace (+109/-0)
  • plugin.py: Added a pytest plugin to integrate Codeflash benchmark tracing (+62/-0)
  • pytest_new_process_trace_benchmarks.py: New script to run benchmark tests and record trace data (+33/-0)
  • replay_test.py: Added replay test generation from captured benchmark trace data (+282/-0)
  • trace_benchmarks.py: Added function to trigger benchmark tracing via subprocess (+42/-0)
  • utils.py: Added utilities to process and display benchmark timing data (+123/-0)
  • functions_to_optimize.py: Enhanced static method detection for functions to optimize (+10/-4)
  • explanation.py: Updated explanation to include benchmark details information (+20/-8)
  • function_optimizer.py: Integrated benchmark timing and replay test data into optimization (+51/-6)
  • optimizer.py: Enhanced optimizer to run benchmarks and generate replay tests (+69/-6)

Tests (9 files)
  • test_benchmark_bubble_sort.py: Added benchmark tests for traced bubble sort functionality (+13/-0)
  • test_process_and_sort.py: Added benchmark tests for process and sort traced functions (+8/-0)
  • test_multithread_sort.py: Added multithread benchmark test for sorter function (+4/-0)
  • test_benchmark_bubble_sort.py: Added additional tests for bubble sort trace decorator (+20/-0)
  • test_process_and_sort.py: Added replay and benchmark tests for process and sort functions (+8/-0)
  • test_codeflash_trace_decorator.py: Added tests for codeflash_trace decorator functionality (+15/-0)
  • test_instrument_codeflash_trace.py: Added tests for AST-based instrumentation of codeflash_trace decorator (+246/-0)
  • test_trace_benchmarks.py: Added tests to validate benchmark trace and replay test generation (+212/-0)
  • test_unit_test_discovery.py: Updated unit test discovery to handle benchmark test exclusion (+14/-1)

Configuration changes (2 files)
  • cli.py: Extended CLI arguments to support benchmark options (+9/-1)
  • config_parser.py: Integrated benchmarks-root into configuration parser (+2/-2)

Additional files (10 files)
  • test_bubble_sort.py (+18/-18)
  • test_bubble_sort_parametrized.py (+18/-18)
  • __init__.py [link]
  • __init__.py [link]
  • PrComment.py (+6/-2)
  • models.py (+44/-0)
  • create_pr.py (+2/-0)
  • test_results.py (+29/-0)
  • test_runner.py (+2/-0)
  • verification_utils.py (+2/-1)

alvin-r marked this pull request as draft March 19, 2025 23:04

    github-actions bot commented Mar 19, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 77f43a5)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Benchmark Integration

    The PR adds several new parameters and processing steps for benchmarking (e.g. function_benchmark_timings, total_benchmark_timings, replay performance gain, and benchmark details). It is recommended to verify that the behavior is correct for both benchmark-enabled and disabled modes, and that fallback defaults are handled gracefully.

```python
            best_optimization.candidate.explanation, title="Best Candidate Explanation", border_style="blue"
        )
    )
    processed_benchmark_info = None
    if self.args.benchmark:
        processed_benchmark_info = process_benchmark_data(
            replay_performance_gain=best_optimization.replay_performance_gain,
            fto_benchmark_timings=self.function_benchmark_timings,
            total_benchmark_timings=self.total_benchmark_timings
        )
    explanation = Explanation(
```
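As a reviewer aid, here is one plausible way a per-function replay speedup could be projected onto whole-benchmark timings. This is a hypothetical illustration only; the actual process_benchmark_data implementation is not shown in this review.

```python
# Hypothetical illustration only: project a per-function replay speedup
# onto a whole-benchmark timing (not the real process_benchmark_data).
def project_benchmark_speedup(total_time_ns: int, function_time_ns: int,
                              replay_performance_gain: float) -> float:
    # Assume only the optimized function's share of the benchmark shrinks,
    # by the speedup factor measured from the replay tests.
    optimized_function_time = function_time_ns / (1.0 + replay_performance_gain)
    projected_total = total_time_ns - function_time_ns + optimized_function_time
    return total_time_ns / projected_total  # overall benchmark speedup factor


# A function taking 40% of a 1 ms benchmark, made 2x faster (gain = 1.0),
# projects to a 1.25x overall benchmark speedup.
print(project_benchmark_speedup(1_000_000, 400_000, 1.0))  # 1.25
```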
    Trace Overhead & Env Handling

    The new tracing decorator utilizes time.thread_time_ns() and manipulates environment variables to control benchmarking behavior. It is important to review that the measurement overhead is minimal and that the environment-based switching does not introduce unintended side effects in non-benchmark scenarios.

```python
import functools
import os
import pickle
import time
from typing import Callable


class CodeflashTrace:
    """A class that provides both a decorator for tracing function calls
    and a context manager for managing the tracing data lifecycle.
    """

    def __init__(self) -> None:
        self.function_calls_data = []

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Cleanup is optional here
        pass

    def __call__(self, func: Callable) -> Callable:
        """Use as a decorator to trace function execution.

        Args:
            func: The function to be decorated

        Returns:
            The wrapped function

        """
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Measure execution time
            start_time = time.thread_time_ns()
            result = func(*args, **kwargs)
            end_time = time.thread_time_ns()
            # Calculate execution time
            execution_time = end_time - start_time
            # Measure overhead
            overhead_start_time = time.thread_time_ns()
            try:
                # Check if currently in pytest benchmark fixture
                if os.environ.get("CODEFLASH_BENCHMARKING", "False") == "False":
                    return result
                # Pickle the arguments
                pickled_args = pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL)
                pickled_kwargs = pickle.dumps(kwargs, protocol=pickle.HIGHEST_PROTOCOL)
                # Get benchmark info from environment
                benchmark_function_name = os.environ.get("CODEFLASH_BENCHMARK_FUNCTION_NAME", "")
                benchmark_file_name = os.environ.get("CODEFLASH_BENCHMARK_FILE_NAME", "")
                benchmark_line_number = os.environ.get("CODEFLASH_BENCHMARK_LINE_NUMBER", "")
                # Get class name
                class_name = ""
                qualname = func.__qualname__
                if "." in qualname:
                    class_name = qualname.split(".")[0]
                # Calculate overhead time
                overhead_end_time = time.thread_time_ns()
                overhead_time = overhead_end_time - overhead_start_time
                self.function_calls_data.append(
                    (func.__name__, class_name, func.__module__, func.__code__.co_filename,
                     benchmark_function_name, benchmark_file_name, benchmark_line_number,
                     execution_time, overhead_time, pickled_args, pickled_kwargs)
                )
                print("appended")
            except Exception as e:
                print(f"Error in codeflash_trace: {e}")
            return result

        return wrapper


# Create a singleton instance
codeflash_trace = CodeflashTrace()
```
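For context on the environment-based switching, here is a hypothetical sketch of how a benchmark wrapper could toggle the variables the decorator reads; the actual plugin.py implementation is not shown in this review.

```python
# Hypothetical sketch (not the actual plugin.py): toggle the environment
# variables that CodeflashTrace checks, so tracing is active only while
# a benchmark body runs.
import os


def run_traced_benchmark(benchmark_fn, benchmark_file: str, line_number: int):
    os.environ["CODEFLASH_BENCHMARKING"] = "True"
    os.environ["CODEFLASH_BENCHMARK_FUNCTION_NAME"] = benchmark_fn.__name__
    os.environ["CODEFLASH_BENCHMARK_FILE_NAME"] = benchmark_file
    os.environ["CODEFLASH_BENCHMARK_LINE_NUMBER"] = str(line_number)
    try:
        return benchmark_fn()
    finally:
        # Switch tracing back off so non-benchmark calls return early
        # in the decorator and incur minimal overhead.
        os.environ["CODEFLASH_BENCHMARKING"] = "False"
```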
    Test Robustness

    New tests are introduced for trace benchmark functionality using SQLite and replay tests. Please validate that the expected record counts and file output behavior remain robust under various conditions and edge cases.

```python
import sqlite3
import shutil
from pathlib import Path

from codeflash.benchmarking.benchmark_database_utils import BenchmarkDatabaseUtils
from codeflash.benchmarking.trace_benchmarks import trace_benchmarks_pytest
from codeflash.benchmarking.replay_test import generate_replay_test
from codeflash.benchmarking.utils import print_benchmark_table, validate_and_format_benchmark_table


def test_trace_benchmarks():
    # Test the trace_benchmarks function
    project_root = Path(__file__).parent.parent / "code_to_optimize"
    benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_test"
    tests_root = project_root / "tests" / "test_trace_benchmarks"
    tests_root.mkdir(parents=False, exist_ok=False)
    output_file = (tests_root / Path("test_trace_benchmarks.trace")).resolve()
    trace_benchmarks_pytest(benchmarks_root, tests_root, project_root, output_file)
    assert output_file.exists()
    try:
        # check contents of trace file
        # connect to database
        conn = sqlite3.connect(output_file.as_posix())
        cursor = conn.cursor()

        # Get all records
        cursor.execute(
            "SELECT function_name, class_name, module_name, file_name, benchmark_function_name, "
            "benchmark_file_name, benchmark_line_number FROM function_calls "
            "ORDER BY benchmark_file_name, benchmark_function_name, function_name")
        function_calls = cursor.fetchall()

        # Assert the length of function calls
        assert len(function_calls) == 7, f"Expected 7 function calls, but got {len(function_calls)}"

        bubble_sort_path = (project_root / "bubble_sort_codeflash_trace.py").as_posix()
        process_and_bubble_sort_path = (project_root / "process_and_bubble_sort_codeflash_trace.py").as_posix()
        # Expected function calls
        expected_calls = [
            ("__init__", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 20),
            ("sort_class", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 18),
            ("sort_static", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 19),
            ("sorter", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 17),
            ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_sort", "test_benchmark_bubble_sort.py", 7),
            ("compute_and_sort", "", "code_to_optimize.process_and_bubble_sort_codeflash_trace",
             f"{process_and_bubble_sort_path}", "test_compute_and_sort", "test_process_and_sort.py", 4),
            ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_no_func", "test_process_and_sort.py", 8),
        ]
        for idx, (actual, expected) in enumerate(zip(function_calls, expected_calls)):
            assert actual[0] == expected[0], f"Mismatch at index {idx} for function_name"
            assert actual[1] == expected[1], f"Mismatch at index {idx} for class_name"
            assert actual[2] == expected[2], f"Mismatch at index {idx} for module_name"
            assert Path(actual[3]).name == Path(expected[3]).name, f"Mismatch at index {idx} for file_name"
            assert actual[4] == expected[4], f"Mismatch at index {idx} for benchmark_function_name"
            assert actual[5] == expected[5], f"Mismatch at index {idx} for benchmark_file_name"
            assert actual[6] == expected[6], f"Mismatch at index {idx} for benchmark_line_number"

        # Close connection
        conn.close()

        generate_replay_test(output_file, tests_root)
        test_class_sort_path = tests_root / Path("test_benchmark_bubble_sort_py_test_class_sort__replay_test_0.py")
        assert test_class_sort_path.exists()
        test_class_sort_code = f"""
import dill as pickle

from code_to_optimize.bubble_sort_codeflash_trace import \\
    Sorter as code_to_optimize_bubble_sort_codeflash_trace_Sorter
from codeflash.benchmarking.replay_test import get_next_arg_and_return

functions = ['sorter', 'sort_class', 'sort_static']
trace_file_path = r"{output_file.as_posix()}"

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sorter():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sorter", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        function_name = "sorter"

        if not args:
            raise ValueError("No arguments provided for the method.")
        if function_name == "__init__":
            ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter(*args[1:], **kwargs)
        else:
            instance = args[0] # self
            ret = instance.sorter(*args[1:], **kwargs)

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sort_class():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sort_class", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        if not args:
            raise ValueError("No arguments provided for the method.")
        ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter.sort_class(*args[1:], **kwargs)

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sort_static():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sort_static", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter.sort_static(*args, **kwargs)

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter___init__():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="__init__", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        function_name = "__init__"

        if not args:
            raise ValueError("No arguments provided for the method.")
        if function_name == "__init__":
            ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter(*args[1:], **kwargs)
        else:
            instance = args[0] # self
            ret = instance(*args[1:], **kwargs)

"""
        assert test_class_sort_path.read_text("utf-8").strip() == test_class_sort_code.strip()

        test_sort_path = tests_root / Path("test_benchmark_bubble_sort_py_test_sort__replay_test_0.py")
        assert test_sort_path.exists()
        test_sort_code = f"""
import dill as pickle

from code_to_optimize.bubble_sort_codeflash_trace import \\
    sorter as code_to_optimize_bubble_sort_codeflash_trace_sorter
from codeflash.benchmarking.replay_test import get_next_arg_and_return

functions = ['sorter']
trace_file_path = r"{output_file}"

def test_code_to_optimize_bubble_sort_codeflash_trace_sorter():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sorter", file_name=r"{bubble_sort_path}", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        ret = code_to_optimize_bubble_sort_codeflash_trace_sorter(*args, **kwargs)

"""
        assert test_sort_path.read_text("utf-8").strip() == test_sort_code.strip()
    finally:
        # cleanup
        shutil.rmtree(tests_root)


def test_trace_multithreaded_benchmark() -> None:
    project_root = Path(__file__).parent.parent / "code_to_optimize"
    benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_multithread"
    tests_root = project_root / "tests" / "test_trace_benchmarks"
    tests_root.mkdir(parents=False, exist_ok=False)
    output_file = (tests_root / Path("test_trace_benchmarks.trace")).resolve()
    trace_benchmarks_pytest(benchmarks_root, tests_root, project_root, output_file)
    assert output_file.exists()
    try:
        # check contents of trace file
        # connect to database
        conn = sqlite3.connect(output_file.as_posix())
        cursor = conn.cursor()

        # Get all records
        cursor.execute(
            "SELECT function_name, class_name, module_name, file_name, benchmark_function_name, "
            "benchmark_file_name, benchmark_line_number FROM function_calls "
            "ORDER BY benchmark_file_name, benchmark_function_name, function_name")
        function_calls = cursor.fetchall()

        # Assert the length of function calls
        assert len(function_calls) == 10, f"Expected 10 function calls, but got {len(function_calls)}"

        function_benchmark_timings = BenchmarkDatabaseUtils.get_function_benchmark_timings(output_file)
        total_benchmark_timings = BenchmarkDatabaseUtils.get_benchmark_timings(output_file)
        function_to_results = validate_and_format_benchmark_table(function_benchmark_timings, total_benchmark_timings)
        assert "code_to_optimize.bubble_sort_codeflash_trace.sorter" in function_to_results
        test_name, total_time, function_time, percent = function_to_results[
            "code_to_optimize.bubble_sort_codeflash_trace.sorter"][0]
        assert total_time > 0.0
        assert function_time > 0.0
        assert percent > 0.0

        bubble_sort_path = (project_root / "bubble_sort_codeflash_trace.py").as_posix()
        # Expected function calls
        expected_calls = [
            ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_benchmark_sort", "test_multithread_sort.py", 4),
        ]
        for idx, (actual, expected) in enumerate(zip(function_calls, expected_calls)):
            assert actual[0] == expected[0], f"Mismatch at index {idx} for function_name"
            assert actual[1] == expected[1], f"Mismatch at index {idx} for class_name"
            assert actual[2] == expected[2], f"Mismatch at index {idx} for module_name"
            assert Path(actual[3]).name == Path(expected[3]).name, f"Mismatch at index {idx} for file_name"
            assert actual[4] == expected[4], f"Mismatch at index {idx} for benchmark_function_name"
            assert actual[5] == expected[5], f"Mismatch at index {idx} for benchmark_file_name"
            assert actual[6] == expected[6], f"Mismatch at index {idx} for benchmark_line_number"

        # Close connection
        conn.close()
    finally:
        # cleanup
        shutil.rmtree(tests_root)
```
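For reference when reviewing the expected record counts, here is a hedged sketch of the kind of benchmark test the trace tests above exercise. The actual test_benchmark_bubble_sort.py is not reproduced here; the test and function names come from the expected_calls tuples above, and the fixture usage follows standard pytest-benchmark conventions.

```python
# Hypothetical sketch of a benchmark test that produces the traced calls
# checked above; the real test_benchmark_bubble_sort.py is not shown here.
from code_to_optimize.bubble_sort_codeflash_trace import sorter


def test_sort(benchmark):
    # pytest-benchmark fixture: calls sorter repeatedly while the plugin
    # records each traced call into the SQLite trace file.
    benchmark(sorter, list(reversed(range(100))))
```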

    github-actions bot commented Mar 19, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: General
    Suggestion: Remove debugging prints

    Remove or disable debug print statements to clean production logs.

    codeflash/discovery/functions_to_optimize.py [361-366]

```diff
 elif any(
     isinstance(decorator, ast.Name) and decorator.id == "staticmethod"
     for decorator in body_node.decorator_list
 ):
     self.is_staticmethod = True
-    print(f"static method found: {self.function_name}")
```

    Suggestion importance [1-10]: 5

    Why: The suggestion cleanly removes an unnecessary debug print, which improves production log quality without impacting functionality.

    Impact: Low
    alvin-r had a problem deploying to external-trusted-contributors April 11, 2025 17:40 — with GitHub Actions Failure
    alvin-r had a problem deploying to external-trusted-contributors April 11, 2025 17:40 — with GitHub Actions Error
    alvin-r requested a review from misrasaurabh1 April 17, 2025 22:21
    alvin-r merged commit 711ee5e into main Apr 17, 2025
    17 checks passed

    Labels

    Review effort 5/5, workflow-modified (This PR modifies GitHub Actions workflows)

    2 participants