
Conversation

alvin-r (Contributor) commented Mar 19, 2025

PR Type

Enhancement, Tests


Description

  • Added codeflash_trace decorator for function instrumentation.

  • Introduced benchmark tracing with SQLite storage and replay tests.

  • Integrated a pytest plugin and CLI/config support for benchmarking.

  • Enhanced the optimizer pipeline to use benchmark and replay timing data.
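
A minimal usage sketch of the decorator described above. This is a hypothetical example function, and it assumes codeflash_trace is importable from codeflash.benchmarking.codeflash_trace, matching the module layout referenced in the tests quoted later in this thread:

```python
# Hypothetical sketch: decorating a function so that, during a pytest
# benchmark run, each call's pickled arguments and timings are recorded
# into the SQLite trace file used to generate replay tests.
# Assumption: codeflash_trace.py lives under codeflash.benchmarking.
from codeflash.benchmarking.codeflash_trace import codeflash_trace


@codeflash_trace
def sorter(arr: list[int]) -> list[int]:
    # Plain bubble sort; the decorator only captures data when the
    # benchmarking plugin enables CODEFLASH_BENCHMARKING.
    for i in range(len(arr)):
        for j in range(len(arr) - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr
```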


Changes walkthrough 📝

Relevant files

Formatting (1 file)
  • bubble_sort.py: Normalize return statement in bubble sort function (+1/-1)

Enhancement (16 files)
  • bubble_sort_codeflash_trace.py: Added traced bubble sort functions and class methods (+46/-0)
  • bubble_sort_multithread.py: Introduced multithreaded sorter using traced bubble sort (+23/-0)
  • process_and_bubble_sort.py: Added computation and pairwise products with sorter call (+28/-0)
  • process_and_bubble_sort_codeflash_trace.py: Added traced process and sort function variant (+28/-0)
  • benchmark_database_utils.py: New module for managing benchmark trace data via SQLite (+296/-0)
  • codeflash_trace.py: Introduced codeflash_trace decorator implementation (+80/-0)
  • instrument_codeflash_trace.py: Added transformer to instrument functions with codeflash_trace (+109/-0)
  • plugin.py: Added a pytest plugin to integrate Codeflash benchmark tracing (+62/-0)
  • pytest_new_process_trace_benchmarks.py: New script to run benchmark tests and record trace data (+33/-0)
  • replay_test.py: Added replay test generation from captured benchmark trace data (+282/-0)
  • trace_benchmarks.py: Added function to trigger benchmark tracing via subprocess (+42/-0)
  • utils.py: Added utilities to process and display benchmark timing data (+123/-0)
  • functions_to_optimize.py: Enhanced static method detection for functions to optimize (+10/-4)
  • explanation.py: Updated explanation to include benchmark details information (+20/-8)
  • function_optimizer.py: Integrated benchmark timing and replay test data into optimization (+51/-6)
  • optimizer.py: Enhanced optimizer to run benchmarks and generate replay tests (+69/-6)

Tests (9 files)
  • test_benchmark_bubble_sort.py: Added benchmark tests for traced bubble sort functionality (+13/-0)
  • test_process_and_sort.py: Added benchmark tests for process and sort traced functions (+8/-0)
  • test_multithread_sort.py: Added multithread benchmark test for sorter function (+4/-0)
  • test_benchmark_bubble_sort.py: Added additional tests for bubble sort trace decorator (+20/-0)
  • test_process_and_sort.py: Added replay and benchmark tests for process and sort functions (+8/-0)
  • test_codeflash_trace_decorator.py: Added tests for codeflash_trace decorator functionality (+15/-0)
  • test_instrument_codeflash_trace.py: Added tests for AST-based instrumentation of codeflash_trace decorator (+246/-0)
  • test_trace_benchmarks.py: Added tests to validate benchmark trace and replay test generation (+212/-0)
  • test_unit_test_discovery.py: Updated unit test discovery to handle benchmark test exclusion (+14/-1)

Configuration changes (2 files)
  • cli.py: Extended CLI arguments to support benchmark options (+9/-1)
  • config_parser.py: Integrated benchmarks-root into configuration parser (+2/-2)

Additional files (10 files)
  • test_bubble_sort.py (+18/-18)
  • test_bubble_sort_parametrized.py (+18/-18)
  • __init__.py [link]
  • __init__.py [link]
  • PrComment.py (+6/-2)
  • models.py (+44/-0)
  • create_pr.py (+2/-0)
  • test_results.py (+29/-0)
  • test_runner.py (+2/-0)
  • verification_utils.py (+2/-1)

alvin-r marked this pull request as draft March 19, 2025 23:04

    github-actions bot commented Mar 19, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 77f43a5)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Benchmark Integration

    The PR adds several new parameters and processing steps for benchmarking (e.g. function_benchmark_timings, total_benchmark_timings, replay performance gain, and benchmark details). It is recommended to verify that the behavior is correct for both benchmark-enabled and disabled modes, and that fallback defaults are handled gracefully.

```python
            best_optimization.candidate.explanation, title="Best Candidate Explanation", border_style="blue"
        )
    )
    processed_benchmark_info = None
    if self.args.benchmark:
        processed_benchmark_info = process_benchmark_data(
            replay_performance_gain=best_optimization.replay_performance_gain,
            fto_benchmark_timings=self.function_benchmark_timings,
            total_benchmark_timings=self.total_benchmark_timings
        )
    explanation = Explanation(
```
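As a reviewer aid, here is one plausible way a per-function replay speedup could be projected onto whole-benchmark timings. This is a hypothetical illustration only; the actual process_benchmark_data implementation is not shown in this review.

```python
# Hypothetical illustration only: project a per-function replay speedup
# onto a whole-benchmark timing (not the real process_benchmark_data).
def project_benchmark_speedup(total_time_ns: int, function_time_ns: int,
                              replay_performance_gain: float) -> float:
    # Assume only the optimized function's share of the benchmark shrinks,
    # by the speedup factor measured from the replay tests.
    optimized_function_time = function_time_ns / (1.0 + replay_performance_gain)
    projected_total = total_time_ns - function_time_ns + optimized_function_time
    return total_time_ns / projected_total  # overall benchmark speedup factor


# A function taking 40% of a 1 ms benchmark, made 2x faster (gain = 1.0),
# projects to a 1.25x overall benchmark speedup.
print(project_benchmark_speedup(1_000_000, 400_000, 1.0))  # 1.25
```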
    Trace Overhead & Env Handling

    The new tracing decorator utilizes time.thread_time_ns() and manipulates environment variables to control benchmarking behavior. It is important to review that the measurement overhead is minimal and that the environment-based switching does not introduce unintended side effects in non-benchmark scenarios.

```python
import functools
import os
import pickle
import time
from typing import Callable


class CodeflashTrace:
    """A class that provides both a decorator for tracing function calls
    and a context manager for managing the tracing data lifecycle.
    """

    def __init__(self) -> None:
        self.function_calls_data = []

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Cleanup is optional here
        pass

    def __call__(self, func: Callable) -> Callable:
        """Use as a decorator to trace function execution.

        Args:
            func: The function to be decorated

        Returns:
            The wrapped function

        """
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Measure execution time
            start_time = time.thread_time_ns()
            result = func(*args, **kwargs)
            end_time = time.thread_time_ns()
            # Calculate execution time
            execution_time = end_time - start_time
            # Measure overhead
            overhead_start_time = time.thread_time_ns()
            try:
                # Check if currently in pytest benchmark fixture
                if os.environ.get("CODEFLASH_BENCHMARKING", "False") == "False":
                    return result
                # Pickle the arguments
                pickled_args = pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL)
                pickled_kwargs = pickle.dumps(kwargs, protocol=pickle.HIGHEST_PROTOCOL)
                # Get benchmark info from environment
                benchmark_function_name = os.environ.get("CODEFLASH_BENCHMARK_FUNCTION_NAME", "")
                benchmark_file_name = os.environ.get("CODEFLASH_BENCHMARK_FILE_NAME", "")
                benchmark_line_number = os.environ.get("CODEFLASH_BENCHMARK_LINE_NUMBER", "")
                # Get class name
                class_name = ""
                qualname = func.__qualname__
                if "." in qualname:
                    class_name = qualname.split(".")[0]
                # Calculate overhead time
                overhead_end_time = time.thread_time_ns()
                overhead_time = overhead_end_time - overhead_start_time
                self.function_calls_data.append(
                    (func.__name__, class_name, func.__module__, func.__code__.co_filename,
                     benchmark_function_name, benchmark_file_name, benchmark_line_number,
                     execution_time, overhead_time, pickled_args, pickled_kwargs)
                )
                print("appended")
            except Exception as e:
                print(f"Error in codeflash_trace: {e}")
            return result

        return wrapper


# Create a singleton instance
codeflash_trace = CodeflashTrace()
```
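For context on the environment-based switching, here is a hypothetical sketch of how a benchmark wrapper could toggle the variables the decorator reads; the actual plugin.py implementation is not shown in this review.

```python
# Hypothetical sketch (not the actual plugin.py): toggle the environment
# variables that CodeflashTrace checks, so tracing is active only while
# a benchmark body runs.
import os


def run_traced_benchmark(benchmark_fn, benchmark_file: str, line_number: int):
    os.environ["CODEFLASH_BENCHMARKING"] = "True"
    os.environ["CODEFLASH_BENCHMARK_FUNCTION_NAME"] = benchmark_fn.__name__
    os.environ["CODEFLASH_BENCHMARK_FILE_NAME"] = benchmark_file
    os.environ["CODEFLASH_BENCHMARK_LINE_NUMBER"] = str(line_number)
    try:
        return benchmark_fn()
    finally:
        # Switch tracing back off so non-benchmark calls return early
        # in the decorator and incur minimal overhead.
        os.environ["CODEFLASH_BENCHMARKING"] = "False"
```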
    Test Robustness

    New tests are introduced for trace benchmark functionality using SQLite and replay tests. Please validate that the expected record counts and file output behavior remain robust under various conditions and edge cases.

```python
import sqlite3
import shutil
from pathlib import Path

from codeflash.benchmarking.benchmark_database_utils import BenchmarkDatabaseUtils
from codeflash.benchmarking.trace_benchmarks import trace_benchmarks_pytest
from codeflash.benchmarking.replay_test import generate_replay_test
from codeflash.benchmarking.utils import print_benchmark_table, validate_and_format_benchmark_table


def test_trace_benchmarks():
    # Test the trace_benchmarks function
    project_root = Path(__file__).parent.parent / "code_to_optimize"
    benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_test"
    tests_root = project_root / "tests" / "test_trace_benchmarks"
    tests_root.mkdir(parents=False, exist_ok=False)
    output_file = (tests_root / Path("test_trace_benchmarks.trace")).resolve()
    trace_benchmarks_pytest(benchmarks_root, tests_root, project_root, output_file)
    assert output_file.exists()
    try:
        # check contents of trace file
        # connect to database
        conn = sqlite3.connect(output_file.as_posix())
        cursor = conn.cursor()

        # Get all records
        cursor.execute(
            "SELECT function_name, class_name, module_name, file_name, benchmark_function_name, "
            "benchmark_file_name, benchmark_line_number FROM function_calls "
            "ORDER BY benchmark_file_name, benchmark_function_name, function_name")
        function_calls = cursor.fetchall()

        # Assert the length of function calls
        assert len(function_calls) == 7, f"Expected 7 function calls, but got {len(function_calls)}"

        bubble_sort_path = (project_root / "bubble_sort_codeflash_trace.py").as_posix()
        process_and_bubble_sort_path = (project_root / "process_and_bubble_sort_codeflash_trace.py").as_posix()
        # Expected function calls
        expected_calls = [
            ("__init__", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 20),
            ("sort_class", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 18),
            ("sort_static", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 19),
            ("sorter", "Sorter", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_class_sort", "test_benchmark_bubble_sort.py", 17),
            ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_sort", "test_benchmark_bubble_sort.py", 7),
            ("compute_and_sort", "", "code_to_optimize.process_and_bubble_sort_codeflash_trace",
             f"{process_and_bubble_sort_path}", "test_compute_and_sort", "test_process_and_sort.py", 4),
            ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_no_func", "test_process_and_sort.py", 8),
        ]
        for idx, (actual, expected) in enumerate(zip(function_calls, expected_calls)):
            assert actual[0] == expected[0], f"Mismatch at index {idx} for function_name"
            assert actual[1] == expected[1], f"Mismatch at index {idx} for class_name"
            assert actual[2] == expected[2], f"Mismatch at index {idx} for module_name"
            assert Path(actual[3]).name == Path(expected[3]).name, f"Mismatch at index {idx} for file_name"
            assert actual[4] == expected[4], f"Mismatch at index {idx} for benchmark_function_name"
            assert actual[5] == expected[5], f"Mismatch at index {idx} for benchmark_file_name"
            assert actual[6] == expected[6], f"Mismatch at index {idx} for benchmark_line_number"

        # Close connection
        conn.close()

        generate_replay_test(output_file, tests_root)
        test_class_sort_path = tests_root / Path("test_benchmark_bubble_sort_py_test_class_sort__replay_test_0.py")
        assert test_class_sort_path.exists()
        test_class_sort_code = f"""
import dill as pickle

from code_to_optimize.bubble_sort_codeflash_trace import \\
    Sorter as code_to_optimize_bubble_sort_codeflash_trace_Sorter
from codeflash.benchmarking.replay_test import get_next_arg_and_return

functions = ['sorter', 'sort_class', 'sort_static']
trace_file_path = r"{output_file.as_posix()}"

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sorter():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sorter", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        function_name = "sorter"

        if not args:
            raise ValueError("No arguments provided for the method.")
        if function_name == "__init__":
            ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter(*args[1:], **kwargs)
        else:
            instance = args[0] # self
            ret = instance.sorter(*args[1:], **kwargs)

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sort_class():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sort_class", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        if not args:
            raise ValueError("No arguments provided for the method.")
        ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter.sort_class(*args[1:], **kwargs)

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter_sort_static():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sort_static", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter.sort_static(*args, **kwargs)

def test_code_to_optimize_bubble_sort_codeflash_trace_Sorter___init__():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="__init__", file_name=r"{bubble_sort_path}", class_name="Sorter", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        function_name = "__init__"

        if not args:
            raise ValueError("No arguments provided for the method.")
        if function_name == "__init__":
            ret = code_to_optimize_bubble_sort_codeflash_trace_Sorter(*args[1:], **kwargs)
        else:
            instance = args[0] # self
            ret = instance(*args[1:], **kwargs)

"""
        assert test_class_sort_path.read_text("utf-8").strip() == test_class_sort_code.strip()

        test_sort_path = tests_root / Path("test_benchmark_bubble_sort_py_test_sort__replay_test_0.py")
        assert test_sort_path.exists()
        test_sort_code = f"""
import dill as pickle

from code_to_optimize.bubble_sort_codeflash_trace import \\
    sorter as code_to_optimize_bubble_sort_codeflash_trace_sorter
from codeflash.benchmarking.replay_test import get_next_arg_and_return

functions = ['sorter']
trace_file_path = r"{output_file}"

def test_code_to_optimize_bubble_sort_codeflash_trace_sorter():
    for args_pkl, kwargs_pkl in get_next_arg_and_return(trace_file=trace_file_path, function_name="sorter", file_name=r"{bubble_sort_path}", num_to_get=100):
        args = pickle.loads(args_pkl)
        kwargs = pickle.loads(kwargs_pkl)
        ret = code_to_optimize_bubble_sort_codeflash_trace_sorter(*args, **kwargs)

"""
        assert test_sort_path.read_text("utf-8").strip() == test_sort_code.strip()
    finally:
        # cleanup
        shutil.rmtree(tests_root)


def test_trace_multithreaded_benchmark() -> None:
    project_root = Path(__file__).parent.parent / "code_to_optimize"
    benchmarks_root = project_root / "tests" / "pytest" / "benchmarks_multithread"
    tests_root = project_root / "tests" / "test_trace_benchmarks"
    tests_root.mkdir(parents=False, exist_ok=False)
    output_file = (tests_root / Path("test_trace_benchmarks.trace")).resolve()
    trace_benchmarks_pytest(benchmarks_root, tests_root, project_root, output_file)
    assert output_file.exists()
    try:
        # check contents of trace file
        # connect to database
        conn = sqlite3.connect(output_file.as_posix())
        cursor = conn.cursor()

        # Get all records
        cursor.execute(
            "SELECT function_name, class_name, module_name, file_name, benchmark_function_name, "
            "benchmark_file_name, benchmark_line_number FROM function_calls "
            "ORDER BY benchmark_file_name, benchmark_function_name, function_name")
        function_calls = cursor.fetchall()

        # Assert the length of function calls
        assert len(function_calls) == 10, f"Expected 10 function calls, but got {len(function_calls)}"

        function_benchmark_timings = BenchmarkDatabaseUtils.get_function_benchmark_timings(output_file)
        total_benchmark_timings = BenchmarkDatabaseUtils.get_benchmark_timings(output_file)
        function_to_results = validate_and_format_benchmark_table(function_benchmark_timings, total_benchmark_timings)
        assert "code_to_optimize.bubble_sort_codeflash_trace.sorter" in function_to_results
        test_name, total_time, function_time, percent = function_to_results[
            "code_to_optimize.bubble_sort_codeflash_trace.sorter"][0]
        assert total_time > 0.0
        assert function_time > 0.0
        assert percent > 0.0

        bubble_sort_path = (project_root / "bubble_sort_codeflash_trace.py").as_posix()
        # Expected function calls
        expected_calls = [
            ("sorter", "", "code_to_optimize.bubble_sort_codeflash_trace", f"{bubble_sort_path}",
             "test_benchmark_sort", "test_multithread_sort.py", 4),
        ]
        for idx, (actual, expected) in enumerate(zip(function_calls, expected_calls)):
            assert actual[0] == expected[0], f"Mismatch at index {idx} for function_name"
            assert actual[1] == expected[1], f"Mismatch at index {idx} for class_name"
            assert actual[2] == expected[2], f"Mismatch at index {idx} for module_name"
            assert Path(actual[3]).name == Path(expected[3]).name, f"Mismatch at index {idx} for file_name"
            assert actual[4] == expected[4], f"Mismatch at index {idx} for benchmark_function_name"
            assert actual[5] == expected[5], f"Mismatch at index {idx} for benchmark_file_name"
            assert actual[6] == expected[6], f"Mismatch at index {idx} for benchmark_line_number"

        # Close connection
        conn.close()
    finally:
        # cleanup
        shutil.rmtree(tests_root)
```
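For reference when reviewing the expected record counts, here is a hedged sketch of the kind of benchmark test the trace tests above exercise. The actual test_benchmark_bubble_sort.py is not reproduced here; the test and function names come from the expected_calls tuples above, and the fixture usage follows standard pytest-benchmark conventions.

```python
# Hypothetical sketch of a benchmark test that produces the traced calls
# checked above; the real test_benchmark_bubble_sort.py is not shown here.
from code_to_optimize.bubble_sort_codeflash_trace import sorter


def test_sort(benchmark):
    # pytest-benchmark fixture: calls sorter repeatedly while the plugin
    # records each traced call into the SQLite trace file.
    benchmark(sorter, list(reversed(range(100))))
```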

    github-actions bot commented Mar 19, 2025

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Category: General
    Suggestion: Remove debugging prints

    Remove or disable debug print statements to clean production logs.

    codeflash/discovery/functions_to_optimize.py [361-366]

```diff
 elif any(
     isinstance(decorator, ast.Name) and decorator.id == "staticmethod"
     for decorator in body_node.decorator_list
 ):
     self.is_staticmethod = True
-    print(f"static method found: {self.function_name}")
```

    Suggestion importance [1-10]: 5

    Why: The suggestion cleanly removes an unnecessary debug print, which improves production log quality without impacting functionality.

    Impact: Low
    alvin-r had a problem deploying to external-trusted-contributors April 11, 2025 17:40 — with GitHub Actions Failure
    alvin-r had a problem deploying to external-trusted-contributors April 11, 2025 17:40 — with GitHub Actions Error
    alvin-r requested a review from misrasaurabh1 April 17, 2025 22:21
    alvin-r merged commit 711ee5e into main Apr 17, 2025
    17 checks passed

    Labels

    Review effort 5/5, workflow-modified (This PR modifies GitHub Actions workflows)

    2 participants