
Conversation

@arthrod (Owner) commented Oct 8, 2025

User description

This commit introduces a FastAPI application to provide a web interface for the document redlining functionality.

The key features of this change are:

  • A FastAPI application with two endpoints:
    • /redline/single: Accepts two .docx files and returns a single redlined document.
    • /redline/double: Accepts two .docx files and returns a zip archive with two redlined documents (original vs. modified and modified vs. original).
  • API token authentication to secure the endpoints.
  • A 10MB file size limit to prevent abuse.
  • A Dockerfile and docker-compose.yml for easy containerization and deployment.
  • An updated build script (build_differ.py) to correctly build and package the C# binaries for multiple platforms.
  • Comprehensive tests for the new API endpoints, including tests for authentication and file size limits.
  • An updated pyproject.toml to manage dependencies and configure the test environment.
  • A .gitignore file to exclude build artifacts and environment files.
  • A .env.example file to provide a template for setting the API token.
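For convenience, here is a minimal client sketch for calling the new endpoints. This is illustrative only: it assumes the service listens on localhost:8000 and that the token travels in an X-API-Token header, matching the header discussed in the reviews below.

```python
# Hypothetical client for the /redline/single endpoint described above.
# Assumptions: service at http://localhost:8000, token in the X-API-Token header.
import httpx

headers = {"X-API-Token": "your_secret_api_token_here"}

with open("original.docx", "rb") as orig, open("modified.docx", "rb") as mod:
    files = {
        "original": ("original.docx", orig),
        "modified": ("modified.docx", mod),
    }
    data = {"author_tag": "Reviewer"}
    response = httpx.post(
        "http://localhost:8000/redline/single",
        files=files,
        data=data,
        headers=headers,
        timeout=60.0,
    )

response.raise_for_status()
with open("redline.docx", "wb") as out:
    out.write(response.content)  # the single redlined .docx returned by the API
```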

PR Type

Enhancement


Description

  • Add FastAPI web service for document redlining
  • Implement API token authentication and file size limits
  • Create Docker containerization with multi-platform builds
  • Add comprehensive test coverage for API endpoints

Diagram Walkthrough

flowchart LR
    A["FastAPI Service"] --> B["Authentication"]
    A --> C["File Upload"]
    C --> D["Redline Processing"]
    D --> E["Single/Double Output"]
    F["Docker Container"] --> A
    G["Build Script"] --> H["Multi-platform Binaries"]
    H --> F

File Walkthrough

Relevant files

Enhancement (2 files)
  • api.py: FastAPI service with redlining endpoints (+126/-0)
  • build_differ.py: Enhanced multi-platform build script (+54/-50)
Tests (1 file)
  • test_api.py: Comprehensive API endpoint tests (+86/-0)
Configuration changes (3 files)
  • Dockerfile: Docker containerization setup (+38/-0)
  • docker-compose.yml: Docker Compose configuration (+8/-0)
  • .env.example: Environment variables template (+3/-0)
Dependencies (2 files)
  • dotnet-install.sh: .NET SDK installation script (+1888/-0)
  • pyproject.toml: Updated dependencies and API environment (+18/-2)
Additional files (2 files)
  • modified.docx [link]
  • original.docx [link]

@gemini-code-assist bot commented

Summary of Changes

Hello @arthrod, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement by integrating a FastAPI-based web interface for the existing document redlining functionality. This change allows users to interact with the redlining service via a RESTful API, making it more accessible and easier to integrate into other systems. The addition of Docker support simplifies deployment, while robust authentication and file size limits ensure the service is secure and stable. The underlying build process for the C# redlining engine has also been refined to support a wider range of platforms more reliably.

Highlights

  • FastAPI Web Interface: Introduced a new FastAPI application to provide a web interface for the document redlining functionality, offering two main endpoints: /redline/single for a single redlined document and /redline/double for a zip archive containing two redlined documents (original vs. modified and modified vs. original).
  • API Security and File Limits: Implemented API token authentication to secure the new endpoints and added a 10MB file size limit to prevent abuse, ensuring controlled and safe usage of the redlining service.
  • Containerization and Deployment: Added a Dockerfile and docker-compose.yml to facilitate easy containerization and deployment of the FastAPI application, streamlining the setup process for development and production environments.
  • Improved C# Binary Build Process: Updated the build_differ.py script to correctly build and package C# binaries for multiple platforms (Linux x64/ARM64, Windows x64/ARM64, macOS x64/ARM64) in a more robust and maintainable way, including error handling for build failures.
  • Dependency Management and Testing: Updated pyproject.toml to manage new FastAPI-related dependencies and configured a dedicated api environment. Comprehensive tests for the new API endpoints, covering authentication and file size limits, were also added.
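To make the authentication highlight concrete, here is a minimal sketch of header-token verification. It is an illustration assembled from the review excerpts further down (which show an X-Api-Token header, an API_TOKEN environment variable, and secrets.compare_digest), not the PR's literal code.

```python
# Sketch of token verification as a FastAPI dependency (assumptions noted above).
import os
import secrets
from typing import Annotated

from fastapi import Header, HTTPException, status


async def verify_token(x_api_token: Annotated[str, Header()]) -> None:
    expected = os.getenv("API_TOKEN")
    if not expected:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail="API token not configured on server",
        )
    # compare_digest performs a constant-time comparison, avoiding timing leaks
    if not secrets.compare_digest(x_api_token, expected):
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API Token",
        )
```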

@qodo-merge-pro bot commented Oct 8, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unpinned remote script execution

Description: The Dockerfile downloads and executes dotnet-install.sh over HTTPS without pinning or
signature verification, allowing a potential supply-chain attack if the remote script is
compromised.
Dockerfile [8-13]

Referred Code
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
    chmod +x ./dotnet-install.sh && \
    ./dotnet-install.sh --version 8.0.100 && \
    rm dotnet-install.sh
Unbounded request body memory usage

Description: The API reads entire uploaded files into memory before enforcing the 10MB limit, enabling
memory consumption spikes and potential DoS via large multipart bodies since
FastAPI/Starlette default limits are not set.
api.py [66-73]

Referred Code
original_content = await original.read()
modified_content = await modified.read()

if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
    raise HTTPException(
        status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
        detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
    )
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
@arthrod (Owner, Author) commented Oct 8, 2025

@coderabbitai review

@coderabbitai bot commented Oct 8, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added token-protected API for DOCX redlining with single and double modes, file size limits, and ZIP download for dual results.
  • Tests

    • Introduced async tests covering success paths, authentication failures, and file size enforcement.
  • Documentation

    • Added .env example with guidance for setting an API token.
  • Chores

    • Simplified .gitignore.
    • Added containerization and local orchestration via Docker and Compose.
    • Introduced automated .NET installation and improved multi-platform build packaging.
    • Updated project configuration and scripts for API runtime and testing.

Walkthrough

Introduces a FastAPI-based redlining API with token auth, adds tests, containerization (Dockerfile, docker-compose), environment scaffolding (.env.example), updated build/packaging (build_differ.py, pyproject.toml), a .NET installer script, and a simplified .gitignore.

Changes

  • API Implementation (src/python_redlines/api.py): Adds FastAPI app with token-based auth. Implements POST /redline/single (DOCX out) and POST /redline/double (ZIP stream). Enforces 10 MB per-file limit, loads token from env, uses XmlPowerToolsEngine, streams in-memory ZIP for double. Exposes app, engine, MAX_FILE_SIZE, get_api_token, verify_token.
  • API Tests (tests/test_api.py): Adds async tests covering success paths, missing/invalid token, and file size limit. Uses httpx AsyncClient with ASGITransport against app and DOCX fixtures.
  • Build and Packaging (build_differ.py): Refactors build flow: looped multi-platform dotnet publish, improved error handling, archive naming, arcname support in compression, safe dist creation and cleanup. Updates the compress_files signature to include arcname.
  • Project Configuration (pyproject.toml): Adds Hatch envs for API and testing; dependencies: fastapi, uvicorn[standard], httpx, pytest-asyncio, python-multipart, python-dotenv. Adds scripts for starting the API and running tests; sets env vars.
  • Containerization (Dockerfile, docker-compose.yml): New Dockerfile: python:3.11-slim base, installs .NET 8 SDK (via script), installs hatch, builds project, exposes 8000, starts API. Compose: builds service, maps 8000:8000, mounts project to /app.
  • .NET Installer (dotnet-install.sh): Adds a comprehensive Bash script to download/install the .NET SDK/runtime with OS/arch detection, channel/version handling, download/extract, and verification options.
  • Environment and Repo Hygiene (.env.example, .gitignore): Adds .env example with API_TOKEN placeholder and guidance. Simplifies .gitignore: consolidates Python envs, IDE/editor folders, C# bin/obj, dist outputs, and env files.
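For orientation, the test pattern described in the API Tests cohort looks roughly like the sketch below. It is an assumed shape (the test name and expectations are illustrative), not the literal contents of tests/test_api.py.

```python
# Sketch of the httpx AsyncClient + ASGITransport pattern (requires pytest-asyncio).
import pytest
from httpx import ASGITransport, AsyncClient

from python_redlines.api import app


@pytest.mark.asyncio
async def test_missing_token_is_rejected():
    # ASGITransport routes requests to the FastAPI app in-process, no server needed
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post("/redline/single")
        # FastAPI may report the missing header/body as 401 or as a 422 validation error
        assert response.status_code in (401, 422)
```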

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant A as FastAPI App
    participant V as verify_token
    participant E as XmlPowerToolsEngine
    participant Z as Zip Builder (in-memory)
    Note over A: Env loaded (.env) to get API_TOKEN
    C->>A: POST /redline/single (files, author_tag, X-API-Token)
    A->>V: Validate token (Header vs env)
    V-->>A: Valid / 401 on mismatch
    A->>A: Validate file sizes (<= MAX_FILE_SIZE)
    A->>E: Compare Original vs Modified
    E-->>A: DOCX bytes
    A-->>C: 200 DOCX (Content-Disposition)
    rect rgba(230,245,255,0.5)
        C->>A: POST /redline/double (files, author_tag, X-API-Token)
        A->>V: Validate token
        A->>E: Redline O->M
        E-->>A: DOCX A
        A->>E: Redline M->O
        E-->>A: DOCX B
        A->>Z: Package DOCX A,B into ZIP stream
        Z-->>A: ZIP stream
        A-->>C: 200 ZIP (streaming)
    end
    alt Error paths
        A-->>C: 401 Unauthorized (bad/missing token)
        A-->>C: 413 Payload Too Large (file limit)
        A-->>C: 500 on missing server token
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

In burrows of bytes where tokens gleam,
I brew redlines like a moonlit dream.
Zip up the changes, swift and neat—
two tracks of edits, a tidy treat.
With Docker’s hum and tests that sing,
I thump my paw—ship everything! 🐇📄🧪

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 64.29%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: The pull request description clearly outlines the major features introduced, including the FastAPI endpoints, authentication, file limits, containerization, build script updates, and testing, and directly relates to the changeset.
  • Title Check — ✅ Passed: The title succinctly captures the primary changes, highlighting the addition of the FastAPI redlining API along with token authentication, tests, Docker support, and related tooling, and is directly related to the main objectives of the pull request.

@arthrod (Owner, Author) commented Oct 8, 2025

@claude review @coderabbitai review

@qodo-merge-pro bot commented Oct 8, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

High-level
Separate build and runtime environments

The current Dockerfile includes the entire .NET build toolchain in the final
image. It should be refactored into a multi-stage build to create a smaller,
more secure production image containing only the necessary runtime artifacts.

Examples:

Dockerfile [1-38]
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
    ... (clipped 28 lines)

Solution Walkthrough:

Before:

# Dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install .NET SDK and build tools
RUN apt-get update && \
    apt-get install -y wget && \
    wget https://dot.net/v1/dotnet-install.sh ... && \
    ./dotnet-install.sh ...
ENV PATH="/root/.dotnet:$PATH"

# Copy build files and build the C# project
COPY csproj/ csproj/
...
RUN hatch run default:build

# Copy the rest of the app and start
COPY . .
CMD ["hatch", "run", "api:start"]

After:

# Dockerfile
# Build stage
FROM python:3.11-slim AS builder
WORKDIR /app

# Install .NET SDK and build tools
RUN apt-get update && ... && ./dotnet-install.sh ...
ENV PATH="/root/.dotnet:$PATH"

# Copy build files and build the C# project
COPY csproj/ csproj/
...
RUN hatch run default:build

# Final runtime stage
FROM python:3.11-slim
WORKDIR /app

# Copy only the built artifacts from the builder stage
COPY --from=builder /app/src/python_redlines/dist ./src/python_redlines/dist

# Copy application code and install runtime dependencies
COPY . .
RUN pip install ...
CMD ["hatch", "run", "api:start"]
Suggestion importance[1-10]: 9


Why: This is a critical architectural suggestion for the new Docker-based deployment, addressing significant image size, security, and efficiency issues by recommending a standard best practice.

High
Security
Prevent DoS by streaming file uploads

To prevent potential DoS attacks, read uploaded files in chunks and validate
their size incrementally, rather than loading the entire file into memory at
once.

src/python_redlines/api.py [66-73]

-original_content = await original.read()
-modified_content = await modified.read()
+async def read_with_limit(file: UploadFile, max_size: int) -> bytes:
+    contents = bytearray()
+    while chunk := await file.read(8192):  # Read in 8KB chunks
+        contents.extend(chunk)
+        if len(contents) > max_size:
+            raise HTTPException(
+                status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
+                detail=f"File size exceeds the limit of {max_size // (1024*1024)}MB",
+            )
+    return bytes(contents)

-if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
-    raise HTTPException(
-        status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
-        detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
-    )
+original_content = await read_with_limit(original, MAX_FILE_SIZE)
+modified_content = await read_with_limit(modified, MAX_FILE_SIZE)
Suggestion importance[1-10]: 9

Why: The suggestion correctly identifies a significant Denial of Service (DoS) vulnerability and provides a robust, standard solution by reading files in chunks to prevent memory exhaustion.

High
Avoid using shell=True for security

Avoid using shell=True with subprocess.Popen for security reasons by refactoring
the run_command function to accept a list of arguments instead of a string.

build_differ.py [21-26]

-process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
 for line in process.stdout:
     print(line.decode().strip())
 process.communicate()
 if process.returncode != 0:
     raise subprocess.CalledProcessError(process.returncode, command)
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies the security risk of using shell=True and proposes the best practice of using a list of arguments, which is a valuable security hardening improvement.

Medium
Security
Verify installer integrity

Add a checksum verification step for the downloaded dotnet-install.sh script in
the Dockerfile to mitigate supply-chain attacks.

Dockerfile [7-13]

-# Install .NET SDK and other dependencies
+# Install .NET SDK and other dependencies with checksum verification
 RUN apt-get update && \
-    apt-get install -y wget && \
+    apt-get install -y wget ca-certificates && \
+    update-ca-certificates && \
     wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
+    echo "expected_sha256 dotnet-install.sh" > dotnet-install.sh.sha256 && \
+    sha256sum -c dotnet-install.sh.sha256 && \
     chmod +x ./dotnet-install.sh && \
     ./dotnet-install.sh --version 8.0.100 && \
-    rm dotnet-install.sh
+    rm dotnet-install.sh dotnet-install.sh.sha256
Suggestion importance[1-10]: 8

Why: This is a valid and important security suggestion to mitigate supply-chain attacks by verifying the integrity of the downloaded dotnet-install.sh script using a checksum.

Medium
Remove risky bind mount

Remove the volumes bind mount from docker-compose.yml to prevent overwriting
container artifacts and avoid potential security risks.

docker-compose.yml [2-8]

 services:
   api:
     build: .
     ports:
       - "8000:8000"
-    volumes:
-      - .:/app
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a security and operational risk with bind-mounting the entire project directory, which can overwrite built artifacts and is not recommended for production-like environments.

Medium
Possible issue
Detect publish path dynamically

Dynamically locate the .NET publish directory instead of hardcoding the net8.0
framework version to make the build script more robust against future framework
changes.

build_differ.py [80-105]

 for platform, ext in platforms.items():
     print(f"Building for {platform}...")
     try:
         run_command(f'dotnet publish ./csproj -c Release -r {platform} --self-contained')
-        build_dir = f'./csproj/bin/Release/net8.0/{platform}/publish'
+        # Find the actual publish directory dynamically instead of hardcoding net8.0
+        base_out = os.path.join('./csproj/bin/Release')
+        publish_dir = None
+        for root, dirs, files in os.walk(base_out):
+            # Match .../<tfm>/<rid>/publish exactly
+            parts = root.replace("\\", "/").split("/")
+            if len(parts) >= 4 and parts[-1] == "publish" and parts[-2] == platform:
+                publish_dir = root
+                break
+        if not publish_dir:
+            raise FileNotFoundError(f"Publish directory not found for {platform} under {base_out}")

         # The name of the archive file should match what engines.py expects
         dist_filename = f"{platform}-{version}.{ext}"
         if platform.startswith('win-'):
             dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
         elif platform.startswith('osx-'):
             dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

         # The directory name inside the archive should match what engines.py expects
         arcname = platform
         if platform.startswith('win-'):
             arcname = 'win-' + platform.split('-')[1]
         elif platform.startswith('osx-'):
             arcname = 'osx-' + platform.split('-')[1]

-        compress_files(build_dir, f"{dist_dir}/{dist_filename}", arcname=arcname)
+        compress_files(publish_dir, f"{dist_dir}/{dist_filename}", arcname=arcname)
     except (subprocess.CalledProcessError, FileNotFoundError) as e:
         print(f"Failed to build for {platform}: {e}")
-        # Continue to next platform
         continue
Suggestion importance[1-10]: 6

Why: The suggestion correctly identifies a hardcoded value (net8.0) that makes the build script brittle and proposes a dynamic approach to find the path, improving robustness.

Low
General
Enforce DOCX and compress ZIP

Add explicit ZIP_DEFLATED compression when creating the zip archive and validate
that uploaded files have a .docx extension.

src/python_redlines/api.py [86-126]

 @app.post("/redline/double", dependencies=[Depends(verify_token)]) async def redline_double( original: UploadFile = File(...), modified: UploadFile = File(...), author_tag: str = Form("Author"), ): """ Accepts two .docx files and returns a zip archive with two redlined documents. """ original_content = await original.read() modified_content = await modified.read() if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE: raise HTTPException( status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE, detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB", ) - # Redline 1: Original -> Modified - redline_om, _, _ = engine.run_redline( - author_tag, original_content, modified_content - ) + for f in (original, modified): + if not (f.filename and f.filename.lower().endswith(".docx")): + raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Only .docx files are supported") - # Redline 2: Modified -> Original - redline_mo, _, _ = engine.run_redline( - author_tag, modified_content, original_content - ) + redline_om, _, _ = engine.run_redline(author_tag, original_content, modified_content) + redline_mo, _, _ = engine.run_redline(author_tag, modified_content, original_content) - # Create a zip file in memory zip_buffer = io.BytesIO() - with zipfile.ZipFile(zip_buffer, "w") as zip_file: + with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as zip_file: zip_file.writestr("redline_original_vs_modified.docx", redline_om) zip_file.writestr("redline_modified_vs_original.docx", redline_mo) zip_buffer.seek(0) return StreamingResponse( zip_buffer, media_type="application/zip", headers={"Content-Disposition": "attachment; filename=redlines.zip"}, )
Suggestion importance[1-10]: 5

Why: The suggestion correctly points out two valid improvements: adding explicit ZIP compression and validating that uploaded files are .docx, which enhances robustness and efficiency.

Low
@coderabbitai bot commented Oct 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@arthrod (Owner, Author) commented Oct 8, 2025

@gemini review

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant enhancement by adding a FastAPI web service to expose the document redlining functionality. The inclusion of Docker support, multi-platform build scripts, and comprehensive tests is commendable. My review focuses on several key areas for improvement:

  • Dockerfile Optimization: I've suggested using a multi-stage build to significantly reduce the final Docker image size.
  • Configuration: There's a critical issue with a hardcoded path in pyproject.toml that needs to be addressed.
  • API Performance and Maintainability: I've pointed out opportunities to improve the API's performance by handling blocking calls correctly and to enhance maintainability by reducing code duplication.
  • Test Correctness: A minor resource leak in the tests has been identified.

Overall, this is a solid contribution. Addressing these points will make the application more robust, performant, and easier to maintain.

"fastapi",
"python-dotenv"
]
env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }


critical

There is a hardcoded, user-specific path in the PATH environment variable: /home/jules/.dotnet. This will cause the application to fail for any other user or in any CI/CD environment. This should be replaced with a generic path that resolves to the user's home directory, such as by using the $HOME environment variable.

Suggested change
-env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
+env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }
Comment on lines +75 to +77
redline_output, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)


high

The engine.run_redline() method is a synchronous function that performs a blocking operation (running a subprocess). Calling it directly from within an async endpoint will block the entire server's event loop, preventing it from handling other requests. This can severely degrade performance. You should run blocking calls in a separate thread pool using asyncio.get_running_loop().run_in_executor().

import asyncio

loop = asyncio.get_running_loop()
redline_output, _, _ = await loop.run_in_executor(
    None, engine.run_redline, author_tag, original_content, modified_content
)
Comment on lines +105 to +112
redline_om, _, _ = engine.run_redline(
    author_tag, original_content, modified_content
)

# Redline 2: Modified -> Original
redline_mo, _, _ = engine.run_redline(
    author_tag, modified_content, original_content
)


high

Similar to the other endpoint, the engine.run_redline() calls here are blocking and will stall the event loop. They should be executed in a thread pool. Since there are two independent redlining operations, you can run them concurrently using asyncio.gather to improve the endpoint's response time.

import asyncio

loop = asyncio.get_running_loop()

# Redline 1: Original -> Modified
task_om = loop.run_in_executor(
    None, engine.run_redline, author_tag, original_content, modified_content
)

# Redline 2: Modified -> Original
task_mo = loop.run_in_executor(
    None, engine.run_redline, author_tag, modified_content, original_content
)

(redline_om, _, _), (redline_mo, _, _) = await asyncio.gather(task_om, task_mo)
Comment on lines +1 to +38
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
apt-get install -y wget && \
wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
chmod +x ./dotnet-install.sh && \
./dotnet-install.sh --version 8.0.100 && \
rm dotnet-install.sh

# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"

# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/

# Install hatch
RUN pip install hatch

# Build the project (which includes running build_differ.py)
RUN hatch run default:build

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Start the API server
CMD ["hatch", "run", "api:start"] No newline at end of file


medium

The Dockerfile can be significantly optimized for a smaller image size and better security by using a multi-stage build. Currently, the final image includes the .NET SDK and other build-time dependencies, which are not required for running the application.

A multi-stage build would involve:

  1. A builder stage to install the .NET SDK, build the C# binaries, and then discard this environment.
  2. A final, clean python:3.11-slim stage that copies only the necessary application code and the compiled binaries from the builder stage.

This practice leads to smaller, more secure, and faster-deploying images. Additionally, it's a good practice to clean up the apt cache within the same RUN layer to further reduce image size by adding && rm -rf /var/lib/apt/lists/*.

Comment on lines +88 to +99
dist_filename = f"{platform}-{version}.{ext}"
if platform.startswith('win-'):
dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
elif platform.startswith('osx-'):
dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

# The directory name inside the archive should match what engines.py expects
arcname = platform
if platform.startswith('win-'):
arcname = 'win-' + platform.split('-')[1]
elif platform.startswith('osx-'):
arcname = 'osx-' + platform.split('-')[1]


medium

The logic for constructing dist_filename and arcname contains redundancy. For example, when platform is win-x64, the if condition platform.startswith('win-') is met, but the code inside re-constructs the exact same string. This logic can be simplified by removing the conditional blocks, making the code cleaner and easier to maintain.

 dist_filename = f"{platform}-{version}.{ext}" # The directory name inside the archive should match what engines.py expects arcname = platform
Comment on lines +66 to +102
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )

    redline_output, _, _ = engine.run_redline(
        author_tag, original_content, modified_content
    )

    return Response(
        content=redline_output,
        media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        headers={"Content-Disposition": "attachment; filename=redline.docx"},
    )


@app.post("/redline/double", dependencies=[Depends(verify_token)])
async def redline_double(
    original: UploadFile = File(...),
    modified: UploadFile = File(...),
    author_tag: str = Form("Author"),
):
    """
    Accepts two .docx files and returns a zip archive with two redlined documents.
    """
    original_content = await original.read()
    modified_content = await modified.read()

    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
        )


medium

The logic for reading uploaded files and checking their size is duplicated in both the redline_single and redline_double endpoints. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this common logic should be extracted into a FastAPI dependency.
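A minimal sketch of such a dependency is shown below. The name validated_uploads is illustrative; MAX_FILE_SIZE, verify_token, and the endpoints are as defined in api.py in this PR.

```python
# Illustrative shared dependency: read both uploads and enforce the size limit once.
from fastapi import File, HTTPException, UploadFile, status


async def validated_uploads(
    original: UploadFile = File(...),
    modified: UploadFile = File(...),
) -> tuple[bytes, bytes]:
    original_content = await original.read()
    modified_content = await modified.read()
    if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024 * 1024)}MB",
        )
    return original_content, modified_content

# Each endpoint then declares:
#     contents: tuple[bytes, bytes] = Depends(validated_uploads)
```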

Comment on lines +79 to +84
files = {
"original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
"modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
}
data = {"author_tag": "test_author"}
response = await client.post("/redline/single", files=files, data=data, headers=headers)


medium

The file opened via open("tests/test_data/modified.docx", "rb") is not closed, which will result in a ResourceWarning and can lead to resource leaks in the test suite. It's best practice to use a with statement to ensure that file handles are always closed properly, even if errors occur.

Suggested change
-files = {
-    "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
-    "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
-}
-data = {"author_tag": "test_author"}
-response = await client.post("/redline/single", files=files, data=data, headers=headers)
+with open("tests/test_data/modified.docx", "rb") as modified_file:
+    files = {
+        "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+        "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+    }
+    data = {"author_tag": "test_author"}
+    response = await client.post("/redline/single", files=files, data=data, headers=headers)
@coderabbitai bot commented Oct 8, 2025

@arthrod: I'll review the changes in this pull request.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +75 to +76
redline_output, _, _ = engine.run_redline(
author_tag, original_content, modified_content


P1: Avoid blocking asyncio loop with synchronous redline subprocess

Both endpoints invoke engine.run_redline directly from an async def. run_redline performs synchronous file I/O and subprocess.run, so awaiting requests will block the event loop for the lifetime of the diff operation. Under load a single request will stall the entire worker, defeating the purpose of exposing an async API. Offload the work via asyncio.to_thread/run_in_threadpool or make the handlers synchronous so FastAPI executes them in a threadpool, and apply the same change to /redline/double.
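For example, the offload can be as small as the sketch below, using asyncio.to_thread (one of the options named above); engine, author_tag, and the content variables are as in api.py.

```python
# Sketch: run the blocking redline call in a worker thread so the event loop
# stays responsive while the subprocess-based diff runs.
import asyncio

redline_output, _, _ = await asyncio.to_thread(
    engine.run_redline, author_tag, original_content, modified_content
)
```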


Comment on lines +66 to +70
original_content = await original.read()
modified_content = await modified.read()

if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
raise HTTPException(


P1: File-size guard runs after loading entire upload into memory

The request handlers read each uploaded file (await original.read()) into a bytes object before checking len(...) > MAX_FILE_SIZE. This means a client can upload arbitrarily large files and the server will allocate the entire payload in memory before the size check raises 413, negating the 10 MB protection described in the commit message and opening the API to memory exhaustion. Consider validating the size while streaming the upload or rejecting the request before materialising the whole body.
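One cheap pre-check along these lines is to reject requests whose declared Content-Length already exceeds the budget before reading the body. The sketch below is illustrative (MAX_FILE_SIZE is from api.py; the budget is an assumption), and the header is client-supplied and advisory, so a streaming check is still needed for clients that omit or misstate it.

```python
# Sketch: fail fast on an oversized declared Content-Length.
from fastapi import HTTPException, Request, status


async def reject_oversized(request: Request) -> None:
    declared = request.headers.get("content-length")
    # Budget: two files plus some multipart overhead (assumed value)
    if declared and declared.isdigit() and int(declared) > 2 * MAX_FILE_SIZE + 1024:
        raise HTTPException(
            status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
            detail="Request body too large",
        )
```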


@coderabbitai bot changed the title from "cr" to "Add FastAPI redlining API, token auth, tests, Docker, and tooling" on Oct 8, 2025
@coderabbitai bot left a comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
build_differ.py (1)

7-14: Security concern: exec() usage in version extraction.

The get_version() function uses exec() to extract version information, which poses security risks if the __about__.py file is ever compromised or modified maliciously.

Apply this diff to use safer version extraction:

 def get_version():
     """
     Extracts the version from the specified __about__.py file.
     """
-    about = {}
-    with open('./src/python_redlines/__about__.py') as f:
-        exec(f.read(), about)
-    return about['__version__']
+    import re
+    from pathlib import Path
+
+    about_file = Path('./src/python_redlines/__about__.py')
+    content = about_file.read_text()
+    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', content)
+    if not match:
+        raise ValueError("Could not find __version__ in __about__.py")
+    return match.group(1)
pyproject.toml (1)

58-81: Critical: Remove hardcoded user-specific PATH.

The hardcoded PATH contains a user-specific directory that will break in other environments.

Apply this diff to fix:

 [tool.hatch.envs.default]
 dependencies = [
   "coverage[toml]>=6.5",
   "pytest",
   "pytest-asyncio",
   "httpx",
   "python-multipart",
   "fastapi",
   "python-dotenv"
 ]
-env-vars = { PYTHONPATH = "{root}/src", PATH = "/home/jules/.dotnet:{env:PATH}" }
+env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }

 [tool.hatch.envs.default.scripts]
 test = "python -m pytest {args:tests}"

Better yet, handle .NET PATH detection dynamically in build scripts rather than hardcoding it here.

🧹 Nitpick comments (11)
.env.example (1)

1-3: Add final newline and expand environment variable documentation.

The file provides basic token guidance but could be more comprehensive:

  1. Missing final newline: dotenv-linter flags this (line 3)
  2. Limited documentation: Additional environment variables might be needed (e.g., MAX_FILE_SIZE, LOG_LEVEL, CORS_ORIGINS)

Apply this diff:

 # Generate a secure secret token, for example, by running:
 # python -c 'import secrets; print(secrets.token_hex(32))'
 API_TOKEN=your_secret_api_token_here
+
+# Optional: Override default file size limit (bytes)
+# MAX_FILE_SIZE=10485760
+
+# Optional: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
+# LOG_LEVEL=INFO

Based on learnings: Use python-dotenv (recommended package) to load these variables.

.gitignore (1)

1-33: Well-organized gitignore with good multi-language coverage.

The reorganized .gitignore properly handles:

  • Python artifacts (__pycache__, *.pyc, .venv/)
  • .NET build outputs (csproj/bin/, csproj/obj/)
  • Hatch tooling (.hatch/, distribution directories)
  • IDE metadata (.idea/, .vscode/)
  • Environment files (.env)

Consider adding Docker-specific ignores if not already covered elsewhere:

+# Docker
+*.log
+docker-compose.override.yml
build_differ.py (3)

17-27: Add proper error handling and improve process management.

The run_command function has been improved but still lacks robustness:

  1. No timeout handling: Long-running commands could hang indefinitely
  2. Limited error context: Exception message doesn't include stdout/stderr
  3. Blocking I/O: Line-by-line reading blocks the main thread

Apply this diff:

-def run_command(command):
+def run_command(command: str, timeout: int = 300) -> None:
     """
-    Runs a shell command and prints its output.
+    Runs a shell command and prints its output.
+
+    Args:
+        command: Shell command to execute
+        timeout: Maximum execution time in seconds (default: 300)
+
+    Raises:
+        subprocess.CalledProcessError: If command returns non-zero exit code
+        subprocess.TimeoutExpired: If command exceeds timeout
     """
     process = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
+    output_lines = []
     for line in process.stdout:
-        print(line.decode().strip())
+        decoded = line.decode().strip()
+        print(decoded)
+        output_lines.append(decoded)

-    process.communicate()
+    try:
+        process.communicate(timeout=timeout)
+    except subprocess.TimeoutExpired:
+        process.kill()
+        raise

     if process.returncode != 0:
-        raise subprocess.CalledProcessError(process.returncode, command)
+        error_output = '\n'.join(output_lines[-20:])  # Last 20 lines
+        raise subprocess.CalledProcessError(
+            process.returncode,
+            command,
+            output=error_output.encode()
+        )

50-61: Modernize with pathlib and add type hints.

The function uses old-style string path manipulation.

Apply this diff:

-def cleanup_old_builds(dist_dir, current_version):
+from pathlib import Path
+
+def cleanup_old_builds(dist_dir: str, current_version: str) -> None:
     """
     Deletes any build files ending in .zip or .tar.gz in the dist_dir
     with a different version tag.
+
+    Args:
+        dist_dir: Distribution directory path
+        current_version: Current version string to preserve
     """
-    if not os.path.exists(dist_dir):
+    dist_path = Path(dist_dir)
+    if not dist_path.exists():
         return
-    for file in os.listdir(dist_dir):
-        if not file.endswith((f'{current_version}.zip', f'{current_version}.tar.gz', '.gitignore')):
-            file_path = os.path.join(dist_dir, file)
-            os.remove(file_path)
-            print(f"Deleted old build file: {file}")
+
+    for file_path in dist_path.iterdir():
+        if file_path.name == '.gitignore':
+            continue
+        if not file_path.name.endswith((f'{current_version}.zip', f'{current_version}.tar.gz')):
+            file_path.unlink()
+            print(f"Deleted old build file: {file_path.name}")

As per coding guidelines: Prefer modern Python features and pathlib over os.path.


63-110: Add uv shebang and improve error handling in main().

The main function lacks proper script metadata and could benefit from structured logging.

Add a uv shebang at the top of the file and improve main():

+#!/usr/bin/env -S uv run
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#     "loguru",
+# ]
+# ///
+
 import subprocess
 import os
 import tarfile
 import zipfile
+from pathlib import Path
+from loguru import logger

 # ... rest of code ...

-def main():
+def main() -> None:
-    version = get_version()
-    print(f"Version: {version}")
+    logger.info("Starting build process")
+
+    try:
+        version = get_version()
+        logger.info(f"Building version: {version}")
+    except Exception as e:
+        logger.error(f"Failed to get version: {e}")
+        raise

     dist_dir = "./src/python_redlines/dist/"
-    if not os.path.exists(dist_dir):
-        os.makedirs(dist_dir)
+    Path(dist_dir).mkdir(parents=True, exist_ok=True)

     platforms = {
         "linux-x64": "tar.gz",
         "linux-arm64": "tar.gz",
         "win-x64": "zip",
         "win-arm64": "zip",
         "osx-x64": "tar.gz",
         "osx-arm64": "tar.gz",
     }

+    failed_platforms = []
     for platform, ext in platforms.items():
-        print(f"Building for {platform}...")
+        logger.info(f"Building for {platform}...")
         try:
             run_command(f'dotnet publish ./csproj -c Release -r {platform} --self-contained')
             build_dir = f'./csproj/bin/Release/net8.0/{platform}/publish'

             # The name of the archive file should match what engines.py expects
             dist_filename = f"{platform}-{version}.{ext}"
             if platform.startswith('win-'):
                 dist_filename = f"win-{platform.split('-')[1]}-{version}.{ext}"
             elif platform.startswith('osx-'):
                 dist_filename = f"osx-{platform.split('-')[1]}-{version}.{ext}"

             # The directory name inside the archive should match what engines.py expects
             arcname = platform
             if platform.startswith('win-'):
                 arcname = 'win-' + platform.split('-')[1]
             elif platform.startswith('osx-'):
                 arcname = 'osx-' + platform.split('-')[1]

             compress_files(build_dir, f"{dist_dir}/{dist_filename}", arcname=arcname)
+            logger.success(f"Successfully built {platform}")
         except (subprocess.CalledProcessError, FileNotFoundError) as e:
-            print(f"Failed to build for {platform}: {e}")
+            logger.error(f"Failed to build for {platform}: {e}")
+            failed_platforms.append(platform)
             # Continue to next platform
             continue

     cleanup_old_builds(dist_dir, version)
-    print("Build and compression complete.")
+    if failed_platforms:
+        logger.warning(f"Build completed with failures for: {', '.join(failed_platforms)}")
+    else:
+        logger.success("Build and compression complete for all platforms.")

As per coding guidelines: Add uv shebangs and implement loguru for structured logging.

src/python_redlines/api.py (1)

31-55: Improve error handling and token validation.

The token verification lacks explicit None handling and detailed error logging.

Apply this diff:

 def get_api_token():
     """
     Loads the API token from environment variables.
+
+    Returns:
+        str: The configured API token
+
+    Raises:
+        HTTPException: If API_TOKEN environment variable is not set
     """
     token = os.getenv("API_TOKEN")
     if not token:
+        logger.critical("API_TOKEN environment variable not configured")
         raise HTTPException(
             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
             detail="API token not configured on server",
         )
     return token


 async def verify_token(
     x_api_token: Annotated[str, Header()],
     api_token: str = Depends(get_api_token),
-):
+) -> None:
     """
     Verifies the API token provided in the request header.
+
+    Args:
+        x_api_token: Token from X-Api-Token header
+        api_token: Expected token from environment
+
+    Raises:
+        HTTPException: If token is invalid or missing
     """
+    if not x_api_token:
+        logger.warning("Missing API token in request")
+        raise HTTPException(
+            status_code=status.HTTP_401_UNAUTHORIZED,
+            detail="Missing API Token"
+        )
+
     if not secrets.compare_digest(x_api_token, api_token):
+        logger.warning("Invalid API token attempted")
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Invalid API Token"
         )
pyproject.toml (2)

58-101: Add modern Python tooling dependencies.

The configuration is missing several tools recommended in the coding guidelines.

As per coding guidelines: Use ruff, loguru, and other modern tools.

Apply this diff:

 [tool.hatch.envs.default]
 dependencies = [
   "coverage[toml]>=6.5",
   "pytest",
   "pytest-asyncio",
   "httpx",
   "python-multipart",
   "fastapi",
-  "python-dotenv"
+  "python-dotenv",
+  "ruff",
+  "loguru",
+  "mypy"
 ]
 env-vars = { PYTHONPATH = "{root}/src", PATH = "{env:HOME}/.dotnet:{env:PATH}" }

 [tool.hatch.envs.default.scripts]
 test = "python -m pytest {args:tests}"
+lint = "ruff check {args:src tests}"
+format = "ruff format {args:src tests}"
+typecheck = "mypy {args:src tests}"
 test-cov = "coverage run -m pytest {args:tests}"
 cov-report = [
   "- coverage combine",
   "coverage report",
 ]
 cov = [
   "test-cov",
   "cov-report",
 ]
 build = "python -m build_differ"

+[tool.ruff]
+target-version = "py312"
+line-length = 100
+
+[tool.ruff.lint]
+select = [
+    "E",   # pycodestyle errors
+    "W",   # pycodestyle warnings
+    "F",   # pyflakes
+    "I",   # isort
+    "B",   # flake8-bugbear
+    "C4",  # flake8-comprehensions
+    "UP",  # pyupgrade
+]
+ignore = [
+    "B008",  # FastAPI uses File() in defaults which is intentional
+]
+
+[tool.mypy]
+python_version = "3.12"
+warn_return_any = true
+warn_unused_configs = true
+disallow_untyped_defs = true

92-101: Add production-ready uvicorn configuration.

The API start command lacks production settings.

Apply this diff:

 [tool.hatch.envs.api]
 dependencies = [
   "fastapi",
   "uvicorn[standard]",
   "python-multipart",
-  "python-dotenv"
+  "python-dotenv",
+  "loguru"
 ]

 [tool.hatch.envs.api.scripts]
-start = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000"
+start = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000 --workers 4 --log-level info"
+start-dev = "uvicorn python_redlines.api:app --host 0.0.0.0 --port 8000 --reload --log-level debug"
tests/test_api.py (3)

1-6: Clarify test constant naming to avoid static analysis warnings.

The imports and structure are sound. However, static analysis flagged TEST_API_TOKEN as a potential hardcoded credential (S105). While this is acceptable in test context, consider renaming to TEST_AUTH_TOKEN or MOCK_API_TOKEN to clarify it's a test fixture, not a production secret.

Apply this diff:

-TEST_API_TOKEN = "test-token" +MOCK_API_TOKEN = "test-token-for-testing-only"

Then update references in fixtures and tests accordingly.


1-86: Add comprehensive edge case and parametrized tests.

The current test suite covers basic scenarios but is missing critical edge cases and could benefit from parametrization.

Consider adding the following tests:

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "pytest>=8.4.2",
#     "httpx>=0.28.1",
#     "pytest-asyncio",
# ]
# ///
import pytest
from io import BytesIO
from httpx import AsyncClient, ASGITransport

from python_redlines.api import app, MAX_FILE_SIZE


@pytest.mark.asyncio
async def test_file_size_limit_modified_file(headers):
    """Test /redline/single returns 413 when modified file exceeds MAX_FILE_SIZE."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        with open("tests/test_data/original.docx", "rb") as original_file:
            files = {
                "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                "modified": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            }
            data = {"author_tag": "test_author"}
            response = await client.post("/redline/single", files=files, data=data, headers=headers)
            assert response.status_code == 413


@pytest.mark.asyncio
async def test_missing_author_tag(headers):
    """Test /redline/single handles missing author_tag parameter."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                response = await client.post("/redline/single", files=files, headers=headers)
                # Verify API handles missing optional parameter gracefully (422 or uses default)
                assert response.status_code in [200, 422]


@pytest.mark.asyncio
async def test_invalid_file_format(headers):
    """Test /redline/single handles non-DOCX files gracefully."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        invalid_file_content = b"Not a DOCX file"
        files = {
            "original": ("original.docx", invalid_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            "modified": ("modified.docx", invalid_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)
        # Should return 4xx error for invalid format
        assert 400 <= response.status_code < 500


@pytest.mark.parametrize("endpoint,expected_content_type", [
    ("/redline/single", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    ("/redline/double", "application/zip"),
])
@pytest.mark.asyncio
async def test_endpoints_parametrized(endpoint, expected_content_type, headers):
    """Parametrized test for both endpoints verifying content types."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post(endpoint, files=files, data=data, headers=headers)
                assert response.status_code == 200
                assert response.headers["content-type"] == expected_content_type

These additions would significantly improve test robustness and coverage.


1-86: Consider CI/CD integration and load testing.

For production readiness, consider:

  1. CI/CD Integration: Configure GitHub Actions or equivalent to run tests on every push/PR with multiple Python versions (3.12, 3.13)
  2. Coverage Reporting: Add pytest-cov to measure code coverage and set minimum thresholds (e.g., 80%)
  3. Load Testing: Add load tests using locust or similar to verify the API handles concurrent requests without resource exhaustion
  4. Security Scanning: Integrate bandit or similar tools to catch security issues in API code
  5. Container Testing: Add tests that run against the Dockerized service to verify deployment configuration

Example GitHub Actions workflow:

name: Test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          pip install uv
          uv pip install -e ".[test]"
      - name: Run tests with coverage
        run: pytest --cov=src/python_redlines --cov-report=xml --cov-report=term-missing
      - name: Upload coverage
        uses: codecov/codecov-action@v4
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 254e742 and 2003706.

⛔ Files ignored due to path filters (3)
  • src/python_redlines/dist/.gitignore is excluded by !**/dist/**
  • tests/test_data/modified.docx is excluded by !**/*.docx
  • tests/test_data/original.docx is excluded by !**/*.docx
📒 Files selected for processing (9)
  • .env.example (1 hunks)
  • .gitignore (1 hunks)
  • Dockerfile (1 hunks)
  • build_differ.py (2 hunks)
  • docker-compose.yml (1 hunks)
  • dotnet-install.sh (1 hunks)
  • pyproject.toml (3 hunks)
  • src/python_redlines/api.py (1 hunks)
  • tests/test_api.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Add docstrings, remove deprecated libraries (no unittest or mock), remove typing Optional, List, etc in favor of modern syntax list, str | str, use async methods as much as possible, when applicable import and implement uvloop, typer, loguru, uv, ruff, pytest, assert, httpx/aiohttp, polars. Add uv shebangs when possible: #!/usr/bin/env -S uv run

/// script

requires-python = ">=3.12"

dependencies = [

"loguru",

]

///

Refactor variables, classes, functions or constants names if not descriptive enough. Don't forget to determine where else it's used.
For JavaScript and TypeScript, favor bun/oxlint/biome/tanstack/elysia, but propose refactoring only if (1) a seamless migration to these tools and/or (2) the current implementation of the code is not working. Always ensure authentication is properly implemented via oauth2 or oidc. Always run check for contrast and visualization for both light and dark themes. Verify if we don't have dead routes (pages that don't are ever rendered). Always review the comments of other commenters and assess them meticulously, indicating for each of them what was correct and what was incorrect. Verify any mock, incomplete or, most importantly, deceiving-looking incomplete implementations. Generally, focus on correct, non-duplicative endpoints, completeness and robustness.

All of your messages, either a response, a summary or an automatic review MUST include, in addition to the other requirements, the following:

  1. An explanation of the goal of the repo and a meticulous explanation of the goal of the PR.
  2. How the PR attempts to achieve EACH of these goals.
  3. Extremely comprehensively, you will indicate on what the PR does well, what is missing and what is wrong. This section should have several subtopics, but each subtopics must be very short, no more than a sentence. [subtopicx, subtopicy, subtopics, etc] that will be my punch list (and must be in list format).
  4. Then propose marginal enhancements, and full solu...

Files:

  • build_differ.py
  • tests/test_api.py
  • src/python_redlines/api.py
🪛 dotenv-linter (3.3.0)
.env.example

[warning] 3-3: [EndingBlankLine] No blank line at the end of the file

(EndingBlankLine)

🪛 Ruff (0.13.3)
build_differ.py

43-43: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)

tests/test_api.py

5-5: Possible hardcoded password assigned to: "TEST_API_TOKEN"

(S105)

src/python_redlines/api.py

59-59: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


60-60: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


88-88: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


89-89: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

🪛 Shellcheck (0.11.0)
dotnet-install.sh

[warning] 29-29: bold appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 31-31: black appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 33-33: green appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 35-35: blue appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 36-36: magenta appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 38-38: white appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 141-141: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 146-146: Quote this to prevent word splitting.

(SC2046)


[warning] 175-175: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 194-194: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 287-287: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 288-288: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 351-351: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 402-402: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 403-403: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 405-405: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 448-448: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 477-477: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 502-502: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 542-542: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 581-581: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 705-705: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 740-740: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 777-777: Prefer mapfile or read -a to split command output (or quote to avoid splitting).

(SC2207)


[warning] 778-778: Prefer mapfile or read -a to split command output (or quote to avoid splitting).

(SC2207)


[warning] 939-939: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 954-954: relative_or_absolute_path appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 998-998: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 999-999: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 1001-1001: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 1004-1004: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 1050-1050: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 1221-1221: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 1265-1265: Declare and assign separately to avoid masking return values.

(SC2155)


[error] 1371-1371: Double quote array expansions to avoid re-splitting elements.

(SC2068)


[error] 1384-1384: Double quote array expansions to avoid re-splitting elements.

(SC2068)


[warning] 1425-1425: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1426-1426: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1427-1427: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1469-1469: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1470-1470: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1471-1471: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1479-1479: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1480-1480: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1481-1481: Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.

(SC2206)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1505-1505: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1508-1508: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1508-1508: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1512-1512: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1512-1512: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1514-1514: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1514-1514: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1520-1520: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1520-1520: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1636-1636: args appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 1638-1638: local_version_file_relative_path appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 1724-1724: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1724-1724: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1729-1729: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1729-1729: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1741-1741: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1741-1741: Word is of the form "A"B"C" (B indicated). Did you mean "ABC" or "A"B"C"?

(SC2140)


[warning] 1760-1760: This pattern always overrides a later one on line 1760.

(SC2221)


[warning] 1760-1760: This pattern never matches because of a previous pattern on line 1760.

(SC2222)

🔇 Additional comments (1)
dotnet-install.sh (1)

1-1888: Document vendored script source and version
Unable to fetch the official script automatically; manually verify its origin (e.g. compare with https://github.com/dotnet/install-scripts), determine its commit hash or release tag, and add the source URL and version/commit metadata at the top of dotnet-install.sh.

Comment on lines +29 to 48
def compress_files(source_dir, target_file, arcname=None):
    """
    Compresses files in the specified directory into a tar.gz or zip file.
    """
    if not arcname:
        arcname = os.path.basename(source_dir)

    print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

    if target_file.endswith('.tar.gz'):
        with tarfile.open(target_file, "w:gz") as tar:
-           tar.add(source_dir, arcname=os.path.basename(source_dir))
+           tar.add(source_dir, arcname=arcname)
    elif target_file.endswith('.zip'):
        with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, dirs, files in os.walk(source_dir):
                for file in files:
-                   zipf.write(os.path.join(root, file),
-                              os.path.relpath(os.path.join(root, file),
-                                              os.path.join(source_dir, '..')))
+                   file_path = os.path.join(root, file)
+                   zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
+                   zipf.write(file_path, zip_path)

⚠️ Potential issue | 🟡 Minor

Fix unused variable and add type hints.

Static analysis correctly identifies an unused loop variable.

Apply this diff to fix the Ruff B007 warning:

-def compress_files(source_dir, target_file, arcname=None):
+def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None:
     """
     Compresses files in the specified directory into a tar.gz or zip file.
+
+    Args:
+        source_dir: Directory containing files to compress
+        target_file: Output archive path (.tar.gz or .zip)
+        arcname: Name to use for root directory in archive (defaults to basename of source_dir)
     """
     if not arcname:
         arcname = os.path.basename(source_dir)

     print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

     if target_file.endswith('.tar.gz'):
         with tarfile.open(target_file, "w:gz") as tar:
             tar.add(source_dir, arcname=arcname)
     elif target_file.endswith('.zip'):
         with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
-            for root, dirs, files in os.walk(source_dir):
+            for root, _dirs, files in os.walk(source_dir):
                 for file in files:
                     file_path = os.path.join(root, file)
                     zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
                     zipf.write(file_path, zip_path)

As per coding guidelines: Add type hints using modern Python | syntax instead of Optional.

📝 Committable suggestion


Suggested change
def compress_files(source_dir, target_file, arcname=None):
    """
    Compresses files in the specified directory into a tar.gz or zip file.
    """
    if not arcname:
        arcname = os.path.basename(source_dir)

    print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

    if target_file.endswith('.tar.gz'):
        with tarfile.open(target_file, "w:gz") as tar:
-           tar.add(source_dir, arcname=os.path.basename(source_dir))
+           tar.add(source_dir, arcname=arcname)
    elif target_file.endswith('.zip'):
        with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, dirs, files in os.walk(source_dir):
                for file in files:
-                   zipf.write(os.path.join(root, file),
-                              os.path.relpath(os.path.join(root, file),
-                                              os.path.join(source_dir, '..')))
+                   file_path = os.path.join(root, file)
+                   zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
+                   zipf.write(file_path, zip_path)

def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None:
    """
    Compresses files in the specified directory into a tar.gz or zip file.

    Args:
        source_dir: Directory containing files to compress
        target_file: Output archive path (.tar.gz or .zip)
        arcname: Name to use for root directory in archive (defaults to basename of source_dir)
    """
    if not arcname:
        arcname = os.path.basename(source_dir)

    print(f"Compressing {source_dir} to {target_file} with arcname {arcname}")

    if target_file.endswith('.tar.gz'):
        with tarfile.open(target_file, "w:gz") as tar:
            tar.add(source_dir, arcname=arcname)
    elif target_file.endswith('.zip'):
        with zipfile.ZipFile(target_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
            for root, _dirs, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    zip_path = os.path.join(arcname, os.path.relpath(file_path, source_dir))
                    zipf.write(file_path, zip_path)
🧰 Tools
🪛 Ruff (0.13.3)

43-43: Loop control variable dirs not used within loop body

Rename unused dirs to _dirs

(B007)

🤖 Prompt for AI Agents
In build_differ.py around lines 29 to 48, the compress_files function currently declares an unused loop variable (dirs) and lacks type hints; update the signature to include modern Python type hints (e.g., def compress_files(source_dir: str, target_file: str, arcname: str | None = None) -> None) and replace the unused loop variable with an underscore (for root, _, files in os.walk(source_dir)) to resolve the Ruff B007 warning, keeping the rest of the logic unchanged. 
Comment on lines +1 to +8
version: '3.8'
services:
  api:
    build: .
    ports:
      - "8000:8000"
    volumes:
      - .:/app
\ No newline at end of file

🛠️ Refactor suggestion | 🟠 Major

Add environment variable loading, health checks, and resource limits.

The docker-compose configuration is minimal and lacks several production-ready features:

  1. Missing environment file: The API requires API_TOKEN from environment but the compose file doesn't load .env
  2. No health checks: Container health monitoring is not configured
  3. No resource limits: Memory/CPU constraints are absent
  4. No restart policy: Service won't auto-restart on failure
  5. Development volume mount: The .:/app mount is convenient for development but shouldn't be used in production

Apply this diff to add essential production configurations:

 version: '3.8'
 services:
   api:
     build: .
     ports:
       - "8000:8000"
-    volumes:
-      - .:/app
+    env_file:
+      - .env
+    healthcheck:
+      test: ["CMD", "python", "-c", "import httpx; httpx.get('http://localhost:8000/health', timeout=2)"]
+      interval: 30s
+      timeout: 3s
+      retries: 3
+      start_period: 10s
+    restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          cpus: '2'
+          memory: 2G
+        reservations:
+          cpus: '0.5'
+          memory: 512M

Note: This assumes a /health endpoint exists. Consider adding one to api.py.

🤖 Prompt for AI Agents
In docker-compose.yml around lines 1 to 8, the service definition is missing production-ready settings: add env_file: .env to load API_TOKEN and other secrets, add a healthcheck that polls an existing /health endpoint with sensible interval, timeout and retries, add restart: always (or on-failure) so the service auto-restarts, and add resource limits under deploy.resources.limits (e.g., cpus and memory) to constrain CPU and memory; finally remove or replace the development bind mount (.:/app) with either no volume or a read-only volume for production. Ensure the healthcheck command, env_file path, restart policy and resource values are set to appropriate production-safe values. 
Comment on lines +1 to +38
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Install .NET SDK and other dependencies
RUN apt-get update && \
apt-get install -y wget && \
wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
chmod +x ./dotnet-install.sh && \
./dotnet-install.sh --version 8.0.100 && \
rm dotnet-install.sh

# Add .dotnet to the PATH
ENV PATH="/root/.dotnet:$PATH"

# Copy files required for build
COPY pyproject.toml .
COPY hatch_run_build_hook.py .
COPY build_differ.py .
COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
COPY csproj/ csproj/

# Install hatch
RUN pip install hatch

# Build the project (which includes running build_differ.py)
RUN hatch run default:build

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Start the API server
CMD ["hatch", "run", "api:start"] No newline at end of file

⚠️ Potential issue | 🔴 Critical

Critical security issues: running as root and no multi-stage build.

The Dockerfile has several security and optimization issues:

  1. Runs as root: Container runs as root user (security risk)
  2. Single-stage build: Includes build tools in final image (bloat)
  3. No health check: Container health monitoring not configured
  4. Build-time network dependency: Downloads installer during build

Apply this diff to create a secure multi-stage build:

-# Use an official Python runtime as a parent image
-FROM python:3.11-slim
+# Multi-stage build for smaller, more secure images
+FROM python:3.11-slim AS builder

 # Set the working directory in the container
 WORKDIR /app

 # Install .NET SDK and other dependencies
 RUN apt-get update && \
     apt-get install -y wget && \
     wget https://dot.net/v1/dotnet-install.sh -O dotnet-install.sh && \
     chmod +x ./dotnet-install.sh && \
     ./dotnet-install.sh --version 8.0.100 && \
-    rm dotnet-install.sh
+    rm dotnet-install.sh && \
+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*

 # Add .dotnet to the PATH
 ENV PATH="/root/.dotnet:$PATH"

 # Copy files required for build
 COPY pyproject.toml .
 COPY hatch_run_build_hook.py .
 COPY build_differ.py .
 COPY src/python_redlines/__about__.py src/python_redlines/__about__.py
 COPY csproj/ csproj/

 # Install hatch
 RUN pip install hatch

 # Build the project (which includes running build_differ.py)
 RUN hatch run default:build

+# Final stage - runtime image
+FROM python:3.11-slim
+
+WORKDIR /app
+
+# Create non-root user
+RUN groupadd -r appuser && useradd -r -g appuser appuser && \
+    chown -R appuser:appuser /app
+
+# Copy built artifacts from builder
+COPY --from=builder --chown=appuser:appuser /app/dist ./dist
+COPY --from=builder --chown=appuser:appuser /app/src ./src
+
 # Copy the rest of the application code
-COPY . .
+COPY --chown=appuser:appuser pyproject.toml .
+COPY --chown=appuser:appuser README.md .
+
+# Install runtime dependencies only
+RUN pip install --no-cache-dir hatch && \
+    pip install --no-cache-dir -e .
+
+# Switch to non-root user
+USER appuser

 # Expose the port the app runs on
 EXPOSE 8000

+# Add health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
+    CMD python -c "import httpx; httpx.get('http://localhost:8000/health', timeout=2)" || exit 1
+
 # Start the API server
 CMD ["hatch", "run", "api:start"]
🤖 Prompt for AI Agents
In Dockerfile lines 1-38, the image currently runs as root, performs build and downloads installers in the final image, and lacks a HEALTHCHECK; convert to a secure multi-stage build by creating a builder stage (use an official dotnet SDK base like mcr.microsoft.com/dotnet/sdk:8.0 or include the dotnet installer in build context so no external network fetch at build time), perform all apt installs, dotnet and Python build steps and hatch build in that builder, produce only the runtime artifacts into a minimal final stage based on python:3.11-slim (no build tools), create a non-root user in the final stage and set USER to it, copy only needed files from builder, install runtime-only Python deps with pip --no-cache-dir, and add a HEALTHCHECK instruction to probe the application endpoint; ensure no wget/dotnet installer downloads happen in the final stage and remove build-time tools from the final image. 
Comment on lines +1 to +27
import os
import secrets
import zipfile
import io
from typing import Annotated

from dotenv import load_dotenv
from fastapi import (
    FastAPI,
    File,
    UploadFile,
    Form,
    Response,
    Depends,
    HTTPException,
    status,
    Header,
)
from fastapi.responses import StreamingResponse
from python_redlines.engines import XmlPowerToolsEngine

load_dotenv()

app = FastAPI()

engine = XmlPowerToolsEngine()


🛠️ Refactor suggestion | 🟠 Major

Add essential API features: CORS, health check, rate limiting.

The API lacks several production-ready features.

Add these features to the API:

import os
import time
from collections import defaultdict
from datetime import datetime
from threading import Lock

from fastapi import FastAPI, HTTPException, Request, status
from fastapi.middleware.cors import CORSMiddleware
from fastapi.middleware.trustedhost import TrustedHostMiddleware
from loguru import logger

app = FastAPI(
    title="Python Redlines API",
    description="API for generating redlined Word documents",
    version="1.0.0",
    docs_url="/docs",
    redoc_url="/redoc",
)

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.getenv("CORS_ORIGINS", "*").split(","),
    allow_credentials=True,
    allow_methods=["POST"],
    allow_headers=["*"],
)

# Trusted host middleware for security
app.add_middleware(
    TrustedHostMiddleware,
    allowed_hosts=os.getenv("ALLOWED_HOSTS", "*").split(","),
)

# Simple rate limiting (consider using slowapi for production)
request_counts = defaultdict(list)
rate_limit_lock = Lock()


@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    client_ip = request.client.host
    current_time = time.time()

    with rate_limit_lock:
        # Clean old requests (older than 1 minute)
        request_counts[client_ip] = [
            t for t in request_counts[client_ip] if current_time - t < 60
        ]

        # Check rate limit (e.g., 10 requests per minute)
        if len(request_counts[client_ip]) >= 10:
            logger.warning(f"Rate limit exceeded for {client_ip}")
            raise HTTPException(
                status_code=status.HTTP_429_TOO_MANY_REQUESTS,
                detail="Rate limit exceeded. Try again later.",
            )

        request_counts[client_ip].append(current_time)

    response = await call_next(request)
    return response


@app.get("/health")
async def health_check():
    """Health check endpoint for container orchestration."""
    return {
        "status": "healthy",
        "timestamp": datetime.utcnow().isoformat(),
        "version": "1.0.0",
    }

⚠️ Potential issue | 🟠 Major

Critical: Authentication should use OAuth2/OIDC, not simple bearer tokens.

The current implementation uses a basic token comparison, which doesn't meet the authentication requirements specified in the coding guidelines.

As per coding guidelines: "Always ensure authentication is properly implemented via oauth2 or oidc."

The current token-based auth is insufficient for production use. Consider implementing OAuth2 with JWT tokens:

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "fastapi",
#     "python-jose[cryptography]",
#     "passlib[bcrypt]",
#     "python-multipart",
#     "loguru",
#     "python-dotenv",
# ]
# ///
import os
from datetime import datetime, timedelta
from typing import Annotated

from fastapi import Depends, FastAPI, HTTPException, status
from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
from jose import JWTError, jwt
from passlib.context import CryptContext
from loguru import logger

# OAuth2 configuration
SECRET_KEY = os.getenv("SECRET_KEY")  # Different from API_TOKEN
ALGORITHM = "HS256"
ACCESS_TOKEN_EXPIRE_MINUTES = 30

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="token")
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")


async def get_current_user(token: Annotated[str, Depends(oauth2_scheme)]):
    credentials_exception = HTTPException(
        status_code=status.HTTP_401_UNAUTHORIZED,
        detail="Could not validate credentials",
        headers={"WWW-Authenticate": "Bearer"},
    )
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
        username: str = payload.get("sub")
        if username is None:
            raise credentials_exception
    except JWTError:
        raise credentials_exception
    return username


@app.post("/token")  # `app` is the existing FastAPI instance in api.py
async def login(form_data: Annotated[OAuth2PasswordRequestForm, Depends()]):
    # Implement user authentication logic here
    # This is a placeholder
    access_token = create_access_token(data={"sub": form_data.username})
    return {"access_token": access_token, "token_type": "bearer"}

Then update endpoints to use Depends(get_current_user) instead of Depends(verify_token).
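The snippet above deliberately leaves create_access_token undefined. A minimal sketch, assuming the SECRET_KEY, ALGORITHM, and ACCESS_TOKEN_EXPIRE_MINUTES constants from that snippet, could look like:

from datetime import datetime, timedelta, timezone

from jose import jwt


def create_access_token(data: dict, expires_delta: timedelta | None = None) -> str:
    """Encode a signed JWT carrying `data` plus an expiry claim."""
    to_encode = data.copy()
    expire = datetime.now(timezone.utc) + (
        expires_delta or timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    )
    to_encode.update({"exp": expire})
    return jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM)

Keeping the expiry claim in the token is what lets jwt.decode in get_current_user reject stale credentials automatically.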


🛠️ Refactor suggestion | 🟠 Major

Add comprehensive logging and observability.

The API lacks structured logging, which is critical for debugging and monitoring in production.

Apply this diff to add loguru-based logging:

+#!/usr/bin/env -S uv run
+# /// script
+# requires-python = ">=3.12"
+# dependencies = [
+#     "fastapi",
+#     "uvicorn",
+#     "python-multipart",
+#     "python-dotenv",
+#     "loguru",
+# ]
+# ///
+
 import os
 import secrets
 import zipfile
 import io
 from typing import Annotated

 from dotenv import load_dotenv
+from loguru import logger
 from fastapi import (
     FastAPI,
     File,
     UploadFile,
     Form,
     Response,
     Depends,
     HTTPException,
     status,
     Header,
+    Request,
 )
 from fastapi.responses import StreamingResponse
 from python_redlines.engines import XmlPowerToolsEngine

 load_dotenv()

+# Configure logging
+logger.add(
+    "logs/api_{time:YYYY-MM-DD}.log",
+    rotation="1 day",
+    retention="30 days",
+    level=os.getenv("LOG_LEVEL", "INFO"),
+)
+
 app = FastAPI()

+
+@app.middleware("http")
+async def log_requests(request: Request, call_next):
+    logger.info(f"Request: {request.method} {request.url.path}")
+    try:
+        response = await call_next(request)
+        logger.info(f"Response: {response.status_code}")
+        return response
+    except Exception as e:
+        logger.error(f"Request failed: {e}")
+        raise

As per coding guidelines: Implement loguru for structured logging.

📝 Committable suggestion


Suggested change
import os
import secrets
import zipfile
import io
from typing import Annotated

from dotenv import load_dotenv
from fastapi import (
    FastAPI,
    File,
    UploadFile,
    Form,
    Response,
    Depends,
    HTTPException,
    status,
    Header,
)
from fastapi.responses import StreamingResponse
from python_redlines.engines import XmlPowerToolsEngine

load_dotenv()

app = FastAPI()

engine = XmlPowerToolsEngine()

#!/usr/bin/env -S uv run
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "fastapi",
#     "uvicorn",
#     "python-multipart",
#     "python-dotenv",
#     "loguru",
# ]
# ///
import os
import secrets
import zipfile
import io
from typing import Annotated

from dotenv import load_dotenv
from loguru import logger
from fastapi import (
    FastAPI,
    File,
    UploadFile,
    Form,
    Response,
    Depends,
    HTTPException,
    status,
    Header,
    Request,
)
from fastapi.responses import StreamingResponse
from python_redlines.engines import XmlPowerToolsEngine

load_dotenv()

# Configure logging
logger.add(
    "logs/api_{time:YYYY-MM-DD}.log",
    rotation="1 day",
    retention="30 days",
    level=os.getenv("LOG_LEVEL", "INFO"),
)

app = FastAPI()


@app.middleware("http")
async def log_requests(request: Request, call_next):
    logger.info(f"Request: {request.method} {request.url.path}")
    try:
        response = await call_next(request)
        logger.info(f"Response: {response.status_code}")
        return response
    except Exception as e:
        logger.error(f"Request failed: {e}")
        raise


engine = XmlPowerToolsEngine()
🤖 Prompt for AI Agents
In src/python_redlines/api.py lines 1-27, add structured loguru-based logging and basic observability: import logger from loguru and configure a sink (stdout/stderr) with level read from environment (e.g., LOG_LEVEL), JSON formatting or structured output, and sensible rotation/retention; add a helper to bind request_id (generate via secrets.token_hex) and timestamp to each request; register FastAPI middleware that on request creates a bound logger and logs method, path, headers (non-sensitive), and on response logs status_code and duration; add an exception handler that logs stack traces and error context before raising HTTPException; ensure any engine creation or startup (engine = XmlPowerToolsEngine()) is logged with success/failure and error details; do not leave prints—use logger.info/debug/error with structured fields. 
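A minimal sketch of the request-id binding this prompt asks for, assuming loguru and the existing app instance; the middleware name and hex length are illustrative:

import secrets
import time

from fastapi import Request
from loguru import logger


@app.middleware("http")
async def log_requests_with_id(request: Request, call_next):
    # Hypothetical per-request correlation id, bound so every log line carries it
    request_log = logger.bind(request_id=secrets.token_hex(8))
    request_log.info(f"Request: {request.method} {request.url.path}")
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = (time.perf_counter() - start) * 1000
    request_log.info(f"Response: {response.status_code} in {duration_ms:.1f} ms")
    return response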
Comment on lines +57 to +84
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
original: UploadFile = File(...),
modified: UploadFile = File(...),
author_tag: str = Form("Author"),
):
"""
Accepts two .docx files and returns a single redlined .docx file.
"""
original_content = await original.read()
modified_content = await modified.read()

if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
raise HTTPException(
status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
)

redline_output, _, _ = engine.run_redline(
author_tag, original_content, modified_content
)

return Response(
content=redline_output,
media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
headers={"Content-Disposition": "attachment; filename=redline.docx"},
)


⚠️ Potential issue | 🟠 Major

Use async file operations and add request validation.

The endpoint uses synchronous file reading and lacks content-type validation.

The Ruff B008 warnings about File() in defaults are false positives for FastAPI - this is the correct FastAPI pattern. However, the synchronous .read() is a real issue.

Apply this diff:

 @app.post("/redline/single", dependencies=[Depends(verify_token)]) async def redline_single( - original: UploadFile = File(...), - modified: UploadFile = File(...), + original: Annotated[UploadFile, File(description="Original .docx file")], + modified: Annotated[UploadFile, File(description="Modified .docx file")], author_tag: str = Form("Author"), -): +) -> Response: """ Accepts two .docx files and returns a single redlined .docx file. +  + Args: + original: Original Word document + modified: Modified Word document + author_tag: Author name for tracking (default: "Author") +  + Returns: + Response: Redlined document as .docx +  + Raises: + HTTPException: If file size exceeds limit or content type is invalid """ - original_content = await original.read() - modified_content = await modified.read() + # Validate content types + valid_content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document" + if original.content_type not in [valid_content_type, "application/octet-stream"]: + logger.warning(f"Invalid content type for original: {original.content_type}") + raise HTTPException( + status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE, + detail="Original file must be a .docx file" + ) + if modified.content_type not in [valid_content_type, "application/octet-stream"]: + logger.warning(f"Invalid content type for modified: {modified.content_type}") + raise HTTPException( + status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE, + detail="Modified file must be a .docx file" + ) +  + # Validate author_tag to prevent injection + if not author_tag or len(author_tag) > 100 or not author_tag.isprintable(): + logger.warning(f"Invalid author_tag: {author_tag}") + raise HTTPException( + status_code=status.HTTP_400_BAD_REQUEST, + detail="Invalid author_tag: must be 1-100 printable characters" + ) +  + logger.info(f"Processing single redline request: {original.filename} vs {modified.filename}") +  + try: + original_content = await original.read() + modified_content = await modified.read() + except Exception as e: + logger.error(f"Failed to read uploaded files: {e}") + raise HTTPException( + status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, + detail="Failed to read uploaded files" + ) if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE: + logger.warning(f"File size exceeded: original={len(original_content)}, modified={len(modified_content)}") raise HTTPException( status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE, detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB", ) - redline_output, _, _ = engine.run_redline( - author_tag, original_content, modified_content - ) + try: + redline_output, _, _ = engine.run_redline( + author_tag, original_content, modified_content + ) + except Exception as e: + logger.error(f"Redline generation failed: {e}") + raise HTTPException( + status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, + detail="Failed to generate redline document" + ) +  + logger.success(f"Successfully generated single redline") return Response( content=redline_output, media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document", headers={"Content-Disposition": "attachment; filename=redline.docx"}, )

Note: The UploadFile.read() is already async in Starlette/FastAPI, so using await is correct. The issue is that we need better error handling and validation.

📝 Committable suggestion


Suggested change
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
original: UploadFile = File(...),
modified: UploadFile = File(...),
author_tag: str = Form("Author"),
):
"""
Accepts two .docx files and returns a single redlined .docx file.
"""
original_content = await original.read()
modified_content = await modified.read()
if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
raise HTTPException(
status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
)
redline_output, _, _ = engine.run_redline(
author_tag, original_content, modified_content
)
return Response(
content=redline_output,
media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
headers={"Content-Disposition": "attachment; filename=redline.docx"},
)
@app.post("/redline/single", dependencies=[Depends(verify_token)])
async def redline_single(
original: Annotated[UploadFile, File(description="Original .docx file")],
modified: Annotated[UploadFile, File(description="Modified .docx file")],
author_tag: str = Form("Author"),
) -> Response:
"""
Accepts two .docx files and returns a single redlined .docx file.
Args:
original: Original Word document
modified: Modified Word document
author_tag: Author name for tracking (default: "Author")
Returns:
Response: Redlined document as .docx
Raises:
HTTPException: If file size exceeds limit or content type is invalid
"""
# Validate content types
valid_content_type = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
if original.content_type not in [valid_content_type, "application/octet-stream"]:
logger.warning(f"Invalid content type for original: {original.content_type}")
raise HTTPException(
status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
detail="Original file must be a .docx file"
)
if modified.content_type not in [valid_content_type, "application/octet-stream"]:
logger.warning(f"Invalid content type for modified: {modified.content_type}")
raise HTTPException(
status_code=status.HTTP_415_UNSUPPORTED_MEDIA_TYPE,
detail="Modified file must be a .docx file"
)
# Validate author_tag to prevent injection
if not author_tag or len(author_tag) > 100 or not author_tag.isprintable():
logger.warning(f"Invalid author_tag: {author_tag}")
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Invalid author_tag: must be 1-100 printable characters"
)
logger.info(f"Processing single redline request: {original.filename} vs {modified.filename}")
try:
original_content = await original.read()
modified_content = await modified.read()
except Exception as e:
logger.error(f"Failed to read uploaded files: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Failed to read uploaded files"
)
if len(original_content) > MAX_FILE_SIZE or len(modified_content) > MAX_FILE_SIZE:
logger.warning(f"File size exceeded: original={len(original_content)}, modified={len(modified_content)}")
raise HTTPException(
status_code=status.HTTP_413_REQUEST_ENTITY_TOO_LARGE,
detail=f"File size exceeds the limit of {MAX_FILE_SIZE // (1024*1024)}MB",
)
try:
redline_output, _, _ = engine.run_redline(
author_tag, original_content, modified_content
)
except Exception as e:
logger.error(f"Redline generation failed: {e}")
raise HTTPException(
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
detail="Failed to generate redline document"
)
logger.success("Successfully generated single redline")
return Response(
content=redline_output,
media_type="application/vnd.openxmlformats-officedocument.wordprocessingml.document",
headers={"Content-Disposition": "attachment; filename=redline.docx"},
)
🧰 Tools
🪛 Ruff (0.13.3)

59-59: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


60-60: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)

Comment on lines +18 to +33
@pytest.mark.asyncio
async def test_redline_single(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    assert "attachment; filename=redline.docx" in response.headers["content-disposition"]


🛠️ Refactor suggestion | 🟠 Major

Add docstring and validate response body content.

The test correctly validates the happy path but is missing:

  1. Docstring (coding guideline requirement)
  2. Response body validation (should verify it's a valid DOCX)

Apply this diff:

 @pytest.mark.asyncio
 async def test_redline_single(headers):
+    """Test /redline/single endpoint returns a valid redlined DOCX with correct headers."""
     async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
         with open("tests/test_data/original.docx", "rb") as original_file:
             with open("tests/test_data/modified.docx", "rb") as modified_file:
                 files = {
                     "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                     "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                 }
                 data = {"author_tag": "test_author"}
                 response = await client.post("/redline/single", files=files, data=data, headers=headers)

     assert response.status_code == 200
     assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
     assert "attachment; filename=redline.docx" in response.headers["content-disposition"]
+    # Verify response body is non-empty and has DOCX signature
+    assert len(response.content) > 0
+    assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP/DOCX file"
📝 Committable suggestion


Suggested change
@pytest.mark.asyncio
async def test_redline_single(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    assert "attachment; filename=redline.docx" in response.headers["content-disposition"]

@pytest.mark.asyncio
async def test_redline_single(headers):
    """Test /redline/single endpoint returns a valid redlined DOCX with correct headers."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
    assert "attachment; filename=redline.docx" in response.headers["content-disposition"]
    # Verify response body is non-empty and has DOCX signature
    assert len(response.content) > 0
    assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP/DOCX file"
🤖 Prompt for AI Agents
In tests/test_api.py around lines 18 to 33, the test function lacks a docstring and doesn't validate the response body is a valid DOCX; add a one-line docstring describing the test, then after asserting status and headers load response.content into an io.BytesIO and open it with zipfile.ZipFile to verify it is a valid ZIP-based DOCX by asserting the presence of expected entries (e.g., "[Content_Types].xml" and "word/document.xml") and that the file size/content is non-empty; raise/assert appropriately if ZipFile fails or entries are missing. 
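A sketch of the zipfile check that prompt describes; assert_valid_docx is a hypothetical helper, while the two entry names are standard OOXML package parts:

import io
import zipfile


def assert_valid_docx(content: bytes) -> None:
    """Assert the bytes form a non-empty ZIP package containing the core DOCX parts."""
    assert len(content) > 0
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        names = archive.namelist()
        assert "[Content_Types].xml" in names
        assert "word/document.xml" in names

Calling assert_valid_docx(response.content) at the end of test_redline_single covers both the signature and structure checks in one place.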
Comment on lines +35 to +50
@pytest.mark.asyncio
async def test_redline_double(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/double", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/zip"
    assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]


🛠️ Refactor suggestion | 🟠 Major

Add docstring and validate ZIP response structure.

Similar to test_redline_single, this test needs docstring and response body validation.

Apply this diff:

 @pytest.mark.asyncio
 async def test_redline_double(headers):
+    """Test /redline/double endpoint returns a ZIP with two redlined DOCX files."""
     async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
         with open("tests/test_data/original.docx", "rb") as original_file:
             with open("tests/test_data/modified.docx", "rb") as modified_file:
                 files = {
                     "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                     "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                 }
                 data = {"author_tag": "test_author"}
                 response = await client.post("/redline/double", files=files, data=data, headers=headers)

     assert response.status_code == 200
     assert response.headers["content-type"] == "application/zip"
     assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]
+    # Verify response is a valid ZIP
+    assert len(response.content) > 0
+    assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP file"
📝 Committable suggestion


Suggested change
@pytest.mark.asyncio
async def test_redline_double(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/double", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/zip"
    assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]

@pytest.mark.asyncio
async def test_redline_double(headers):
    """Test /redline/double endpoint returns a ZIP with two redlined DOCX files."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/double", files=files, data=data, headers=headers)

    assert response.status_code == 200
    assert response.headers["content-type"] == "application/zip"
    assert "attachment; filename=redlines.zip" in response.headers["content-disposition"]
    # Verify response is a valid ZIP
    assert len(response.content) > 0
    assert response.content[:4] == b"PK\x03\x04", "Response should be a valid ZIP file"
Comment on lines +52 to +57
@pytest.mark.asyncio
async def test_redline_single_no_token():
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        response = await client.post("/redline/single")
        assert response.status_code == 422  # Validation error for missing header


🛠️ Refactor suggestion | 🟠 Major

Add docstring to security test.

The test correctly validates missing token handling but needs documentation.

Apply this diff:

 @pytest.mark.asyncio
 async def test_redline_single_no_token():
+    """Test /redline/single returns 422 validation error when X-Api-Token header is missing."""
     async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
         response = await client.post("/redline/single")
         assert response.status_code == 422  # Validation error for missing header
📝 Committable suggestion


Suggested change
@pytest.mark.asyncio
async def test_redline_single_no_token():
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        response = await client.post("/redline/single")
        assert response.status_code == 422  # Validation error for missing header

@pytest.mark.asyncio
async def test_redline_single_no_token():
    """Test /redline/single returns 422 validation error when X-Api-Token header is missing."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        response = await client.post("/redline/single")
        assert response.status_code == 422  # Validation error for missing header
🤖 Prompt for AI Agents
In tests/test_api.py around lines 52 to 57, the test function test_redline_single_no_token lacks a docstring explaining its intent; add a one-line docstring directly under the async def that describes that this test asserts the endpoint returns 422 when the authentication header/token is missing, e.g. "Ensure POST /redline/single returns 422 when Authorization token is not provided." Keep it concise and place it before the test body. 
Comment on lines +59 to +72
@pytest.mark.asyncio
async def test_redline_single_invalid_token():
    headers = {"X-Api-Token": "invalid-token"}
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)
                assert response.status_code == 401


🛠️ Refactor suggestion | 🟠 Major

Add docstring to security test.

The test correctly validates invalid token handling but needs documentation.

Apply this diff:

 @pytest.mark.asyncio
 async def test_redline_single_invalid_token():
+    """Test /redline/single returns 401 unauthorized when X-Api-Token header has invalid value."""
     headers = {"X-Api-Token": "invalid-token"}
     async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
         with open("tests/test_data/original.docx", "rb") as original_file:
📝 Committable suggestion


Suggested change
@pytest.mark.asyncio
async def test_redline_single_invalid_token():
    headers = {"X-Api-Token": "invalid-token"}
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)
                assert response.status_code == 401

@pytest.mark.asyncio
async def test_redline_single_invalid_token():
    """Test /redline/single returns 401 unauthorized when X-Api-Token header has invalid value."""
    headers = {"X-Api-Token": "invalid-token"}
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        with open("tests/test_data/original.docx", "rb") as original_file:
            with open("tests/test_data/modified.docx", "rb") as modified_file:
                files = {
                    "original": ("original.docx", original_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                    "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
                }
                data = {"author_tag": "test_author"}
                response = await client.post("/redline/single", files=files, data=data, headers=headers)
                assert response.status_code == 401
🤖 Prompt for AI Agents
In tests/test_api.py around lines 59 to 72, the security test lacks a docstring; add a one-line docstring immediately below the test function definition describing purpose (e.g., "Ensure endpoint returns 401 for invalid API token when submitting DOCX files"), keep it brief and in present tense, and ensure it does not change test behavior or indentation. 
Comment on lines +74 to +86
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        files = {
            "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)

    assert response.status_code == 413
\ No newline at end of file

⚠️ Potential issue | 🔴 Critical

Fix critical resource leak and add docstring.

This test leaks a file handle: on line 81, tests/test_data/modified.docx is opened inline and never closed. Leaked handles cause failures on Windows, where open files cannot be reopened or deleted, and under concurrent test execution.

Apply this diff to fix the leak and add a docstring:

 @pytest.mark.asyncio
 async def test_file_size_limit(headers):
+    """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
     async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
         # Create a dummy file larger than the limit
         large_file_content = b"a" * (MAX_FILE_SIZE + 1)
-        files = {
-            "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
-            "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
-        }
-        data = {"author_tag": "test_author"}
-        response = await client.post("/redline/single", files=files, data=data, headers=headers)
+        with open("tests/test_data/modified.docx", "rb") as modified_file:
+            files = {
+                "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+                "modified": ("modified.docx", modified_file, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
+            }
+            data = {"author_tag": "test_author"}
+            response = await client.post("/redline/single", files=files, data=data, headers=headers)

         assert response.status_code == 413
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        files = {
            "original": ("large.docx", large_file_content, "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
            "modified": ("modified.docx", open("tests/test_data/modified.docx", "rb"), "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)
        assert response.status_code == 413
@pytest.mark.asyncio
async def test_file_size_limit(headers):
    """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        # Create a dummy file larger than the limit
        large_file_content = b"a" * (MAX_FILE_SIZE + 1)
        with open("tests/test_data/modified.docx", "rb") as modified_file:
            files = {
                "original": (
                    "large.docx",
                    large_file_content,
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                ),
                "modified": (
                    "modified.docx",
                    modified_file,
                    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                ),
            }
            data = {"author_tag": "test_author"}
            response = await client.post(
                "/redline/single", files=files, data=data, headers=headers
            )
        assert response.status_code == 413
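
For context on the 413 assertion, the server-side guard could look roughly like the sketch below. This is a hypothetical illustration of the kind of check that returns HTTP 413 in FastAPI; the constant name, helper name, and signature are assumptions, not the PR's actual api.py.

from fastapi import HTTPException, UploadFile

MAX_FILE_SIZE = 10 * 1024 * 1024  # the 10MB limit described in the PR

async def read_limited(upload: UploadFile) -> bytes:
    # Read the whole upload into memory, then reject anything over the limit
    # with HTTP 413 (Payload Too Large), the status this test asserts.
    content = await upload.read()
    if len(content) > MAX_FILE_SIZE:
        raise HTTPException(status_code=413, detail="File exceeds the 10MB limit")
    return content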
🤖 Prompt for AI Agents
In tests/test_api.py around lines 74 to 86, test_file_size_limit opens tests/test_data/modified.docx without closing it (line ~81) and lacks a docstring; change the test to open the file using a context manager (with open(...) as modified_file:) so the file is closed after constructing the files payload (or alternatively load its bytes before building files), and add a brief docstring at the top of the test describing what it validates (e.g., "Ensure uploading a file larger than MAX_FILE_SIZE returns 413").
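
The alternative noted in the prompt, reading the fixture's bytes before building the payload, also removes the open handle. A minimal sketch under the same assumptions as the tests above; the api import path and the headers fixture are assumptions:

from pathlib import Path

import pytest
from httpx import ASGITransport, AsyncClient

from api import MAX_FILE_SIZE, app  # assumed import path

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

@pytest.mark.asyncio
async def test_file_size_limit(headers):
    """Test /redline/single returns 413 when original file exceeds MAX_FILE_SIZE."""
    # Load the fixture bytes up front so no file handle is held open.
    modified_bytes = Path("tests/test_data/modified.docx").read_bytes()
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
        files = {
            "original": ("large.docx", b"a" * (MAX_FILE_SIZE + 1), DOCX_MIME),
            "modified": ("modified.docx", modified_bytes, DOCX_MIME),
        }
        data = {"author_tag": "test_author"}
        response = await client.post("/redline/single", files=files, data=data, headers=headers)
    assert response.status_code == 413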