Artem Dudarev
SpeechDown CLI: Playground for Software Craft and AI Collaboration

I've been working on a personal project called SpeechDown, a CLI tool that turns my voice notes into timestamped, multilingual Markdown files I can actually search and revisit. The aim isn’t to launch the next blockbuster transcription service—it’s to give myself a dependable way to capture ideas on the go in a structured format. For the last couple of years I’ve relied on its predecessor, voice-cli, which proved how powerful that workflow can be. SpeechDown is the natural successor and, yes, a playground for practicing software-craft principles and experimenting with AI-driven development.

This post is a brief tour of that journey so far.

A Quick Disclaimer

First things first: I don't recommend using SpeechDown for any critical work just yet. It's a work in progress. However, I believe the code and the development practices behind it can serve as a useful, real-world example for the concepts I'm about to discuss.

You can find the full source code on GitHub: dudarev/speechdown.

TL;DR Overview

  • Purpose

    • Capture and organize my own voice notes in a searchable Markdown corpus
    • Personal sandbox to practice software-craft principles
    • Test-bed for AI-assisted coding workflows
  • Architecture in a nutshell

    • Domain-Driven Design (DDD) keeps core logic pure and language-aligned
    • Ports & Adapters (Hexagonal) pattern isolates I/O, letting adapters swap freely
    • Four layers: domain, application, infrastructure, presentation
  • Process discipline

    • Architecture Decision Records (ADRs) capture the “why” of each big choice
    • Design / PRD docs outline features up front for both humans and AIs
  • AI collaboration model

    • Design docs serve as rich prompts for Copilot, Codex, Claude Code, etc.
    • A single `AI-rules.md` file synchronizes naming, layout, and testing rules across tools
  • Current capabilities

    • `sd transcribe --within-hours 24` turns recent audio into timestamped Markdown
    • Adding a new speech-to-text engine is as simple as implementing another adapter
  • Status

    • Pre-v1 playground: solid for learning and tinkering, not yet production-grade
  • Further reading

    • See the Relevant Links section at the end for a curated set of recent deep-dive posts and tools that extend these ideas.

Part 1: A Playground for Software Craftsmanship

One of my main goals with SpeechDown was to apply and practice established software design patterns in a Python context.

Domain-Driven Design (DDD) & Ports and Adapters Pattern

I structured the project using a layered architecture inspired by DDD and the Ports and Adapters (or Hexagonal) pattern. This helps keep the core logic of the application separate from the tools and technologies it uses.

The project is split into four distinct layers:

  • domain: Contains the core business logic, entities, and value objects. It has zero external dependencies.
  • application: Orchestrates the use cases. It defines interfaces (Ports) for external interactions.
  • infrastructure: Provides concrete implementations (Adapters) for the ports. This is where database connections, file system access, and API calls live.
  • presentation: The user-facing layer, in this case, the Command-Line Interface (CLI).

This structure is reflected in the source code directory:

```
src/speechdown/
├── application/
│   ├── ports/
│   └── services/
├── domain/
│   ├── entities.py
│   └── value_objects.py
├── infrastructure/
│   ├── adapters/
│   └── database.py
└── presentation/
    └── cli/
```

A Port is just an interface. For example, to get a timestamp from a file, the application layer defines a simple contract:

```python
# src/speechdown/application/ports/timestamp_port.py
from datetime import datetime
from pathlib import Path
from typing import Protocol


class TimestampPort(Protocol):
    def get_timestamp(self, path: Path) -> datetime:
        """Return timestamp extracted from filename or fallback to file mtime."""
        ...
```

The Adapter is the concrete implementation. This one parses filenames or falls back to the file's modification time:

```python
# src/speechdown/infrastructure/adapters/file_timestamp_adapter.py
import re
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path

from speechdown.application.ports.timestamp_port import TimestampPort

# ...


@dataclass
class FileTimestampAdapter(TimestampPort):
    """Adapter for extracting timestamps from filenames with fallbacks."""

    def get_timestamp(self, path: Path) -> datetime:
        # Try to extract from filename
        extracted = self._extract_from_filename(path.name)
        if extracted:
            return extracted
        # Fall back to file modification time
        return self._get_file_fallback_time(path)

    # ... implementation details ...
```
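The elided helpers hold the actual parsing logic. As a rough illustration only (the pattern below is hypothetical; the real adapter's rules may differ), `_extract_from_filename` could be built around a regex like this:

```python
import re
from datetime import datetime

# Hypothetical pattern matching names like "2025-06-13 14.30.22 voice note.m4a".
# The real adapter in the repository may recognize different formats.
_TIMESTAMP_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})[ _](\d{2})[.:](\d{2})[.:](\d{2})")


def extract_from_filename(name: str) -> datetime | None:
    """Return a datetime parsed from the filename, or None if nothing matches."""
    match = _TIMESTAMP_RE.search(name)
    if match is None:
        return None
    year, month, day, hour, minute, second = (int(part) for part in match.groups())
    return datetime(year, month, day, hour, minute, second)
```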

This separation makes the system flexible and testable. I can swap out the `FileTimestampAdapter` for one that reads metadata from the audio file without changing any of the application's core logic.
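To make that swap concrete, here is a minimal sketch. The metadata adapter and the `collect_timestamps` function are hypothetical, not actual SpeechDown code; because `TimestampPort` is a `Protocol`, the new adapter satisfies it structurally without inheriting from anything:

```python
from datetime import datetime
from pathlib import Path

from speechdown.application.ports.timestamp_port import TimestampPort


class AudioMetadataTimestampAdapter:
    """Hypothetical adapter that would read a timestamp from audio metadata.

    Sketch only: a real version might parse the container's creation tag;
    this one just falls back to the file's modification time.
    """

    def get_timestamp(self, path: Path) -> datetime:
        return datetime.fromtimestamp(path.stat().st_mtime)


def collect_timestamps(paths: list[Path], timestamps: TimestampPort) -> dict[Path, datetime]:
    # Application-layer code sees only the port, never the concrete adapter.
    return {path: timestamps.get_timestamp(path) for path in paths}
```

The same structural typing keeps unit tests simple: a stub with a `get_timestamp` method is all a test needs.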

Documenting Decisions with ADRs and Design Docs

To keep track of why certain decisions were made, I use Architecture Decision Records (ADRs). They are simple Markdown files that document a decision, its context, and its consequences. You can see them in `docs/adrs/`.
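If you haven't used ADRs before, the format is deliberately lightweight. A record looks roughly like this (illustrative content, not an actual SpeechDown ADR):

```markdown
# ADR NNN: Adopt a four-layer architecture

## Status
Accepted

## Context
Core logic was entangled with I/O, making tests slow and changes risky.

## Decision
Split the code into domain, application, infrastructure, and presentation
layers, with dependencies pointing inward.

## Consequences
Core logic is testable in isolation; adapters can be swapped without
touching it. Some boilerplate is added for ports and wiring.
```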

For more detailed feature planning, I use Design Documents, which outline the how—covering product requirements, UX, and technical design. This practice is especially useful when working with AI assistants.

Part 2: A Playground for AI Collaboration

The second major goal of SpeechDown is to explore how to work effectively with modern AI coding assistants. Simply asking an AI to "add a feature" often results in code that breaks the established architecture.

My solution involves two key practices:

1. Design Documents (PRDs) as AI Prompts

I write detailed design documents before starting a feature. These documents serve as a comprehensive prompt for the AI, giving it the necessary context to generate code that fits the project's structure. I'm considering renaming my design folder to `prds` (Product Requirements Documents), as this seems to be emerging as a standard term for this practice.
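The exact template matters less than answering the questions an assistant would otherwise guess at. A skeleton along these lines (illustrative, not the project's actual template) covers the essentials:

```markdown
# Design: <feature name>

## Problem / Motivation
What need does this feature serve, and why now?

## Requirements
- Functional requirements, edge cases, explicit non-goals

## UX
CLI flags, example invocations, expected output

## Technical Design
- Which layer each change lives in (domain, application, infrastructure, presentation)
- New ports or adapters, named per the project's conventions

## Testing
Unit tests per layer; fixtures or sample audio needed
```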

2. Explicit Rules for AI Assistants

I maintain a master rule file, `docs/ai/AI-rules.md`, that explicitly defines the project's architecture, naming conventions, and testing requirements.

Here's a snippet:

```markdown
# Master AI Rules for SpeechDown

## Common Guidelines for All AI Assistants

### Architecture (ADR 008)

- Follow Domain-Driven Design with four layers: **domain**, **application**, **infrastructure**, **presentation**.
- Domain layer (`src/speechdown/domain/`) contains entities and value objects only. No external dependencies.
- Application layer (`src/speechdown/application/`) defines ports (interfaces) under `application/ports/`...
- Dependencies point inward...

### Naming Conventions

- Interfaces end with `Port` (e.g., `TranscriptionPort`).
- Implementations end with `Adapter` (e.g., `WhisperTranscriberAdapter`).
- Service classes end with `Service`.
```

A simple Python script (`scripts/generate_ai_rules.py`) then generates specific configuration files for different AI assistants from this master file:

  • `.github/copilot-instructions.md` for GitHub Copilot
  • `AGENTS.md` for OpenAI Codex
  • `CLAUDE.md` for Anthropic's Claude

This ensures that no matter which tool I'm using—GitHub Copilot, Google's Jules, or Claude Code—it has the same set of instructions. This has dramatically improved the quality and compliance of AI-generated code.
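As a rough sketch of the approach (the actual script in the repository may differ), the generator just reads the master file and writes it out under each tool's expected filename with a provenance header:

```python
# Sketch of the idea behind scripts/generate_ai_rules.py; details may differ.
from pathlib import Path

MASTER = Path("docs/ai/AI-rules.md")

# Each assistant reads its instructions from a different conventional location.
TARGETS = {
    "GitHub Copilot": Path(".github/copilot-instructions.md"),
    "OpenAI Codex": Path("AGENTS.md"),
    "Claude": Path("CLAUDE.md"),
}


def main() -> None:
    rules = MASTER.read_text(encoding="utf-8")
    for tool, target in TARGETS.items():
        header = f"<!-- Generated from {MASTER} for {tool}. Do not edit directly. -->\n\n"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(header + rules, encoding="utf-8")


if __name__ == "__main__":
    main()
```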

A Quick Look at the Tool

Despite being a playground, SpeechDown is a usable CLI tool. After initializing a project with `sd init`, you can run a transcription with a simple command:

```bash
# Transcribe all audio files modified in the last 24 hours
sd transcribe --within-hours 24
```

This processes the audio files and groups the transcriptions into daily Markdown files, like `2025-06-13.md`:

```markdown
## 2025-06-13 14:30:22 - recording_idea.m4a

This is the transcribed text from my first audio note. I should remember to talk about the AI rules.

---

## 2025-06-13 16:45:10 - project_update.m4a

Another transcription from a different file, automatically appended and sorted chronologically.

---
```

Relevant Links

This section gathers the core references mentioned above plus a hand-picked set of recent articles for anyone who wants to dig deeper into the architecture patterns, ADR discipline, and AI-assisted coding.

Project

  • SpeechDown source code on GitHub: dudarev/speechdown

Key Patterns & Practices

Recent Deep Dives

Conclusion

This project has been an incredible learning experience. It's a practical exercise in applying software architecture principles and a fascinating exploration of human-AI collaboration in coding.

I'm sharing this not as a finished product, but as a collection of ideas and examples. I'd love to hear your thoughts on this approach.

  • What are your strategies for maintaining clean architecture in your projects?
  • How do you guide AI assistants to produce code that fits your standards?

Feel free to browse the source code on GitHub, open an issue, or leave a comment below!
