
Erick Okal

Building a Meeting Summarizer Backend with Python FastAPI, AWS Transcribe and AWS Bedrock

Introduction

In this tutorial, we’ll build a meeting summarizer backend using FastAPI, AWS Transcribe, and AWS Bedrock foundation models. The application transcribes audio recordings, extracts key discussion points, and produces structured summaries with sentiment analysis and issue detection.

Key Features

  • Audio Transcription – Uses AWS Transcribe to convert speech to text.
  • Speaker Labeling – Identifies different speakers in the conversation.
  • Summarization – AWS Bedrock’s Titan model extracts key insights.
  • Sentiment Analysis & Issue Detection – Provides a concise summary with tone detection.
  • FastAPI Backend – A lightweight, high-performance API for seamless integration.

Tech Stack

  • FastAPI – Lightweight web framework for Python
  • AWS Transcribe – Speech-to-text conversion
  • AWS Bedrock – Fully managed AI service providing LLM integration
  • Amazon S3 – Cloud storage for audio files and transcriptions
  • Jinja2 – Template engine for prompt formatting

Step 1: Project Setup

1. Install Prerequisites

  • Python 3.10+
  • Poetry 1.8+ – Dependency management tool
  • AWS CLI (Optional, for testing)

2. AWS S3 and Bedrock Setup

  • Create two S3 buckets and grant the necessary permissions (a boto3 sketch follows this list):
    • AWS_BUCKET_NAME - bucket for holding the audio files
    • OUTPUT_BUCKET_NAME - bucket for holding the transcriptions
  • Request model access in the Bedrock console. In this example, I'm using Titan Text G1 - Lite.
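
If you'd rather script the bucket setup than click through the console, here's a minimal boto3 sketch. The bucket names are placeholders (S3 bucket names must be globally unique), and the region is an assumption:

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Placeholder names: one bucket for uploaded audio, one for transcripts.
for bucket in ["my-meeting-audio-bucket", "my-meeting-transcripts-bucket"]:
    s3.create_bucket(Bucket=bucket)
    # For regions other than us-east-1, you must also pass:
    # CreateBucketConfiguration={"LocationConstraint": "<your-region>"}
```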


3. Clone the Repository

```bash
git clone https://github.com/bokal2/meeting-summarizer-backend.git
cd meeting-summarizer-backend
```

4. Install Dependencies

Activate the Poetry environment and install the dependencies:

```bash
poetry shell
poetry install
```

5. Configure AWS Credentials

Create a .env file with the following:

```
AWS_REGION=your_aws_region
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_BUCKET_NAME=your_bucket_name
OUTPUT_BUCKET_NAME=your_output_bucket_name
```
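
As an optional sanity check (my own addition, not part of the repo), you can confirm that boto3 picks up these values before starting the app by calling STS:

```python
import boto3
from decouple import config

# Verify that the .env credentials resolve to a real AWS identity.
sts = boto3.client(
    "sts",
    region_name=config("AWS_REGION"),
    aws_access_key_id=config("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=config("AWS_SECRET_ACCESS_KEY"),
)
print(sts.get_caller_identity()["Arn"])
```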

Step 2: API Implementation

Main Components

The backend consists of:

  • Audio Upload & Transcription – Sends audio files to AWS S3 and triggers AWS Transcribe.
  • Text Processing – Converts transcribed text into a structured format.
  • Summarization with AWS Bedrock – Generates meeting summaries based on a prompt template.

FastAPI Implementation (main.py)

```python
import json
import time
import uuid

from fastapi import FastAPI, HTTPException, File, UploadFile
from fastapi.templating import Jinja2Templates
from fastapi.middleware.cors import CORSMiddleware
import boto3
from decouple import config

AWS_REGION = config("AWS_REGION")
AWS_ACCESS_KEY_ID = config("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = config("AWS_SECRET_ACCESS_KEY")
BUCKET_NAME = config("AWS_BUCKET_NAME")
OUTPUT_BUCKET_NAME = config("OUTPUT_BUCKET_NAME")

app = FastAPI()

# Configure allowed origins
origins = [
    "http://localhost:3000",  # Testing with a React App
]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

templates = Jinja2Templates(directory="templates")


async def upload_file_to_s3(file_obj, file_name, s3_client, bucket_name):
    try:
        s3_client.upload_fileobj(file_obj, bucket_name, file_name)
    except Exception as e:
        raise HTTPException(
            status_code=400,
            detail=f"File upload failed: {e}",
        )


def process_transcription(transcript_json):
    """Convert the raw Transcribe JSON into speaker-labeled text."""
    output_text = ""
    current_speaker = None
    items = transcript_json["results"]["items"]

    for item in items:
        speaker_label = item.get("speaker_label", None)
        content = item["alternatives"][0]["content"]

        # Start a new line whenever the speaker changes
        if speaker_label is not None and speaker_label != current_speaker:
            current_speaker = speaker_label
            output_text += f"\n{current_speaker}: "

        # Attach punctuation directly to the preceding word
        if item["type"] == "punctuation":
            output_text = output_text.rstrip()

        output_text += f"{content} "

    return output_text


async def transcribe_audio(
    model_id,
    bucket_name,
    file_name,
    file_content,
    output_bucket,
):
    # Upload audio to the S3 bucket
    s3_client = boto3.client(
        "s3",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )
    await upload_file_to_s3(
        file_obj=file_content,
        file_name=file_name,
        s3_client=s3_client,
        bucket_name=bucket_name,
    )

    transcribe_client = boto3.client(
        "transcribe",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

    job_name = f"transcription-job-{uuid.uuid4()}"
    transcribe_client.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket_name}/{file_name}"},
        MediaFormat="mp3",
        LanguageCode="en-US",
        OutputBucketName=output_bucket,
        Settings={"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    )

    # Poll until the transcription job completes or fails
    while True:
        job_status = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name,
        )
        status = job_status["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ["COMPLETED", "FAILED"]:
            break
        time.sleep(2)

    if status == "FAILED":
        raise HTTPException(status_code=400, detail="Transcription Job failed")

    # Transcribe writes the result to the output bucket as <job_name>.json
    transcript_key = f"{job_name}.json"
    transcript_obj = s3_client.get_object(
        Bucket=output_bucket,
        Key=transcript_key,
    )
    transcript_text = transcript_obj["Body"].read().decode("utf-8")
    transcript_json = json.loads(transcript_text)

    output_text = process_transcription(transcript_json)

    result = await summarize_transcription(
        model_id,
        transcript=output_text,
    )
    return result
```
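
One caveat in the polling loop above: time.sleep() blocks the event loop inside an async function, so the server can't handle other requests while it waits. A minimal non-blocking variant, assuming you keep the same polling approach, wraps the loop in a hypothetical helper and awaits asyncio.sleep() instead:

```python
import asyncio


async def wait_for_transcription(transcribe_client, job_name: str) -> str:
    """Poll the Transcribe job without blocking the event loop.

    Drop-in replacement for the while-loop above: awaiting asyncio.sleep()
    yields control back to the event loop between status checks.
    """
    while True:
        job_status = transcribe_client.get_transcription_job(
            TranscriptionJobName=job_name,
        )
        status = job_status["TranscriptionJob"]["TranscriptionJobStatus"]
        if status in ["COMPLETED", "FAILED"]:
            return status
        await asyncio.sleep(2)
```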

Step 3: Summarization Using AWS Bedrock

Prompt Engineering

We use a Jinja2 template to format the transcript for the Bedrock model:

```
I need to analyze and summarize a conversation. The transcript of the conversation is between the <data> XML-like tags.

<data>
{{transcript}}
</data>

Please do the following:
1. Identify the main topic being discussed.
2. Provide a concise summary of key points.
3. Include a one-word sentiment analysis.
4. List any issues, problems, or conflicts.

Format the output in JSON:
{
    "topic": "<main_topic>",
    "meeting_summary": "<summary>",
    "sentiment": "<one_word_sentiment>",
    "issues": [{"topic": "<issue>", "summary": "<description>"}]
}
```
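
If you want to inspect the rendered prompt outside the app, here is a standalone sketch using plain Jinja2 (FastAPI's Jinja2Templates wraps this same machinery; the sample transcript string is made up):

```python
from jinja2 import Environment, FileSystemLoader

# Load the same template directory the app uses and render it with a
# made-up two-speaker transcript to see the final prompt text.
env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("prompt_template.txt")

sample_transcript = "spk_0: Let's review the Q1 roadmap.\nspk_1: The design phase is running late."
print(template.render(transcript=sample_transcript))
```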

AWS Bedrock Summarization

```python
async def summarize_transcription(model_id: str, transcript: str):
    bedrock_runtime = boto3.client(
        "bedrock-runtime",
        region_name=AWS_REGION,
        aws_access_key_id=AWS_ACCESS_KEY_ID,
        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    )

    # Render the prompt template with the transcript
    template = templates.get_template("prompt_template.txt")
    rendered_prompt = template.render(transcript=transcript)

    try:
        kwargs = {
            "modelId": model_id,
            "contentType": "application/json",
            "accept": "*/*",
            "body": json.dumps(
                {
                    "inputText": rendered_prompt,
                    "textGenerationConfig": {
                        "maxTokenCount": 512,
                        "temperature": 0,
                        "topP": 0.9,
                    },
                }
            ),
        }

        # Call AWS Bedrock
        response = bedrock_runtime.invoke_model(**kwargs)

        # Parse response
        response_body = json.loads(response.get("body").read())
        result = response_body["results"][0]["outputText"]
        return {"response": result}
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Error invoking Bedrock: {str(e)}",
        )
```
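
Titan returns the summary as free text, so the JSON the prompt asks for isn't guaranteed to come back well-formed. A small defensive helper (my own addition, not part of the repo) can keep the endpoint from failing on a malformed reply:

```python
import json


def parse_summary(output_text: str) -> dict:
    """Best-effort parse of the model's output into the expected JSON shape."""
    try:
        return json.loads(output_text)
    except json.JSONDecodeError:
        # The model ignored the format instructions; return the raw text
        # so the caller still gets the summary content.
        return {"raw_output": output_text}
```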

Summary API Endpoint

```python
@app.post("/summary")
async def audio_summary_test(
    model_id: str = "amazon.titan-text-lite-v1",
    file: UploadFile = File(...),
):
    """Generate a meeting summary from an uploaded audio file."""
    # Ensure the file cursor is at the start before uploading
    file.file.seek(0)

    response = await transcribe_audio(
        model_id=model_id,
        bucket_name=BUCKET_NAME,
        file_name=file.filename,
        file_content=file.file,
        output_bucket=OUTPUT_BUCKET_NAME,
    )

    return {"response": response}
```

Step 4: Running the Application

1. Start the FastAPI Server

```bash
uvicorn main:app --reload
```

2. Test the API Using cURL

Note that model_id is declared as a query parameter in the endpoint, so it belongs in the URL rather than the form body:

```bash
curl -X POST "http://127.0.0.1:8000/summary?model_id=amazon.titan-text-lite-v1" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@meeting_audio.mp3"
```
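
If you'd rather test from Python than the shell, an equivalent call with the requests library looks like this (the file path is a placeholder):

```python
import requests

# Upload a local recording to the /summary endpoint; model_id travels as a
# query parameter, while the audio file goes as multipart form data.
with open("meeting_audio.mp3", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/summary",
        params={"model_id": "amazon.titan-text-lite-v1"},
        files={"file": ("meeting_audio.mp3", f, "audio/mpeg")},
        timeout=600,  # transcription can take a while for long recordings
    )

print(resp.json())
```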

3. Sample JSON Response

{ "response": { "topic": "Project Updates", "meeting_summary": "The meeting discussed progress on Q1 deliverables...", "sentiment": "positive", "issues": [ {"topic": "Timeline Delay", "summary": "The team noted delays in the design phase."} ] } } 
Enter fullscreen mode Exit fullscreen mode

4. Test the API Using Next.js Application

I created a simple Next.js app to test the API. You can find the code in this Git repository, along with detailed setup instructions in the README to help you get it up and running quickly.


Some Noticeable Challenges

  • Accuracy in Transcription – Issues with accents, low audio volume, overlapping speech, and background noise can lead to poor transcription results.
  • LLM Summarization Accuracy – May miss nuances or oversimplify complex discussions.
  • Processing Time and Latency – Large audio files lead to long transcription times and LLM response delays.
  • Scalability Issues – Handling multiple users and large audio files can put significant strain on the underlying resources.
  • Prompt Engineering Complexity – Designing effective prompts for sequential or chat-based interactions is challenging, with limited reference resources currently available.

Conclusion

Exploring AWS Bedrock and experimenting with different foundation models was an exciting experience. It’s impressive how seamlessly developers can leverage these models to build LLM-powered applications with minimal hassle. The potential is immense, and I look forward to diving deeper, exploring advanced models, and uncovering new possibilities.

Next Steps:

  • Deploy the API using AWS Lambda or EKS
  • Enhance prompt engineering for better summarization accuracy
