Posted on Nov 11, 2024 • Edited on Apr 12

Made an AI-Powered Interactive Storybook Generator with Next.js, Gemini, and ElevenLabs 🔥

Hey there, fellow developers! 👋 Today, I'm excited to share how I built CrazyStory, an interactive storybook generator that combines the power of AI for story generation, text-to-speech, and image generation. This project showcases how to create an engaging web application that turns simple prompts into full-fledged illustrated stories with audio narration.

Tech Stack

Frontend Framework: Next.js with React
UI Components: shadcn/ui
Styling: Tailwind CSS
AI Services:
    Google's Gemini AI for story generation
    ElevenLabs API for text-to-speech
    GetImg.ai for image generation
Additional Libraries:
    jsPDF for PDF generation
    Lucide React for icons
    React Hooks for state management

Key Features
AI-powered story generation based on user prompts
Automatic illustration generation for each story page
Text-to-speech narration
Interactive page navigation
PDF and audio download capabilities
Responsive design with a modern UI

Step-by-Step Implementation Guide

Project Setup

First, create a new Next.js project with Tailwind CSS:

    npx create-next-app@latest story-wizard-pro --typescript --tailwind
    cd story-wizard-pro

Install the required dependencies:

    npm install @google/generative-ai jspdf lucide-react
    npm install @radix-ui/react-dialog @radix-ui/react-slot

UI Components Setup

The application uses shadcn/ui components for a polished look. Install the core components:

    npx shadcn-ui@latest init
    npx shadcn-ui@latest add button card input dialog

Core Functionality Implementation

Story Generation with Gemini AI

The story generation uses Google's Gemini AI model. Here's the key implementation:

    import { GoogleGenerativeAI } from "@google/generative-ai";

    const initializeChatSession = async () => {
      // Note: NEXT_PUBLIC_ variables are bundled into client-side code,
      // so this key is visible in the browser.
      const genAI = new GoogleGenerativeAI(process.env.NEXT_PUBLIC_GEMINI_API_KEY);
      const model = genAI.getGenerativeModel({
        model: "gemini-1.5-flash",
      });

      const generationConfig = {
        temperature: 1,
        topP: 0.95,
        topK: 64,
        maxOutputTokens: 8192,
      };

      // safetySettings is not shown in this snippet and must be defined elsewhere.
      const chatSession = model.startChat({
        generationConfig,
        safetySettings,
      });

      return chatSession;
    };
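The later snippets in this post treat each story page as an array of paragraphs (`pageContent.join(' ')`, `page.forEach(paragraph => ...)`), so the model's reply has to end up in that shape somehow. Here's a minimal sketch of one way to get there, assuming the prompt asks Gemini for JSON. `buildStoryPrompt` and `parseStoryPages` are hypothetical helper names, and the JSON format is an assumption for illustration, not necessarily what the app does.

```javascript
// Hypothetical helper: builds a prompt that asks for the story as JSON pages.
function buildStoryPrompt(storyType, pageCount = 5) {
  return [
    `Write a children's story about ${storyType}.`,
    `Split it into exactly ${pageCount} pages.`,
    'Reply with JSON only, shaped like {"pages": [["paragraph", "paragraph"]]}.',
  ].join("\n");
}

// Hypothetical helper: turns the raw reply into an array of pages,
// where each page is an array of non-empty paragraph strings.
function parseStoryPages(rawText) {
  // Models often wrap JSON in a Markdown code fence, so strip it if present.
  const cleaned = rawText
    .trim()
    .replace(/^`{3}(?:json)?\s*/i, "")
    .replace(/\s*`{3}$/, "");

  const parsed = JSON.parse(cleaned);
  if (!Array.isArray(parsed.pages)) {
    throw new Error('Expected a "pages" array in the model reply');
  }

  return parsed.pages
    .map((page) => (Array.isArray(page) ? page : [page])) // tolerate a bare string page
    .map((page) => page.map((p) => String(p).trim()).filter(Boolean))
    .filter((page) => page.length > 0);
}

// Possible usage with the chat session from initializeChatSession:
// const chatSession = await initializeChatSession();
// const result = await chatSession.sendMessage(buildStoryPrompt("a brave cat"));
// const storyPages = parseStoryPages(result.response.text());
```

Parsing defensively matters here because the model is free to ignore formatting instructions; a thrown error is easier to surface in the UI than a half-rendered story.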

Image Generation Integration
The application uses GetImg.ai for generating illustrations:

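    const generateImageForPage = async (pageContent) => {
      const response = await fetch('https://api.getimg.ai/v1/flux-schnell/text-to-image', {
        method: 'POST',
        headers: {
          // YOUR_API_KEY is a placeholder for your GetImg.ai key
          'Authorization': `Bearer ${YOUR_API_KEY}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          // pageContent is an array of paragraphs; join them into a single prompt
          prompt: pageContent.join(' '),
          width: 1200,
          height: 1200,
          steps: 2,
          output_format: 'png',
          response_format: 'url',
        }),
      });

      // fetch does not reject on HTTP errors, so check the status explicitly
      if (!response.ok) {
        throw new Error(`Image generation failed with status ${response.status}`);
      }

      const data = await response.json();
      return data.url;
    };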
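Since every page gets its own illustration, the per-page call runs once per page, and one failed request shouldn't take the whole story down with it. Here's a rough sketch of that using `Promise.allSettled`. `generateImagesForStory` is a hypothetical name, and the generator is passed in as a parameter so the sketch doesn't depend on the network.

```javascript
// Hypothetical helper: runs an image generator over every page in parallel.
// `generateImage` is any async function (page) => url, for example
// generateImageForPage. Pages whose request fails yield null instead of throwing.
async function generateImagesForStory(storyPages, generateImage) {
  const results = await Promise.allSettled(
    // The async wrapper turns synchronous throws into rejections as well.
    storyPages.map(async (page) => generateImage(page))
  );

  return results.map((result, index) => {
    if (result.status === "fulfilled") return result.value;
    console.warn(`Image for page ${index + 1} failed:`, result.reason);
    return null;
  });
}

// Possible usage:
// const pageImages = await generateImagesForStory(storyPages, generateImageForPage);
```

The `null` entries line up with the `if (pageImages[index])` check in the PDF code, so a page without an image is just skipped there. Firing all requests at once may run into rate limits on some plans, in which case a plain `for` loop with `await` is the simpler fallback.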

Text-to-Speech Implementation
ElevenLabs API is used for generating natural-sounding narration:

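    const generateAudio = async (text) => {
      const response = await fetch("https://api.elevenlabs.io/v1/text-to-speech/21m00Tcm4TlvDq8ikWAM", {
        method: 'POST',
        headers: {
          "Accept": "audio/mpeg",
          "Content-Type": "application/json",
          // YOUR_API_KEY is a placeholder for your ElevenLabs key
          "xi-api-key": YOUR_API_KEY
        },
        body: JSON.stringify({
          text: text,
          model_id: "eleven_monolingual_v1",
          voice_settings: {
            stability: 0.5,
            similarity_boost: 0.5
          }
        })
      });

      // fetch does not reject on HTTP errors, so check the status explicitly
      if (!response.ok) {
        throw new Error(`Audio generation failed with status ${response.status}`);
      }

      // Wrap the MP3 bytes in an object URL that an <audio> element can play
      const blob = await response.blob();
      return URL.createObjectURL(blob);
    };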
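A full story can get long, and text-to-speech APIs generally cap how much text a single request may carry. Here's a sketch of splitting the narration at sentence boundaries before calling `generateAudio`. `splitForNarration` is a hypothetical helper, and the 2,500-character default is a placeholder I picked for illustration, so check the actual limit for your ElevenLabs plan and model.

```javascript
// Hypothetical helper: splits text into chunks of at most `maxChars` characters,
// breaking only between sentences. A single sentence longer than `maxChars`
// is kept whole in its own chunk rather than cut mid-sentence.
function splitForNarration(text, maxChars = 2500) {
  // Grab runs of text up to and including their closing punctuation.
  const sentences = text.match(/[^.!?]+[.!?]*\s*/g) ?? [];
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    // Close the current chunk if adding this sentence would overflow it.
    if (current && current.length + sentence.length > maxChars) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }

  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// Possible usage, one request per chunk:
// const audioUrls = [];
// for (const chunk of splitForNarration(storyPages.flat().join(" "))) {
//   audioUrls.push(await generateAudio(chunk));
// }
```

Each object URL returned by `generateAudio` holds its blob in memory until `URL.revokeObjectURL` is called, so it's worth revoking old URLs when a new story replaces them.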

User Interface Design

The UI is built with a combination of Tailwind CSS and shadcn/ui components. Here's the main layout structure:

    <div className="min-h-screen bg-gradient-to-b from-slate-900 via-slate-800 to-slate-900">
      <NavigationBar />
      <main className="container mx-auto px-4 py-8">
        {/* Story Input Section */}
        <div className="max-w-2xl mx-auto space-y-4 mb-12">
          <Input
            type="text"
            value={storyType}
            onChange={(e) => setStoryType(e.target.value)}
            placeholder="What's your story about?"
            className="w-full pl-12 pr-4 py-3"
          />
          <Button onClick={generateStory}>
            Generate Story
          </Button>
        </div>

        {/* Story Display Section */}
        <Card className="bg-slate-800/50 border-slate-700">
          {/* Navigation Controls */}
          {/* Story Content */}
          {/* Audio Controls */}
        </Card>
      </main>
    </div>
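The `{/* Navigation Controls */}` placeholder is where the previous/next buttons go. Here's a small sketch of the index logic behind them, assuming the current page lives in a `useState` value. `clampPage`, `nextPage`, `prevPage`, and `currentPage` are hypothetical names for illustration.

```javascript
// Clamp a page index into the valid range [0, pageCount - 1].
function clampPage(index, pageCount) {
  if (pageCount <= 0) return 0; // no pages yet, stay on index 0
  return Math.min(Math.max(index, 0), pageCount - 1);
}

// Step forward or back without ever leaving the valid range.
const nextPage = (current, pageCount) => clampPage(current + 1, pageCount);
const prevPage = (current, pageCount) => clampPage(current - 1, pageCount);

// Inside the component (assumed state names):
// const [currentPage, setCurrentPage] = useState(0);
// <Button
//   onClick={() => setCurrentPage((p) => prevPage(p, storyPages.length))}
//   disabled={currentPage === 0}
// >
//   Previous
// </Button>
```

Keeping the clamping in plain functions means the buttons can't push the index out of range even if `disabled` is forgotten on one of them.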

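PDF Generation

The PDF download feature uses jsPDF. Each story page starts on its own PDF page, with the illustration placed above the text:

    import { jsPDF } from "jspdf";

    const downloadPDF = () => {
      const pdf = new jsPDF();
      let y = 20;

      // Add title
      pdf.setFont("helvetica", "bold");
      pdf.setFontSize(16);
      pdf.text(`A Story About ${storyType}`, 105, y, { align: "center" });
      y += 15; // move below the title so the first image does not cover it

      // Add content
      storyPages.forEach((page, index) => {
        // Start every story page after the first on a fresh PDF page
        if (index > 0) {
          pdf.addPage();
          y = 20;
        }

        if (pageImages[index]) {
          pdf.addImage(pageImages[index], 'JPEG', 20, y, 170, 100);
          y += 110; // move below the 100-unit-tall image
        }

        // Add text content
        page.forEach(paragraph => {
          const lines = pdf.splitTextToSize(paragraph, 170);
          lines.forEach(line => {
            pdf.text(line, 20, y);
            y += 7;
          });
        });
      });

      pdf.save("storybook.pdf");
    };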
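One thing to watch with jsPDF: it doesn't start a new page on its own. Anything drawn past the bottom edge is simply cut off, so for long text the `y` cursor has to be checked by hand. Here's a sketch of that bookkeeping as a pure function. `layoutLines` is a hypothetical helper, and the defaults assume jsPDF's default A4 portrait page measured in millimetres (297 mm tall) with a 20 mm margin.

```javascript
// Hypothetical helper: assigns each line a page number and a y position,
// moving to a new page whenever the next line would cross the bottom margin.
function layoutLines(
  lines,
  { startY = 20, lineHeight = 7, pageHeight = 297, margin = 20 } = {}
) {
  const placed = [];
  let y = startY;
  let page = 1;

  for (const line of lines) {
    if (y > pageHeight - margin) {
      page += 1;   // continue on the next page
      y = margin;  // restart at the top margin
    }
    placed.push({ line, page, y });
    y += lineHeight;
  }

  return placed;
}

// Possible usage with jsPDF (pdf.addPage() starts a new page):
// let currentPdfPage = 1;
// for (const { line, page, y } of layoutLines(allLines)) {
//   if (page !== currentPdfPage) {
//     pdf.addPage();
//     currentPdfPage = page;
//   }
//   pdf.text(line, 20, y);
// }
```

Keeping the layout maths separate from the drawing calls makes it easy to test the page breaks without generating a real PDF.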

Conclusion
Building Story Wizard Pro was an exciting journey into combining multiple AI services into a cohesive web application. The project demonstrates how modern web technologies can be used to create engaging, interactive experiences.
