Posted on Sep 14

SONICS.ai 🎬 💥 🎞️ create character-consistent Comics that 'speak' your Style

#devchallenge #googleaichallenge #ai #gemini

This is a submission for the Google AI Studio Multimodal Challenge

💡 Inspiration

I always wanted to make comics that capture my chaotic imagination - but the drawing, erasing and starting again is 👀 such a drag!

Also, AI didn't help much - create - frustrate - regenerate - repeat and yet couldn't get my vibe 🌈 ... 👀 even more drag!

Well, that was until ✨ Gemini nano banana (gemini-2.5-flash-image-preview)!

I am so blown away by its editing capabilities, especially when working with multi-image, multimodal inputs, that I couldn't allow my lazy self to procrastinate anymore!

So, here's (quick links)

🪄 What I built

🎥 Demo Video

🧩 Multimodal app architecture

⚡ How I Used Google AI Studio

✨ Multimodal capabilities I implemented

🚀 Specific multimodal features I built for UX

🎉 Acknowledgement

What I built

SONICS.ai 🪄 is a Google-AI ✨ powered creative suite 🧠 🎬 📚 🎞️ that transforms a user's simple idea into a fully realized, multi-sensory, character-consistent comic book experience with podcast playback.

It allows users to add their flavours/ vibes 🌈 to every aspect of comic creation - from storyline to characters to scenes to dialogues to text styles - all in natural language.

The best part? You don't need to be good at drawing! AI solves it for you in ⚡ minutes!

You can bring your creativity to life without losing your patience with back-n-forth regeneration to get that perfect shot!

You can use SONICS for a variety of use cases - from bedtime-story podcasts to full production-ready comics with playback.

Bring your stories to life - your style!
Let your imagination go wild !

Demo

My project in action 🎥
Multimodal app architecture 🧩

My project in action

0:00 Intro
0:10 🧠 Story Conception
0:20 🎬 Character/ Cast Design
0:53 🎞️ Comic Panel Creation
1:24 📚 Comic preview
1:34 🎧 Audio preview
1:47 🎥 Play the Comic that speaks your Style

▶️ Play on YouTube

Note: Due to billing constraints, I couldn't deploy my app, so this video demo 👆 shows my project in action.

Multimodal app architecture

gemini-2.5-flash gemini-2.5-flash-image-preview imagen-4.0 imagen-3.0

🧩

How I Used Google AI Studio

This app was built entirely in Google AI Studio ⚡ vibe-coded from scratch
👀 as you could have guessed by now from my lazy vibes!

I started with a simple idea prompt and kept on adding features by guiding the AI through pain-points I have faced when vibe-creating comics with my flavour.

The Multimodal capabilities I implemented ...

Multimodal Capabilities

| Input | Output | Models ✨ | Features 🚀 |
| --- | --- | --- | --- |
| Text | Image | `gemini-2.5-flash-image-preview`, `imagen` | Quality character and scene background generation; text-editor-based updates |
| Image + Text | Text | `gemini-2.5-flash` | Automatic character description updates for natural-language character edits |
| Image (mask) + Image + Text | Image | `gemini-2.5-flash-image-preview` | Precise edits in characters/ scenes, dialogue corrections, text styling, positional edits, detail improvement |
| Multiple Images + Text | A composite image with rendered text | `gemini-2.5-flash-image-preview` | Comic scene panel generation, ensuring character consistency across scenes, dialogue accuracy and scene quality |
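One way to read the table is as a small routing table from input/output modalities to models. Here is a minimal sketch of that idea; the `ROUTES` dictionary, the modality labels and the `pick_models` helper are hypothetical names for illustration, not the app's actual code. Only the model names come from the table.

```python
# Hypothetical routing table: (input modalities, output kind) -> model names.
# Keys and helper are illustrative; only the model names come from the post.
ROUTES = {
    (frozenset({"text"}), "image"): ["gemini-2.5-flash-image-preview", "imagen"],
    (frozenset({"image", "text"}), "text"): ["gemini-2.5-flash"],
    (frozenset({"mask", "image", "text"}), "image"): ["gemini-2.5-flash-image-preview"],
    (frozenset({"images", "text"}), "composite"): ["gemini-2.5-flash-image-preview"],
}


def pick_models(inputs, output):
    """Return the candidate models for a set of input modalities and an output kind."""
    key = (frozenset(inputs), output)
    if key not in ROUTES:
        raise ValueError(f"no model listed for {sorted(inputs)} -> {output}")
    return ROUTES[key]
```

For example, `pick_models(["image", "text"], "text")` returns the single text model used for description updates, while an unlisted combination raises a `ValueError` instead of silently picking a model.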

Multimodal Features

The specific Multimodal functionalities 🚀 I built and why they enhance the user experience 👤 (UX)...

Composite scene panels 🎞️

✨ imagen gemini-2.5-flash-image-preview gemini-2.5-flash

🚀 The comic panels are created through composition logic that combines the models' multimodal capabilities: each final panel image is built from a scene background, character images and a script, all of which were themselves generated by one of these models.

👤 This ensures character consistency, dialogue accuracy as well as scene quality across comic scenes.
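The composition step can be pictured as assembling one multi-image, multi-text request per panel. The sketch below is an assumption about how such a request might be put together, not the app's real logic: `ImageRef`, `PanelSpec`, `build_panel_parts` and the part dictionaries are hypothetical, and the dictionaries are not any SDK's wire format.

```python
from dataclasses import dataclass


@dataclass
class ImageRef:
    """Hypothetical handle for an already-generated image."""
    label: str   # "background" or a character name
    data: bytes  # raw image bytes (PNG/JPEG)


@dataclass
class PanelSpec:
    background: ImageRef
    characters: list  # list of ImageRef, one per character in the scene
    script: list      # list of (speaker, line) tuples


def build_panel_parts(spec):
    """Assemble ordered request parts: instructions first, then labelled images."""
    dialogue = "\n".join(f'{speaker}: "{line}"' for speaker, line in spec.script)
    names = ", ".join(c.label for c in spec.characters)
    prompt = (
        "Compose one comic panel. Use the first image as the background. "
        f"Keep these characters exactly as in their reference images: {names}. "
        "Render this dialogue verbatim in speech bubbles:\n" + dialogue
    )
    # The background goes first so the prompt's "first image" reference holds.
    parts = [{"text": prompt}, {"image": spec.background.data, "label": "background"}]
    for c in spec.characters:
        parts.append({"image": c.data, "label": c.label})
    return parts
```

Passing the same character reference images into every panel request is what would let the image model keep characters consistent from scene to scene, and putting the script into the prompt verbatim is aimed at dialogue accuracy.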

Flavour edits 🌈

✨ gemini-2.5-flash-image-preview gemini-2.5-flash

🚀 These models enable precise, surgical edits of scenes, characters, dialogues and styles, using masking where needed.
Users simply describe their edits in natural language (with or without a mask).
When an image edit should also change its related text, such as a character description, that text is updated automatically to keep later generations consistent.

👤 This saves users from regenerating images from scratch over and over, which is really frustrating when only a small style tweak or error correction is needed. Users can also add their vibes/ flavours/ styles to the scene in natural language without worrying about inconsistencies.
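The edit-then-resync flow can be sketched as a single function. This is a minimal illustration under stated assumptions: `Character`, `apply_flavour_edit` and the two injected callables (`edit_image` standing in for the image model, `describe` for the text model) are hypothetical, and their signatures are made up for this example rather than taken from the app or any SDK.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Character:
    name: str
    description: str  # text later prompts rely on for consistency
    image: bytes


def apply_flavour_edit(
    character: Character,
    instruction: str,
    edit_image: Callable[[bytes, str, Optional[bytes]], bytes],
    describe: Callable[[bytes, str, str], str],
    mask: Optional[bytes] = None,
) -> Character:
    """Edit the existing image instead of regenerating it, then resync the description.

    edit_image and describe are stand-ins for the image and text model calls;
    their signatures are assumptions of this sketch.
    """
    new_image = edit_image(character.image, instruction, mask)
    new_description = describe(new_image, character.description, instruction)
    # Return a new Character so the original stays available for undo.
    return replace(character, image=new_image, description=new_description)
```

Keeping the model calls as injected callables means the flow can be exercised with simple fakes before wiring in real requests, and returning a new object rather than mutating leaves the pre-edit character untouched.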

🎉

Acknowledgement

Google AI Studio ⚡ is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less than 6 hrs.
But as you could have guessed 👀 Parkinson's law took most time !

gemini-2.5-flash-image-preview ✨ (Gemini nano-banana) is the star of my whole idea. Due to nano banana, I was able to successfully create a consistent character comic experience, and solve the back-and-forth regeneration & vibe-check problem for vibe-comic enthusiasts.

imagen ✨ helped me create beautiful backgrounds for the comic scenes which were then fully realised using composite logic.

gemini-2.5-flash ✨ has been used for prompt engineering for inputs to other models, for auto-updating descriptions and also for optimising the deliverables.

Thank you!
It was a fun and great experience!