SS

SONICS.ai 🎬 💥 🎞️ create character-consistent Comics that 'speak' your Style

This is a submission for the Google AI Studio Multimodal Challenge.

💡 Inspiration

I always wanted to make comics that capture my chaotic imagination, but the drawing, erasing, and starting again is 👀 such a drag!

AI didn't help much either: create, get frustrated, regenerate, repeat, and it still couldn't get my vibe 🌈 ... 👀 even more of a drag!

Well, that was until ✨ Gemini nano banana (gemini-2.5-flash-image-preview)!

I was so blown away by its editing capabilities, especially with multi-image, multimodal inputs, that I couldn't let my lazy self procrastinate anymore!

So, here's (quick links)

🪄 What I built

🎥 Demo Video

🧩 Multimodal app architecture

⚡ How I Used Google AI Studio

✨ Multimodal capabilities I implemented

🚀 Specific multimodal features I built for UX

🎉 Acknowledgement


What I built

SONICS.ai 🪄 is a Google-AI ✨ powered creative suite 🧠 🎬 📚 🎞️ that transforms a user's simple idea into a fully realized, multi-sensory, character-consistent comic book experience with podcast-style playback.

It lets users add their flavours/vibes 🌈 to every aspect of comic creation, from storyline to characters to scenes to dialogues to text styles, all in natural language.

The best part? You don't need to be good at drawing! AI solves it for you in ⚡ minutes!

You can bring your creativity to life without losing your patience over back-and-forth regeneration to get that perfect shot!

You can use SONICS for a variety of use cases, from bedtime-story podcasts to full production-ready comics with playback.

Bring your stories to life - your style!
Let your imagination go wild!


Demo

My project in action 🎥
Multimodal app architecture 🧩

My project in action

0:00 Intro
0:10 🧠 Story Conception
0:20 🎬 Character/Cast Design
0:53 🎞️ Comic Panel Creation
1:24 📚 Comic preview
1:34 🎧 Audio preview
1:47 🎥 Play the Comic that speaks your Style

▢️ Play on Youtube

Note: Due to billing constraints, I couldn't deploy my app, so the video demo 👆 shows my project in action.

Multimodal app architecture

Models: gemini-2.5-flash, gemini-2.5-flash-image-preview, imagen-4.0, imagen-3.0

🧩 [Architecture diagrams: phase 1, phase 2, phase 3, phase 4]


How I Used Google AI Studio

This app was built entirely in Google AI Studio, ⚡ vibe-coded from scratch
👀 as you could have guessed by now from my lazy vibes!

I started with a simple idea prompt and kept adding features by guiding the AI through the pain points I had faced when vibe-creating comics with my flavour.

The multimodal capabilities I implemented ...

Multimodal Capabilities

| Input | Output | Models ✨ | Features 🚀 |
| --- | --- | --- | --- |
| Text | Image | gemini-2.5-flash-image-preview, imagen | Quality character and scene background generation; text-editor-based updates |
| Image + Text | Text | gemini-2.5-flash | Automatic character description updates for natural-language character edits |
| Image (mask) + Image + Text | Image | gemini-2.5-flash-image-preview | Precise edits to characters/scenes, dialogue corrections, text styling, positional edits, detail improvement |
| Multiple Images + Text | A composite image with rendered text | gemini-2.5-flash-image-preview | Comic scene panel generation with character consistency across scenes, dialogue accuracy, and scene quality |
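As a rough illustration, the capabilities listed above can be read as a small routing map from input/output modalities to models. This is only a sketch, not the app's actual code: the `Route` type, the modality labels, and the `models_for` helper are hypothetical names, and only the model names come from this post.

```python
from typing import NamedTuple


class Route(NamedTuple):
    """One capability row (hypothetical structure, for illustration only)."""
    inputs: frozenset
    output: str
    models: tuple


# Rows taken from the capabilities listed in this post.
# The modality labels ("text", "mask", "images", ...) are assumptions.
ROUTES = [
    Route(frozenset({"text"}), "image",
          ("gemini-2.5-flash-image-preview", "imagen")),
    Route(frozenset({"image", "text"}), "text",
          ("gemini-2.5-flash",)),
    Route(frozenset({"mask", "image", "text"}), "image",
          ("gemini-2.5-flash-image-preview",)),
    Route(frozenset({"images", "text"}), "composite image",
          ("gemini-2.5-flash-image-preview",)),
]


def models_for(inputs, output):
    """Return the candidate models for the given input modalities and output."""
    wanted = frozenset(inputs)
    for route in ROUTES:
        if route.inputs == wanted and route.output == output:
            return route.models
    raise KeyError(f"no route for {sorted(wanted)} -> {output}")
```

For example, `models_for(["mask", "image", "text"], "image")` picks the masked-edit row, while an unknown combination raises `KeyError` instead of silently falling back to a default model.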


Multimodal Features

The specific multimodal functionalities 🚀 I built and why they enhance the user experience 👀 (UX)...

Composite scene panels 🎞️

✨ imagen, gemini-2.5-flash-image-preview, gemini-2.5-flash

🚀 The comic panels are created through composition logic that combines the models' multimodal capabilities. The final panel images are built from three inputs (scene background, character images, and scripts), each of which was itself generated with one of these models.

👀 This ensures character consistency, dialogue accuracy, and scene quality across comic scenes.
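To make the composition step more concrete, here is a minimal sketch of how the three inputs might be packed into one multi-image request. It is not the app's real code: `Character`, `image_part`, and `build_panel_parts` are hypothetical names, the prompt wording is made up, and the dictionary layout only loosely mirrors the text and inline-image "parts" of a Gemini request, so treat the field names as assumptions and check the API reference before relying on them. No API call is made here.

```python
import base64
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    description: str   # text description kept alongside the image
    image_png: bytes   # reference image generated in an earlier step


def image_part(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 inline-image part (assumed layout)."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"inline_data": {"mime_type": mime_type, "data": encoded}}


def build_panel_parts(background_png: bytes,
                      characters: list,
                      script: str) -> list:
    """Interleave text labels and images so each image has a stated role."""
    parts = [{"text": "Scene background:"}, image_part(background_png)]
    for character in characters:
        parts.append({
            "text": f"Character reference for {character.name}: "
                    f"{character.description}"
        })
        parts.append(image_part(character.image_png))
    # The final instruction ties everything together and carries the dialogue.
    parts.append({
        "text": "Compose one comic panel from the background and character "
                "references above. Keep each character's look unchanged and "
                "render this dialogue exactly as written:\n" + script
    })
    return parts
```

Labelling each image with a short text part before it is one plausible way to tell the model which image is the background and which is a character reference; the resulting list would then be sent to gemini-2.5-flash-image-preview as a single request.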


Flavour edits 🌈

✨ gemini-2.5-flash-image-preview, gemini-2.5-flash

🚀 Masking enables precise, surgical edits of scenes, characters, dialogues, and styles.
Users simply describe their edits in natural language (with or without a mask).
When an image edit must also be reflected in related text, such as the character description, that text is updated automatically so that later generations stay consistent.

👀 This saves users from regenerating images from scratch, back and forth, which was really frustrating when only a small style or error correction was needed. Users can also add their vibes/flavours/styles to the scene in natural language without worrying about inconsistency.
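The edit-then-sync flow described above could look roughly like the sketch below. It is an assumption-heavy illustration, not the app's implementation: `CharacterSheet` and `flavour_edit` are hypothetical names, and the two callables stand in for the real model calls (an image edit with gemini-2.5-flash-image-preview and a description refresh with gemini-2.5-flash), which are deliberately not shown.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class CharacterSheet:
    name: str
    description: str
    image: bytes


# Placeholders for the real model calls (hypothetical signatures).
EditImage = Callable[[bytes, str, Optional[bytes]], bytes]
UpdateDescription = Callable[[str, str, bytes], str]


def flavour_edit(sheet: CharacterSheet,
                 instruction: str,
                 edit_image: EditImage,
                 update_description: UpdateDescription,
                 mask: Optional[bytes] = None) -> CharacterSheet:
    """Apply a natural-language edit, then refresh the matching description."""
    # Step 1: edit the image; the mask is optional (with or without masking).
    new_image = edit_image(sheet.image, instruction, mask)
    # Step 2: update the text description so later panels stay consistent.
    new_description = update_description(
        sheet.description, instruction, new_image
    )
    # Return a new sheet; the original is left untouched.
    return replace(sheet, image=new_image, description=new_description)
```

Returning a new `CharacterSheet` instead of mutating the old one is a design choice made for this sketch: it keeps the previous version available if the user wants to undo an edit.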


🎉

Acknowledgement

Google AI Studio ⚡ is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less than 6 hrs.
But as you could have guessed, 👀 Parkinson's law took up most of the time!

gemini-2.5-flash-image-preview ✨ (Gemini nano banana) is the star of my whole idea. Thanks to nano banana, I was able to create a character-consistent comic experience and solve the back-and-forth regeneration & vibe-check problem for vibe-comic enthusiasts.

imagen ✨ helped me create beautiful backgrounds for the comic scenes, which were then fully realised using the composite logic.

gemini-2.5-flash ✨ was used for prompt engineering the inputs to other models, for auto-updating descriptions, and for optimising the deliverables.


Thank you!
It was a fun and great experience!

👀

Definitely not a drag!

Top comments (5)

Jovan Petrovic

This is dope!

Linford

This is a great indeed love the way you used your creativity to achieve such an amazing project

SS

To all the comic enthusiasts- add your suggestions you would like to try

SS

Comment down your use cases & any suggestions