This is a submission for the Google AI Studio Multimodal Challenge
💡 Inspiration

I always wanted to make comics that capture my chaotic imagination, but the cycle of drawing, erasing, and starting again is such a drag! AI didn't help much either: create, frustrate, regenerate, repeat, and it still couldn't get my vibe... even more of a drag!

Well, that was until ✨ Gemini nano banana (gemini-2.5-flash-image-preview)!
I am so blown away by its editing capabilities, especially with multi-image, multimodal inputs, that I couldn't let my lazy self procrastinate anymore!
So, here are the quick links:
What I built
Demo Video
Multimodal app architecture
How I Used Google AI Studio
Multimodal capabilities I implemented
Specific multimodal features I built for UX
Acknowledgement
What I built
SONICS.ai is a Google-AI ✨ powered creative suite that transforms a user's simple idea into a fully realized, multi-sensory, character-consistent comic book experience with podcast playback. It lets users add their own flavours and vibes to every aspect of comic creation, from storyline to characters to scenes to dialogue to text styles, all in natural language.

The best part? You don't need to be good at drawing! AI solves it for you in ⚡ minutes! You can bring your creativity to life without losing your patience to back-and-forth regeneration just to get that perfect shot.

You can use SONICS for a variety of use cases, from bedtime story podcasts to full production-ready comics with playback.
Bring your stories to life, your style!
Let your imagination go wild!
Demo
My project in action:

0:00 Intro
0:10 Story Conception
0:20 Character/Cast Design
0:53 Comic Panel Creation
1:24 Comic preview
1:34 Audio preview
1:47 Play the Comic that speaks your Style
▶️ Play on YouTube

Note: Due to billing constraints, I couldn't deploy my app, so this video demo shows my project in action.
Multimodal app architecture
Models in the architecture: gemini-2.5-flash, gemini-2.5-flash-image-preview, imagen-4.0, imagen-3.0
How I Used Google AI Studio
This app was built entirely in Google AI Studio ⚡, vibe-coded from scratch, as you could have guessed by now from my lazy vibes!

I started with a simple idea prompt and kept adding features by guiding the AI through the pain points I have faced when vibe-creating comics with my own flavour.

The multimodal capabilities I implemented are listed below.
Multimodal Capabilities
| Input | Output | Models ✨ | Features |
| --- | --- | --- | --- |
| Text | Image | gemini-2.5-flash-image-preview, imagen | Quality character and scene background generation; text-editor-based updates |
| Image + Text | Text | gemini-2.5-flash | Automatic character description updates for natural-language character edits |
| Image (mask) + Image + Text | Image | gemini-2.5-flash-image-preview | Precise edits to characters/scenes, dialogue corrections, text styling, positional edits, detail improvement |
| Multiple Images + Text | A composite image with rendered text | gemini-2.5-flash-image-preview | Comic scene panel generation ensuring character consistency across scenes, dialogue accuracy, scene quality |
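The capabilities listed above can be read as a small routing rule: each kind of task goes to one model. Here is a minimal Python sketch of that idea. It is an illustration, not the app's actual code: the task names and the `pick_model` helper are hypothetical, and only the model names come from this post.

```python
# Hypothetical task names (assumptions for illustration) mapped to the
# model names listed in this post.
MODEL_FOR_TASK = {
    "generate_image": "gemini-2.5-flash-image-preview",  # text -> image (imagen is the listed alternative)
    "update_description": "gemini-2.5-flash",            # image + text -> text
    "masked_edit": "gemini-2.5-flash-image-preview",     # mask + image + text -> image
    "compose_panel": "gemini-2.5-flash-image-preview",   # multiple images + text -> composite image
}


def pick_model(task: str) -> str:
    """Return the model name for a task, or raise ValueError if the task is unknown."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
```

Keeping the mapping in one place means a model swap (say, trying imagen for backgrounds) is a one-line change.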
Multimodal Features
The specific multimodal features I built, and why each one enhances the user experience (UX):

Composite scene panels

✨ Models: imagen, gemini-2.5-flash-image-preview, gemini-2.5-flash

Feature: The comic panels are created through composition logic that combines the multimodal capabilities of these models. Each final panel image is built from a scene background, character images, and a script, which were themselves generated with one of these models.

UX: This ensures character consistency, dialogue accuracy, and scene quality across comic scenes.
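For readers curious what a "multiple images + text" request could look like, here is a minimal Python sketch that only assembles the request body and sends nothing. It follows the `contents` / `parts` / `inline_data` JSON shape of the Gemini REST `generateContent` call as I understand it. The helper names, the ordering of parts, and the prompt wording are assumptions for illustration, not the app's actual composition logic.

```python
import base64


def image_part(image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64-encoded inline-data part."""
    return {
        "inline_data": {
            "mime_type": mime_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    }


def build_panel_request(background: bytes, characters: list[bytes], script: str) -> dict:
    """Assemble one request body: background first, then characters, then the text prompt."""
    parts = [image_part(background)]
    parts += [image_part(c) for c in characters]
    # Assumed prompt wording; the real prompts are not shown in this post.
    prompt = (
        "Compose a single comic panel from the background and character images provided. "
        "Keep each character's appearance unchanged and render this dialogue exactly:\n"
        + script
    )
    parts.append({"text": prompt})
    return {"contents": [{"parts": parts}]}
```

A call such as `build_panel_request(bg, [hero, villain], 'HERO: "Not today!"')` returns a plain dict that could then be sent to an image-capable model like gemini-2.5-flash-image-preview.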
Flavour edits

✨ Models: gemini-2.5-flash-image-preview, gemini-2.5-flash

Feature: This enables precise, surgical edits to scenes, characters, dialogue, and styles using masking.
Users simply describe their edits in natural language, with or without a mask.
When an image edit must also be reflected in its related text, such as a character description, that text is updated automatically to keep later generations consistent.

UX: This saves users from regenerating images from scratch over and over, which was really frustrating when only a small style tweak or error correction was needed. Users can add their own vibes, flavours, and styles to a scene in natural language without worrying about inconsistency.
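The edit-then-sync idea can be sketched in a few lines of Python: apply the image edit, then refresh the character description from the edited image so later prompts stay consistent. This is a sketch under assumptions, not the app's code. The `Character` type and `apply_flavour_edit` are hypothetical, and the two callables are stand-ins for the image-editing and text model calls.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Character:
    name: str
    description: str  # text reused in later prompts
    image: bytes      # current reference image


def apply_flavour_edit(
    character: Character,
    instruction: str,
    edit_image: Callable[[bytes, Optional[bytes], str], bytes],
    describe: Callable[[bytes, str], str],
    mask: Optional[bytes] = None,
) -> Character:
    """Edit the image (optionally masked), then refresh the description from the result."""
    new_image = edit_image(character.image, mask, instruction)
    new_description = describe(new_image, character.description)
    return replace(character, image=new_image, description=new_description)


# Usage with stub callables standing in for the two model calls.
hero = Character("Hero", "tall, red cape", b"v1")
edited = apply_flavour_edit(
    hero,
    "make the cape blue",
    edit_image=lambda img, mask, text: img + b"+edit",
    describe=lambda img, old: old.replace("red", "blue"),
)
```

Returning a new `Character` instead of mutating the old one keeps the previous version around, which makes an undo step easy to add.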
Acknowledgement
Google AI Studio ⚡ is phenomenal at vibe-coding. I was able to generate and finish a well-working prototype in less than 6 hours.
But, as you could have guessed, Parkinson's law took up most of the time!

gemini-2.5-flash-image-preview ✨ (Gemini nano banana) is the star of my whole idea. Thanks to nano banana, I was able to create a character-consistent comic experience and solve the back-and-forth regeneration and vibe-check problem for vibe-comic enthusiasts.
imagen ✨ helped me create beautiful backgrounds for the comic scenes, which were then fully realised using the composite logic.
gemini-2.5-flash ✨ was used for prompt engineering of inputs to the other models, for auto-updating descriptions, and for optimising the deliverables.

Thank you!
It was a fun and great experience!
Definitely not a drag!