This is a submission for the Google AI Studio Multimodal Challenge
What I Built
FoodSnap Tutor is a frontend-only app that turns any food photo into instant cooking guidance. Upload an image and the app:
- Detects whether the image is food
- Identifies the likely dish name (with alternatives when uncertain)
- Generates a step-by-step recipe
- Estimates nutrition per serving (calories, protein, carbs, fat)
- Suggests a healthier variation and friendly moderation tips
Tech stack: React 19, TypeScript, Vite 6, Tailwind (CDN), and @google/genai calling Gemini 2.5 Flash. Everything runs in the browser; no backend required.
Demo
Live demo: https://foodsnap-tutor.vercel.app
YouTube video demo: https://youtu.be/criJO6OhDY0
GitHub repository: https://github.com/longphanquangminh/foodsnap-tutor
Screenshots:
✨ Error screen:
How I Used Google AI Studio
I leveraged the @google/genai SDK to call Gemini 2.5 Flash with:
- An inline image part (base64-encoded upload).
- A structured prompt telling the model to reply in JSON.
- A response schema so the SDK validates the output client-side.
```ts
import { GoogleGenAI, Type } from "@google/genai";

// Strict JSON schema
const schema = {
  type: Type.OBJECT,
  properties: {
    isFood: { type: Type.BOOLEAN },
    dishName: { type: Type.STRING },
    recipe: {
      type: Type.OBJECT,
      properties: {
        ingredients: { type: Type.ARRAY, items: { type: Type.STRING } },
        steps: { type: Type.ARRAY, items: { type: Type.STRING } },
      },
      required: ["ingredients", "steps"],
    },
    nutrition: {
      type: Type.OBJECT,
      properties: {
        calories: { type: Type.STRING },
        protein: { type: Type.STRING },
        carbs: { type: Type.STRING },
        fat: { type: Type.STRING },
      },
      required: ["calories", "protein", "carbs", "fat"],
    },
    healthierVariation: { type: Type.STRING },
    friendlyAdvice: { type: Type.STRING },
  },
  required: ["isFood", "dishName", "recipe", "nutrition", "healthierVariation"],
};

// Convert upload to an inline image part
const fileToPart = async (file: File) => {
  const dataUrl = await new Promise<string>((res, rej) => {
    const r = new FileReader();
    r.readAsDataURL(file);
    r.onload = () => res(r.result as string);
    r.onerror = rej;
  });
  const [meta, data] = dataUrl.split(",");
  const mime = meta.match(/:(.*?);/)?.[1] ?? "image/jpeg";
  return { inlineData: { mimeType: mime, data } };
};

export async function analyzeFoodImage(image: File) {
  const ai = new GoogleGenAI({ apiKey: process.env.API_KEY! });
  const imagePart = await fileToPart(image);
  const prompt = "You are FoodSnap Tutor, an expert AI chef and nutritionist...";
  const res = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: { parts: [imagePart, { text: prompt }] },
    config: { responseMimeType: "application/json", responseSchema: schema },
  });
  return JSON.parse(res.text);
}
```
Environment wiring (Vite):
```ts
// vite.config.ts
import { defineConfig, loadEnv } from "vite";

export default defineConfig(({ mode }) => {
  const env = loadEnv(mode, ".", "");
  return {
    define: {
      "process.env.API_KEY": JSON.stringify(env.GEMINI_API_KEY),
    },
  };
});
```
```sh
# .env.local
GEMINI_API_KEY=ai-xxxxxxxxxxxxxxxx
```
Multimodal Features
- Vision input: Users upload a dish photo that the SDK sends as inline image data.
- Structured output: Gemini returns validated JSON (recipe, nutrition, advice) for deterministic UI rendering.
- Single multimodal call: image + text prompt → cohesive culinary analysis.
- UX touches: Drag-and-drop upload, instant preview, animated results, retry flow.
- Robustness: UI handles blocked content or JSON parse errors gracefully with friendly messages.
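The robustness point can be sketched as a thin wrapper around the analysis call. This is an illustrative sketch, not the app's exact code: the `runAnalysis` helper, the `AnalysisState` type, and the message strings are assumptions, while `analyzeFoodImage` refers to the function shown earlier.

```typescript
// Hypothetical UI-side error handling around the Gemini call.
// Gemini may block unsafe content or return non-JSON text, so the
// wrapper converts failures into a friendly, retryable state.
type AnalysisState =
  | { status: "done"; data: unknown }
  | { status: "error"; message: string };

export async function runAnalysis(
  image: File,
  analyze: (f: File) => Promise<unknown> // e.g. analyzeFoodImage
): Promise<AnalysisState> {
  try {
    const data = await analyze(image);
    return { status: "done", data };
  } catch (err) {
    const message =
      err instanceof SyntaxError
        ? "We couldn't read the analysis. Please try again." // JSON.parse failed
        : "Something went wrong analyzing your photo. Please retry.";
    return { status: "error", message };
  }
}
```

The UI can then render either the results view or the error screen from this single state value, with a retry button that simply calls `runAnalysis` again.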
Notes
- Frontend-only: in production, restrict the API key to allowed origins or proxy requests through a lightweight backend.
- Built with React + Vite + Tailwind for fast iteration and static deployment.
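One lightweight way to do that proxying could look like the sketch below. This is not part of the current app: the `buildGeminiPayload` helper, the allowed-origin list, and the request shape are all assumptions; a real endpoint would pass the returned payload to `ai.models.generateContent` with the key held server-side.

```typescript
// Hypothetical server-side guard + payload builder for a proxy endpoint.
// The browser would POST the base64 image here instead of calling Gemini
// directly, so GEMINI_API_KEY never ships to the client.
const ALLOWED_ORIGINS = new Set(["https://foodsnap-tutor.vercel.app"]);

interface AnalyzeRequest {
  mimeType: string; // e.g. "image/jpeg"
  data: string;     // base64-encoded image bytes
}

export function buildGeminiPayload(
  origin: string | undefined,
  body: AnalyzeRequest
) {
  // Reject requests from unknown origins before spending API quota.
  if (!origin || !ALLOWED_ORIGINS.has(origin)) {
    throw new Error("Origin not allowed");
  }
  // Same request shape the frontend builds today, assembled server-side.
  return {
    model: "gemini-2.5-flash",
    contents: {
      parts: [
        { inlineData: { mimeType: body.mimeType, data: body.data } },
        { text: "You are FoodSnap Tutor, an expert AI chef and nutritionist..." },
      ],
    },
  };
}
```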
Thanks for reading! If you'd like to try it out or peek at the code, check the demo and repo links above.