Why this is cool
Give your content team a superpower: drop in any image → get a well‑formed, vertical‑specific blog post plus audio playback. We’ll combine Cloudinary AI (image captioning) with OpenAI (blog generation + TTS) inside a Vite + React app with an Express backend.
What you’ll build
- Upload an image → Cloudinary generates a caption describing it
- Send that caption to OpenAI → get a 300‑word marketing blog post tailored to the image’s vertical (auto, travel, fashion, etc.)
- Generate an MP3 narration of the post with OpenAI TTS
Demo idea: a red Ferrari image becomes a short, punchy automotive blog post with a play button for audio.
Prereqs
- Node 18+
- Free accounts: Cloudinary and OpenAI
- Basic React/JS/Node skills
⚠️ OpenAI billing: add a small credit ($5–$10) and a spending cap to avoid surprises.
1) Cloudinary setup
- Create/log in → Settings → Product Environments
- Note your Cloud name (e.g. `demo`)
- API Keys: Settings → Product Environments → API Keys → Generate New API Key
Keep these handy:
CLOUDINARY_CLOUD_NAME
CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET
We’ll also use the public cloud name on the client (via Vite env).
2) OpenAI setup
- Create/login at platform.openai.com
- Billing → add payment details + monthly limit
- API Keys → Create new secret key
Save your OPENAI_API_KEY in .env (server only).
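As a quick sanity check, a small helper (hypothetical, not part of the OpenAI SDK) can fail fast at server start if the key never loaded, without ever printing the secret:

```javascript
// Hypothetical helper: validate that OPENAI_API_KEY loaded from .env.
// OpenAI secret keys start with the "sk-" prefix; anything else means
// the env file wasn't read or the value was pasted incorrectly.
function assertApiKey(env = process.env) {
  if (!env.OPENAI_API_KEY || !env.OPENAI_API_KEY.startsWith('sk-')) {
    throw new Error('OPENAI_API_KEY missing or malformed (is .env loaded?)')
  }
  return true
}
```

Call it once near the top of server.js, right after dotenv runs, so misconfiguration surfaces immediately instead of as a confusing 401 later.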
3) Scaffold the project (Vite + React)
```shell
npm create vite@latest image-to-blog-ai -- --template react-swc
cd image-to-blog-ai
npm i
```
Install deps for client & server:
```shell
# client deps
npm i axios react-markdown @cloudinary/react @cloudinary/url-gen

# server deps
npm i express cors cloudinary multer streamifier openai dotenv

# (optional in dev)
npm i -D nodemon
```
Project layout (single repo, both client + server):
```
image-to-blog-ai/
├─ index.html
├─ src/
├─ server.js       # Express API
├─ public/         # serves speech.mp3
├─ .env            # server secrets
├─ vite.config.js
├─ package.json
└─ ...
```
4) Vite dev proxy (no CORS headaches)
vite.config.js (note the `react-swc` template ships `@vitejs/plugin-react-swc`, not `@vitejs/plugin-react`):

```javascript
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react-swc'

export default defineConfig({
  plugins: [react()],
  server: {
    port: 3000,
    proxy: {
      '/api': {
        target: 'http://localhost:6000', // Express port
        changeOrigin: true,
        secure: false,
      },
    },
  },
})
```
5) React UI
Create src/App.jsx (or .tsx if you prefer TypeScript):

```jsx
import { useState, useEffect } from 'react'
import axios from 'axios'
import { AdvancedImage } from '@cloudinary/react'
import { fill } from '@cloudinary/url-gen/actions/resize'
import { Cloudinary } from '@cloudinary/url-gen'
import ReactMarkdown from 'react-markdown'
import AudioPlayer from './AudioPlayer'
import './App.css'

export default function App() {
  const [image, setImage] = useState(null)
  const [caption, setCaption] = useState('')
  const [story, setStory] = useState('')
  const [error, setError] = useState('')
  const [loading, setLoading] = useState(false)
  const [shouldSubmit, setShouldSubmit] = useState(false)

  const cld = new Cloudinary({ cloud: { cloudName: import.meta.env.VITE_CLOUD_NAME } })

  useEffect(() => {
    if (shouldSubmit && image) handleSubmit()
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [shouldSubmit, image])

  const handleImageChange = (e) => {
    const file = e.target.files?.[0]
    if (!file) return
    setImage(file)
    setShouldSubmit(true)
  }

  const handleSubmit = async () => {
    if (!image) return
    const formData = new FormData()
    formData.append('image', image)
    try {
      setLoading(true)
      const { data } = await axios.post('/api/caption', formData, {
        headers: { 'Content-Type': 'multipart/form-data' },
      })
      setCaption(data.caption)
      setStory(data.story.content)
      const cldImg = cld.image(data.public_id)
      cldImg.resize(fill().width(500).height(500))
      setImage(cldImg)
      setError('')
    } catch (err) {
      console.error(err)
      setError(err?.response?.data?.error || err.message)
    } finally {
      setShouldSubmit(false)
      setLoading(false)
    }
  }

  return (
    <div className="app">
      <h1>Image → Blog AI</h1>
      <label className="custom-file-upload">
        <input type="file" accept="image/*" onChange={handleImageChange} />
        Choose Image
      </label>
      {loading && <div className="spinner" />}
      {error && <p style={{ color: 'red' }}>{error}</p>}
      {image && !loading && typeof image === 'object' &&
        image.constructor?.name !== 'CloudinaryImage' && <p>Uploading...</p>}
      {image?.constructor?.name === 'CloudinaryImage' && (
        <AdvancedImage cldImg={image} alt={caption} />
      )}
      {story && (
        <div>
          <AudioPlayer text={story} setLoading={setLoading} />
          {!loading && <ReactMarkdown>{story}</ReactMarkdown>}
        </div>
      )}
    </div>
  )
}
```
Minimal src/AudioPlayer.jsx:
```jsx
import { useState } from 'react'
import axios from 'axios'

export default function AudioPlayer({ text, setLoading }) {
  const [url, setUrl] = useState('')

  const generate = async () => {
    try {
      setLoading(true)
      const { data } = await axios.post('/api/generate-audio', { text })
      setUrl(data.audioUrl)
    } finally {
      setLoading(false)
    }
  }

  return (
    <div style={{ margin: '1rem 0' }}>
      <button onClick={generate}>🔊 Generate Audio</button>
      {url && <audio controls src={url} style={{ display: 'block', marginTop: 8 }} />}
    </div>
  )
}
```
src/App.css (grab your own styles, e.g. a centered column, a spinner, and a .custom-file-upload button).
Tip: Store only the cloud name on the client via VITE_CLOUD_NAME. Keep all secrets on the server.
6) Express backend (Cloudinary + OpenAI)
Create .env in project root:
```
VITE_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_CLOUD_NAME=YOUR_CLOUD_NAME
CLOUDINARY_API_KEY=YOUR_CLOUDINARY_API_KEY
CLOUDINARY_API_SECRET=YOUR_CLOUDINARY_API_SECRET
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
```
Create server.js in project root:
```javascript
import 'dotenv/config.js'
import express from 'express'
import cors from 'cors'
import { v2 as cloudinary } from 'cloudinary'
import multer from 'multer'
import streamifier from 'streamifier'
import OpenAI from 'openai'
import path from 'path'
import { fileURLToPath } from 'url'
import fs from 'fs/promises'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

const __filename = fileURLToPath(import.meta.url)
const __dirname = path.dirname(__filename)

const app = express()
app.use(express.json({ limit: '1mb' }))
app.use(cors())

cloudinary.config({
  secure: true,
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
})

// Multer in-memory store with basic filtering
const storage = multer.memoryStorage()
const upload = multer({
  storage,
  limits: { fileSize: 8 * 1024 * 1024 }, // 8MB
  fileFilter: (_req, file, cb) => {
    const ok = /image\/(jpeg|png|webp|gif|bmp|tiff)/i.test(file.mimetype)
    cb(ok ? null : new Error('Unsupported file type'), ok)
  },
})

// Helper: promisify Cloudinary upload_stream
function uploadBufferToCloudinary(buffer) {
  return new Promise((resolve, reject) => {
    const stream = cloudinary.uploader.upload_stream(
      { detection: 'captioning' },
      (error, result) => (error ? reject(error) : resolve(result))
    )
    streamifier.createReadStream(buffer).pipe(stream)
  })
}

app.post('/api/caption', upload.single('image'), async (req, res) => {
  try {
    if (!req.file) return res.status(400).json({ error: 'Image file is required' })
    const result = await uploadBufferToCloudinary(req.file.buffer)
    const caption =
      result?.info?.detection?.captioning?.data?.caption || 'Unknown image'
    const story = await generateBlog(caption)
    res.json({ public_id: result.public_id, caption, story })
  } catch (err) {
    console.error('Caption error:', err)
    res.status(500).json({ error: err.message || 'Internal Server Error' })
  }
})

app.post('/api/generate-audio', async (req, res) => {
  try {
    const text = req.body?.text?.slice(0, 6000) || ''
    if (!text) return res.status(400).json({ error: 'Text is required' })
    const mp3 = await openai.audio.speech.create({
      model: 'tts-1',
      voice: 'alloy',
      input: text,
    })
    const buffer = Buffer.from(await mp3.arrayBuffer())
    const filePath = path.resolve(__dirname, 'public', 'speech.mp3')
    await fs.mkdir(path.dirname(filePath), { recursive: true })
    await fs.writeFile(filePath, buffer)
    res.json({ audioUrl: `/speech.mp3` })
  } catch (err) {
    console.error('TTS error:', err)
    res.status(500).json({ error: 'Error generating audio' })
  }
})

async function generateBlog(caption) {
  const message = {
    role: 'user',
    content: `Create a 300-word blog post for a marketing campaign. The post should be tailored to the image's vertical based on this caption: "${caption}". The article is for readers interested in that vertical, not for the business itself. Use an inviting tone, clear subheadings, and a call to action.`,
  }
  try {
    const response = await openai.chat.completions.create({
      model: 'gpt-3.5-turbo',
      messages: [message],
      temperature: 0.8,
    })
    return response.choices[0].message
  } catch (err) {
    console.error('OpenAI error:', err)
    return { role: 'assistant', content: 'Sorry, could not generate content right now.' }
  }
}

app.use(express.static(path.resolve(__dirname, 'public')))

const PORT = 6000
app.listen(PORT, () => console.log(`API listening on http://localhost:${PORT}`))
```
package.json (scripts for both dev servers):
```json
{
  "name": "image-to-blog-ai",
  "private": true,
  "type": "module",
  "scripts": {
    "dev": "vite",
    "start": "node server.js",
    "dev:api": "nodemon server.js"
  }
}
```
7) Run it
```shell
# Terminal A (API)
npm run dev:api   # API → http://localhost:6000

# Terminal B (Vite)
npm run dev       # Web → http://localhost:3000
```
Upload an image → watch the caption + blog appear → click Generate Audio to get an MP3.
8) Production & security notes
- Keep secrets server‑side only; never expose API keys in the client
- Add rate limiting (e.g. express-rate-limit) and basic auth or tokens on /api routes
- Validate file types and size (shown above); consider virus scanning for public apps
- Cache TTS results per post hash to avoid re‑billing
- Consider the Responses API for future‑proof OpenAI calls; swap out chat.completions when you're ready
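For example, the per-hash TTS cache could look like the sketch below: hash the post text, and only call OpenAI when no file for that hash exists yet. The helper names (ttsCachePath, getOrCreateAudio) are illustrative, not from any SDK.

```javascript
import crypto from 'node:crypto'
import path from 'node:path'
import fs from 'node:fs/promises'

// Derive a stable filename from the post text: identical text always
// maps to the same MP3, so repeat requests never re-bill the TTS API.
function ttsCachePath(text, dir = 'public') {
  const hash = crypto.createHash('sha256').update(text).digest('hex')
  return path.join(dir, `speech-${hash.slice(0, 16)}.mp3`)
}

// Drop-in shape for the /api/generate-audio handler: `synthesize` is
// your existing OpenAI TTS call, invoked only on a cache miss.
async function getOrCreateAudio(text, synthesize) {
  const filePath = ttsCachePath(text)
  try {
    await fs.access(filePath) // cache hit: serve the existing file
  } catch {
    const buffer = await synthesize(text) // cache miss: bill once
    await fs.mkdir(path.dirname(filePath), { recursive: true })
    await fs.writeFile(filePath, buffer)
  }
  return `/${path.basename(filePath)}`
}
```

Hashing also fixes a side effect of the single speech.mp3 file: two users generating audio at once no longer overwrite each other.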
Troubleshooting
- CORS in dev: use the Vite proxy as shown (don't call http://localhost:6000 directly from the client)
- Cloudinary caption is undefined: ensure the detection: 'captioning' add‑on is enabled for your account/plan
- MP3 not found: verify public/ exists and the server has write permissions
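When debugging the undefined-caption case, it helps to make the expected response shape explicit. This mirrors the lookup server.js already does; extractCaption is just an illustrative name:

```javascript
// The caption sits deep inside Cloudinary's upload response when the
// detection: 'captioning' add-on runs; fall back when it's absent
// (e.g. the add-on is disabled for your plan).
function extractCaption(result) {
  return result?.info?.detection?.captioning?.data?.caption || 'Unknown image'
}
```

If you always get the fallback, log `JSON.stringify(result.info, null, 2)` on a test upload to see whether the detection block is present at all.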
Wrap‑up
You now have an image‑to‑blog pipeline with Cloudinary + OpenAI: caption → post → audio. Drop it into your content workflow to turn static visuals into dynamic marketing assets.
Repo suggestion: name it cloudinary-react-image-to-blog-ai. Add the README sections straight from this post and you're set.
Resources
- Cloudinary React SDK: @cloudinary/react, @cloudinary/url-gen
- OpenAI Node SDK: openai
- React Markdown: react-markdown
- Dev proxy: Vite server.proxy