Inspiration
Generative AI has a "creativity vs. fidelity" problem. Tools like Midjourney or DALL-E are incredible digital painters, but they are terrible product photographers. In e-commerce, fidelity is everything. If a brand uploads a specific running shoe, they cannot accept an AI "reimagining" the lace pattern, warping the logo, or hallucinating a new sole texture just to fit a vibe.
I was inspired to solve the "Hallucination Gap"—the trade-off between creative freedom and product accuracy. I wanted to move away from the paradigm of AI as a Painter (who redraws your product) and towards the paradigm of AI as a Studio Crew (a Set Designer and a Gaffer) who build a world around your sacred product.
My goal was to build RaySKU Locker: a tool that offers infinite creative backgrounds while maintaining a mathematically perfect, 100% "SKU Lock" on the original asset.
What it does
RaySKU Locker is an agentic photography studio that acts as a Creative Director, Set Designer, and Lighting Technician all in one.
- The Locker: Users drag and drop a transparent product PNG (the "SKU") into the application.
- The Brief: Users type a natural language prompt (e.g., "A luxury watch on a wet slate rock, moody lighting, moss in background").
- The Reasoning: The system instantly translates this vibe into a physics-based JSON brief, determining optimal camera angles, lighting direction, and scene composition.
- The Generation: It generates a photorealistic background and physically composites the product into it.
- The Result: A high-fidelity image where the product pixels are identical to the upload, but the lighting and shadows match the new environment perfectly.
Crucially, it includes a "Staging Inspector"—a transparent UI panel that lets users see the AI's "brain" (the JSON data) before the image is even generated, removing the "black box" frustration of standard AI tools.
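For illustration, a brief like the one the Inspector surfaces might look something like this (the field names here are hypothetical, not the exact production schema):

```json
{
  "lighting": { "direction": "Front-Left", "hardness": "Soft" },
  "scene_objects": ["wet slate rock", "moss", "Japanese Maple Tree"],
  "camera": { "focal_length_mm": 85, "depth_of_field": "Shallow" },
  "background_setting": "moody forest clearing after rain"
}
```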
How I built it
I engineered a Dual-Stage Agentic Pipeline that decouples creative reasoning from pixel generation. Instead of asking one model to do everything, I chained specialized state-of-the-art models together.
The architecture flows as follows:
Stage 1: The Brain (Cerebras Inference / Llama 3.1). I used Cerebras to power the "Creative Director" agent. Because of Cerebras's near-instant inference speed, I could use a massive model (Llama 3.1 70B) to reason through complex prompts without adding noticeable latency. This agent outputs a structured JSON brief (sketched in code after the list below) defining:
- Lighting Physics: Direction (e.g., "Front-Left"), Hardness ("Soft").
- Scene Objects: Specific items to place (e.g., "Japanese Maple Tree").
- Camera: Focal length and depth of field.
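A rough TypeScript shape for that brief, mirroring the hypothetical example shown earlier (a sketch, not the exact production types):

```typescript
// Illustrative shape of the Creative Director's staging brief.
// Field names are assumptions based on the description above.
interface StagingBrief {
  lighting: {
    direction: "Front-Left" | "Front-Right" | "Top" | "Back-Left" | "Back-Right";
    hardness: "Soft" | "Hard";
  };
  scene_objects: string[];      // e.g. ["Japanese Maple Tree"]
  camera: {
    focal_length_mm: number;    // e.g. 85
    depth_of_field: "Shallow" | "Deep";
  };
  background_setting: string;   // one-line scene description passed downstream
}
```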
Stage 2: The Set (Bria FIBO via Fal.ai). For the background, I utilized Bria FIBO. Bria is trained on licensed data, ensuring the "set design" is not only photorealistic but commercially safe for enterprise use. I programmatically inject the JSON object list into Bria to generate the background plate, ensuring negative space is preserved for the product.
Stage 3: The Physics (IC-Light V2 via Fal.ai). This is the core innovation. IC-Light V2 acts as a "Physics Engine": it takes the original SKU and the Bria background and calculates the light transport between them. It determines how the light from the virtual background would hit the 3D geometry of the product, generating realistic shadows and reflections without regenerating the product's pixels.
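A rough sketch of how Stages 2 and 3 could be chained with the fal.ai JavaScript client; the model IDs, input fields, and response shapes below are placeholders rather than a confirmed contract:

```typescript
import { fal } from "@fal-ai/client";

// Assumes the StagingBrief prompts have already been built, and that FAL_KEY
// is configured in the environment. Substitute the real Bria FIBO and
// IC-Light V2 endpoint IDs for the placeholders used here.
async function stageProduct(skuUrl: string, briaPrompt: string, relightPrompt: string) {
  // Stage 2: background plate with negative space reserved for the product.
  const background = await fal.subscribe("fal-ai/bria/text-to-image/base", {
    input: { prompt: briaPrompt },
  });
  const plateUrl: string | undefined = (background.data as any)?.images?.[0]?.url;

  // Stage 3: relight the untouched SKU so shadows and reflections match the plate.
  const relit = await fal.subscribe("fal-ai/iclight-v2", {
    input: {
      image_url: skuUrl,               // original product pixels, never regenerated
      // hand IC-Light the Bria plate as the lighting environment; the exact
      // parameter name on the hosted endpoint is an assumption
      background_image_url: plateUrl,
      prompt: relightPrompt,
    },
  });

  return { background: background.data, relit: relit.data };
}
```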
Tech Stack:
- Frontend: Next.js 16 (Turbopack), Tailwind CSS v4.
- UI: Shadcn UI (Radix Primitives).
- Infrastructure: Fal.ai (for GPU orchestration) and Cerebras (for Agent logic).
Challenges I ran into
The biggest technical hurdle was the "Telephone Game" failure.
Initially, my pipeline passed data blindly. The Agent would decide "Mount Fuji," but Bria would only hear "Serene Landscape," generating a generic resort. Meanwhile, IC-Light would see "Product Shot" and default to a dark studio void, ignoring the beautiful sunny background Bria just created.
The Fix: I had to implement strict Context Propagation (a code sketch follows this list).
- Prompt Injection: I updated the pipeline to explicitly map the Agent's objects array into the Bria prompt, forcing the model to draw specific geography (e.g., "Mount Fuji").
- Dynamic Context: I updated the IC-Light call to inherit the scene description from the Agent. Instead of a generic "Product Shot" prompt, IC-Light now receives: "${background_setting}, ${lighting_condition} lighting...". This forced the lighting engine to "see" the mountain and cast the appropriate sunlight on the shoe.
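In code terms, the fix reduces to two small prompt builders that carry the Agent's brief downstream; they reuse the hypothetical StagingBrief shape sketched earlier, so the names are illustrative:

```typescript
// Inject the Agent's object list verbatim so specific geography ("Mount Fuji")
// survives the hop to Bria, and reserve negative space for the composite.
function buildBriaPrompt(brief: StagingBrief): string {
  return [
    brief.background_setting,
    `featuring ${brief.scene_objects.join(", ")}`,
    "clear negative space in the foreground for product placement",
  ].join(", ");
}

// Replace the old generic "Product Shot" prompt so IC-Light inherits the
// scene and lighting context instead of defaulting to a studio void.
function buildRelightPrompt(brief: StagingBrief): string {
  const lightingCondition = `${brief.lighting.hardness} ${brief.lighting.direction}`;
  return `${brief.background_setting}, ${lightingCondition} lighting, photorealistic product photography`;
}
```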
Another challenge was Hydration Mismatches with Next.js 16 and Radix UI. The server generated one ID for the dropdowns, and the client generated another. I solved this by implementing a robust isMounted check pattern to ensure interactive components only render on the client side.
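A minimal version of that guard (the component name is mine; the real code may differ):

```tsx
"use client";

import { useEffect, useState, type ReactNode } from "react";

// Renders children only after mount, so Radix-generated IDs never have to
// match between the server-rendered HTML and the client render.
export function ClientOnly({ children }: { children: ReactNode }) {
  const [isMounted, setIsMounted] = useState(false);

  useEffect(() => {
    setIsMounted(true);
  }, []);

  return isMounted ? <>{children}</> : null;
}
```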
Accomplishments that I'm proud of
The "Holy Grail" of SKU Locking: I successfully achieved a pipeline where the generated image passes the "pixel-peep" test. The text on a shoe's tongue remains legible and unchanged, which is rare in generative AI.
The Staging Inspector: Building a UI that visualizes the hidden JSON layer of the AI. It turns "Prompt Engineering" into a visible, debuggable interface.
Speed: Thanks to Cerebras and Fal.ai, the entire "Reasoning -> Generation" loop feels almost real-time compared to traditional workflows.
What I learned
- Prompt Engineering is API Design: When chaining models, your natural language prompt is the API schema. Precision in adjectives and object placement is as critical as strict typing in code.
- Specialization beats Generalization: A pipeline of three specialized models (Reasoning + Background + Relighting) vastly outperforms a single giant model trying to do it all.
- Mathematical Consistency: I learned a lot about how diffusion models interpret "Light Transport" ($$L_o = L_e + \int L_i \dots$$), effectively using IC-Light to approximate the rendering equation used in 3D graphics (written out in full below).
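For reference, the full form of the rendering equation that the relighting stage is effectively approximating:

$$L_o(x, \omega_o) = L_e(x, \omega_o) + \int_{\Omega} f_r(x, \omega_i, \omega_o)\, L_i(x, \omega_i)\, (\omega_i \cdot n)\, d\omega_i$$

where $$L_o$$ is outgoing radiance, $$L_e$$ is emitted radiance, $$f_r$$ is the surface's BRDF, $$L_i$$ is incoming radiance, and $$n$$ is the surface normal.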
What's next for RaySKU Locker
- Batch Generation: Upgrading the pipeline to generate 4 variations simultaneously, giving users a "contact sheet" of options.
- 3D Depth Mapping: Integrating a depth-control layer to allow users to place products "behind" objects in the generated scene (occlusion).
- Shopify Integration: Building a direct export to push these assets straight to a live storefront.
Built With
- bria-fibo
- cerebras
- fal.ai
- ic-light-v2
- llama-3.1
- react
- typescript