How It Works
When you click Create Avatar, four AI models work together in a serverless pipeline. The script, voice clone, and speech are generated synchronously, then the video renders asynchronously in post-processing steps; no backend code required.
/api/generate
Validates the request body: ensures imagePath, audioPath, resumeText (strings), and durationSeconds (number) are all provided.
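A minimal sketch of that validation step in TypeScript. The function name, error-reporting style, and the extra empty-string check are illustrative assumptions, not the actual implementation:

```typescript
// Expected /api/generate request body, per the fields listed above.
interface GenerateRequest {
  imagePath: string;
  audioPath: string;
  resumeText: string;
  durationSeconds: number;
}

// Returns null when the body is valid, otherwise the name of the first
// missing or mistyped field. (Hypothetical helper for illustration.)
function validateGenerateBody(body: Record<string, unknown>): string | null {
  for (const field of ["imagePath", "audioPath", "resumeText"]) {
    if (typeof body[field] !== "string" || body[field] === "") {
      return field;
    }
  }
  if (typeof body["durationSeconds"] !== "number") {
    return "durationSeconds";
  }
  return null;
}
```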
Makes an authenticated request to the central Arcade API to fetch the user's current credit balance.
Calculates dynamic credit cost (50 base + 5 per 5s increment), checks balance, generates a job ID, computes target word count for script length, and builds public serve URLs for the uploaded audio and image files.
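The pricing formula can be sketched as follows. The function name is illustrative, and rounding a partial 5-second increment up to a full one is an assumption; the source only states "50 base + 5 per 5s increment":

```typescript
// Dynamic credit cost: 50 base credits plus 5 credits per (started)
// 5-second increment of the requested duration.
function calculateCreditCost(durationSeconds: number): number {
  const BASE_COST = 50;
  const COST_PER_INCREMENT = 5;
  const increments = Math.ceil(durationSeconds / 5); // assumed round-up
  return BASE_COST + COST_PER_INCREMENT * increments;
}
```

For example, a 30-second video would cost 50 + 5 × 6 = 80 credits under this reading.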
Sends the resume text to Claude Haiku with strict word-count constraints based on the selected duration (~2.5 words/second). Outputs only the spoken words: no markdown or labels.
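The target word count follows directly from the stated rate of ~2.5 spoken words per second; this tiny helper (name and rounding choice are illustrative) shows the arithmetic:

```typescript
// Target script length for a given duration at ~2.5 words per second.
function targetWordCount(durationSeconds: number): number {
  const WORDS_PER_SECOND = 2.5;
  return Math.round(durationSeconds * WORDS_PER_SECOND);
}
```

So a 60-second avatar gets a script budget of roughly 150 words.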
Sends the voice sample to MiniMax Voice Cloning, which analyzes the audio and creates a custom voice model. Returns a voice_id for use with speech generation. Noise reduction and volume normalization are enabled.
Sends the script text and cloned voice_id to MiniMax Speech 2.8 Turbo, which generates natural-sounding speech audio in the cloned voice.
Creates a job record with status "processing", the generated script, and job ID. The frontend polls this record for completion.
Calls the Arcade transaction API to deduct the calculated credit cost from the user's balance.
Assembles the JSON response object with jobId, script, status, and remaining credits. Uses a function handler to avoid template escaping issues with script content.
Returns the job ID, generated script, status "processing", and updated credit balance. The client then polls GET /api/status for completion.
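The response might look like the following TypeScript shape. The field names follow the description above; the exact naming and the sample values are hypothetical:

```typescript
// Assumed shape of the /api/generate response.
interface GenerateResponse {
  jobId: string;
  script: string;
  status: "processing"; // video is still rendering when this returns
  credits: number;      // remaining balance after the deduction
}

const exampleResponse: GenerateResponse = {
  jobId: "job_abc123",        // hypothetical job ID
  script: "Hi, I'm Alex...",  // generated intro script
  status: "processing",
  credits: 120,               // hypothetical remaining balance
};
```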
4-Model Chain: Claude Haiku + MiniMax Voice Cloning + MiniMax Speech + VEED Fabric
by Anthropic, MiniMax, VEED · View on Replicate →
This pipeline chains four AI models together: Claude Haiku writes a personalized intro script from your resume, MiniMax Voice Cloning creates a custom voice model from your audio sample, MiniMax Speech 2.8 Turbo generates speech in your cloned voice, and VEED Fabric 1.0 animates your photo into a talking-head video synchronized with the audio.
Great for
Want to build something similar?
Import this JSON into any BFFless project
Video generation (1-3 min) runs as post-processing after the response is sent. The frontend polls GET /api/status?jobId=... until the job record is updated to 'complete'.
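A client-side polling loop for this flow can be sketched as below. The status endpoint matches the description above, but the response fields, function names, and timing defaults are assumptions; the fetcher is injected so the loop can be exercised without a network:

```typescript
// Assumed job-status payload returned by GET /api/status?jobId=...
type JobStatus = {
  status: "processing" | "complete" | "error";
  videoUrl?: string;
};

// Polls the status endpoint until the job leaves "processing".
// intervalMs/maxAttempts are illustrative defaults (~5 min total,
// matching the stated 1-3 minute render time with headroom).
async function pollJob(
  jobId: string,
  fetchJson: (url: string) => Promise<JobStatus>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJson(`/api/status?jobId=${jobId}`);
    if (job.status !== "processing") return job; // complete or error
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for video generation");
}
```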
Post-Processing Steps
These run asynchronously after the response is sent to the client:
Generate Video (async)
Runs after the response is sent. Sends the photo and cloned speech audio to VEED Fabric 1.0 at 480p resolution. This can take 1-3 minutes.
Store Video (async)
Downloads the generated video and stores it in the avatar-videos directory. Saves extra fields: script, duration, job_id, image_url, and audio_url for gallery display.
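The stored record could be modeled with the interface below; the field names come from the list above, while the interface name and sample values are illustrative:

```typescript
// Assumed shape of a stored avatar-video record for gallery display.
interface AvatarVideoRecord {
  script: string;    // the generated intro script
  duration: number;  // requested duration in seconds
  job_id: string;
  image_url: string; // public serve URL of the uploaded photo
  audio_url: string; // public serve URL of the cloned-speech audio
}

const sampleRecord: AvatarVideoRecord = {
  script: "Hi, I'm Alex...",
  duration: 30,
  job_id: "job_abc123",
  image_url: "/serve/photo.png",  // hypothetical path
  audio_url: "/serve/speech.mp3", // hypothetical path
};
```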
Check Result (async)
Checks if the video was stored successfully. Sets status to "complete" with the video URL, or "error" with a message if storage failed.
Update Job Record (async)
Updates the job record with the final status and video URL. The frontend detects this change via polling.
Upload Pipelines
Stores photo, returns storage path
Stores voice sample, returns storage path
Built with BFFless Pipelines: serverless backend workflows powered by configuration, not code.