How It Works
When you click Create Avatar, four AI models work together in a serverless pipeline. The script, voice clone, and speech are generated synchronously, then the video renders asynchronously in post-processing steps; no backend code required.
/api/generate
Validates the request body: ensures imagePath, audioPath, resumeText (strings), and durationSeconds (number) are all provided.
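A minimal sketch of that validation step in TypeScript. The function name, error-reporting style, and the extra empty-string check are illustrative assumptions, not the actual implementation:

```typescript
// Expected /api/generate request body, per the fields listed above.
interface GenerateRequest {
  imagePath: string;
  audioPath: string;
  resumeText: string;
  durationSeconds: number;
}

// Returns null when the body is valid, otherwise the name of the first
// missing or mistyped field. (Hypothetical helper for illustration.)
function validateGenerateBody(body: Record<string, unknown>): string | null {
  for (const field of ["imagePath", "audioPath", "resumeText"]) {
    if (typeof body[field] !== "string" || body[field] === "") {
      return field;
    }
  }
  if (typeof body["durationSeconds"] !== "number") {
    return "durationSeconds";
  }
  return null;
}
```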
Makes an authenticated request to the central Arcade API to fetch the user's current credit balance.
Calculates dynamic credit cost (50 base + 5 per 5s increment), checks balance, generates a job ID, computes target word count for script length, and builds public serve URLs for the uploaded audio and image files.
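The pricing formula can be sketched as follows. The function name is illustrative, and rounding a partial 5-second increment up to a full one is an assumption; the source only states "50 base + 5 per 5s increment":

```typescript
// Dynamic credit cost: 50 base credits plus 5 credits per (started)
// 5-second increment of the requested duration.
function calculateCreditCost(durationSeconds: number): number {
  const BASE_COST = 50;
  const COST_PER_INCREMENT = 5;
  const increments = Math.ceil(durationSeconds / 5); // assumed round-up
  return BASE_COST + COST_PER_INCREMENT * increments;
}
```

For example, a 30-second video would cost 50 + 5 × 6 = 80 credits under this reading.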
Sends the resume text to Claude Haiku with strict word-count constraints based on the selected duration (~2.5 words/second). Outputs only the spoken words: no markdown or labels.
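The target word count follows directly from the stated rate of ~2.5 spoken words per second; this tiny helper (name and rounding choice are illustrative) shows the arithmetic:

```typescript
// Target script length for a given duration at ~2.5 words per second.
function targetWordCount(durationSeconds: number): number {
  const WORDS_PER_SECOND = 2.5;
  return Math.round(durationSeconds * WORDS_PER_SECOND);
}
```

So a 60-second avatar gets a script budget of roughly 150 words.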
Sends the voice sample to MiniMax Voice Cloning, which analyzes the audio and creates a custom voice model. Returns a voice_id for use with speech generation. Noise reduction and volume normalization are enabled.
Sends the script text and cloned voice_id to MiniMax Speech 2.8 Turbo, which generates natural-sounding speech audio in the cloned voice.
Creates a job record with status "processing", the generated script, and job ID. The frontend polls this record for completion.
Calls the Arcade transaction API to deduct the calculated credit cost from the user's balance.
Assembles the JSON response object with jobId, script, status, and remaining credits. Uses a function handler to avoid template escaping issues with script content.
Returns the job ID, generated script, status "processing", and updated credit balance. The client then polls GET /api/status for completion.
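The response might look like the following TypeScript shape. The field names follow the description above; the exact naming and the sample values are hypothetical:

```typescript
// Assumed shape of the /api/generate response.
interface GenerateResponse {
  jobId: string;
  script: string;
  status: "processing"; // video is still rendering when this returns
  credits: number;      // remaining balance after the deduction
}

const exampleResponse: GenerateResponse = {
  jobId: "job_abc123",        // hypothetical job ID
  script: "Hi, I'm Alex...",  // generated intro script
  status: "processing",
  credits: 120,               // hypothetical remaining balance
};
```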
4-Model Chain: Claude Haiku + MiniMax Voice Cloning + MiniMax Speech + VEED Fabric
by Anthropic, MiniMax, VEED · View on Replicate →
This pipeline chains four AI models together: Claude Haiku writes a personalized intro script from your resume, MiniMax Voice Cloning creates a custom voice model from your audio sample, MiniMax Speech 2.8 Turbo generates speech in your cloned voice, and VEED Fabric 1.0 animates your photo into a talking-head video synchronized with the audio.
Great for
Want to build something similar?
Import this JSON into any BFFless project
Video generation (1-3 min) runs as post-processing after the response is sent. The frontend polls GET /api/status?jobId=... until the job record is updated to 'complete'.
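A client-side polling loop for this flow can be sketched as below. The status endpoint matches the description above, but the response fields, function names, and timing defaults are assumptions; the fetcher is injected so the loop can be exercised without a network:

```typescript
// Assumed job-status payload returned by GET /api/status?jobId=...
type JobStatus = {
  status: "processing" | "complete" | "error";
  videoUrl?: string;
};

// Polls the status endpoint until the job leaves "processing".
// intervalMs/maxAttempts are illustrative defaults (~5 min total,
// matching the stated 1-3 minute render time with headroom).
async function pollJob(
  jobId: string,
  fetchJson: (url: string) => Promise<JobStatus>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJson(`/api/status?jobId=${jobId}`);
    if (job.status !== "processing") return job; // complete or error
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for video generation");
}
```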
Post-Processing Steps
These run asynchronously after the response is sent to the client:
Generate Video (async)
Runs after the response is sent. Sends the photo and cloned speech audio to VEED Fabric 1.0 at 480p resolution. This can take 1-3 minutes.
Store Video (async)
Downloads the generated video and stores it in the avatar-videos directory. Saves extra fields: script, duration, job_id, image_url, and audio_url for gallery display.
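The stored record could be modeled with the interface below; the field names come from the list above, while the interface name and sample values are illustrative:

```typescript
// Assumed shape of a stored avatar-video record for gallery display.
interface AvatarVideoRecord {
  script: string;    // the generated intro script
  duration: number;  // requested duration in seconds
  job_id: string;
  image_url: string; // public serve URL of the uploaded photo
  audio_url: string; // public serve URL of the cloned-speech audio
}

const sampleRecord: AvatarVideoRecord = {
  script: "Hi, I'm Alex...",
  duration: 30,
  job_id: "job_abc123",
  image_url: "/serve/photo.png",  // hypothetical path
  audio_url: "/serve/speech.mp3", // hypothetical path
};
```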
Check Result (async)
Checks if the video was stored successfully. Sets status to "complete" with the video URL, or "error" with a message if storage failed.
Update Job Record (async)
Updates the job record with the final status and video URL. The frontend detects this change via polling.
Upload Pipelines
Stores photo, returns storage path
Stores voice sample, returns storage path
Built with BFFless Pipelines: serverless backend workflows powered by configuration, not code.