Lip Sync
Image mode: 6 cr/s (max 35 s)
Video mode: 3 cr/s (max 120 s)
Lip Sync has two modes. Image mode animates a static portrait photo to speak in sync with an audio clip — no camera required. Video mode takes an existing video clip and replaces the voice with new audio, useful for dubbing, re-voicing, or adding narration to footage you already have.
Lip Sync requires a paid plan (Starter, Pro, or Business). Free accounts will see an upgrade prompt.
Image mode — animate a portrait
1. Navigate to Create → Lip Sync and select the Image tab.
2. Choose a portrait: pick a saved avatar or upload a new photo (JPG, PNG, WEBP, max 4 MB). The photo should show a clear, forward-facing face.
3. Upload an audio file (MP3 or WAV, max 4 MB). Audio must be 35 seconds or shorter.
4. The page automatically detects the audio duration and shows the credit cost (duration × 6 cr/s).
5. Optionally add a Style Prompt to guide subtle expression and body movement.
6. Click Generate. The job typically completes in 30–90 seconds.
7. The output video appears in the results panel and is saved to your Library.
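If you prepare files in bulk, it can help to check them against the Image mode limits before uploading. The following is a minimal sketch, not part of the product: the function name `check_image_mode_inputs` is hypothetical, "4 MB" is assumed to mean 4 MiB, and only the extensions named in the steps above are accepted (whether `.jpeg` is also accepted is not stated, so it is rejected here).

```python
from pathlib import Path

# Limits quoted from the Image mode steps.
PHOTO_EXTS = {".jpg", ".png", ".webp"}
AUDIO_EXTS = {".mp3", ".wav"}
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: "4 MB" is read as 4 MiB
MAX_AUDIO_SECONDS = 35


def check_image_mode_inputs(photo_name, photo_bytes, audio_name, audio_bytes, audio_seconds):
    """Return a list of problems; an empty list means the inputs fit the stated limits."""
    problems = []
    if Path(photo_name).suffix.lower() not in PHOTO_EXTS:
        problems.append("photo must be JPG, PNG, or WEBP")
    if photo_bytes > MAX_FILE_BYTES:
        problems.append("photo exceeds 4 MB")
    if Path(audio_name).suffix.lower() not in AUDIO_EXTS:
        problems.append("audio must be MP3 or WAV")
    if audio_bytes > MAX_FILE_BYTES:
        problems.append("audio exceeds 4 MB")
    if audio_seconds > MAX_AUDIO_SECONDS:
        problems.append("audio is longer than 35 seconds")
    return problems
```

Calling it with a 1 MB PNG and a 12-second MP3 returns an empty list; each violated limit adds one message, so the caller can show all problems at once rather than stopping at the first.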
Video mode — re-voice an existing clip
1. Navigate to Create → Lip Sync and select the Video tab.
2. Upload a source video (MP4 or MOV) that contains a face. This is the clip whose voice will be replaced.
3. Upload a replacement audio file (MP3 or WAV, max 120 seconds).
4. Optionally upload a Reference Face photo. This is only needed if the video contains multiple people and you want to target a specific face.
5. Choose Video Length: No Extension (output matches the shorter of the video or audio) or Extend to Audio (the video is extended to match the full audio length).
6. Click Generate. The job typically completes in 1–3 minutes depending on clip length.
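The two Video Length options differ only in how the output duration is chosen. The sketch below expresses that rule as described in the step above; the function name `output_duration` and its parameters are hypothetical, and it models duration only, not how the extra footage is produced.

```python
def output_duration(video_seconds, audio_seconds, extend_to_audio):
    """Expected output length in seconds for the chosen Video Length option."""
    if extend_to_audio:
        # Extend to Audio: the video is extended to match the full audio length.
        return audio_seconds
    # No Extension: the output matches the shorter of the video or audio.
    return min(video_seconds, audio_seconds)


# A 10 s clip with 25 s of replacement audio:
print(output_duration(10, 25, extend_to_audio=False))  # 10
print(output_duration(10, 25, extend_to_audio=True))   # 25
```

When the audio is shorter than the video, both options give the audio length in this sketch, since there is nothing to extend.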
Credit cost examples
| Mode | Audio Duration | Credit Cost |
|---|---|---|
| Image | 5 s | 30 credits |
| Image | 15 s | 90 credits |
| Image | 35 s (max) | 210 credits |
| Video | 30 s | 90 credits |
| Video | 60 s | 180 credits |
| Video | 120 s (max) | 360 credits |
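The table rows follow directly from the per-second rates (6 cr/s for Image, 3 cr/s for Video). As a rough sketch for estimating other durations, assuming cost is simply rate × audio duration; the function name `credit_cost` is hypothetical, and how fractional seconds are rounded is not stated, so the sketch does no rounding.

```python
CREDITS_PER_SECOND = {"image": 6, "video": 3}
MAX_AUDIO_SECONDS = {"image": 35, "video": 120}


def credit_cost(mode, audio_seconds):
    """Estimate the credit cost of a Lip Sync job from its mode and audio duration."""
    if mode not in CREDITS_PER_SECOND:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 0 < audio_seconds <= MAX_AUDIO_SECONDS[mode]:
        raise ValueError(
            f"{mode} mode audio must be between 0 and {MAX_AUDIO_SECONDS[mode]} seconds"
        )
    return CREDITS_PER_SECOND[mode] * audio_seconds


# Reproduce the table above.
for mode, seconds in [("image", 5), ("image", 15), ("image", 35),
                      ("video", 30), ("video", 60), ("video", 120)]:
    print(f"{mode:5} {seconds:>3} s -> {credit_cost(mode, seconds)} credits")
```

The same length of audio costs twice as much in Image mode as in Video mode, so a 30-second clip is 180 credits as an animated portrait and 90 credits as a re-voiced video.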
Tips
- Image mode: use a clean, well-lit headshot with a neutral expression for the most natural results
- Image mode: avoid photos with heavy shadows across the face or extreme angles
- Image mode: generate voiceover in Audio Studio, then upload it here for a full talking-head workflow
- Video mode: the source video should have a single, clearly visible face for best accuracy
- Video mode: use Reference Face when the video has multiple people and you only want to sync one of them
- Keep clips concise — 5–15 seconds works best for short-form social content
Lip Sync works as a node inside the Storyboard canvas in both modes. Image mode: connect an Image Gen node to the portrait slot. Video mode: connect a Video Gen or Library node to the source video slot. Upload audio directly inside the node, then wire the output to a Video Combiner.
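To make the wiring rules concrete, here is a small plain-data sketch of the connections described above. It is an illustration only, not the Storyboard's API: the edge tuples, the slot names `portrait` and `source_video`, and the function `check_wiring` are all hypothetical.

```python
# Hypothetical model: each edge is (from_node, to_node, to_slot).
# For each mode: the Lip Sync slot that must be fed, and the node types that may feed it.
REQUIRED_SOURCE = {
    "image": ("portrait", {"Image Gen"}),
    "video": ("source_video", {"Video Gen", "Library"}),
}


def check_wiring(mode, edges):
    """Return a list of wiring problems for a Lip Sync node in the given mode."""
    slot, allowed = REQUIRED_SOURCE[mode]
    problems = []
    if not any(to == "Lip Sync" and s == slot and src in allowed for src, to, s in edges):
        problems.append(f"connect one of {sorted(allowed)} to the {slot} slot")
    if not any(src == "Lip Sync" and to == "Video Combiner" for src, to, _ in edges):
        problems.append("wire the Lip Sync output to a Video Combiner")
    return problems


edges = [
    ("Image Gen", "Lip Sync", "portrait"),
    ("Lip Sync", "Video Combiner", "input"),
]
print(check_wiring("image", edges))  # []
```

Audio does not appear as an edge because it is uploaded inside the Lip Sync node itself.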