StandIn Labs

Lip Sync

Image mode: 6 cr/s (max 35 s)
Video mode: 3 cr/s (max 120 s)

Lip Sync has two modes. Image mode animates a static portrait photo to speak in sync with an audio clip — no camera required. Video mode takes an existing video clip and replaces the voice with new audio, useful for dubbing, re-voicing, or adding narration to footage you already have.

Lip Sync requires a paid plan (Starter, Pro, or Business). Free accounts will see an upgrade prompt.

Image mode — animate a portrait

  1. Navigate to Create → Lip Sync and select the Image tab.
  2. Choose a portrait: pick a saved avatar or upload a new photo (JPG, PNG, or WEBP, max 4 MB). The photo should show a clear, forward-facing face.
  3. Upload an audio file (MP3 or WAV, max 4 MB). Audio must be 35 seconds or shorter.
  4. The page automatically detects the audio duration and shows the credit cost (duration in seconds × 6 cr/s).
  5. Optionally add a Style Prompt to guide subtle expression and body movement.
  6. Click Generate. The job typically completes in 30–90 seconds.
  7. The output video appears in the results panel and is saved to your Library.
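
The limits in the steps above can be checked locally before uploading. The following is a minimal sketch, not part of the product: the function name check_image_mode_inputs is hypothetical, and it assumes "4 MB" means 4 × 1024 × 1024 bytes and that file type is judged by extension, neither of which the guide states.

```python
from pathlib import Path

# Limits copied from the Image mode steps. How the service counts "4 MB"
# is an assumption (binary megabytes here).
MAX_UPLOAD_BYTES = 4 * 1024 * 1024
MAX_IMAGE_MODE_AUDIO_S = 35
# Only the extensions named in the guide; whether ".jpeg" is accepted is not stated.
PORTRAIT_EXTS = {".jpg", ".png", ".webp"}
AUDIO_EXTS = {".mp3", ".wav"}


def check_image_mode_inputs(portrait_name, portrait_bytes,
                            audio_name, audio_bytes, audio_seconds):
    """Return a list of problems; an empty list means the inputs fit the documented limits."""
    problems = []
    if Path(portrait_name).suffix.lower() not in PORTRAIT_EXTS:
        problems.append("portrait must be JPG, PNG, or WEBP")
    if portrait_bytes > MAX_UPLOAD_BYTES:
        problems.append("portrait is larger than 4 MB")
    if Path(audio_name).suffix.lower() not in AUDIO_EXTS:
        problems.append("audio must be MP3 or WAV")
    if audio_bytes > MAX_UPLOAD_BYTES:
        problems.append("audio is larger than 4 MB")
    if audio_seconds > MAX_IMAGE_MODE_AUDIO_S:
        problems.append("audio is longer than 35 seconds")
    return problems


if __name__ == "__main__":
    # A 1 MB PNG with a 12-second MP3 passes every check.
    print(check_image_mode_inputs("face.png", 1_000_000, "voice.mp3", 500_000, 12.0))
```

The sketch only mirrors the documented limits; the upload form remains the authority on what is actually accepted.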

Video mode — re-voice an existing clip

  1. Navigate to Create → Lip Sync and select the Video tab.
  2. Upload a source video (MP4 or MOV) that contains a face. This is the clip whose voice will be replaced.
  3. Upload a replacement audio file (MP3 or WAV, max 120 seconds).
  4. Optionally upload a Reference Face photo. This is only needed if the video contains multiple people and you want to target a specific face.
  5. Choose Video Length: No Extension (output matches the shorter of the video or audio) or Extend to Audio (the video is extended to match the full audio length).
  6. Click Generate. The job typically completes in 1–3 minutes depending on clip length.
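
The Video Length choice in step 5 can be expressed as a small rule. This is an illustrative sketch only: output_length_seconds is a hypothetical helper, and it assumes that Extend to Audio always yields an output as long as the audio. The guide does not say what happens in that mode when the video is already longer than the audio.

```python
MAX_VIDEO_MODE_AUDIO_S = 120  # audio limit from step 3


def output_length_seconds(video_s, audio_s, extend_to_audio):
    """Estimate the output duration for Video mode under the documented options."""
    if audio_s > MAX_VIDEO_MODE_AUDIO_S:
        raise ValueError("replacement audio must be 120 seconds or shorter")
    if extend_to_audio:
        # Extend to Audio: the video is extended to cover the full audio.
        # Assumption: the output length equals the audio length.
        return audio_s
    # No Extension: the output matches the shorter of the two inputs.
    return min(video_s, audio_s)


if __name__ == "__main__":
    print(output_length_seconds(10, 25, extend_to_audio=False))  # 10
    print(output_length_seconds(10, 25, extend_to_audio=True))   # 25
```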

Credit cost examples

Mode     Audio Duration   Credit Cost
Image    5 s              30 credits
Image    15 s             90 credits
Image    35 s (max)       210 credits
Video    30 s             90 credits
Video    60 s             180 credits
Video    120 s (max)      360 credits
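
The figures in the table follow from multiplying the audio duration by the per-second rate for the mode. Here is a rough sketch of that arithmetic. The function name lip_sync_cost is hypothetical, and rounding fractional seconds up is an assumption, since the guide only gives whole-second examples.

```python
import math

# Rates and limits as stated at the top of this page.
RATES_CR_PER_S = {"image": 6, "video": 3}
MAX_AUDIO_S = {"image": 35, "video": 120}


def lip_sync_cost(mode, audio_seconds):
    """Return the credit cost for a clip, using duration x rate."""
    if mode not in RATES_CR_PER_S:
        raise ValueError("mode must be 'image' or 'video'")
    if audio_seconds <= 0 or audio_seconds > MAX_AUDIO_S[mode]:
        raise ValueError(f"{mode} mode accepts audio up to {MAX_AUDIO_S[mode]} s")
    # Assumption: partial seconds are billed as a full second.
    return math.ceil(audio_seconds) * RATES_CR_PER_S[mode]


if __name__ == "__main__":
    for mode, seconds in [("image", 5), ("image", 15), ("image", 35),
                          ("video", 30), ("video", 60), ("video", 120)]:
        print(mode, seconds, lip_sync_cost(mode, seconds))
```

Run as a script, this reproduces the six rows of the table. The cost shown on the Lip Sync page before you click Generate remains the authoritative figure.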

Tips

  • Image mode: use a clean, well-lit headshot with a neutral expression for the most natural results
  • Image mode: avoid photos with heavy shadows across the face or extreme angles
  • Image mode: generate voiceover in Audio Studio, then upload it here for a full talking-head workflow
  • Video mode: the source video should have a single, clearly visible face for best accuracy
  • Video mode: use Reference Face when the video has multiple people and you only want to sync one of them
  • Keep clips concise — 5–15 seconds works best for short-form social content

Lip Sync works as a node inside the Storyboard canvas in both modes. Image mode: connect an Image Gen node to the portrait slot. Video mode: connect a Video Gen or Library node to the source video slot. Upload audio directly inside the node, then wire the output to a Video Combiner.