AI People

Recently, I dove back into AI-generated characters, focusing on building consistent digital characters for video. Here’s a breakdown of the process, tools used, and the journey of bringing them to life. Read till the end for the video result!

The Initial Image and Process

I imagined several personas, each with their own character. For each, I followed the same process.

Let’s focus on Lana, who is into the outdoors and wellness, and is eco-conscious.

I wanted to experiment with Flux, which was the latest image model at the time. Flux follows the prompt so well that I was able to get the desired composition without ControlNet for this experiment. Previously, I would use depth and/or OpenPose ControlNets to direct and control the composition.

I will still use them going forward, but it was interesting to work this way.

I did manage to get it working locally with ComfyUI, but both my MBP and PC were struggling. So I switched to Replicate.
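
For reference, here is roughly what generating a base image on Replicate looks like from Python. This is a minimal sketch, not my exact setup: the model slug, input names, and prompt are illustrative, so check the model page for the schema of the version you use.

```python
import replicate

# Generate a candidate base image of Lana with Flux on Replicate.
# Requires REPLICATE_API_TOKEN in the environment.
# Model slug and input fields are illustrative assumptions.
output = replicate.run(
    "black-forest-labs/flux-dev",
    input={
        "prompt": (
            "photo of a woman in her late 20s on a hiking trail at golden hour, "
            "natural light, candid, wellness lifestyle"
        ),
        "aspect_ratio": "3:4",
        "num_outputs": 1,
    },
)
print(output)  # URL(s) of the generated image(s)
```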

After several iterations of prompts, I created a consistent image of Lana that captured the look I wanted. To ensure a consistent appearance across different poses, I then used the @fofrAI workflow and also my local ComfyUI setup. This allowed me to generate over 50 variations of the character, from which I selected around 20 that were the most consistent.
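
The variation step can also be scripted on Replicate. A rough sketch, assuming fofr’s consistent-character model takes a subject image plus a prompt; the slug and input names are assumptions, so verify them against the model page:

```python
import replicate

# Generate pose/setting variations from one reference image of Lana.
# Model slug and input names are assumptions; check the published schema.
variations = replicate.run(
    "fofr/consistent-character",
    input={
        "subject": open("lana_base.png", "rb"),  # the reference image
        "prompt": "a woman doing yoga on a beach at sunrise",
        "number_of_outputs": 4,
    },
)
for i, image in enumerate(variations):
    print(i, image)
```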

LoRA Model

Once the base images were ready, I created a LoRA model using @ostrisai’s trainer for Flux. This enabled me to easily generate new images of Lana in various poses and environments while maintaining a consistent look.
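
Training can also be kicked off programmatically. A minimal sketch, assuming Replicate’s hosted version of ostris’s flux-dev-lora-trainer; the version hash, input names, and destination model are placeholders, so create the destination model first and copy the current version hash from Replicate:

```python
import replicate

# Kick off a Flux LoRA training run on Replicate using ostris's trainer.
# Version hash, input names, and destination are placeholders.
training = replicate.trainings.create(
    version="ostris/flux-dev-lora-trainer:<version-hash>",
    input={
        "input_images": "https://example.com/lana_training_images.zip",  # the ~20 curated images
        "trigger_word": "LANA",
        "steps": 1000,
    },
    destination="your-username/lana-flux-lora",
)
print(training.status)  # e.g. "starting"
```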

I used this model to generate several scenes, allowing me to place Lana in different settings – from casual to more elaborate scenarios.
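
Once training finishes, generating a new scene is just another run against the resulting model. The model name, version hash, and trigger word below are the same placeholders as in the training sketch:

```python
import replicate

# Place Lana in a new setting using the trained LoRA (placeholder names).
images = replicate.run(
    "your-username/lana-flux-lora:<version-hash>",
    input={"prompt": "photo of LANA making a smoothie in a bright kitchen, morning light"},
)
print(images)
```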

Bringing Lana to Life with Video Generation

The next step was to create a video introduction for Lana.

I explored several tools and settled on RunwayML for video generation. It wasn’t perfect – some outputs were less stable, especially around camera movement and facial consistency. But after several iterations, I managed to generate a usable video from a single image.
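
I worked in the Runway app, but the same image-to-video step can be scripted. A sketch assuming Runway’s official Python SDK (the runwayml package); the model name, duration, and aspect ratio are assumptions, so check the current API docs:

```python
import time
from runwayml import RunwayML

# Turn a still of Lana into a short clip with Runway's image-to-video endpoint.
# Model name, duration, and ratio are assumptions.
client = RunwayML()  # reads RUNWAYML_API_SECRET from the environment

task = client.image_to_video.create(
    model="gen3a_turbo",
    prompt_image="https://example.com/lana_scene.png",
    prompt_text="a woman smiles and talks to camera, gentle handheld movement",
    duration=5,
    ratio="768:1280",
)

# Poll until the render finishes.
while True:
    task = client.tasks.retrieve(task.id)
    if task.status in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

print(task.status, getattr(task, "output", None))
```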

Audio and Facial Syncing

With the video done, I moved to voice syncing. I tried several lipsync tools but found them lacking (edit: this has improved greatly recently, and I might write up the simpler workflows in a future post).

So, I used a tool called LivePortrait to transfer facial expressions onto the video generated by RunwayML.

My process is:

  1. Record myself reading the script.
  2. Use ffmpeg to force 24 fps and extract the audio (see the sketch after this list).
  3. Run my recording and the RunwayML video through LivePortrait to transfer my facial expressions onto the AI persona.
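
Step 2 is just two ffmpeg invocations; here they are wrapped in Python so the whole pipeline can be scripted. The filenames are placeholders:

```python
import subprocess

# Conform my recording to 24 fps.
subprocess.run(
    ["ffmpeg", "-i", "my_recording.mov", "-r", "24", "my_recording_24fps.mp4"],
    check=True,
)

# Extract the audio track as a WAV file.
subprocess.run(
    ["ffmpeg", "-i", "my_recording.mov", "-vn", "-acodec", "pcm_s16le", "my_recording_audio.wav"],
    check=True,
)
```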

This worked well, but LivePortrait relies on InsightFace, which has a license that isn’t friendly to commercial use. Instead, I found and used a ComfyUI node based on MediaPipe, which does not have this restriction.

This is also much easier on my MBP, so it runs without hating me, unlike when I’m experimenting locally with Flux. So now I have a video that uses RunwayML’s result as a base and follows the facial expressions of me reading the script.

Audio

To get Lana’s voice, I used ElevenLabs. I imported the audio from my own recording, then changed the voice. Now I had a different voice that still matched the timing and rhythm of my own speech. This way, I ended up with a fully generated character, speaking in sync with the video, using a voice that wasn’t mine.
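
This voice-changing step (speech-to-speech) can also be done through the ElevenLabs API. A sketch assuming their Python SDK; the API key, voice ID, and model ID are placeholders to fill in from your own account:

```python
from elevenlabs.client import ElevenLabs

# Convert my recorded narration into Lana's voice, keeping my timing and intonation.
# API key, voice ID, and model ID are placeholders.
client = ElevenLabs(api_key="YOUR_API_KEY")

converted = client.speech_to_speech.convert(
    voice_id="LANA_VOICE_ID",
    audio=open("my_recording_audio.wav", "rb"),
    model_id="eleven_multilingual_sts_v2",
)

# The SDK streams the result back in chunks; write them to a file.
with open("lana_voice.mp3", "wb") as f:
    for chunk in converted:
        f.write(chunk)
```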

The Final Video

After finalizing the facial expressions, I combined all elements using DaVinci Resolve, which provided all the editing tools I needed.

I combined the correct audio and video files and did the same for all the characters to create an “advert” for https://www.unrealpeople.agency/.

Here is the final video:
