Controlling animation with AI

3D animation + Animate Diffusion

Hello Friend

We’re going straight into it today. I’ll show you how I made this:

From creating an image, to turning it into a 3D model, animating the model, animating the camera, and stylizing the output.

I’ve done a few experiments with this workflow and I’m super excited for it, as I see enormous potential in it.

Ready? Let’s start.

Creating images

I use InvokeAI to create my character.

An important thing here is creating the character in T-pose. It’s the standard pose used to rig and animate characters.

With Invoke (or any platform with ControlNet) this is pretty simple: we upload the pose into ControlNet and generate our character on top of it.

Full body shot of a fantasy hermit character with horns wearing VERY simple, single layer, flowing clothing, standing in a t-pose
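Invoke handles this through its UI, but the same idea in code, using the open-source diffusers library with an OpenPose ControlNet, looks roughly like this. Treat it as a sketch: the pose file name is a placeholder and the checkpoints are common public ones, not necessarily what Invoke runs under the hood.

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

# Placeholder: an OpenPose skeleton image of a T-pose.
pose_image = load_image("t_pose_openpose.png")

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Generate the character on top of the pose conditioning image.
image = pipe(
    prompt=(
        "Full body shot of a fantasy hermit character with horns wearing "
        "very simple, single layer, flowing clothing, standing in a t-pose"
    ),
    image=pose_image,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("hermit_t_pose.png")
```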

When I have the character I want, I move into

Creating a 3D model from my image.

I use CSM. There are alternatives like Charmed or the text-to-3D tool Genie.

I upload my image and get the option to segment it, keeping only my character. Extra background detail can confuse the model, so it helps to limit the input to just the subject.
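CSM’s segmentation happens in its own UI. If you want to pre-clean the image yourself before uploading, a background-removal library like rembg is one way to do it; a small sketch (file names are placeholders, not part of the CSM workflow):

```python
from rembg import remove  # pip install rembg
from PIL import Image

# Strip the background so only the character is passed on to the 3D step.
character = Image.open("hermit_t_pose.png")
cutout = remove(character)  # returns an RGBA image with a transparent background
cutout.save("hermit_cutout.png")
```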

I adjust the settings (I usually put everything on high) and generate. I create variations as usual and end up with this guy.

It’s not perfect, nor exactly like the image I used, but it’s alright. It has the features and colors that will make the final stylization much easier.

Animating the character.

Mixamo is just a fantastic website. We can upload our 3D model, rig it and choose from the freely available animations on the website.

This is why we need our character in T-pose: so we can mark the key points on the body that let Mixamo rig it and drive the animation.

I chose an animation of him standing up and added a slight camera movement in Blender.
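I set the camera move by hand in Blender’s viewport, but the same slight push-in can be keyframed with Blender’s Python API. A minimal sketch, assuming the default camera object and made-up coordinates:

```python
import bpy

# Assumes the scene has Blender's default camera object named "Camera".
cam = bpy.data.objects["Camera"]

scene = bpy.context.scene
scene.frame_start = 1
scene.frame_end = 120

# Slow push-in: keyframe the camera's location at the first and last frames.
cam.location = (0.0, -6.0, 1.6)
cam.keyframe_insert(data_path="location", frame=1)

cam.location = (0.0, -4.5, 1.6)
cam.keyframe_insert(data_path="location", frame=120)
```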

Things to note here:

The quality of the 3D model is not amazing. To improve it, I suggest generating individual body parts, like the head or the hands, and combining them in Blender (a minimal join sketch follows these notes).

High-quality results require detailed work. AI gets us started and we take over to give it the final look.
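For reference, the basic join step from the note above can also be scripted in Blender; a minimal sketch, assuming the imported parts carry the hypothetical names Body, Head and Hands:

```python
import bpy

# Hypothetical object names for separately generated parts.
parts = ["Body", "Head", "Hands"]

bpy.ops.object.select_all(action='DESELECT')
for name in parts:
    bpy.data.objects[name].select_set(True)

# The active object becomes the result of the join.
bpy.context.view_layer.objects.active = bpy.data.objects["Body"]
bpy.ops.object.join()
```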

With the base animation ready, let’s move into

Stylization with Animate Diffusion.

I used the cloud service fal to stylize my video. It’s a repository of open-source models you can run, paying only for the GPU render time. It’s incredible.

It has a simple UI we can use to try all of these models out. The downside is that it has a simple UI, which means we can’t experiment that much.

We can adjust:

  • positive prompt

  • negative prompt

  • Inference steps (how much detail we get)

  • CFG scale (how closely it will stick to the prompt)

  • Frame selection (render every 2nd frame in this example, skipping one frame in between)

  • Model
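fal also exposes the same knobs through its API via the fal_client Python package. Here’s a rough sketch of how the settings above map to a request; the endpoint id and argument names are assumptions, so check the exact schema on the model’s page before using it:

```python
import fal_client  # pip install fal-client

# The endpoint id and argument names below are placeholders; look up the exact
# schema of the AnimateDiff video-to-video app you use on fal's model page.
result = fal_client.subscribe(
    "fal-ai/animatediff-video-to-video",  # hypothetical endpoint id
    arguments={
        "video_url": "https://example.com/blender_render.mp4",  # placeholder input
        "prompt": "fantasy hermit with horns, painterly, detailed",
        "negative_prompt": "blurry, distorted, extra limbs",
        "num_inference_steps": 25,    # how much detail
        "guidance_scale": 7.5,        # CFG: how closely to follow the prompt
        "select_every_nth_frame": 2,  # frame selection: render every 2nd frame
    },
)
print(result)
```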

To truly get into the nitty-gritty, I suggest creating a ComfyUI setup for Animate Diffusion. This allows for much more precise experimentation and control.
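If you’d rather stay in Python than build a node graph, the diffusers library also ships AnimateDiff pipelines that expose the same parameters. The text-to-video sketch below just shows how the knobs map (for stylizing an existing render you’d reach for the video-to-video variant); the checkpoint names are common public ones, not necessarily what I used:

```python
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

# Motion module plus a Stable Diffusion 1.5 checkpoint (both public Hugging Face repos).
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism", motion_adapter=adapter, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(
    pipe.scheduler.config, clip_sample=False, beta_schedule="linear", timestep_spacing="linspace"
)

# The same knobs as in fal's UI: prompts, steps, CFG scale.
frames = pipe(
    prompt="fantasy hermit with horns standing up, painterly style",
    negative_prompt="blurry, distorted, low quality",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
).frames[0]
export_to_gif(frames, "stylized.gif")
```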

Without that level of control, keeping the output consistent and avoiding distorted generations can be difficult, as you can see in the results of this experiment: the way the character stands up, how the background changes, and so on.

Adjusting the animation and camera movement to what the AI can handle is also important. There’s a lot more to explore here, and it’s incredibly exciting.