Fully autonomous AI production?

Text-to-video is getting good.

Hello Artificial Artisan.

Thank you for being a subscriber 💛

If you like what I do, consider checking out my guides. Or sharing the newsletter with your friends. There is a referral link at the bottom of this post.

Let’s get to today’s topic: autonomous video production with text-to-video.

After images, video is the natural next step for generative AI models. Runway announced Gen-2 a few months ago, and since then more models have become available to test and experiment with.

I felt it was time to do so.

In my usual approach of asking how AI can handle every step of the process for me, I ran a quick experiment.

The tasks:

  • Write a short script for a video

  • Generate video footage

  • Generate voiceover

  • Create the sound design

Step 1: Making the script

This is where every story starts. Without it, we’re just putting random clips and images together. Sometimes that works out, but in most cases, having a well-planned story before generating assets is incredibly beneficial.

Wanting to test out different AI models, I turned to Anthropic’s Claude and asked it for a story.

After a bit of back and forth, it gave me something I felt was usable.
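If you’d rather script this step than chat back and forth in the UI, here is a minimal sketch using Anthropic’s Python SDK. To be clear, I worked in the regular chat interface; the model name and the prompt below are placeholders, not what I actually used.

```python
# pip install anthropic
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical prompt - swap in your own story idea and constraints.
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Write a short nature-documentary script: a narrated "
                   "voiceover plus a one-line visual description per shot.",
    }],
)

# Print the script, shot by shot.
print(response.content[0].text)
```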

Step 2: Generating videos with text-to-video

Aside from Runway’s Gen-2, two AI models looked promising for my experiment: Pika Labs and Zeroscope.

I ended up generating most of the videos with Pika Labs, with a few Gen-2 generations mixed in.

For the prompts, I copied Claude’s output directly. It worked well for most of the shots I wanted.

I wrote some of the shot prompts myself, and just like with images, the more you know about filmmaking and cameras, the more you can do with this technology. Camera terms like "slow dolly-in," "aerial shot," or "shallow depth of field" give you far more control over the result.

Step 3: Voiceover

Nothing new here. AI voiceovers have been amazing for a while now. I copied the script Claude gave me into ElevenLabs and created a voice profile that matched what I had in mind for this video.
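This step can also be automated through ElevenLabs’ text-to-speech API. Here is a rough sketch; the API key, voice ID, and script line are placeholders (I used the web app for this project):

```python
# pip install requests
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"  # placeholder - use your own key
VOICE_ID = "your-voice-id"           # placeholder - ID of your voice profile
script_text = "Deep in the rainforest, life stirs at dawn..."  # placeholder

# POST the script to the text-to-speech endpoint and save the MP3 it returns.
resp = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": script_text, "model_id": "eleven_multilingual_v2"},
)
resp.raise_for_status()

with open("voiceover.mp3", "wb") as f:
    f.write(resp.content)
```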

Step 4: Sound design

This is where AI fell short. Usually, I would add music and sound design to the video; they are what create the feeling and atmosphere of the story. However, I have yet to find a truly efficient way for AI to support this part of the process, from selecting the exact music mood and genre to designing sound effects like animal calls.

More and more options are becoming available in this area, and I expect an assistant that supports this part of my workflow will arrive soon.
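Even the final assembly, stitching the generated clips together under the voiceover, can be scripted if you want to go fully hands-off. A rough sketch with moviepy 1.x, where the filenames and clip count are placeholders:

```python
# pip install moviepy  (this sketch assumes the 1.x API)
from moviepy.editor import (AudioFileClip, VideoFileClip,
                            concatenate_videoclips)

# Hypothetical filenames for the generated shots and the voiceover.
shots = [VideoFileClip(f"shot_{i}.mp4") for i in range(1, 9)]
voiceover = AudioFileClip("voiceover.mp3")

# Join the clips back to back, lay the narration on top,
# and trim to whichever track is shorter.
final = concatenate_videoclips(shots, method="compose")
final = final.set_audio(voiceover).set_duration(
    min(final.duration, voiceover.duration)
)
final.write_videofile("final_video.mp4", fps=24)
```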

So, here is the final video, which took me one hour to make.

I’ll repeat that for emphasis: it took ONE HOUR of playing around with different AI platforms to make this video, starting from nothing.

Can it compare to BBC documentaries?

No.

But this software will only get better, and the better we become at using it, the more possibilities we unlock as we move toward a future of real-time generated content.

I hope you enjoyed today’s experimental project.

Text-to-video has me captivated and amazed, and I will be diving deeper into it as I experiment with and test different AI models and workflows. As usual, I’ll be sharing what I find with all of you, so you can start creating without spending time figuring out HOW to create with AI.

You can stop by my website or social media, and as always,

Keep creating and don't forget to have fun. ☀