AI-Enhanced Creator
Posts
Mid 2024 AI Creator Tech Stack

Mid 2024 AI Creator Tech Stack

Roundup of the best AI software

Nejc Susec
July 23, 2024 • Estimated Reading Time: 7 minutes

Hello Friend

Welcome to this week’s issue of the AI-Enhanced Creator. Thank you for being a subscriber. 💛

Have you ever thought about the size of AI models?

AI language models like GPT-3 contain over 175 billion parameters. Milky Way contains an estimated 100-400 billion stars, putting GPT-3's parameter count in a similar range.

Power the size of a galaxy at our fingertips. Pretty crazy.

Let’s see what we can do with it.

Resources

AI development of the week - Apple releases an open source language model. As a person who uses a local LLM setup, all open source developments are exciting. Especially when big companies enter the space.

Live event - Comfy Coffee Live portrait - This week we are covering Live Portrait workflows in ComfyUI. We will be joined by one of the most established ComfyUI creators and educators Purz. Gonna be a good one. If you want to learn more about comfy and animating faces you can join us on LinkedIn or Youtube.

This Weeks Lesson - AI Creator Tech Stack

As you dive deeper into the world of AI tools and technologies, staying focused can be… difficult. We want to stay ahead and at the same time actually apply the technology to our work.

I have a list. Of technology that is powerful and immediately applicable to work as an individual or a big company.

This list is focused on areas in the creative production and strategy:

Large Language model
Image Generation
Upscaling
Storyboarding
Voice generation
Video Generation
Characters
Music

Large Language Model

I consider two options here - Big player subscription or open source local installation.

I lean more towards local installation, as I can create my own database and modify the system prompts to my liking. For deep focused work LM studio + AnythingLLM with Llama 3 (at the moment) is my go to.

For quick facts, ideas and conversations, I go to ChatGPT or Claude.

Image Generation

I use Invoke or ComfyUI.

Both are free and open source. They are quite technical. ComfyUI being THE best and most complex software for visual generative AI.

For deep and focused work, we need to get into the deep technical parts.

Upscaling

Topaz is the place to be. Quick, straight forward and powerful. Both image and video.

We have alternatives in ComfyUI, which can work well as part of existing workflows you have set up. It saves the trouble of going from one app to another.

Storyboarding

Storyboards are just a great way to visually present an idea. I’m still amazed at how well I can convert a script into storyboard frames with Katalist. Sometimes I would generate individual frames in Invoke, if it is a more specific or complex shot. Overall the script to prompt translation is great. Really saves a lot of time.

Voice Generation

By now probably everyone has tried out Elevenlabs. It’s still at the top of any voice generator I have tried out so far. Especially with speech to speech, to keep the intonation of the performance. Also very easy to clone voices.

Video Generation

Runway Gen-3 and Luma Dream Machine are the big generators at the moment. The temporal consistency and length of videos has been performing really well.

Alternatively, Animate diffusion with ComfyUI is an absolutely incredible playground. The flexibility and adaptation for different workflows and use cases is incredible. If you’re looking to dive deep into generative video - ComfyUI and AnimateDiff is the place to be.

Characters

Live portrait is THE face and lipsync animator at the moment. Best way to use it is with comfyUI.

Alternatively Hedra for face animation performs well. HeyGen and Runway perform well for lipsync and dubbing.

Music

Suno is absolutely incredible at the quality of output it generates with very simple input.

Homework

AI is a rapidly developing field. Establishing your list of software you use in your creative process is really important. If you change your software whenever a new development is made, you will be in a continuous cycle of learning and struggling to move into the deeper parts of creating.

I HIGHLY recommend you choose your core software that helps YOU and your work. If you’re not working with dialogue, you don’t need lipsync and voiceovers.

It doesn’t mean you shouldn’t explore them, but if you’re working in a professional environment, make sure your bases are covered before you move onto exploration.

For me, the biggest advantage have been the locally hosted Large Language Models and ComfyUI. These tools are so incredibly powerful, once you start diving into them you can do a wide range of tasks you couldn’t imagine doing before.

For your homework I encourage you to take a strategic and objective look at your work. Think about what tools are actually necessary for your everyday. Focus on those and make a list.

When you feel comfortable with your tech stack, move forward and explore others.

Choosing software is one of the lesson in my free beginner’s guide. If you want to dive deeper into the methodology of using AI in a professional work environment, I highly recommend it.

Whenever you are ready, here are 4 ways I can help you:

FREE Beginner’s Guide: Take your first steps with a 6-lesson course designed to introduce core mindset, skills and software to start working with AI. It's the foundation on which you can build your expertise.
We have a free comunity of creators just like you. A place to connect, collaborate and learn together. It’s always easier to grow with others.
AI-Enhanced Creator: Power of a Production House: Building on top of the core concepts, we dive deep into workflows and techniques. With step by step walkthrough and process breakdown.
Coaching: For personal and hands-on learning experience. Focusing on your goals.