WORKSHOP OVERVIEW
"Time is the enemy, Quality is the battleground. Sacrifices must be made."
MAKING MOVIES
This page will provide info on the basic approach to making AI video shorts on a home PC for free, and the Workflows page will provide you with the free Comfyui workflows that I used to make each video. For hardware entry levels and costs, see the section on the AI Movie Making page. And for ways to get involved, some of which have nothing to do with coding, AI, or computers, visit the Get Involved page.
There is more to the process than I could share on this site, and methodology changes every day (AI evolves too fast to teach a finite approach yet). If I get enough interest, I might run online workshops to go over my process in more detail. If that is something you would like to be involved in, get in touch.
HISTORY OF AI MOVIE MAKING
The open source AI video making scene really only began on 5th December 2024 when Tencent released the first decent, free, open source text-to-video model called Hunyuan.
I had tried working in UE5 (Unreal Engine 5) and other methods to make "janky" music videos, like "Fallen Angel" below, using UE5 Metahumans to animate scenes. I wasn't impressed, mostly because it would take years to master.
The below video took me 3 months to complete, though I was new to UE5 when I started it...
Before that I had mucked about with early AI effects using Google Colab to make goofy band videos like this ...
But I knew I was just waiting for AI to catch up.
I entered the fray in January 2025, and though I've made fewer than a dozen AI videos since, I have learnt a hell of a lot in that time. Most importantly, I can now see where it is headed.
I believe we are at the dawn of a new era. Something akin to 1920s silent movies arriving on the scene almost exactly 100 years ago.
THE PSYCHOLOGY OF WORKING WITH AI
ON BEING AN ARTIST
To have a reason to create, artists need to feel recognised and acknowledged.
I've been in the music business for decades and I know that sustained success in art does not happen "organically", and for most artists does not happen at all. It definitely will not happen without a marketing machine.
This can become a double-edged sword for the artist. When we start focusing on marketing ourselves we stop focusing on creating. And if I haven't said it enough by now - time and energy are all we have, so we should spend them wisely.
My videos have been gaining good traction on YouTube because I am on the front of a wave in AI. It won't last, but I'll ride it while it's there. I could spend my time and energy on maximising marketing, sure, but I would rather spend it on doing what I love - creating.
My biggest life-lesson has been in reframing my reasons for creating. Now, I try to create simply for the love of creating. Also, I do not see AI as a threat; I see it as a tool humans can use to be more creative.
How you position yourself in this respect is something to consider as the AI floodgates open and the creative world becomes drowned in "art", which it will.
AI EVOLVES TOO FAST TO KEEP UP
The AI scene evolves at such breakneck speed that it is hard to keep up with all the new software and features coming out. It's mind-bending and emotionally disturbing.
I soon discovered that the tools for my projects were being made obsolete within a week. Why learn a complicated process if it will be gone tomorrow? Good question. But we are forced to keep up with the constantly evolving trends simply to stay relevant.
However, updating software often causes problems in other parts of the workflow. Not everyone is developing code to the same standards and versions, and sometimes half a dozen different versions of a similar thing appear at the same time. It is seriously overwhelming to keep up with.
I soon realised I was going to require a strategy to handle all of this.
MENTAL MANAGEMENT
90% of successful AI movie making will be in developing good mental approaches to how you manage your time and energy.
The first issue you will run into is mental management, and how to deal with the phenomenon caused by the speed AI evolves at - FOMO (the fear of missing out).
I have constant FOMO trying to keep up. Scouring the forums every morning for the latest thing takes up a lot of my time. I still do it; I just don't act on what I find any more.
Instead, I note down the latest things in my current project under a "TO LOOK AT" section and save the links there. I then ignore them until I am between projects. That satisfies the urge somewhat.
After 30 days on my current AI video project I had 30 items in my "TO LOOK AT" list, some of which could probably help me on the current project, but they would have to wait until it was finished. Why?
Because I learnt this next lesson the hard way...
DON'T UPDATE MID-PROJECT
The second issue you will run into is other stuff breaking when you update.
Update one custom node and you can be sure something will stop working in another. But I am often forced to update Comfyui several times during a longer project; sometimes it's unavoidable.
However, I suggest avoiding updating mid-project where possible, even if just to save time and energy. Definitely do not do a big update unless you have no choice.
Recently I started getting BSODs during my overnight batch renders. The worst kind of issue, because that usually means hardware problems. Thankfully it was software and system RAM related and I could hobble through, but I could trace the issue back to the last Comfyui update that I was forced to do to get something else working.
Another important thing to remember is this - open source is free, geeks work on it for free, it's their passion project and they don't get paid. As such you cannot expect help and support. So we are on our own out there, though people are incredibly helpful if you ask nicely. If you do find help, then lucky you.
Welcome to the bleeding edge of AI world! We are out there together, but pioneering is a lonesome road.

PLAN PROJECTS BEFORE YOU START THEM
If you do not manage a project as it gets bigger, you will struggle.
What we want to do, ultimately, is automate everything to speed the process up, but without losing control of the arty bits. Why have AI if you ain't gonna use it, right?

This is an evolving process, but my current approach to a project goes like this...
1. FIRST, BE BRUTAL WITH YOUR SCRIPT
💡 Woke up from a dream with an amazing idea for a movie?
Great! Cancel the next month, fire up your PC, and let's go...
-
Turn it into a script
-
Run a summary of it through ChatGPT and get it to look for plot holes
-
Run it through Claude or Gemini (always get a second AI opinion) and reduce the idea to a 5 minute version
-
Once happy with the idea, plan out x60 camera shots (allow 5 seconds per shot). Hell, get Grok to do it.
Why only 5 minutes long when my brilliant idea needs to be 42 episodes and that's just Season 1?
Time and energy, of course.
5 minutes == x60 5-second video clips. That's a lot of work, as you'll discover.
Keep your project as short as possible. Ask AI to be brutal with your script. Don't be indulgent; think of the viewer, who will have the attention span of a gnat while thinking themselves to be Roger Joseph Ebert.
Bear in mind that a long movie currently takes years and thousands of people to complete. We aren't ready to challenge that yet, but it won't be long before we can do it all ITB (in the box).
Trust me, stick to 3 to 5 minutes, no more, not yet. (My "narrated noir" project was 10 minutes long, and I regretted that foolhardy decision).
I also recommend setting yourself a 5-day time limit on the first few projects.
2. STORY-BOARDING
The average shot length in a 1930s movie was 12 seconds; the average shot length in 2025 is 2.5 seconds.
Hot tip: base your shots on the attention span of modern humans (or gnats), not on how wonderful you think your shot is. The average viewer likely won't care.
So, I've now got my script idea and some ideas for 60 camera shots. It's time to turn those x60 ideas into text prompts and then into images. Here is an example of a text prompt for doing that (used for text-to-image creation of the base images):
"close-up of 35 year old man wearing trench coat and trilby hat, the back of his hand brushing along a crumbling brick wall as he walks down an alley, gritty texture, moody lighting. He is recollecting his old haunts. Alley in background is blurred. No other people. Daytime. Distinct light and dramatic shadows. Cinematic. Photorealistic. 1950s Noir style. Monochromatic."
Below is the final result of the above prompt. (NOTE: Firstly, the model has ignored some of the prompt's requests, likely down to the strength applied in the workflow settings. Secondly, I later removed an alcove beneath his hand using Krita AI inpainting tools and added the brick wall back in. I did that because the image-to-video process later was leaving a bag behind in the alcove that I didn't want. I also removed the overall colour from the image result in Krita. So this isn't a prompt-and-go process; it usually requires extra steps for every shot.)

But I don't plan to sit around making images one-by-one, so I ...
-
Write Python code to run a Comfyui workflow overnight and batch produce images from all the prompts which I'd put into a csv. (When I say write Python code, I mean get AI to do it).
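To give a concrete idea of what that script does, here is a minimal sketch of the approach. Comfyui exposes an HTTP queue on its default local port, and the workflow is the JSON you get from "Save (API Format)". The file names, csv columns, and node IDs below are just illustrative, so match them to your own exported workflow:

import csv, json, random, urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"   # Comfyui's default local API endpoint

with open("workflow_api.json") as f:         # workflow exported via "Save (API Format)"
    workflow = json.load(f)

with open("shotlist.csv", newline="") as f:
    for row in csv.DictReader(f):
        wf = json.loads(json.dumps(workflow))                 # fresh copy per shot
        wf["6"]["inputs"]["text"] = row["image_prompt"]       # positive prompt node (illustrative ID)
        wf["3"]["inputs"]["seed"] = random.randint(0, 2**31)  # sampler node (illustrative ID)
        data = json.dumps({"prompt": wf}).encode("utf-8")
        req = urllib.request.Request(COMFY_URL, data=data,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)                           # queue the job and move on

Leave that queue running overnight and collect the images from Comfyui's output folder in the morning.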
-
Then make a Storyboard with the still images by putting them together in free video editing software (Davinci is overkill; I use Reaper for visual storyboarding, or you could use Shotcut).
This Storyboard will be the first run of the idea in still-image visual form, and it really helps to see what will work and what won't, how the story flows, or what is going to present challenges to turn it all into a video story later.
- Mess with the storyboard and stretch the still-image shots until they work: long shots, short shots, rip out what doesn't work, add in more of what does. Keep to the 3-5 minute time limit for your own sanity, and I wouldn't go over 5 seconds for a single video clip unless you have a powerful machine - not on your first go, at least.
My 10-minute "narrated noir" project was x130 5-second shots (mostly), and that massively increased the time and energy burn required. I went from 4 days for my first AI music video project "Cafe" (which I didn't track with a csv) to almost 90 days for the "narrated noir". Argh!
Why? Because AI models have improved and I can do more things. This then costs me more time and energy, not less. Aye, there's the rub!
This is why I keep saying: "Time is the enemy, Quality is the battleground. Sacrifices must be made."
And yes, I regretted the decision to make that project 10 minutes long, but I had to finish it on principle. Let that be a lesson to me.
-
Next, narrate (or put your song) over the top of your storyboard (I narrate onto a cheap phone as a guide, and drop the m4a file into Reaper). It will help get the story flowing well, and often a lot needs to change to make it feel right.
-
Also export out the timecode markers for each shot to a csv file (Reaper is fantastic for this)
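As a rough sketch of what I then do with that export (the column headers depend on how you export from Reaper's Region/Marker Manager, and the shotlist column names here are just my illustration):

import csv

# read the marker export from Reaper: assuming "Name" and "Start" columns exist
starts = {}
with open("reaper_markers.csv", newline="") as f:
    for row in csv.DictReader(f):
        starts[row["Name"]] = row["Start"]        # e.g. "SHOT_014" -> "0:57.500"

# merge the start times into the shotlist, matching marker names to shot IDs
with open("shotlist.csv", newline="") as f:
    shots = list(csv.DictReader(f))
for shot in shots:
    shot["start_tc"] = starts.get(shot["shot_id"], "")   # blank if no marker found

with open("shotlist.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=shots[0].keys())
    writer.writeheader()
    writer.writerows(shots)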
Below is a screenshot of my shotlist.csv after I had finished the first (image only) storyboard and tweaked the timing of the still images. You can see where I extended some still images to 6 seconds; in reality I'll bring them back down to 5, or fade them out, but it felt like the right pace once the narration was added. I discuss all of this shortly.

3. IMPROVE THE BASE IMAGES
The base images are the images used in the storyboard and will be fed into Comfyui to turn them into short video clips later. The results from that will be edited together to make the final short movie.
This next stage can become time consuming if you are the arty type, because you can get very fussy with minutiae - watch out for that.
-
Tidy up the base images and start to think about consistency in the background and environment of each shot. Decide how anal you are willing to be. I suggest "not very" at this stage, assuming you have a life.
-
Add a "video prompt" column into your csv file. This will be different to the image prompt. This will be to drive the base image into 3 to 5 seconds of video action.
NOTE: Prompt Engineering is a black art and will likely be the only job left in the future. Everything will come down to how you tell AI what to do. Each model is different in how it likes to be prompted. It really is a big area of study. At some point in the future I will devote a page to it, but it changes so much that it's impossible to pin down what works between different workflows and settings.
NOTE: I was originally trying for character consistency at this stage but when VACE (a custom node for "inpainting" video, i.e. swapping stuff) came out, that all changed. Now I save character consistency concerns until after I create the first round of video clips. This is not ideal, it is just the best current method based on my hardware limitations. This will change eventually, and some of the things in my "TO LOOK AT" list will probably change it.
4. TURN BASE IMAGES TO VIDEO CLIPS
It's time to test the "video prompts" I wrote describing the action I want.
Below is an example prompt to drive the image-to-video. We use the image made in the previous section as the starting point for the video, and the text describes what we want to happen in the action:
"A man wearing a black trench coat and black trilby hat walking through thin London alleys in the misty morning. He runs his hand along the wall, a moment later he puts his hands in his jacket pockets and continues to walk down the alley."
Here was the result of the above prompt (YT has cropped it to a vertical resolution, but you can get the idea)
(NOTE: Watch the above video clip again, and notice it has very smooth movement. Normally at this stage it would just be low resolution (low quality) and 16fps (frames per second) and show a noticeable judder in movement between frames, but the above is taken from a final clip that has been upscaled and interpolated to 1920 x 1080 @ 64fps.)
To get the first set of video clips from the 60 images I...
- Python code a workflow to run a batch of all 60 images overnight to turn them into 5-second video clips at the lowest resolution and fastest speed possible.
On my 3060 RTX 12 GB VRAM entry level graphics card, I get one video clip at 416 x 240 resolution, 16 fps, 5 seconds long produced in under 5 minutes. It looks crap, but it gives me an idea of how the video prompt + seed are working.
(NOTE: Something came out recently that might speed all this up. Currently in testing.)
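The low-resolution pass is the same batching idea as before, with two details worth automating: drop the resolution right down, and pin a seed per shot so a test clip you like can be reproduced later. A rough sketch (node IDs, file names, and column names are illustrative, as before):

import csv, json, random, urllib.request

COMFY_URL = "http://127.0.0.1:8188/prompt"

with open("workflow_i2v_api.json") as f:        # image-to-video workflow in API format
    workflow = json.load(f)

with open("shotlist.csv", newline="") as f:
    shots = list(csv.DictReader(f))

for shot in shots:
    if not shot.get("seed"):                               # keep a seed you already liked
        shot["seed"] = str(random.randint(0, 2**31))
    wf = json.loads(json.dumps(workflow))
    wf["52"]["inputs"]["image"] = shot["image_file"]       # LoadImage node (illustrative ID)
    wf["30"]["inputs"]["text"] = shot["video_prompt"]      # positive prompt node
    wf["27"]["inputs"]["seed"] = int(shot["seed"])         # sampler node
    wf["40"]["inputs"]["width"] = 416                      # fast overnight test resolution
    wf["40"]["inputs"]["height"] = 240
    data = json.dumps({"prompt": wf}).encode("utf-8")
    urllib.request.urlopen(urllib.request.Request(
        COMFY_URL, data=data, headers={"Content-Type": "application/json"}))

with open("shotlist.csv", "w", newline="") as f:           # write the seeds back into the csv
    writer = csv.DictWriter(f, fieldnames=shots[0].keys())
    writer.writeheader()
    writer.writerows(shots)

That way the later high-quality pass can reuse the seeds and prompts that produced okay action in the tests.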
-
Tweak the prompts in the csv to fix any obvious issues seen in the test video clips. Repeat the above process until you have all the seeds and video prompts producing okay action (it won't be perfect).
-
Python code a workflow to run a batch of (in my case roughly 10) images overnight, but this time at the highest resolution and best quality (slowest speed) you can manage.
On my 3060 RTX 12 GB VRAM entry level graphics card, I get one video clip at 1024 x 576 resolution, 16 fps, 5 seconds long produced in about 40 minutes. It looks great, but it needs work still. That comes in the next stage.
Many nights later (it won't be 6, because you will have to redo many of them), I now have my x60 5-second video clips, and no person in them looks the same twice, and they are often a bit... janky... but that is fine. We can fix it in the mix.
(Narrated noir project: it took me 40 days and many night batch runs to get the first 100 clips done, of which 40 were good and the rest needed work.)
5. FIX IT IN THE MIX
You now have your first set of x60, 1024 x 576 resolution, 16fps, 5-second video clips.
But this is not the end, this is just the end of the beginning.
This next stage is the one currently evolving the fastest in AI open source video creation. You can see from my last video "Sirena" that I did not achieve perfect character consistency. I have since massively improved it, and share how in the "Footprints In Eternity" (narrated noir) video workflow page.
This is another black art - character consistency. It is still the bugbear of open source video creation. But we have "Loras", and I talk about them later. Loras offer the best current approach, but I already have something in my "TO LOOK AT" list that might replace them.
The more I write about it here, the sooner I will have to update it. Suffice to say this is where I...
-
Begin to develop my character looks with certainty: The hero, the love interest, the villain.
-
Sometimes I develop a 3D model of the head of each character using a Comfyui modelling workflow, then import it into Blender to make angled shots, using cunning workflows later to restyle it from a grey blob into a photorealistic face, but... this all costs time and energy.
-
Use whatever method works to get x10 shots of the face for each character that look good, look like the same person, and work from different angles. This will be for training "Loras".
-
Train a Lora for each character. Test the Loras to find the training point that works best. This will likely take at least a day and night per character.
-
Use the Lora in a workflow that either adds the character in over the top of what is there, or just replaces the face using masking. There are many ways to do this and they vary in their abilities, from "FaceFusion", to "VACE", to Wan 1.3B with Loras. I use them all.
This area needs a magic wand cast over it to make achieving character consistency easier than it currently is.
Once you face the challenge of multiple characters in a shot including the background people, then welcome to the bleeding edge of AI movie-making where no one has perfect solutions yet.
6. FINAL TOUCHES
This is the polish phase where you fix the things you missed, confirm it all looks good, then interpolate and upscale each video clip. I'll get a bit techy here for the benefit of anyone struggling with this process.
-
Use Loras and faceswaps to get everyone looking right at 16fps 832 x 480 (or 1024 x 576 if you have the hardware, time, and energy)
-
Run the fixed-up 16fps 832 x 480 video clip through a workflow to bump it back up to 1024 x 576. At the same time, re-apply the character Loras you trained with the Wan 1.3B model to clean up blemishes while maintaining character consistency.
-
Finally push it through the polishing stage at a super low denoise at resolution 1280 x 720 if your machine can manage it.
-
Run that through interpolation workflows (they turn 16fps into 32fps and then 64fps, smoothing out the movement by adding in-between frames to blend the motion), combined with the last step of upscaling to 1920 x 1080, and you have your final video clip at 1920 x 1080 @ 64fps.
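I do all of this inside Comfyui workflows, but purely to illustrate what the interpolate-then-upscale step is doing, here is a rough ffmpeg equivalent (a stand-in, not my actual workflow, and the quality won't match the dedicated interpolation models):

import subprocess

# motion-interpolate 16fps up to 64fps, then upscale to 1920 x 1080 in one pass.
# This only illustrates the operation; the Comfyui interpolation nodes do the real work.
subprocess.run([
    "ffmpeg", "-i", "shot_014_16fps.mp4",
    "-vf", "minterpolate=fps=64,scale=1920:1080:flags=lanczos",
    "shot_014_1080p_64fps.mp4",
], check=True)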
NOTE: some might scoff and say "interpolate after you upscale, dumbass", and to those people I say - buy me a 3090 and I'll test it further. Personally I have yet to see the quality difference, and my way is faster.
You're done. Almost.
Because this is not the end, no, this is just the end of the beginning...ish.
RECAP
Let's do a quick recap of the process, but using a different base image example...
I first made a base image using a text-to-image prompt to create something like the below.
"a man with black hair and white business shirt, is standing on the left of frame facing a woman with black hair and a black figure hugging shoulder strap dress. Their expressions are neutral. The setting is Noir 1950s style, moody shadows and spot-lighting create a dramatic effect. Cinematic. High Quality. Photorealistic."

Then I ran that image through an image-to-video workflow using the following prompt to turn it into a 5 second video clip:
"a man is rejecting a woman. After a moment she turns to leave, exiting to the right of the frame. The camera remains focused on the man watching her go."
The resulting 5 second video clip was run through another workflow to swap out the people using my Lora trained characters.
And from there I ran it through a couple more workflows to polish, upscale, and interpolate it into 1920 x 1080 @ 64fps finished video clip.
You can see those three main steps in the video below (it really went through about 5 steps, swapping two new characters into the video).
This is the process, give or take, applied to every single shot.
The slightly "plastic" look is either a problem or a feature depending on the situation, and you have to address that during the last stages. In the above case I havent as it suits the "Noir" style.
One thing to consider is setting a base line of quality control. If you make one clip too perfect and highly detailed, then you are going to have to make all the clips that perfect. Great if you have a monster PC and graphics card, more challenging if you don't. I tend to lean into "story" over "quality", but that is me. I later polished the above clip further for the "Noir" project.
7. POST PRODUCTION
Yet more black arts - colourisation, editing, adding sound, adding music, adding ambience. A whole other world that I won't get into here, because it's new to me too. But here is what I currently use.
-
Edit the final clips together in Davinci Resolve at 1920 x 1080 and 60 fps. (It's a beast, but amazing, and best of all they have a free version!)
-
Figure out a LUT and add it for colourisation, to homogenize your look across the footage. It's time-consuming but hella worth it.
-
Do all the other things; add music, sound fx, video FX.
-
Export out as a .mp4 or .mkv
Okay, now you are done. Release it to YouTube and try to get people interested.
NOTE: one word of advice, if your music is your own - and it needs to be legally usable - then I highly recommend releasing it about two weeks or more before you finish the video so you can be sure YT will pick up on the copyright. If you go viral you want to benefit from that. ALWAYS PROTECT YOUR MUSIC COPYRIGHT WITH AN ISRC.
LORAS
"Loras" are the thing you add into the mix to tell your software that you want a particular person, action, or thing to be added in the result. They can be trained for images or for videos, but you need to train a Lora for each different Comfyui AI model, they don't work across different ones (usually).
TRAINING LORAS
I no longer train image Loras; I see no point in them for video work. Instead I train Wan Loras (Wan is an AI text-to-video model). This means I add my characters in by swapping them into the video clips rather than putting them in the base images. It just sped up the process and gave the best results.
I am limited in some respects by home hardware, and will probably look to rent servers specifically to train characters quickly and cheaply in the future. It isn't expensive to do this, and I have heard of people training a character fully in a few hours on rented beast servers for less than the price of a coffee.
Currently I can train a character for Wan video Lora in 4 hours on my 3060 RTX 12 GB VRAM and it works well. The important part is captioning and creating quality consistent images. A dataset of only 10 images works fine if they are good.
TIPS FOR TRAINING
I find the following logic really helps me when training Wan2.1 AI text-to-video model Loras, and so far it seems to be correct. I am by no means a pro at this, just figuring it out as I go:
Don't describe whatever you want to be permanent; do describe the things you want to be changeable.
So if my person has brown eyes and black hair and I want her to always have those things driven by the Lora, I don't mention them in the caption. If I want to be able to give her any hair colour, then I do mention the hair colour in the caption. This is the logic I have applied to Wan training and it works well so far.
This also means you need to describe background stuff to stop it becoming unchangeable and showing up when you use the Lora, e.g. if there is a tree in the background, you had best mention it.
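To make the logic concrete, a caption for a training image of that character might read something like this (purely illustrative), deliberately describing the outfit and the background but never her face, eyes, or hair:

"a woman wearing a red coat, standing on a train platform, a brick wall and a bare tree behind her, daytime, overcast light"

Everything mentioned stays swappable when you use the Lora; everything left unsaid gets baked into the character.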
I would also suggest running something like Florence 2 on the images and adapting its output, since it is designed to describe what it sees better than a human can. But don't just use the captions it provides; you need to think through the logic I mentioned about what to describe and what not to describe.
The other thing that made my life easier was using a 256 x 256 image training dataset instead of 512 x 512 or larger. The theory being - quality is important, resolution not so much. It's not really a choice for me anyway, since I have a 12 GB VRAM GPU limitation and don't want to rent a server. This way you give the model wiggle room to put realism into the face; with too much precision you define the face too hard, and the workflow model has no leeway to adapt smaller things like skin, light, or facial hair in situ. I think the logic stands, but I could be wrong.
So far this approach is working for me with Wan Lora training on my home PC. The other thing is to make sure at least one training image has other people in it, else you'll end up with a scene from "Being John Malkovich" when you use the Lora with other people in the video.

TESTING LORAS
After training a Lora you can check your results using a tensor graph of the training loss, and when you do, that can help point you to the best trained epochs.
Look for epochs that sit on downswings (see the image below), around the turn of the arc as it begins to flatten out, up until it has flattened out.
I'll pick ten epochs to test that coincide with the downswings (epochs are saved every 5 steps in my training, e.g. Epoch 500, 505, 510, etc.), and in the image below I have red-marked potential downswings I would pick to test.
I sometimes find Epoch 200 is as good as 600, and it often depends on the face angle when applying a character-swap Lora (I train using the Wan 1.3B t2v model on my 3060 12GB VRAM, so I always swap out characters later using VACE, since I can't use the Lora in the higher quality Wan 14B i2v model without training on 14B models, and that would require renting a server).
I also tend to find the best epochs to be around 400 to 500, which might be down to my setup. In the example below, I only ever used Epoch 475. It worked well enough that I didn't need to try another. (The red marks are just examples of downswings, not necessarily ones I picked, though the one I used consistently was around the second-to-last red mark at 475 in the example below; it could just as easily have been at 200.)
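If you would rather not eyeball the graph, a quick script can shortlist candidate epochs for you. This is a minimal sketch assuming you can export the loss curve as a two-column (step, loss) csv with no header from your trainer or TensorBoard; adapt the file format and the save interval to your own setup:

import csv

SAVE_EVERY = 5                                  # my trainer saves an epoch every 5 steps

steps, losses = [], []
with open("loss_log.csv", newline="") as f:     # assumes rows of "step,loss" with no header
    for row in csv.reader(f):
        steps.append(int(row[0]))
        losses.append(float(row[1]))

def smooth(vals, k=25):
    # simple moving average to hide step-to-step noise
    return [sum(vals[max(0, i - k):i + 1]) / len(vals[max(0, i - k):i + 1])
            for i in range(len(vals))]

sm = smooth(losses)
candidates = [steps[i] for i in range(1, len(sm))
              if sm[i] < sm[i - 1]               # still on a downswing
              and steps[i] % SAVE_EVERY == 0]    # and lands on a saved epoch
print(candidates)                                # the shortlist to actually render tests with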

I suggest you now go to the Showcase page to see how I made each video. In each you'll find the Comfyui workflows used to make them, available to download. The latest knowledge can be found on the Research page.