
The spirit of open source community is about sharing knowledge freely

Q ≈ T + E

RESEARCH & DEVELOPMENT

AI Research Playlist For ComfyUI & AI Video Creation

ABOUT

Subscribe to my YouTube channel and you will receive the latest research as I post it.

I have now mostly completed my July to September 2025 research phase. I will be taking my foot off the research and video posting pedal for now.

But before I dive into the next project I need to look at shot management, because tracking shots and assets through projects will get big and problematic if I do not. There is some good news and some bad news about that...

The bad news first. I had planned to release AIMMS as freeware, but the amount of work it will take to complete, and the time I need to give it, means I will not be doing that. I will instead release it as a commercial product. I am sorry if you got your hopes up, but I have to try to make some dinero somewhere, and hopefully that will help fund my future research and project time. Taking this approach means I can look at making a more robust product and something that will (hopefully) work cross-platform. More on that as I develop it.

The good news is that I plan to spend the next couple of months working on AIMMS, which is at version 1.0 (currently nicknamed STORMSY) and for now is a passive storyboard management system. More on that in the link.

MY APPROACH PLAN FOR NEXT PROJECT

This will be my standard approach based on my current research once I start the next project. It will change for specific tasks but most video clips will go through this process.

  1. Work with WAN-based models at 16fps to get the best quality video clip that I can. Resolution of 720p if I can do it, less if not. See FFLF as an example.
  2. Interpolate from 16fps to 24fps. At this stage I use a ComfyUI native node to quickly upscale the clip to 1080p, then process it through a RIFE workflow to get it to 24fps, which is my final resolution and frame rate. See Interpolation.
  3. I then take that 1080p 24fps clip into a USDU upscaling workflow (30 mins on average), which cleans up any artefacts at a low denoise setting. See Upscaling.
  4. Fix character consistency with tools like Wanimate or VACE, targeting and fixing faces using Compositing methods without losing the quality of the original video clips. It's a slow process, but it means returning detail and facial "look" consistency to multiple characters in a video is possible even at distance. This is especially useful for low VRAM work on cinematic stories.
  5. Finally, take the finished video into Davinci Resolve for colorisation and the Final Cut edit.

💻 MY HARDWARE

All research is being done on a regular home PC.

I was feeling the limitations on my last project, Footprints In Eternity, back in May 2025. But the devs' tireless work keeps bringing the magic within reach of even the low VRAM cards.

Making short AI films is doable with this hardware; it just takes Time and Energy. You might not hit true 1080p, but you can get pretty good results if you know how.

May 2025 felt like the equivalent of the 1920s Silent movie era, but September 2025 feels more like the 1970s, in terms of the quality of acting (human interaction) and camera work possible.

The research I do here I share freely. I hope to help anyone and everyone with a low VRAM card achieve the ability to make short films. In time, I think we will even be able to make full-length movies on these cards. We shall see.



Topics are presented below in alphabetical order, and not in the order videos have been made.

CAMERA ANGLES

GETTING NEW CAMERA ANGLES USING COMFYUI (Uni3C, Hunyuan3D)

Getting New Camera Angles Using Comfyui (Uni3C, Hunyuan3D)

Date: 5th September 2025.

About: This is a follow-up to the video in the Phantom x3 character consistency section.

In that video, we put 3 highwayman characters sat together at a campfire in a video clip, with highly accurate consistency of their faces and clothing. What we need now is new camera-position shots for driving dialogue.

For this, we need to move the camera to point over the shoulder of the guy on the right while pointing back toward the guy on the left. Then vice-versa.

To do it, I take a still image from the original Phantom workflow, turn it into a 3D model, turn that into a rotating camera shot, and serve it up as an Open-Pose controlnet. From there we can go into a VACE workflow, or in this case a Uni3C wrapper workflow, and use Magref and/or the Wan 2.2 i2v Low Noise model to get the final result, which we then take to VACE once more to improve with a final character swap-out for high detail (see VACE - ADDING HIGH QUALITY CHARACTERS BACK IN TO A SHOT). This then gives us our new "over-the-shoulder" camera shot close-ups to drive future dialogue shots for the campfire scene.

It is just one method I use to get new camera shots from any angle - above, below, around, to the side, to the back, or wherever.

Hardware: This is all done on a 3060 RTX with 12GB VRAM and 32GB system RAM.

Workflows: To download the x3 workflows shown in this video, right-click here and download the zip file. The workflows are embedded in the png files within it; drop them into ComfyUI and they will load up. It contains the following:

  1. MBEDIT - make-Pose-controlnet-from-video-input.png
  2. MBEDIT - uni3c-rotate-camera-around-subject-using-Pose-controlnet.png
  3. MBEDIT-Hunyuan3D_model_from-single-image.png



CHARACTER CONSISTENCY

PHANTOM WORKFLOW FOR 3 CHARACTERS (2nd Sept 2025)

Workflow explainer using Phantom GGUF model to put 3 characters in a video clip and maintain consistency

Date: 2nd September 2025

About: This is the first in a series of videos where I will share my latest research into Comfyui on making short video clips for story-telling.

(Forgive the poor-quality voice-over; I was testing Microsoft's VibeVoice for the first time.)

I am pretty excited this time as we are much closer to being able to make short films now with dialogue and believable human interaction.

Here, I share a Phantom workflow with you that takes 3 different characters and places them into a defined position according to the prompt. The important thing is they remain consistent in face and clothing. The Wan 2.1 based Phantom model can do that.

Hardware: This is all done on a 3060 RTX with 12GB VRAM and 32GB system RAM.

The workflow: This is embedded in the png file's metadata (right-click the image and download the png file).

Phantom wrapper workflow in .png metadata Download Workflow PNG

You can just drop the png file into comfyui and it will load up the workflow.

If you have any questions, find me via the comment section of the video, or on social media via the contact page.


VACE - ADDING HIGH QUALITY CHARACTERS BACK IN TO A SHOT

UPDATE: This is for working with images and uses a single VACE/WAN "Low Noise" model in the workflow. For more advanced use and VACE 2.2 with video methods, see the other sections using VACE 2.2 for dual workflows. This workflow is still relevant for base image work - Mark, 15th October 2025.

VACE Swap High Quality Characters Back in To New Camera Angles (Image Version)

Date: 6th September 2025

About: Using VACE to Add High Quality Characters Back in To New Camera Angles (Image Version)

This follows on from the two previous videos where we got 3 guys sat around a campfire then rotated the camera position to get "over-the-shoulder" shots ready for making dialogue scenes.

What we now want is to revisit the quality of the characters in those new camera shot positions. And we add them back in at higher quality using the VACE model. Yes, we are using VACE here as an image workflow, not a video workflow.

Low VRAM: The trick here for the low VRAM guys is to use it to create the base image; we are not using it to swap a character back into a video (that comes later). What this means is we can hit higher quality 720p output, which gives us a decent quality character shot ready for the next step, which is driving the dialogue.

I provide an updated workflow below that uses a superior method to grey masking, but you can switch it back to use whatever method works for you. I'll leave the workflow below set up with it so you can try it. It uses a black mask outline with openpose instead of the usual grey mask (the grey mask was shown in the video, but I used the open pose on black silhouette method). This then goes into the input frame of the VACE encode node, the same as a grey mask would. I found this way allowed Wan 2.2 to do its magic even better, and got me the best results. I forgot to mention it in the video, but you can see what I mean in the image of the workflow png file below.

Masking issues: The below images highlight the importance of the reference image position (character with white background). If the ref image guy is in the wrong place, VACE will add its own idea based on the prompt. The difference, as you can see, is huge. Many are not aware of this aspect, and might not realise how good VACE is when you can get this right, only how bad it is when you don't.

brokeback mountain guy caused by bad mask position

The above is when it goes wrong. I got Brokeback Mountain cowboys because VACE wasn't happy with the position of the ref image so it only used the prompt.

solve VACE fails with accurate masking position

Discoloration issues: The other thing to be aware of is poor colour matching (see the arms of his jacket on the left of the below image). I don't recall what I did to fix it, but it works fine in the updated workflow.

solve VACE fails with accurate masking position

The workflow: The VACE Character swap-out workflow is embedded in the png file below. Right-click the below image to download it, then drop it into ComfyUI.

VACE character swap wrapper workflow in .png metadata Download Workflow PNG

Further VACE research: To learn more about what VACE can do, I highly recommend looking through this NotebookLM from Nathan Shipley.



COMPOSITING

Compositing is taking a small part of a video clip, fixing that section at a higher resolution, then putting it back into the original video clip.
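
To make the idea concrete, here is a minimal single-frame sketch in Python using Pillow. The file name and face box are placeholder values, and the "fix" step is left as a stub - the actual process described below is done on whole clips with Shotcut, Wanimate, and Davinci Resolve, not with this code.

```python
# Minimal single-frame sketch of the compositing idea. Paths and the face box
# are placeholders; the enlarged crop would normally be fixed by a detailer or
# face workflow before being pasted back.
from PIL import Image

frame = Image.open("frame_0001.png")               # original low-res frame (placeholder name)
box = (612, 180, 740, 308)                         # (left, top, right, bottom) of the face

crop = frame.crop(box)                             # 1. pull the small region out
crop_big = crop.resize((512, 512), Image.LANCZOS)  # 2. enlarge it so a model can work on it

fixed_big = crop_big                               # stub: the fixed/detailed version goes here

# 3. shrink the fixed region back to its original size and paste it over the frame
fixed = fixed_big.resize((box[2] - box[0], box[3] - box[1]), Image.LANCZOS)
frame.paste(fixed, box[:2])
frame.save("frame_0001_composited.png")
```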

The problem: Working on low VRAM means we don't often hit 720p - the coveted resolution for WAN models - and working in cinematics this means faces-at-distance often look terrible.

If our aim is consistent characters, then Upscaling won't work, because we lose face consistency in the process.

The solution: Compositing. Though it comes at a big cost in Time + Energy, for now it is the only way to maintain consistent character features when fixing faces-at-a-distance on Low VRAM GPUs.

If you find other solutions, especially faster ones, please let me know.

SHOTCUT + WANIMATE + DAVINCI RESOLVE

Compositing in Comfyui - Maintaining High Quality Multi-Character Consistency

Date: 10th October 2025

About: There are at least two other approaches that could be used instead of Wanimate - VACE and "Stand-In". Let me know if you hear of any others.

Pulling the composite out of the original video clip for each character is quick and easy with Shotcut. Just use a scale filter and zoom in so the character does not leave the frame, but is enlarged as big as you can make it. Don't worry about the blur. Exporting out from Shotcut to mp4 hasn't created contrast or lossy problems using the method I describe here.

Run the Shotcut-exported mp4 composite for each character through Wanimate, replacing the face. Try to target only the face with the mask, as blending will be easier later if no clothing is changed, but it can work fine if that isn't working out.

Wanimate tends to have blocky masking, which works best if characters are far enough apart. VACE can be more targeted - e.g. if you have two people close together in the shot - but it has its own drawbacks. Stand-In I haven't tested much, but it might work too.

Wanimate is very good at pushing consistency into low res composites. However, on my 3060 RTX I cannot get over 1024 x 576 resolution with Wanimate, which seems to be the minimum resolution required for it to work well. In fact, I sometimes have to blur the result to make it sit in the context of the original clip better, so going to a higher resolution may be of little value. 576p is good enough, but takes 30 minutes to composite 81 frames at 16 fps.

Take the original video into Davinci Resolve and then layer the fixed composite video clips over the top. Use zoom and opacity to move them into position covering the original. Then use the "composite" settings for the clip, along with "cropping" and "soften", to blend it into the original shot. If it is too perfect it might not blend, so add some blur. If the camera is not static then keyframing the crop for each composite is easier than setting up tracking, but either method will work. I can do 4 character composites into a shot and export it out in under 20 minutes if there are no problems. This method works surprisingly well.

In many shots I have 3 or 4 subjects and each has to be addressed separately. Given the cost in Time and Energy for compositing that, whatever is quicker is going to be of value. This area is currently one of the worst aspects for low VRAM, as it compounds the losses, making visual story-telling on low VRAM potentially too long to justify, since middle-distance shots are often going to be necessary for cinematics.

Higher VRAM cards can achieve 720p and more, so they are less likely to run into the problem. Having said that, I can achieve 720p on my 3060 with VACE 2.2 dual model FFLF (first-frame, last-frame) workflows, but it didn't solve middle-distance shots at all. It's worth noting that during my testing of Upscaling solutions to fix faces-at-distance, I had to go to 1600 x 900 before I saw faces-at-distance get fixed with t2v model "detailer/polisher" methods.

(The workflow for Wanimate has had a recent update, so I won't share it here. It can be downloaded from Kijai's custom nodes examples folder. But the video explains the method using Shotcut and Davinci Resolve, as well as the older Wanimate workflow.)



CONTROLNETS

BLENDER FOR COMFYUI - TIPS, TRICKS, & CONTROLNETS

Blender for Comfyui - Tips, Tricks, & Controlnets

Date: 16th September 2025

About: There are a few approaches we could take, but this video is specifically for people who, like me, do not know how to use Blender.

In it, I take a 2D image, convert it to a 3D model with Hunyuan3D, and then use that in Blender to make a head swivel, which in turn gets converted to an Open Pose model. Both those workflows I share below.

I also mention use of the Grease Pencil in Blender, and ways to work with Blender using Co-pilot, so that you don't really need to know how to use Blender at all; you just ask Co-pilot as you work with it to get to an end result.

It is all about using cheap and dirty tricks to get the job done as fast as possible, so that we get what we need to drive ComfyUI to give us results. In this case for dialogue scenes so the man looks in the right direction while he is talking. In this instance we use "Vibe-Blendering" to achieve it.

The workflows: I share two. One is to make a Hunyuan3D model from a 2D image and is quick and easy. The other is to turn a 3D short grey model animation clip (made in Blender) into an Open Pose clip for use as a controlnet, and is also very quick.

Right-click and download the below png files and drop them into ComfyUI to load up the workflows contained in their metadata.

Hunyuan 3D model from 2D image workflow in png metadata Download Workflow PNG

The above is Hunyuan 3D model from 2D image workflow

Open Pose controlnet from short video clip Download Workflow PNG

The above will convert a video clip animation of a person (such as saved out of Blender in the video example) to an Open Pose clip that can be used to drive a controlnet based workflow such as posing characters with Lipsync.



EXTENDING VIDEO CLIPS

SKYREELS DF MODEL EXTENDING CLIPS

Skyreels DF model for Extending video clips

Date: 23rd September 2025.

About: This Skyreels DF model workflow takes an input video and, after setting the frame overlap, uses those overlap frames as start frames to create a continuation video of 97 frames. This can then be extended further using the same approach for two more clips.
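
As a rough sketch of the arithmetic: when the clips are lined up on the overlap, each extension run only adds (continuation minus overlap) new frames. The overlap value below is an assumption for illustration - use whatever you set in the workflow.

```python
# Rough chaining arithmetic; the overlap value is an assumption for illustration.
original = 121         # frames in the starting clip
continuation = 97      # frames produced per DF run (its first frames are the overlap)
overlap = 17           # assumed frame overlap fed in as start frames
runs = 3               # number of extension runs

total = original + runs * (continuation - overlap)
print(total, "unique frames, about", round(total / 16, 1), "seconds at 16fps")
```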

The problem is that color and quality degradation occurs the more clips created, and this is a common issue with these methods. However, this workflow is fast even on a 3060 RTX, finishing each set in around 7 minutes where similar WAN methods took 30 minutes or more.

In the video example above, it works well enough to provide an "extended pine forest fly-over sequence" of four 121-frame clips (including the original 121-frame video clip) that are then overlaid in Davinci Resolve to line up and create one long sequence.

It's not perfect, but the approach creates the video structure that could be restyled or re-detailed easily at a later date, or I could rent a more powerful server GPU and run an upscaling detailer process on it.

It is a good example of where the AI solutions are not yet available locally in OSS, so we have to look at workarounds. We could also spend time blending the seams and fixing color degradation issues in Davinci Resolve, or simply create the underlying structure now and wait for the solutions to arrive in time.

In this case I will do the latter. What it does is enable me to test the flow of the opening shots, so now I can add the narration to it and get a feel for how it flows. I will circle back around later to fix the clips when better AI solutions arrive, either turning them into a depthmap sequence and restyling, or renting a server and upscaling with a detailer.

If you know of other solutions that will run on Low VRAM cards, feel free to send them on.

Workflows: Right-click and download the SkyReels DF extension video workflow image png file below, then drop it into ComfyUI and it will load the workflow.

Skyreels DF Extend Video workflow Download Workflow PNG

The other Skyreels workflow shown in the video is Skyreels i2v with Uni3C workflow to drive the camera. Right-click and download the image png file below, then drop it into ComfyUI and it will load the workflow.

Skyreels i2v with Uni3C workflow Download Workflow PNG



FFLF

(First-frame, Last-frame)

720P FFLF USING VACE 2.2 & WAN 2.2

720p FFLF using VACE2.2 + WAN2.2 on 12 GB VRAM GPU

Date: 18th September 2025.

About: In this video we achieve 1280 x 720 x 81 frames for the first time in a single run on a 3060 RTX 12GB VRAM GPU.

What is interesting about it is that it not only includes FFLF using VACE, but uses controlnets to drive the result. So this is not a basic workflow, and it came as a surprise to me that it even finished on a 3060 RTX, I'll be honest. What's more, this is a Wan 2.2 dual-model workflow.

The reason for this success is explained in the video, but I am using fp8_e5m2 scaled models for both Wan 2.2 and VACE 2.2 and doing this has resolved some issues I had with GGUFs. I'd also had issues when mixing different types of VACE and WAN models together. This suggests to me there is a lurking problem, as I am seeing others run into road-blocks too. Something for devs above my paygrade to figure out.

Here we have it working, and working very well. And note that this is using 19GB in file size per model load. This throws out the myth that a 12GB VRAM card can only load a model file smaller than 12GB. True in native workflows, maybe, but in wrapper workflows I have sometimes been able to work with larger files. This video proves that to be the case.

I have only just got this working, but as the opening line from one of my favourite Bukowski novels puts it, "it began as a mistake". I would never have tried this if other models had not been behaving strangely.

I share this workflow "fresh off the press", though the FFLF workflow has been around a while and I'd long ago promised a video on it. So the timing was perfect, since I'd planned to share the 832 x 480 version and related bump-up stages to fix the results, but having a 720p version allowed me to get rid of all that.

One slight CAVEAT: When I first tested this workflow I got 720p x 81 frames done in 30 minutes. In the video example it takes 47 minutes. That is because I had restarted Comfyui and it was loading up fresh. A second run should be closer to 30 minutes.

After completion, I ran the 720p resulting video clip through a "16fps to 24fps" interpolation workflow, then through the 1080p Upscaler/Detailer USDU workflow, then finally reversed the video using Shotcut. And the result of that is ready to go to Davinci Resolve for the Final Cut.

Updates: "Radial Sage Attention" didnt work out due it having resolution constraints, I either had to drop resolution or go up to 1280 x 768 which I tried but it didnt finish. And though FastWAN shaved off 5 minutes from total run-time using it on the Low Noise model, in comparison to end result without, I felt it lost quality so wont use it. And for the record, in further tests once the first run is completed I am finding it finishing around 30 minutes each subsequent go, so it is keeping some of the models in memory enough to save a fair bit of time. Around 8 minutes is spend decoding latents, so another reason why I am researching Latent Space next.

Workflows: Right-click and download the FFLF workflow image png file below, then drop it into ComfyUI and it will load the workflow.

FFLF 720p Wan22 and VACE22 with optional controlnets Download Workflow PNG

For the other workflows mentioned in the video - the "16fps to 24fps" interpolation workflow is in the INTERPOLATION section, and the Upscaler workflow can be found in the UPSCALING section.



INTERPOLATION

16fps To 24fps

Date: 17th September 2025.

About: In Footprints In Eternity I tried to get as smooth a result as possible and targeted 60fps or 120fps in some cases.

Since then I have learnt that in cinematics the public actually prefer 24fps. And since my interest is not high-quality gamer perfection but decent story-telling narrative at 720p or 1080p, I no longer need to target those higher frame rates. Which suits me fine. It was a PITA anyway, because 5 seconds at 120fps is 600 frames.

The problem to solve is that Wan models work to 16fps. So how do we get from 16fps to 24fps smoothly?

It is actually pretty easy. We use RIFE or GIMM to interpolate the 16fps x3, which gives us 48fps. We then use a node to apply "use every 2nd frame" and that gives us the end result of 24fps, but done using a properly applied mathematical approach.
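
The frame arithmetic behind that, as a quick sketch (the actual work is done by the RIFE node and a "keep every nth frame" type node in the workflow):

```python
# 16fps -> x3 interpolation -> 48fps -> keep every 2nd frame -> 24fps
src_frames, src_fps = 81, 16         # a typical 5-second WAN clip
interp_frames = src_frames * 3       # roughly 3x the frames after RIFE x3, now at 48fps
final_frames = interp_frames // 2    # keeping frames 0, 2, 4, ... gives 24fps
print(final_frames, "frames at 24fps is about", round(final_frames / 24, 2), "seconds")
```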

Why do we do it this way? So that we don't skip over frames and end up with jumpy video. I do not know how Topaz works to go from 16fps to 24fps and haven't tried it. Given the ComfyUI process is very fast even on a 3060 RTX GPU, I might not even bother reaching for Topaz during this stage.

For the record, I plan to use the following steps when going for production ready results:

  1. Work at 16fps to get the best quality video clip that I can. Resolution is 720p if I can do it, less if not.
  2. Interpolate from 16fps to 24fps. But before I do this, I use a ComfyUI native node to quickly upscale the clip to 1080p.
  3. I then take that 1080p 24fps clip into a USDU upscaling workflow (30 mins on average), which cleans up any artefacts at a low denoise setting.
  4. I now have a finished result, which goes to Shotcut to reverse the action if that is required, and to Davinci Resolve for colorisation and the Final Cut edit.

If the above changes, then the workflow below might need to change, but for now that is my planned process when going from image, to video, to production ready clips.

Workflows: Right-click the below image and save it as a png file, then load it into ComfyUI for the workflow.

RIFE 16fps to 24fps with 1080p upscale workflow Download Workflow PNG



LATENT SPACE

LATENT FILES & FIXING LONG VIDEO CLIPS ON LOW VRAM

Latent Space - Part 1 - Latent Files & Fixing Long Video Clips on Low VRAM

Date: 25th September 2025

About: In the video I discuss the "Secret Society of Latent Space" and why it is hard to crack. There isn't much info out there on working in Latent Space with video clips, and almost no workflows available.

In this video I share how I used it to test upscaling and detailing of a low-quality 32-second long extended clip made in ComfyUI using SkyReels DF workflow in the previous episode. You can download that workflow from the Extending video clips section of this page.

Now, I use a Latent Space workflow and file loading methods to fix the seams and add structural detail into the 32-second-long clip by breaking it into 120-frame-long parts.
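
As a tiny sketch of that chunking (the total frame count here is only an assumption for illustration):

```python
# Split a long clip's frame range into 120-frame parts for separate latent passes.
total_frames = 512      # assumed: roughly a 32 s clip at 16fps
part_len = 120

parts = [(s, min(s + part_len, total_frames)) for s in range(0, total_frames, part_len)]
for i, (a, b) in enumerate(parts, 1):
    print(f"part {i}: frames {a}-{b - 1} ({b - a} frames)")
```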

It was a good first test going from 480p to 720p to add the detail back in subtly while helping to fix the continuity. In this case using Phantom t2v model in a wrapper workflow to do it. I will do more videos on working in Latent Space as I master the art.

The workflow: Right-click on the below png image and save it. Drop it into Comfyui and it will load the workflow. NOTE: The automation part does not currently work and I have not found any help to resolve that yet. I will share updates here as I resolve them.

Also note that to load latent files you will need to manually move them to the Comfyui input folder.

Latent Space upscale/detail workflow contained in the .png metadata Download Workflow PNG



LIPSYNC

DIALOGUE - PART 1 - INFINITE TALK (Audio To Video)

Infinite Talk using audio files to drive the video in an image-to-video workflow.

Dialogue - Part 1 - Infinite Talk

Date: 7th September 2025

About: In this episode we open with a short dialogue scene of my highwaymen at the campfire discussing an unfortunate incident that occurred in a previous episode.

It's not the best lipsync approach but it is the fastest. It uses a Magref model and Infinite Talk along with some masking to allow dialogue to occur back and forth between 3 characters.

In previous episodes I created five or six camera position shots: in front of the three men, close up over the shoulder from two directions, and some not-so-close up camera positions. Some worked and some didn't. They are utilised in the first part of the video so you can see how that went, and then discussed in the second part.

I used VibeVoice to make short text-to-speech audio clips. I used Audacity to trim audio and boost levels when needed. I used silent audio clips along with masking and prompting, to stop other men talking.

All of this I took into Davinci Resolve to add some sampled "meadow ambience" to create a cohesive sound bed. I could have done a lot more, like color masking and reverb over the entire thing, but I would only do that for a production finish. This example was just goofing around to see what could be done, but it works well enough. The 1.5 minutes of video dialogue took about a day to complete.

Infinite Talk is pretty good, and I let it roll for a 10-second-long clip which took 20 minutes on my 3060 RTX 12 GB VRAM at 832 x 480 and 25fps.

But this approach has inherent weaknesses. I'd use it as a test run for its speed, and I could probably use some of the results in a final cut and get away with it. Though this was just to show what could be done.

Part 2 will be about using video controlnets to drive the lipsync and the facial expressions, which should give better control over the results, but of course, takes more time.

The workflow: The Infinite Talk workflow with masking is provided below. Right-click on the png image and save it. Drop it into Comfyui and it will load the workflow.

Infinite Talk workflow contained in the .png metadata Download Workflow PNG



DIALOGUE - PART 2 - INFINITE TALK + UNI3C, UNIANIMATE, FANTASY PORTRAIT

Dialogue Part 2 - Infinite Talk with Uni3C, UniAnimate, and Fantasy Portrait

Date: 17th September 2025

About: In "Dialogue Part 1" I looked at Infinite Talk on it's own. In this video I discover what does, and doesnt work with Uni3C and UniAnimate, then go on to show how powerful Fantasy Portrait with Infinite Talk can be.

When I start my next project I will likely switch between using Infinite Talk on its own and with Fantasy Portrait. I like them both. Infinite Talk provides a certain randomness to the human movement which is often unexpected and works.

I explain the pros and cons of all that in the video. It is possible you can mix some of these methods but I haven't tried.

If you have any questions on usage then contact me via my contact page. I will do my best to help.

Pre-requisites: This runs on my 3060 RTX with 12GB VRAM. I also have pytorch 2.7, Sage Atten 1, and CUDA 12.8 installed. If you don't have those things, it might be that causing issues, so confirm that first before contacting me with any concerns about workflow usage. (NOTE: at this time pytorch 2.8 is known to have some problems; either downgrade to 2.7 or upgrade to 2.9.)

The workflow: The workflow contains all 4 methods and a switchable option, and may seem complicated but should be clear from the video. I left all my notes in it should you run into problems, but you can delete them otherwise.

Download the workflow from the png below by right-clicking and saving it. Then drop it into ComfyUI and the workflow should load.

Infinite Talk with Uni3c, Unianimate + Fantasy Portrait workflow contained in the .png metadata Download Workflow PNG



MASKING

VACE 2.2 - EXTENDING VIDEO CLIPS

VACE 2.2 - Part 1 - Extending Videos

Date: 11th October 2025

About: VACE is a powerful tool and I call it the "Swiss Army Knife" of Comfyui for good reason.

Read the Nathan Shipley & Adrien Toupet article to find out more about the early uses of VACE 2.1 that they shared in the notebook. It will give you a good grounding in some of what VACE can do.

In this video I discuss VACE 2.2 in a WAN 2.2 dual model workflow and use it for extending a video clip to drive 81 new frames using just 17 frames from the end of the previous clip. I use a prompt to inform VACE what I want in the 64 remaining frames.

I then overlay the output in Davinci Resolve to produce the final result from both clips.

It's the first time I have used it for this purpose, and I was surprised at how good the results were, so I will be testing it further.

We need two kinds of frames when masking: middle-grey images define the replacement frames, but for the actual masking we need black and white frames. VACE treats black frames as frames to be left untouched, while white frames will be overwritten by VACE.
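
To make that concrete, here is a hedged numpy sketch of those frame types for the extend case above (81 frames, with the first 17 kept and the remaining 64 generated). The sizes and the exact way your VACE nodes expect these stacks may differ - this only illustrates the grey/black/white convention.

```python
# Illustrative only: grey control frames for the part to generate, black mask to
# keep frames, white mask where VACE should overwrite.
import numpy as np

T, H, W = 81, 480, 832
keep = 17                                              # real frames carried over

control = np.full((T, H, W, 3), 127, dtype=np.uint8)   # middle grey = "to be replaced"
# control[:keep] = last_17_frames_of_previous_clip     # hypothetical: real pixels for kept frames

mask = np.zeros((T, H, W), dtype=np.uint8)             # black = leave alone
mask[keep:] = 255                                      # white = VACE overwrites these frames
```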

Along with prompting this gives us some powerful tools, with some caveats around expectation (I will discuss those in the next video, Part 2).

This workflow is based on the FFLF VACE+WAN 2.2 dual model workflow I shared in the FFLF section but has additional nodes for the masking tasks mentioned.

NOTE: I have only completed the first video on "extending" with VACE and will finish the others over the coming days, so for now will only add the x1 extension video & workflow, and will post the rest here as I complete them, which should be this week - Mark, 11th October 2025.

Workflow: To download the VACE 2.2 WAN 2.2 dual workflow for extending video clips shown in this video, right-click the below image and download the png file, then drop it into Comfyui to load the workflow it contains.

VACE 2.2 WAN 2.2 dual workflow for extending video clips. Download Workflow PNG



VACE 2.2 - INPAINTING WITH CAVEATS

VACE 2.2 - Part 2 - Inpainting with Caveats

Date: 11th October 2025

About: In this video I show how inpainting works to add in a missing shadow under a horse. I then show why this isn't the best approach for adding new items into existing video clips. I do this using a bear, while showing a variety of approaches we can use to tackle VACE.

Finally I include fixing an extended video clip where there was jarring on a seam, a roaming campfire, and wandering trees.

This video explores some of the caveats and issues with inpainting and explains its strengths and weaknesses in this context.

A future video will address inpainting to replace targeted subjects with VACE 2.2, but if you want a VACE 2.1 workflow that can do it, see the "Footprints In Eternity" workflow download link.

Workflow 1: To download the VACE 2.2 WAN 2.2 dual workflow used to target the horse shadow, right-click on the below image and save it as a png file, then drop it into comfyui to load the workflow it contains. Use the video as reference for setting up your own video masks to inpaint parts of frames as opposed to entire frames.

VACE 2.2 WAN 2.2 dual workflow for inpainting examples like a horse shadow. Download Workflow PNG

Workflow 2: To download the VACE 2.2 WAN 2.2 dual workflow for fixing jarring seams, roaming campfires, and wandering trees in video clips, right-click the below image and download the png file then drop it into Comfyui to load the workflow it contains. You may want to watch the video in VACE 2.2 - Extending Videos as it uses the same workflow and nodes for extending a video as for inpainting a section of video.

VACE 2.2 WAN 2.2 dual workflow for inpaint fixing video clips. Download Workflow PNG

Both the above workflows are based on the workflow used in FFLF but have nodes changed to suit the task at hand. They are all very similar to each other. The video shows a variety of additional approaches you can take with this workflow if you wish to experiment.

A future video will address inpainting to replace items in videos.



VACE 2.2 - CHARACTER SWAPPING

VACE 2.2 - Part 3 - Character Swapping & Replacing Objects

Date: 14th October 2025

About: In this video I show how to use SAM2 points editor, or Florence 2 text description to target a person or object with a mask, then replace the masked target using a reference image. This works very well.

One minor drawback is the first-frame flashing that occurs in VACE 2.2 with reference images, but most of the time, if done correctly, this only happens during the High Noise model stage and is fixed during the Low Noise pass.

If that doesn't work, try changing the seed, changing the prompt, or making sure the reference image is roughly in the same position as the target being masked in the first frame. This is another common aspect of VACE 2.1 and 2.2 that helps the VACE node do a good job, but isn't always mentioned in tutorials on VACE.

Workflow: To download the VACE 2.2 WAN 2.2 dual workflow used to target a person or object for replacement, right-click on the below image and save it as a png file, then drop it into comfyui to load the workflow it contains. Use the video as reference for using the two approaches offered in the workflow for masking the video.

VACE 2.2 WAN 2.2 dual model workflow for character swapping using target mask. Download Workflow PNG



MEMORY MANAGEMENT

UPDATE: The t5 Text Encoder mentioned in the video on the YT channel has been superseded by the "T5 WanVideo Text Encoder Cached" node. See notes below for updated info - Mark, 25th September 2025.

Memory Tricks in Comfyui for Low VRAM GPUs

Date: 25th September 2025

About: Getting everything running on a 3060 RTX with 12GB VRAM and 32GB system RAM has its challenges. In the video above I shared a few memory tricks that helped me get ComfyUI working with fewer OOMs and better workflow capacity on my Windows 10 PC.

This came about after Wan 2.2 dual model workflows got released, and all I was getting was OOMs and Blue Screens (BSOD). These tricks sorted it out, and I've kept with them since. They might work for you too.

UPDATE (25th Sept 2025): Since making this video I looked into why I wasn't getting the results I expected with the "T5 Text Encoder" approach mentioned in the video. The below is the updated approach, and the node I will be using moving forward.

I had the opportunity to ask Kijai about the issue with t5 nodes and got this response:

"use this node. that's the only node that fully unloads it from RAM too."

I then asked him about the movie director you see in the screen shots below, and he said this:

"it's the official prompt enhancer yea dunno if it's good, just added it to match the original if wanted needs either of those models - https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Qwen
it doesn't take any VRAM or RAM from the model inference when used with the Cached -node"

T5 'WanVideo Text Encode Cached' node Download Image

I will be using the above in all wrapper workflows from now on. I will be loading it onto the GPU, not the CPU, but will test the results of switching between the two on taxing workflows. The below screenshot is why I will be using this t5 cached node version.

T5 'WanVideo Text Encode Cached' impact on memory saving is fantastic. Download Image

As you can see, the "sharkfin" effect is where it loads the t5 and then drops it. But better still, on the left of the screenshot above you can see it is creating a cache of the prompt. So if you do not change your prompt and do not delete that cache, it doesn't need to load the model on the next run; it will use the cached copy.
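
For anyone curious what that caching amounts to, here is a hedged sketch of the idea only - it is not Kijai's node or its code, and the loader/encoder/unloader functions are stand-ins you would supply.

```python
# Sketch of prompt-embedding caching: hash the prompt, reuse a saved embedding,
# and only load (then free) the big T5 encoder on a cache miss.
import hashlib, os
import torch

CACHE_DIR = "t5_prompt_cache"   # hypothetical folder name

def get_prompt_embedding(prompt, encode_fn, load_t5_fn, unload_t5_fn):
    os.makedirs(CACHE_DIR, exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = os.path.join(CACHE_DIR, key + ".pt")

    if os.path.exists(path):            # cache hit: no model load at all
        return torch.load(path)

    t5 = load_t5_fn()                   # cache miss: load, encode, save, unload
    embedding = encode_fn(t5, prompt)
    torch.save(embedding, path)
    unload_t5_fn(t5)                    # the "sharkfin": the model is dropped straight after
    return embedding
```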

The extender plug-in offers a movie director which will take a prompt and improve it. Download Image

The above is the "movie director" that I also asked Kijai about. It puts the result into the windows command prompt so needs cut and paste from there to tidy up, but I wont be using it myself so that will remain unplugged in my workflows.

FYI, if you don't see the t5 "WanVideo Text Encoder Cache" node being used in my future videos, it is only because I forgot to swap it in.



SHOT MANAGEMENT

I talk a bit about shot management in the Overview page. But newer information is shared below.

Shot Management & Why You Will Need It

Date: 30th September 2025

About: In this video I discuss the importance of shot management and offer a csv approach solution. I also discuss important information to keep in csv columns that will help in making a film cohesive, as well as what I no longer consider useful.

These are lessons I learnt on "Footprints In Eternity", which was a 10-minute narrated noir that took me 80 days to complete. I discuss that experience in the workflow page for that project.

Despite only being 120 shots in length, "Footprints" was still difficult to manage, because each shot often had 5 or more takes and changes associated with it, and there were images and videos to keep track of.

Now that we are able to make lipsync, dialogue, and extended video clips, we are rapidly moving into an area where movie-making is going to be possible on local machines. Without good shot management, we will soon run into problems. A full-length feature film will require at least 1400 shots, and maybe 10 times that number in take files that need to be tracked.
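
For reference, here is a minimal sketch of a shot-tracking csv. The columns are my own illustrative picks, not necessarily the ones discussed in the video.

```python
# Write an example shot-tracking csv with illustrative columns.
import csv

columns = ["shot_id", "scene", "description", "take", "status",
           "image_file", "video_file", "workflow_used", "notes"]

rows = [
    ["SC01_SH010", "campfire", "wide establishing shot, 3 highwaymen", 3, "final",
     "sc01_sh010_t3.png", "sc01_sh010_t3.mp4", "FFLF 720p", "reverse in Shotcut"],
]

with open("shots.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(columns)
    writer.writerows(rows)
```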

I only talk about the csv and basics in this video, but will be using the next project to test and streamline AIMMS which is a shot management system I hope to launch as Freeware before the end of 2025. This will provide a more visual image and video tracking system which will replace the csv approach. More on that as I get nearer to releasing AIMMS version 1.0.



SLOW MOTION

SMOOTH SLOW MOTION FOR CAMERA ZOOM-IN SHOTS

This is specifically for making buttery-smooth slow motion for camera shots that zoom slowly in toward an object in the centre of the frame without changing trajectory.

Slow Motion Smooth Zoom-in For ComfyUI (Discussion video)

Date: 20th September 2025

About: I use the ComfyUI FFLF workflow shared in the FFLF section of this page. This makes 81 frames at 16fps, a 5-second-long video clip.

I take that to a "16fps to 24fps" interpolation workflow shared in the Interpolation section of this page, and that also upscales the clip to 1080p.

At this point it looks okay, but on closer inspection you will see problems with things like trees and branches. We solve this next by pushing the 1080p 24fps 5 second video clip through the USDU detailer workflow you can find on the Upscaling section of this page.

We are nearly there. But we have two things left to do.

Firstly we use Shotcut to motion-interpolate the video from 24fps to 120fps. Do this right, as shown in the video above, and you will not see any quality loss.

We then take the result into Davinci Resolve 20 (free version, but be aware of commercial caveats if you don't buy it). In Davinci Resolve we work in a 24fps timeline, and there we right-click our imported video and slow it down by 4x or 5x (from 120fps) to a reasonable 24fps speed, which also makes our 5-second clip run for nearly 20 seconds.
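
The retime arithmetic, roughly, assuming the ~5 second clip and the 120fps interpolation described above:

```python
# A 5 s clip interpolated to 120fps has about 600 frames; slowing it 4x or 5x in
# a 24fps timeline stretches those frames over roughly 20-25 seconds.
clip_seconds = 5
frames_at_120 = clip_seconds * 120            # ~600 frames after the Shotcut interpolation

for slowdown in (4, 5):
    new_duration = clip_seconds * slowdown    # 20 s or 25 s
    source_fps_used = 120 / slowdown          # 30 or 24 source frames per timeline second
    print(f"{slowdown}x slower: {new_duration} s, {source_fps_used} source fps in a 24fps timeline")
```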

We have just one last trick to play, and that is using re-time in Davinci Resolve (also a free feature) to set the re-time process and motion estimation to the highest settings we can.

And we are done. We now have a butter-smooth 20-second long 24fps video clip of a slow-motion camera zoom-in.

NOTE: YouTube degrades the quality of videos; below is the final clip, 20 seconds long in 1080p @ 24fps:

20 second final clip of Slow Motion Smooth Zoom-in For ComfyUI



UPSCALING

Detailed discussion on using the Upscaler workflows shared below.

UPSCALING TO 1600 X 900 X 81 FRAMES ON LOW VRAM GPU

Date: 10th September 2025

About: Not many people realise, but even on a low VRAM card such as the 3060 RTX 12GB VRAM GPU, we can now take a video clip that looks like this:

A screen shot of a low quality crowd scene with AI faces that look collapsed Download Image

and with a simple t2v workflow set to low denoise, end up with a video clip that looks like this:

Screen shot of fully fixed faces in a crowd scene. Download Image

Both of the above are screen shots from a video clip created at 480p and fixed and upscaled to 900p in ComfyUI using the below upscaler workflow on a 12GB VRAM GPU.

Admittedly my use case is slightly different, because I need to maintain character consistency when applying the upscaling process, and I talk about how in the video at the top of this section.

Workflow: To download the 1600x900 x 81 frames upscaler workflow used in the above shot, right-click the image below, save it as a png file, and drop it into ComfyUI

1600x900x81 upscaler workflow in .png format Download Workflow PNG

UPSCALE A 1080p VIDEO CLIP USING "ULTIMATE SD UPSCALER"

Date: 10th September 2025

About: This is a recent find from user "Muri-Muri" on Reddit, so I haven't had much time to test its potential, but I share it here anyway as promised in the video at the top of this section.

It could probably do the work of a complete upscaling restyler or fix faces at distance, but I tested it as a polisher workflow to give an existing 1080p 24fps 5 second video clip a final touch.

I share it here as-is. So far I have only got it to 65 frames, not 81, but using a low denoise of 0.08 meant I was able to run through the same video twice, doing 65 frames each time from the start and from the end.

I then used Shotcut to join the two clips back together to get the final 1080p 24fps 5 second clip but with extra polish. All this on the 3060 RTX 12GB VRAM. Bigger VRAM cards will finish it in one go with ease.

The workflow ran in 20 minutes for 65 frames, but struggled, taking 2.5 hours for 81 frames on my rig - hence the split-frame approach in two goes, then piecing it together in Shotcut with no visible seam due to the low denoise used. I'll address getting it down to a one-shot run in the coming weeks.

I will update this section as I discover more about what this workflow can do and put it through its paces. The important thing here is that it gives us access to the USDU (Ultimate SD Upscaler) node, which is a go-to for me when fixing and improving images. We can, with this workflow, now apply USDU to video clips.

(UPDATE NOTE: I managed to get 1080p x 121 frames @ 24fps out of this in one go, in 30 minutes, by changing the steps to 2 and the tiling height and width to 512 x 512. Play around with the tiling; it will make a difference.)
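
For a sense of why the tile size matters, here is a quick tile-count estimate at 1080p with those 512 x 512 tiles (ignoring any tile overlap or padding the node adds):

```python
# Quick tile-count estimate for USDU-style tiled sampling at 1080p.
import math

width, height = 1920, 1080
tile_w, tile_h = 512, 512

tiles = math.ceil(width / tile_w) * math.ceil(height / tile_h)
print(tiles, "tiles sampled per pass")   # 4 x 3 = 12
```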

Workflow: Download the USDU upscaling workflow from the below image by right-clicking, saving it as png file, and dropping it into ComfyUI. Let me know what you figure out.

USDU upscaler workflow in .png format Download Workflow PNG




AREAS STILL REQUIRING SOLUTIONS

Below is a list of things that needed solving or improving as of June 2025. I will update them as I get time.



USEFUL SOFTWARE

100% open source software (actually Davinci is not OSS, but is free with licensing caveats)



