"THE NAME OF THE GAME IS POWER"
WORKFLOWS & PROJECT DETAILS
🎥 Description
A "Mission Impossible" style short music video about how humanity let power accumulate in the wrong places.
-
Music: "Gone In 60 Seconds" by Mark DK Berry Available on Bandcamp
-
Date Video Published: 15th March 2025
🎬 About the Project
This is where the game changed completely.
The release of the Wan 2.1 i2v (image-to-video) model for ComfyUI in early March 2025 opened up a whole new world of creative possibility.
I wanted to do a Bond-style theme music video (having just heard Babs Broccoli had sold her soul to Amazon and the original Bond was as good as dead). My old D&B track "Gone In 60 Seconds" seemed like a good choice to test it out.
It also gave me the opportunity to try out training a Flux Lora on a character. To avoid upsetting anyone - or running into the problem I had with the last video, "I've Got All That You Need", of inadvertently using a celebrity - I put myself in it. Nothing to do with my ego, of course.
I got critiqued for things like the motorbike rider having gloves, then not having gloves. But you have to draw a line and decide how much time and energy you are willing to give a project. This one was all about testing the new image-to-video model, so I had no idea what it would end up like while I was making it. I really was flying blind until I got to the final stages of putting it all together in DaVinci Resolve.
This project led me to start a CSV file for tracking shot names and information like best takes. After this project I also began storyboarding. By the end of it, I could see where this was going, and I knew it was going to get a lot more time-consuming and complicated from here on in.
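For anyone curious what that tracking looks like, here is a minimal, hypothetical sketch of such a shot log in Python. The column names and example values are my illustration, not the actual spreadsheet from this project.

```python
# A minimal, hypothetical shot-tracking CSV like the one described above.
# The columns and values are illustrative, not the project's actual schema.
import csv

FIELDS = ["shot_name", "seed", "take", "best_take", "notes"]

rows = [
    {"shot_name": "bike_chase_01", "seed": 42, "take": 3, "best_take": "yes",
     "notes": "gloves stay consistent in this take"},
    {"shot_name": "car_escape_02", "seed": 133742, "take": 7, "best_take": "no",
     "notes": "car refuses to move; try a new seed"},
]

with open("shot_log.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```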
⚠️ Key Challenges
-
Finding a Workflow
I spent a few days researching workflows, then picked the best I could find. I got lucky with one, and I ended up using the same Wan i2v 480 workflow almost exclusively because it included interpolation and upscaling, which cut down post-production time.
-
Character Consistency
It seemed easy, but in retrospect I wasn't challenged by it here like I have been since. Going from image to video, when people turn their heads, the face changes. You need a Wan (video) Lora if you are doing video that doesn't involve a single facing direction; in this case I had used a Flux (image) Lora only to give the base image my face.
-
Clothing Consistency
I took what I was given, prompting for changes, but was less concerned because it wasn't important to me on this project. (Note: most of the criticism came from professional animation artists and from women; women notice subtle clothing differences.)
-
Motion issues
For some reason, Wan 2.1 doesn't like moving things. I often had to change seeds and fight with the prompt, especially for the car. It was a real challenge, and I often had to adapt the shot to what the model would give me.
-
Psychological Toll
I didn't have mental strain on this project or any of the previous ones. Five days was the perfect length for my attention span (about a week tops, it seems). I was also excited to see what it would end up like, and fuelled by that.
🔧 Workflows & Tools Used
-
The zip file below contains two ComfyUI JSON workflows:
-
Flux-lora_training_workflow__3000steps_low10gbvram (this was fiddly to get working but ran on my hardware. You'll need to look into how to train Loras and what settings to use; it's a journey. Also, I no longer train Loras for images, as I see no point, and will discuss this in future video projects).
-
nameofthegame-Wan2.1-i2v-workflow (I have since updated this, so check later projects if you want the newer versions, but this one is still good, just missing a few performance-boosting additions that came out later).
Right-click and "Save link as" to download the ZIP containing the ComfyUI JSON workflows.
-
Other useful Links:
-
The Wan 2.1 workflow originated from https://civitai.com/user/oscarchuncha654
-
For improving Wan 2.1 prompting, check out the prompt-extension code at https://github.com/Wan-Video/Wan2.1/blob/main/wan/utils/prompt_extend.py
-
⏱️ Time & Energy Investment
-
When making previous AI music videos, I limited myself to 5 days, but with the advent of image-to-video workflows I aimed for better quality and more accurate depictions, which required more time.
-
I aimed for 10 days but managed to complete this version in 8.
→ Total: 8 working days (no overnight batch renders on this one either, so daytime work only; this does not include the additional days spent re-installing ComfyUI after Sage Attention nuked it).
💻 Hardware
-
GPU: RTX 3060 (12GB VRAM)
-
RAM: 32GB
-
OS: Windows 10
All of it was done on a regular home PC.
🧰 Software Stack
-
ComfyUI (Flux, Wan 2.1)
-
Essential installations (if you want reasonable render times on low-spec hardware): Sage Attention, Triton, and TeaCache. (Note: my first attempt at installing Sage Attention killed my ComfyUI installation and it required rebuilding, but it was a blessing in disguise, as my setup had become bloated.)
-
Krita + ACLY plugin – Fast inpainting and upscaling
-
Topaz/Shotcut – Used for 16fps to 24fps interpolation (I tested Topaz's enhancement features but didn't find them useful for fixing digital video. They make grand claims, but I didn't see it.)
-
DaVinci Resolve 19 – Final cut and colour grade
🎨 Loras Used & Trained
- Created my own character Lora in Flux (3-hour process) to avoid copyright issues after previous experiences.
IMPORTANT NOTE: I believe it is essential to avoid future copyright concerns around using models trained on famous actors' faces. It is likely they will come up with a way to analyse videos and chase royalties for copyright in the future. Think on that now; then, if your video goes viral and you earn a monkey, you won't have to worry about being sued later.
A Flux Lora training workflow is included in the zip file above and ran fine on my hardware.
📺 Resolution & Rendering Details
-
Flux/SDXL output size: 1344 x 768 (upscaled x2 for quality, then downscaled again on the way into Wan 2.1)
-
I went with 832 x 480 going into the Wan i2v workflow, and this got upscaled to 1920 x 1080 at the end of the workflow. Looking at it later, I realised I should have gone higher with the initial resolution. (I did some research at the start of the next project, "Sirena", to better balance time versus quality.)
-
Interpolated all clips from 16fps to 24fps using Topaz. I didn't realise the workflow was already doing that, because I was outputting video reduced back down to 16fps, which created slow motion; I was fine with that for music videos. I later had to research fps and how best to work with it. A better understanding only came after the "Sirena" project, thanks to discussions with people on Reddit, and I will discuss it there. (The sketch after this list walks through the arithmetic.)
-
I finalised in DaVinci Resolve 19 (free version) and continued my education in applying colour grades and LUTs. Again, this was new stuff, and I wasn't sure what I was doing on this one.
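To make the slow-motion quirk above concrete, here is a minimal sketch of the frame-rate arithmetic (my illustration, not part of the attached workflows; the 81-frame clip length is an assumption based on typical Wan 2.1 output):

```python
# Interpolating 16fps footage to 24fps multiplies the frame count by 1.5.
# If the result is then written back out at 16fps, the clip plays 1.5x slower.

def interpolation_effect(src_fps: float, dst_fps: float, src_frames: int) -> None:
    """Show what happens when interpolated frames are replayed at the source rate."""
    factor = dst_fps / src_fps                # e.g. 24 / 16 = 1.5x more frames
    new_frames = int(src_frames * factor)
    intended = src_frames / src_fps           # intended clip length in seconds
    slowmo = new_frames / src_fps             # length if replayed at 16fps
    print(f"{src_frames} frames @ {src_fps}fps = {intended:.1f}s")
    print(f"Interpolated to {new_frames} frames; replayed at {src_fps}fps "
          f"= {slowmo:.1f}s ({factor:.2f}x slow motion)")

interpolation_effect(16, 24, 81)  # assumed 81-frame Wan 2.1 clip
```

The fix is simply to set the output container to the interpolated rate (24fps), so the extra frames smooth the motion instead of stretching the clip.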
😵💫 Final Thoughts
Lessons Learned:
-
Organization: This process is going to require tracking much more information than before, including shot-naming conventions and good/bad takes. When lipsync and ambient audio come along, this is going to be challenging work for one person.
-
Flexibility: The story I began with wasn't the story I ended up creating. It's easiest if you start with a plan but allow the AI to redirect it based on what it will and won't do.
-
Quality vs. Storytelling: Sometimes sacrificing visual quality for storytelling works better, especially given hardware limitations. (Consider that people used to make do with small black-and-white televisions with bad reception.)
This AI model really felt like a leap ahead into a new dawn in visual storytelling. It also increased the workload and the learning curve to make it happen.
🙏 Extra Thanks To:
-
Kijai on GitHub for his stellar work in the ComfyUI community. He really has led the charge.
-
Oscarchuncha654 (Civitai) for the original Wan 2.1 workflow, which he has since removed (not sure why).
-
ACLY for the Krita AI plugin