"SIRENA"

"Sirena" AI music video by Mark DK Berry.

WORKFLOWS & PROJECT DETAILS

🎥 Description

"In Latin, "sirena" translates to siren. The word originates from the Greek word Σειρήν (Seirḗn), which refers to the mythical creatures known for their enchanting voices and ability to lure sailors to their doom."

Music: "Under The Water" by Mark DK Berry – Available on Bandcamp
Date Video Published: 6th April 2025

🧜‍♀️ About the Project

"Sirena" was my seventh AI music video, and for this one I deliberately stepped out of my comfort zone to tackle something different: an underwater romance.

My main goal was to improve image and animation quality across the board. Unfortunately, despite putting in extra time and effort, I didn’t quite reach the level I’d hoped for. Hardware played a part, but character consistency and learning all the workflow settings were the real bottlenecks.

⚠️ Key Challenges

Character Consistency

A nightmare. I trained Flux Loras, which were decent, but they only stay consistent in video when the shot is a full-face image.

Face swapping? A black art. Gave up after days of fiddling.

Wan 2.1 Lora training: It took two days to get working, produced poor results, and slowed my machine so much that it became unusable. I'm holding out hope for keyframing and video inpainting (like VACE) as future solutions.

Legs, Hair, Body Shape

Characters often morphed or warped, even from behind. Three shots had bad leg warping that I didn’t have time to fix.

Clothing Consistency

I put real effort into this, but many outfits didn’t stick once they became video.

Image Tiling Artifacts

Only spotted these late in production. The culprit? Flux or SDXL during upscaling. I patched some of it using blur tools in Krita, but didn’t have time to redo every clip.

Psychological Toll

Once I got past the 8–10 day mark, I was chasing diminishing returns. At 18 days, I was questioning whether I was going insane tweaking pixels no one would notice or care about.

🔧 Workflows & Tools Used On "Sirena"

The ZIP file below contains five ComfyUI workflows:
- Wan2.1-i2v-14B_480-sirena-workflow
- Ace++ face swapping workflow
- Flux image creation with Loras workflow
- Flux inpainting workflow with multiple stacked Loras
- Krita ACLY workflow for upscaling and inpainting, exported in API format
Right-click and choose "Save link as" to download the ZIP containing the ComfyUI JSON workflows.
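
For anyone who wants to run an API-format workflow without opening the ComfyUI interface, here is a minimal Python sketch that queues one over HTTP. It assumes ComfyUI is running locally on its default address (127.0.0.1:8188) and that the JSON file was saved in API format. The function names and the file name in the usage note are placeholders of mine, not part of the ZIP.

```python
import json
import urllib.request
import uuid
from typing import Optional

# Assumption: a local ComfyUI server on its default host and port.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"


def build_prompt_request(workflow: dict, client_id: Optional[str] = None) -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body sent to the /prompt endpoint."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    return urllib.request.Request(
        COMFYUI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(path: str) -> dict:
    """Load an API-format workflow JSON from disk and queue it on the server."""
    with open(path, "r", encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_prompt_request(workflow)) as resp:
        return json.loads(resp.read())
```

Calling queue_workflow("my-workflow-api.json") (a hypothetical file name) should add the job to the ComfyUI queue. A workflow saved the normal way from the interface uses a different JSON layout and would need to be re-exported in API format first.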

Other useful links:
- The Wan 2.1 workflow originated from - https://civitai.com/user/oscarchuncha654 . After tweaking, it is still the best on my system.
- For improving Wan 2.1 prompting, check out - https://github.com/Wan-Video/Wan2.1/blob/main/wan/utils/prompt_extend.py
- Character creation (360°) - https://www.youtube.com/watch?v=8DRQenukHhk
- MickMumpitz Flux character creation method was useful too - https://www.youtube.com/watch?v=Uls_jXy9RuU (Note: this method is four months old and no longer works, I think because of ComfyUI updates. I have migrated to other approaches anyway and will share them in the next project.)

⏱️ Time & Energy Investment

13 days to reach the first rough cut (with lots of testing + research)
1 day colour grading
1 day fixing what I could

→ Total: 18 working days (and some nights running batch renders, and a number of additional days installing, re-installing, fixing broken installs, workflows, nodes, etc.)

I didn't properly track electricity use this time, but I think it was around 40 kWh. I will track it in the future. There may come a point where it is cheaper and faster to rent powerful servers and batch process everything than to run a GPU locally for days and nights burning kWh.
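
As a rough way to think about that trade-off, here is a small Python sketch that compares local electricity cost with rental cost. Only the 40 kWh figure comes from this project. The electricity price, the local GPU hours, the speedup, and the rental rate are placeholder assumptions to swap for real numbers, and the sketch ignores hardware purchase cost, upload time, and setup time.

```python
def local_cost(energy_kwh: float, price_per_kwh: float) -> float:
    """Electricity cost of rendering at home."""
    return energy_kwh * price_per_kwh


def rented_cost(local_hours: float, speedup: float, rate_per_hour: float) -> float:
    """Rental cost if the rented GPU finishes the same work `speedup` times faster."""
    return (local_hours / speedup) * rate_per_hour


def break_even_rate(energy_kwh: float, price_per_kwh: float,
                    local_hours: float, speedup: float) -> float:
    """Hourly rental rate at which renting costs the same as the local electricity."""
    return local_cost(energy_kwh, price_per_kwh) * speedup / local_hours


if __name__ == "__main__":
    ENERGY_KWH = 40.0      # rough estimate from this project
    PRICE_PER_KWH = 0.30   # placeholder: check your own tariff
    LOCAL_HOURS = 200.0    # placeholder: total GPU hours at home
    SPEEDUP = 4.0          # placeholder: how much faster the rented GPU is
    RATE_PER_HOUR = 1.00   # placeholder: rental price per hour

    print("local :", local_cost(ENERGY_KWH, PRICE_PER_KWH))
    print("rented:", rented_cost(LOCAL_HOURS, SPEEDUP, RATE_PER_HOUR))
    print("break-even hourly rate:",
          break_even_rate(ENERGY_KWH, PRICE_PER_KWH, LOCAL_HOURS, SPEEDUP))
```

On electricity alone, renting only wins when the hourly rate falls below the break-even value, so the real argument for renting is more likely to be the time saved than the power bill.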

💻 Hardware

GPU: RTX 3060 (12GB VRAM)
RAM: 32GB
OS: Windows 10

All of it was done on a regular home PC.

🧰 Software Stack

ComfyUI (Flux, Wan 2.1, inpainting models)
Krita + ACLY plugin – Fast inpainting and upscaling
Topaz – Used only for 16fps to 120fps interpolation (not enhancement)
Reaper DAW – Storyboarding with shot names and timecode burned into MP4
DaVinci Resolve 19 – Final cut and colour grade
LibreOffice – Tracking shot names, prompts, colour themes, fixes, etc.

🎨 Loras Used & Trained

Trained Flux Loras for both the fisherman and mermaid (10 images each; ~3 hours per Lora)
Trained Wan 2.1 Lora on WSL2 with help from The Art Official, but it was ultimately not usable on my setup. It should work, but it is complex and I will have to research it better for next time.
Flux Loras from Civitai used in the video (see the video for their "looks"):
- Tango Dancer - https://civitai.com/models/96785/andrew-atroshenko-style
- Oil painting animation - https://civitai.com/models/757042/ob-oil-painting-with-bold-brushstrokes
- Impressionism paint effect - https://civitai.com/models/545264/impressionism-sdxl-pony-flux

📺 Resolution & Rendering Details

I need to work on this for the next project, because it didn't go according to plan.

Flux output: 1344x768 (upscaled x2, then downscaled in Wan 2.1)
Tested Wan 2.1 resolutions extensively:
- Started with 848x480, but it wasn't good enough.
- Final choices: 1024x592 and 1344x768 (40–120 minutes per clip, mostly run overnight in batches)
Interpolated all clips to 120fps via Topaz (in retrospect this was pointless, because the free version of DaVinci Resolve only allows up to 60fps)
Tried to bring back an "analog" vibe in the final colour grade in DaVinci Resolve
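
To put numbers on the resolutions and frame rates listed above, here is a short Python sketch that reduces each size to its aspect ratio, compares pixel counts, and works out the interpolation factor. The helper names are mine. Treating pixel count as a rough proxy for render time is an assumption, not something I measured.

```python
from fractions import Fraction
from math import gcd

# Resolutions mentioned above (width, height).
RESOLUTIONS = [(848, 480), (1024, 592), (1344, 768)]


def aspect(width: int, height: int) -> tuple:
    """Reduce a resolution to its simplest width:height ratio."""
    g = gcd(width, height)
    return width // g, height // g


def pixel_ratio(size: tuple, baseline: tuple) -> float:
    """How many times more pixels `size` has than `baseline`."""
    return (size[0] * size[1]) / (baseline[0] * baseline[1])


def interpolation_factor(source_fps: int, target_fps: int) -> Fraction:
    """Exact frame-rate multiplier needed to go from source_fps to target_fps."""
    return Fraction(target_fps, source_fps)


if __name__ == "__main__":
    for w, h in RESOLUTIONS:
        a = aspect(w, h)
        print(f"{w}x{h}: aspect {a[0]}:{a[1]} ({w / h:.3f}), "
              f"{pixel_ratio((w, h), RESOLUTIONS[0]):.2f}x the pixels of 848x480")
    print("16 -> 120 fps factor:", interpolation_factor(16, 120))
    print("16 -> 60 fps factor:", interpolation_factor(16, 60))
```

Two things stand out. The three sizes do not share an aspect ratio (1344x768 is exactly 7:4, while 1024x592 is slightly narrower), so going from the Flux image to the smaller Wan size involves a small crop or stretch. And neither 120fps nor 60fps is a whole multiple of 16fps, so every interpolated frame lands between source frames.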

😵‍💫 Final Thoughts

The biggest hurdle was still character consistency. I trained Loras, tested face-swapping, tried everything, but nothing quite nailed it. Underwater scenes and low-res footage made things harder.

Prompting and camera direction were another headache. Wan 2.1 is better than Hunyuan, but not exactly "obedient." I tried short prompts, long prompts, and "3-sentence" tricks, all with mixed results.

By the end, I was feeling frustrated. I had hoped for more photorealism and tighter characters. Instead, the video still felt cartoonish (though that was partly intentional). I haven't fully mastered this — or maybe the tech just isn't quite there yet.

There were many challenges and frustrations, and by the end it was more about getting it finished and learning from the experience.

🙏 Extra Thanks To:

Cullen Kelly – Great color grading tutorials on YouTube.
The Art Official – Helped me get Wan training working in WSL2 (after this project finished).