As an AIGC enthusiast, have you ever found yourself uncertain about which model to choose when building a workflow to generate the video you envision? You're not alone. In this blog post, we'll provide a comprehensive overview of the video generation models currently compatible with ComfyUI, along with a detailed comparison of their capabilities and key differences. Today's video models support a wide range of tasks, including text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) operations such as editing and temporal extension, offering flexible solutions for diverse creative and technical needs. Many of them also run on modest GPUs, and to save time, renting a cloud GPU at a very low price on a platform like RunC.AI has become an increasingly popular option.
1. LTX-Video
Developed by the AI video company Lightricks, LTX-Video is the first high-quality video generation model built on the DiT (Diffusion Transformer) architecture, marking a major advance in AI-driven video creation. LTX-Video requires highly detailed prompts to achieve optimal results: the more specific the input (character appearance, background setting, camera movement, and so on), the better the output quality. It supports a range of tasks, including text-to-video, image-to-video, video extension, keyframe animation, and video style transfer. Ideal use cases include rapid content creation for short-form videos and social media. It runs smoothly on an RTX 3090, while an RTX 4090 can generate approximately 5 seconds of video in around 20 seconds.
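As a rough illustration of how such detailed prompts can be assembled, here is a small Python sketch. The helper name and its field set are our own invention, not part of LTX-Video or any of its APIs; it simply combines the kinds of details the model responds to best (character appearance, background setting, camera movement) into one descriptive prompt string:

```python
def build_video_prompt(subject, appearance, setting, camera, style="cinematic"):
    """Assemble a detailed text-to-video prompt from separate fields.

    Illustrative only: LTX-Video accepts a plain text prompt, so any
    prompt-assembly scheme works; what matters is the level of detail.
    """
    parts = [
        f"{subject}, {appearance}",   # who is in the shot and how they look
        f"set in {setting}",          # background setting
        f"camera: {camera}",          # camera movement / framing
        f"style: {style}",            # overall visual style
    ]
    return ", ".join(parts)

prompt = build_video_prompt(
    subject="a young noblewoman",
    appearance="lavish Rococo gown with lace and floral embroidery",
    setting="an expansive emerald lawn on a sunny afternoon",
    camera="slow dolly-in at eye level",
)
print(prompt)
```

The same structured-fields habit carries over to the other models below: keeping subject, setting, and camera as separate pieces makes it easy to vary one element while holding the rest constant across trials.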
To download the model and deploy it yourself: Lightricks/LTX-Video · Hugging Face
Price: Free
(Official site of LTX-Video)
2. Wan 2.1
Wan2.1 breaks new ground in generative video, achieving leading performance across text-to-video, image-to-video, and video editing tasks. As the first model supporting bilingual Chinese/English text generation at 480P/720P resolutions, it features Wan-VAE for processing unlimited-length 1080P videos with strong temporal coherence. From lightweight versions that run on consumer GPUs to the high-end 14B version (which requires an RTX 4090), Wan2.1 delivers professional-grade results for advertising, animation, and video production, redefining AI-powered content creation.
To download the model and deploy it yourself: GitHub - Wan-Video/Wan2.1: Wan: Open and Advanced Large-Scale Video Generative Models
Price: Free, including commercial use (subject to the license terms)
(Official site of Wan 2.1)
3. VACE
VACE, integrated with Wan 2.1, is a versatile AI model for video creation and editing. It supports multiple tasks—reference-to-video (R2V), video-to-video (V2V), and masked video editing (MV2V)—enabling flexible workflows. With features like Move/Swap/Reference/Expand/Animate-Anything, it unlocks creative possibilities while ensuring temporal and spatial consistency using Diffusion Transformer technology. Users can easily adjust dimensions, frame rate, and length, or use text prompts for object replacement. Optimized for consumer GPUs like RTX 4090, VACE Wan 2.1 makes advanced video editing accessible to all.
To download the model and deploy it yourself: GitHub - ali-vilab/VACE: Official implementations for paper: VACE: All-in-One Video Creation and Editing
Price: Free
(Official site of VACE)
4. Wan2.1 SkyReelsV2 VACE
To address the common limitation of video duration in many generative models, SkyReels-V2 sets its sights on enabling infinitely long cinematic video generation. By integrating Multimodal Large Language Models (MLLMs), multi-stage pretraining, reinforcement learning, and diffusion-guided control, SkyReels-V2 achieves a highly optimized and scalable generation framework.
Beyond its technical advancements, SkyReels-V2 offers a diverse range of practical applications — including story-driven video generation, image-to-video synthesis, cinematographic guidance, and multi-character consistency through its SkyReels-A2 system.
To download the model and deploy it yourself: GitHub - SkyworkAI/SkyReels-V2: SkyReels-V2: Infinite-length Film Generative model
Price: 25 free trial credits for new users (enough to cover one video trial);
Standard: $28/month (4200 credits monthly) or $336/year
Pro: $76/month (14200 credits monthly) or $912/year
(Official site of SkyReelsV2)
5. CogVideoX
CogVideoX is an open-source video generation model developed by Zhipu AI. It can generate video from a single image plus a text prompt, leveraging a 3D Causal Variational Autoencoder combined with expert-adaptive LayerNorm technology. The model is capable of producing 6-second videos at 720×480 resolution. The codebase is fully open-source and supports a wide range of applications, including education, virtual reality, entertainment, and social media content creation. As part of the broader CogVideoX series, the open-source release now supports text-to-video, image-to-video, and video extension tasks.
To download the model and deploy it yourself: GitHub - THUDM/CogVideo: text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Price: Free. A better experience is available on the official site with a membership (¥19/month, around $2.65/month)
6. HunyuanCustom
HunyuanCustom, developed by Tencent Hunyuan, is a multimodal video generation tool powered by the HunyuanVideo foundation model. It supports text, image, audio, and video inputs to create high-quality, customizable videos with strong subject consistency and fine-grained control. Users can generate realistic videos by uploading reference images and a text prompt, enabling dynamic changes in action, attire, and scene. In audio-driven mode, it supports lip-synced speech or singing, ideal for digital humans, virtual assistants, and educational demos. In video-driven mode, it allows object/person replacement or insertion into existing clips, enabling creative video editing and enhancement.
To download the model and deploy it yourself: GitHub - Tencent-Hunyuan/HunyuanCustom: HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Price: Free. A better experience is available on the official site with a membership (¥49 per 1,000 requests, around $6.8)
(Official site of HunyuanCustom)
7. OpenAI Sora
On February 16, 2024, OpenAI released Sora, a powerful video generation model capable of creating new videos from text prompts, images, or existing video clips. Sora supports variable resolutions and durations (up to 1 minute) and can simulate realistic physical properties — such as maintaining 3D perspective during camera motion.
Price: ChatGPT Plus $20/month (720p resolution, 10-second videos); ChatGPT Pro $200/month (faster generation; 1080p resolution and 20-second videos; 5 concurrent generations; watermark-free downloads)
(Official site of OpenAI Sora)
8. Runway Gen-3 Alpha
Gen-3 Alpha represents Runway’s next-generation foundation model, developed on a cutting-edge infrastructure designed for large-scale multimodal training. It marks a significant leap in fidelity, consistency, and motion compared to Gen-2, advancing progress toward General World Models. Within the Gen-3 Alpha family, Gen-3 Alpha Turbo offers faster, more cost-efficient generation while maintaining high-quality output. Unlike the standard model, Turbo is accessible across all subscription tiers but requires an input image to generate results.
Price: Free trial 125 credits
Yearly Plan: $12 per editor/month, billed yearly as $144; Monthly Plan: $15 per editor/month, billed monthly. Both plans include 625 credits every month.
(Official site of Runway Gen-3 Alpha)
9. Pika Labs
Pika is a video generation and editing tool developed by the startup Pika Labs. It allows users to modify original videos and combine new elements simply by inputting text or images, supporting various styles such as 3D animation, anime, cartoons, and cinematic content. Launched in beta in late April 2023, Pika has attracted over 500,000 early users, generating millions of videos weekly. Users can access the tool via the Pika Labs Discord community, where videos are generated through dedicated bots.
Price: Standard: $10/month, $96 yearly; Pro: $28/month, $336/year; Fancy: $95/month, $912/year
(Official site of Pika AI)
10. MuseSteamer AI
MuseSteamer AI, developed by Baidu, is a cutting-edge audio-video generation model that debuted on July 2, 2025. As the world's first AI capable of synchronized Chinese audio-visual generation, it revolutionizes traditional AIGC workflows by producing visuals, sound effects, and voice narration simultaneously—eliminating the conventional "visuals first, audio later" approach. Through groundbreaking innovation, MuseSteamer enables seamless, integrated content creation. Adding a start or an end frame is a must.
To Use MuseSteamer AI Online: https://huixiang.baidu.com/
Price: Currently free for individual users.
(Official site of MuseSteamer AI)
11. Luma Dream Machine
Luma's Dream Machine supports text-to-video and image-to-video generation. Users can input natural language prompts or upload images to create dynamic video clips. The model produces diverse visual styles, including cinematic, animated, and photorealistic results. Notably, it features a built-in brainstorming tool that generates multiple style suggestions (e.g., sci-fi movie scenes) based on initial prompts. It also offers adjustable parameters like camera movement and lighting effects, significantly lowering the barrier to creative video production.
Price: Video creation is for subscribers only. Yearly: $83.99 for Lite, $251.99 for Plus, $797.99 for Unlimited
(Official site of Luma Dream Machine)
12. Kling.ai
Kling AI, developed by Kuaishou's AI team, represents a groundbreaking advancement in AI-powered video generation. This cutting-edge platform converts text prompts and static images into high-fidelity cinematic videos with remarkable realism and precision. At its core, Kling AI leverages an advanced Diffusion Transformer architecture, enabling deep semantic comprehension and physics-aware video synthesis. The system currently delivers 1080p output at 30fps, with 4K resolution capabilities under active testing - producing professional-grade results with exceptional detail, texture clarity, and fluid motion. Adding a start or an end frame is a must.
Price: Trial Package: 166 credits and 1 advanced trial. Standard $79.2/year; Pro $293.04/year; Premier $728.64/year
(Official site of Kling AI)
13. Google Veo
Google Veo, developed by Google DeepMind, is an advanced model capable of generating high-quality 1080p videos lasting over 60 seconds. It supports diverse cinematic and visual styles, precisely capturing the nuanced details and tonal subtleties of input prompts while delivering unprecedented creative control: the model intelligently interprets professional filmmaking directives, including specialized techniques like time-lapse photography and aerial landscape shots. The Veo 3 version allows adding a start or end frame (but not both simultaneously).
Price: Only for Google AI users. Google AI Pro ($19.99/month); Google AI Ultra ($249.99/month)
(Official site of Google Veo)
14. PixVerse
PixVerse AI is a powerful generative AI model capable of effortlessly transforming multimodal inputs into stunning videos in just minutes. It supports a wide range of multimodal inputs, including images, text, and audio. PixVerse offers customization options, allowing users to apply their own artistic styles to generated videos, ensuring unique and personalized results. Additionally, PixVerse features a distinctive "Multi-Keyframe Generation" function, enabling users to upload up to 7 images as keyframes. In the first-last frame mode, it can seamlessly generate coherent videos up to 30 seconds long, significantly enhancing creators' control over AI-driven video storytelling.
Price: 60 free credits daily (non-members limited to 360p)
Yearly (20% off): $8 for standard, $24 for Pro, $48 for Premium per month
Monthly: $10 for standard, $30 for Pro, $60 for Premium per month
(Official site of PixVerse AI)
Feeling overwhelmed by the sheer number of models available and unsure which one to choose? The truth is, each model has its own strengths; there is no one-size-fits-all solution. One important difference: for AI video creation platforms like Pika, Gen-3 Alpha, and Kling, users can only access the models through their official sites. Models like LTX-Video, Wan 2.1, and VACE, on the other hand, can be run in ComfyUI either locally, which is totally free, or on a cloud GPU like RunC.AI for around $0.42 per hour.
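To put the rental option in perspective, here is a quick back-of-the-envelope calculation in Python. It uses the $0.42/hour cloud GPU rate quoted above and, purely as one example, a $28/month subscription tier (the figure from the SkyReels and Pika plans listed earlier) to work out how many GPU hours a month of subscription fees would buy:

```python
# Rough cost comparison: hourly cloud GPU rental vs. a monthly subscription.
# $0.42/hour is the RunC.AI rate mentioned above; $28/month is used as a
# representative subscription price from the plans listed in this post.
GPU_HOURLY_RATE = 0.42   # USD per hour for a cloud GPU instance
SUBSCRIPTION = 28.00     # USD per month for a mid-tier plan

breakeven_hours = SUBSCRIPTION / GPU_HOURLY_RATE
print(f"${SUBSCRIPTION:.2f}/month buys about {breakeven_hours:.0f} GPU hours")
# If you spend fewer GPU hours than that per month generating videos,
# renting by the hour works out cheaper than the subscription.
```

The break-even point lands at roughly 66 to 67 hours of GPU time per month, which is why hourly rental tends to win for occasional or experimental use.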
Trial Comparison
To see the differences between these models, let's look at what they generate from the same text prompt. (All the videos were created with free trial credits, 720p, 5s.)
Prompt: A young noblewoman is resplendent in lavish medieval Rococo-style attire, her gown adorned with delicate lace and floral embroidery that cascades in elegant folds. Her powdered wig towers in an intricate updo decorated with pearls and pastel ribbons, the very epitome of aristocratic fashion. With slender, porcelain-white fingers, she daintily lifts a bite of strawberry cake to her rouged lips while seated upon an expansive emerald lawn. A uniformed attendant stands dutifully behind her, holding aloft a silk parasol trimmed with golden fringe to shield her delicate complexion from the sun's rays. At her satin-slippered feet, a snow-white Persian cat dozes contentedly amidst the flowing skirts of her dress, its tail twitching occasionally in peaceful slumber.
Wan2.1 SkyReelsV2 VACE
HunyuanCustom
CogVideoX
PixVerse
Runway Gen-3, MuseSteamer, and Kling require an input image (a start or end frame). Let's use a still from the movie Orlando with a slightly modified prompt:
A young noblewoman is resplendent in lavish medieval Rococo-style attire, her gown adorned with delicate lace and floral embroidery that cascades in elegant folds. Her powdered wig towers in an intricate updo decorated with pearls and pastel ribbons, the very epitome of aristocratic fashion. With slender, porcelain-white fingers, she daintily lifts a bite of strawberry cake to her rouged lips while seated upon an opulent sofa, drinking coffee. A uniformed attendant walking toward her with a delicate tray, offering strawberry cream cake. At her satin-slippered feet, a snow-white Persian cat dozes contentedly amidst the flowing skirts of her dress, its tail twitching occasionally in peaceful slumber.
MuseSteamer AI
So, what's the best option? The best approach is to experiment with different models and select the one that best fits your budget and creative vision. If you're working with limited resources, the most cost-effective and time-saving option is to rent cloud GPUs. Deploying ComfyUI on a cloud GPU (an RTX 4090 meets most needs) and installing open-source models allows you to build your own tailored workflow, a smart, flexible choice. If you're eager to dive in, consider joining the AIGC community via RunC.AI and ride the cutting edge of the AI-driven creative revolution.
About RunC.AI
Rent smart, run fast. RunC.AI allows users to gain access to a wide selection of scalable, high-performance GPU instances and clusters at competitive prices compared to major cloud providers like Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.