The Landscape
This work is licensed under a Creative Commons Attribution 4.0 International License.
A Glance at the Generative AI Landscape (2024-2025)
The field of Generative AI is rapidly evolving. This section provides a snapshot of some of the most influential models and platforms as of early 2024, with a look towards what we might expect in 2025.
Image Credit: Yang et al. (While this image depicts the state of LLMs in 2023, it effectively illustrates the foundational models and their evolution)
View the HuggingFace Arena LLM Leaderboard
Table: Prices of Services (last checked 06/2025)
| LLM Service | Plan | Price (per month) | Details |
|---|---|---|---|
| Anthropic Claude | Free | $0 | Access to Claude 3 Sonnet with usage limits |
| | Pro | $20 | 5x more usage, access to Claude 3 Opus and Haiku, priority access |
| | Team | $25/user (min 5) | Everything in Pro plus central billing, team collaboration features |
| Claude API | Pay-As-You-Go | Varies | Claude 3.7 Sonnet: $3/1M input, $15/1M output; Claude 4 Opus: $15/1M input, $75/1M output; Claude 3.5 Haiku: $0.25/1M input, $1.25/1M output |
| Google Gemini | Free | $0 | Access to Gemini Pro with usage limits |
| | Gemini Advanced | $19.99 | Access to Gemini Ultra 1.0, 2TB storage, integration with Google Workspace |
| | Gemini Business | $20/user | Access to Gemini in Workspace apps (Docs, Sheets, Slides, Meet) |
| | Gemini Enterprise | $30/user | Advanced features, enhanced security, admin controls |
| Vertex AI Gemini API | Pay-As-You-Go | Varies | Gemini 1.5 Flash: $0.075/1M input, $0.30/1M output; Gemini 1.5 Pro: $1.25/1M input, $5.00/1M output; Gemini 2.5 Pro (128k): $3.50/1M input, $10.50/1M output |
| OpenAI ChatGPT | Free | $0 | Access to GPT-4o mini with usage limits |
| | Plus | $20 | Access to GPT-4+, DALL-E 3, advanced data analysis |
| | Pro | $200 | Unlimited access to o1, o4-mini, GPT-4.5, and Advanced Voice |
| | Team | $25/user | Everything in Plus with higher limits, admin console, team workspace |
| | Enterprise | Contact Sales | Unlimited high-speed GPT-4+ models, extended context windows, enterprise security |
| OpenAI API | Pay-As-You-Go | Varies | GPT-4o: $5/1M input, $15/1M output; GPT-4 Turbo: $10/1M input, $30/1M output; GPT-4: $30/1M input, $60/1M output; GPT-3.5 Turbo: $0.50/1M input, $1.50/1M output |
| Perplexity AI | Free | $0 | Limited searches with Perplexity model |
| | Pro | $20 | Unlimited Pro searches, file uploads, API access, choice of models (GPT-4, Claude, Gemini) |
| | Enterprise | Contact Sales | Team management, enhanced security, SSO, dedicated support |
| Microsoft Copilot | Free | $0 | Access to GPT-4, limited image generation with DALL-E 3 |
| | Pro | $20/user | Priority access, faster performance, 100 boosts/day with DALL-E 3 |
| Microsoft 365 Copilot | Business | $30/user | AI in Word, Excel, PowerPoint, Outlook, Teams. Requires M365 license |
| GitHub Copilot | Individual | $10 | AI pair programming in VS Code, Visual Studio, Neovim, JetBrains |
| | Business | $19/user | Everything in Individual plus organization management |
| | Enterprise | $39/user | Everything in Business plus security vulnerability filtering, IP indemnity |
| Mistral AI | La Plateforme | Varies | Mistral 7B: $0.25/1M tokens; Mixtral 8x7B: $0.70/1M tokens; Mistral Small: $2/1M input, $6/1M output; Mistral Large: $8/1M input, $24/1M output |
| Cohere | Free Trial | $0 | Limited API calls for testing |
| | Production | Varies | Command: $1/1M input, $2/1M output; Command Light: $0.30/1M tokens; Embed: $0.10/1M tokens |
| Midjourney | Basic | $10 | ~200 image generations/month |
| | Standard | $30 | 15 hrs fast GPU time, unlimited relaxed |
| | Pro | $60 | 30 hrs fast GPU time, stealth mode |
| | Mega | $120 | 60 hrs fast GPU time, stealth mode |
| DALL-E 3 | Via ChatGPT Plus | Included | Image generation within ChatGPT |
| | API | Varies | Standard: $0.040/image, HD: $0.080/image |
| Stable Diffusion | DreamStudio | $10 | 1000 credits (~5000 images) |
| | API | Varies | $0.002 per image (512x512) |
| Grok by xAI | X Premium | $8 | Access via X (Twitter) Premium |
| | X Premium+ | $16 | Priority access, higher limits |
| Character AI | Free | $0 | Limited features and queue priority |
| | c.ai+ | $9.99 | Priority access, faster responses, exclusive features |
| Replicate | Pay-As-You-Go | Varies | Run open-source models, pricing per second of compute |
| Hugging Face | Free | $0 | Community models and datasets |
| | Pro | $9 | Advanced features, private repos |
| | Enterprise | Contact Sales | Dedicated support, SLAs, security features |
| Amazon Bedrock | On-Demand | Varies | Access to Claude, Llama 2, Stable Diffusion, and more |
| Google Vertex AI | On-Demand | Varies | 130+ foundation models including Gemini, Claude, Llama |
| Azure AI Studio | On-Demand | Varies | Access to GPT-4, Claude, Llama, Mistral, and more |
| Meta Llama | Open Source | Free | Llama 2 and Llama 3 models for download |
| Ollama | Local Install | Free | Run LLMs locally on your hardware |
| LM Studio | Local Install | Free | Desktop app for running LLMs locally |
| Jan.ai | Local Install | Free | Open-source ChatGPT alternative, runs locally |
| Continue.dev | Open Source | Free | Open-source autopilot for VS Code and JetBrains |
| Poe by Quora | Monthly | $19.99 | Access to various chatbots including GPT-4, Claude |
| | Yearly | $199.99/year | Annual subscription with all chatbot access |
| You.com | YouPro | $20 | Latest AI models, personalized AI with memory |
| Jasper AI | Creator | $49 | Writing assistant with templates |
| | Teams | $125 | Advanced features for small teams |
| | Business | Contact Sales | Custom pricing for organizations |
| Replit AI | Core | $20 | AI coding assistant integrated in Replit IDE |
Notes:
- Token pricing for API access can be complex. Refer to each provider's pricing page for the most accurate and up-to-date details.
- "Contact Sales" typically indicates that pricing is customized based on usage, features, and the specific needs of the customer.
- Many services offer free trials or limited free tiers, allowing you to test them out before committing to a paid plan.
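To make the per-token arithmetic concrete, here is a minimal sketch of how a pay-as-you-go request cost could be estimated. The prices are copied from the table on this page (last checked 06/2025) and may be outdated. The dictionary keys and the `estimate_cost` helper are hypothetical names chosen for illustration, not identifiers from any provider's API.

```python
# Illustrative per-million-token prices in USD, copied from the table on this page.
# The keys are made-up labels, not official API model identifiers.
PRICES_PER_MILLION = {
    "claude-3.7-sonnet": {"input": 3.00, "output": 15.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gemini-1.5-flash": {"input": 0.075, "output": 0.30},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_PER_MILLION[model]
    input_cost = input_tokens / 1_000_000 * price["input"]
    output_cost = output_tokens / 1_000_000 * price["output"]
    return input_cost + output_cost


if __name__ == "__main__":
    # Example: a request with 2,000 prompt tokens and 500 completion tokens.
    for model in PRICES_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, 2_000, 500):.6f}")
```

Because input and output tokens are billed at different rates, the same total token count can cost quite different amounts depending on how it splits between prompt and response.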
Additional Chatbot and LLM Services:
- Amazon Bedrock, Azure AI Foundry, Google Vertex: Provide access to various foundation models, each running on its own cloud provider's hardware. Ideal for companies and institutions that already run their infrastructure on commercial cloud services.
- You.com: Offers a pro plan with access to the latest AI models, personalized AI with memory, and advanced AI writing tools.
- Poe by Quora: A platform that gives you access to various chatbots (like GPT-4, Claude, etc.) through a single subscription.
Image and Video Generation Models
Image Generation Models
Stable Diffusion 3.5
Stable Diffusion 3.5 is the latest iteration from Stability AI, featuring multiple model sizes:
- SD3.5 Large (8B): High-quality generation with advanced prompt adherence
- SD3.5 Medium (2.5B): Balanced performance and quality
- SD3.5 Large Turbo: Optimized for speed with 4-8 step generation
Models are available via HuggingFace, GitHub, and various APIs.
FLUX Models
FLUX by Black Forest Labs (founded by researchers who created the original Stable Diffusion) offers state-of-the-art diffusion models:
- FLUX.1 [pro]: Top-tier model for commercial use
- FLUX.1 [dev]: Open-weight model for non-commercial use
- FLUX.1 [schnell]: Fast local generation model
Other Leading Image Generation Models
- DALL·E 3 (OpenAI): Photorealistic generation with excellent prompt understanding, integrated into ChatGPT Plus
- Midjourney v6.1: Industry-leading artistic and stylized generation via Discord
- Imagen 3 (Google): Advanced text-to-image with excellent photorealism, available in ImageFX
- Adobe Firefly 3: Enterprise-focused with commercial-safe training data
- Ideogram 2.0: Excellent text rendering capabilities within images
- Leonardo.AI: Real-time canvas generation with fine-tuned models
Video Generation Models
Google Veo 3
Veo 3 represents Google's latest advancement in video generation:
- Generates up to 4K resolution videos
- Includes voices and sound effects
- Improved understanding of real-world physics and human movement
- Better camera control and cinematic effects
- Available through Google Labs and VideoFX
OpenAI Sora
Sora (OpenAI) features:
- Up to 1-minute video generation at 1080p
- Advanced physics simulation and 3D consistency
- Available to ChatGPT Plus and Pro subscribers
- Turbo mode for faster generation
Other Notable Video Generation Models
- Runway Gen-3 Alpha: Professional-grade with advanced motion control
- Pika 2.0: Scene editing and sound effects generation
- Stable Video Diffusion 2: Open-source image-to-video model
- Meta Movie Gen: High-quality video with synchronized audio (research preview)
- Kling 1.5: Chinese model with impressive motion quality
- HeyGen: Specialized in AI avatar video generation
- Synthesia: Enterprise-focused avatar video platform
Advanced Capabilities
Image and Video Understanding
- Segment Anything Model 2 (SAM 2) (Meta): Real-time segmentation for images and videos
- CLIP (OpenAI): Vision-language understanding
- LLaVA: Open-source visual instruction tuning
3D Generation
- DreamGaussian: Text/image to 3D in minutes
- Meshy: Text to 3D mesh generation
- Luma Genie: Text to 3D model generation
Emerging Trends
- Consistency Models: Faster generation with fewer steps
- ControlNet Integration: Precise control over generation
- Real-time Generation: Sub-second image creation
- Multimodal Models: Unified image, video, and audio generation
- Neural Radiance Fields (NeRFs): 3D scene representation
- Diffusion Transformers (DiT): Next-generation architectures
Glossary
Google's Machine Learning Glossary
NVIDIA's Data Science Glossary
Agentic AI:
AI systems that use sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems.
Anthropic:
A research organization emphasizing AI safety and governance. Known for Claude, a large language model (LLM) with advanced reasoning and robust safety features.
ChatGPT:
OpenAI’s general-purpose conversational assistant, built on its GPT family of LLMs. Renowned for its conversational strengths, versatility, and ability to adapt to varied tasks through effective prompt engineering.
Claude:
Anthropic’s LLM, recognized for its interpretability, strong reasoning capabilities, and rigorous safety considerations.
Copilot (GitHub, Microsoft):
An AI-driven developer assistant offering code suggestions, debugging support, and efficiency improvements, leveraging generative AI to boost productivity.
Embeddings:
Numerical vector representations of data (e.g., text, images, audio) that capture semantic meaning and relationships. Useful for search, clustering, recommendation, and more.
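As a rough illustration of how embeddings support search, the sketch below compares vectors with cosine similarity, a commonly used measure of how closely two vectors point in the same direction. The three-dimensional vectors are hand-made, hypothetical values (real embeddings come from a model and typically have hundreds or thousands of dimensions), and `most_similar` is an illustrative helper, not a library function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings" (hypothetical values, not produced by any model).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9],
}


def most_similar(query: str) -> str:
    """Return the other word whose vector is closest to the query's vector."""
    candidates = (word for word in EMBEDDINGS if word != query)
    return max(
        candidates,
        key=lambda word: cosine_similarity(EMBEDDINGS[query], EMBEDDINGS[word]),
    )


if __name__ == "__main__":
    print(most_similar("cat"))
```

In this toy setup "cat" lands closer to "kitten" than to "car" because their vectors point in nearly the same direction, which is the property that semantic search and clustering rely on.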
Foundation Models:
Large-scale deep learning models (e.g., LLMs, vision models, multimodal models) trained on massive datasets. They serve as a base or "foundation" for a wide range of downstream tasks, enabling transfer learning and rapid adaptation.
Gemini:
Google’s family of multimodal foundation models, capable of understanding and generating text, images, and other data types, reflecting Google’s advancements in AI research.
GitHub:
A leading platform for version control and software collaboration. Now integrated with AI tools like GitHub Copilot for enhanced code development workflows.
HuggingFace:
A hub and community for open-source AI models, datasets, and applications. Widely used in the natural language processing (NLP) community for model sharing and development.
Large Language Models (LLMs):
A subset of foundation models trained on extensive text corpora, enabling them to generate human-like text, summarize information, reason about topics, and perform a variety of NLP tasks. Examples include GPT, Claude, and Gemini.
Parameters:
The trainable values within a neural network, updated during the training process to minimize loss and define the model’s learned behavior.
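To give a feel for where parameter counts come from, here is a small sketch that counts the trainable values in a plain fully connected network: each layer contributes one weight per input-output connection plus one bias per output. The layer sizes and the `count_dense_parameters` helper are illustrative assumptions. LLM architectures are more complex, but their totals are built up from the same kind of bookkeeping.

```python
def count_dense_parameters(layer_sizes: list[int]) -> int:
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the two layers
        total += n_out         # one bias per unit in the receiving layer
    return total


if __name__ == "__main__":
    # Hypothetical small classifier: 784 inputs, 128 hidden units, 10 outputs.
    print(count_dense_parameters([784, 128, 10]))
```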
Prompt Engineering:
The practice of crafting, refining, and optimizing instructions (prompts) given to AI models in order to guide their outputs toward desired results.
Stable Diffusion:
A family of open-source latent-diffusion-based models used for generating high-quality images from text or other forms of input (e.g., sketches).
Token:
A fundamental unit of text—often a word, subword, or character—that LLMs process when understanding or generating language.
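The difference between token granularities can be sketched with two deliberately naive tokenizers. These helpers are illustrative only. Production LLM tokenizers generally use learned subword vocabularies (for example, byte-pair encoding), so their token counts fall somewhere between the word-level and character-level counts shown here.

```python
import re


def word_tokens(text: str) -> list[str]:
    """Split text into words and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokens(text: str) -> list[str]:
    """Treat every character, including spaces, as its own token."""
    return list(text)


if __name__ == "__main__":
    sample = "Tokenizers split text."
    print(word_tokens(sample), len(word_tokens(sample)))
    print(len(char_tokens(sample)))
```

Since API pricing is quoted per token, the tokenization scheme directly affects what a given piece of text costs to process.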
Weights:
Numerical parameters within a neural network that determine the strength of connections between neurons or nodes.
Zero-shot Learning:
The capability of an AI model to perform tasks it has never been explicitly trained on, often made possible by large-scale pretraining on diverse datasets.