The Landscape
This work is licensed under a Creative Commons Attribution 4.0 International License.
A Glance at the Generative AI Landscape (2024-2025)
The field of Generative AI is rapidly evolving. This section provides a snapshot of some of the most influential models and platforms as of early 2024, with a look towards what we might expect in 2025.
Image Credit: Yang et al. (While this image depicts the state of LLMs in 2023, it effectively illustrates the foundational models and their evolution)
View the HuggingFace Arena LLM Leaderboard
Table: Prices of Services (last checked 12/2024)
| LLM Service | Plan | Price (per month unless noted) | Details |
|---|---|---|---|
| ChatGPT | Free | $0 | Access to GPT-3.5 model. Limited availability during peak times. |
| | Plus | $20 | Access to o1, GPT-4, priority access, faster responses. 50 msgs/3hrs on GPT-4, more on GPT-3.5. |
| | Pro | $200 | Access to o1, GPT-4, higher priority access, faster responses. |
| ChatGPT Enterprise | Enterprise | Contact Sales | Enhanced security, privacy, admin controls, and higher usage limits. Minimum 150 users. |
| OpenAI Platform API | Pay-As-You-Go | Varies | o1: $15.00/1M input tokens. GPT-4 Turbo: $0.01/1K input tokens, $0.03/1K output tokens. GPT-4: $0.03/1K input, $0.06/1K output (8K context); $0.06/1K input, $0.12/1K output (32K context). GPT-3.5 Turbo: $0.0005/1K input, $0.0015/1K output. |
| Google Gemini | Free | Free | Access to Gemini Pro. Limited availability. |
| | Google One AI Premium | $19.99 | Access to Gemini Advanced, 2TB storage, and other Google One benefits. |
| | Gemini Education | $18 | Access to Gemini Advanced. |
| | Gemini Education Plus | $27 | Access to Gemini Advanced. |
| | Notebook LM | Part of Gemini subscription | Access to Gemini Advanced. |
| Vertex AI Gemini API | Pay-As-You-Go | Varies | Gemini 1.0 Pro: $0.00025/1K input characters, $0.0005/1K output characters, $0.002/image input. Gemini 1.5 Pro: $0.00125/1K input characters, $0.00375/1K output characters. |
| Grok by xAI | Basic (Via X Premium) | $8 | Access to Grok via X Premium subscription, also includes ad-free access to X. |
| | Premium+ (Via X Premium+) | $16 | Includes all Basic features, plus largest reply boost and access to full suite of X Premium tools. |
| Midjourney | Basic | $10 | ~200 image generations/month (3.3 hrs fast GPU time). |
| | Standard | $30 | 15 hrs fast GPU time/month, unlimited relaxed GPU time. |
| | Pro | $60 | 30 hrs fast GPU time/month, unlimited relaxed GPU time, stealth mode. |
| | Mega | $120 | 60 hrs fast GPU time/month, unlimited relaxed GPU time, stealth mode. |
| DALL·E 3 | Via ChatGPT Plus | Included in ChatGPT Plus | Available for image generation within ChatGPT Plus. |
| DALL·E API | Pay-As-You-Go | Varies | DALL·E 3: Standard quality $0.040/image, HD quality $0.080/image. DALL·E 2: $0.020/image (1024x1024), $0.018/image (512x512), $0.016/image (256x256). |
| OpenAI Sora | Research Preview | Not yet released | Not yet available to the public. Currently in a red teaming phase. |
| Anthropic Claude | Claude 3 Haiku | $0.25/1M input tokens, $1.25/1M output tokens | Entry-level model with 200K context window. |
| | Claude 3 Sonnet | $3/1M input tokens, $15/1M output tokens | Mid-tier model with enhanced capabilities, 200K context window. |
| | Claude 3 Opus | $15/1M input tokens, $75/1M output tokens | Advanced model comparable to GPT-4, 200K context window. |
| Mistral AI | Mistral Small | $2/1M input, $6/1M output | Suitable for lightweight tasks. |
| | Mistral Medium | $2.7/1M input, $8.1/1M output | Balanced performance for general use. |
| | Mistral Large | $8/1M input, $24/1M output | High-performance model for complex tasks. |
| Microsoft Copilot | Free | Free | Access to basic Copilot features, web grounding, GPT-4, and DALL·E 3. |
| | Pro | $20/user | Priority access to GPT-4 and GPT-4 Turbo, faster performance, 100 boosts/day with DALL·E 3. |
| Microsoft 365 Copilot | Add-on | $30/user | AI-powered assistance integrated with Microsoft 365 apps. Requires a Microsoft 365 Business Standard, Business Premium, E3, or E5 license. |
| GitHub Copilot | Individual | $10 | AI coding assistant for developers. |
| | Business | $19/user | Includes team-based collaboration features. |
| | Enterprise | $39/user | Includes organization-wide policy management and enhanced security features. |
| Ollama | Local Install | Free | Run large language models locally. Requires a compatible GPU and technical setup. |
| Meta Llama 3 | Open Source | Free | Open-source foundation models for research and commercial use. Requires technical setup for local hosting. |
| Replit Ghostwriter | Included in Replit Core | $20 | AI assistant for coding and debugging, integrated into the Replit IDE. Includes all other Replit Core features. |
| Jasper AI | Creator | $49 | Writing assistant with templates and AI tools for individuals. |
| | Teams | $125 | Advanced writing, commands, and longer outputs for small teams. |
| Character AI | c.ai+ | $9.99 | Conversational AI for entertainment and character interactions. Plus features include priority access, faster responses, and early access to new features. |
| Perplexity AI | Pro | $20 | AI-powered search assistant with enhanced query capabilities, unlimited file uploads, and access to various models. |
| Amazon Bedrock | On-Demand | Varies | Pay-as-you-go pricing for various foundation models, including those from Anthropic, Cohere, Meta, Mistral AI, and Amazon. |
| Azure AI Foundry | On-Demand | Varies | Pay-as-you-go pricing for various foundation models, including those from Anthropic, Meta, Mistral AI, and OpenAI. |
| Google Vertex AI | On-Demand | Varies | Pay-as-you-go pricing for various foundation models, powered by Gemini with 160+ other foundation models. |
| You.com | YouPro | $20 | Access to latest AI models, personalized AI with memory, advanced AI writing tools. |
| Poe by Quora | Monthly | $19.99 | Access to various chatbots, including GPT-4, Claude, and others. Limited messages on some models. |
| | Yearly | $199.99 | Annual subscription with access to all chatbots. Limited messages on some models. |
| Continue.dev | Open Source | Free | Open-source autopilot for software development. VS Code and JetBrains extension. Integrates with any LLM. |
Notes:
- Token pricing for API access can be complex. Refer to each provider's pricing page for the most accurate and up-to-date details.
- "Contact Sales" typically indicates that pricing is customized based on usage, features, and the specific needs of the customer.
- Many services offer free trials or limited free tiers, allowing you to test them out before committing to a paid plan.
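Because API prices are quoted per 1K or per 1M tokens, it can help to normalize them before comparing providers. The following is a minimal sketch, not an official calculator: the model keys and the `estimate_cost` helper are hypothetical names chosen for illustration, and the prices are copied from the table above (converted to USD per 1M tokens), so they may be out of date.

```python
# Prices in USD per 1M tokens, converted from the per-1K / per-1M figures
# in the table above. The dictionary keys are illustrative labels only,
# not official API model identifiers.
PRICES_PER_1M = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},   # $0.01 / $0.03 per 1K
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},   # $0.0005 / $0.0015 per 1K
    "mistral-large": {"input": 8.00, "output": 24.00},  # already per 1M
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one request (hypothetical helper)."""
    price = PRICES_PER_1M[model]
    input_cost = input_tokens * price["input"] / 1_000_000
    output_cost = output_tokens * price["output"] / 1_000_000
    return input_cost + output_cost


# Example: a 2,000-token prompt that produces a 500-token answer.
cost = estimate_cost("gpt-4-turbo", input_tokens=2_000, output_tokens=500)
print(f"${cost:.4f}")  # prints: $0.0350
```

Input and output tokens are priced separately, so a long answer to a short prompt can cost more than the reverse.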
Additional Chatbot and LLM Services:
- Amazon Bedrock, Azure AI Foundry, Google Vertex: Provide access to various foundation models, each running on the respective cloud provider's infrastructure. Ideal for companies and institutions already running their infrastructure on commercial cloud services.
- You.com: Offers a pro plan with access to the latest AI models, personalized AI with memory, and advanced AI writing tools.
- Poe by Quora: A platform that gives you access to various chatbots (like GPT-4, Claude, etc.) through a single subscription.
Stable Diffusion Image and Video
Stable Diffusion 3
Stable Diffusion 3 is the latest iteration of the popular open-source text-to-image model developed by Stability AI. It builds upon the advancements of previous versions, offering improved image quality, more accurate adherence to prompts, and enhanced capabilities for generating complex scenes and details.
Stable Diffusion models are available via HuggingFace, GitHub and through various APIs and user interfaces.
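Diffusion models involve two processes, forward and reverse. Forward diffusion adds Gaussian noise to an image step by step until the original content is lost. Reverse diffusion, modeled as a Markov chain, starts from a sample of a Gaussian distribution and removes the noise step by step to recover data.
Stable Diffusion relies on a Latent Diffusion Model (LDM), which runs this process in a compressed latent space rather than directly on pixels.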
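The forward (noising) half of the process can be sketched in a few lines. This is a toy illustration only, not Stable Diffusion's implementation: it assumes a linear noise schedule of the kind used in DDPM-style models (1,000 steps, betas from 1e-4 to 0.02), operates on a short list of numbers standing in for pixel or latent values, and uses hypothetical function names. The reverse process needs a trained neural network to predict the noise and is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: the running product of (1 - beta) up to each step."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0 using the closed form
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return [
        math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


rng = random.Random(0)                  # fixed seed for repeatability
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [1.0, -1.0, 0.5, 0.0]              # toy "image" with four values

x_early = forward_diffuse(x0, 10, alpha_bars, rng)   # still close to x0
x_late = forward_diffuse(x0, 999, alpha_bars, rng)   # almost pure noise
print("signal kept at t=10: ", math.sqrt(alpha_bars[10]))
print("signal kept at t=999:", math.sqrt(alpha_bars[999]))
```

The printed values show how much of the original signal survives at each step: close to 1 early on and close to 0 at the end, which is what "adding noise until the image is lost" means in practice.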
Other Notable Image Generation Models
- DALL·E 3 (OpenAI) is known for its photorealistic image generation and ability to understand complex prompts. It's integrated into ChatGPT Plus and available through an API.
- Midjourney v6 is highly regarded for its artistic and stylized image generation. It's accessible through a Discord interface and requires a subscription.
- Imagen 2 (Google) is not directly accessible to the public, but is notable for its photorealistic text-to-image generation capabilities.
- Adobe Firefly is integrated into the Adobe Creative Cloud suite of applications and is geared toward enterprise creative workflows.
Video Generation Models
- Sora (OpenAI) has generated significant excitement for its ability to create realistic and imaginative videos from text prompts. It is currently available on a limited basis.
- Runway Gen-2 allows for text-to-video, image-to-video, and video-to-video editing. It's popular among video creators for its accessibility and range of features.
- Pika Labs is an additional option that allows for text-to-video generation and editing.
- Stable Video Diffusion is an image-to-video diffusion model that allows users to generate short video clips based on a still image input.
Image and Video Segmentation
Segment Anything (Meta; Kirillov et al.) is an image segmentation model that isolates objects within images with high precision.
Emerging Trends in Image and Video Analysis
- Multimodal Integration: Combining image/video analysis with other modalities (text, audio) for a more holistic understanding of content.
- 3D Scene Generation: Generating 3D models and environments from images and videos.
- Real-time Analysis: Performing image and video analysis in real-time for applications like augmented reality and live video processing.
Glossary
Google's Machine Learning Glossary
NVIDIA's Data Science Glossary
Agentic AI:
AI that uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems.
Anthropic:
A research organization emphasizing AI safety and governance. Known for Claude, a large language model (LLM) with advanced reasoning and robust safety features.
ChatGPT:
OpenAI’s general-purpose conversational assistant, built on its GPT family of LLMs and renowned for its conversational strengths, versatility, and ability to adapt to varied tasks through effective prompt engineering.
Claude:
Anthropic’s LLM, recognized for its interpretability, strong reasoning capabilities, and rigorous safety considerations.
Copilot (GitHub, Microsoft):
An AI-driven developer assistant offering code suggestions, debugging support, and efficiency improvements, leveraging generative AI to boost productivity.
Embeddings:
Numerical vector representations of data (e.g., text, images, audio) that capture semantic meaning and relationships. Useful for search, clustering, recommendation, and more.
Foundation Models:
Large-scale deep learning models (e.g., LLMs, vision models, multimodal models) trained on massive datasets. They serve as a base or "foundation" for a wide range of downstream tasks, enabling transfer learning and rapid adaptation.
Gemini:
Google’s family of multimodal foundation models, capable of understanding and generating text, images, and other data types, reflecting Google’s advancements in AI research.
GitHub:
A leading platform for version control and software collaboration. Now integrated with AI tools like GitHub Copilot for enhanced code development workflows.
HuggingFace:
A hub and community for open-source AI models, datasets, and applications. Widely used in the natural language processing (NLP) community for model sharing and development.
Large Language Models (LLMs):
A subset of foundation models trained on extensive text corpora, enabling them to generate human-like text, summarize information, reason about topics, and perform a variety of NLP tasks. Examples include GPT, Claude, and Gemini.
Parameters:
The trainable values within a neural network, updated during the training process to minimize loss and define the model’s learned behavior.
Prompt Engineering:
The practice of crafting, refining, and optimizing instructions (prompts) given to AI models in order to guide their outputs toward desired results.
Stable Diffusion:
A family of open-source latent-diffusion-based models used for generating high-quality images from text or other forms of input (e.g., sketches).
Token:
A fundamental unit of text—often a word, subword, or character—that LLMs process when understanding or generating language.
Weights:
Numerical parameters within a neural network that determine the strength of connections between neurons or nodes.
Zero-shot Learning:
The capability of an AI model to perform tasks it has never been explicitly trained on, often made possible by large-scale pretraining on diverse datasets.
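To make the "Embeddings" entry above more concrete, here is a minimal sketch of how vector representations support semantic search. The three-dimensional vectors are made-up toy values (real embedding models output hundreds or thousands of dimensions), and `most_similar` is a hypothetical helper, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; values are invented for illustration.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}


def most_similar(query, vectors):
    """Return the other key whose vector is closest to the query's vector."""
    candidates = (key for key in vectors if key != query)
    return max(
        candidates,
        key=lambda key: cosine_similarity(vectors[query], vectors[key]),
    )


print(most_similar("cat", embeddings))  # prints: kitten
```

Search, clustering, and recommendation systems apply the same idea at scale, typically with a vector index instead of a linear scan.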