
The Landscape

This work is licensed under a Creative Commons Attribution 4.0 International License.

A Glance at the Generative AI Landscape


Image Credit: Yang et al. (While this image depicts the state of LLMs in 2023, it effectively illustrates the foundational models and their evolution)

The field of Generative AI is rapidly evolving. This section provides a snapshot of some of the most influential models and platforms as of 2025.

Matt Turck's MAD Landscape

HuggingFace Arena LLM Leaderboard

Table: Prices of Services (last checked 10/2025)

LLM Service Plan Price (per month) Details
Anthropic Claude Free $0 Basic Claude access with limited daily use
Pro $20 More usage, Claude Code terminal access, unlimited projects, Research access
Max $100 Priority access, substantially higher usage, enhanced features
Max Pro $200 Highest tier with maximum usage limits and priority access to newest models
Team $30/month or $25/month (annual) Central billing, administration, collaboration features (minimum 5 members)
Enterprise Contact Sales Enhanced context window, SSO, role-based access, audit logs, compliance API
Claude API Pay-As-You-Go Varies Claude Sonnet 4.5: $3/1M input, $15/1M output (200K context)
Claude Opus 4.1: $15/1M input, $75/1M output
Claude Haiku 3.5: $0.80/1M input, $4/1M output
Batch processing: 50% discount, Prompt caching available (75-90% savings)
Claude Code Included in Pro+ $20+ Terminal-based AI coding assistant included with Pro, Max, Max Pro subscriptions
Web Search: $10/1,000 searches, Code Execution: $0.05/hour per container
Google AI Free $0 Unlimited Gemini 2.5 Flash, limited Gemini 2.5 Pro, 32K context window
Google AI Pro $19.99 Expanded access to Gemini 2.5 Pro (100 queries/day), 1M context window, 2TB storage, NotebookLM
Free for university students for 1 year
Google AI Ultra $249.99 Highest access to Gemini 2.5 Pro, exclusive access to Gemini 2.5 Deep Think, Veo 3 video generation, YouTube Premium, 30TB storage
Gemini API Pay-As-You-Go Varies Gemini 2.5 Flash: $0.30/1M input, $2.50/1M output
Gemini 2.5 Pro: $1.25/1M input (≤200K), $10/1M output (≤200K)
Gemini 2.5 Flash-Lite: $0.10/1M input, $0.40/1M output
Batch processing: 50% discount
OpenAI ChatGPT Free $0 Limited access to GPT-5 (10 messages every 5 hours), then GPT-5-mini
Plus $20 Higher message limits to GPT-5, unlimited GPT-5-mini, access to o3-mini, o1 models
Pro $200 Unlimited GPT-5 access, GPT-5 Pro with advanced reasoning, extended context windows
Team $25/user (annual) or $30/user (monthly) All Plus features with higher message caps, team workspace, data excluded from training
Enterprise Contact Sales Unlimited high-speed models, extended context windows, enterprise security
OpenAI API Pay-As-You-Go Varies GPT-5: $1.25/1M input, $10/1M output (272K-400K context)
GPT-5-mini: $0.25/1M input, $2/1M output
GPT-5-nano: $0.05/1M input, $0.40/1M output
GPT-4o: $2.50/1M input, $10/1M output
o3-mini: $1.10/1M input, $4.40/1M output
Perplexity AI Free $0 Unlimited quick searches, 5 Pro searches/day, 5 follow-up questions every 4 hours
Pro $20/month or $200/year 300+ Pro searches/day, access to advanced AI models, file uploads
Education Pro $4.99/month All Pro features with student/faculty verification
1 month free trial
Max $200/month or $2,000/year Unlimited Labs usage, access to top-tier models (OpenAI o3-pro, Claude Opus 4)
Enterprise Pro $40/user/month or $400/user/year Admin tools, collaboration features, domain verification, SCIM provisioning
Microsoft Copilot Free $0 GPT-4o-powered chat, 15 image generation boosts/day
Microsoft 365 Premium Premium $19.99 Full M365 suite + Copilot in all apps, 1TB storage, extended AI usage limits, 40 image generations
Microsoft 365 Copilot Business/Enterprise $30/user AI in Word, Excel, PowerPoint, Outlook, Teams. Requires existing M365 license ($12.50-$57/user)
Consumption-based $0.01 per message Pay-per-use alternative to monthly subscription (30 messages for proprietary files, 25 per agent action)
GitHub Copilot Free $0 Up to 2,000 code completions/month, 50 premium requests/month
Free for students, teachers, open source maintainers
Pro $10/month or $100/year Unlimited code completions, 300 premium requests/month, access to Copilot coding agent
Pro+ $39/month or $390/year 1,500 premium requests/month, full access to all models, GitHub Spark, compute resources
Business $19/user/month 300 premium requests/user, user management, usage metrics, team collaboration
Enterprise $39/user/month 1,000 premium requests/user, all AI models, advanced customization, enterprise features
Mistral AI Le Chat Free $0 Basic AI assistant with limited messages
Le Chat Pro $14.99 Up to 6x more messages, 150 flash answers/day, 5x web searches, 1,000 memories, 15GB libraries
Le Chat Team $24.99/user or $299.88/user/year 200 flash answers/day, 30GB libraries/user, domain verification, SCIM provisioning
La Plateforme API Varies Mistral Medium 3: $0.40/1M input, $2.00/1M output
Mistral Nemo: $0.30/1M tokens
Mistral Large 2: $3/1M input, $9/1M output
Codestral: $1/1M input, $3/1M output
Cohere Free Trial $0 Limited API calls for testing
Production Varies Command R 03-2024: $0.50/1M input, $1.50/1M output
Command R+ 08-2024: $2.50/1M input, $10/1M output
Command-light: $0.30/1M input, $0.60/1M output
Aya Expanse (8B & 32B): $0.50/1M input, $1.50/1M output
Education Program Contact AI access for students and educators (pricing not publicly disclosed)
DeepSeek DeepSeek Chat API Pay-As-You-Go DeepSeek Chat: $0.57/1M input, $1.68/1M output
DeepSeek Reasoner (R1): $0.57/1M input, $1.68/1M output
128K context window
Roughly 15-20x cheaper than GPT-4 Turbo at list prices
⚠️ NOT ALLOWED for US-based researchers - See restrictions below
Open Source Free Free to download and deploy locally. Training cost: $294,000 (peer-reviewed in Nature)
⚠️ Self-hosted use requires institutional IT/security approval
Qwen (Alibaba) Qwen Chat Free Free web interface powered by Qwen-Max
⚠️ NOT RECOMMENDED for US-based researchers - Chinese company, data sovereignty concerns
Qwen API Pay-As-You-Go Qwen-Flash: $0.05/1M input, $0.40/1M output
Qwen3-Coder: $0.22/1M input, $0.95/1M output
Qwen-Max: $1.60/1M input, $6.40/1M output
1M context window, 90-day free trial (1M tokens)
Open Source Free Apache 2.0 license, 40M+ downloads. Sizes: 0.6B-235B parameters
⚠️ Self-hosted use requires institutional IT/security approval
Midjourney Basic $10 ~200 image generations/month
Standard $30 15 hrs fast GPU time, unlimited relaxed
Pro $60 30 hrs fast GPU time, stealth mode
Mega $120 60 hrs fast GPU time, stealth mode
DALL-E 3 Via ChatGPT Plus Included Image generation within ChatGPT
API Varies Standard: $0.040/image, HD: $0.080/image
Stable Diffusion DreamStudio $10 1000 credits (~5000 images)
API Varies $0.002 per image (512x512)
Grok by xAI X Premium $8 Access via X (Twitter) Premium
X Premium+ $16 Priority access, higher limits
Character AI Free $0 Limited features and queue priority
c.ai+ $9.99 Priority access, faster responses, exclusive features
Together AI Serverless Inference Pay-As-You-Go Text & Vision Models: $0.02-$3.50/1M tokens
Image Models: $0.0027-$0.08/megapixel
Embedding Models: $0.01-$0.08/1M tokens
GPU Clusters Pay-As-You-Go Instant Clusters: $1.76-$5.50/GPU hour
Reserved Clusters: Starting at $1.30/GPU hour
Fine-Tuning Pay-As-You-Go LoRA Fine-Tuning (≤16B params): Starting at $0.48
Full Fine-Tuning (70-100B params): Up to $3.20
Groq Free Tier $0 Available for getting started
Developer Tier Pay-As-You-Go Up to 10x higher rate limits than the free tier. Batch Processing: 50% cost discount (through April 2025)
Enterprise Contact Sales Custom solutions for large organizations
Replicate Pay-As-You-Go Varies CPU: $0.36/hour
Nvidia T4 GPU: $0.81/hour (public), $1.98/hour (private)
8x H100 GPU: $43.92/hour
Run open-source models with per-second billing
Hugging Face Free $0 Community models and datasets
Pro $9 Advanced features, private repos
Enterprise Contact Sales Dedicated support, SLAs, security features
Amazon Bedrock On-Demand Varies Access to Claude, Llama 2, Stable Diffusion, and more
Google Vertex AI On-Demand Varies 130+ foundation models including Gemini, Claude, Llama
Azure AI Studio On-Demand Varies Access to GPT-4, Claude, Llama, Mistral, and more
Meta Llama Open Source Free Llama 2 and Llama 3 models for download
Ollama Local Install Free Run LLMs locally on your hardware
LM Studio Local Install Free Desktop app for running LLMs locally
Jan.ai Local Install Free Open-source ChatGPT alternative, runs locally
Continue.dev Open Source Free Open-source autopilot for VS Code and JetBrains
Poe by Quora Monthly $19.99 Access to various chatbots including GPT-4, Claude
Yearly $199.99 Annual subscription with all chatbot access
You.com YouPro $20 Latest AI models, personalized AI with memory
Jasper AI Creator $49 Writing assistant with templates
Teams $125 Advanced features for small teams
Business Contact Sales Custom pricing for organizations
Replit AI Core $20 AI coding assistant integrated in Replit IDE

Agentic Browsers (AI-Powered Web Browsers)

Browser Plan Price (per month) Details
Perplexity Comet Free $0 AI-powered browser with sidecar assistant, Perplexity AI search, tab management, content summarization
Perplexity Max $200 Background Assistant for multi-tasking, autonomous task execution (booking flights, sending emails), mission control dashboard
Dia Browser Free Beta $0 (Invite-only) AI-first browser, URL bar = AI chat, tab conversations, Skills system, browsing history context (opt-in)
macOS 14+, Apple Silicon (M1 or later) only
Dia Pro $20 Unlimited AI chat and Skills, multi-step reasoning, task automation
Acquired by Atlassian ($610M)
Fellou Free $0 1,000 Sparks (~4 tasks), Deep Search, autonomous web actions, Shadow Workspace for background tasks
Plus $19 2,000 Sparks (~8 tasks), 3 scheduled tasks, priority support
Pro $39.90 5,000 Sparks (~20 tasks), 5 scheduled tasks, Image/Code/Music agents
Ultra $199.90 Unlimited Sparks, unlimited scheduled/concurrent tasks, exclusive support
Opera Neon Subscription $19.99 (Waitlist) Neon Do (autonomous browsing), Neon Make (AI creation), Cards system, Tasks workspaces, local processing
Genspark AI Browser Free $0 100 credits daily, Super Agent Everywhere, Autopilot Mode, 700+ MCP tool integrations
Plus $24.99 10,000 credits monthly, priority AI agent access, top-tier models, AI Slides/Sheets/Docs
Pro $249.99 125,000 credits monthly, full Super Agent access, phone calls, video generation
Microsoft Edge Copilot Mode Free (Experimental) $0 Cross-tab awareness, task automation, in-page assistance, browser history/credentials access
Windows/Mac, opt-in
Opera One + Aria Free $0 Free AI assistant, real-time web access, page context mode, image generation, tab commands, local AI models
No account required
Brave + Leo AI Free $0 Privacy-first AI, Llama 3.1 8B, Mixtral, Claude Haiku, Qwen, content awareness, zero data retention
Leo Premium Varies Claude Sonnet 4, DeepSeek R1 reasoning models, Bring Your Own Model (BYOM)

Notes on Agentic Browsers:

  • True Agentic Capabilities: Comet, Fellou, Opera Neon, Dia, and Genspark can autonomously perform multi-step tasks (booking, purchasing, form filling)
  • AI-Enhanced: Microsoft Edge Copilot Mode, Opera One, and Brave Leo provide AI assistance but with less autonomous action
  • Platform Availability: Most are Chromium-based; Dia is macOS only (M1+); others support Windows/Mac/Linux
  • Privacy Considerations: Check each browser's data policies; some use cloud AI, others offer local processing
  • Coming Soon: OpenAI browser expected late 2025 with ChatGPT integration and Operator agent

Notes:

  • Token pricing for API access can be complex; a small worked example follows this list. Refer to each provider's pricing page for the most accurate and up-to-date details.
  • "Contact Sales" typically indicates that pricing is customized based on usage, features, and the specific needs of the customer.
  • Many services offer free trials or limited free tiers, allowing you to test them out before committing to a paid plan.
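
As a worked illustration of per-token pricing, the sketch below estimates the cost of one request from hypothetical token counts and the per-million-token rates in the table above (Claude Sonnet 4.5 is used as the example; all numbers are placeholders to be replaced with current pricing):

```python
# Rough cost estimate for one API request under per-million-token pricing.
# The rates mirror the table above (Claude Sonnet 4.5); token counts are
# made-up placeholders.

INPUT_PRICE_PER_M = 3.00    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of a single request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt that produces an 800-token answer.
print(f"${request_cost(2_000, 800):.4f}")  # -> $0.0180
```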

⚠️ Important Restrictions for US-Based Researchers

DeepSeek AI - Federal and State Restrictions

PAID CLOUD SERVICE NOT ALLOWED:

DeepSeek's paid API and cloud services are prohibited for US-based researchers at many institutions due to:

Federal Restrictions:

  • H.R. 1121 - "No DeepSeek on Government Devices Act" (Introduced Feb 2025)

  • House Select Committee Report - "DeepSeek Unmasked: Exposing the CCP's Latest Tool For Spying, Stealing, and Subverting U.S. Export Control Restrictions"

  • Federal Agency Bans: NASA, U.S. Navy, Department of Defense (DOD), Department of Commerce have banned DeepSeek

  • Owned by High-Flyer (Chinese company with CCP control)

  • Data stored in China and accessible to Chinese government

  • Content manipulation to align with CCP propaganda

State-Level Bans:

  • Texas (Jan 31, 2025), Virginia (Feb 11, 2025), New York (Feb 10, 2025)

  • Additional states: Iowa, South Dakota, Kansas, Tennessee, North Carolina, Nebraska, Arkansas, North Dakota, Oklahoma, Alabama, Georgia

University Bans:

SELF-HOSTED OPEN-SOURCE MAY BE PERMITTED:

Open-source DeepSeek models can be downloaded and run on-premises, but researchers MUST:

  • ✅ Check with institutional IT and security teams first

  • ✅ Ensure compliance with federal grant requirements (NSF, DOD, DOE)

  • ✅ Never upload sensitive, proprietary, or controlled data

  • ✅ Document usage for research security compliance


Qwen (Alibaba) - Data Sovereignty Concerns

NOT SPECIFICALLY BANNED, BUT NOT RECOMMENDED:

Qwen is not subject to specific federal bans like DeepSeek, but has serious concerns for US researchers:

Key Issues:

  • Owned by Alibaba (Chinese company subject to CCP control)

  • Data stored in China under Chinese data sovereignty laws

  • No GDPR compliance or EU data protection representative

  • Potential surveillance under Chinese national security laws

  • Congressional scrutiny (Senators urged sanctions in 2023, not yet implemented)

Regulatory Framework:

SELF-HOSTED OPEN-SOURCE MAY BE PERMITTED:

Qwen's Apache 2.0 licensed models (40M+ downloads on HuggingFace) can be run on-premises, but researchers MUST:

  • ✅ Check with institutional IT and security teams first

  • ✅ Verify compliance with federal grant terms

  • ✅ Avoid uploading to Chinese cloud services

  • ✅ Document AI tool usage in research security plans


Recommendations for Researchers

✅ SAFE FOR RESEARCH (US-based alternatives):

  • OpenAI (ChatGPT, GPT-5 API) - US company

  • Anthropic (Claude) - US company

  • Google (Gemini) - US company

  • Microsoft (Copilot) - US company

  • Mistral AI - French company (EU-based)

  • Cohere - Canadian company

⚠️ USE WITH EXTREME CAUTION (Chinese companies):

  • DeepSeek - BANNED at many institutions

  • Qwen - Not banned, but data sovereignty concerns

  • Check institutional policies BEFORE use

✅ SELF-HOSTED OPEN-SOURCE (May be acceptable):

  • Meta Llama (US company, Apache 2.0)

  • DeepSeek open-source (with institutional approval)

  • Qwen open-source (with institutional approval)

  • Mistral open-source (EU company, Apache 2.0)

ALWAYS:

  1. Check your institution's AI usage policy

  2. Review federal grant terms (NSF, NIH, DOD, DOE)

  3. Consult with IT security and research compliance offices

  4. Never share sensitive, proprietary, or controlled data with foreign AI services

  5. Document all AI tool usage for research security requirements

Best Options for Students & Educators:

  • Free/Low-Cost Options:

    • DeepSeek - Most affordable API at $0.57/$1.68 per 1M tokens (roughly 15-20x cheaper than GPT-4 Turbo at list prices), open-source option available
    • Meta Llama - Completely free open-source models (Llama 4 Scout & Maverick available for download)
    • GitHub Copilot - Free for students, teachers, and open source maintainers
    • Perplexity Education Pro - $4.99/month with student/faculty verification (1 month free trial)
    • Google AI Pro - Free for university students for 1 year ($19.99/month value)
    • HuggingFace - Free community access to models and datasets; Pro subscribers get $2/month of included inference credits
    • Ollama, LM Studio, Jan.ai - Run LLMs locally on your hardware for free
  • Best Value Paid Options:

    • Mistral Le Chat Pro - $14.99/month (cheaper than competitors, strong performance)
    • OpenAI GPT-4o-mini API - $0.15/$0.60 per 1M tokens (60%+ cheaper than GPT-3.5 Turbo)
    • Gemini 2.5 Flash-Lite - $0.10/$0.40 per 1M tokens (most economical for high-volume simple tasks)
    • Claude Haiku 3.5 API - $0.80/$4 per 1M tokens (balanced cost and capability)
  • Educational Programs Available:

    • Cohere Education Program - Contact for student/educator access
    • Google AI Pro - 1 year free for university students
    • Perplexity Education Pro - $4.99/month with verification

Additional Chatbot and LLM Services:

  1. Amazon Bedrock, Azure AI Foundry, Google Vertex: Provide access to a variety of foundation models, each running on the respective cloud provider's infrastructure. Ideal for companies and institutions that already run their infrastructure on commercial cloud services.

  2. You.com: Offers a pro plan with access to latest AI models, personalized AI with memory and advanced AI writing tools.

  3. Poe by Quora: A platform that gives you access to various chatbots (like GPT-4, Claude, etc.) through a single subscription.

Image and Video Generation Models

Image Generation Models (2025)

Stable Diffusion 3.5 (October 2024)

Stable Diffusion 3.5 from Stability AI features:

  • SD3.5 Large (8.1B): High-quality 1MP generation with advanced prompt adherence
  • SD3.5 Medium (2.5B): Balanced performance for consumer hardware (0.25-2MP)
  • SD3.5 Large Turbo: Optimized for speed with 4-step generation
  • Open Source: Free for non-commercial use and for commercial use by organizations under $1M annual revenue
  • Platforms: HuggingFace, GitHub, Replicate, Fireworks AI

FLUX Models (Black Forest Labs)

FLUX by Black Forest Labs offers cutting-edge diffusion models:

  • FLUX.1 Kontext (May 2025): Combines text+image prompts, state-of-the-art in-context generation and editing
  • FLUX 1.1 Pro Ultra: Latest professional variant with enhanced quality
  • FLUX.1 Krea Dev (July 2025): Better performance, varied aesthetics, improved realism
  • FLUX.1 Schnell: Apache-licensed open-source for fast local generation (12B parameters)
  • FLUX.1 Tools (November 2024): Fill, Depth, Canny, Redux for advanced control
  • Architecture: 12B parameter rectified flow transformer
  • Platforms: API access, BFL Playground, Azure AI Foundry

GPT-4o Image Generation (OpenAI)

GPT-4o Image (March 2025):

  • Model: gpt-image-1 (replaces DALL-E 3)
  • Resolution: Up to 4096×4096 pixels (4K)
  • Features: Native integration in GPT-4o, reliable text rendering, multi-turn refinement, image transformation
  • Access: ChatGPT (Free/Plus/Pro), OpenAI API
  • Safety: C2PA metadata watermarking on all images

Midjourney V7 (April 2025)

Midjourney latest features:

  • V7: Current default (since June 2025) with stunning text precision, richer textures, improved bodies/hands
  • Draft Mode: 10x speed at half the cost
  • Personalization: First model with personalization enabled by default
  • V8: In development with "significant differences" and innovative features
  • Video: Coming soon (in final sprint stage)
  • Platform: Discord-based, Web Interface

Google Imagen 4 (May 2025)

Imagen 4 and Imagen 4 Ultra:

  • Resolution: Up to 2K
  • Speed: 10x faster mode available
  • Features: Enhanced photo-realism, improved text rendering, advanced typography, diverse art styles
  • Safety: SynthID watermarking, content filtering
  • Access: Gemini API, Google AI Studio, Google Labs

Adobe Firefly Image Model 4 (April 2025)

Firefly 4 and Firefly 4 Ultra:

  • Resolution: Up to 2K with lifelike quality
  • Features: Exceptional precision, camera control, structure/style references
  • Commercial-Safe: Training data with indemnification for enterprise
  • Integration: Photoshop, Illustrator, InDesign, API access
  • Firefly Video Model: New modality (April 2025)

Breakthrough New Models (2024-2025)

  • Reve Image 1.0 (March 2025): #1 on Artificial Analysis Arena, best-in-class prompt adherence and typography
  • Recraft V3 (October 2024): #1 on HuggingFace leaderboard at launch, first with vector art generation and extended text
  • HiDream-I1 (April 2025): 17B parameters, open-source (MIT), sparse transformer architecture
  • Ideogram 3.0 (2025): Enhanced realism, style reference (3 images), superior text rendering
  • Leonardo Lucid Origin (2025): Most versatile model, accurate text rendering, full HD renders

Video Generation Models (2025)

OpenAI Sora 2 (September 2025)

Sora 2 latest features:

  • Native Audio: Synchronized dialogue, music, and sound effects

  • Resolution: Up to 1080p, duration up to 20 seconds

  • Physics: Superior simulation (basketball rebounds, water buoyancy, gymnastics)

  • Cameo Feature: Insert user likenesses with consent

  • Pricing: Plus plan (50 videos/month at 480p), Pro plan (10x more usage, higher resolutions)

  • Access: ChatGPT Plus/Pro, iOS app (US/Canada, invite-only)

Google Veo 3 (May 2025)

Veo 3 represents Google's latest advancement:

  • Resolution: Up to 4K, 8-second videos

  • Native Audio: Dialogue, sound effects, ambient noise

  • Features: Best-in-class physics, realism, prompt adherence, advanced character/camera controls

  • Access: Flow (Google Labs), Gemini app (AI Pro subscribers), Google AI Studio, Gemini API, Vertex AI

  • Limits: 3 videos/day for paying subscribers

  • Rollout: 159+ countries (July 2025)

Runway Gen-4 (March 2025)

Runway Gen-4 features:

  • World Consistency: Characters, locations, objects consistent across scenes

  • Visual References: Image + text prompt (no fine-tuning required)

  • Duration: 5 or 10 seconds

  • Gen-4 Turbo: Faster generation at lower cost

  • Access: app.runwayml.com

Meta Movie Gen (2025 Release Planned)

Movie Gen research features:

  • Models: 30B video, 13B audio

  • Resolution: 1080p HD, up to 16 seconds at 16 fps

  • Audio: Up to 45 seconds with synchronized sound

  • Features: Four capabilities (video generation, personalized video, precise editing, audio generation)

  • Status: Research phase, Instagram integration planned 2025

  • Partnership: Blumhouse Productions

Leading Commercial Video Platforms

  • Pika 2.2 (February 2025): Pikaframes keyframe system, 10-second videos, 1080p, Pikatwists dramatic endings
  • Kling AI 2.5 Turbo (September 2025): Enhanced prompt adherence, superior high-motion scenes, 1080p, 30% cost reduction
  • Luma Ray3 (September 2025): Draft mode, HDR/EXR support, deep reasoning, fast generation
  • HeyGen (2025): Avatar IV with hyper-realistic avatars, 140+ languages, Veo 3 integration, 60%+ Fortune 100 adoption
  • Synthesia 3.0 (2025): Express-2 avatars, AI dubbing (32 languages), video agents, $2.1B valuation
  • Hedra Character-3 (April 2025): Omnimodal model, 4K @ 60fps, 90-second videos, full-body animation with speech

Open-Source Video Models

  • Hunyuan Video (Tencent): 13B+ parameters, largest open-source model, video-to-audio module, GitHub/HuggingFace
  • Stable Video 4D 2.0 (May 2025): Enhanced 4D generation, 48 frames (12×4 views), 576×576, GitHub available
  • Mochi 1 (Genmo): 10B parameters, Apache 2.0 license, 30fps, 5.4 seconds (HD version pending)

Advanced Capabilities

Image and Video Understanding

3D Generation

Emerging Trends

  • Consistency Models: Faster generation with fewer steps
  • ControlNet Integration: Precise control over generation
  • Real-time Generation: Sub-second image creation
  • Multimodal Models: Unified image, video, and audio generation
  • Neural Radiance Fields (NeRFs): 3D scene representation
  • Diffusion Transformers (DiT): Next-generation architectures

Glossary

Google's Machine Learning Glossary

NVIDIA's Data Science Glossary

Agentic AI: Uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems. Agentic systems can break down tasks, use tools, and make decisions to achieve goals with minimal human intervention.

Anthropic: A research organization emphasizing AI safety and governance. Known for Claude, a large language model (LLM) with advanced reasoning and robust safety features.

API (Application Programming Interface): A set of protocols and tools that allow different software applications to communicate. In AI, APIs enable developers to integrate LLM capabilities into their applications programmatically.
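
As a minimal sketch (not any provider's official client library), the snippet below sends a chat request to OpenAI's HTTPS endpoint using the generic requests library; the model name and the OPENAI_API_KEY environment variable are assumptions, and other providers expose similar but not identical request formats.

```python
import os
import requests

# Minimal sketch of calling an LLM over an HTTP API (OpenAI-style chat endpoint).
# Assumes an API key is available in the OPENAI_API_KEY environment variable.
resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4o-mini",  # any chat model available to your account
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Explain what an API is in one sentence."},
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```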

Attention Mechanism: A neural network technique that allows models to focus on relevant parts of input data when processing information. The foundation of transformer architectures used in modern LLMs.
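
The core computation is scaled dot-product attention; a minimal NumPy sketch, ignoring masking and multiple heads, illustrates it:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted sum of value vectors

# 4 query positions, 6 key/value positions, dimension 8
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (4, 8)
```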

Chain-of-Thought (CoT): A prompting technique that encourages AI models to break down complex problems into intermediate reasoning steps, improving accuracy on tasks requiring logic and multi-step reasoning.
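
In practice, chain-of-thought is usually triggered simply by asking for intermediate steps; the prompt below is an illustrative example, not a fixed template:

```python
# Illustrative chain-of-thought prompt: the model is asked to show its
# reasoning before committing to a final answer.
prompt = (
    "A train leaves at 9:40 and arrives at 11:05. How long is the trip?\n"
    "Think step by step, then give the final answer on its own line."
)
# Expected style of response:
#   From 9:40 to 10:40 is 60 minutes; from 10:40 to 11:05 is 25 minutes.
#   Final answer: 1 hour 25 minutes.
```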

ChatGPT: OpenAI's general-purpose LLM, renowned for its conversational strengths, versatility, and ability to adapt to varied tasks through effective prompt engineering.

Claude: Anthropic's LLM, recognized for its interpretability, strong reasoning capabilities, and rigorous safety considerations.

Context Window: The maximum amount of text (measured in tokens) that an LLM can process at once, including both the input prompt and generated output. Modern models range from 8K to over 1M tokens.

Copilot (GitHub, Microsoft): An AI-driven developer assistant offering code suggestions, debugging support, and efficiency improvements, leveraging generative AI to boost productivity.

Diffusion Models: A class of generative models that create images by iteratively denoising random noise. Used in systems like Stable Diffusion, DALL-E, and Midjourney for text-to-image generation.

Embeddings: Numerical vector representations of data (e.g., text, images, audio) that capture semantic meaning and relationships. Useful for search, clustering, recommendation, and more.
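
Similarity between embeddings is typically measured with cosine similarity; the NumPy sketch below uses tiny made-up vectors in place of real model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional "embeddings"; real models produce hundreds to thousands of dimensions.
cat = np.array([0.8, 0.1, 0.3, 0.0])
kitten = np.array([0.7, 0.2, 0.4, 0.1])
invoice = np.array([0.0, 0.9, 0.1, 0.8])

print(cosine_similarity(cat, kitten))   # high: related meanings
print(cosine_similarity(cat, invoice))  # low: unrelated meanings
```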

Few-Shot Learning: The ability of an AI model to learn new tasks from just a few examples provided in the prompt, without requiring additional training or fine-tuning.
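
Few-shot prompting just means packing a handful of worked examples into the prompt itself; an illustrative sentiment-labeling prompt:

```python
# Few-shot prompt: the model infers the task (sentiment labeling) from the
# examples alone, with no additional training or fine-tuning.
prompt = """Label the sentiment of each review as Positive or Negative.

Review: "Battery lasts all day, love it." -> Positive
Review: "Broke after two uses."           -> Negative
Review: "Setup was painless and fast."    -> Positive
Review: "The screen flickers constantly." ->"""
# The model is expected to continue with " Negative".
```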

Fine-Tuning: The process of further training a pre-trained model on a specific dataset or task to specialize its capabilities for particular use cases or domains.

Foundation Models: Large-scale deep learning models (e.g., LLMs, vision models, multimodal models) trained on massive datasets. They serve as a base or "foundation" for a wide range of downstream tasks, enabling transfer learning and rapid adaptation.

Gemini: Google's family of multimodal foundation models, capable of understanding and generating text, images, and other data types, reflecting Google's advancements in AI research.

Generative AI (GenAI): AI systems capable of creating new content—text, images, code, audio, video—based on patterns learned from training data. Includes LLMs, image generators, and multimodal models.

GitHub: A leading platform for version control and software collaboration. Now integrated with AI tools like GitHub Copilot for enhanced code development workflows.

Hallucination: When an AI model generates false, nonsensical, or unfaithful information presented as fact. A key challenge in LLM reliability, especially for factual or specialized domains.

HuggingFace: A hub and community for open-source AI models, datasets, and applications. Widely used in the natural language processing (NLP) community for model sharing and development.

Inference: The process of using a trained AI model to make predictions or generate outputs. In LLMs, this refers to generating text responses from prompts.

Large Language Models (LLMs): A subset of foundation models trained on extensive text corpora, enabling them to generate human-like text, summarize information, reason about topics, and perform a variety of NLP tasks. Examples include GPT, Claude, and Gemini.

LoRA (Low-Rank Adaptation): An efficient fine-tuning technique that modifies only a small subset of model parameters, reducing computational costs while maintaining performance for specialized tasks.
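
Conceptually, LoRA freezes the pretrained weight matrix W and learns a low-rank update B·A; the toy NumPy sketch below (illustrative shapes, with the usual α/r scaling omitted) shows why so few parameters are trainable:

```python
import numpy as np

d, k, r = 4096, 4096, 8              # original weight shape and LoRA rank (illustrative)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, k))          # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B starts at zero, so the update is initially a no-op

W_adapted = W + B @ A                # effective weight used at inference time

full_params = d * k
lora_params = d * r + r * k
print(f"trainable params: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")
```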

MCP (Model Context Protocol): A standardized protocol for connecting AI assistants to external data sources and tools. Enables LLMs to access databases, APIs, and live information while maintaining security and privacy.

Mixture of Experts (MoE): A neural network architecture that uses multiple specialized sub-models (experts) and activates only relevant ones for each input, improving efficiency and scalability in large models.

Multimodal Models: AI systems that can process and generate multiple types of data (text, images, audio, video) in combination. Examples include GPT-4 with vision, Gemini, and Claude with image understanding.

Parameters: The trainable values within a neural network, updated during the training process to minimize loss and define the model's learned behavior. Model size is often described by parameter count (e.g., 7B, 70B parameters).

Prompt Engineering: The practice of crafting, refining, and optimizing instructions (prompts) given to AI models in order to guide their outputs toward desired results.

Quantization: A technique that reduces the precision of model weights (e.g., from 16-bit to 4-bit) to decrease memory usage and computational requirements, enabling deployment on resource-constrained devices.
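
A minimal illustration of the idea, symmetric 8-bit quantization of a weight tensor in NumPy (real schemes add per-channel scales, calibration data, and 4-bit formats):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=1000).astype(np.float32)  # stand-in fp32 weights

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q.astype(np.float32) * scale
print("max abs error:", np.abs(weights - dequantized).max())
print("memory: fp32 =", weights.nbytes, "bytes, int8 =", q.nbytes, "bytes")
```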

RAG (Retrieval-Augmented Generation): A technique that enhances LLM responses by retrieving relevant information from external knowledge bases or documents before generating answers, reducing hallucinations and improving factual accuracy.
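
Stripped to its essentials, RAG embeds the query, retrieves the most similar documents, and prepends them to the prompt; the brute-force sketch below uses made-up embeddings where a real system would call an embedding model and a vector database:

```python
import numpy as np

# Toy document store: (text, embedding) pairs.
docs = [
    ("Our refund window is 30 days from purchase.", np.array([0.9, 0.1, 0.0])),
    ("The office is closed on public holidays.",    np.array([0.1, 0.8, 0.2])),
    ("Support is available 24/7 via chat.",         np.array([0.2, 0.1, 0.9])),
]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_embedding, k=1):
    """Brute-force nearest-neighbor search over the toy store."""
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend this vector is the embedding of "What is your refund policy?"
query_embedding = np.array([0.85, 0.05, 0.1])
context = "\n".join(retrieve(query_embedding, k=1))

prompt = (
    f"Answer using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: What is your refund policy?"
)
print(prompt)  # this augmented prompt is what gets sent to the LLM
```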

RLHF (Reinforcement Learning from Human Feedback): A training method that uses human preferences to fine-tune AI models, improving their alignment with human values and desired behaviors. Used extensively in ChatGPT and Claude development.

Stable Diffusion: A family of open-source latent-diffusion-based models used for generating high-quality images from text or other forms of input (e.g., sketches).

System Prompt: Initial instructions given to an AI model that define its role, behavior, constraints, and capabilities for a conversation or task. Often invisible to end users but shapes all responses.

Temperature: A parameter controlling randomness in AI-generated outputs. Lower temperatures (0.0-0.3) produce more deterministic responses; higher temperatures (0.7-1.0) increase creativity and variability.
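
Under the hood, temperature divides the model's logits before the softmax; the toy NumPy example below shows how a low temperature concentrates probability on the top token while a high temperature flattens the distribution:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Convert logits to a probability distribution, scaled by temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [4.0, 2.0, 1.0, 0.5]         # made-up next-token scores
for t in (0.2, 0.7, 1.5):
    print(t, np.round(softmax_with_temperature(logits, t), 3))
# Low temperature -> nearly all probability on the top token (more deterministic);
# high temperature -> flatter distribution (more varied sampling).
```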

Token: A fundamental unit of text—often a word, subword, or character—that LLMs process when understanding or generating language. Pricing and context limits are typically measured in tokens.
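
For a concrete look at tokenization, OpenAI's open-source tiktoken library can split text with one of its shipped encodings (cl100k_base below); other model families use their own tokenizers:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by several OpenAI models
text = "Generative AI models process text as tokens."
tokens = enc.encode(text)

print(len(tokens), "tokens")                 # the count that pricing and context limits use
print([enc.decode([t]) for t in tokens])     # the individual token strings
```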

Transformer: The neural network architecture that powers modern LLMs, introduced in the paper "Attention is All You Need" (2017). Uses attention mechanisms to process sequences efficiently.

Vector Database: A specialized database optimized for storing and querying high-dimensional embedding vectors, enabling fast semantic search and similarity matching for RAG applications.

Weights: Numerical parameters within a neural network that determine the strength of connections between neurons or nodes.

Zero-shot Learning: The capability of an AI model to perform tasks it has never been explicitly trained on, often made possible by large-scale pretraining on diverse datasets.