AI & Machine Learning¶
Open-source AI and machine learning tools, frameworks, and platforms for research and scientific computing, plus a few commercial services where noted.
Open Source Large Language Models (LLMs)¶
General Purpose LLMs¶
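Meta LLaMA - Meta's open-weight foundation models including Llama 3 and Llama 4 (Scout/Maverick variants), with context windows of 128k tokens or more in recent releases; distributed under Meta's Llama community licenses, not Apache 2.0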
Mistral AI - French AI company providing open-weight models including Mistral Small 3 (24B parameters), Mistral Large 2 (123B parameters), and Mixtral MoE
Mixtral 8x7B - sparse Mixture-of-Experts model with 8 expert networks per layer, 2 of which are active for each token, Apache 2.0 license
EleutherAI GPT-NeoX-20B - 20 billion parameter model trained on The Pile dataset, Apache 2.0 license
EleutherAI Pythia - family of models designed for research transparency and reproducibility
BLOOM - BigScience Large Open-science Open-access Multilingual language model with 176B parameters
Falcon - Technology Innovation Institute's open-weight LLM family including Falcon-180B (the smaller Falcon-7B and Falcon-40B are Apache 2.0; Falcon-180B uses a custom TII license)
DeepSeek - DeepSeek-Coder and DeepSeek-Math models with strong reasoning for engineering and research
Qwen - Alibaba Cloud's multilingual LLMs with strong performance across languages and coding
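As a rough illustration of the sparse routing idea behind Mixture-of-Experts models such as Mixtral, the sketch below sends one input to the two highest-scoring experts and mixes their outputs with softmax gate weights. It is a toy under stated assumptions, not any model's actual implementation: the scalar "experts", the hand-written `router_logits`, and the function names `softmax` and `moe_forward` are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, top_k=2):
    """Route x to the top_k experts and return the gated mix of their outputs."""
    # Rank expert indices by router score, highest first.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    # Gate weights are renormalized over the chosen experts only.
    gates = softmax([router_logits[i] for i in chosen])
    # Only the chosen experts are evaluated; the others cost nothing.
    output = sum(g * experts[i](x) for g, i in zip(gates, chosen))
    return output, chosen


# Hypothetical toy experts: expert i multiplies its input by (i + 1).
# The default argument k binds the multiplier at creation time.
experts = [lambda x, k=i + 1: k * x for i in range(8)]

# Hypothetical router scores for a single token (in practice these come
# from a learned linear layer applied to the token's hidden state).
router_logits = [0.1, 2.0, -1.0, 0.3, 1.0, 0.0, -0.5, 0.2]

y, chosen = moe_forward(3.0, router_logits, experts)
print(chosen, y)
```

Because only 2 of the 8 experts run per token, the compute per token is a fraction of what the total parameter count would suggest.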
Scientific & Research-Focused LLMs¶
BioGPT - Microsoft's pre-trained language model for biomedical text generation and mining
Galactica - Meta AI's scientific knowledge model trained on 48 million papers, textbooks, and knowledge bases
PubMedGPT (since renamed BioMedLM) - Stanford CRFM and MosaicML's 2.7B-parameter biomedical language model trained on PubMed abstracts and full-text articles
LLM Inference & Deployment¶
Inference Engines¶
vLLM - high-throughput, memory-efficient inference engine with PagedAttention and continuous batching; quoted throughput figures such as 120-160 req/sec depend heavily on model size, hardware, and workload
Text Generation Inference (TGI) - Hugging Face's production-ready inference container (maintenance mode as of Dec 2025, consider vLLM or SGLang)
Ollama - easy-to-use local LLM deployment with simple CLI, ideal for development and prototyping
llama.cpp - C++ implementation enabling LLM inference on CPU and edge devices with quantization support
LM Studio - desktop application for running LLMs locally with user-friendly GUI
SGLang - high-performance serving with strong caching and scheduler optimizations
TensorRT-LLM - NVIDIA's optimized inference library for maximum performance on NVIDIA GPUs
Model Serving Platforms¶
Ray Serve - scalable model serving built on Ray for distributed Python applications
TorchServe - PyTorch's official model serving framework for production ML models
NVIDIA Triton Inference Server - high-performance inference serving for multiple frameworks (TensorFlow, PyTorch, ONNX)
Agentic AI Frameworks¶
LangChain - comprehensive ecosystem for building LLM-powered applications with extensive integrations, chains, agents, and memory
LlamaIndex - data framework for LLM applications with sophisticated RAG capabilities and knowledge base integration
AutoGPT - pioneering autonomous AI agents that independently pursue goals through iterative planning (167k+ GitHub stars)
CrewAI - framework for orchestrating role-based AI agents working as collaborative teams
Microsoft AutoGen - framework enabling next-gen LLM applications with multi-agent conversation
MetaGPT - multi-agent framework that simulates a software company with roles such as Product Manager, Architect, and Engineer
ChatDev - collaborative AI agents creating software through multi-agent conversation
BabyAGI - simple autonomous task-driven AI agent using OpenAI and vector databases
AgentGPT - browser-based autonomous AI agents for achieving user-defined goals
RAG (Retrieval Augmented Generation)¶
RAG Frameworks¶
LangChain RAG - comprehensive RAG implementation with document loaders, text splitters, and retrievers
LlamaIndex (formerly GPT Index) - leading data framework for RAG with advanced indexing, chunking, and retrieval
Haystack - open source NLP framework by deepset for building RAG pipelines and semantic search
txtai - all-in-one embeddings database for semantic search, RAG, and LLM orchestration
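The frameworks above wrap three recurring steps: splitting documents into chunks, retrieving the chunks most relevant to a question, and assembling them into a prompt. The following is a minimal standard-library sketch of those steps, not the API of any listed framework. The function names, the word-overlap scoring, and the prompt wording are all assumptions chosen for illustration; real pipelines typically score chunks with embeddings.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def retrieve(question, chunks, k=2):
    """Return the k chunks sharing the most words with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved_chunks):
    """Assemble retrieved context and the question into one prompt string."""
    context = "\n---\n".join(retrieved_chunks)
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")


pieces = split_text("abcdefghij", chunk_size=4, overlap=1)
best = retrieve("cat dog",
                ["the cat sat", "dogs bark loudly", "a cat and a dog"], k=1)
prompt = build_prompt("cat dog", best)
print(pieces, best)
```

The overlap between chunks is a common design choice: it reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary.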
Vector Databases¶
Chroma - open-source embedding database for AI applications, ideal for local development and prototyping
Weaviate - open-source vector database with hybrid search (vector + keyword), multi-modal support
Qdrant - high-performance vector similarity search engine written in Rust
Milvus - cloud-native vector database built for scalable similarity search
pgvector - PostgreSQL extension for vector similarity search, integrates with existing PostgreSQL databases
FAISS - Facebook AI Similarity Search library for efficient similarity search of dense vectors
Pinecone - managed vector database service (commercial with free tier)
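At their core, the tools in this list answer one query: given a vector, which stored vectors are most similar? A brute-force version of that query is sketched below in plain Python, assuming cosine similarity as the metric. The `index` dictionary and the function names are hypothetical. Production systems replace the linear scan with approximate nearest-neighbor indexes to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention here: a zero vector is similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Scan every stored vector and return the k most similar (id, score) pairs."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Hypothetical three-dimensional "embeddings" keyed by document id.
index = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
print(results)
```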
MLOps Platforms¶
Experiment Tracking & Model Management¶
MLflow - open-source platform for ML lifecycle including experiment tracking, model registry, and deployment
Weights & Biases (W&B) - AI developer platform for experiment tracking, visualization, and collaboration (free tier available)
Neptune.ai - metadata store for MLOps with experiment tracking and model registry
DVC (Data Version Control) - Git-like version control for machine learning projects including data and models
ClearML - open-source MLOps platform for experiment management and orchestration
Comet ML - platform for tracking, comparing, and optimizing ML experiments
Pipeline Orchestration¶
Kubeflow - Kubernetes-native ML platform for deploying, monitoring, and managing ML workflows at scale
Apache Airflow - platform for programmatically authoring, scheduling, and monitoring workflows
Prefect - workflow orchestration tool for building, observing, and reacting to data pipelines
ZenML - extensible open-source MLOps framework for production-ready ML pipelines
Metaflow - Netflix's framework for building and managing real-life data science projects
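The orchestrators above share one central idea: a pipeline is a directed acyclic graph of tasks, and each task runs only after its upstream tasks have finished. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9+). It does not use any listed tool's API; the task names, the `run_pipeline` helper, and the toy task bodies are hypothetical.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order, passing each its upstream results."""
    # dependencies maps a task name to the set of tasks it depends on.
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        upstream = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = tasks[name](upstream)
    return order, results


# Hypothetical four-step ML pipeline; each task receives a dict of
# upstream outputs keyed by task name.
tasks = {
    "load": lambda up: [1, 2, 3, 4],
    "preprocess": lambda up: [x * 2 for x in up["load"]],
    "train": lambda up: sum(up["preprocess"]) / len(up["preprocess"]),
    "evaluate": lambda up: up["train"] > 4,
}
dependencies = {
    "preprocess": {"load"},
    "train": {"preprocess"},
    "evaluate": {"train"},
}

order, results = run_pipeline(tasks, dependencies)
print(order, results["evaluate"])
```

Real orchestrators add scheduling, retries, caching, and distributed execution on top of this ordering step.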
Scientific AI Applications¶
Computational Biology & Drug Discovery¶
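AlphaFold 3 - Google DeepMind and Isomorphic Labs' model for predicting the structures of biomolecular complexes (proteins, nucleic acids, ligands); the earlier AlphaFold2 work earned Demis Hassabis and John Jumper a share of the 2024 Nobel Prize in Chemistry, and the AlphaFold Protein Structure Database holds 200M+ predicted structures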
ESMFold - Meta AI's protein structure prediction using language models, used to predict 600M+ metagenomic protein structures for the ESM Metagenomic Atlas
RoseTTAFold - University of Washington's protein structure prediction network
OpenFold - open-source reproduction of AlphaFold2 and foundation for community development
ChemBERTa - transformer models for molecular property prediction
DeepChem - democratizing deep learning for drug discovery, materials science, and quantum chemistry
Climate & Earth Science¶
ClimateLearn - benchmark dataset and library for ML in climate science
Microsoft AI for Earth - AI tools and grants for environmental research and conservation
FourCastNet - NVIDIA's global data-driven weather forecasting using neural networks
AI Ethics & Responsible AI¶
Fairness & Bias Detection¶
AI Fairness 360 (AIF360) - IBM's comprehensive toolkit with 70+ fairness metrics and 10+ bias mitigation algorithms
Fairlearn - Microsoft's open-source toolkit for assessing and improving fairness of AI systems
Aequitas - bias and fairness audit toolkit by the Center for Data Science and Public Policy (University of Chicago)
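To make "fairness metric" concrete, the sketch below computes one common group metric, the demographic parity difference: the gap between the highest and lowest rate of positive predictions across groups. It is a hand-rolled illustration and does not call any of the toolkits above. The predictions, group labels, and function names are made-up assumptions.

```python
def selection_rate(predictions):
    """Fraction of predictions that are positive (1)."""
    return sum(predictions) / len(predictions)


def demographic_parity_difference(y_pred, groups):
    """Return (gap, per-group rates) for binary predictions and group labels."""
    by_group = {}
    for pred, group in zip(y_pred, groups):
        by_group.setdefault(group, []).append(pred)
    rates = {g: selection_rate(p) for g, p in by_group.items()}
    # A gap of 0 means every group receives positive predictions at the same rate.
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical binary predictions for eight people in two groups.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(y_pred, groups)
print(gap, rates)
```

Demographic parity is only one of many fairness definitions, and different definitions can conflict with one another, which is part of why the toolkits ship dozens of metrics.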
Explainability & Interpretability¶
SHAP (SHapley Additive exPlanations) - game-theoretic approach to explain ML model predictions
LIME (Local Interpretable Model-agnostic Explanations) - explaining predictions of any machine learning classifier
InterpretML - Microsoft's toolkit for training interpretable models and explaining blackbox systems
Captum - PyTorch library for model interpretability and understanding
What-If Tool - Google's visual interface for probing ML model behavior
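The game-theoretic idea behind SHAP can be shown exactly on a tiny model: a feature's Shapley value is its marginal contribution to the prediction, averaged over every order in which features could be "switched on". The sketch below enumerates all orderings, which is only feasible for a handful of features. The SHAP library itself relies on faster approximations and model-specific algorithms, so this is not how it computes values. The toy model `f`, the instance, the all-zero baseline, and the function names are assumptions.

```python
from itertools import permutations


def shapley_values(value_fn, n_features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = [0.0] * n_features
    orderings = list(permutations(range(n_features)))
    for order in orderings:
        coalition = frozenset()
        prev = value_fn(coalition)
        for feature in order:
            coalition = coalition | {feature}
            current = value_fn(coalition)
            totals[feature] += current - prev
            prev = current
    return [t / len(orderings) for t in totals]


def f(x):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2 * x[0] + 3 * x[1] + x[0] * x[2]


instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]


def value_fn(coalition):
    """Model output when only features in the coalition take the instance's values."""
    x = [instance[i] if i in coalition else baseline[i]
         for i in range(len(instance))]
    return f(x)


phi = shapley_values(value_fn, len(instance))
print(phi)
```

The values sum to `f(instance) - f(baseline)`, and the interaction term's contribution is split evenly between the two features involved.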
Responsible AI Frameworks¶
IBM AI Fairness 360 Toolkit - comprehensive fairness metrics and bias mitigation algorithms
Google Responsible AI Practices - principles and practices for responsible AI development
Model Cards Toolkit - standardized documentation for ML models following model cards framework
Data Annotation & Labeling¶
Label Studio - open-source data labeling tool for text, images, audio, video, and time series
CVAT (Computer Vision Annotation Tool) - free online interactive video and image annotation tool
Labelbox - training data platform for building AI applications (commercial with free tier)
VGG Image Annotator (VIA) - lightweight standalone image/video/audio annotation tool from Oxford
Prodigy - scriptable annotation tool for creating training and evaluation data
Doccano - open-source text annotation tool for classification, sequence labeling, and sequence-to-sequence tasks
Model Hubs & Repositories¶
Hugging Face Hub - largest repository with 1M+ models across all modalities (NLP, vision, audio, multimodal)
PyTorch Hub - pre-trained model repository for research reproducibility, integrated with Papers with Code
TensorFlow Hub - library for publishing, discovering, and reusing ML modules in TensorFlow
ONNX Model Zoo - collection of pre-trained ONNX models for various tasks
Papers with Code - free resource linking academic papers with code implementations and leaderboards
Model Zoo - discover open-source deep learning models and projects
Foundation Model Training¶
Training Frameworks¶
DeepSpeed - Microsoft's deep learning optimization library for training massive models with ZeRO optimizer
Megatron-LM - NVIDIA's framework for training multi-billion parameter language models
Colossal-AI - unified deep learning system for large-scale model training with parallelism
Alpa - system for training and serving large-scale neural networks
Distributed Training¶
Horovod - distributed deep learning training framework for TensorFlow, Keras, PyTorch, and MXNet
Ray Train - scalable machine learning library for distributed training
PyTorch Distributed - PyTorch's native distributed training, offering data-parallel and sharded strategies (DDP, FSDP) over communication backends such as NCCL, Gloo, and MPI
TensorFlow Distributed - TensorFlow's APIs for distributing training across multiple devices
GPU Computing & Cloud Resources¶
GPU Computing Libraries¶
CUDA Toolkit - NVIDIA's parallel computing platform and programming model
cuDNN - GPU-accelerated library for deep neural networks
TensorRT - NVIDIA's SDK for high-performance deep learning inference
ROCm - AMD's open-source platform for GPU computing
OpenCL - open standard for parallel programming of heterogeneous systems
Free/Academic GPU Resources¶
Google Colab - free Jupyter notebooks with GPU/TPU access
Kaggle Notebooks (formerly Kaggle Kernels) - free notebooks with GPU acceleration for data science work and competitions
Lightning AI - cloud platform for building AI products (free tier available)
Paperspace Gradient - ML development platform with free GPU instances
ML Frameworks & Libraries¶
Deep Learning Frameworks¶
PyTorch - open-source machine learning library originally developed by Meta AI, now governed by the PyTorch Foundation under the Linux Foundation
TensorFlow - end-to-end open-source platform for machine learning by Google
JAX - Google's library for composable transformations (automatic differentiation, JIT compilation, vectorization) of Python+NumPy programs
Keras - high-level neural networks API; Keras 3 runs on top of TensorFlow, JAX, or PyTorch
MXNet - Apache's flexible and efficient deep learning library (retired to the Apache Attic in 2023 and no longer actively developed)
Classic ML Libraries¶
scikit-learn - comprehensive ML library for Python with classification, regression, clustering
XGBoost - optimized gradient boosting library for supervised learning
LightGBM - Microsoft's fast, distributed, high-performance gradient boosting framework
CatBoost - gradient boosting library with categorical features support
LLM APIs & Prompt Engineering¶
OpenAI API - access to GPT-4, GPT-3.5, DALL-E, and other OpenAI models (commercial)
Anthropic Claude API - API access to Claude models including Claude Opus, Sonnet, and Haiku (commercial)
Cohere API - NLP platform with embeddings, generation, and classification APIs (commercial)
Together AI - cloud platform for running, fine-tuning, and serving open generative AI models (commercial)
OpenRouter - unified API for multiple LLM providers with single integration
PromptLayer - platform for prompt engineering and LLM observability
Additional Resources¶
Awesome LLM - curated list of Large Language Model resources
Awesome MLOps - curated list of MLOps tools and practices
Awesome Production Machine Learning - curated list of production-level ML tools
State of AI Report - annual comprehensive report on AI progress and trends