AI & Machine Learning¶

Open source AI and machine learning tools, frameworks, and platforms for research and scientific computing.

Open Source Large Language Models (LLMs)¶

General Purpose LLMs¶

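Meta LLaMA - Meta's open-weight foundation models including Llama 3 (128k context from Llama 3.1 onward) and Llama 4 (Scout/Maverick variants), released under Meta's custom Llama community licenses (not an OSI-approved open source license)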

Mistral AI - French AI company providing open-weight models including Mistral Small 3 (24B parameters), Mistral Large 2 (123B parameters), and Mixtral MoE

Mixtral 8x7B - sparse Mixture-of-Experts model with 8 expert networks per layer, of which 2 are routed per token, Apache 2.0 license

EleutherAI GPT-NeoX-20B - 20 billion parameter model trained on The Pile dataset, Apache 2.0 license

EleutherAI Pythia - family of models designed for research transparency and reproducibility

BLOOM - BigScience Large Open-science Open-access Multilingual language model with 176B parameters

Falcon - Technology Innovation Institute's open-source LLM family including Falcon-180B

DeepSeek - DeepSeek-Coder and DeepSeek-Math models with strong reasoning for engineering and research

Qwen - Alibaba Cloud's multilingual LLMs with strong performance across languages and coding
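
The Mixture-of-Experts entries above rely on sparse routing: a small router scores every expert, and only the top few experts run for a given token. The following is a minimal sketch of top-2 routing in plain Python. The scalar "experts", the router scores, and all function names are hypothetical and chosen for illustration; this is not the Mixtral implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))

# Hypothetical toy setup: 8 scalar "experts" and made-up router scores.
experts = [lambda x, scale=s: scale * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.3, 0.2]

routes = top_k_route(router_logits)
output = moe_forward(3.0, experts, router_logits)
```

Only two of the eight experts are evaluated per input, which is why a sparse model can hold many parameters while keeping per-token compute closer to that of a much smaller dense model.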

Scientific & Research-Focused LLMs¶

BioGPT - Microsoft's pre-trained language model for biomedical text generation and mining

Galactica - Meta AI's scientific knowledge model trained on 48 million papers, textbooks, and knowledge bases

PubMedGPT (since renamed BioMedLM) - Stanford CRFM's 2.7B-parameter biomedical language model trained on PubMed abstracts and full-text articles

LLM Inference & Deployment¶

Inference Engines¶

vLLM - high-throughput, memory-efficient inference engine with PagedAttention and continuous batching (reported throughput of 120-160 req/sec; actual figures depend on model size, hardware, and workload)

Text Generation Inference (TGI) - Hugging Face's production-ready inference container (maintenance mode as of Dec 2025, consider vLLM or SGLang)

Ollama - easy-to-use local LLM deployment with simple CLI, ideal for development and prototyping

llama.cpp - C++ implementation enabling LLM inference on CPU and edge devices with quantization support

LM Studio - desktop application for running LLMs locally with user-friendly GUI

SGLang - high-performance serving with strong caching and scheduler optimizations

TensorRT-LLM - NVIDIA's optimized inference library for maximum performance on NVIDIA GPUs
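
Several engines above (llama.cpp in particular) depend on weight quantization to fit models on CPUs and edge devices. The sketch below shows the general idea with symmetric per-block 8-bit quantization: each block of weights stores one float scale plus small integers. It is an illustration under simplifying assumptions, not the actual on-disk format of any listed tool, and the function names are hypothetical.

```python
def quantize_block(values, bits=8):
    """Map a block of floats to signed integers sharing one scale."""
    qmax = 2 ** (bits - 1) - 1            # 127 for 8 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return scale, quantized

def dequantize_block(scale, quantized):
    """Recover approximate floats from the stored scale and integers."""
    return [scale * q for q in quantized]

# Made-up weights for illustration.
weights = [0.12, -0.50, 0.33, 0.01, -0.27, 0.49, 0.00, -0.08]
scale, q = quantize_block(weights)
restored = dequantize_block(scale, q)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error per weight is bounded by half the scale, so smaller blocks (and therefore tighter scales) trade extra storage for better accuracy.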

Model Serving Platforms¶

Ray Serve - scalable model serving built on Ray for distributed Python applications

TorchServe - PyTorch's official model serving framework for production ML models

NVIDIA Triton Inference Server - high-performance inference serving for multiple frameworks (TensorFlow, PyTorch, ONNX)

Agentic AI Frameworks¶

LangChain - comprehensive ecosystem for building LLM-powered applications with extensive integrations, chains, agents, and memory

LlamaIndex - data framework for LLM applications with sophisticated RAG capabilities and knowledge base integration

AutoGPT - pioneering autonomous AI agents that independently pursue goals through iterative planning (167k+ GitHub stars)

CrewAI - framework for orchestrating role-based AI agents working as collaborative teams

Microsoft AutoGen - framework enabling next-gen LLM applications with multi-agent conversation

MetaGPT - multi-agent framework that simulates a software company with roles such as Product Manager, Architect, and Engineer

ChatDev - collaborative AI agents creating software through multi-agent conversation

BabyAGI - simple autonomous task-driven AI agent using OpenAI and vector databases

AgentGPT - browser-based autonomous AI agents for achieving user-defined goals

RAG (Retrieval Augmented Generation)¶

RAG Frameworks¶

LangChain RAG - comprehensive RAG implementation with document loaders, text splitters, and retrievers

LlamaIndex (formerly GPT Index) - leading data framework for RAG with advanced indexing, chunking, and retrieval

Haystack - open source NLP framework by deepset for building RAG pipelines and semantic search

txtai - all-in-one embeddings database for semantic search, RAG, and LLM orchestration
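
The frameworks above all split documents into chunks before embedding them. A minimal sketch of fixed-size chunking with overlap follows, assuming character-based splitting. Real text splitters usually respect sentence or token boundaries; `split_text` and its parameters are hypothetical names, not an API of any listed framework.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

chunks = split_text("abcdefghij", chunk_size=4, overlap=1)
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some text twice.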

Vector Databases¶

Chroma - open-source embedding database for AI applications, ideal for local development and prototyping

Weaviate - open-source vector database with hybrid search (vector + keyword), multi-modal support

Qdrant - high-performance vector similarity search engine written in Rust

Milvus - cloud-native vector database built for scalable similarity search

pgvector - PostgreSQL extension for vector similarity search, integrates with existing PostgreSQL databases

FAISS - Facebook AI Similarity Search library for efficient similarity search of dense vectors

Pinecone - managed vector database service (commercial with free tier)
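
At their core, the vector databases above answer one query: which stored vectors are most similar to this one? The sketch below does that by brute force with cosine similarity. Production systems use approximate indexes (for example HNSW or IVF) to avoid scanning every vector; the three-dimensional "embeddings" here are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, index, k=2):
    """Return the k (key, score) pairs most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in index.items()]
    scored.sort(reverse=True)
    return [(key, score) for score, key in scored[:k]]

# Hypothetical toy index of hand-written vectors.
index = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 1.0, 0.0],
}
results = search([1.0, 0.0, 0.0], index, k=2)
```

A linear scan like this is often adequate for a few thousand vectors, which is one reason lightweight options suit local prototyping.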

MLOps Platforms¶

Experiment Tracking & Model Management¶

MLflow - open-source platform for ML lifecycle including experiment tracking, model registry, and deployment

Weights & Biases (W&B) - AI developer platform for experiment tracking, visualization, and collaboration (free tier available)

Neptune.ai - metadata store for MLOps with experiment tracking and model registry

DVC (Data Version Control) - Git-like version control for machine learning projects including data and models

ClearML - open-source MLOps platform for experiment management and orchestration

Comet ML - platform for tracking, comparing, and optimizing ML experiments

Pipeline Orchestration¶

Kubeflow - Kubernetes-native ML platform for deploying, monitoring, and managing ML workflows at scale

Apache Airflow - platform for programmatically authoring, scheduling, and monitoring workflows

Prefect - workflow orchestration tool for building, observing, and reacting to data pipelines

ZenML - extensible open-source MLOps framework for production-ready ML pipelines

Metaflow - Netflix's framework for building and managing real-life data science projects
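
The orchestrators above model a pipeline as a directed acyclic graph (DAG) of tasks and run each task only after its dependencies finish. A minimal sketch of that ordering step follows, using Python's standard-library `graphlib`. The task names are hypothetical, and real orchestrators add scheduling, retries, and state tracking on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "validate": {"extract"},
    "train": {"validate"},
    "evaluate": {"train"},
    "report": {"evaluate", "validate"},
}

def execution_order(dag):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(pipeline)
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`, which mirrors how orchestrators reject cyclic workflow definitions.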

Scientific AI Applications¶

Computational Biology & Drug Discovery¶

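AlphaFold 3 - Google DeepMind's latest AlphaFold system, extending structure prediction from proteins to complexes with DNA, RNA, and ligands; the earlier AlphaFold 2 work shared the 2024 Nobel Prize in Chemistry and underlies the AlphaFold Protein Structure Database (200M+ predictions)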

ESMFold - Meta AI's protein structure prediction using language models, used to build the ESM Metagenomic Atlas of 600M+ predicted metagenomic protein structures

RoseTTAFold - University of Washington's protein structure prediction network

OpenFold - open-source reproduction of AlphaFold2 and foundation for community development

ChemBERTa - transformer models for molecular property prediction

DeepChem - democratizing deep learning for drug discovery, materials science, and quantum chemistry

Climate & Earth Science¶

ClimateLearn - benchmark dataset and library for ML in climate science

Microsoft AI for Earth - AI tools and grants for environmental research and conservation

FourCastNet - NVIDIA's global data-driven weather forecasting using neural networks

AI Ethics & Responsible AI¶

Fairness & Bias Detection¶

AI Fairness 360 (AIF360) - IBM's comprehensive toolkit with 70+ fairness metrics and 10+ bias mitigation algorithms

Fairlearn - Microsoft's open-source toolkit for assessing and improving fairness of AI systems

Aequitas - bias and fairness audit toolkit from the Center for Data Science and Public Policy at the University of Chicago
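
The toolkits above compute group fairness metrics from model predictions. Two of the most common, demographic parity difference and disparate impact ratio, can be sketched in plain Python as below. The function names and the toy data are hypothetical, and the toolkits' own APIs and edge-case handling differ.

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Fraction of positive predictions within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(y_pred, groups):
    """Gap between the highest and lowest group selection rates (0 is parity)."""
    rates = selection_rates(y_pred, groups).values()
    return max(rates) - min(rates)

def disparate_impact_ratio(y_pred, groups):
    """Lowest selection rate divided by the highest (1 is parity)."""
    rates = selection_rates(y_pred, groups).values()
    highest = max(rates)
    # If no group receives positives, all rates are equal, so report parity.
    return min(rates) / highest if highest > 0 else 1.0

# Made-up binary predictions for two groups.
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dp_diff = demographic_parity_difference(y_pred, groups)
di_ratio = disparate_impact_ratio(y_pred, groups)
```

Here group "a" is selected 75% of the time and group "b" 25% of the time, giving a parity difference of 0.5 and an impact ratio of one third.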

Explainability & Interpretability¶

SHAP (SHapley Additive exPlanations) - game-theoretic approach to explain ML model predictions

LIME (Local Interpretable Model-agnostic Explanations) - explaining predictions of any machine learning classifier

InterpretML - Microsoft's toolkit for training interpretable models and explaining blackbox systems

Captum - PyTorch library for model interpretability and understanding

What-If Tool - Google's visual interface for probing ML model behavior
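
SHAP, listed above, builds on Shapley values from cooperative game theory: a feature's attribution is its average marginal contribution over all orders in which features can be added. The sketch below computes exact Shapley values for a tiny hypothetical two-feature "game". SHAP itself uses approximations and model-specific algorithms, because exact enumeration grows factorially with the number of features.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for ordering in orderings:
        coalition = frozenset()
        for player in ordering:
            extended = coalition | {player}
            totals[player] += value(extended) - value(coalition)
            coalition = extended
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical payoff table: model output when only these features are "present".
payoffs = {
    frozenset(): 0.0,
    frozenset({"age"}): 10.0,
    frozenset({"income"}): 20.0,
    frozenset({"age", "income"}): 40.0,
}
phi = shapley_values(["age", "income"], lambda coalition: payoffs[coalition])
```

The attributions come out as 15 for "age" and 25 for "income"; they sum to 40, the difference between the full payoff and the empty one, which is the efficiency property that makes Shapley values attractive for explanations.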

Responsible AI Frameworks¶

IBM AI Fairness 360 Toolkit - comprehensive fairness metrics and bias mitigation algorithms

Google Responsible AI Practices - principles and practices for responsible AI development

Model Cards Toolkit - tooling for standardized ML model documentation following the Model Cards framework

Data Annotation & Labeling¶

Label Studio - open-source data labeling tool for text, images, audio, video, and time series

CVAT (Computer Vision Annotation Tool) - free online interactive video and image annotation tool

Labelbox - training data platform for building AI applications (commercial with free tier)

VGG Image Annotator (VIA) - lightweight standalone image/video/audio annotation tool from Oxford

Prodigy - scriptable annotation tool for creating training and evaluation data

Doccano - open-source text annotation tool for classification, sequence labeling, and sequence-to-sequence tasks

Model Hubs & Repositories¶

Hugging Face Hub - largest repository with 1M+ models across all modalities (NLP, vision, audio, multimodal)

PyTorch Hub - pre-trained model repository for research reproducibility, integrated with Papers with Code

TensorFlow Hub - library for publishing, discovering, and reusing ML modules in TensorFlow

ONNX Model Zoo - collection of pre-trained ONNX models for various tasks

Papers with Code - free resource linking academic papers with code implementations and leaderboards

Model Zoo - discover open-source deep learning models and projects

Foundation Model Training¶

Training Frameworks¶

DeepSpeed - Microsoft's deep learning optimization library for training massive models with ZeRO optimizer

Megatron-LM - NVIDIA's framework for training multi-billion parameter language models

Colossal-AI - unified deep learning system for large-scale model training with parallelism

Alpa - system for training and serving large-scale neural networks

Distributed Training¶

Horovod - distributed deep learning training framework for TensorFlow, Keras, PyTorch, and MXNet

Ray Train - scalable machine learning library for distributed training

PyTorch Distributed - PyTorch's native distributed training with data-parallel and sharded wrappers (DDP, FSDP) over communication backends such as NCCL, Gloo, and MPI

TensorFlow Distributed - TensorFlow's APIs for distributing training across multiple devices
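
The data-parallel tools above share one core step: each worker computes gradients on its own shard of the batch, then an all-reduce averages them so every worker applies the same update. The sketch below simulates that in a single process for a one-parameter linear model. The helper names and data are hypothetical, and real frameworks perform the reduction across devices over a communication backend.

```python
def gradient(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def all_reduce_mean(values):
    """Stand-in for an all-reduce that averages one value per worker."""
    return sum(values) / len(values)

# Made-up data split into two equal shards, one per simulated worker.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
shards = [(xs[0:2], ys[0:2]), (xs[2:4], ys[2:4])]

w = 0.0
local_grads = [gradient(w, shard_x, shard_y) for shard_x, shard_y in shards]
synced_grad = all_reduce_mean(local_grads)
full_batch_grad = gradient(w, xs, ys)
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, which is why data parallelism can reproduce single-device training while splitting the work.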

GPU Computing & Cloud Resources¶

GPU Computing Libraries¶

CUDA Toolkit - NVIDIA's parallel computing platform and programming model

cuDNN - GPU-accelerated library for deep neural networks

TensorRT - NVIDIA's SDK for high-performance deep learning inference

ROCm - AMD's open-source platform for GPU computing

OpenCL - open standard for parallel programming of heterogeneous systems

Free/Academic GPU Resources¶

Google Colab - free Jupyter notebooks with GPU/TPU access

Kaggle Kernels (now Kaggle Notebooks) - free notebooks with GPU acceleration for data science competitions

Lightning AI - cloud platform for building AI products (free tier available)

Paperspace Gradient - ML development platform with free GPU instances

ML Frameworks & Libraries¶

Deep Learning Frameworks¶

PyTorch - open-source machine learning library originally developed by Meta AI and now governed by the PyTorch Foundation

TensorFlow - end-to-end open-source platform for machine learning by Google

JAX - Google's library for composable transformations of Python+NumPy programs (automatic differentiation, JIT compilation, vectorization)

Keras - high-level neural networks API; Keras 3 runs on top of TensorFlow, JAX, or PyTorch

MXNet - Apache's flexible and efficient deep learning library (retired to the Apache Attic in 2023)

Classic ML Libraries¶

scikit-learn - comprehensive ML library for Python with classification, regression, clustering

XGBoost - optimized gradient boosting library for supervised learning

LightGBM - Microsoft's fast, distributed, high-performance gradient boosting framework

CatBoost - Yandex's gradient boosting library with native support for categorical features

LLM APIs & Prompt Engineering¶

OpenAI API - access to GPT-4, GPT-3.5, DALL-E, and other OpenAI models (commercial)

Anthropic Claude API - API access to Claude models including Claude Opus, Sonnet, and Haiku (commercial)

Cohere API - NLP platform with embeddings, generation, and classification APIs (commercial)

Together AI - cloud platform for running, fine-tuning, and serving open-weight generative AI models (commercial)

OpenRouter - unified API for multiple LLM providers with single integration

PromptLayer - platform for prompt engineering and LLM observability

Additional Resources¶

Awesome LLM - curated list of Large Language Model resources

Awesome MLOps - curated list of MLOps tools and practices

Awesome Production Machine Learning - curated list of production-level ML tools

State of AI Report - annual comprehensive report on AI progress and trends