Hugging Face¶

What is Hugging Face?¶
Hugging Face is the central hub for the open-source AI community. Think of it as "GitHub for AI models" - a platform where researchers and developers share:
- Models: Pre-trained AI models ready to download and use
- Datasets: Training and evaluation data for machine learning
- Spaces: Interactive demos and applications
- Documentation: Model cards, papers, and usage guides
For researchers and academics, Hugging Face provides access to state-of-the-art models without needing to train them from scratch, saving significant computational resources and time.
Create a Hugging Face Account
Follow these instructions to sign up for Hugging Face:
1. Visit the Hugging Face website: https://huggingface.co
2. Click on the "Sign Up" button in the top-right corner of the page.
3. Fill in your email address, username, and password in the respective fields.
4. Check the box to agree to Hugging Face's terms and conditions, then click "Sign Up."
5. You'll receive an email to confirm your account. Click the confirmation link in the email.
6. Once your account is confirmed, sign in to access Hugging Face's features.
For more information, visit the Hugging Face documentation: https://huggingface.co/docs
Navigating the Hub¶
Finding Models¶
The Model Hub hosts over 1 million models. To find what you need:
- Browse by Task: Filter by what you want to do (text generation, image classification, translation, etc.)
- Sort by Downloads: Popular models are well-tested and documented
- Filter by License: Important for academic and commercial use
- Check the Model Card: Every model should have documentation explaining its capabilities and limitations
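You can also search the Hub programmatically. A minimal sketch using the huggingface_hub client (parameter names assume a recent release of the library):

```python
from huggingface_hub import list_models

# List the five most-downloaded text-generation models on the Hub
for model in list_models(filter="text-generation", sort="downloads", limit=5):
    print(model.id, model.downloads)
```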
Popular Model Categories for Researchers:
| Category | Example Models | Use Cases |
|---|---|---|
| Text Generation | Llama 3, Mistral, Qwen | Writing assistance, code generation, analysis |
| Embeddings | BGE, E5, GTE | Document search, similarity matching, RAG |
| Vision-Language | LLaVA, Qwen-VL | Image analysis, chart interpretation |
| Speech | Whisper, Wav2Vec2 | Transcription, audio analysis |
Finding Datasets¶
The Dataset Hub hosts datasets for training and evaluation:
- Search by Domain: Academic papers, code, images, audio, etc.
- Check Size and Format: Ensure it fits your storage and processing capabilities
- Review the License: Some datasets have restrictions on use
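The same client can search datasets; a short sketch (the search term "arxiv" is just an example):

```python
from huggingface_hub import list_datasets

# Find datasets whose name or description mentions "arxiv"
for ds in list_datasets(search="arxiv", limit=5):
    print(ds.id)
```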
Installing the Hugging Face CLI¶
The huggingface_hub library provides tools for downloading and managing models.
Installation¶
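The library and its command-line tool install with pip; the `[cli]` extra pulls in the interactive cache-management tools:

```bash
pip install -U "huggingface_hub[cli]"
```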
Authentication (Required for Some Models)¶
Some models (especially Llama and other gated models) require you to accept license terms and authenticate:
1. Create an access token:
    - Go to huggingface.co/settings/tokens
    - Click "New token" and create a token with "Read" access
    - Copy the token (you will only see it once)
2. Log in via the CLI (see the command after this list) and paste your token when prompted.
3. Accept the model license (for gated models):
    - Visit the model page (e.g., meta-llama/Llama-3.3-70B-Instruct)
    - Click "Access repository" and accept the license terms
Token Security
Treat your Hugging Face token like a password. Do not commit it to version control or share it publicly.
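One way to keep the token out of your code is the `HF_TOKEN` environment variable, which the CLI and the Hugging Face libraries read automatically:

```bash
# Set this in your shell profile rather than in scripts or notebooks
# (the value shown is a placeholder, not a real token)
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxx
```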
Downloading Models¶
Method 1: Using Ollama (Recommended for Beginners)¶
The easiest way to run Hugging Face models locally is through Ollama, which handles all the complexity:
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Run popular models directly
ollama run llama3.2
ollama run mistral
ollama run qwen2.5
```
Ollama automatically downloads pre-quantized builds of these models from its own library, so no manual conversion is needed.
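Ollama can also pull GGUF files directly from Hugging Face repositories. A sketch, assuming the repository publishes GGUF weights (bartowski/Llama-3.2-1B-Instruct-GGUF is one community example):

```bash
# Pull and run a GGUF model straight from a Hugging Face repository
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```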
Method 2: Using huggingface-cli¶
For more control, download models directly:
```bash
# Download a specific model
huggingface-cli download microsoft/Phi-3-mini-4k-instruct

# Download to a specific directory
huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir ./models/phi3

# Download only specific files (useful for large models)
huggingface-cli download meta-llama/Llama-3.2-1B --include "*.safetensors"
```
Method 3: Using Python¶
```python
from huggingface_hub import snapshot_download

# Download an entire model repository to a local directory
model_path = snapshot_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct",
    local_dir="./models/phi3"
)
print(f"Model downloaded to: {model_path}")
```
Running Models Locally¶
Once downloaded, you can run models using various frameworks.
Option 1: Transformers Library (Most Flexible)¶
The transformers library from Hugging Face is the standard for working with models:
Basic Text Generation Example:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use half precision to save memory
    device_map="auto"           # Automatically use GPU if available
)

# Generate text
prompt = "Explain the process of photosynthesis in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Option 2: llama.cpp (Efficient CPU/GPU Inference)¶
For running models efficiently on consumer hardware, llama.cpp provides optimized inference:
```bash
# Install llama-cpp-python
pip install llama-cpp-python

# Or with GPU support (CUDA)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python
```
Using GGUF Format Models:
```python
from llama_cpp import Llama

# Download a GGUF model from Hugging Face first, e.g.:
# https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF
llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,       # Context window
    n_threads=8,      # CPU threads
    n_gpu_layers=35   # Layers to offload to GPU (0 for CPU-only)
)

output = llm(
    "What are the key differences between supervised and unsupervised learning?",
    max_tokens=300,
    temperature=0.7,
    echo=False
)
print(output["choices"][0]["text"])
```
Option 3: Text Generation Web UI¶
For a graphical interface, text-generation-webui provides a ChatGPT-like experience for local models.
Recommended Models for Beginners¶
Here are well-tested models suitable for different hardware configurations:
Small Models (4-8GB RAM)¶
| Model | Size | Best For |
|---|---|---|
| Phi-3-mini | 3.8B | General tasks, runs on laptops |
| Qwen2.5-3B-Instruct | 3B | Multilingual, good reasoning |
| Llama-3.2-1B-Instruct | 1B | Very fast, basic tasks |
Medium Models (16-32GB RAM)¶
| Model | Size | Best For |
|---|---|---|
| Llama-3.2-3B-Instruct | 3B | Balanced performance |
| Mistral-7B-Instruct | 7B | Excellent general purpose |
| Qwen2.5-7B-Instruct | 7B | Strong reasoning, coding |
Large Models (GPU Required)¶
| Model | Size | Best For |
|---|---|---|
| Llama-3.3-70B-Instruct | 70B | Near-frontier performance |
| Qwen2.5-72B-Instruct | 72B | State-of-the-art open model |
| DeepSeek-R1 | 671B | Advanced reasoning (requires cluster) |
Quantized Models
For running larger models on limited hardware, look for quantized versions (GGUF format). These reduce memory requirements with minimal quality loss. Search for model names with "GGUF" or visit TheBloke for quantized versions.
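For example, the CLI can fetch a single quantized file instead of a whole repository; this sketch reuses the GGUF repository from the llama.cpp example above:

```bash
# Download one quantization file to ./models rather than the full repo
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF \
  mistral-7b-instruct-v0.2.Q4_K_M.gguf --local-dir ./models
```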
Downloading Datasets¶
Using the datasets Library¶
Load a Dataset:
```python
from datasets import load_dataset

# Load a dataset from the Hub
dataset = load_dataset("squad")  # Stanford Question Answering Dataset

# View dataset structure
print(dataset)
print(dataset["train"][0])  # First training example
```
Download for Offline Use:
```python
from datasets import load_dataset

# Download and cache locally
dataset = load_dataset(
    "scientific_papers",
    "arxiv",
    cache_dir="./data/scientific_papers"
)

# Save to disk (Arrow format) for offline reuse
dataset.save_to_disk("./data/arxiv_papers")
```
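A dataset saved this way can be reloaded later without network access:

```python
from datasets import load_from_disk

# Reload the saved copy entirely from local disk
dataset = load_from_disk("./data/arxiv_papers")
```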
Popular Academic Datasets¶
| Dataset | Description | Size |
|---|---|---|
| arxiv-papers | ArXiv papers | 4.6TB of papers |
| wikipedia | Wikipedia articles | Multiple languages |
| pile | Diverse text corpus | 800GB |
| code_search_net | Code from GitHub | 6M functions |
Spaces: Interactive Demos¶
Hugging Face Spaces hosts interactive applications built with models:
- Try before you download: Test models in your browser
- Share your work: Deploy demos for papers or projects
- Learn from examples: See how others implement solutions
Most Spaces are built using Gradio, an open-source Python library for creating web interfaces for ML models. You can build and deploy your own Gradio apps to Spaces with just a few lines of code. See our Gradio documentation for tutorials and examples.
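As a taste, a complete Gradio app fits in a few lines; the `respond` function here is a placeholder for your own model call:

```python
import gradio as gr

def respond(prompt):
    # Replace this placeholder with a call to your model
    return f"You asked: {prompt}"

demo = gr.Interface(fn=respond, inputs="text", outputs="text")
demo.launch()  # pass share=True for a temporary public link
```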
Notable Spaces for Researchers:
- Whisper - Audio transcription
- Document Question Answering - Extract information from documents
- Stable Diffusion - Image generation
Best Practices¶
Storage Management¶
Models can be large. Manage your cache:
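The huggingface-cli cache utilities show what you have downloaded and let you prune it (assuming huggingface_hub is installed with the `[cli]` extra):

```bash
# See what is cached and how much disk space each model uses
huggingface-cli scan-cache

# Interactively select and delete cached revisions you no longer need
huggingface-cli delete-cache
```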
Model Selection Tips¶
- Start small: Begin with smaller models to test your workflow
- Check benchmarks: Review model cards for performance on relevant tasks
- Consider licensing: Ensure the license fits your use case (research vs. commercial)
- Read the limitations: Model cards describe known issues and biases
For Academic Use¶
- Cite properly: Model cards include citation information
- Document your setup: Record model versions and parameters for reproducibility
- Check data provenance: Understand what data was used to train the model
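One lightweight way to cover both points is to print your library versions and pin the exact model revision you used; `from_pretrained` accepts a `revision` argument (a branch, tag, or commit hash):

```python
import torch
import transformers
from transformers import AutoModelForCausalLM

# Record the environment alongside your results
print(f"transformers=={transformers.__version__}, torch=={torch.__version__}")

# Pin the model to a specific revision; use a commit hash for exact reproducibility
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    revision="main",
)
```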
Further Resources¶
- Hugging Face Documentation: https://huggingface.co/docs
- Transformers Library: https://huggingface.co/docs/transformers
- Hugging Face Course: https://huggingface.co/learn (Free NLP course)
- Model Hub: https://huggingface.co/models
- Dataset Hub: https://huggingface.co/datasets
- Spaces: https://huggingface.co/spaces
- Gradio (for building Spaces): See our Gradio documentation for tutorials on building interactive demos
Hugging Face vs. Ollama
For most workshop participants, we recommend starting with Ollama for running local models. It handles model downloading and optimization automatically. Use Hugging Face directly when you need:
- Access to specific model versions or configurations
- Fine-tuning or training capabilities
- Datasets for research
- Models not available in Ollama's library