Hugging Face

This work is licensed under a Creative Commons Attribution 4.0 International License.

What is Hugging Face?

Hugging Face is the central hub for the open-source AI community. Think of it as "GitHub for AI models" - a platform where researchers and developers share:

  • Models: Pre-trained AI models ready to download and use
  • Datasets: Training and evaluation data for machine learning
  • Spaces: Interactive demos and applications
  • Documentation: Model cards, papers, and usage guides

For researchers and academics, Hugging Face provides access to state-of-the-art models without needing to train them from scratch, saving significant computational resources and time.

Create a Hugging Face Account

Follow these instructions to sign up for Hugging Face:

  1. Visit the Hugging Face website: https://huggingface.co

  2. Click on the "Sign Up" button in the top-right corner of the page.

  3. Fill in your email address, username, and password in the respective fields.

  4. Check the box to agree to Hugging Face's terms and conditions, then click "Sign Up."

  5. You'll receive an email to confirm your account. Click on the confirmation link in the email.

  6. Once your account is confirmed, sign in to access Hugging Face's features.

For more information, visit the Hugging Face documentation: https://huggingface.co/docs

Finding Models

The Model Hub hosts over 1 million models. To find what you need:

  1. Browse by Task: Filter by what you want to do (text generation, image classification, translation, etc.)
  2. Sort by Downloads: Popular models are well-tested and documented
  3. Filter by License: Important for academic and commercial use
  4. Check the Model Card: Every model should have documentation explaining its capabilities and limitations

Popular Model Categories for Researchers:

Category | Example Models | Use Cases
Text Generation | Llama 3, Mistral, Qwen | Writing assistance, code generation, analysis
Embeddings | BGE, E5, GTE | Document search, similarity matching, RAG
Vision-Language | LLaVA, Qwen-VL | Image analysis, chart interpretation
Speech | Whisper, Wav2Vec2 | Transcription, audio analysis
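
If you prefer to search programmatically, the huggingface_hub library exposes the same filters used on the website. A minimal sketch (the task and sort values are just examples):

from huggingface_hub import HfApi

api = HfApi()

# List the five most-downloaded text-generation models on the Hub
for model in api.list_models(task="text-generation", sort="downloads", limit=5):
    print(model.id)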

Finding Datasets

The Dataset Hub hosts datasets for training and evaluation:

  1. Search by Domain: Academic papers, code, images, audio, etc.
  2. Check Size and Format: Ensure it fits your storage and processing capabilities
  3. Review the License: Some datasets have restrictions on use
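
The same search can be done from Python through huggingface_hub; a small sketch (the search term is only an example):

from huggingface_hub import HfApi

api = HfApi()

# Find datasets whose name or description mentions "arxiv"
for ds in api.list_datasets(search="arxiv", limit=5):
    print(ds.id)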

Installing the Hugging Face CLI

The huggingface_hub library provides tools for downloading and managing models.

Installation

# Using pip
pip install huggingface_hub

# Or using conda
conda install -c conda-forge huggingface_hub
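
To confirm the installation worked, a quick sanity check:

# Check the installed library version
python -c "import huggingface_hub; print(huggingface_hub.__version__)"

# The command-line tool should now be on your PATH
huggingface-cli --help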

Authentication (Required for Some Models)

Some models (especially Llama and other gated models) require you to accept license terms and authenticate:

  1. Create an Access Token:

     • Go to huggingface.co/settings/tokens
     • Click "New token" and create a token with "Read" access
     • Copy the token (you will only see it once)

  2. Login via CLI:

     huggingface-cli login

     Paste your token when prompted.

  3. Accept the Model License (for gated models):

     • Visit the model page (e.g., meta-llama/Llama-3.3-70B-Instruct)
     • Click "Access repository" and accept the license terms

Token Security

Treat your Hugging Face token like a password. Do not commit it to version control or share it publicly.
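
One common pattern is to keep the token in an environment variable (for example HF_TOKEN, which the huggingface_hub library also reads automatically) rather than writing it into scripts. A minimal sketch:

import os
from huggingface_hub import login

# Read the token from the environment instead of hard-coding it;
# set it with `export HF_TOKEN=hf_xxx` in your shell, outside version control
login(token=os.environ["HF_TOKEN"])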

Downloading Models

Method 1: Using Ollama (Easiest)

The easiest way to run Hugging Face models locally is through Ollama, which handles the download and setup details for you:

# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Run popular models directly
ollama run llama3.2
ollama run mistral
ollama run qwen2.5

Ollama automatically downloads pre-quantized versions of these models from its own library; many of them are built from weights originally published on Hugging Face.
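
Recent Ollama releases can also pull GGUF models hosted on Hugging Face directly by repository name. A sketch (the repository shown is only an example; check that it publishes GGUF files):

# Run a GGUF model straight from a Hugging Face repository
ollama run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF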

Method 2: Using huggingface-cli

For more control, download models directly:

# Download a specific model
huggingface-cli download microsoft/Phi-3-mini-4k-instruct

# Download to a specific directory
huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir ./models/phi3

# Download only specific files (useful for large models)
huggingface-cli download meta-llama/Llama-3.2-1B --include "*.safetensors"

Method 3: Using Python

from huggingface_hub import snapshot_download

# Download entire model repository
model_path = snapshot_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct",
    local_dir="./models/phi3"
)

print(f"Model downloaded to: {model_path}")

Running Models Locally

Once downloaded, you can run models using various frameworks.

Option 1: Transformers Library (Most Flexible)

The transformers library from Hugging Face is the standard for working with models:

pip install transformers torch accelerate

Basic Text Generation Example:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_name = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use half precision to save memory
    device_map="auto"           # Automatically use GPU if available
)

# Generate text
prompt = "Explain the process of photosynthesis in simple terms:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
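
For quick experiments, the higher-level pipeline API wraps the same steps (tokenization, generation, decoding) in a single call. A sketch using the same model:

from transformers import pipeline

# pipeline() bundles tokenizer loading, generation, and decoding
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    torch_dtype="auto",
    device_map="auto"
)

result = generator(
    "Explain the process of photosynthesis in simple terms:",
    max_new_tokens=200,
    do_sample=True,
    temperature=0.7
)

print(result[0]["generated_text"])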

Option 2: llama.cpp (Efficient CPU/GPU Inference)

For running models efficiently on consumer hardware, llama.cpp provides optimized inference:

# Install llama-cpp-python
pip install llama-cpp-python

# Or with GPU support (CUDA)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python

Using GGUF Format Models:

from llama_cpp import Llama

# Download a GGUF model from Hugging Face
# Example: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF

llm = Llama(
    model_path="./models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,      # Context window
    n_threads=8,     # CPU threads
    n_gpu_layers=35  # Layers to offload to GPU (0 for CPU-only)
)

output = llm(
    "What are the key differences between supervised and unsupervised learning?",
    max_tokens=300,
    temperature=0.7,
    echo=False
)

print(output["choices"][0]["text"])

Option 3: Text Generation Web UI

For a graphical interface, text-generation-webui provides a ChatGPT-like experience for local models.

Recommended Models by Hardware

Here are well-tested models suitable for different hardware configurations:

Small Models (4-8GB RAM)

Model | Size | Best For
Phi-3-mini | 3.8B | General tasks, runs on laptops
Qwen2.5-3B-Instruct | 3B | Multilingual, good reasoning
Llama-3.2-1B-Instruct | 1B | Very fast, basic tasks

Medium Models (16-32GB RAM)

Model | Size | Best For
Llama-3.2-3B-Instruct | 3B | Balanced performance
Mistral-7B-Instruct | 7B | Excellent general purpose
Qwen2.5-7B-Instruct | 7B | Strong reasoning, coding

Large Models (GPU Required)

Model | Size | Best For
Llama-3.3-70B-Instruct | 70B | Near-frontier performance
Qwen2.5-72B-Instruct | 72B | State-of-the-art open model
DeepSeek-R1 | 671B | Advanced reasoning (requires cluster)

Quantized Models

For running larger models on limited hardware, look for quantized versions (GGUF format). These reduce memory requirements with minimal quality loss. Search for model names with "GGUF" or visit TheBloke for quantized versions.
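
Quantized repositories usually contain several GGUF files at different quantization levels, so it is common to download only the one you need. A sketch (the repository and filename are examples):

# Download only the 4-bit (Q4_K_M) quantization from a GGUF repository
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.2-GGUF mistral-7b-instruct-v0.2.Q4_K_M.gguf --local-dir ./models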

Downloading Datasets

Using the datasets Library

pip install datasets

Load a Dataset:

from datasets import load_dataset

# Load a dataset from the Hub
dataset = load_dataset("squad")  # Stanford Question Answering Dataset

# View dataset structure
print(dataset)
print(dataset["train"][0])  # First training example

Download for Offline Use:

from datasets import load_dataset

# Download and cache locally
dataset = load_dataset(
    "scientific_papers",
    "arxiv",
    cache_dir="./data/scientific_papers"
)

# Save to disk in a specific format
dataset.save_to_disk("./data/arxiv_papers")

Popular Datasets for Researchers:

Dataset | Description | Size
arxiv-papers | ArXiv papers | 4.6TB
wikipedia | Wikipedia articles | Multiple languages
pile | Diverse text corpus | 800GB
code_search_net | Code from GitHub | 6M functions

Spaces: Interactive Demos

Hugging Face Spaces hosts interactive applications built with models:

  • Try before you download: Test models in your browser
  • Share your work: Deploy demos for papers or projects
  • Learn from examples: See how others implement solutions

Most Spaces are built using Gradio, an open-source Python library for creating web interfaces for ML models. You can build and deploy your own Gradio apps to Spaces with just a few lines of code. See our Gradio documentation for tutorials and examples.
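
As a rough illustration of how little code a Space needs, here is a minimal Gradio app (a sketch; the greeting function is just a placeholder for a model call):

import gradio as gr

def greet(name):
    # Placeholder logic; a real Space would call a model here
    return f"Hello, {name}!"

# Wrap the function in a simple web interface
demo = gr.Interface(fn=greet, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()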


Best Practices

Storage Management

Models can be large. Manage your cache:

# View cache usage
huggingface-cli scan-cache

# Delete unused models
huggingface-cli delete-cache
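
The same cache information is available from Python if you want to script cleanup decisions; a minimal sketch:

from huggingface_hub import scan_cache_dir

# Summarize what is currently in the local Hugging Face cache
cache_info = scan_cache_dir()
print(f"Total cache size: {cache_info.size_on_disk_str}")

for repo in cache_info.repos:
    print(repo.repo_id, repo.size_on_disk_str)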

Model Selection Tips

  1. Start small: Begin with smaller models to test your workflow
  2. Check benchmarks: Review model cards for performance on relevant tasks
  3. Consider licensing: Ensure the license fits your use case (research vs. commercial)
  4. Read the limitations: Model cards describe known issues and biases

For Academic Use

  • Cite properly: Model cards include citation information
  • Document your setup: Record model versions and parameters for reproducibility
  • Check data provenance: Understand what data was used to train the model
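
One concrete way to make a setup reproducible is to pin the exact model revision (a commit hash or tag listed on the model page) when loading. A sketch, with a placeholder revision:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/Phi-3-mini-4k-instruct"
revision = "main"  # replace with a specific commit hash from the model page

tokenizer = AutoTokenizer.from_pretrained(model_name, revision=revision)
model = AutoModelForCausalLM.from_pretrained(model_name, revision=revision)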

Further Resources

Hugging Face vs. Ollama

For most workshop participants, we recommend starting with Ollama for running local models. It handles model downloading and optimization automatically. Use Hugging Face directly when you need:

  • Access to specific model versions or configurations
  • Fine-tuning or training capabilities
  • Datasets for research
  • Models not available in Ollama's library