Ollama¶

This work is licensed under a Creative Commons Attribution 4.0 International License.
What is Ollama?¶
Ollama is an open-source tool that makes it simple to run large language models (LLMs) locally on your own computer. Think of it as a "Docker for AI models" - it handles all the complexity of downloading, configuring, and running AI models so you can focus on using them.
Why Run AI Models Locally?
| Benefit | Description |
|---|---|
| Privacy | Your data never leaves your computer - ideal for sensitive research, patient data, or proprietary information |
| No API Costs | After initial setup, unlimited usage with no per-token charges |
| Offline Access | Work without internet connectivity once models are downloaded |
| Customization | Fine-tune models, adjust parameters, and create custom configurations |
| No Rate Limits | Generate as much content as your hardware can handle |
| Reproducibility | Lock down specific model versions for reproducible research |
When to Use Ollama vs. Cloud Services:
- Use Ollama when: privacy is paramount, you have adequate hardware, you need offline access, or you want to experiment freely without cost concerns
- Use Cloud Services (ChatGPT, Claude, etc.) when: you need the most capable models, lack powerful hardware, or need multimodal capabilities like vision
Hardware Requirements
Running local models requires computational resources. As a general guideline:
- Small models (1-3B parameters): 8GB RAM minimum, runs on most modern laptops
- Medium models (7-8B parameters): 16GB RAM recommended, GPU acceleration helpful
- Large models (13B+ parameters): 32GB+ RAM or dedicated GPU with 8GB+ VRAM
Apple Silicon Macs (M1/M2/M3/M4) are particularly well-suited for local AI due to unified memory architecture.
Installation¶
macOS¶
Option 1: Download the App (Recommended)
- Visit ollama.com/download
- Download the macOS installer
- Open the downloaded file and drag Ollama to your Applications folder
- Launch Ollama from Applications - it will appear as a llama icon in your menu bar
- The Ollama service now runs in the background
Option 2: Homebrew
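# Install Ollama with Homebrew (assumes Homebrew is already installed)
brew install ollama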
After installation, start the Ollama service:
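# Runs the server in the foreground; leave this terminal open while using Ollama
ollama serve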
Linux¶
One-Line Install Script:
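curl -fsSL https://ollama.com/install.sh | sh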
This script:
- Detects your Linux distribution (Ubuntu, Debian, Fedora, CentOS, Arch, etc.)
- Installs Ollama to /usr/local/bin
- Sets up a systemd service for automatic startup
- Configures GPU support if NVIDIA drivers are detected
Manual Installation (Alternative):
# Download the binary
curl -L https://ollama.com/download/ollama-linux-amd64 -o ollama
# Make it executable
chmod +x ollama
# Move to system path
sudo mv ollama /usr/local/bin/
# Start the service
ollama serve
Start Ollama on Boot:
# Enable the systemd service
sudo systemctl enable ollama
# Start the service now
sudo systemctl start ollama
# Check service status
sudo systemctl status ollama
Windows¶
Option 1: Windows Installer (Recommended)
- Visit ollama.com/download
- Download the Windows installer (.exe)
- Run the installer and follow the prompts
- Ollama will start automatically and appear in the system tray
Option 2: Windows Subsystem for Linux (WSL)
If you prefer a Linux-like environment on Windows:
# First, ensure WSL2 is installed and updated
wsl --install
# Then in your WSL terminal:
curl -fsSL https://ollama.com/install.sh | sh
Docker¶
For containerized deployments or server environments:
# Pull and run the Ollama container
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# With NVIDIA GPU support
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Verify Installation¶
After installation, verify Ollama is working:
# Check Ollama version
ollama --version
# List available models (will be empty initially)
ollama list
# Test the API endpoint
curl http://localhost:11434/api/tags
Downloading and Managing Models¶
The Model Library¶
Ollama maintains a curated library of optimized models at ollama.com/library. These models are:
- Pre-quantized for efficient memory usage
- Tested for compatibility with Ollama
- Available in multiple size variants
- Automatically configured for optimal performance
Downloading Models¶
Basic Download:
# Download a model (happens automatically when you run it)
ollama pull llama3.2
# Or run directly - it will download if not present
ollama run llama3.2
Specify Model Size/Variant:
Models often come in multiple sizes. Use tags to select:
# Llama 3.2 variants
ollama pull llama3.2:1b # 1 billion parameters (~1GB)
ollama pull llama3.2:3b # 3 billion parameters (~2GB)
ollama pull llama3.2 # Default (usually the balanced option)
# Qwen 2.5 variants
ollama pull qwen2.5:0.5b # Tiny - very fast
ollama pull qwen2.5:1.5b # Small
ollama pull qwen2.5:3b # Medium
ollama pull qwen2.5:7b # Large - best quality
ollama pull qwen2.5:14b # Extra large
ollama pull qwen2.5:32b # Very large - requires significant RAM/VRAM
ollama pull qwen2.5:72b # Massive - requires high-end GPU
# DeepSeek R1 distilled models
ollama pull deepseek-r1:1.5b # Smallest reasoning model
ollama pull deepseek-r1:7b # Good balance
ollama pull deepseek-r1:8b # Llama-based distillation
ollama pull deepseek-r1:14b # Higher quality
ollama pull deepseek-r1:32b # Best distilled quality
ollama pull deepseek-r1:70b # Full-size distillation
Managing Downloaded Models¶
# List all downloaded models
ollama list
# Example output:
# NAME ID SIZE MODIFIED
# llama3.2:latest a3e4c7e8d9f0 2.0 GB 2 hours ago
# qwen2.5:7b b5f6c8d9e0a1 4.4 GB 1 day ago
# deepseek-r1:8b c7d8e9f0a1b2 4.9 GB 3 days ago
# Show detailed information about a model
ollama show llama3.2
# Delete a model to free disk space
ollama rm llama3.2:1b
# Copy a model (useful for creating variants)
ollama cp llama3.2 my-llama
Model Storage Location¶
Models are stored in:
- macOS: `~/.ollama/models`
- Linux: `~/.ollama/models` or `/usr/share/ollama/.ollama/models`
- Windows: `C:\Users\<username>\.ollama\models`
To change the storage location, set the OLLAMA_MODELS environment variable:
# Linux/macOS
export OLLAMA_MODELS=/path/to/your/models
# Windows PowerShell
$env:OLLAMA_MODELS = "D:\ollama\models"
Running Models¶
Interactive Chat¶
The simplest way to use Ollama is through interactive chat:
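ollama run llama3.2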
This opens an interactive session where you can type prompts and receive responses. Use /bye or Ctrl+D to exit.
Chat Session Commands:
| Command | Description |
|---|---|
| `/bye` | Exit the chat session |
| `/clear` | Clear conversation history |
| `/set parameter value` | Change model parameters |
| `/show info` | Display model information |
| `/show license` | Show model license |
| `/load <model>` | Load a different model |
| `/save <name>` | Save current session |
Single-Prompt Execution¶
For scripting and automation, pass the prompt directly:
# Single prompt with immediate response
ollama run llama3.2 "What is photosynthesis?"
# Pipe input from a file
cat essay.txt | ollama run llama3.2 "Summarize this text:"
# Save output to a file
ollama run llama3.2 "Write a haiku about machine learning" > haiku.txt
Model Parameters¶
Adjust model behavior with parameters:
# In interactive mode
/set parameter temperature 0.7
/set parameter num_predict 500
# When starting, the --verbose flag reports timing and token statistics
ollama run llama3.2 --verbose
Common Parameters:
| Parameter | Description | Default | Range |
|---|---|---|---|
| `temperature` | Creativity/randomness | 0.8 | 0.0-2.0 |
| `top_p` | Nucleus sampling threshold | 0.9 | 0.0-1.0 |
| `top_k` | Limit vocabulary sampling | 40 | 1-100 |
| `num_predict` | Maximum tokens to generate | 128 | -1 (unlimited) to n |
| `num_ctx` | Context window size | 2048 | Model-dependent |
| `repeat_penalty` | Penalty for repetition | 1.1 | 0.0-2.0 |
| `seed` | Random seed for reproducibility | Random | Any integer |
Multiline Input¶
For complex prompts, use multiline input:
ollama run llama3.2 """
You are an expert historian. Please analyze the following event
and provide context about its significance:
The signing of the Treaty of Westphalia in 1648.
Include:
1. Historical context
2. Key provisions
3. Long-term impact on international relations
"""
Using the Ollama API¶
Ollama provides a REST API that enables integration with other applications. The API runs on http://localhost:11434 by default.
Generate Completions¶
Basic Generation:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Explain quantum computing in simple terms",
"stream": false
}'
With Parameters:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Write a creative story about a robot learning to paint",
"stream": false,
"options": {
"temperature": 0.9,
"num_predict": 500,
"top_p": 0.95
}
}'
Chat API (Conversational)¶
For multi-turn conversations:
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{"role": "system", "content": "You are a helpful research assistant."},
{"role": "user", "content": "What are the main causes of climate change?"},
{"role": "assistant", "content": "The main causes include greenhouse gas emissions..."},
{"role": "user", "content": "How can individuals help reduce these emissions?"}
],
"stream": false
}'
Streaming Responses¶
For real-time output, enable streaming:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt": "Write a detailed explanation of neural networks",
"stream": true
}'
Each response chunk is a JSON object. Parse them line by line for real-time display.
API Endpoints Reference¶
| Endpoint | Method | Description |
|---|---|---|
| `/api/generate` | POST | Generate text completion |
| `/api/chat` | POST | Chat with conversation history |
| `/api/tags` | GET | List available models |
| `/api/show` | POST | Show model information |
| `/api/pull` | POST | Download a model |
| `/api/delete` | DELETE | Remove a model |
| `/api/copy` | POST | Copy a model |
| `/api/embeddings` | POST | Generate embeddings |
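For example, the embeddings endpoint follows the same request pattern as /api/generate, taking a model and a prompt and returning a vector (this sketch assumes an embedding model such as nomic-embed-text has already been pulled):
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "Ollama makes local LLMs easy to run"
}'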
Python Integration¶
Using the Official Ollama Python Library¶
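If the library is not installed yet, it is available on PyPI:
pip install ollama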
Basic Usage:
import ollama
# Simple generation
response = ollama.generate(
model='llama3.2',
prompt='What is machine learning?'
)
print(response['response'])
Chat Conversation:
import ollama
# Multi-turn chat
messages = [
{'role': 'system', 'content': 'You are a helpful coding assistant.'},
{'role': 'user', 'content': 'Write a Python function to calculate factorial'}
]
response = ollama.chat(
model='llama3.2',
messages=messages
)
print(response['message']['content'])
Streaming Responses:
import ollama
# Stream responses for better UX
stream = ollama.chat(
model='llama3.2',
messages=[{'role': 'user', 'content': 'Explain the water cycle'}],
stream=True
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
Generate Embeddings:
import ollama
# Generate embeddings for semantic search or RAG
embedding = ollama.embeddings(
model='nomic-embed-text', # or any embedding model
prompt='The quick brown fox jumps over the lazy dog'
)
print(f"Embedding dimension: {len(embedding['embedding'])}")
Using with LangChain¶
LangChain provides a powerful framework for building LLM applications:
from langchain_ollama import OllamaLLM
# Initialize the model
llm = OllamaLLM(model="llama3.2")
# Simple invocation
response = llm.invoke("What are the benefits of exercise?")
print(response)
Chat Model with History:
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage
chat = ChatOllama(model="llama3.2", temperature=0.7)
messages = [
SystemMessage(content="You are a research assistant specializing in biology."),
HumanMessage(content="Explain CRISPR gene editing.")
]
response = chat.invoke(messages)
print(response.content)
Building a Simple RAG System:
from langchain_ollama import OllamaLLM, OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
# Initialize components
llm = OllamaLLM(model="llama3.2")
embeddings = OllamaEmbeddings(model="nomic-embed-text")
# Sample documents (in practice, load from files)
documents = [
"Machine learning is a subset of artificial intelligence...",
"Neural networks are inspired by biological neurons...",
"Deep learning uses multiple layers of neural networks..."
]
# Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
splits = text_splitter.create_documents(documents)
# Create vector store
vectorstore = Chroma.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever()
# Create RAG chain
template = """Answer based on the context:
Context: {context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
)
# Query
response = rag_chain.invoke("What is deep learning?")
print(response)
For more on RAG systems, see our RAG documentation.
Using with Requests (Direct API)¶
For simple integrations without additional dependencies:
import requests
import json
def query_ollama(prompt, model="llama3.2", stream=False):
    """Simple function to query the Ollama API."""
    response = requests.post(
        'http://localhost:11434/api/generate',
        json={
            'model': model,
            'prompt': prompt,
            'stream': stream
        },
        stream=stream  # also stream the HTTP response when streaming is requested
    )
    if stream:
        # Return a generator that yields each JSON chunk's text as it arrives
        def iter_chunks():
            for line in response.iter_lines():
                if line:
                    chunk = json.loads(line)
                    yield chunk.get('response', '')
        return iter_chunks()
    # Non-streaming: the full answer arrives as a single JSON object
    return response.json()['response']

# Usage
result = query_ollama("What is the capital of France?")
print(result)

# Streaming usage
for chunk in query_ollama("Tell me a story", stream=True):
    print(chunk, end='', flush=True)
Jupyter Notebook Integration¶
Ollama integrates seamlessly with Jupyter notebooks for interactive research:
Basic Notebook Usage¶
# Cell 1: Install and import
!pip install ollama
import ollama
# Cell 2: List available models
models = ollama.list()
for model in models['models']:
    print(f"{model['name']}: {model['size'] / 1e9:.1f} GB")
# Cell 3: Interactive chat
response = ollama.generate(
model='llama3.2',
prompt='Explain the difference between correlation and causation'
)
print(response['response'])
Building a Research Assistant¶
import ollama

class ResearchAssistant:
    """A simple research assistant using Ollama."""

    def __init__(self, model='llama3.2'):
        self.model = model
        self.conversation = []

    def set_context(self, context):
        """Set the research context/system prompt."""
        self.conversation = [{
            'role': 'system',
            'content': context
        }]

    def ask(self, question):
        """Ask a question and get a response."""
        self.conversation.append({
            'role': 'user',
            'content': question
        })
        response = ollama.chat(
            model=self.model,
            messages=self.conversation
        )
        assistant_message = response['message']['content']
        self.conversation.append({
            'role': 'assistant',
            'content': assistant_message
        })
        return assistant_message

    def summarize_paper(self, abstract):
        """Summarize a research paper abstract."""
        prompt = f"""Please analyze this research abstract and provide:
1. Main research question
2. Methodology used
3. Key findings
4. Potential implications

Abstract:
{abstract}"""
        return self.ask(prompt)

# Usage in notebook
assistant = ResearchAssistant(model='llama3.2')
assistant.set_context("You are an expert in computational biology.")
response = assistant.ask("What are the latest advances in protein folding prediction?")
print(response)
For more on Jupyter AI integration, see our Jupyter AI documentation.
Creating Custom Models with Modelfiles¶
Modelfiles allow you to create customized versions of models with specific behaviors, system prompts, or parameters.
Basic Modelfile Structure¶
Create a file called Modelfile (no extension):
# Base model to customize
FROM llama3.2
# Set model parameters
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
PARAMETER top_p 0.9
# Set the system prompt
SYSTEM """You are a helpful research assistant specializing in academic writing.
You help researchers improve their papers by:
- Suggesting clearer phrasing
- Identifying logical gaps
- Recommending relevant citations
- Improving overall structure
Always be constructive and specific in your feedback."""
# Optional: Add custom template
TEMPLATE """{{ if .System }}<|system|>
{{ .System }}<|end|>
{{ end }}{{ if .Prompt }}<|user|>
{{ .Prompt }}<|end|>
{{ end }}<|assistant|>
{{ .Response }}<|end|>
"""
Create and Use the Custom Model¶
# Create the custom model
ollama create research-assistant -f Modelfile
# Run your custom model
ollama run research-assistant
Modelfile Commands Reference¶
| Command | Description | Example |
|---|---|---|
| `FROM` | Base model (required) | `FROM llama3.2` |
| `PARAMETER` | Set model parameters | `PARAMETER temperature 0.7` |
| `SYSTEM` | Set system prompt | `SYSTEM "You are helpful..."` |
| `TEMPLATE` | Custom prompt template | `TEMPLATE "..."` |
| `ADAPTER` | Apply LoRA adapter | `ADAPTER ./lora.gguf` |
| `LICENSE` | Specify license | `LICENSE "MIT"` |
| `MESSAGE` | Add example messages | `MESSAGE user "Hello"` |
Example: Academic Discipline-Specific Assistants¶
Biology Research Assistant:
FROM llama3.2
PARAMETER temperature 0.3
PARAMETER num_ctx 8192
SYSTEM """You are an expert biology research assistant with deep knowledge of:
- Molecular biology and genetics
- Cell biology and biochemistry
- Evolutionary biology
- Ecology and environmental science
When answering questions:
1. Use precise scientific terminology
2. Cite relevant concepts and theories
3. Distinguish between established facts and current hypotheses
4. Suggest relevant experimental approaches when applicable"""
Statistics Tutor:
FROM qwen2.5:7b
PARAMETER temperature 0.2
PARAMETER num_ctx 4096
SYSTEM """You are a patient statistics tutor helping graduate students understand
statistical concepts. When explaining:
1. Start with intuitive explanations before formal definitions
2. Use concrete examples from research contexts
3. Show step-by-step calculations when relevant
4. Explain assumptions and when methods are appropriate
5. Suggest R or Python code for implementation
Always check for understanding and offer to clarify further."""
Code Review Assistant:
FROM deepseek-r1:8b
PARAMETER temperature 0.1
PARAMETER num_ctx 8192
SYSTEM """You are a senior software engineer conducting code reviews.
For each piece of code you review:
1. Identify potential bugs or errors
2. Suggest performance improvements
3. Recommend better naming or structure
4. Check for security vulnerabilities
5. Ensure code follows best practices
Be constructive and explain the reasoning behind each suggestion."""
GPU Configuration and Performance¶
Automatic GPU Detection¶
Ollama automatically detects and uses available GPUs. Check your GPU status:
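# Lists loaded models and whether each is running on GPU or CPU
ollama ps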
NVIDIA GPU Setup (Linux)¶
Ensure you have the NVIDIA drivers and CUDA toolkit:
# Check NVIDIA driver
nvidia-smi
# The Ollama install script usually handles CUDA setup
# If needed, install CUDA toolkit:
# sudo apt install nvidia-cuda-toolkit
Apple Silicon Optimization¶
Apple M1/M2/M3/M4 Macs use Metal for GPU acceleration automatically. Ollama is highly optimized for Apple Silicon, and no additional configuration is required.
Memory Management¶
Control GPU Memory Usage:
# Set maximum VRAM usage (in MB)
OLLAMA_GPU_MEMORY=8192 ollama serve
# Or in environment
export OLLAMA_GPU_MEMORY=8192
CPU-Only Mode:
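One way to force CPU-only inference on an NVIDIA system is to hide the GPU from Ollama before starting the server:
# Hide CUDA devices so Ollama falls back to the CPU (NVIDIA systems)
CUDA_VISIBLE_DEVICES="" ollama serve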
Performance Tuning Parameters¶
| Environment Variable | Description | Example |
|---|---|---|
| `OLLAMA_NUM_PARALLEL` | Number of parallel requests | 4 |
| `OLLAMA_MAX_LOADED_MODELS` | Models to keep in memory | 2 |
| `OLLAMA_GPU_MEMORY` | Max GPU memory (MB) | 8192 |
| `OLLAMA_HOST` | API bind address | 0.0.0.0:11434 |
| `OLLAMA_KEEP_ALIVE` | Model unload timeout | 5m |
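These variables are read by the Ollama server at startup. A typical setup might look like this (values are illustrative):
# Expose the API on the local network and keep models loaded for 30 minutes
export OLLAMA_HOST=0.0.0.0:11434
export OLLAMA_KEEP_ALIVE=30m
ollama serve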
Monitoring Performance¶
# Watch GPU memory usage (NVIDIA)
watch -n 1 nvidia-smi
# Monitor Ollama logs
journalctl -u ollama -f # Linux with systemd
Model Recommendations¶
By Hardware Capability¶
Entry-level hardware (around 8GB RAM):

| Model | Size | Best For |
|---|---|---|
| `llama3.2:1b` | ~700MB | Quick responses, basic tasks |
| `qwen2.5:1.5b` | ~1GB | Multilingual, reasoning |
| `phi3:mini` | ~2GB | Coding, analysis |
| `deepseek-r1:1.5b` | ~1GB | Reasoning tasks |

Mid-range hardware (around 16GB RAM):

| Model | Size | Best For |
|---|---|---|
| `llama3.2:3b` | ~2GB | Balanced performance |
| `qwen2.5:7b` | ~4.5GB | Strong reasoning, coding |
| `mistral:7b` | ~4GB | General purpose |
| `deepseek-r1:8b` | ~5GB | Advanced reasoning |
| `codellama:13b` | ~7GB | Specialized coding |

High-end hardware (32GB+ RAM or a dedicated GPU):

| Model | Size | Best For |
|---|---|---|
| `llama3.3:70b` | ~40GB | Near-frontier capability |
| `qwen2.5:32b` | ~20GB | Strong all-around |
| `deepseek-r1:70b` | ~40GB | Best open reasoning |
| `mixtral:8x7b` | ~26GB | Mixture of experts |
By Use Case¶
Academic Writing and Research:
# For writing assistance and analysis
ollama pull qwen2.5:7b
# For reasoning-heavy tasks
ollama pull deepseek-r1:8b
Coding and Development:
# General coding
ollama pull deepseek-coder:6.7b
# Code review and debugging
ollama pull codellama:13b
# Fast completions
ollama pull starcoder2:3b
Data Analysis:
# Statistical reasoning
ollama pull qwen2.5:7b
# Code generation for analysis
ollama pull deepseek-coder:6.7b
Teaching and Tutoring:
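# A general-purpose chat model handles explanations and step-by-step tutoring well
ollama pull llama3.2:3b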
Embeddings and RAG:
# Text embeddings
ollama pull nomic-embed-text
# Multilingual embeddings
ollama pull mxbai-embed-large
Integration with Other Tools¶
Open WebUI¶
Open WebUI provides a ChatGPT-like interface for Ollama:
# Run with Docker
docker run -d -p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-v open-webui:/app/backend/data \
--name open-webui \
ghcr.io/open-webui/open-webui:main
Access at http://localhost:3000. Open WebUI automatically detects your Ollama installation.
VS Code Integration¶
Install the Continue extension for AI-assisted coding with Ollama:
- Install the Continue extension from VS Code marketplace
- Configure to use Ollama in settings:
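A minimal model entry for Continue might look like the following (shown in the config.json format; the exact layout depends on the extension version):
{
  "models": [
    {
      "title": "Llama 3.2 (Ollama)",
      "provider": "ollama",
      "model": "llama3.2"
    }
  ]
}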
Obsidian Integration¶
Use the Ollama plugin for Obsidian for note-taking with AI assistance.
API-Compatible Services¶
Ollama's API is compatible with the OpenAI API format. Many tools that support OpenAI can work with Ollama:
# Using OpenAI library with Ollama
from openai import OpenAI
client = OpenAI(
base_url='http://localhost:11434/v1',
api_key='ollama' # Ollama doesn't require a key, but the library needs something
)
response = client.chat.completions.create(
model='llama3.2',
messages=[
{'role': 'user', 'content': 'Hello!'}
]
)
print(response.choices[0].message.content)
Troubleshooting¶
Common Issues and Solutions¶
Model fails to load - Out of Memory
Symptoms: Error messages about memory allocation, system becomes unresponsive
Solutions:
- Try a smaller model variant (e.g., `ollama pull llama3.2:1b`)
- Close other memory-intensive applications
- Reduce the context window (e.g., `/set parameter num_ctx 2048` in interactive mode)
- Use quantized versions (look for `q4_0` or `q4_K_M` tags)
Ollama service not running
Symptoms: Connection refused errors, curl: (7) Failed to connect
Solutions:
- Start the service: `ollama serve` (or `sudo systemctl start ollama` on Linux)
- Check whether another process is using port 11434: `lsof -i :11434`
- Use a different port: `OLLAMA_HOST=127.0.0.1:11435 ollama serve`
Slow generation speed
Symptoms: Model runs much slower than expected
Solutions:
- Verify the GPU is being used: `ollama ps` reports whether a loaded model is on GPU or CPU
- Check that GPU drivers are up to date
- Ensure sufficient VRAM is available (`nvidia-smi` on NVIDIA GPUs)
- Try a smaller model or a more aggressive quantization
Model gives poor quality responses
Symptoms: Responses are incoherent, repetitive, or off-topic
Solutions:
- Adjust temperature (e.g., `/set parameter temperature 0.3` for more focused output)
- Increase the context window for complex tasks (e.g., `/set parameter num_ctx 8192`)
- Try a larger model variant
- Be more specific in your prompts
Cannot connect from other devices
Symptoms: API works on localhost but not from other machines
Solutions:
- Bind to all interfaces: `OLLAMA_HOST=0.0.0.0:11434 ollama serve`
- Check firewall settings (e.g., `sudo ufw allow 11434` on Ubuntu)
- Verify network connectivity between the devices
Getting Help¶
- Official Documentation: github.com/ollama/ollama
- Discord Community: discord.gg/ollama
- GitHub Issues: github.com/ollama/ollama/issues
Academic Use Cases¶
Literature Review Assistance¶
import ollama

def analyze_abstract(abstract):
    """Analyze a research paper abstract."""
    prompt = f"""Analyze this research abstract and provide:
1. Research question or hypothesis
2. Methodology
3. Key findings
4. Limitations mentioned
5. Potential follow-up questions

Abstract:
{abstract}
"""
    response = ollama.generate(model='qwen2.5:7b', prompt=prompt)
    return response['response']
# Example usage
abstract = """
We present a novel approach to protein structure prediction using
graph neural networks. Our method achieves state-of-the-art results
on the CASP14 benchmark, outperforming existing methods by 15% in
GDT-TS score. We demonstrate that incorporating evolutionary
information through multiple sequence alignments significantly
improves prediction accuracy.
"""
analysis = analyze_abstract(abstract)
print(analysis)
Grant Writing Support¶
import ollama

def improve_grant_section(text, section_type):
    """Suggest improvements for grant application sections."""
    prompt = f"""You are an experienced grant reviewer. Review this {section_type}
section and provide specific suggestions to strengthen it:

{text}

Please provide:
1. Strengths of the current text
2. Areas that need improvement
3. Specific rewrite suggestions
4. Questions a reviewer might ask"""
    response = ollama.generate(model='qwen2.5:7b', prompt=prompt)
    return response['response']
Teaching Assistant¶
import ollama

def create_quiz_questions(topic, difficulty, num_questions):
    """Generate quiz questions on a topic."""
    prompt = f"""Create {num_questions} {difficulty}-level multiple choice questions
about {topic}. For each question:
1. Provide the question
2. Give 4 options (A, B, C, D)
3. Indicate the correct answer
4. Explain why the correct answer is right

Format clearly with separators between questions."""
    response = ollama.generate(model='llama3.2', prompt=prompt)
    return response['response']
# Generate quiz
quiz = create_quiz_questions(
topic="the scientific method",
difficulty="intermediate",
num_questions=5
)
print(quiz)
Data Analysis Helper¶
import ollama

def suggest_analysis(data_description):
    """Suggest statistical analyses for a dataset."""
    prompt = f"""Based on this data description, suggest appropriate statistical
analyses and explain the rationale:

{data_description}

Please provide:
1. Recommended statistical tests/methods
2. Assumptions to check
3. Python/R code snippets for implementation
4. How to interpret potential results"""
    response = ollama.generate(model='qwen2.5:7b', prompt=prompt)
    return response['response']
Further Resources¶
- Ollama Website: ollama.com
- Model Library: ollama.com/library
- GitHub Repository: github.com/ollama/ollama
- API Documentation: github.com/ollama/ollama/blob/main/docs/api.md
- Discord Community: discord.gg/ollama
Related Workshop Materials¶
- Hugging Face: Find and download models for use with Ollama
- Gradio: Build web interfaces for your Ollama-powered applications
- RAG: Implement retrieval-augmented generation with local models
- Jupyter AI: Integrate AI assistance into your research notebooks
- Agentic AI: Build autonomous AI agents with local models
- MCP: Connect Ollama to external tools and data sources
Getting Started Recommendation
If you're new to running local AI models, start with these steps:
- Install Ollama using the method for your operating system
- Download a small model: `ollama pull llama3.2:3b`
- Try interactive chat: `ollama run llama3.2:3b`
- Experiment with the Python library for programmatic access
- Create a custom Modelfile for your specific use case
Once comfortable, explore larger models and integrations with tools like Open WebUI or LangChain.