University of Arizona · BIO5 Institute

PH&AI Summer School · 2nd Edition

AI Literacy Track · Deep Dive

Agentic Engineering.

Past the chat window — into your filesystem, your shell, and your research stack. Vibe coding · MCP · sandboxes · safety.

Tyson Swetnam, PhD

Associate Professor · BIO5 Institute

Grand Challenges Research Building
Tucson, AZ · June 2026

PH&AI · AI Literacy

02 / The Shift

What's Different This Time

The model isn't reading your prompt. It's reading your filesystem.

Yesterday · the basics deck

Prompt → response.

A chat window. One turn at a time. The model sees what you paste. You see what it writes. The boundary between "your computer" and "the AI" is a textbox.

Today · this deck

Prompt → tool calls → effects.

The model reads your repo, edits files, runs your tests, calls APIs, queries databases — through standardized tool protocols. You approve actions, not paragraphs.

Everything in the basics deck still applies — CRAFT, role, format. But the failure modes change: an unclear prompt no longer wastes a turn. It wastes a file.

03 / Outcome

What You'll Leave With

Four lenses for the agentic stack.

01.

Vibe coding.

The IDE / extension / CLI / browser landscape. How to choose a surface for the work.

02.

Local filesystems.

What an agent on your machine can actually do — and how to scope its authority.

03.

MCP.

The Model Context Protocol — USB-C for AI. Standard plug for tools, data, resources.

04.

Sandboxes.

Containers, VMs, hosted notebooks. Where you let the agent run hot without setting the lab on fire.

Companion page: tyson-swetnam.github.io/intro-gpt/vibe/ and /research/. Every section below has a deeper write-up there.

Part 01 · Vibe Coding

Part 01

Vibe coding.

An LLM, your IDE, your repo. The fastest way to ship code you don't fully understand — and the fastest way to break something you do.

05 / Origin

February 2025 · Karpathy on X

Where the term came from — and what it actually means.

There's a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists.

Andrej Karpathy · Feb 2 2025

Working definition

Using an LLM to generate and edit code directly inside your IDE — the model is a collaborative partner, not a search engine you copy-paste from.

The trade

Speed in exchange for meaningful authority over your machine. Files, network, shell. Worth knowing what you handed over.

06 / Surfaces

Pick The Surface That Matches The Job

Four families. Different authority, different blast radius.

01 · Desktop IDE

Standalone editors.

Full-fat editing experience. Agent sees the open repo.

Claude Desktop
VS Code
Cursor
Positron (Posit)
Windsurf
Antigravity

02 · VS Code ext

Inside your editor.

Bring the agent to where you already work. Bring your own model (BYOM) in some cases.

Claude Code
Gemini CLI Companion
OpenAI Codex
GitHub Copilot
Cline
Roo Code

03 · CLI

Terminal-first.

Scriptable and headless; easy to wire into pipelines and CI.

Aider
Claude Code CLI
OpenAI Codex CLI
Gemini CLI
OpenCode.ai

04 · Browser

Sandboxed by default.

No local filesystem. Code runs in a hosted Python or Node env.

Claude Code (web)
ChatGPT
Google Gemini
OpenWebUI (self-host)

Authority may feel like it increases left to right, but it is actually highest in 01 through 03: desktop IDEs, editor extensions, and CLI tools all touch your real files. Only 04 · Browser is sandboxed by default.

07 / Choosing

A First-Pass Decision Table

Match the surface to the situation. Try one. Switch when it stops fitting.

If you are…	Surface	Reach for	Notes
New to vibe coding, exploring	Browser	ChatGPT / Gemini / Claude.ai	No install, sandboxed Python, zero blast radius.
A researcher with R / Python notebooks	Desktop IDE	Positron · Cursor · VS Code	Native data-science tooling, agent has repo context.
Comfortable in terminal, scripting workflows	CLI	Claude Code CLI · Aider · Gemini CLI	Pipe into CI, run headless against many repos.
Working with sensitive data on your laptop	BYOM	Cline + Ollama · Aider + local LLM	Self-hosted model; nothing leaves the machine.
Already deep in GitHub PR review	VS Code ext	GitHub Copilot · Claude Code	Lives in the editor you already have open.

None of these are permanent. The cost of switching is one afternoon of muscle memory.

Part 02 · Local Filesystems

Part 02

Your filesystem is a tool.

Once you grant filesystem access, the agent has everything your user does. That's the deal. Understand what it can reach.

09 / Capabilities

What An Agent On Your Machine Can Actually Do

Four authorities. Granted with one click. Worth saying out loud.

FS.

Filesystem.

Read, modify, and delete files anywhere your user has permission. Not just the open repo — your home directory, your Downloads, your Dropbox.

NET.

Network.

Make API calls, fetch URLs, exfiltrate data, install packages. The agent has your IP and your connectivity.

SH.

Shell.

Execute arbitrary commands — file deletion, force pushes, cloud sync. The terminal is the terminal.

ENV.

Environment.

Read environment variables — including secrets your shell exposes. API keys, database URLs, auth tokens.

Heads-up

LLMs occasionally hallucinate package names. Attackers register them on PyPI / npm. If your agent installs dependencies without review, it can pull in malicious code. Read the requirements.txt diff.
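
One way to make "read the requirements.txt diff" routine is to list only the package names an agent added, so each new name can be looked up before anything is installed. The sketch below is a minimal illustration using the standard library; the function names and the sample package name are hypothetical, and it handles only simple `name==version` style lines, not every form pip accepts.

```python
import re


def package_names(requirements_text: str) -> set:
    """Return normalized package names from requirements.txt-style text."""
    names = set()
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r / -e
            continue
        # The name is the leading run of characters before any version specifier.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalize the way package indexes do: lowercase, runs of -_. become "-".
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def newly_added(before: str, after: str) -> list:
    """Names present in the new file but not in the old one."""
    return sorted(package_names(after) - package_names(before))


before = "numpy==1.26.4\npandas>=2.0\n"
# "requests-toolkit-pro" is a made-up name standing in for a hallucinated dependency.
after = "numpy==1.26.4\npandas>=2.0\nrequests-toolkit-pro==0.1  # added by agent\n"

for name in newly_added(before, after):
    print(f"REVIEW BEFORE INSTALLING: {name}")
```

Every name this prints is a prompt for a human check (does the project exist, who publishes it, how old is it), not a verdict on whether the package is safe.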

10 / Practices

Practices That Keep This Manageable

Seven habits. Adopt them before you point an agent at code that matters.

01 · Review

Read every command before approving. Most tools prompt — don't auto-approve everything.

02 · Scope

Work inside project-specific virtual environments, not at user-root.

03 · Secrets

Never store secrets in code. Use environment variables and secret managers.

04 · Sudo

Be cautious with sudo or administrator privileges. Agents rarely need them.

05 · Monitor

Actively watch agent actions when you're learning a new tool. Trust comes from observation.

06 · Policy

Follow your institution's security and privacy policies. UA has classifications. So does your IRB.

07 · Sandbox

For sensitive work, use containers or VMs — covered in Part 04.
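
Habits 02 and 03 interact with the ENV authority described earlier: anything exported in your shell is readable by a process you launch from it. A possible mitigation, sketched here with the standard library only, is to start the agent (or any tool it runs) with an allowlisted environment. The allowlist, the helper names, and the fake key below are illustrative assumptions, not a recommendation for any specific tool.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: only these variables are passed to the child process.
ALLOWED = ("PATH", "HOME", "LANG", "TERM")


def scrubbed_env(source=None, allowed=ALLOWED):
    """Copy only allowlisted variables from the source environment."""
    source = os.environ if source is None else source
    return {key: value for key, value in source.items() if key in allowed}


def run_scrubbed(argv, source=None):
    """Run a command that sees only the scrubbed environment."""
    return subprocess.run(
        argv, env=scrubbed_env(source), capture_output=True, text=True, check=False
    )


# A stand-in for a real shell environment that happens to hold a secret.
fake_shell_env = {
    "PATH": os.environ.get("PATH", ""),
    "HOME": "/home/me",
    "OPENAI_API_KEY": "sk-not-a-real-key",
}

child = run_scrubbed(
    [sys.executable, "-c", "import os; print(sorted(os.environ))"],
    source=fake_shell_env,
)
print(child.stdout)  # the secret's name should not appear in this listing
```

This narrows what a child process inherits; it does not stop a process from reading secret files on disk, which is what the sandbox tiers in Part 04 are for.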

Part 03 · Model Context Protocol

Part 03 · MCP

USB-C for AI.

One protocol, any tool. The plug that lets the same agent talk to your filesystem, your database, your calendar, your lab instruments.

12 / MCP · What

Model Context Protocol

An open protocol that standardizes how applications provide context to LLMs.

Before MCP: every agent shipped its own plugin system. Connecting Claude to your filesystem meant one integration; connecting Cursor to the same filesystem meant another.

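After MCP: write one server. Any compatible client can use it. The "M×N" problem becomes "M + N": 4 clients and 6 tools need 4 × 6 = 24 bespoke integrations without MCP, but only 4 + 6 = 10 implementations with it.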

Maintained by

Anthropic, open spec at modelcontextprotocol.io. Implementations from Anthropic, OpenAI, community.

Tools

Functions the model can call — read a file, run a query, send an email.

Resources

Data the model can pull in — files, database rows, API responses.

Prompts

Reusable templates the server publishes for the client to surface.
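
To make the three primitives concrete, here is a rough sketch of the JSON-RPC 2.0 requests a client might send for each. The method names (`tools/list`, `tools/call`, `resources/read`, `prompts/get`) come from the MCP specification; the tool name, file paths, and prompt name are hypothetical placeholders, and real clients normally use an SDK rather than building messages by hand.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def rpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Tools: discover what the server offers, then call one by name.
list_tools = rpc_request("tools/list")
call_tool = rpc_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "notes/README.md"}},  # hypothetical tool
)

# Resources: pull data in by URI.
read_resource = rpc_request(
    "resources/read", {"uri": "file:///project/data/cohort.csv"}  # hypothetical URI
)

# Prompts: fetch a server-published template, filling in its arguments.
get_prompt = rpc_request(
    "prompts/get",
    {"name": "summarize_methods", "arguments": {"style": "brief"}},  # hypothetical prompt
)

for message in (list_tools, call_tool, read_resource, get_prompt):
    print(json.dumps(message))
```

The point of the sketch is the shape: every capability is a named method with structured parameters, which is what lets a host show you exactly what is about to be called before you approve it.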

13 / MCP · How

Anatomy Of An MCP Connection

One host. One protocol. Many servers, each scoped to one job.

Host / Client

Claude Desktop

The app the user talks to. Manages the model, the chat, and the list of attached servers.

also: Cursor · VS Code · Cline · Claude Code

MCP · JSON-RPC 2.0 over stdio or Streamable HTTP (which replaced the earlier HTTP + SSE transport)

Servers · each a separate process

filesystem: read · write · list
github: issues · prs
postgres: query · schema
slack: channels · messages
your-own-thing: lab gear · REDCap

Each server is its own process with its own permissions. You enable them one at a time in the host's config — and you revoke them the same way.

14 / MCP · Config

From Zero To Connected

Connecting Claude Desktop to your project folder — the actual JSON.

// claude_desktop_config.json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/tswetnam/research/phai-2026"
      ]
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "..."]
    }
  }
}

What this buys you

Claude can now read & write files in your project folder and query your local Postgres — through tool calls you approve in the chat.

Scoping rule

The path you pass in is the leash — the filesystem server cannot reach outside it. Use this. Point each MCP at the narrowest scope that does the job.
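
The scoping rule amounts to a containment check: resolve the requested path, then refuse it unless it still sits under the allowed root. The sketch below illustrates that idea with `pathlib` (Python 3.9+ for `is_relative_to`). It is not the filesystem server's actual implementation, and the root and file names are hypothetical.

```python
from pathlib import Path


def is_within(root: str, candidate: str) -> bool:
    """True if candidate, interpreted relative to root, stays inside root."""
    root_path = Path(root).resolve()
    # Joining with an absolute candidate discards root, so "/etc/passwd" is caught too.
    target = (root_path / candidate).resolve()  # collapses ".." and follows symlinks
    return target.is_relative_to(root_path)


ROOT = "/srv/project"  # hypothetical project folder handed to the server

for requested in ("data/a.csv", "data/../notes.txt", "../secrets.txt", "/etc/passwd"):
    verdict = "allow" if is_within(ROOT, requested) else "deny"
    print(f"{verdict:5}  {requested}")
```

Resolving before comparing is the important step: a plain string prefix test would accept `../` tricks and sibling folders that merely share a prefix with the root.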

Part 04 · AI Sandboxes

Part 04

Let it run hot.

The point of a sandbox: somewhere the agent can experiment, install random packages, and break things without taking your laptop down with it.

16 / Sandboxes

Three Tiers Of Isolation

Pick the tier that matches the data sensitivity, not your comfort.

Tier 01 · Soft

Hosted notebook.

ChatGPT's Python sandbox, Gemini code execution, Claude artifacts. The agent runs code in someone else's container.

Zero install
No local file access
Ephemeral by default
Data leaves your machine

Tier 02 · Medium

Container / devcontainer.

Docker or VS Code devcontainer. Agent runs against your real repo but inside an isolated FS + network.

Reproducible env
Limited host access
Survives destructive commands
Network still escapes

Tier 03 · Hard

VM / data enclave.

Full virtual machine or institutional enclave. For HIPAA, FERPA, CUI, or anything where leakage is unacceptable.

Strong isolation
Auditable
Often offline
Heavy to spin up

Public health work crosses all three tiers in a week. Synthetic data → soft. Cohort exploration → medium. Patient-level analysis → hard. Match the tier to the row.
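
For the medium tier, the "network still escapes" caveat can be tightened with standard `docker run` flags. The sketch below only assembles and prints a command so it can be reviewed before anyone runs it; the repo path, image, and resource limits are assumptions chosen for illustration, not settings from this deck.

```python
import shlex


def sandbox_command(repo_dir, image="python:3.12-slim", allow_network=False):
    """Build a docker run command for a throwaway, locked-down shell."""
    argv = [
        "docker", "run", "--rm", "-it",  # remove the container on exit
        "--cap-drop", "ALL",             # drop Linux capabilities
        "--pids-limit", "256",           # cap runaway process creation
        "--memory", "2g",                # cap memory use
        "-v", f"{repo_dir}:/work",       # mount only the project folder
        "-w", "/work",
    ]
    if not allow_network:
        argv += ["--network", "none"]    # no network access at all
    argv += [image, "bash"]
    return argv


# Hypothetical project path; printing lets you read the command before running it.
print(shlex.join(sandbox_command("/home/me/research/project")))
```

With the network off the agent cannot install packages, so a common pattern is to build the image with dependencies first and run the agent's work offline. The mounted folder is still writable, which is why it should be under version control.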

17 / The Loop

The Agentic Loop

Plan → act → observe → revise. The model that does this in a container is the one you trust.

Goal → Plan → Act · tool call → Observe ↻ Revise & loop

Where humans belong

At step 03 (Act), approving high-stakes actions, and at step 04 (Observe), watching everything. Don't auto-approve write tools.

Why the sandbox matters

Steps 03→04 may run 50 iterations before surfacing to you. Each iteration touches the FS. A bad plan in a sandbox is a wasted minute; on your laptop it's a restore.
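
The loop and its two human checkpoints can be sketched in a few lines. This is a toy model under stated assumptions: the "planner" is a scripted list rather than an LLM, the tools act on an in-memory dictionary, and every name here is hypothetical. What it shows is the control flow: an iteration cap, an approval gate on write tools, and a history the human can inspect.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    args: dict
    writes: bool = False  # write tools require human approval


def run_loop(plan_next, tools, approve, max_steps=50):
    """Plan, act, observe, repeat; stop when the planner returns None."""
    history = []
    for _ in range(max_steps):                 # hard cap on iterations
        call = plan_next(history)              # Plan
        if call is None:
            break
        if call.writes and not approve(call):  # human gate before acting
            history.append((call, "DENIED by human"))
            continue
        result = tools[call.name](**call.args)  # Act
        history.append((call, result))          # Observe
    return history


# Toy tools over an in-memory "filesystem".
files = {"a.txt": "hello"}
tools = {
    "read": lambda path: files[path],
    "write": lambda path, text: files.__setitem__(path, text) or "ok",
}

# Scripted stand-in for the model's planning step.
script = [
    ToolCall("read", {"path": "a.txt"}),
    ToolCall("write", {"path": "a.txt", "text": "bye"}, writes=True),
]


def scripted_planner(history):
    return script[len(history)] if len(history) < len(script) else None


history = run_loop(scripted_planner, tools, approve=lambda call: False)
for call, result in history:
    print(call.name, "->", result)
```

Here the read goes through and the write is refused, so `files` is unchanged. Swapping `approve` for a function that prompts the user is the smallest step toward the behavior most agent tools already offer.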

Part 05 · Agentic Research

Part 05

Putting it to work.

Literature review · hypothesis generation · code & data analysis. The same stack, pointed at a real research problem.

19 / Literature

Literature Review & Synthesis

Four tools. Four jobs. Don't reach for the same one twice.

Perplexity

Search & summarize

The "what's out there on X?" tool. Web-first, citations inline, fast.

Gemini Deep Research

Multi-step report

You pose a research question, it produces a structured report over many minutes.

NotebookLM

Your own corpus

Upload your PDFs / docs / audio. Answers are grounded in your sources and cite the passages they draw on, so each citation can be checked.

ScholarAI / Semantic Scholar

Peer-reviewed

Custom GPTs and tools pinned to academic indexes. For when "web" isn't enough.

Rule of thumb: Perplexity for scoping, ScholarAI for citations you'll actually cite, NotebookLM for synthesizing what you've already collected.

20 / Hypothesis

Hypothesis Generation Via Role-Play

Roles aren't just style. They route which patterns the model surfaces.

Pattern 01 · Domain expert

I want you to act as a data scientist
with complete knowledge of R, the
tidyverse, and RStudio.

Write the code to:
1. Create a new R project env
2. Load Palmer Penguins
3. Plot regressions of body mass,
   bill length & depth by species

Output as R + RMarkdown with text
and code in ``` blocks.

Pattern 02 · Talk to a dead scientist

I want you to respond as though
you are the mathematician
Benoit Mandelbrot.

Explain the relationship of
lacunarity and fractal dimension
for a self-affine series.

Show results using mathematical
equations in LaTeX or MathJax.

Try with and without web search enabled. The deltas tell you which claims the model is generating vs. retrieving.

21 / Code Execution

Where The Code Actually Runs

Same prompt. Five very different places it could execute.

Surface	Where it runs	Sees your files?	Good for
ChatGPT Python tool	OpenAI sandbox	No	Quick plots, data wrangling, "show me what this CSV looks like"
Gemini code execution	Google sandbox	No	Inline Python results in chat, large-context analysis
Claude Code (CLI / IDE)	Your machine	Yes (built-in file tools; MCP optional)	Real repo work, multi-file refactors, test runs
Jupyter AI	Your kernel	Yes	Notebook-native AI in JupyterLab; great for R / Python research
Cline + Ollama	Your machine, local model	Yes	Sensitive code; nothing leaves the laptop

Match the row to the data classification. Synthetic test data → row 1. De-identified cohort → row 3. IRB-sensitive → row 5, or push to an institutional enclave.

22 / Safety

Coding Safely With AI

Six categories. Work through them in order before pointing an agent at code that matters.

Review every line.

Correctness, efficiency, maintainability. Test edge cases. Check for SQL injection, XSS, weak auth, secret leakage. If you don't understand a block, ask the AI to explain it.

Local execution risks.

FS, network, shell, env vars. Review commands before approving. Project-scoped venvs. Never store secrets in code. Avoid admin privileges.

Bias, licensing, IP.

Models reproduce non-inclusive identifiers, deprecated patterns, licensed code. AI-generated code is "common practice," not "best practice." Check license compatibility. Document AI use.

Privacy & data handling.

With hosted models, prompts, file contents, terminal output, and project metadata all leave your machine. Don't share PHI/PII in prompts. For sensitive code: Cline+Ollama, Aider+local LLM.

Accessibility.

Ask for WCAG review on UI code. Alt text, ARIA, color contrast, keyboard nav. Audit identifiers and comments for inclusive language.

Environmental footprint.

Don't use a frontier model when a smaller one will do. Cache results. Avoid agentic loops that fire speculative requests. Cumulative compute is the cost.
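
"Cache results" can be as simple as keying each response on the model name and the exact prompt, so a repeated question costs nothing. The sketch below is a minimal in-memory version; `fake_model` is a stand-in for a real API call, and the class and model names are hypothetical.

```python
import hashlib
import json


class PromptCache:
    """Memoize model responses by (model, prompt)."""

    def __init__(self, call_model):
        self._call_model = call_model
        self._store = {}
        self.misses = 0  # number of times the model was actually called

    def ask(self, model, prompt):
        key = hashlib.sha256(json.dumps([model, prompt]).encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._call_model(model, prompt)
        return self._store[key]


def fake_model(model, prompt):
    # Stand-in for a real API call; returns a canned string.
    return f"[{model}] answer to: {prompt}"


cache = PromptCache(fake_model)
first = cache.ask("small-model", "Summarize the methods section.")
second = cache.ask("small-model", "Summarize the methods section.")
print(first == second, cache.misses)  # prints: True 1
```

A cache like this only helps when identical prompts recur and a repeated answer is acceptable; for exploratory work where you want variation, it would hide it.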

23 / Resources

Bookmarks

Where to keep reading after the room empties out.

Vibe Coding

tyson-swetnam.github.io/intro-gpt/vibe/

Research

tyson-swetnam.github.io/intro-gpt/research/

Agentic AI

tyson-swetnam.github.io/intro-gpt/agentic/

MCP

modelcontextprotocol.io · /intro-gpt/mcp/

Sandboxes

tyson-swetnam.github.io/intro-gpt/ai_sandboxes/

Local models

ollama.com · /intro-gpt/ollama/

Claude Code

docs.anthropic.com/en/docs/claude-code

End · Q&A

Now point one at something.

Boot a sandbox.

Pick a real task from this week. Spin up a devcontainer. Attach one MCP. Watch the agent loop. Approve every write. Decide what to delegate next.

Tyson Swetnam, PhD

tswetnam@arizona.edu · @tswetnam

BIO5 Institute
University of Arizona
