Past the chat window — into your filesystem, your shell, and your research stack. Vibe coding · MCP · sandboxes · safety.
A chat window. One turn at a time. The model sees what you paste. You see what it writes. The boundary between "your computer" and "the AI" is a textbox.
The model reads your repo, edits files, runs your tests, calls APIs, queries databases — through standardized tool protocols. You approve actions, not paragraphs.
Everything in the basics deck still applies — CRAFT, role, format. But the failure modes change: an unclear prompt no longer wastes a turn. It wastes a file.
The IDE / extension / CLI / browser landscape. How to choose a surface for the work.
What an agent on your machine can actually do — and how to scope its authority.
The Model Context Protocol — USB-C for AI. Standard plug for tools, data, resources.
Containers, VMs, hosted notebooks. Where you let the agent run hot without setting the lab on fire.
Companion page: tyson-swetnam.github.io/intro-gpt/vibe/ and /research/. Every section below has a deeper write-up there.
An LLM, your IDE, your repo. The fastest way to ship code you don't fully understand — and the fastest way to break something you do.
"There's a new kind of coding I call vibe coding, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." (Andrej Karpathy, 2025)
Using an LLM to generate and edit code directly inside your IDE — the model is a collaborative partner, not a search engine you copy-paste from.
Speed in exchange for meaningful authority over your machine. Files, network, shell. Worth knowing what you handed over.
Full-fat editing experience. Agent sees the open repo.
Bring the agent to where you already work. BYOM in some cases.
Scriptable and headless. Drops into pipelines and CI.
No local filesystem. Code runs in a hosted Python or Node env.
Authority increases left-to-right in your head, but is actually highest in 01 · Desktop IDE: that's the surface that touches your real files.
| If you are… | Surface | Reach for | Notes |
|---|---|---|---|
| New to vibe coding, exploring | Browser | ChatGPT / Gemini / Claude.ai | No install, sandboxed Python, zero blast radius. |
| A researcher with R / Python notebooks | Desktop IDE | Positron · Cursor · VS Code | Native data-science tooling, agent has repo context. |
| Comfortable in terminal, scripting workflows | CLI | Claude Code CLI · Aider · Gemini CLI | Pipe into CI, run headless against many repos. |
| Working with sensitive data on your laptop | BYOM | Cline + Ollama · Aider + local LLM | Self-hosted model; nothing leaves the machine. |
| Already deep in GitHub PR review | VS Code ext | GitHub Copilot · Claude Code | Lives in the editor you already have open. |
None of these are permanent. The cost of switching is one afternoon of muscle memory.
Once you grant filesystem access, the agent has everything your user does. That's the deal. Understand what it can reach.
Read, modify, and delete files anywhere your user has permission. Not just the open repo — your home directory, your Downloads, your Dropbox.
Make API calls, fetch URLs, exfiltrate data, install packages. The agent has your IP and your connectivity.
Execute arbitrary commands — file deletion, force pushes, cloud sync. The terminal is the terminal.
Read environment variables — including secrets your shell exposes. API keys, database URLs, auth tokens.
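One way to see what the fourth item means in practice is to list which environment variable names in your current shell look like credentials. The sketch below is a rough audit, not a security tool: the name patterns in `SECRET_PATTERN` are an assumption and will miss secrets with unusual names. It prints names only, never values.

```python
import os
import re

# Name fragments that commonly indicate credentials.
# This list is an illustrative assumption, not an exhaustive rule.
SECRET_PATTERN = re.compile(
    r"(KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|DATABASE_URL)",
    re.IGNORECASE,
)


def find_exposed_secret_names(env=None):
    """Return sorted names of environment variables that look like secrets.

    Only names are returned, never values, so the output is safe to share.
    """
    env = os.environ if env is None else env
    return sorted(name for name in env if SECRET_PATTERN.search(name))


if __name__ == "__main__":
    # Anything printed here is readable by an agent launched from this shell.
    for name in find_exposed_secret_names():
        print(name)
```

Run it in the same terminal you would launch the agent from. Anything it lists is a candidate for moving into a secrets manager or a narrower-scoped shell.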
One protocol, any tool. The plug that lets the same agent talk to your filesystem, your database, your calendar, your lab instruments.
Before MCP: every agent shipped its own plugin system. Connecting Claude to your filesystem meant one integration; connecting Cursor to the same filesystem meant another.
After MCP: write one server. Any compatible client can use it. The "M×N" problem becomes "M + N".
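The arithmetic behind "M×N becomes M + N" is easy to check. The sketch assumes M counts clients (agents) and N counts tools or data sources, which the slide does not spell out.

```python
def integrations_without_protocol(clients: int, tools: int) -> int:
    """Every client needs its own bespoke integration for every tool."""
    return clients * tools


def integrations_with_protocol(clients: int, tools: int) -> int:
    """Each client implements the protocol once; each tool ships one server."""
    return clients + tools


if __name__ == "__main__":
    m, n = 5, 8  # illustrative numbers, not measurements
    print(integrations_without_protocol(m, n))  # 40
    print(integrations_with_protocol(m, n))     # 13
```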
Introduced by Anthropic as an open spec at modelcontextprotocol.io. Implementations from Anthropic, OpenAI, and the community.
Functions the model can call — read a file, run a query, send an email.
Data the model can pull in — files, database rows, API responses.
Reusable templates the server publishes for the client to surface.
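On the wire, the three primitives above are JSON-RPC 2.0 requests from client to server. The sketch below builds the request shapes for calling a tool, reading a resource, and fetching a prompt. The method names follow the MCP spec as I understand it. The tool name `read_file`, the prompt name, and the URI in the usage example are hypothetical. Real clients also perform an initialization handshake, which is omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method, params):
    """Wrap a method and its params in a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}


def call_tool(name, arguments):
    """Tools: functions the model can call."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def read_resource(uri):
    """Resources: data the model can pull in, addressed by URI."""
    return jsonrpc_request("resources/read", {"uri": uri})


def get_prompt(name, arguments=None):
    """Prompts: reusable templates published by the server."""
    return jsonrpc_request("prompts/get", {"name": name, "arguments": arguments or {}})


if __name__ == "__main__":
    # Hypothetical tool name and argument, for illustration only.
    print(json.dumps(call_tool("read_file", {"path": "notes.md"}), indent=2))
```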
The app the user talks to. Manages the model, the chat, and the list of attached servers.
also: Cursor · VS Code · Cline · Claude Code
Each server is its own process with its own permissions. You enable them one at a time in the host's config — and you revoke them the same way.
    // claude_desktop_config.json
    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": [
            "-y",
            "@modelcontextprotocol/server-filesystem",
            "/Users/tswetnam/research/phai-2026"
          ]
        },
        "postgres": {
          "command": "npx",
          "args": ["-y", "@mcp/server-postgres"],
          "env": { "DATABASE_URL": "..." }
        }
      }
    }
Claude can now read & write files in your project folder and query your local Postgres — through tool calls you approve in the chat.
The path you pass in is the leash — the filesystem server cannot reach outside it. Use this. Point each MCP at the narrowest scope that does the job.
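The "leash" idea can be illustrated with a containment check: resolve the requested path and refuse anything that lands outside the allowed root. This is a minimal sketch of the general technique, not the filesystem server's actual code. The function name `is_within_root` and the example paths are made up for illustration.

```python
from pathlib import Path


def is_within_root(candidate, root):
    """Return True if `candidate` resolves to a location inside `root`.

    Relative candidates are interpreted relative to the root. Resolving first
    collapses `..` segments and follows symlinks, so traversal tricks like
    `../other/secret.txt` are caught.
    """
    root_resolved = Path(root).resolve()
    candidate_resolved = (root_resolved / candidate).resolve()
    return (
        candidate_resolved == root_resolved
        or root_resolved in candidate_resolved.parents
    )


if __name__ == "__main__":
    root = "/tmp/project"  # hypothetical allowed directory
    for path in ["data/cohort.csv", "../other/secret.txt", "/etc/passwd"]:
        print(path, is_within_root(path, root))
```

The narrower the root, the less a bad tool call can touch, which is why pointing the server at a single project folder beats pointing it at your home directory.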
The point of a sandbox: somewhere the agent can experiment, install random packages, and break things without taking your laptop down with it.
ChatGPT's Python sandbox, Gemini code execution, Claude artifacts. The agent runs code in someone else's container.
Docker or VS Code devcontainer. Agent runs against your real repo but inside an isolated FS + network.
Full virtual machine or institutional enclave. For HIPAA, FERPA, CUI, or anything where leakage is unacceptable.
Public health work crosses all three tiers in a week. Synthetic data → soft. Cohort exploration → medium. Patient-level analysis → hard. Match the tier to the row.
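For the middle tier, a container with the repo mounted and networking switched off is a reasonable starting point. The sketch below only assembles a `docker run` command line and does not execute it. The image name, the `/workspace` mount point, and the function name are assumptions chosen for illustration. A devcontainer would express the same idea declaratively.

```python
import shlex
from pathlib import Path


def sandbox_command(repo_dir, image="python:3.12-slim", command=("bash",), allow_network=False):
    """Build a `docker run` argv that mounts one repo into a throwaway container."""
    repo = Path(repo_dir).resolve()
    argv = ["docker", "run", "--rm", "-it"]  # --rm: discard the container on exit
    if not allow_network:
        argv += ["--network", "none"]  # no outbound network from inside the sandbox
    argv += [
        "-v", f"{repo}:/workspace",  # only this directory is visible to the agent
        "-w", "/workspace",
        image,
        *command,
    ]
    return argv


if __name__ == "__main__":
    # Print the command so it can be reviewed before running it.
    print(shlex.join(sandbox_command(".")))
```

Note that the mount is read-write, so the agent can still damage the repo itself. Commit or branch before letting it loose.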
Keep a human in the loop at step 03 (approval) for high-stakes actions, and at step 04 (observation) for everything. Don't auto-approve write tools.
Steps 03→04 may run 50 iterations before surfacing to you. Each iteration touches the FS. A bad plan in a sandbox is a wasted minute; on your laptop it's a restore.
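The approve-then-observe loop can be sketched in a few lines. This is a toy model under stated assumptions, not how any particular agent is implemented: the tool names in `WRITE_TOOLS`, the `Action` type, and the callback signatures are all hypothetical. Read-only tools run freely, write tools need approval, and the loop is capped at a fixed number of iterations.

```python
from dataclasses import dataclass, field

# Hypothetical names for tools that modify state and therefore need approval.
WRITE_TOOLS = {"write_file", "delete_file", "run_shell"}


@dataclass
class Action:
    tool: str
    args: dict = field(default_factory=dict)


def run_agent_loop(plan_next, execute, approve, max_iterations=50):
    """Run plan -> approve -> execute -> observe until the planner stops.

    plan_next(history) returns the next Action, or None when done.
    approve(action) is the human gate; it is consulted only for write tools.
    execute(action) performs the action and returns an observation.
    """
    history = []
    for _ in range(max_iterations):
        action = plan_next(history)
        if action is None:
            break
        if action.tool in WRITE_TOOLS and not approve(action):
            # Record the refusal so the planner can see it on the next turn.
            history.append((action, "denied"))
            continue
        history.append((action, execute(action)))
    return history
```

In a real session `approve` is you clicking a button. Replacing it with a function that always returns `True` is what auto-approving write tools amounts to.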
Literature review · hypothesis generation · code & data analysis. The same stack, pointed at a real research problem.
Search & summarize
The "what's out there on X?" tool. Web-first, citations inline, fast.
Multi-step report
You pose a research question, it produces a structured report over many minutes.
Your own corpus
Upload your PDFs / docs / audio. Grounded answers, no hallucinated citations.
Peer-reviewed
Custom GPTs and tools pinned to academic indexes. For when "web" isn't enough.
Rule of thumb: Perplexity for scoping, ScholarAI for citations you'll actually cite, NotebookLM for synthesizing what you've already collected.
I want you to act as a data scientist with complete knowledge of R, the TidyVerse, and RStudio. Write the code to: 1. Create a new R project env 2. Load Palmer Penguins 3. Plot regressions of body mass, bill length & width by species Output as R + RMarkdown with text and code in ``` blocks.
I want you to respond as though you are the mathematician Benoit Mandelbrot. Explain the relationship of lacunarity and fractal dimension for a self-affine series. Show results using mathematical equations in LaTeX or MathJax.
Try with and without web search enabled. The deltas tell you which claims the model is generating vs. retrieving.
| Surface | Where it runs | Sees your files? | Good for |
|---|---|---|---|
| ChatGPT Python tool | OpenAI sandbox | No | Quick plots, data wrangling, "show me what this CSV looks like" |
| Gemini code execution | Google sandbox | No | Inline Python results in chat, large-context analysis |
| Claude Code (CLI / IDE) | Your machine | Yes (directly; MCP extends its reach) | Real repo work, multi-file refactors, test runs |
| Jupyter AI | Your kernel | Yes | Notebook-native AI in JupyterLab; great for R / Python research |
| Cline + Ollama | Your machine, local model | Yes | Sensitive code; nothing leaves the laptop |
Match the row to the data classification. Synthetic test data → row 1. De-identified cohort → row 3. IRB-sensitive → row 5, or push to an institutional enclave.
Pick a real task from this week. Spin up a devcontainer. Attach one MCP. Watch the agent loop. Approve every write. Decide what to delegate next.