This tutorial covers two ways to run local inference with OpenShell: using Ollama or using LM Studio. Both approaches expose a local model backend throughDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/NVIDIA/OpenShell/llms.txt
Use this file to discover all available pages before exploring further.
inference.local so that agents inside a sandbox can make inference requests without reaching external APIs.
- Ollama
- LM Studio
Ollama offers two approaches: a self-contained community sandbox with Ollama pre-installed, or routing sandbox inference to a host-level Ollama instance shared across multiple sandboxes.This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.
To auto-update on every sandbox start:
Prerequisites
- A working OpenShell installation. Complete the Quickstart before proceeding.
Option A: Ollama community sandbox (recommended)
The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.Create the sandbox
Model recommendations
| Use case | Model | Notes |
|---|---|---|
| Smoke test | qwen3.5:0.8b | Fast and lightweight, good for verifying setup |
| Coding and reasoning | qwen3.5 | Strong tool calling support for agentic workflows |
| Complex tasks | nemotron-3-super | 122B parameter model, requires 48 GB+ VRAM |
| No local GPU | qwen3.5:cloud | Runs on Ollama’s cloud infrastructure, no ollama pull required |
Cloud models use the
:cloud tag suffix and do not require local hardware.Tool calling
Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the Ollama model library for the latest additions.Updating Ollama
To update Ollama inside a running sandbox:Option B: Host-level Ollama
Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes throughinference.local.This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.
Install and start Ollama
Install Ollama on the gateway host:Start Ollama on all interfaces so it is reachable from sandboxes:
Pull a model
In a second terminal, pull a model:Type
/bye to exit the interactive session. The model stays loaded.Create a provider
Create an OpenAI-compatible provider pointing at the host Ollama instance:OpenShell injects
host.openshell.internal so sandboxes and the gateway can reach the host machine. You can also use the host’s LAN IP.Troubleshooting
| Problem | Fix |
|---|---|
| Ollama not reachable from sandbox | Ollama must be bound to 0.0.0.0, not 127.0.0.1. The community sandbox handles this automatically. |
Wrong OPENAI_BASE_URL | Use http://host.openshell.internal:11434/v1, not localhost or 127.0.0.1. |
| Model not found | Run ollama ps to confirm the model is loaded. Run ollama pull <model> if needed. |
| HTTPS vs HTTP | Code inside sandboxes must call https://inference.local, not http://. |
| AMD GPU driver issues | Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures. |
GPU support for local inference
Both Ollama and LM Studio can use local GPU resources:- NVIDIA GPUs: Both tools support CUDA automatically when the appropriate drivers are installed. No additional configuration is required in OpenShell.
- AMD GPUs: Ollama v0.18+ requires ROCm 7 drivers. LM Studio uses ROCm automatically on supported hardware.
- Apple Silicon: Both tools use Metal for hardware acceleration on M-series Macs.
- CPU fallback: If no GPU is detected, inference runs on CPU. For most coding assistant workloads, a small quantized model (such as
qwen3.5:0.8b) runs acceptably on CPU.
GPU resources are available to Ollama and LM Studio running on the gateway host. Sandboxes themselves do not have direct GPU access — inference is routed from the sandbox through
inference.local to the host-side backend.What’s next
Managed inference
Learn how OpenShell routes inference requests and manages provider configuration.
Configure inference backends
Configure vLLM, SGLang, TRT-LLM, NVIDIA NIM, or any other OpenAI-compatible backend.
Community sandboxes
Explore pre-built sandbox images for common development workflows.
LM Studio CLI docs
Learn more about the
lms CLI for headless LM Studio usage.