This page covers the managed local inference endpoint (Documentation Index
Fetch the complete documentation index at: https://mintlify.com/NVIDIA/OpenShell/llms.txt
Use this file to discover all available pages before exploring further.
https://inference.local). External inference endpoints are controlled by sandbox network_policies — see Policies for details.
The configuration requires two values:
| Value | Description |
|---|---|
| Provider record | The credential backend OpenShell uses to authenticate with the upstream model host. |
| Model ID | The model to use for generation requests. |
Configure the inference backend
Create a provider
Create a provider record that holds the backend credentials OpenShell will use when forwarding requests from This reads
inference.local.- NVIDIA API Catalog
- OpenAI-compatible
- Local endpoint
- Anthropic
NVIDIA_API_KEY from your environment.Set inference routing
Point By default,
inference.local at the provider you created and choose the model to use:openshell inference set probes the resolved upstream endpoint before saving. If the endpoint is not live yet, add --no-verify to persist the route without the probe.Update part of the config
Useopenshell inference update when you want to change only one field without repeating the other:
Use the local endpoint from a sandbox
After inference is configured, code inside any sandbox can callhttps://inference.local directly:
model and api_key values supplied by the client are not sent upstream. The privacy router injects the real credentials from the configured provider and rewrites the model before forwarding.
Some SDKs require a non-empty API key value even though
inference.local does not use it. Pass any placeholder such as test or unused in those cases.Verify end-to-end from a sandbox
To confirm connectivity from inside a sandbox:How gateway-level config applies
- Gateway-scoped: The active provider and model apply to every sandbox using that gateway. All sandboxes see the same
inference.localbackend. - HTTPS only:
inference.localis intercepted only for HTTPS traffic. - Hot-refresh: Provider and inference changes are picked up within about 5 seconds by default. Sandboxes do not need to be restarted.
Self-hosted NIM endpoint
To configureinference.local to forward to a self-hosted NVIDIA NIM instance:
Local inference with Ollama
To pointinference.local at an Ollama instance running on the same host as the gateway:
:cloud tag suffix (for example, qwen3.5:cloud), which do not require local hardware.
For a fully self-contained Ollama setup — with Ollama running inside the sandbox itself — see the local inference tutorial.
Troubleshooting
Endpoint probe fails oninference set
openshell inference set verifies the upstream endpoint before saving by default. If the model server is not running yet, use --no-verify to save the config first and retry verification later.
Requests fail with a connection error inside the sandbox
Check that the upstream server is bound to 0.0.0.0 rather than 127.0.0.1. Requests to inference.local originate from the gateway runtime, so loopback addresses are not reachable. Use host.openshell.internal or the host’s LAN IP in the provider’s OPENAI_BASE_URL.
SDK rejects an empty API key
Some SDKs validate that the API key field is non-empty before sending the request. Pass any non-empty placeholder — inference.local ignores whatever value the sandbox provides.
Changes not taking effect
Hot-refresh propagates within about 5 seconds. If a sandbox still uses the old config after that window, check openshell inference get to confirm the saved configuration is correct.
Next steps
Inference routing overview
Understand the two routing paths and supported API patterns.
Manage providers
View and update provider credential records.
Local inference tutorial
Complete walkthrough for local inference with Ollama and LM Studio.