Inference Routing - NVIDIA OpenShell

OpenShell handles inference traffic through two paths: requests to external hosts like api.openai.com, and requests to inference.local, a special endpoint exposed inside every sandbox.

Two routing paths

Path	How it works
External endpoints	Traffic to external hosts is treated like any other outbound request. It is allowed or denied by `network_policies`. See Policies for details.
`inference.local`	A special HTTPS endpoint exposed inside every sandbox. The privacy router strips the sandbox-supplied credentials, injects the configured backend credentials, and forwards the request to the managed model endpoint.

How `inference.local` works

When code inside a sandbox calls https://inference.local, the privacy router intercepts the request and routes it to the backend configured for that gateway. OpenShell applies the configured model to generation requests and supplies the provider credentials itself — no sandbox code needs access to the real API key. If code calls an external inference host directly, that traffic bypasses inference.local entirely and is evaluated only by network_policies.

Property	Detail
Credentials	No sandbox API keys needed. Credentials come from the configured provider record.
Configuration	One provider and one model define sandbox inference for the active gateway. Every sandbox on that gateway sees the same `inference.local` backend.
Provider support	NVIDIA NIM, any OpenAI-compatible provider, and Anthropic all work through the same endpoint.
Hot-refresh	Provider credential changes and inference updates propagate within about 5 seconds by default, without recreating sandboxes.

The client-supplied model and api_key values sent to inference.local are not forwarded upstream. The privacy router injects the real credentials from the configured provider and rewrites the model before forwarding.

Supported API patterns

The patterns accepted by inference.local depend on the provider type configured for the gateway.

OpenAI-compatible
Anthropic-compatible

Pattern	Method	Path
Chat Completions	`POST`	`/v1/chat/completions`
Completions	`POST`	`/v1/completions`
Responses	`POST`	`/v1/responses`
Model Discovery	`GET`	`/v1/models`
Model Discovery	`GET`	`/v1/models/*`

Pattern	Method	Path
Messages	`POST`	`/v1/messages`

Requests to inference.local that do not match the configured provider’s supported patterns are denied.

Next steps

Configure inference routing

Set up the provider and model behind inference.local.

Sandbox policies

Control which external inference endpoints sandboxes can reach.

Documentation Index

​Two routing paths

​How inference.local works

​Supported API patterns

​Next steps

Configure inference routing

Sandbox policies

Two routing paths

How `inference.local` works

Supported API patterns

Next steps