Deploying a Secure, Intelligent LLM Gateway

As platform engineers, we are currently stuck between a rock and a hard place. Our internal developers want frictionless access to frontier models. Our security teams, however, are terrified of “Shadow AI”—sensitive corporate IP, HR data, or infrastructure secrets being pasted into public web UIs.

The most obvious “solutions” are draconian: block everything, force everyone onto an underperforming internal model, or accept vendor lock-in and pay the price of admission.

But what if you could offer a “Smart Pipe”? A single API endpoint that automatically detects sensitive context and routes it to a private, self-hosted model, while seamlessly passing general coding questions to the public frontier?

Today, I’m sharing a DevOps recipe on GitHub to build exactly that. We will combine LiteLLM (for intelligent routing) and Ollama (for hosting our private models) with an OpenZiti or NetFoundry CloudZiti network (for zero-trust networking) to create a self-hostable, zero-trust, semantic LLM gateway.

The Goal: Intelligent Context Routing as a Service

We are building a gateway that serves as a single URL for your users (agents, devs, apps). Behind the scenes, it acts as a traffic controller:

  1. The Sensitivity Check: The gateway analyzes the prompt with a local embedding model, so the check itself never leaks context to a third party.
  2. The Private Route: If the prompt matches specific “utterances” (e.g., “Project Apollo,” “API Keys,” “Customer PII”), it is routed over a secure, dark overlay network to a private model running on your infrastructure.
  3. The Public Route: If the prompt is generic (e.g., “Write a Python script to print the schema of an arbitrary object as JSON”), it is routed to a public provider or OpenRouter for the best performance/cost.

This happens transparently. The user just sees a response.

The Stack (Low-Code / No-Code)

We can deploy this entirely via Docker Compose. No complex control planes, no enterprise licenses required to prove the concept. The recipe has a few prerequisites: an operational OpenZiti or NetFoundry CloudZiti network and a local Docker installation with Compose (or a compatible container runtime).

  • The Brain: LiteLLM Proxy (Container). Handles the API translation and semantic routing logic.
  • The Muscle: Ollama (Container). Hosts the private LLM (e.g., Llama 3) and the embedding model, with optional CUDA acceleration.
  • The Shield: OpenZiti or NetFoundry CloudZiti. Creates a selective, zero-trust bridge between the Gateway and the Private Model.
  • The Access Layer: OpenZiti or NetFoundry CloudZiti (for private access), or zrok.io or NetFoundry Frontdoor (for clientless access to a public API).

The Architecture: The “Sandwich” Strategy

To ensure true isolation, we use separate Docker networks for LiteLLM and Ollama.

  1. Network A (litellm_private): Hosts the LiteLLM Proxy. Has outbound internet access to reach the frontier model providers and the Ziti overlay.
  2. Network B (ollama_private): Hosts the Private LLM (Ollama). Has outbound internet access to reach the Ziti overlay.

Here is the magic: LiteLLM cannot communicate directly with Ollama. It must go through Ziti. This allows us to enforce identity-based policies. The Gateway “dials” the Ollama service, and Ziti tunnels the traffic securely, even if they are on different clouds or data centers. This allows you to place Ollama optimally for model data and hardware accelerators, and place the gateway optimally for controlling access.
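
To make the layout concrete, here is a minimal Docker Compose sketch of the sandwich. Service names, images, volume paths, and the tunneler run options are illustrative assumptions; the recipe’s compose file on GitHub is the source of truth.

services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest    # LiteLLM proxy (assumed upstream image)
    command: ["--config", "/app/config.yaml"]
    volumes:
      - ./config.yaml:/app/config.yaml
      - ./router.json:/app/router.json
    networks: [litellm_private]                   # Network A: no direct path to Ollama

  litellm-ziti:
    image: openziti/ziti-edge-tunnel              # dials the private "ollama" Ziti service
    volumes:
      - ./litellm-identity.json:/ziti-edge-tunnel/litellm-identity.json
    networks: [litellm_private]                   # run mode, capabilities, DNS settings elided

  ollama:
    image: ollama/ollama                          # private LLM + local embedding model
    networks: [ollama_private]                    # Network B: unreachable from Network A

  ollama-ziti:
    image: openziti/ziti-edge-tunnel              # hosts (binds) the "ollama" Ziti service
    volumes:
      - ./ollama-identity.json:/ziti-edge-tunnel/ollama-identity.json
    networks: [ollama_private]

networks:
  litellm_private:
  ollama_private:

Because the two networks are never bridged, the only path from LiteLLM to Ollama is the Ziti overlay, and the Ziti service policy becomes the single switch that grants or revokes it.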

The “Secret Sauce”: Semantic Routing Without Context Leakage

Our goal is to prevent the leak of sensitive context. Sending prompts to a third-party service for embedding generation defeats the purpose of a secure gateway: by the time the security decision can be made, the sensitive prompt (e.g., “Here are my AWS keys, fix this”) has already left your infrastructure.

Our recipe runs the embedding model locally alongside the private LLM: the setup pulls a private embedding model (e.g., nomic-embed-text) into Ollama next to your chosen private LLM (e.g., Llama 3).
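
For orientation, here is a sketch of how those models might be registered in the LiteLLM proxy’s config.yaml. The model names, the public provider, and the api_base hostname (an assumed Ziti intercept address) are illustrative; treat the recipe’s own config as the reference.

model_list:
  # Private LLM, reached only over the Ziti overlay
  - model_name: private-model
    litellm_params:
      model: ollama/llama3
      api_base: http://ollama-ziti:11434

  # Local embedding model used by the semantic router's encoder
  - model_name: ollama/nomic-embed-text:latest
    litellm_params:
      model: ollama/nomic-embed-text:latest
      api_base: http://ollama-ziti:11434

  # Public frontier model for non-sensitive prompts
  - model_name: public-model
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY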

How it works in router.json:

{
  "encoder_type": "litellm",
  "encoder_name": "ollama/nomic-embed-text:latest",
  "routes": [
    {
      "name": "private-model",
      "description": "Route sensitive prompts to the private Ollama model",
      "utterances": [
        "What are our internal policies on",
        "Summarize the confidential report about",
        "Explain our proprietary process for",
        "What is our company's strategy for",
        "Show me the private documentation for",
        "Access internal knowledge base about",
        "What are the details of our contract with",
        "AWS_SECRET_ACCESS_KEY"
      ],
      "score_threshold": 0.5
    }
  ]
}

LiteLLM embeds the user’s prompt with the local encoder and scores it against your defined “utterances.” If the score clears the score_threshold, the request stays private. If not, it goes public.

Publishing Your Gateway: Ziti vs. zrok

Once your gateway is running, how do your customers reach it? You have two powerful, zero-cost options:

Option A: The “Fort Knox” Approach (OpenZiti or NetFoundry CloudZiti)

If your users are internal developers or sensitive automated agents, you don’t want your Gateway listening on the public internet.

  • Mechanism: You publish the Gateway as a Ziti Service.
  • Client Side: The user runs a lightweight Ziti Tunneler (agent) on their laptop or server.
  • Benefit: The API has no public IP and is “dark.” Access to the LiteLLM Gateway is controlled entirely by Ziti identities and service policies, so there are no application-level API keys or tokens to manage for network security. Without a Ziti identity, a user can’t even see the TCP port.

Option B: The “Public API” Approach (zrok.io or NetFoundry Frontdoor)

If you need to share this gateway with a partner, a wider audience, or a tool that can’t run a tunneler, use zrok or NetFoundry frontdoor-agent.

  • Mechanism: You run zrok share public http://litellm-ziti-router:4000 in a container, as sketched below.
  • Benefit: You get an instant, hardened public URL (e.g., https://my-gateway.share.zrok.io). You can secure this with LiteLLM’s many authentication options, or zrok’s built-in Google/GitHub (OIDC) or HTTP basic auth.
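
A minimal Compose sketch of that sidecar, assuming the openziti/zrok image and a zrok account token supplied via an environment variable (the environment bootstrap itself is elided; the zrok documentation and the recipe cover it):

services:
  zrok-share:
    image: openziti/zrok                           # assumed upstream zrok image
    command: share public http://litellm-ziti-router:4000
    environment:
      # assumes an enabled zrok environment; enable/bootstrap steps not shown
      ZROK_ENABLE_TOKEN: ${ZROK_ENABLE_TOKEN}
    networks: [litellm_private]                    # must be able to reach the gateway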

Why This Matters for Platform Teams

By treating Intelligent Context Routing as a Service, you shift the security burden from the user to the infrastructure.

  1. Zero-Code Compliance: Developers don’t need to decide “Is this safe for ChatGPT?” The router decides for them based on the semantic hints you’ve mapped to specific private models.
  2. Cost Control: You can route “easy” questions to your cheap, private Llama 3 instances and save the frontier model budget for complex reasoning.
  3. Observability: You have a single control point to audit who is asking what, regardless of which underlying model fulfills the request.

This recipe provides a tangible path to owning your AI infrastructure—starting with a single docker compose up.

Try the recipe from GitHub with your OpenZiti or CloudZiti network

The NetFoundry Approach

Why CloudZiti?

OpenZiti is an open-source, zero-trust overlay network technology developed by NetFoundry. Instead of building a perimeter around your network, you place zero-trust connectors (“tunnelers”) within the application stack, as close as possible to each peer application, to eliminate or minimize the attack surface: for example, exposing a service only on the server’s loopback interface or publishing it only within a private subnet. Deployment is flexible: a transparent proxy for the application’s container or host, a network-level gateway, or, if you want to eliminate the tunneler entirely, our SDK imported directly by your application.

Connections are mutually authenticated, encrypted, and policy-controlled — no open inbound ports. No VPNs. No public exposure.

For those unfamiliar, NetFoundry provides CloudZiti, a cloud-managed service built on OpenZiti, which adds:

  • Hosted, dedicated, private overlays
  • Automated provisioning and lifecycle management
  • Deep telemetry and observability
  • Compliance options (FIPS, HIPAA, NIST, PCI, NERC CIP)
  • Hybrid/air-gapped deployment flexibility
  • Enterprise performance, integrations, features, SLAs, and support

This approach doesn’t just add security; it removes complexity, creating a system that is simpler to manage and easier to reason about.

Contact NetFoundry for a virtual demo, and we’ll get you started with your own zero-trust native network, ready in minutes, as a free trial.

Why NetFoundry Frontdoor?

While zrok.io offers a phenomenal, zero-cost way to secure and publish your Gateway instantly, NetFoundry also provides a commercial alternative, Frontdoor, for enterprise use cases that require specific performance, support, and compliance guarantees.

Like zrok.io, Frontdoor is designed to provide a hardened, public-facing entry point—a “front door”—for your private, Ziti-enabled HTTP and TCP backend services, such as the LiteLLM Gateway.

Key distinctions of Frontdoor include:

  • Enterprise SLAs and Support: Guaranteed uptime, performance, and 24/7 support structures not available in community-driven offerings.
  • Built-in Compliance: Options to meet stringent regulatory requirements (e.g., FIPS, HIPAA, NIST) necessary for sensitive corporate deployments.
  • Managed Infrastructure: Leverage NetFoundry’s global, resilient, and highly available platform for your public-facing APIs, instead of a self-hosted or shared community service.
  • Deep Integrations: Seamless integration with the broader CloudZiti platform for centralized identity management, policy enforcement, and advanced observability across all network segments.

Frontdoor is the choice for platform teams that need the convenience of a public URL like zrok.io, but with the assurance and capabilities required by a large organization.

