Expose Local LLMs with ngrok: Ollama and LM Studio

Running LLMs locally saves money. It keeps your data private. But you're stuck on one machine. ngrok fixes that.

I use ngrok to expose my local models to the internet. Now I can test from my phone. Share with teammates. Demo without shipping my laptop.

The Problem

Local LLMs run on localhost. That means:

  • No mobile testing
  • No sharing with the team
  • No remote access
  • Deploying to the cloud just to experiment is overkill

ngrok creates a secure tunnel. It gives you a public URL pointing to your local port.

Quick Setup

Step 1: Run Your LLM

Ollama (recommended):

curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama2
ollama serve

Ollama runs on port 11434.
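
A quick sanity check before tunneling anything, sketched in Python: Ollama's /api/tags endpoint (not shown above) lists the models you've pulled. This assumes a stock install on the default port.

import requests

# Ollama's local API; /api/tags returns the models you have pulled.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print([m["name"] for m in resp.json()["models"]])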

LM Studio:

Download it from the LM Studio site, load a model, and start the local server. Default port is 1234.
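
LM Studio's local server speaks an OpenAI-compatible API, so one way to confirm it's up (assuming the default port and a loaded model) is to list the models it's serving:

import requests

# LM Studio exposes an OpenAI-compatible API; /v1/models lists loaded models.
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])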

Step 2: Install ngrok

# macOS
brew install ngrok

# Linux/Windows
# Download from ngrok.com/download

Step 3: Authenticate

Sign up at ngrok.com. Get your auth token. Run:

ngrok config add-authtoken YOUR_TOKEN

Skip this and nothing works.

Step 4: Create the Tunnel

# For Ollama
ngrok http 11434

# For LM Studio
ngrok http 1234

You get output like:

Forwarding    https://abc123.ngrok.io -> http://localhost:11434

That HTTPS URL is your public endpoint.

Step 5: Test It

curl https://abc123.ngrok.io/api/generate \
  -d '{"model": "llama2", "prompt": "Hello", "stream": false}'

If you see a response, you're done.

Security Considerations

Always add authentication. Without it, anyone with your URL can use your GPU.

ngrok http 11434 --basic-auth="user:pass"
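
On the client side, send the matching credentials with every request. A minimal sketch with requests, using the example URL from above and whatever user/pass values you set on the tunnel:

import requests

# HTTP basic auth must match the --basic-auth value on the tunnel.
resp = requests.post(
    "https://abc123.ngrok.io/api/generate",
    json={"model": "llama2", "prompt": "Hello", "stream": False},
    auth=("user", "pass"),
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])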

Monitor traffic. Open http://localhost:4040 to see all requests in real time.

Watch resources. Every request burns your local CPU and memory. Keep htop running.
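
If you'd rather watch from a script than keep htop open, a rough sketch (assuming psutil is installed: pip install psutil):

import time
import psutil

# Print CPU and memory usage every few seconds while the tunnel is live.
while True:
    print(f"cpu {psutil.cpu_percent()}%  mem {psutil.virtual_memory().percent}%")
    time.sleep(5)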

Free tier limits. Tunnels disconnect periodically. Paid plans offer persistent connections.

Config File for Repeated Use

Create the agent config file (run ngrok config check to see which path your version reads; older agents used ~/.ngrok2/ngrok.yml):

version: "2"
authtoken: YOUR_TOKEN
tunnels:
  llm:
    proto: http
    addr: 11434
    auth: "user:pass"
    inspect: false

Now start with:

ngrok start llm

One command. Done.

Python Example

import os
import requests

def query_llm(prompt):
    # Public tunnel URL, e.g. https://abc123.ngrok.io (no trailing slash)
    url = os.environ["NGROK_LLM_URL"]
    response = requests.post(
        f"{url}/api/generate",
        json={
            "model": "llama2",
            "prompt": prompt,
            "stream": False
        },
        timeout=120
    )
    response.raise_for_status()  # surface auth or tunnel errors early
    return response.json()["response"]

print(query_llm("Explain recursion in one sentence."))

Store the URL in an environment variable. It changes on free tier restarts.
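
You can also pull the current public URL from the ngrok agent's local API instead of copying it by hand after each restart. A sketch, assuming the agent's inspection interface on port 4040 is enabled (it is by default):

import requests

# The ngrok agent's local API lists active tunnels and their public URLs.
tunnels = requests.get("http://localhost:4040/api/tunnels", timeout=5).json()["tunnels"]
public_url = tunnels[0]["public_url"]
print(public_url)  # export this as NGROK_LLM_URL or pass it to query_llm directly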

Common Issues

Connection refused: Your LLM server isn't running. Start it first.

Errors from Ollama through the tunnel (often a 403): some Ollama versions only accept requests whose Host header looks local. Try ngrok http 11434 --host-header="localhost:11434" so the tunnel rewrites it.

Slow responses: Use smaller quantized models. Or upgrade your hardware.

Tunnel drops: Free tier limitation. Restart ngrok or pay for persistent tunnels.

When Not to Use ngrok

ngrok is for development. For production, consider:

  • VPS with Docker
  • Cloudflare Tunnel (free alternative)
  • Proper cloud deployment

My Workflow

  1. Start Ollama
  2. Run ngrok start llm
  3. Test with curl
  4. Share URL or use in apps
  5. Monitor at localhost:4040

It takes 30 seconds. I do it every day.

Summary

ngrok turns your local LLM into a remote API. Install it. Add auth. Share the URL. That's the whole workflow.

Start with the free tier. Upgrade if you need stable URLs or persistent connections.