Running LLMs locally saves money. It keeps your data private. But you're stuck on one machine. ngrok fixes that.
I use ngrok to expose my local models to the internet. Now I can test from my phone. Share with teammates. Demo without shipping my laptop.
The Problem
Local LLMs run on localhost. That means:
- No mobile testing
- No sharing with the team
- No remote access
- Full deployment is overkill for quick experiments
ngrok creates a secure tunnel. It gives you a public URL pointing to your local port.
Quick Setup
Step 1: Run Your LLM
Ollama (recommended):
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama2
ollama serve
Ollama runs on port 11434.
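Worth a quick sanity check before you tunnel anything: make sure Ollama answers locally. A minimal Python sketch, assuming the default port 11434 and Ollama's /api/tags endpoint (it lists the models you've pulled):
import requests

# Ask the local Ollama server which models are available.
resp = requests.get("http://localhost:11434/api/tags", timeout=5)
resp.raise_for_status()
print("Ollama is up:", [m["name"] for m in resp.json().get("models", [])])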
LM Studio:
Download from their site. Start the server. Default port is 1234.
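Same sanity check here. LM Studio's local server speaks an OpenAI-compatible API, so this sketch assumes the default port 1234 and its /v1/models route:
import requests

# List the models LM Studio currently has loaded.
resp = requests.get("http://localhost:1234/v1/models", timeout=5)
resp.raise_for_status()
print("LM Studio is up:", [m["id"] for m in resp.json()["data"]])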
Step 2: Install ngrok
# macOS
brew install ngrok
# Linux/Windows
# Download from ngrok.com/download
Step 3: Authenticate
Sign up at ngrok.com. Get your auth token. Run:
ngrok config add-authtoken YOUR_TOKEN
Skip this and nothing works.
Step 4: Create the Tunnel
# For Ollama
ngrok http 11434
# For LM Studio
ngrok http 1234
You get output like:
Forwarding https://abc123.ngrok.io -> http://localhost:11434
That HTTPS URL is your public endpoint.
Step 5: Test It
curl https://abc123.ngrok.io/api/generate \
-d '{"model": "llama2", "prompt": "Hello", "stream": false}'If you see a response, you're done.
Security Considerations
Always add authentication. Without it, anyone with your URL can use your GPU.
ngrok http 11434 --basic-auth="user:pass"
Monitor traffic. Open http://localhost:4040 to see all requests in real time.
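One more note on that auth flag: clients now have to send the same credentials as HTTP Basic auth, or ngrok answers 401. A minimal sketch with requests, reusing the placeholder URL and user:pass from above:
import requests

resp = requests.post(
    "https://abc123.ngrok.io/api/generate",  # placeholder public URL from the Forwarding line
    auth=("user", "pass"),                   # HTTP Basic credentials matching --basic-auth
    json={"model": "llama2", "prompt": "Hello", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])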
Watch resources. Every request burns your local CPU and memory. Keep htop running.
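If you'd rather have a log than a live view, a few lines of Python with the third-party psutil package (pip install psutil) do roughly the same job:
import time
import psutil

# Print CPU and RAM usage every few seconds. Stop with Ctrl+C.
while True:
    cpu = psutil.cpu_percent(interval=1)   # percent over the last second
    ram = psutil.virtual_memory().percent  # percent of total RAM in use
    print(f"CPU {cpu:5.1f}%  RAM {ram:5.1f}%")
    time.sleep(4)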
Free tier limits. Tunnels disconnect periodically. Paid plans offer persistent connections.
Config File for Repeated Use
Create ~/.ngrok2/ngrok.yml:
version: "2"
authtoken: YOUR_TOKEN
tunnels:
  llm:
    proto: http
    addr: 11434
    auth: "user:pass"
    inspect: false
Now start with:
ngrok start llm
One command. Done.
Python Example
import requests
import os

def query_llm(prompt):
    url = os.environ.get("NGROK_LLM_URL")
    response = requests.post(
        f"{url}/api/generate",
        json={
            "model": "llama2",
            "prompt": prompt,
            "stream": False
        }
    )
    return response.json()["response"]

print(query_llm("Explain recursion in one sentence."))
Store the URL in an environment variable. It changes on free tier restarts.
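You can also skip the copy-paste entirely. The ngrok agent serves a small local API on the inspector port (4040 by default, assuming you haven't disabled the web interface); its /api/tunnels route lists active tunnels and their public URLs:
import requests

def current_ngrok_url():
    # Ask the local ngrok agent for its active tunnels and grab the https one.
    tunnels = requests.get("http://localhost:4040/api/tunnels", timeout=5).json()["tunnels"]
    return next(t["public_url"] for t in tunnels if t["public_url"].startswith("https"))

print(current_ngrok_url())
Call that at startup and the free-tier URL churn stops mattering.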
Common Issues
Connection refused: Your LLM server isn't running. Start it first.
Slow responses: Use smaller quantized models. Or upgrade your hardware.
Tunnel drops: Free tier limitation. Restart ngrok or pay for persistent tunnels.
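For scripts that hit the tunnel, a small retry smooths over brief drops, though it only helps while the URL stays the same. A sketch; the usage line assumes the query_llm function from the Python example above:
import time
import requests

def with_retry(call, attempts=3, delay=5):
    # Retry a callable when the connection drops, e.g. while the tunnel flaps.
    for attempt in range(attempts):
        try:
            return call()
        except requests.exceptions.ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Usage, with query_llm() from the Python example above:
# answer = with_retry(lambda: query_llm("Explain recursion in one sentence."))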
When Not to Use ngrok
ngrok is for development. For production, consider:
- VPS with Docker
- Cloudflare Tunnel (a free alternative)
- Proper cloud deployment
My Workflow
- Start Ollama
- Run ngrok start llm
- Test with curl
- Share URL or use in apps
- Monitor at localhost:4040
It takes 30 seconds. I do it every day.
Summary
ngrok turns your local LLM into a remote API. Install it. Add auth. Share the URL. That's the whole workflow.
Start with the free tier. Upgrade if you need stable URLs or persistent connections.