Edge LLM Gateway v1.4.2

An OpenAI-compatible edge endpoint that fronts your self-hosted inference backend (Ollama, vLLM, llama.cpp, TGI). It deploys close to your users and talks to your GPU box over a single authenticated upstream connection.

Why

Self-hosting an LLM means your GPU is in one place and your users are in many. For users far from the GPU's region, round-trip latency and TLS handshake overhead dominate the cost of short, small-token requests. Edge LLM Gateway terminates TLS at the edge, multiplexes connections to your backend, and exposes a familiar OpenAI-style API so existing SDKs just work.
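
Because the surface is OpenAI-compatible, the official openai Python client can be pointed at the gateway directly. A minimal sketch, assuming openai >= 1.0 and the auth_token query-parameter auth shown under Getting started; the base_url and token values are placeholders:

from openai import OpenAI

# Point the SDK at the gateway instead of api.openai.com.
# Auth is carried in the auth_token query parameter, so api_key
# is only a placeholder to satisfy the SDK.
client = OpenAI(
    base_url="https://your-deployment.vercel.app/api/v1",
    api_key="unused",
    default_query={"auth_token": "YOUR_TOKEN"},
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)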

Getting started

curl "https://your-deployment.vercel.app/api/v1/chat/completions?auth_token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true
  }'
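
With "stream": true the gateway returns tokens as they are generated, in the OpenAI streaming format. A sketch of consuming the stream with the same Python client configured above (same placeholder base_url and token assumptions):

# Request a streamed completion and print tokens as they arrive.
stream = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; some chunks may be empty.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()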

Configuration

Features