One API for Leading AI Models
Access GPT-4o, Claude, Gemini, DeepSeek and more through a single OpenAI-compatible endpoint. Simple integration, unified billing.
Quick Start
Replace your OpenAI base URL and use your Relay API key:
# Replace your OpenAI base URL
BASE_URL="https://relay4ai.cloud/v1"
API_KEY="your-relay-api-key"
curl "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'
Works with the OpenAI Python/JS SDK, LangChain, and any OpenAI-compatible client. Just change base_url.
Authentication
Generate an API key from your dashboard and include it in the Authorization header of every request. All requests must use HTTPS.
Authorization: Bearer YOUR_API_KEY
You can create multiple keys from the dashboard, each with its own custom rate limit.
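As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name build_headers and the environment variable RELAY_API_KEY are illustrative assumptions, not part of the API.

```python
import os


def build_headers(api_key=None):
    """Return HTTP headers for an authenticated JSON request.

    RELAY_API_KEY is a hypothetical environment variable name chosen for
    this sketch; use whatever secret storage fits your deployment.
    """
    key = api_key or os.environ.get("RELAY_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }


headers = build_headers("YOUR_API_KEY")
print(headers["Authorization"])
```

Reading the key from the environment keeps it out of source control; passing it explicitly is convenient in tests.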
API Reference
POST /v1/chat/completions
OpenAI-compatible chat completions endpoint. Supports streaming and non-streaming modes.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID, e.g. gpt-4o, claude-sonnet-4-6 |
| messages | array | Yes | Array of message objects with role and content |
| temperature | number | No | Range 0–2. Default varies by model |
| max_tokens | integer | No | Maximum tokens in the response |
| stream | boolean | No | Enable SSE streaming. Default false |
| top_p | number | No | Nucleus sampling parameter |
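The parameters above can be assembled into a request body as in the following sketch. The helper build_chat_payload is hypothetical, and its client-side checks (non-empty messages, temperature within 0 to 2) are this example's own choice based on the table, not documented server behavior.

```python
import json


def build_chat_payload(model, messages, temperature=None, max_tokens=None,
                       stream=False, top_p=None):
    """Assemble a chat completions body, omitting optional fields left unset."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload = {"model": model, "messages": messages}
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if top_p is not None:
        payload["top_p"] = top_p
    if stream:
        payload["stream"] = True
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=200,
)
print(json.dumps(payload, indent=2))
```

Leaving unset optional fields out of the body lets the server apply its own defaults, which matters here because the default temperature varies by model.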
Code Examples
cURL
curl https://relay4ai.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }'
Python
# pip install openai
from openai import OpenAI
client = OpenAI(
    base_url="https://relay4ai.cloud/v1",
    api_key="YOUR_API_KEY"
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=200
)
print(response.choices[0].message.content)
Node.js
// npm install openai
import OpenAI from 'openai';
const client = new OpenAI({
  baseURL: 'https://relay4ai.cloud/v1',
  apiKey: 'YOUR_API_KEY',
});
const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in 3 sentences.' },
  ],
  temperature: 0.7,
  max_tokens: 200,
});
console.log(response.choices[0].message.content);
Streaming
Set stream: true to receive the response as server-sent events (SSE). Ideal for real-time chat applications.
Python (streaming)
from openai import OpenAI
client = OpenAI(
    base_url="https://relay4ai.cloud/v1",
    api_key="YOUR_API_KEY"
)
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True
)
for chunk in stream:
    # Some chunks carry no choices or an empty delta, so guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
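Without an SDK, the stream can be parsed by hand. The sketch below assumes the endpoint follows OpenAI's SSE framing, where each event is a "data: " line carrying a JSON chunk and the stream ends with "data: [DONE]". That framing is inferred from the OpenAI-compatibility claim, not confirmed here, and iter_stream_text is a hypothetical helper.

```python
import json


def iter_stream_text(lines):
    """Yield text fragments from an iterable of SSE lines.

    Assumes OpenAI-style framing: "data: <json>" per event and a final
    "data: [DONE]" sentinel.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Sample lines standing in for a real response body.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints Hello
```

The function takes any iterable of decoded lines, so it can be fed from an HTTP client's line iterator or from recorded output in tests.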
GET /v1/models
Retrieve all available models with their context window sizes.
curl https://relay4ai.cloud/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
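A minimal sketch of reading the response in Python follows. It assumes the body uses OpenAI's list shape, {"object": "list", "data": [{"id": ...}, ...]}. The name of the field holding the context window size is not given here, so the example extracts only model IDs, and model_ids is a hypothetical helper.

```python
import json


def model_ids(response_body):
    """Return the sorted model IDs from a /v1/models response body (JSON text)."""
    payload = json.loads(response_body)
    return sorted(item["id"] for item in payload.get("data", []))


# Sample body in the assumed OpenAI list shape; real responses may carry more fields.
sample_body = json.dumps({
    "object": "list",
    "data": [
        {"id": "gpt-4o", "object": "model"},
        {"id": "claude-sonnet-4-6", "object": "model"},
    ],
})
print(model_ids(sample_body))  # ['claude-sonnet-4-6', 'gpt-4o']
```

With the OpenAI Python SDK configured as in the examples above, client.models.list() should return the same data, provided the endpoint matches OpenAI's response format.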
Supported Models
| Provider | Model ID | Context |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | 128K |
| Anthropic | claude-opus-4-7, claude-sonnet-4-6 | 200K |
| Google | gemini-2.5-pro, gemini-2.5-flash | 1M |
| DeepSeek | deepseek-ai/DeepSeek-V4-Flash, deepseek-ai/DeepSeek-V3.2 | 128K |
| Moonshot | Pro/moonshotai/Kimi-K2.6 | 128K |
| Zhipu | Pro/zai-org/GLM-5.1 | 128K |
| Qwen | Qwen/Qwen3.5-35B-A3B, Qwen/Qwen3.5-122B-A10B | 128K |
More models are added as they are released. Full pricing is available on your dashboard after login.
Error Codes
| Status | Meaning |
|---|---|
| 200 | Success |
| 400 | Bad request. Check your JSON body and parameters |
| 401 | Invalid or missing API key |
| 402 | Insufficient credits. Top up your balance |
| 429 | Rate limit exceeded. Slow down or upgrade your plan |
| 500 | Server error; we're on it. Retry with exponential backoff |
| 502 | Upstream provider error. The model provider is temporarily unavailable |
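The retry advice above can be sketched as a small backoff loop. Treating 429 and 502 as retryable alongside 500 is this example's assumption, since the table only prescribes backoff for 500. The send callable, the attempt count, and the delays are also illustrative.

```python
import random
import time

# Statuses treated as transient in this sketch; 400, 401 and 402 need a fix, not a retry.
RETRYABLE = {429, 500, 502}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical zero-argument callable returning (status, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Delay doubles each attempt: base, 2*base, 4*base, ... plus up to 10% jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))


# Simulated responses: two transient failures, then success.
responses = iter([(502, "upstream"), (429, "slow down"), (200, "ok")])
print(call_with_backoff(lambda: next(responses), sleep=lambda seconds: None))  # (200, 'ok')
```

The jitter spreads retries from many clients so they do not all hit the service at the same instant.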
Rate Limits
Rate limits vary by plan, and requests beyond your limit return HTTP 429. Free and pay-as-you-go users get 30 requests per minute (RPM); monthly plans get higher limits.
| Plan | Rate Limit |
|---|---|
| Free / Pay-as-you-go | 30 requests per minute |
| Starter ($9/mo) | 60 requests per minute |
| Pro ($29/mo) | 120 requests per minute |
| Max ($99/mo) | 300 requests per minute |
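One conservative way to stay under these limits is to space requests evenly on the client, as in the sketch below. How the server counts requests (fixed or sliding window) is not stated, so even spacing is an assumption that should be safe under either. The Pacer class and the PLAN_RPM keys are hypothetical names.

```python
import time

# Requests per minute per plan, taken from the table above.
PLAN_RPM = {"free": 30, "starter": 60, "pro": 120, "max": 300}


class Pacer:
    """Space calls at least 60 / rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None

    def wait(self):
        """Block until the next request may be sent, then reserve the slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


pacer = Pacer(PLAN_RPM["free"])
print(pacer.interval)  # 2.0 seconds between requests at 30 RPM
```

Call pacer.wait() before each request. The clock and sleep parameters are injectable so the pacing logic can be tested without real delays.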
Rate limit headers are included in every response. Check X-RateLimit-Remaining for your current quota.
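A minimal sketch of reading that header in Python follows. It assumes the value is a plain integer count; the header's exact format is not specified here. The helpers remaining_requests and should_throttle, and the floor threshold, are illustrative.

```python
def remaining_requests(headers):
    """Return X-RateLimit-Remaining as an int, or None if absent or not numeric."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-ratelimit-remaining":
            try:
                return int(value)
            except (TypeError, ValueError):
                return None
    return None


def should_throttle(headers, floor=2):
    """Return True when the remaining quota is at or below the chosen floor."""
    remaining = remaining_requests(headers)
    return remaining is not None and remaining <= floor


print(remaining_requests({"X-RateLimit-Remaining": "12"}))  # 12
print(should_throttle({"x-ratelimit-remaining": "1"}))      # True
```

Checking the header after each response lets a client slow down before it hits HTTP 429 rather than after.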