Relay4AI API Docs — Multi-Model AI API Reference

One API for Leading AI Models

Access GPT-4o, Claude, Gemini, DeepSeek and more through a single OpenAI-compatible endpoint. Simple integration, unified billing.

Quick Start

Replace your OpenAI base URL and use your Relay API key:

# Replace your OpenAI base URL
BASE_URL="https://relay4ai.cloud/v1"
API_KEY="your-relay-api-key"

curl "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

This works with the OpenAI Python and JS SDKs, LangChain, and any other OpenAI-compatible client. Just change base_url (baseURL in the Node.js SDK).

Authentication

Generate an API key from your dashboard and send it in the Authorization header of every request. All requests must use HTTPS.

Authorization: Bearer YOUR_API_KEY

You can create multiple keys from the dashboard, each with custom rate limits.
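To keep the key out of source code, one option is to read it from an environment variable and build the header from that. The sketch below assumes a variable named RELAY_API_KEY and a helper called auth_headers; both names are illustrative and not part of the Relay4AI API.

```python
import os


def auth_headers(env_var: str = "RELAY_API_KEY") -> dict[str, str]:
    """Build request headers from an API key stored in the environment.

    The variable name is a hypothetical convention, not one the service defines.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use. Failing early on a missing key avoids sending a request that would come back as 401.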

API Reference

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Supports streaming and non-streaming modes.

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model ID, e.g. `gpt-4o`, `claude-sonnet-4-6`
`messages`	array	Yes	Array of message objects with `role` and `content`
`temperature`	number	No	Sampling temperature in the range 0–2. Default varies by model
`max_tokens`	integer	No	Maximum tokens in the response
`stream`	boolean	No	Enable SSE streaming. Default `false`
`top_p`	number	No	Nucleus sampling parameter
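The parameter table can be turned into a small request-body builder that rejects obviously invalid input before it reaches the network. This is a minimal sketch: build_chat_payload is a hypothetical helper, it checks only the constraints stated in the table, and the requirement that messages be non-empty is an assumption.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str,
    messages: list[dict[str, Any]],
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    top_p: Optional[float] = None,
) -> dict[str, Any]:
    """Assemble a chat completions request body from the documented parameters."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty messages array is treated as invalid.
        raise ValueError("messages must contain at least one message")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload: dict[str, Any] = {"model": model, "messages": messages}
    # Optional fields are left out unless set, so the server applies its defaults.
    if temperature is not None:
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    if top_p is not None:
        payload["top_p"] = top_p
    if stream:
        payload["stream"] = True
    return payload
```

Serializing the result with json.dumps gives a body suitable for POST /v1/chat/completions.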

Code Examples

cURL

curl https://relay4ai.cloud/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    "temperature": 0.7,
    "max_tokens": 200
  }'

Python

# pip install openai
from openai import OpenAI

client = OpenAI(
    base_url="https://relay4ai.cloud/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=200
)

print(response.choices[0].message.content)

Node.js

// npm install openai
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://relay4ai.cloud/v1',
  apiKey: 'YOUR_API_KEY',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Explain quantum computing in 3 sentences.' },
  ],
  temperature: 0.7,
  max_tokens: 200,
});

console.log(response.choices[0].message.content);

Streaming

Set stream: true to receive the response incrementally as server-sent events (SSE), which suits real-time chat applications.

Python (streaming)

from openai import OpenAI

client = OpenAI(
    base_url="https://relay4ai.cloud/v1",
    api_key="YOUR_API_KEY"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem."}],
    stream=True
)

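# Print each content fragment as it arrives
for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip them
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()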
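If you are not using an SDK, the stream has to be decoded by hand. The sketch below assumes the endpoint follows the OpenAI wire format, where each event is a line of the form data: {json} and the stream ends with data: [DONE]; that format is inferred from the "OpenAI-compatible" claim rather than stated on this page, and iter_sse_content is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and SSE comment lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content")
        if text:
            yield text


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints "Hello"
```

In practice the lines argument would be the decoded line iterator of a streaming HTTP response.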

GET /v1/models

Retrieve all available models with their context window sizes.

curl https://relay4ai.cloud/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"
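The same request can be made from Python through the OpenAI SDK, whose client.models.list() method issues GET /v1/models. The helper below is a sketch: list_model_ids is a hypothetical name, and it reads only each model's id because this page does not say which field carries the context window size.

```python
from typing import Any


def list_model_ids(client: Any) -> list[str]:
    """Return the sorted IDs of every model the endpoint reports."""
    # Iterating the result of client.models.list() yields objects with an `id`.
    return sorted(model.id for model in client.models.list())


# Usage (requires `pip install openai` and a valid key):
#   from openai import OpenAI
#   client = OpenAI(base_url="https://relay4ai.cloud/v1", api_key="YOUR_API_KEY")
#   print(list_model_ids(client))
```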

Supported Models

Provider	Model ID	Context
OpenAI	`gpt-4o`, `gpt-4o-mini`	128K
Anthropic	`claude-opus-4-7`, `claude-sonnet-4-6`	200K
Google	`gemini-2.5-pro`, `gemini-2.5-flash`	1M
DeepSeek	`deepseek-ai/DeepSeek-V4-Flash`, `deepseek-ai/DeepSeek-V3.2`	128K
Moonshot	`Pro/moonshotai/Kimi-K2.6`	128K
Zhipu	`Pro/zai-org/GLM-5.1`	128K
Qwen	`Qwen/Qwen3.5-35B-A3B`, `Qwen/Qwen3.5-122B-A10B`	128K

More models are added as they are released. Full pricing is available on your dashboard after login.

Error Codes

Status	Meaning
`200`	Success
`400`	Bad request — check your JSON body and parameters
`401`	Invalid or missing API key
`402`	Insufficient credits — top up your balance
`429`	Rate limit exceeded — slow down or upgrade your plan
`500`	Server error — we're on it. Retry with exponential backoff
`502`	Upstream provider error — the model provider is temporarily unavailable
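The retry advice in the table can be sketched as a small wrapper that applies exponential backoff with jitter. Treating 429, 500, and 502 as retryable is an assumption drawn from the descriptions above, and call_with_backoff, its parameters, and the send callable are all hypothetical names.

```python
import random
import time
from typing import Any, Callable

# Assumption: these statuses are transient; 400, 401, and 402 need a fix first.
RETRYABLE = {429, 500, 502}


def call_with_backoff(
    send: Callable[[], tuple[int, Any]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() must return a (status_code, body) pair.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE or attempt >= max_attempts:
            return status, body
        # Delay doubles each attempt; jitter spreads out simultaneous retries.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
```

The sleep parameter exists so the wait can be replaced in tests. The last response is returned rather than raised, which leaves error handling to the caller.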

Rate Limits

Rate limits vary by plan and are measured in requests per minute (RPM). Requests over the limit return HTTP 429.

Plan	Rate Limit
Free / Pay-as-you-go	30 requests per minute
Starter ($9/mo)	60 requests per minute
Pro ($29/mo)	120 requests per minute
Max ($99/mo)	300 requests per minute

Rate limit headers are included in every response. Check X-RateLimit-Remaining for your current quota.
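A client can read that header after each call and slow down before it hits the limit. The sketch below assumes the header value is a plain integer, which this page does not specify, and remaining_requests is a hypothetical helper.

```python
from typing import Mapping, Optional


def remaining_requests(headers: Mapping[str, str]) -> Optional[int]:
    """Return X-RateLimit-Remaining as an int, or None if absent or malformed."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-ratelimit-remaining":
            try:
                return int(value)
            except (TypeError, ValueError):
                return None
    return None


print(remaining_requests({"X-RateLimit-Remaining": "12"}))  # prints 12
```

A caller might pause briefly when the returned value reaches zero instead of waiting for a 429.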