Relay4AI API Docs — Multi-Model AI API Reference

One API for Leading AI Models

Access GPT-4o, Claude, Gemini, DeepSeek and more through a single OpenAI-compatible endpoint. Simple integration, unified billing.

Quick Start

Replace your OpenAI base URL and use your Relay API key:

    # Replace your OpenAI base URL
    BASE_URL="https://relay4ai.cloud/v1"
    API_KEY="your-relay-api-key"

    curl "$BASE_URL/chat/completions" \
      -H "Authorization: Bearer $API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"gpt-4o","messages":[{"role":"user","content":"Hello!"}]}'

Works with the OpenAI Python/JS SDK, LangChain, and any OpenAI-compatible client. Just change base_url (baseURL in the JS SDK).

Authentication

Generate an API key from your dashboard and send it in the Authorization header of every request. All requests must use HTTPS.

    Authorization: Bearer YOUR_API_KEY

You can create multiple keys from the dashboard, each with custom rate limits.
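As a rough illustration of the header requirement above, the sketch below prepares an authenticated request using only the Python standard library. The helper names (auth_headers, build_request) and the RELAY_API_KEY environment variable are assumptions made for this example, not part of the Relay API.

```python
import urllib.request
from typing import Optional

BASE_URL = "https://relay4ai.cloud/v1"


def auth_headers(api_key: str) -> dict:
    """Return the headers sent with every request."""
    if not api_key:
        raise ValueError("API key is missing")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def build_request(
    path: str,
    api_key: str,
    body: Optional[bytes] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Prepare (but do not send) a request; POST if a body is given, else GET."""
    url = base_url + path
    if not url.startswith("https://"):
        raise ValueError("all requests must use HTTPS")
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(
        url, data=body, headers=auth_headers(api_key), method=method
    )


# Example (hypothetical environment variable name):
#   import os
#   req = build_request("/models", os.environ["RELAY_API_KEY"])
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

Reading the key from the environment rather than hard-coding it keeps it out of source control; the variable name is up to you.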

API Reference

POST /v1/chat/completions

OpenAI-compatible chat completions endpoint. Supports streaming and non-streaming modes.

Parameters

    Parameter    Type     Required  Description
    model        string   Yes       Model ID, e.g. gpt-4o, claude-sonnet-4-6
    messages     array    Yes       Array of message objects with role and content
    temperature  number   No        0–2. Default varies by model
    max_tokens   integer  No        Maximum tokens in the response
    stream       boolean  No        Enable SSE streaming. Default false
    top_p        number   No        Nucleus sampling parameter
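The parameter list above can be turned into a small client-side check before a request is sent. The following is a minimal sketch: build_chat_payload is a hypothetical helper, and it enforces only the constraints stated in the table (required model and messages, temperature between 0 and 2). The server remains the authority on what is valid.

```python
from typing import Optional


def build_chat_payload(
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stream: bool = False,
    top_p: Optional[float] = None,
) -> dict:
    """Build a JSON-serializable body for POST /v1/chat/completions."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must be a non-empty list")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    payload = {"model": model, "messages": messages}
    # Leave optional parameters out unless set, so model defaults apply.
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    if stream:
        payload["stream"] = True
    return payload
```

Omitting unset optional fields matters here because the temperature default varies by model; sending an explicit value would override it.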

Code Examples

cURL

    curl https://relay4ai.cloud/v1/chat/completions \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Explain quantum computing in 3 sentences."}
        ],
        "temperature": 0.7,
        "max_tokens": 200
      }'

Python

    # pip install openai
    from openai import OpenAI

    client = OpenAI(
        base_url="https://relay4ai.cloud/v1",
        api_key="YOUR_API_KEY"
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in 3 sentences."}
        ],
        temperature=0.7,
        max_tokens=200
    )

    print(response.choices[0].message.content)

Node.js

    // npm install openai
    import OpenAI from 'openai';

    const client = new OpenAI({
      baseURL: 'https://relay4ai.cloud/v1',
      apiKey: 'YOUR_API_KEY',
    });

    const response = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: 'Explain quantum computing in 3 sentences.' },
      ],
      temperature: 0.7,
      max_tokens: 200,
    });

    console.log(response.choices[0].message.content);

Streaming

Set stream: true for SSE streaming responses. Ideal for real-time chat applications.

Python (streaming)

    from openai import OpenAI

    client = OpenAI(
        base_url="https://relay4ai.cloud/v1",
        api_key="YOUR_API_KEY"
    )

    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a short poem."}],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
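If you are not using an SDK, the stream has to be parsed by hand. The sketch below assumes the endpoint follows the OpenAI streaming convention: each SSE event is a line of the form data: {json}, and the stream ends with data: [DONE]. That wire format is an assumption based on the endpoint being described as OpenAI-compatible, and iter_stream_text is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # some chunks may carry no choices
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Simulated stream, standing in for lines read from the HTTP response body.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))  # prints "Hello"
```

The function takes any iterable of text lines, so it can be fed from a decoded HTTP response or from a saved transcript for testing.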

GET /v1/models

Retrieve all available models with their context window sizes.

    curl https://relay4ai.cloud/v1/models \
      -H "Authorization: Bearer YOUR_API_KEY"
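The response body is not shown here. Assuming it follows the OpenAI list format (a top-level data array of objects that each carry an id), a sketch for mapping model IDs to context sizes could look like the following. The context_length field name is purely hypothetical; inspect a real response and pass the actual field name via context_field.

```python
from typing import Dict, Optional


def summarize_models(
    response: dict, context_field: str = "context_length"
) -> Dict[str, Optional[int]]:
    """Map each model ID to its context window size (None if absent).

    Assumes an OpenAI-style body: {"data": [{"id": ...}, ...]}.
    `context_field` is a placeholder for whatever key the API really uses.
    """
    return {
        model["id"]: model.get(context_field)
        for model in response.get("data", [])
    }


# Illustrative payload only; not a captured API response.
example = {
    "object": "list",
    "data": [
        {"id": "gpt-4o", "context_length": 128000},
        {"id": "gpt-4o-mini"},
    ],
}
print(summarize_models(example))
```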

Supported Models

    Provider   Model ID                                                  Context
    OpenAI     gpt-4o, gpt-4o-mini                                       128K
    Anthropic  claude-opus-4-7, claude-sonnet-4-6                        200K
    Google     gemini-2.5-pro, gemini-2.5-flash                          1M
    DeepSeek   deepseek-ai/DeepSeek-V4-Flash, deepseek-ai/DeepSeek-V3.2  128K
    Moonshot   Pro/moonshotai/Kimi-K2.6                                  128K
    Zhipu      Pro/zai-org/GLM-5.1                                       128K
    Qwen       Qwen/Qwen3.5-35B-A3B, Qwen/Qwen3.5-122B-A10B              128K

More models are added as they are released. Full pricing is available on your dashboard after login.

Error Codes

    Status  Meaning
    200     Success
    400     Bad request: check your JSON body and parameters
    401     Invalid or missing API key
    402     Insufficient credits: top up your balance
    429     Rate limit exceeded: slow down or upgrade your plan
    500     Server error: we're on it. Retry with exponential backoff
    502     Upstream provider error: the model provider is temporarily unavailable
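The table recommends exponential backoff for 500 responses. The sketch below shows one way to implement it. Treating 429 and 502 as retryable too is an assumption of this example, as are the helper names and the default delays. Client errors (400, 401, 402) are returned immediately, since retrying will not fix them.

```python
import time
from typing import Callable, List, Tuple

# 500 is retryable per the table; 429 and 502 are included by assumption.
RETRYABLE = {429, 500, 502}


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Delays that double each attempt: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 4,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or retries run out.

    send() is any function that performs one request and returns
    (status_code, body).
    """
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE or attempt == max_retries:
            return status, body
        sleep(delays[attempt])
    raise AssertionError("unreachable")


# Simulated responses: two transient failures, then success.
responses = iter([(502, "bad gateway"), (500, "server error"), (200, "ok")])
waits: List[float] = []
result = call_with_retry(lambda: next(responses), sleep=waits.append)
print(result, waits)  # (200, 'ok') [1.0, 2.0]
```

In production you would typically add random jitter to each delay so that many clients do not retry in lockstep.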

Rate Limits

Rate limits vary by plan and are measured in requests per minute (RPM). Exceeding your limit returns HTTP 429. Free and pay-as-you-go users get 30 RPM; monthly plans get higher limits:

    Plan                  Rate Limit
    Free / Pay-as-you-go  30 requests per minute
    Starter ($9/mo)       60 requests per minute
    Pro ($29/mo)          120 requests per minute
    Max ($99/mo)          300 requests per minute

Rate limit headers are included in every response. Check X-RateLimit-Remaining for your current quota.
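To stay under the limit proactively, a client can pace its requests and watch the remaining quota. The sketch below derives a minimum spacing between requests from the plan limits listed above and reads X-RateLimit-Remaining from a response's headers. The plan keys and helper names are made up for this example, and it assumes the header value is a plain integer.

```python
from typing import Mapping, Optional

# Requests per minute, taken from the plan table; the keys are illustrative.
PLAN_RPM = {"free": 30, "starter": 60, "pro": 120, "max": 300}


def min_interval_seconds(plan: str) -> float:
    """Smallest gap between requests that keeps a steady client under the limit."""
    return 60.0 / PLAN_RPM[plan]


def remaining_quota(headers: Mapping[str, str]) -> Optional[int]:
    """Read X-RateLimit-Remaining (case-insensitive); None if absent or not an integer."""
    for name, value in headers.items():
        if name.lower() == "x-ratelimit-remaining":
            try:
                return int(value)
            except (TypeError, ValueError):
                return None
    return None


print(min_interval_seconds("free"))                    # 2.0
print(remaining_quota({"X-RateLimit-Remaining": "12"}))  # 12
```

A client might sleep for min_interval_seconds between calls and pause longer once remaining_quota approaches zero.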