Drop-in replacement for the OpenAI SDK. We route each request to the cheapest capable model in <10ms — trained on 500,000+ real executions. Your code doesn't change. Your bill shrinks.
```python
# Only one line changes in your existing code:
from openai import OpenAI

client = OpenAI(
    api_key="axm_your_key_here",
    base_url="https://swarm.aletheia-platform.systems/v1",  # <-- only change
)

resp = client.chat.completions.create(
    model="auto",  # let the router pick the cheapest capable model
    messages=[{"role": "user", "content": "Summarise this document."}],
)
print(resp.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "axm_your_key_here",
  baseURL: "https://swarm.aletheia-platform.systems/v1", // <-- only change
});

const resp = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "Summarise this document." }],
});
console.log(resp.choices[0].message.content);
```
```bash
curl https://swarm.aletheia-platform.systems/v1/chat/completions \
  -H "Authorization: Bearer axm_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
OpenRouter is a proxy. Axiom is a trained router — and it learns from your usage.
Not a rule set. A model. We score every provider on capability match,
latency percentile, current health, and cost per token — then route in under
10ms. Code tasks go to DeepSeek Coder. Long-context summarisation goes to
Gemini. Reasoning goes to o4-mini. You just write model="auto".
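To make the idea concrete, here is a toy sketch of scoring providers on several signals and picking the top scorer. Everything in it is illustrative: the weights, the provider stats, and the hand-tuned formula are assumptions for demonstration, not Axiom's trained model.

```python
# Illustrative sketch of multi-signal provider scoring. The weights and
# stats below are made up; Axiom's real router is a trained model.

def score(provider, task):
    """Higher is better: reward capability fit and health, penalise latency and cost."""
    return (
        2.0 * provider["capability"].get(task, 0.0)   # capability match for this task type
        + 1.0 * provider["health"]                    # rolling success rate, 0..1
        - 0.5 * provider["p95_latency_s"]             # latency-percentile penalty
        - 1.5 * provider["cost_per_1m_input"] / 10    # cost per 1M input tokens, scaled
    )

providers = {
    "deepseek-chat": {"capability": {"chat": 0.90}, "health": 0.99,
                      "p95_latency_s": 1.2, "cost_per_1m_input": 0.27},
    "gpt-4o":        {"capability": {"chat": 0.95}, "health": 0.99,
                      "p95_latency_s": 0.9, "cost_per_1m_input": 5.00},
}

best = max(providers, key=lambda name: score(providers[name], "chat"))
print(best)  # the cheap provider wins despite slightly lower capability
```

With these toy numbers the cost penalty dominates, so simple chat lands on the cheapest capable provider, which mirrors the routing behaviour described above.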
Every request we serve is a data point. We learn which provider is cheapest for your specific prompt patterns. After 1,000 requests, we start batching equivalent prompts, pre-empting expensive models, and surfacing cheaper alternatives that produce identical quality for your use case. Completely automatic — no configuration.
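One way to picture the "equivalent prompts" idea is clustering requests by a normalised template signature and tracking cost per cluster. This is a toy sketch only; the signature scheme, thresholds, and data structures are assumptions, not the actual Pattern Harvester.

```python
# Toy sketch: cluster prompts by a normalised signature, then track which
# provider has been cheapest per cluster. Purely illustrative.
import re
from collections import defaultdict

def signature(prompt: str) -> str:
    """Collapse digits and whitespace so structurally identical
    prompts map to the same cluster key."""
    sig = re.sub(r"\d+", "<num>", prompt.lower())
    return re.sub(r"\s+", " ", sig).strip()[:80]

cost_by_cluster = defaultdict(lambda: defaultdict(list))

def record(prompt: str, provider: str, cost: float) -> None:
    cost_by_cluster[signature(prompt)][provider].append(cost)

def cheapest(prompt: str):
    """Cheapest provider seen for this prompt's cluster, or None if unseen."""
    providers = cost_by_cluster.get(signature(prompt))
    if not providers:
        return None
    return min(providers, key=lambda p: sum(providers[p]) / len(providers[p]))

record("Summarise invoice 1042", "gpt-4o", 0.0021)
record("Summarise invoice 2217", "deepseek-chat", 0.0002)
print(cheapest("Summarise invoice 9999"))  # same template, so routing can shift
```

The point of the sketch: once two prompts hash to the same cluster, observed per-provider cost for that cluster can steer future routing without inspecting prompt content.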
**15–30% lower cost at scale**
Provider APIs change, prices change, models get deprecated. With Axiom your codebase never touches provider names or endpoints. We absorb breaking changes silently. If OpenAI raises prices tonight, your routing shifts to DeepSeek by morning with zero changes on your end.
**Zero migration cost**
Change one line. The OpenAI SDK, LangChain, LlamaIndex, and Instructor all work without modification.
You pay provider cost + 20%. Full cost breakdown at /v1/usage. No hidden fees. No minimums.
Traffic is metered per customer. Keys are hashed, never stored in plain text. We do not train on your data.
Track tokens, cost, and model breakdown via GET /v1/usage. Integrate into your dashboards.
Provider down? We failover to the next best option automatically. No 503s, no manual intervention.
Response objects are identical to the OpenAI API contract, including streaming support via SSE.
Use model="auto" to let the router pick, or specify any name directly.
| Model | Provider | Input (per 1M tokens) | Best for |
|---|---|---|---|
| auto (default) | ML Router | cheapest capable | General use; let the router decide |
| deepseek-chat | DeepSeek | ~$0.27 | Long chat, summarisation, analysis |
| deepseek-coder | DeepSeek | ~$0.27 | Code generation, debugging |
| gpt-4o-mini | OpenAI | ~$0.15 | Fast, cheap classification & extraction |
| moonshot-v1-8k | Moonshot | ~$1.63 | Chinese-language tasks |
| gemini-1.5-pro | Google | ~$3.50 | 1M-token context, multimodal |
| gpt-4o | OpenAI | ~$5.00 | Complex reasoning, tool use |
| o4-mini | OpenAI | ~$1.10 | Step-by-step reasoning (CoT) |
All prices are provider list cost. Axiom charges provider cost + 20% markup.
Full list: GET /v1/models
All plans include all models and automatic routing. No overage fees on Unlimited.
Example: 1M GPT-4o input tokens via OpenAI = $5.00. Same workload auto-routed via Axiom to DeepSeek = ~$0.27 + 20% = $0.32 — 15× cheaper.
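The arithmetic in that example, spelled out (prices taken from the table above):

```python
# Reproduce the example: 1M GPT-4o input tokens direct vs auto-routed.
openai_cost = 5.00                   # GPT-4o list price per 1M input tokens
deepseek_cost = 0.27                 # DeepSeek list price per 1M input tokens
axiom_billed = deepseek_cost * 1.20  # provider cost + 20% markup

print(f"${axiom_billed:.2f}")                     # $0.32
print(f"{openai_cost / axiom_billed:.0f}x cheaper")  # 15x cheaper
```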
Yes. Set base_url="https://swarm.aletheia-platform.systems/v1" in the OpenAI SDK (Python or Node). The /v1/chat/completions endpoint accepts and returns the exact same request and response schema. LangChain, LlamaIndex, Instructor, and any OpenAI-compatible client work without any changes to your application code.
How does model="auto" routing work? Our ML router scores every healthy provider on cost, latency percentile, success rate, and capability match for your specific request type. The top scorer wins. For simple chat this is almost always DeepSeek (cheapest); for reasoning tasks it routes to o4-mini; for 100k+ token contexts it routes to Gemini. The model is retrained on 500,000+ executions and improves continuously.
After 1,000 requests from your key, the Pattern Harvester starts clustering your prompt patterns. It identifies which providers produce equivalent quality for your specific use cases and automatically shifts routing toward the cheapest equivalent provider. Most customers see 15–30% cost reduction within 30 days. It is enabled automatically on Growth and Unlimited plans.
We charge: billed = actual_provider_cost × 1.20. The raw provider cost is logged on every request and returned at GET /v1/usage. There are no other fees, no setup fees, and no minimum spend on PAYG.
Circuit breakers detect failures within 3 consecutive errors and route subsequent requests to the next best provider automatically. Completed requests are unaffected. Provider health is visible at GET /health. Degradation is typically transparent to your application.
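The "3 consecutive errors" behaviour can be sketched as a minimal circuit breaker. The threshold comes from the text above; the class and failover logic here are illustrative assumptions, not Axiom's implementation.

```python
# Minimal circuit-breaker sketch: after 3 consecutive failures a provider
# is skipped and traffic falls through to the next-best option.

FAILURE_THRESHOLD = 3  # from the text: "within 3 consecutive errors"

class Breaker:
    def __init__(self):
        self.consecutive_failures = 0

    @property
    def open(self) -> bool:  # open = provider temporarily excluded
        return self.consecutive_failures >= FAILURE_THRESHOLD

    def record(self, ok: bool) -> None:
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

def pick(ranked, breakers):
    """Return the best-ranked provider whose breaker is still closed."""
    for name in ranked:
        if not breakers[name].open:
            return name
    return None

breakers = {"deepseek": Breaker(), "gpt-4o-mini": Breaker()}
for _ in range(3):  # three consecutive DeepSeek errors trip the breaker
    breakers["deepseek"].record(ok=False)

print(pick(["deepseek", "gpt-4o-mini"], breakers))  # gpt-4o-mini
```

A single success resets the failure count, so a provider that recovers re-enters the ranking on its own.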
Prompt content and response content are not retained beyond what's needed to complete your request. Usage metadata (token counts, model used, cost) is retained for billing. We do not train on your prompts. The Pattern Harvester analyses anonymised token count patterns, not content.
Yes. Pass any model name directly: model="deepseek-coder" or model="gpt-4o". The router only activates when you pass model="auto". Use GET /v1/models for the current list of available models and their status.
Enter your email and we'll provision a free PAYG key instantly — no credit card required.
Header: `Authorization: Bearer YOUR_KEY`
Base URL: `https://swarm.aletheia-platform.systems/v1`