Live — 8 models routing now

One API key.
All models.
Automatic cost routing.

Drop-in replacement for the OpenAI SDK. A router trained on 500,000+ real executions sends each request to the cheapest capable model in under 10ms. Your code doesn't change. Your bill shrinks.

Python:

# Only one line changes in your existing code:
from openai import OpenAI

client = OpenAI(
    api_key="axm_your_key_here",
    base_url="https://swarm.aletheia-platform.systems/v1",  # <-- only change
)

resp = client.chat.completions.create(
    model="auto",            # let the router pick cheapest capable model
    messages=[{"role": "user", "content": "Summarise this document."}]
)
print(resp.choices[0].message.content)
Node.js:

import OpenAI from "openai";

const client = new OpenAI({
    apiKey:  "axm_your_key_here",
    baseURL: "https://swarm.aletheia-platform.systems/v1",  // <-- only change
});

const resp = await client.chat.completions.create({
    model:    "auto",
    messages: [{ role: "user", content: "Summarise this document." }],
});
console.log(resp.choices[0].message.content);
curl:

curl https://swarm.aletheia-platform.systems/v1/chat/completions \
  -H "Authorization: Bearer axm_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Routes across 8 models automatically
GPT-4o · GPT-4o-mini · o4-mini · DeepSeek Chat · DeepSeek Coder · Moonshot v1 · Gemini 1.5 Pro · Ollama (self-hosted)

Why Axiom instead of OpenRouter?

OpenRouter is a proxy. Axiom is a trained router — and it learns from your usage.

OpenRouter: static rule-based routing; you pick the model.
Axiom: an ML router trained on 500,000+ executions picks for you in <10ms.

OpenRouter: the same markup forever; no learning.
Axiom: the Pattern Harvester learns your usage and automatically reduces cost 15–30% over time.

OpenRouter: changing provider means changing your SDK and model names.
Axiom: switches providers mid-conversation transparently, with automatic failover.
🧠 ML Router — trained on 500k+ executions

Not a rule set. A model. We score every provider on capability match, latency percentile, current health, and cost per token — then route in under 10ms. Code tasks go to DeepSeek Coder. Long-context summarisation goes to Gemini. Reasoning goes to o4-mini. You just write model="auto".

Routes in <10ms
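
As a mental model only (the production router is a learned model; every number and capability tag below is a made-up assumption), the scoring-and-pick step looks roughly like this:

```python
# Illustrative sketch: the production router is a trained ML model, but the
# features it weighs can be mimicked with a hand-written score. All costs,
# latencies, and capability tags here are fabricated for the example.
CANDIDATES = {
    # model: (cost per 1M input tokens, p50 latency ms, healthy, capabilities)
    "deepseek-chat":  (0.27, 900,  True,  {"chat"}),
    "deepseek-coder": (0.27, 950,  True,  {"chat", "code"}),
    "gpt-4o-mini":    (0.15, 600,  True,  {"chat", "extraction"}),
    "o4-mini":        (1.10, 1400, True,  {"chat", "reasoning"}),
}

def route(task: str) -> str:
    """Pick the cheapest healthy model that advertises the task capability."""
    capable = [
        (cost, latency, name)
        for name, (cost, latency, healthy, caps) in CANDIDATES.items()
        if healthy and task in caps
    ]
    cost, latency, name = min(capable)  # min() sorts by cost, then latency
    return name

print(route("code"))       # deepseek-coder
print(route("reasoning"))  # o4-mini
print(route("chat"))       # gpt-4o-mini (cheapest chat-capable entry)
```

The real router replaces the hand-written tuple comparison with a model score, but the shape of the decision is the same: filter to capable and healthy, then take the cheapest.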
📉 Pattern Harvester — 15–30% cost reduction over time

Every request we serve is a data point. We learn which provider is cheapest for your specific prompt patterns. After 1,000 requests, we start batching equivalent prompts, pre-empting expensive models, and surfacing cheaper alternatives that produce identical quality for your use case. Completely automatic — no configuration.

-15% to -30% cost at scale
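
A toy sketch of the idea (the real harvester clusters learned prompt patterns; the crude `signature` function and all costs below are illustrative assumptions):

```python
# Toy sketch, not the production Pattern Harvester: real clustering works on
# learned prompt patterns. Here a crude signature (length bucket plus leading
# word) stands in for a cluster, and we remember the cheapest provider seen.
def signature(prompt: str) -> tuple:
    return (len(prompt) // 100, prompt.split()[0].lower())

class Harvester:
    def __init__(self):
        self.best = {}  # signature -> (observed cost, provider)

    def record(self, prompt: str, provider: str, cost: float) -> None:
        sig = signature(prompt)
        if sig not in self.best or cost < self.best[sig][0]:
            self.best[sig] = (cost, provider)

    def suggest(self, prompt: str):
        entry = self.best.get(signature(prompt))
        return entry[1] if entry else None

h = Harvester()
h.record("Summarise this report", "gpt-4o", 0.0051)
h.record("Summarise this memo", "deepseek-chat", 0.0003)
print(h.suggest("Summarise this contract"))  # deepseek-chat
```

Once a cluster has a proven cheap provider, future requests matching that cluster can be steered there without any configuration on your side.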
🔓 No vendor lock-in — switch mid-conversation

Provider APIs change, prices change, models get deprecated. With Axiom your codebase never touches provider names or endpoints. We absorb breaking changes silently. If OpenAI raises prices tonight, your routing shifts to DeepSeek by morning with zero changes on your end.

Zero migration cost

Everything you need. Nothing you don't.

Instant drop-in replacement

Change one line. OpenAI SDK, LangChain, LlamaIndex, Instructor — all work without modification.

💸 Transparent 20% markup only

You pay provider cost + 20%. Full cost breakdown at /v1/usage. No hidden fees. No minimums.

🔒 Your data stays yours

Traffic is metered per customer. Keys are hashed, never stored in plain text. We do not train on your data.

📊 Real-time usage endpoint

Track tokens, cost, and model breakdown via GET /v1/usage. Integrate into your dashboards.
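
For example, a dashboard helper might turn the usage payload into per-model spend shares. The field names in this sample are assumptions; check the live GET /v1/usage response before relying on them:

```python
import json

# The payload shape below is an assumption for illustration only; verify the
# actual GET /v1/usage response schema before depending on these field names.
sample = json.loads("""{
  "total_cost_usd": 0.41,
  "models": [
    {"model": "deepseek-chat", "tokens": 1100000, "cost_usd": 0.36},
    {"model": "gpt-4o-mini",   "tokens": 150000,  "cost_usd": 0.05}
  ]
}""")

def cost_share(usage: dict) -> dict:
    """Each model's share of total spend, e.g. for a dashboard pie chart."""
    total = usage["total_cost_usd"]
    return {m["model"]: round(m["cost_usd"] / total, 2) for m in usage["models"]}

print(cost_share(sample))  # {'deepseek-chat': 0.88, 'gpt-4o-mini': 0.12}
```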

🛡️ Circuit breakers built in

Provider down? We failover to the next best option automatically. No 503s, no manual intervention.
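
The failover behaviour can be sketched as follows. The 3-consecutive-error threshold comes from the FAQ on this page; the class itself is an illustrative simplification, not the production implementation:

```python
# Simplified circuit-breaker sketch. The 3-error threshold matches the FAQ;
# everything else (names, reset policy) is an illustrative assumption.
class CircuitBreaker:
    """Opens after `threshold` consecutive failures; one success resets it."""
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def record(self, ok: bool) -> None:
        self.failures = 0 if ok else self.failures + 1

def pick_provider(order, breakers):
    """First provider in preference order whose breaker is still closed."""
    for name in order:
        if not breakers[name].open:
            return name
    raise RuntimeError("all providers unavailable")

breakers = {"deepseek": CircuitBreaker(), "openai": CircuitBreaker()}
for _ in range(3):
    breakers["deepseek"].record(ok=False)  # three consecutive errors trip it
print(pick_provider(["deepseek", "openai"], breakers))  # openai
```

Because the pick happens per request, a tripped breaker simply shifts traffic to the next best provider with no 503 surfacing to your application.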

🌐 OpenAI-compatible response schema

Response objects are identical to the OpenAI API contract, including streaming support via SSE.
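
Since streamed chunks follow the OpenAI SSE format (`data:` lines terminated by `data: [DONE]`), a minimal client-side parser is easy to sketch. The sample chunks below are fabricated for the example:

```python
import json

# Minimal SSE parser sketch. Each streamed line is 'data: <json chunk>' in
# the OpenAI streaming format, and the stream ends with 'data: [DONE]'.
def collect_stream(lines):
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

sample = [  # fabricated sample chunks
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print(collect_stream(sample))  # Hello
```

In practice you would just pass `stream=True` to the SDK and let it do this parsing for you; the sketch only shows that the wire format is the standard OpenAI one.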

Available models

Use model="auto" to let the router pick, or specify any name directly.

Model            Provider    Input (per 1M tokens)   Best for
auto (default)   ML Router   cheapest capable        General use — let us decide
deepseek-chat    DeepSeek    ~$0.27                  Long chat, summarisation, analysis
deepseek-coder   DeepSeek    ~$0.27                  Code generation, debugging
gpt-4o-mini      OpenAI      ~$0.15                  Fast, cheap classification & extraction
moonshot-v1-8k   Moonshot    ~$1.63                  Chinese-language tasks
gemini-1.5-pro   Google      ~$3.50                  1M-token context, multimodal
gpt-4o           OpenAI      ~$5.00                  Complex reasoning, tool use
o4-mini          OpenAI      ~$1.10                  Step-by-step reasoning (CoT)

All prices are provider list cost. Axiom charges provider cost + 20% markup. Full list: GET /v1/models

Simple, honest pricing

All plans include all models and automatic routing. No overage fees on Unlimited.

Starter
$49
per month
1,000,000 tokens / mo
  • All 8 models
  • ML auto-routing
  • Usage analytics
  • Community support
Unlimited
$299
per month
Unlimited tokens
  • Everything in Starter
  • Dedicated capacity
  • SLA 99.9% uptime
  • Slack + phone support
Pay-as-you-go
Cost
+ 20% markup, no monthly fee
No commitment
  • All 8 models
  • Per-token billing
  • No monthly minimum
  • Community support

Example: 1M GPT-4o input tokens via OpenAI = $5.00. The same workload auto-routed via Axiom to DeepSeek = ~$0.27 × 1.20 ≈ $0.32, roughly 15× cheaper.

Common questions

Is this a real drop-in replacement for the OpenAI SDK?

Yes. Set base_url="https://swarm.aletheia-platform.systems/v1" in the OpenAI SDK (Python or Node). The /v1/chat/completions endpoint accepts and returns the exact same request and response schema. LangChain, LlamaIndex, Instructor, and any OpenAI-compatible client work without any changes to your application code.

How does model="auto" routing work?

Our ML router scores every healthy provider on cost, latency percentile, success rate, and capability match for your specific request type; the top scorer wins. For simple chat this is almost always DeepSeek (the cheapest). Reasoning tasks route to o4-mini, and 100k+ token contexts route to Gemini. The router is continuously retrained on 500,000+ real executions and improves over time.

What is the Pattern Harvester and when does it kick in?

After 1,000 requests from your key, the Pattern Harvester starts clustering your prompt patterns. It identifies which providers produce equivalent quality for your specific use cases and automatically shifts routing toward the cheapest equivalent provider. Most customers see a 15–30% cost reduction within 30 days. It is enabled automatically on Starter and Unlimited plans.

How is the 20% markup calculated?

We charge: billed = actual_provider_cost × 1.20. The raw provider cost is logged on every request and returned at GET /v1/usage. There are no other fees, no setup fees, and no minimum spend on PAYG.
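
A quick sanity check of that formula in code, using the list prices from the model table above:

```python
def billed(provider_cost_usd: float) -> float:
    """billed = actual_provider_cost * 1.20, the only fee Axiom adds."""
    return round(provider_cost_usd * 1.20, 4)

print(billed(0.27))  # 0.324  (1M DeepSeek input tokens)
print(billed(5.00))  # 6.0    (1M GPT-4o input tokens)
```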

What happens if a provider goes down mid-conversation?

Circuit breakers trip after 3 consecutive errors and route subsequent requests to the next best provider automatically. Completed requests are unaffected. Provider health is visible at GET /health. Degradation is typically transparent to your application.

Is my data stored or used for training?

Prompt content and response content are not retained beyond what's needed to complete your request. Usage metadata (token counts, model used, cost) is retained for billing. We do not train on your prompts. The Pattern Harvester analyses anonymised token count patterns, not content.

Can I force a specific model?

Yes. Pass any model name directly: model="deepseek-coder" or model="gpt-4o". The router only activates when you pass model="auto". Use GET /v1/models for the current list of available models and their status.

Start saving on AI inference today.

One-line migration. Free PAYG key. No credit card required.