AI Provider & Model Management
Atamaia's AI routing layer is a multi-provider, multi-model orchestration system that abstracts away the differences between cloud LLM providers and local inference servers. It handles provider registration, model cataloging, role-based routing, credential management, health tracking with circuit breakers, cost computation, streaming translation, and failover.
This is not a thin proxy. It is the routing fabric that lets every part of the platform -- agents, chat sessions, cognitive services, councils -- request "give me a model for this role" and get back a working connection regardless of whether the answer is Claude on Anthropic, Qwen on a local llama.cpp server, or a model on OpenRouter.
Architecture
Request: ChatRequest { modelId: "ai-02:qwen3-30b-a3b", message: "..." }
│
▼
AIRouterService
├── Resolve model (by modelId or role route)
├── Find provider (priority-ordered, health-checked)
├── Get credentials (tenant-specific or provider-level)
├── Build HTTP request (OpenAI-compat format)
├── Stream or synchronous call
├── Failover on retriable errors (429, 5xx, timeouts)
├── Record success/failure for circuit breaker
└── Return ChatResponse with usage + cost
Providers
A provider is an LLM endpoint -- cloud API, local inference server, or meta-router like OpenRouter.
Entity: AIProvider
| Field | Description |
|---|---|
| Name | Display name (e.g. "Local Kael", "OpenRouter") |
| TypeId | OpenRouter, LocalLlamaCpp, Anthropic, OpenAI, Custom, AnthropicAgentSdk |
| Prefix | Namespace prefix for model IDs (e.g. "ai-02", "or") |
| BaseUrl | API endpoint (e.g. http://your-local-model-server:8000/v1, https://openrouter.ai/api/v1) |
| ApiKey | Provider-level API key (for tenant-owned providers) |
| DefaultModel | Fallback model if none specified |
| Enabled | Whether this provider is available for routing |
| Priority | Lower = preferred. Used for failover ordering. |
| TimeoutSeconds | Per-request timeout (default: 120) |
| StripPrefixInRequests | Remove the prefix when sending the model ID to the provider |
| IsGlobal | Part of the platform-wide catalog (shared across tenants) |
| ConfigJson | Provider-specific configuration |
Provider Types
| Type | Description |
|---|---|
| OpenRouter | OpenRouter meta-router (access to 100+ models) |
| LocalLlamaCpp | Direct llama.cpp server (local GPU inference) |
| Anthropic | Anthropic API (Claude models) |
| OpenAI | OpenAI API (GPT models) |
| Custom | Any OpenAI-compatible endpoint |
| AnthropicAgentSdk | Claude Agent SDK (subprocess delegation, no API key needed) |
API
GET /api/ai/providers # List providers (enabledOnly filter)
GET /api/ai/providers/{id} # Get provider detail (includes models)
POST /api/ai/providers # Create provider
PATCH /api/ai/providers/{id} # Update provider
DELETE /api/ai/providers/{id} # Soft delete provider
Create a local provider:
POST /api/ai/providers
{
"name": "Local Kael",
"type": "LocalLlamaCpp",
"prefix": "ai-02",
"baseUrl": "http://your-local-model-server:8000/v1",
"priority": 1,
"timeoutSeconds": 180
}
Models
Models are specific LLMs available through a provider. Each model tracks capabilities, pricing, context window, and approval flags.
Entity: AIModel
| Field | Description |
|---|---|
| ProviderId | Parent provider |
| ModelId | Model identifier as sent to the provider API |
| DisplayName | Human-readable name |
| ContextLength | Context window in tokens (default: 32,768) |
| MaxCompletionTokens | Maximum output tokens |
| IsLocal | Running on local hardware |
| LocalModelPath | Path to model weights (for local models) |
| Temperature | Default temperature |
| MaxTokens | Default max output tokens |
| SystemPrompt | Default system prompt |
| ApprovedForAgent | Allowed for agent execution |
| ApprovedForChat | Allowed for interactive chat (default: true) |
| EnableHydration | Auto-hydrate identity context before chat |
| HydrationIdentityId | Which identity to hydrate for this model |
| InputCostPer1M | Input token cost per 1M tokens (USD) |
| OutputCostPer1M | Output token cost per 1M tokens (USD) |
The fully qualified model ID is {provider.Prefix}:{modelId} -- for example ai-02:qwen3-30b-a3b or or:anthropic/claude-3.5-sonnet.
API
GET /api/ai/models # List models (filter by providerId, enabledOnly)
GET /api/ai/models/{id} # Get model detail
POST /api/ai/models # Register model
PATCH /api/ai/models/{id} # Update model
DELETE /api/ai/models/{id} # Soft delete model
Register a local model:
POST /api/ai/models
{
"providerId": 1,
"modelId": "qwen3-30b-a3b",
"displayName": "Qwen3 30B (A3B)",
"contextLength": 32768,
"isLocal": true,
"approvedForAgent": true,
"approvedForChat": true,
"temperature": 0.3,
"inputCostPer1M": 0.0,
"outputCostPer1M": 0.0
}
Route Configuration
Routes map roles/purposes to preferred models. When the agent system or any service asks "what model should I use for this role?", the router checks route configs.
Entity: AIRouteConfig
| Field | Description |
|---|---|
| ProviderId | Target provider |
| ModelId | Target model (optional -- can route to provider's default) |
| Role | The role this route serves (e.g. "summariser", "coding", "researcher") |
| Priority | Lower = preferred. Multiple routes per role enable failover. |
| Notes | Human-readable explanation |
API
GET /api/ai/routes # List routes (filter by role)
POST /api/ai/routes # Create route
DELETE /api/ai/routes/{id} # Delete route
Example routes:
POST /api/ai/routes
{ "providerId": 1, "modelId": 3, "role": "summariser", "priority": 1, "notes": "LFM-2-8B for utility tasks" }
POST /api/ai/routes
{ "providerId": 1, "modelId": 1, "role": "researcher", "priority": 1, "notes": "Qwen3-30B for research" }
POST /api/ai/routes
{ "providerId": 2, "modelId": 5, "role": "coding", "priority": 1, "notes": "Claude via OpenRouter for code" }
Model Resolution
GET /api/ai/resolve/{modelId}
The resolution logic:
- Exact match: Look for a model with the given modelId (or prefix:modelId)
- Provider prefix: If the modelId contains :, split into prefix and model, then find the provider by prefix
- Route config: If called with a role name, find the highest-priority route for that role
- Fallback: ai-02:qwen3-30b-a3b (hardcoded fallback for agent execution)
For agent runs, the resolution chain is: explicit run model -> route config for the role -> hardcoded fallback.
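That chain can be sketched as a simple priority cascade. This is an illustrative simplification (routes are modeled as plain dicts rather than the AIRouteConfig entity, and the function name is hypothetical):

```python
def resolve_model(run_model, route_configs, role,
                  fallback="ai-02:qwen3-30b-a3b"):
    """Resolution chain for agent runs: an explicitly requested model
    wins; otherwise the lowest-priority-number route for the role;
    otherwise the hardcoded fallback."""
    if run_model:
        return run_model
    # Candidate routes for this role; priority: lower = preferred.
    routes = [r for r in route_configs if r["role"] == role]
    if routes:
        return min(routes, key=lambda r: r["priority"])["modelId"]
    return fallback
```

Because multiple routes can share a role, the `min` over priority is also what makes role-level failover possible: if the preferred route's provider is unhealthy, the router can move to the next priority.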
Chat
The core routing endpoint. Sends a message to any model through the unified interface.
POST /api/ai/chat
{
"modelId": "ai-02:qwen3-30b-a3b",
"message": "Explain Hebbian learning in 3 sentences",
"systemPrompt": "You are a neuroscience expert.",
"temperature": 0.7,
"maxTokens": 500,
"tools": [],
"toolChoice": "auto"
}
Response:
{
"success": true,
"model": "qwen3-30b-a3b",
"reply": "Hebbian learning is a neurobiological principle...",
"usage": {
"promptTokens": 45,
"completionTokens": 89,
"totalTokens": 134
},
"responseTimeMs": 1243,
"toolCalls": null
}
Broadcast
Send the same message to multiple models simultaneously:
POST /api/ai/broadcast
{
"models": ["ai-02:qwen3-30b-a3b", "or:anthropic/claude-3.5-sonnet", "ai-03:gemma-3-12b"],
"message": "What is the capital of New Zealand?",
"timeoutMs": 30000
}
Returns all responses with per-model timing and usage.
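The key property of broadcast is that the models are queried concurrently and each has its own timeout, so one slow model cannot stall the others. A minimal sketch of that fan-out, assuming a caller-supplied `chat_one(model_id, message)` coroutine (hypothetical, standing in for the router's per-model call):

```python
import asyncio

async def broadcast(models, message, chat_one, timeout_ms=30_000):
    """Send the same message to several models concurrently.

    Each model gets an individual timeout; a timed-out model produces a
    failed result entry instead of cancelling the whole broadcast.
    """
    async def call(model_id):
        try:
            reply = await asyncio.wait_for(
                chat_one(model_id, message), timeout=timeout_ms / 1000)
            return {"model": model_id, "success": True, "reply": reply}
        except asyncio.TimeoutError:
            return {"model": model_id, "success": False, "error": "timeout"}

    # gather preserves input order, so results line up with `models`.
    return await asyncio.gather(*(call(m) for m in models))
```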
Streaming
Atamaia uses Open Responses as its canonical streaming protocol. All upstream providers are normalized into this format.
The OpenAICompatAdapter translates OpenAI Chat Completions SSE streams into Open Responses events:
| Open Responses Event | Description |
|---|---|
| response.created | Stream started |
| output_item.added | New output item (message or function_call) |
| content_part.added | New content part |
| output_text.delta | Text chunk (sequenced) |
| output_text.done | Full text accumulated |
| output_item.done | Item complete |
| function_call.arguments.delta | Tool call argument chunk |
| response.completed | Stream finished |
This means any client consuming Atamaia's streaming API gets a consistent event format regardless of whether the underlying model is on OpenAI, Anthropic, OpenRouter, or a local llama.cpp server.
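The core of that translation is accumulating Chat Completions delta chunks into sequenced text events. A stripped-down sketch (text deltas only, no tool calls; the function name and event dict shapes are illustrative, not the adapter's actual types):

```python
def translate_deltas(chunks):
    """Translate OpenAI-style chat-completion delta chunks into a flat
    list of Open Responses events: response.created, sequenced
    output_text.delta events, output_text.done, response.completed."""
    events = [{"type": "response.created"}]
    text, seq = [], 0
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        piece = delta.get("content")
        if piece:
            # Each text chunk is emitted with a sequence number so
            # clients can detect gaps or reordering.
            events.append({"type": "output_text.delta",
                           "sequence_number": seq, "delta": piece})
            text.append(piece)
            seq += 1
    # The done event carries the full accumulated text.
    events.append({"type": "output_text.done", "text": "".join(text)})
    events.append({"type": "response.completed"})
    return events
```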
Provider Health & Circuit Breaker
The ProviderHealthTracker (singleton) implements a circuit breaker pattern:
| State | Behavior |
|---|---|
| Healthy | Fewer than 3 consecutive failures |
| Open | 3+ consecutive failures. Provider skipped for 5 minutes. |
| Recovery | After 5 minutes, circuit closes. Next request tests the provider. |
Retriable conditions:
- HTTP 408 (Timeout)
- HTTP 429 (Rate Limited)
- HTTP 5xx (Server Error)
- TaskCanceledException or HttpRequestException
On failure, the router automatically fails over to the next provider in priority order.
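The thresholds above (3 consecutive failures, 5-minute cooldown) can be sketched as a small state tracker. This is a minimal illustration of the pattern, not the actual ProviderHealthTracker implementation:

```python
import time

FAILURE_THRESHOLD = 3       # consecutive failures before the circuit opens
COOLDOWN_SECONDS = 5 * 60   # how long an open circuit skips the provider

class ProviderHealthTracker:
    """Per-provider circuit breaker: open after repeated failures,
    close again after a cooldown so the next request probes health."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._failures = {}    # provider id -> consecutive failure count
        self._opened_at = {}   # provider id -> time the circuit opened

    def record_success(self, provider_id):
        # Any success resets the failure streak and closes the circuit.
        self._failures[provider_id] = 0
        self._opened_at.pop(provider_id, None)

    def record_failure(self, provider_id):
        count = self._failures.get(provider_id, 0) + 1
        self._failures[provider_id] = count
        if count >= FAILURE_THRESHOLD:
            self._opened_at[provider_id] = self._now()

    def is_available(self, provider_id):
        opened = self._opened_at.get(provider_id)
        if opened is None:
            return True
        if self._now() - opened >= COOLDOWN_SECONDS:
            # Cooldown elapsed: close the circuit and let the next
            # request test the provider for real.
            self._opened_at.pop(provider_id, None)
            self._failures[provider_id] = 0
            return True
        return False
```

During routing, the router filters candidate providers through `is_available` before picking the lowest-priority healthy one, which is what makes failover automatic rather than per-request configuration.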
Tenant Provider Credentials
The global catalog pattern: platform operators register providers (with IsGlobal = true). Individual tenants then supply their own API keys for those providers.
GET /api/ai/credentials # List credentials
POST /api/ai/credentials # Set credential
DELETE /api/ai/credentials/{id} # Remove credential
POST /api/ai/credentials/{id}/validate # Test credential
Set a credential:
POST /api/ai/credentials
{
"providerId": 2,
"apiKey": "sk-or-v1-abc123...",
"label": "Production OpenRouter key"
}
Credentials are encrypted at rest using AES-256-GCM via ISecretEncryptor. The raw API key is never stored or returned after creation.
Validation hits the provider's /models endpoint with the supplied key:
POST /api/ai/credentials/{id}/validate
→ { "success": true }
Global Provider Catalog
Platform-wide providers visible to all tenants:
GET /api/ai/catalog # List global providers (enabled only)
GET /api/ai/catalog/{id} # Get global provider detail
Tenants browse the catalog, set their own credentials, and the router uses those credentials when making requests through global providers.
Pricing Sync
POST /api/ai/sync-pricing
Pulls current per-token pricing from OpenRouter's API and updates all models registered under OpenRouter providers. Returns the count of models updated.
Cost Management
Every model has InputCostPer1M and OutputCostPer1M fields. The agent execution loop computes cost per iteration:
cost = (promptTokens * inputCostPer1M / 1_000_000) + (completionTokens * outputCostPer1M / 1_000_000)
Cost is tracked per-run and aggregated across parent + all descendant runs (TotalCostWithChildren). Local models typically have zero cost.
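The per-iteration formula above is straightforward to express directly (the rate values in the example are hypothetical, not any model's actual pricing):

```python
def compute_cost(prompt_tokens, completion_tokens,
                 input_cost_per_1m, output_cost_per_1m):
    """Cost of one iteration in USD, given the model's per-1M-token
    rates. Local models registered with zero rates cost nothing."""
    return (prompt_tokens * input_cost_per_1m / 1_000_000
            + completion_tokens * output_cost_per_1m / 1_000_000)
```

With the usage from the chat example (45 prompt, 89 completion tokens) and illustrative rates of $3/$15 per 1M tokens, this gives roughly $0.00147 for the call.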
Local Model Support
Local models run on dedicated hardware (ai-02, ai-03) via llama.cpp HTTP servers exposing OpenAI-compatible endpoints:
| Server | Host | Models |
|---|---|---|
| ai-02 | <local-model-host> | Qwen3-30B (:8000), LFM2-8B (:8002), SmolLM3-3B (:8003), Qwen3-4B (:8004) |
| ai-03 | <local-model-host> | Gemma-3-12B (:8001), Luna-7B (:8002), Llama-3-14B (:8003), Ministral-3-14B (:8005), Llama-3.1-8B (:8006) |
Each is registered as a LocalLlamaCpp provider with a prefix (e.g. ai-02). Models are registered individually. Because they speak the OpenAI-compatible protocol, the same OpenAICompatAdapter handles streaming.
API Reference
| Method | Endpoint | Permission | Description |
|---|---|---|---|
| POST | /api/ai/chat | ChatSessionCreate | Send chat message |
| POST | /api/ai/broadcast | ChatSessionCreate | Broadcast to multiple models |
| GET | /api/ai/providers | AIProviderView | List providers |
| GET | /api/ai/providers/{id} | AIProviderView | Get provider detail |
| POST | /api/ai/providers | AIProviderManage | Create provider |
| PATCH | /api/ai/providers/{id} | AIProviderManage | Update provider |
| DELETE | /api/ai/providers/{id} | AIProviderManage | Delete provider |
| GET | /api/ai/models | AIModelView | List models |
| GET | /api/ai/models/{id} | AIModelView | Get model |
| POST | /api/ai/models | AIModelManage | Create model |
| PATCH | /api/ai/models/{id} | AIModelManage | Update model |
| DELETE | /api/ai/models/{id} | AIModelManage | Delete model |
| GET | /api/ai/routes | AIRouteView | List route configs |
| POST | /api/ai/routes | AIRouteManage | Create route config |
| DELETE | /api/ai/routes/{id} | AIRouteManage | Delete route config |
| GET | /api/ai/resolve/{modelId} | AIModelView | Resolve model by ID |
| POST | /api/ai/sync-pricing | AIModelManage | Sync OpenRouter pricing |
| GET | /api/ai/catalog | (any auth) | Browse global provider catalog |
| GET | /api/ai/catalog/{id} | AIProviderView | Get global provider |
| GET | /api/ai/credentials | (any auth) | List tenant credentials |
| POST | /api/ai/credentials | (any auth) | Set credential |
| DELETE | /api/ai/credentials/{id} | (any auth) | Delete credential |
| POST | /api/ai/credentials/{id}/validate | (any auth) | Validate credential |