AI Provider & Model Management
Atamaia's AI routing layer is a multi-provider, multi-model orchestration system that abstracts away the differences between cloud LLM providers and local inference servers. It handles provider registration, model cataloging, role-based routing, credential management, health tracking with circuit breakers, cost computation, streaming translation, and failover.
This is not a thin proxy. It is the routing fabric that lets every part of the platform -- agents, chat sessions, cognitive services, councils -- request "give me a model for this role" and get back a working connection regardless of whether the answer is Claude on Anthropic, Qwen on a local llama.cpp server, or a model on OpenRouter.
Architecture
Request: ChatRequest { modelId: "ai-02:qwen3-30b-a3b", message: "..." }
│
▼
AIRouterService
├── Resolve model (by modelId or role route)
├── Find provider (priority-ordered, health-checked)
├── Get credentials (tenant-specific or provider-level)
├── Build HTTP request (OpenAI-compat format)
├── Stream or synchronous call
├── Failover on retriable errors (429, 5xx, timeouts)
├── Record success/failure for circuit breaker
└── Return ChatResponse with usage + cost
Providers
A provider is an LLM endpoint -- cloud API, local inference server, or meta-router like OpenRouter.
Entity: AIProvider
| Field | Description |
|---|---|
| Name | Display name (e.g. "Local Kael", "OpenRouter") |
| TypeId | OpenRouter, LocalLlamaCpp, Anthropic, OpenAI, Custom, AnthropicAgentSdk |
| Prefix | Namespace prefix for model IDs (e.g. "ai-02", "or") |
| BaseUrl | API endpoint (e.g. http://your-local-model-server:8000/v1, https://openrouter.ai/api/v1) |
| ApiKey | Provider-level API key (for tenant-owned providers) |
| DefaultModel | Fallback model if none specified |
| Enabled | Whether this provider is available for routing |
| Priority | Lower = preferred. Used for failover ordering. |
| TimeoutSeconds | Per-request timeout (default: 120) |
| StripPrefixInRequests | Remove the prefix when sending the model ID to the provider |
| IsGlobal | Part of the platform-wide catalog (shared across tenants) |
| ConfigJson | Provider-specific configuration |
Provider Types
| Type | Description |
|---|---|
| OpenRouter | OpenRouter meta-router (access to 100+ models) |
| LocalLlamaCpp | Direct llama.cpp server (local GPU inference) |
| Anthropic | Anthropic API (Claude models) |
| OpenAI | OpenAI API (GPT models) |
| Custom | Any OpenAI-compatible endpoint |
| AnthropicAgentSdk | Claude Agent SDK (subprocess delegation, no API key needed) |
API
GET /api/ai/providers # List providers (enabledOnly filter)
GET /api/ai/providers/{id} # Get provider detail (includes models)
POST /api/ai/providers # Create provider
PATCH /api/ai/providers/{id} # Update provider
DELETE /api/ai/providers/{id} # Soft delete provider
Create a local provider:
POST /api/ai/providers
{
"name": "Local Kael",
"type": "LocalLlamaCpp",
"prefix": "ai-02",
"baseUrl": "http://your-local-model-server:8000/v1",
"priority": 1,
"timeoutSeconds": 180
}
Models
Models are specific LLMs available through a provider. Each model tracks capabilities, pricing, context window, and approval flags.
Entity: AIModel
| Field | Description |
|---|---|
| ProviderId | Parent provider |
| ModelId | Model identifier as sent to the provider API |
| DisplayName | Human-readable name |
| ContextLength | Context window in tokens (default: 32,768) |
| MaxCompletionTokens | Maximum output tokens |
| IsLocal | Running on local hardware |
| LocalModelPath | Path to model weights (for local models) |
| Temperature | Default temperature |
| MaxTokens | Default max output tokens |
| SystemPrompt | Default system prompt |
| ApprovedForAgent | Allowed for agent execution |
| ApprovedForChat | Allowed for interactive chat (default: true) |
| EnableHydration | Auto-hydrate identity context before chat |
| HydrationIdentityId | Which identity to hydrate for this model |
| InputCostPer1M | Input token cost per 1M tokens (USD) |
| OutputCostPer1M | Output token cost per 1M tokens (USD) |
The fully qualified model ID is {provider.Prefix}:{modelId} -- for example ai-02:qwen3-30b-a3b or or:anthropic/claude-3.5-sonnet.
API
GET /api/ai/models # List models (filter by providerId, enabledOnly)
GET /api/ai/models/{id} # Get model detail
POST /api/ai/models # Register model
PATCH /api/ai/models/{id} # Update model
DELETE /api/ai/models/{id} # Soft delete model
Register a local model:
POST /api/ai/models
{
"providerId": 1,
"modelId": "qwen3-30b-a3b",
"displayName": "Qwen3 30B (A3B)",
"contextLength": 32768,
"isLocal": true,
"approvedForAgent": true,
"approvedForChat": true,
"temperature": 0.3,
"inputCostPer1M": 0.0,
"outputCostPer1M": 0.0
}
Route Configuration
Routes map roles/purposes to preferred models. When the agent system or any service asks "what model should I use for this role?", the router checks route configs.
Entity: AIRouteConfig
| Field | Description |
|---|---|
| ProviderId | Target provider |
| ModelId | Target model (optional -- can route to provider's default) |
| Role | The role this route serves (e.g. "summariser", "coding", "researcher") |
| Priority | Lower = preferred. Multiple routes per role enable failover. |
| Notes | Human-readable explanation |
API
GET /api/ai/routes # List routes (filter by role)
POST /api/ai/routes # Create route
DELETE /api/ai/routes/{id} # Delete route
Example routes:
POST /api/ai/routes
{ "providerId": 1, "modelId": 3, "role": "summariser", "priority": 1, "notes": "LFM-2-8B for utility tasks" }
POST /api/ai/routes
{ "providerId": 1, "modelId": 1, "role": "researcher", "priority": 1, "notes": "Qwen3-30B for research" }
POST /api/ai/routes
{ "providerId": 2, "modelId": 5, "role": "coding", "priority": 1, "notes": "Claude via OpenRouter for code" }
Model Resolution
GET /api/ai/resolve/{modelId}
The resolution logic:
- Exact match: Look for a model with the given modelId (or prefix:modelId)
- Provider prefix: If the modelId contains :, split into prefix and model, then find the provider by prefix
- Route config: If called with a role name, find the highest-priority route for that role
- Fallback: ai-02:qwen3-30b-a3b (hardcoded fallback for agent execution)
For agent runs, the resolution chain is: explicit run model -> route config for the role -> hardcoded fallback.
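That chain can be sketched as a simple priority cascade. This is an illustrative simplification (routes are modeled as plain dicts rather than the AIRouteConfig entity, and the function name is hypothetical):

```python
def resolve_model(run_model, route_configs, role,
                  fallback="ai-02:qwen3-30b-a3b"):
    """Resolution chain for agent runs: an explicitly requested model
    wins; otherwise the lowest-priority-number route for the role;
    otherwise the hardcoded fallback."""
    if run_model:
        return run_model
    # Candidate routes for this role; priority: lower = preferred.
    routes = [r for r in route_configs if r["role"] == role]
    if routes:
        return min(routes, key=lambda r: r["priority"])["modelId"]
    return fallback
```

Because multiple routes can share a role, the `min` over priority is also what makes role-level failover possible: if the preferred route's provider is unhealthy, the router can move to the next priority.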
Chat
The core routing endpoint. Sends a message to any model through the unified interface.
POST /api/ai/chat
{
"modelId": "ai-02:qwen3-30b-a3b",
"message": "Explain Hebbian learning in 3 sentences",
"systemPrompt": "You are a neuroscience expert.",
"temperature": 0.7,
"maxTokens": 500,
"tools": [],
"toolChoice": "auto"
}
Response:
{
"success": true,
"model": "qwen3-30b-a3b",
"reply": "Hebbian learning is a neurobiological principle...",
"usage": {
"promptTokens": 45,
"completionTokens": 89,
"totalTokens": 134
},
"responseTimeMs": 1243,
"toolCalls": null
}
Broadcast
Send the same message to multiple models simultaneously:
POST /api/ai/broadcast
{
"models": ["ai-02:qwen3-30b-a3b", "or:anthropic/claude-3.5-sonnet", "ai-03:gemma-3-12b"],
"message": "What is the capital of New Zealand?",
"timeoutMs": 30000
}
Returns all responses with per-model timing and usage.
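The key property of broadcast is that the models are queried concurrently and each has its own timeout, so one slow model cannot stall the others. A minimal sketch of that fan-out, assuming a caller-supplied `chat_one(model_id, message)` coroutine (hypothetical, standing in for the router's per-model call):

```python
import asyncio

async def broadcast(models, message, chat_one, timeout_ms=30_000):
    """Send the same message to several models concurrently.

    Each model gets an individual timeout; a timed-out model produces a
    failed result entry instead of cancelling the whole broadcast.
    """
    async def call(model_id):
        try:
            reply = await asyncio.wait_for(
                chat_one(model_id, message), timeout=timeout_ms / 1000)
            return {"model": model_id, "success": True, "reply": reply}
        except asyncio.TimeoutError:
            return {"model": model_id, "success": False, "error": "timeout"}

    # gather preserves input order, so results line up with `models`.
    return await asyncio.gather(*(call(m) for m in models))
```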
Streaming
Atamaia uses Open Responses as its canonical streaming protocol. All upstream providers are normalized into this format.
The OpenAICompatAdapter translates OpenAI Chat Completions SSE streams into Open Responses events:
| Open Responses Event | Description |
|---|---|
| response.created | Stream started |
| output_item.added | New output item (message or function_call) |
| content_part.added | New content part |
| output_text.delta | Text chunk (sequenced) |
| output_text.done | Full text accumulated |
| output_item.done | Item complete |
| function_call.arguments.delta | Tool call argument chunk |
| response.completed | Stream finished |
This means any client consuming Atamaia's streaming API gets a consistent event format regardless of whether the underlying model is on OpenAI, Anthropic, OpenRouter, or a local llama.cpp server.
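The core of that translation is accumulating Chat Completions delta chunks into sequenced text events. A stripped-down sketch (text deltas only, no tool calls; the function name and event dict shapes are illustrative, not the adapter's actual types):

```python
def translate_deltas(chunks):
    """Translate OpenAI-style chat-completion delta chunks into a flat
    list of Open Responses events: response.created, sequenced
    output_text.delta events, output_text.done, response.completed."""
    events = [{"type": "response.created"}]
    text, seq = [], 0
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        piece = delta.get("content")
        if piece:
            # Each text chunk is emitted with a sequence number so
            # clients can detect gaps or reordering.
            events.append({"type": "output_text.delta",
                           "sequence_number": seq, "delta": piece})
            text.append(piece)
            seq += 1
    # The done event carries the full accumulated text.
    events.append({"type": "output_text.done", "text": "".join(text)})
    events.append({"type": "response.completed"})
    return events
```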
Provider Health & Circuit Breaker
The ProviderHealthTracker (singleton) implements a circuit breaker pattern:
| State | Behavior |
|---|---|
| Healthy | Fewer than 3 consecutive failures |
| Open | 3+ consecutive failures. Provider skipped for 5 minutes. |
| Recovery | After 5 minutes, circuit closes. Next request tests the provider. |
Retriable conditions:
- HTTP 408 (Timeout)
- HTTP 429 (Rate Limited)
- HTTP 5xx (Server Error)
- TaskCanceledException or HttpRequestException
On failure, the router automatically fails over to the next provider in priority order.
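The thresholds above (3 consecutive failures, 5-minute cooldown) can be sketched as a small state tracker. This is a minimal illustration of the pattern, not the actual ProviderHealthTracker implementation:

```python
import time

FAILURE_THRESHOLD = 3       # consecutive failures before the circuit opens
COOLDOWN_SECONDS = 5 * 60   # how long an open circuit skips the provider

class ProviderHealthTracker:
    """Per-provider circuit breaker: open after repeated failures,
    close again after a cooldown so the next request probes health."""

    def __init__(self, now=time.monotonic):
        self._now = now
        self._failures = {}    # provider id -> consecutive failure count
        self._opened_at = {}   # provider id -> time the circuit opened

    def record_success(self, provider_id):
        # Any success resets the failure streak and closes the circuit.
        self._failures[provider_id] = 0
        self._opened_at.pop(provider_id, None)

    def record_failure(self, provider_id):
        count = self._failures.get(provider_id, 0) + 1
        self._failures[provider_id] = count
        if count >= FAILURE_THRESHOLD:
            self._opened_at[provider_id] = self._now()

    def is_available(self, provider_id):
        opened = self._opened_at.get(provider_id)
        if opened is None:
            return True
        if self._now() - opened >= COOLDOWN_SECONDS:
            # Cooldown elapsed: close the circuit and let the next
            # request test the provider for real.
            self._opened_at.pop(provider_id, None)
            self._failures[provider_id] = 0
            return True
        return False
```

During routing, the router filters candidate providers through `is_available` before picking the lowest-priority healthy one, which is what makes failover automatic rather than per-request configuration.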
Tenant Provider Credentials
The global catalog pattern: platform operators register providers (with IsGlobal = true). Individual tenants then supply their own API keys for those providers.
GET /api/ai/credentials # List credentials
POST /api/ai/credentials # Set credential
DELETE /api/ai/credentials/{id} # Remove credential
POST /api/ai/credentials/{id}/validate # Test credential
Set a credential:
POST /api/ai/credentials
{
"providerId": 2,
"apiKey": "sk-or-v1-abc123...",
"label": "Production OpenRouter key"
}
Credentials are encrypted at rest using AES-256-GCM via ISecretEncryptor. The raw API key is never stored or returned after creation.
Validation hits the provider's /models endpoint with the supplied key:
POST /api/ai/credentials/{id}/validate
→ { "success": true }
Global Provider Catalog
Platform-wide providers visible to all tenants:
GET /api/ai/catalog # List global providers (enabled only)
GET /api/ai/catalog/{id} # Get global provider detail
Tenants browse the catalog, set their own credentials, and the router uses those credentials when making requests through global providers.
Pricing Sync
POST /api/ai/sync-pricing
Pulls current per-token pricing from OpenRouter's API and updates all models registered under OpenRouter providers. Returns the count of models updated.
Cost Management
Every model has InputCostPer1M and OutputCostPer1M fields. The agent execution loop computes cost per iteration:
cost = (promptTokens * inputCostPer1M / 1_000_000) + (completionTokens * outputCostPer1M / 1_000_000)
Cost is tracked per-run and aggregated across parent + all descendant runs (TotalCostWithChildren). Local models typically have zero cost.
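The per-iteration formula above is straightforward to express directly (the rate values in the example are hypothetical, not any model's actual pricing):

```python
def compute_cost(prompt_tokens, completion_tokens,
                 input_cost_per_1m, output_cost_per_1m):
    """Cost of one iteration in USD, given the model's per-1M-token
    rates. Local models registered with zero rates cost nothing."""
    return (prompt_tokens * input_cost_per_1m / 1_000_000
            + completion_tokens * output_cost_per_1m / 1_000_000)
```

With the usage from the chat example (45 prompt, 89 completion tokens) and illustrative rates of $3/$15 per 1M tokens, this gives roughly $0.00147 for the call.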
Local Model Support
Local models run on dedicated hardware (ai-02, ai-03) via llama.cpp HTTP servers exposing OpenAI-compatible endpoints:
| Server | Host | Models |
|---|---|---|
| ai-02 | <local-model-host> | Qwen3-30B (:8000), LFM2-8B (:8002), SmolLM3-3B (:8003), Qwen3-4B (:8004) |
| ai-03 | <local-model-host> | Gemma-3-12B (:8001), Luna-7B (:8002), Llama-3-14B (:8003), Ministral-3-14B (:8005), Llama-3.1-8B (:8006) |
Each is registered as a LocalLlamaCpp provider with a prefix (e.g. ai-02). Models are registered individually. Because they speak the OpenAI-compatible protocol, the same OpenAICompatAdapter handles streaming.
API Reference
| Method | Endpoint | Permission | Description |
|---|---|---|---|
| POST | /api/ai/chat | ChatSessionCreate | Send chat message |
| POST | /api/ai/broadcast | ChatSessionCreate | Broadcast to multiple models |
| GET | /api/ai/providers | AIProviderView | List providers |
| GET | /api/ai/providers/{id} | AIProviderView | Get provider detail |
| POST | /api/ai/providers | AIProviderManage | Create provider |
| PATCH | /api/ai/providers/{id} | AIProviderManage | Update provider |
| DELETE | /api/ai/providers/{id} | AIProviderManage | Delete provider |
| GET | /api/ai/models | AIModelView | List models |
| GET | /api/ai/models/{id} | AIModelView | Get model |
| POST | /api/ai/models | AIModelManage | Create model |
| PATCH | /api/ai/models/{id} | AIModelManage | Update model |
| DELETE | /api/ai/models/{id} | AIModelManage | Delete model |
| GET | /api/ai/routes | AIRouteView | List route configs |
| POST | /api/ai/routes | AIRouteManage | Create route config |
| DELETE | /api/ai/routes/{id} | AIRouteManage | Delete route config |
| GET | /api/ai/resolve/{modelId} | AIModelView | Resolve model by ID |
| POST | /api/ai/sync-pricing | AIModelManage | Sync OpenRouter pricing |
| GET | /api/ai/catalog | (any auth) | Browse global provider catalog |
| GET | /api/ai/catalog/{id} | AIProviderView | Get global provider |
| GET | /api/ai/credentials | (any auth) | List tenant credentials |
| POST | /api/ai/credentials | (any auth) | Set credential |
| DELETE | /api/ai/credentials/{id} | (any auth) | Delete credential |
| POST | /api/ai/credentials/{id}/validate | (any auth) | Validate credential |