Logging & Observability

Atamaia provides structured, queryable audit logging at the platform level and dense execution tracing at the agent level. Every significant action is recorded in PostgreSQL with its full context (who did what, when, to which entity, from which IP address, via which API key, and under which request correlation ID) and can be queried via the API.


System Logs

The SystemLog entity is the platform-wide audit trail. Every log entry captures the full context of who performed an action and how.

Entity: SystemLog

Field           Type            Description
Id              long            Primary key
Guid            UUID            External reference
LevelId         SystemLogLevel  Debug, Info, Success, Warning, Error
EntityTypeId    LogEntityType   What type of entity was affected
EntityGuid      UUID?           Which specific entity
Action          string          What happened (e.g. "memory.created", "agent.run.started")
Source          string?         Component that generated the log
UserId          long?           Human user (from JWT user_id claim)
IdentityId      long?           AI identity (from JWT identity_id claim)
ApiKeyId        long?           API key used (from JWT api_key_id claim)
CorrelationId   UUID?           Request correlation ID
HttpMethod      string?         GET, POST, PATCH, etc.
HttpPath        string?         The API path called
HttpStatusCode  int?            Response status code
DurationMs      int?            Request duration in milliseconds
ClientIp        string?         Remote IP address
DetailsJson     string?         Structured JSON payload with action-specific data
TenantId        long            Tenant isolation (automatic via global query filter)
CreatedAtUtc    DateTime        When the log was created

Log Levels

Level    Value  Purpose
Debug    1      Diagnostic detail
Info     2      Standard operations
Success  3      Confirmed successful actions
Warning  4      Anomalies, degraded operations
Error    5      Failures requiring attention

Entity Types

Logs can be associated with any domain entity:

Type         Description
None         System-level, not entity-specific
Memory       Memory CRUD operations
Identity     Identity management
Project      Project operations
Task         Task operations
Doc          Document operations
Fact         Fact CRUD
Message      Messaging operations
User         User management
OrgUnit      Organizational hierarchy
AgentRun     Agent execution
ChatSession  Chat operations
Reflection   Mirror system
Connector    External connector operations
Role         RBAC operations
Session      Session handoffs

Automatic Context Capture

The SystemLogService automatically extracts context from the HTTP request on every log call:

public async Task LogAsync(SystemLogLevel level, string action,
    LogEntityType entityType = LogEntityType.None,
    Guid? entityGuid = null, string? source = null, object? details = null,
    int? httpStatusCode = null, int? durationMs = null)

From the HttpContext, it automatically captures:

  • User ID: From the user_id JWT claim
  • Identity ID: From the identity_id JWT claim
  • API Key ID: From the api_key_id JWT claim
  • Correlation ID: From HttpContext.Items["CorrelationId"] (set by middleware, also available as X-Correlation-Id header)
  • Client IP: From HttpContext.Connection.RemoteIpAddress
  • HTTP Method: From the request
  • HTTP Path: From the request path

The details parameter is serialized to JSON, allowing any structured data to be attached to a log entry.
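The mapping from request data to log fields can be illustrated with a short Python sketch. This is not the SystemLogService code (which is C#); the function name, the dict-based stand-ins for claims and HttpContext.Items, and the camelCase keys are assumptions made for the example.

```python
import json
from typing import Any, Optional


def build_log_context(claims: dict, items: dict, remote_ip: Optional[str],
                      method: str, path: str, details: Any = None) -> dict:
    """Collect the per-request fields that a log entry carries (hypothetical helper)."""

    def claim_as_int(name: str) -> Optional[int]:
        # JWT claims arrive as strings; an absent claim maps to None,
        # matching the nullable UserId / IdentityId / ApiKeyId columns.
        value = claims.get(name)
        return int(value) if value is not None else None

    return {
        "userId": claim_as_int("user_id"),
        "identityId": claim_as_int("identity_id"),
        "apiKeyId": claim_as_int("api_key_id"),
        "correlationId": items.get("CorrelationId"),
        "clientIp": remote_ip,
        "httpMethod": method,
        "httpPath": path,
        # The details object is serialized to a JSON string, mirroring DetailsJson.
        "detailsJson": json.dumps(details) if details is not None else None,
    }
```

A request authenticated with only a user token would yield identityId and apiKeyId of None, which corresponds to the nullable columns in the SystemLog entity.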


Querying Logs

GET /api/system-logs

Supports comprehensive filtering:

Parameter      Type             Description
level          SystemLogLevel?  Filter by severity
entityType     LogEntityType?   Filter by entity type
entityGuid     UUID?            Filter by specific entity
userId         long?            Filter by human user
identityId     long?            Filter by AI identity
apiKeyId       long?            Filter by API key
correlationId  UUID?            Follow a single request across log entries
from           DateTime?        Start of time range
to             DateTime?        End of time range
limit          int              Max results (1-500, default 100)
offset         int              Pagination offset

Example: Find all actions by a specific identity in the last hour (assuming the current time is 2026-03-05T10:00:00Z):

GET /api/system-logs?identityId=2&from=2026-03-05T09:00:00Z&limit=50

Example: Trace a single request:

GET /api/system-logs?correlationId=550e8400-e29b-41d4-a716-446655440000

Example: Find all errors for agent runs:

GET /api/system-logs?level=5&entityType=10
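Client code can assemble these query strings programmatically. The sketch below is one possible helper, not part of Atamaia: the function name, the from_ keyword workaround, and the client-side validation are assumptions. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Filter names taken from the parameter table above.
ALLOWED_FILTERS = {
    "level", "entityType", "entityGuid", "userId", "identityId",
    "apiKeyId", "correlationId", "from", "to", "limit", "offset",
}


def system_logs_url(base_url: str, **filters) -> str:
    """Build a GET /api/system-logs URL from the documented filters."""
    # "from" is a Python keyword, so callers pass it as from_.
    params = {("from" if k == "from_" else k): v
              for k, v in filters.items() if v is not None}
    unknown = set(params) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unknown filter(s): {sorted(unknown)}")
    limit = params.get("limit")
    if limit is not None and not 1 <= int(limit) <= 500:
        raise ValueError("limit must be between 1 and 500")
    query = urlencode(params)
    return f"{base_url}/api/system-logs" + (f"?{query}" if query else "")
```

For instance, system_logs_url("https://example.test", level=5, entityType=10) reproduces the agent-run error query above against a placeholder host. Note that urlencode percent-encodes the colons in timestamps, which is equivalent to the literal form used in the examples.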

Get Specific Log Entry

GET /api/system-logs/{id}

Response:

{
  "id": 1234,
  "guid": "a1b2c3d4-...",
  "level": "Error",
  "entityType": "AgentRun",
  "entityGuid": "e5f6a7b8-...",
  "action": "agent.run.failed",
  "source": "AgentExecutionLoop",
  "userId": 1,
  "identityId": 2,
  "apiKeyId": null,
  "correlationId": "550e8400-...",
  "httpMethod": "POST",
  "httpPath": "/api/agent/runs/42/start",
  "httpStatusCode": 500,
  "durationMs": 15234,
  "clientIp": "192.168.1.100",
  "detailsJson": "{\"error\":\"Context overflow at 95% budget\",\"iterationsUsed\":47}",
  "createdAtUtc": "2026-03-05T10:15:23Z"
}
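Because detailsJson is a JSON document stored as a string, a consumer has to decode it a second time to reach the action-specific fields. A minimal sketch, assuming the entry is available as raw JSON text shaped like the response above (the helper name and the added "details" key are illustrative):

```python
import json


def parse_log_entry(raw: str) -> dict:
    """Decode a log entry and expand its nested detailsJson string."""
    entry = json.loads(raw)
    details = entry.get("detailsJson")
    # detailsJson is itself JSON encoded as a string, so it needs a
    # second decoding pass; a null or empty value stays None.
    entry["details"] = json.loads(details) if details else None
    return entry
```

With the example response, entry["details"]["iterationsUsed"] would then be available as an integer rather than buried in a string.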

Agent Execution Tracing

For agent runs, the AgentEvent system provides a second, much denser layer of observability. While system logs capture API-level actions, agent events capture every internal decision, tool call, and failure within an execution loop.

Agent Event Types

46 distinct event types across 12 categories:

Category            Events
Planning            PlanCreated, PlanRevised, StepStarted, StepCompleted
LLM                 LlmRequest, LlmResponse
Tool Use            ToolCallRequested, ToolCallResult, ToolCallBlocked
Decisions           Decision, Observation, Reasoning
Failures            EmptyResponse, PrematureIntent, StaleLoop, ContextOverflow, ToolTimeout, DependencyBlocked, BudgetExceeded
Loop Detection      DuplicateRead, RepeatedToolCall
Context Management  ContextWarning50, ContextWarning75, ContextWarning90, ContextCompacted, ContextFlushed
Lifecycle           Checkpoint, Paused, Resumed, BudgetWarning, BudgetExtended
Escalation          EscalationCreated, EscalationResolved
Children            ChildSpawned, ChildCompleted, ChildFailed, ChildMessage, ParentResponse
Interaction         InteractionMessageReceived, InteractionMessageSent, PauseChatLinked, PauseChatSummarized
Audit               RunStarted, RunCompleted, RunFailed, ApprovalRevision

Each event includes:

  • Sequence number: Monotonically increasing within a run
  • Type: Enum classification
  • Summary: Human-readable description
  • DataJson: Structured payload (tool arguments, LLM response, error details)
  • TokensUsed: Token cost of this specific event
  • DurationMs: Wall-clock time for the event

Querying Agent Events

GET /api/agent/runs/{runId}/events?sinceSequence=50&limit=200

This enables real-time monitoring: a UI can poll for new events since the last known sequence number.
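The polling pattern can be sketched as a generator that remembers the highest sequence number it has seen. The sketch makes several assumptions: each event is a dict with a "sequence" key, sinceSequence is exclusive (only newer events are returned), and fetch_page is a caller-supplied function that performs the actual HTTP request.

```python
from typing import Callable, Iterator


def follow_events(fetch_page: Callable[[int, int], list],
                  since_sequence: int = 0, limit: int = 200) -> Iterator[dict]:
    """Yield events in sequence order until a fetch returns no new ones."""
    while True:
        page = fetch_page(since_sequence, limit)
        if not page:
            return
        for event in page:
            yield event
            # Track the highest sequence seen so the next request
            # only asks for newer events.
            since_sequence = max(since_sequence, event["sequence"])
```

A live UI would typically wait and retry on an empty page instead of returning; the early return here keeps the example finite.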


Correlation Across Layers

The correlation architecture connects system logs, agent events, and API requests:

  1. X-Correlation-Id header: Every API request gets a correlation UUID (auto-generated if not provided by the caller)
  2. System logs: Capture the correlation ID from HttpContext.Items
  3. Agent events: Linked to their run, which is linked to the creating request's correlation ID
  4. Response header: X-Correlation-Id is echoed back to the caller

This means you can trace a single user action from the API request, through the system log, into the agent execution events, through tool calls, and back to the response.
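Step 1 above (reuse the caller's ID or generate one) can be sketched as follows. This illustrates the idea and is not the platform's middleware; in particular, replacing a malformed value with a fresh UUID rather than rejecting the request is an assumption.

```python
import uuid


def resolve_correlation_id(request_headers: dict) -> str:
    """Return the caller's X-Correlation-Id if it is a valid UUID, else a new one."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    supplied = lowered.get("x-correlation-id")
    if supplied:
        try:
            return str(uuid.UUID(supplied))
        except ValueError:
            pass  # Assumption: malformed values are replaced, not rejected.
    return str(uuid.uuid4())
```

The returned value is what would be stored for logging and echoed back in the response header.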


Request/Response Logging

All API responses use the ApiEnvelope<T> wrapper:

{
  "ok": true,
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "data": { ... },
  "count": 42,
  "error": null,
  "errorCode": null,
  "hint": null
}

On error:

{
  "ok": false,
  "requestId": "550e8400-e29b-41d4-a716-446655440001",
  "data": null,
  "error": "Identity not found: 99",
  "errorCode": "NOT_FOUND",
  "hint": "Check identity ID or use identity_list to find valid IDs"
}

The requestId matches the X-Correlation-Id header and the correlationId in system logs.
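On the client side, the envelope lends itself to a small unwrapping helper that returns data on success and raises on failure while preserving errorCode, hint, and requestId. The class and function names below are illustrative, not part of any Atamaia client library.

```python
class ApiError(Exception):
    """Raised when an envelope reports ok = false."""

    def __init__(self, envelope: dict):
        super().__init__(envelope.get("error") or "unknown error")
        self.error_code = envelope.get("errorCode")
        self.hint = envelope.get("hint")
        # Keep the request ID so the failure can be looked up in system logs.
        self.request_id = envelope.get("requestId")


def unwrap(envelope: dict):
    """Return envelope['data'] on success, raise ApiError otherwise."""
    if envelope.get("ok"):
        return envelope.get("data")
    raise ApiError(envelope)
```

Keeping request_id on the exception means an error report can be followed straight to GET /api/system-logs?correlationId=... for the same request.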


Performance Monitoring

Duration Tracking

System logs capture DurationMs for API requests. This enables:

  • Identifying slow endpoints
  • Tracking performance degradation over time
  • Correlating slowness with specific entities, identities, or API keys
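As a rough sketch of the first use case, log entries fetched from the API can be grouped by endpoint and ranked by mean duration. The entry shape follows the response example earlier in this document; the helper name and the choice of the mean (rather than a percentile) are assumptions.

```python
from collections import defaultdict
from statistics import mean


def slowest_endpoints(entries: list, top: int = 5) -> list:
    """Rank (method, path) pairs by mean durationMs, slowest first."""
    durations = defaultdict(list)
    for entry in entries:
        # durationMs is nullable; skip entries that did not record it.
        if entry.get("durationMs") is not None:
            key = (entry.get("httpMethod"), entry.get("httpPath"))
            durations[key].append(entry["durationMs"])
    averages = {key: mean(ms) for key, ms in durations.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Grouping by identityId or apiKeyId instead of the endpoint would serve the third use case in the same way.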

Agent Cost Tracking

Agent runs track detailed cost metrics:

  • PromptTokens / CompletionTokens per run
  • CostUsd computed from model pricing
  • TotalTokensWithChildren / TotalCostWithChildren aggregated across run trees
  • Per-event TokensUsed for fine-grained cost attribution
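The "with children" totals amount to a recursive sum over the run tree. The sketch below assumes each run is a dict with promptTokens, completionTokens, costUsd, and an optional children list; that nested representation is hypothetical and chosen only to show the aggregation.

```python
def totals_with_children(run: dict) -> tuple:
    """Return (tokens, cost_usd) for a run plus all of its descendants."""
    tokens = run["promptTokens"] + run["completionTokens"]
    cost = run["costUsd"]
    # Recurse into child runs so grandchildren are included as well.
    for child in run.get("children", []):
        child_tokens, child_cost = totals_with_children(child)
        tokens += child_tokens
        cost += child_cost
    return tokens, cost
```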

Provider Health

The ProviderHealthTracker (singleton) maintains real-time health state for all AI providers:

  • Consecutive failure counts
  • Circuit breaker state (healthy, open, recovery)
  • Last failure timestamp

This data is in-memory (not persisted), designed for runtime routing decisions rather than historical analysis.
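To make the three states concrete, here is a minimal in-memory circuit breaker for a single provider. It is a sketch of the general pattern, not the ProviderHealthTracker itself: the failure threshold, the cooldown period, and the rule that "open" becomes "recovery" once the cooldown has elapsed are all assumptions.

```python
import time


class ProviderHealth:
    """In-memory health state for one provider (illustrative only)."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the state machine can be tested
        self.consecutive_failures = 0
        self.last_failure_at = None

    def record_success(self) -> None:
        # Any success resets the streak and closes the breaker.
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        self.last_failure_at = self.clock()

    @property
    def state(self) -> str:
        if self.consecutive_failures < self.failure_threshold:
            return "healthy"
        # After the cooldown, allow a trial request ("recovery").
        if self.clock() - self.last_failure_at >= self.cooldown_seconds:
            return "recovery"
        return "open"
```

A router would skip providers in the "open" state, send a trial request to one in "recovery", and call record_success or record_failure with the outcome.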


Soft Delete Alignment

Design decision D15: soft delete only, never hard delete. This has direct implications for observability:

  • System logs reference entities that still exist. Even if an entity is "deleted", its data is still in the database with IsDeleted = true. Logs referencing that entity's GUID will always resolve.
  • Audit trails are permanent. There is no way to delete a system log entry, and system logs are never soft-deleted, so the global query filter's IsDeleted check never hides them.
  • Agent event trails are append-only. AgentEvent records are never modified or deleted. They form a permanent, sequenced audit trail of every decision an agent made.

Autonomic Layer Integration

The logging infrastructure feeds into the autonomic layer (Wingman pattern) in two ways: pattern detection and a feedback loop.

Pattern Detection

System logs and agent events provide the raw data for:

  • Error frequency detection: Repeated failures on the same entity or endpoint
  • Cost anomaly detection: Runs that consume significantly more tokens than expected
  • Behavioral pattern detection: Agents repeatedly escalating on similar situations
  • Performance degradation: Increasing response times from specific providers
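As an example of the simplest of these detectors, error frequency per entity can be computed directly from queried log entries. The sketch assumes entries shaped like the system log response shown earlier (with level as the string "Error"); the threshold of three is an arbitrary illustrative value, not a platform setting.

```python
from collections import Counter


def repeated_failures(entries: list, threshold: int = 3) -> dict:
    """Map entityGuid -> error count for entities at or above the threshold."""
    counts = Counter(
        entry["entityGuid"]
        for entry in entries
        # Only error-level entries tied to a specific entity are counted.
        if entry.get("level") == "Error" and entry.get("entityGuid")
    )
    return {guid: n for guid, n in counts.items() if n >= threshold}
```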

Feedback Loop

Agent feedback (Good/Partial/Bad ratings) combined with execution events enables:

  • Confidence scoring: How reliable is a specific model + role combination?
  • Model selection optimization: Which model performs best for which task type?
  • Tool profile tuning: Are blocked tools causing unnecessary escalations?
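One way to turn ratings into a confidence score is a weighted average per model and role combination. The weights below (Good = 1.0, Partial = 0.5, Bad = 0.0) and the feedback record shape with "model", "role", and "rating" keys are assumptions for illustration; the source does not specify how Atamaia computes confidence.

```python
from collections import defaultdict

# Assumed weights; the actual scoring scheme is not documented here.
RATING_WEIGHTS = {"Good": 1.0, "Partial": 0.5, "Bad": 0.0}


def confidence_by_model_role(feedback: list) -> dict:
    """Average rating weight per (model, role) pair, in the range 0..1."""
    scores = defaultdict(list)
    for item in feedback:
        scores[(item["model"], item["role"])].append(RATING_WEIGHTS[item["rating"]])
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}
```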

API Reference

Method  Endpoint                          Permission          Description
GET     /api/system-logs                  SystemViewAuditLog  Query system logs with filters
GET     /api/system-logs/{id}             SystemViewAuditLog  Get specific log entry
GET     /api/agent/runs/{runId}/events    AgentRunView        Get agent execution events
GET     /api/agent/analytics              AgentRunView        Get aggregated agent metrics
GET     /api/agent/runs/{runId}/feedback  AgentRunView        Get run feedback