Logging & Observability

Atamaia provides structured, queryable audit logging at the platform level, plus dense execution tracing at the agent level. Every significant action -- who did what, when, to which entity, from which IP, via which API key, correlated with which request -- is recorded in PostgreSQL and queryable via API.

System Logs

The SystemLog entity is the platform-wide audit trail. Every log entry captures the full context of who performed an action and how.

Entity: SystemLog

Field	Type	Description
`Id`	long	Primary key
`Guid`	UUID	External reference
`LevelId`	SystemLogLevel	Debug, Info, Success, Warning, Error
`EntityTypeId`	LogEntityType	What type of entity was affected
`EntityGuid`	UUID?	Which specific entity
`Action`	string	What happened (e.g. "memory.created", "agent.run.started")
`Source`	string?	Component that generated the log
`UserId`	long?	Human user (from JWT `user_id` claim)
`IdentityId`	long?	AI identity (from JWT `identity_id` claim)
`ApiKeyId`	long?	API key used (from JWT `api_key_id` claim)
`CorrelationId`	UUID?	Request correlation ID
`HttpMethod`	string?	GET, POST, PATCH, etc.
`HttpPath`	string?	The API path called
`HttpStatusCode`	int?	Response status code
`DurationMs`	int?	Request duration in milliseconds
`ClientIp`	string?	Remote IP address
`DetailsJson`	string?	Structured JSON payload with action-specific data
`TenantId`	long	Tenant isolation (automatic via global query filter)
`CreatedAtUtc`	DateTime	When the log was created

Log Levels

Level	Value	Purpose
Debug	1	Diagnostic detail
Info	2	Standard operations
Success	3	Confirmed successful actions
Warning	4	Anomalies, degraded operations
Error	5	Failures requiring attention

Entity Types

Logs can be associated with any domain entity:

Type	Description
None	System-level, not entity-specific
Memory	Memory CRUD operations
Identity	Identity management
Project	Project operations
Task	Task operations
Doc	Document operations
Fact	Fact CRUD
Message	Messaging operations
User	User management
OrgUnit	Organizational hierarchy
AgentRun	Agent execution
ChatSession	Chat operations
Reflection	Mirror system
Connector	External connector operations
Role	RBAC operations
Session	Session handoffs

Automatic Context Capture

The SystemLogService automatically extracts context from the HTTP request on every log call:

public async Task LogAsync(SystemLogLevel level, string action,
    LogEntityType entityType = LogEntityType.None,
    Guid? entityGuid = null, string? source = null, object? details = null,
    int? httpStatusCode = null, int? durationMs = null)

From the HttpContext, it automatically captures:

User ID: From the user_id JWT claim
Identity ID: From the identity_id JWT claim
API Key ID: From the api_key_id JWT claim
Correlation ID: From HttpContext.Items["CorrelationId"] (set by middleware, also available as X-Correlation-Id header)
Client IP: From HttpContext.Connection.RemoteIpAddress
HTTP Method: From the request
HTTP Path: From the request path

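The details parameter is serialized to JSON and stored in DetailsJson, so any structured data can be attached to a log entry. The httpStatusCode and durationMs values are not taken from the HttpContext; callers pass them explicitly.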
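The capture step can be illustrated with a language-neutral sketch, written here in Python rather than the platform's own C#. `RequestContext`, `build_log_fields`, and `_claim_as_int` are hypothetical names introduced for illustration; only the claim names, the `CorrelationId` item key, and the field names come from the description above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class RequestContext:
    """Hypothetical stand-in for the parts of HttpContext that are read."""
    method: str
    path: str
    remote_ip: Optional[str] = None
    claims: dict = field(default_factory=dict)  # decoded JWT claims
    items: dict = field(default_factory=dict)   # per-request items set by middleware


def _claim_as_int(claims: dict, name: str) -> Optional[int]:
    """Return a numeric claim, or None when the claim is absent."""
    value = claims.get(name)
    return int(value) if value is not None else None


def build_log_fields(ctx: RequestContext, action: str, details: Any = None) -> dict:
    """Combine caller-supplied arguments with context taken from the request."""
    return {
        "Action": action,
        "UserId": _claim_as_int(ctx.claims, "user_id"),
        "IdentityId": _claim_as_int(ctx.claims, "identity_id"),
        "ApiKeyId": _claim_as_int(ctx.claims, "api_key_id"),
        "CorrelationId": ctx.items.get("CorrelationId"),
        "ClientIp": ctx.remote_ip,
        "HttpMethod": ctx.method,
        "HttpPath": ctx.path,
        # details is stored as a JSON string, matching the DetailsJson field
        "DetailsJson": json.dumps(details) if details is not None else None,
    }
```

A request authenticated as a user and an identity, but without an API key, would leave `ApiKeyId` empty, which matches the nullable types in the entity table.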

Querying Logs

GET /api/system-logs

Supports comprehensive filtering:

Parameter	Type	Description
`level`	SystemLogLevel?	Filter by severity
`entityType`	LogEntityType?	Filter by entity type
`entityGuid`	UUID?	Filter by specific entity
`userId`	long?	Filter by human user
`identityId`	long?	Filter by AI identity
`apiKeyId`	long?	Filter by API key
`correlationId`	UUID?	Follow a single request across log entries
`from`	DateTime?	Start of time range
`to`	DateTime?	End of time range
`limit`	int	Max results (1-500, default 100)
`offset`	int	Pagination offset

Example: Find all actions by a specific identity since a given time (for the last hour, set from to one hour before the current UTC time):

GET /api/system-logs?identityId=2&from=2026-03-05T09:00:00Z&limit=50

Example: Trace a single request:

GET /api/system-logs?correlationId=550e8400-e29b-41d4-a716-446655440000

Example: Find all errors for agent runs:

GET /api/system-logs?level=5&entityType=10
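The query strings above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_system_logs_query` is a hypothetical helper, and the enum simply mirrors the values in the Log Levels table. It enforces the documented 1-500 range for `limit` on the client side.

```python
from datetime import datetime, timezone
from enum import IntEnum
from urllib.parse import urlencode


class SystemLogLevel(IntEnum):
    """Numeric values from the Log Levels table."""
    DEBUG = 1
    INFO = 2
    SUCCESS = 3
    WARNING = 4
    ERROR = 5


def build_system_logs_query(filters: dict, limit: int = 100, offset: int = 0) -> str:
    """Return the path and query string for GET /api/system-logs."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    params = {}
    for name, value in filters.items():
        if value is None:
            continue  # filters left unset are not sent at all
        if isinstance(value, datetime):
            # Timestamps go out as UTC ISO 8601 with a trailing Z, as in the examples.
            value = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        elif isinstance(value, IntEnum):
            value = int(value)
        params[name] = value
    params["limit"] = limit
    params["offset"] = offset
    return "/api/system-logs?" + urlencode(params)
```

Filters are passed as a dictionary because `from` is a reserved word in Python and cannot be used as a keyword argument. Pass timezone-aware datetimes; a naive datetime would be interpreted as local time by `astimezone`.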

Get Specific Log Entry

GET /api/system-logs/{id}

Response:

{
  "id": 1234,
  "guid": "a1b2c3d4-...",
  "level": "Error",
  "entityType": "AgentRun",
  "entityGuid": "e5f6g7h8-...",
  "action": "agent.run.failed",
  "source": "AgentExecutionLoop",
  "userId": 1,
  "identityId": 2,
  "apiKeyId": null,
  "correlationId": "550e8400-...",
  "httpMethod": "POST",
  "httpPath": "/api/agent/runs/42/start",
  "httpStatusCode": 500,
  "durationMs": 15234,
  "clientIp": "192.168.1.100",
  "detailsJson": "{\"error\":\"Context overflow at 95% budget\",\"iterationsUsed\":47}",
  "createdAtUtc": "2026-03-05T10:15:23Z"
}
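Note that `detailsJson` is a JSON document embedded as a string, so a client has to decode it separately. A minimal sketch, assuming the response body has the shape shown above; the `details` key added to the result is a hypothetical convenience, not part of the API.

```python
import json
from typing import Optional


def parse_log_entry(raw_body: str) -> dict:
    """Decode a log entry and its embedded detailsJson payload."""
    entry = json.loads(raw_body)
    details_json: Optional[str] = entry.get("detailsJson")
    # detailsJson is nullable, so only decode it when it is present.
    entry["details"] = json.loads(details_json) if details_json else None
    return entry
```

With the response above, `entry["details"]["iterationsUsed"]` would give the iteration count recorded at the time of the failure.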

Agent Execution Tracing

For agent runs, the AgentEvent system provides a second, much denser layer of observability. While system logs capture API-level actions, agent events capture every internal decision, tool call, and failure within an execution loop.

Agent Event Types

The table below lists 46 distinct event types across 12 categories:

Category	Events
Planning	PlanCreated, PlanRevised, StepStarted, StepCompleted
LLM	LlmRequest, LlmResponse
Tool Use	ToolCallRequested, ToolCallResult, ToolCallBlocked
Decisions	Decision, Observation, Reasoning
Failures	EmptyResponse, PrematureIntent, StaleLoop, ContextOverflow, ToolTimeout, DependencyBlocked, BudgetExceeded
Loop Detection	DuplicateRead, RepeatedToolCall
Context Management	ContextWarning50, ContextWarning75, ContextWarning90, ContextCompacted, ContextFlushed
Lifecycle	Checkpoint, Paused, Resumed, BudgetWarning, BudgetExtended
Escalation	EscalationCreated, EscalationResolved
Children	ChildSpawned, ChildCompleted, ChildFailed, ChildMessage, ParentResponse
Interaction	InteractionMessageReceived, InteractionMessageSent, PauseChatLinked, PauseChatSummarized
Audit	RunStarted, RunCompleted, RunFailed, ApprovalRevision

Each event includes:

Sequence number: Monotonically increasing within a run
Type: Enum classification
Summary: Human-readable description
DataJson: Structured payload (tool arguments, LLM response, error details)
TokensUsed: Token cost of this specific event
DurationMs: Wall-clock time for the event

Querying Agent Events

GET /api/agent/runs/{runId}/events?sinceSequence=50&limit=200

This enables near-real-time monitoring: a UI can poll for events newer than the last sequence number it has seen.
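One polling step might look like the sketch below. It assumes that each event exposes its sequence number under a `sequence` key and that `sinceSequence` is exclusive; neither detail is confirmed above, so the code filters defensively. `fetch_page` is a hypothetical callable that performs the HTTP request and returns the decoded event list.

```python
from typing import Callable, List, Tuple


def poll_new_events(
    fetch_page: Callable[[int, int], List[dict]],
    last_seen: int,
    limit: int = 200,
) -> Tuple[List[dict], int]:
    """Fetch events after last_seen and return them with the updated cursor."""
    page = fetch_page(last_seen, limit)
    # Drop anything already seen, in case sinceSequence turns out to be inclusive.
    fresh = [event for event in page if event["sequence"] > last_seen]
    fresh.sort(key=lambda event: event["sequence"])
    if fresh:
        last_seen = fresh[-1]["sequence"]
    return fresh, last_seen
```

A UI would call this on a timer, render the returned events, and keep the returned cursor for the next call. Because sequence numbers increase monotonically within a run, the cursor never needs to move backwards.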

Correlation Across Layers

The correlation architecture connects system logs, agent events, and API requests:

X-Correlation-Id header: Every API request gets a correlation UUID (auto-generated if not provided by the caller)
System logs: Capture the correlation ID from HttpContext.Items
Agent events: Linked to their run, which is linked to the creating request's correlation ID
Response header: X-Correlation-Id is echoed back to the caller

This means you can trace a single user action from the API request, through the system log, into the agent execution events, through tool calls, and back to the response.
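The flow of the correlation ID through a request can be sketched as follows. This is an illustration in Python, not the platform's middleware: `resolve_correlation_id` and `handle` are hypothetical, and the handling of a malformed header value is an assumption.

```python
import uuid

HEADER_NAME = "X-Correlation-Id"


def resolve_correlation_id(request_headers: dict) -> str:
    """Return the caller's correlation ID, or a fresh UUID when none is usable."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    supplied = lowered.get(HEADER_NAME.lower())
    if supplied:
        try:
            return str(uuid.UUID(supplied))
        except ValueError:
            pass  # Assumption: a malformed value is replaced, not rejected.
    return str(uuid.uuid4())


def handle(request_headers: dict) -> dict:
    """Hypothetical request handler showing where the ID travels."""
    correlation_id = resolve_correlation_id(request_headers)
    log_entry = {"correlationId": correlation_id}     # system log row
    response_headers = {HEADER_NAME: correlation_id}  # echoed to the caller
    return {"log": log_entry, "headers": response_headers}
```

A caller that supplies its own ID can later pass the same value as `correlationId` to `/api/system-logs` to retrieve every log entry written for that request.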

Request/Response Logging

All API responses use the ApiEnvelope<T> wrapper:

{
  "ok": true,
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "data": { ... },
  "count": 42,
  "error": null,
  "errorCode": null,
  "hint": null
}

On error:

{
  "ok": false,
  "requestId": "550e8400-e29b-41d4-a716-446655440001",
  "data": null,
  "error": "Identity not found: 99",
  "errorCode": "NOT_FOUND",
  "hint": "Check identity ID or use identity_list to find valid IDs"
}

The requestId matches the X-Correlation-Id header and the correlationId in system logs.
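On the client side, the envelope can be unwrapped in one place so that failures carry the `requestId` needed to look up the matching system logs. A minimal sketch; `ApiError` and `unwrap` are hypothetical names.

```python
class ApiError(Exception):
    """Raised when an envelope reports ok = false."""

    def __init__(self, envelope: dict):
        super().__init__(envelope.get("error") or "Unknown error")
        self.error_code = envelope.get("errorCode")
        self.hint = envelope.get("hint")
        # Keep the request ID so the failure can be traced in the system logs.
        self.request_id = envelope.get("requestId")


def unwrap(envelope: dict):
    """Return the data payload, or raise ApiError for a failed response."""
    if envelope.get("ok") is True:
        return envelope.get("data")
    raise ApiError(envelope)
```

When an `ApiError` is caught, its `request_id` can be passed as the `correlationId` filter on `/api/system-logs`.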

Performance Monitoring

Duration Tracking

System logs capture DurationMs for API requests. This enables:

Identifying slow endpoints
Tracking performance degradation over time
Correlating slowness with specific entities, identities, or API keys
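As an illustration of the first use case, the sketch below ranks endpoints by mean duration from a list of log entries already fetched from `/api/system-logs`. The function name and the output shape are hypothetical; the input keys follow the response example shown earlier.

```python
from collections import defaultdict
from statistics import mean


def slowest_endpoints(entries: list, top: int = 5) -> list:
    """Rank (method, path) pairs by mean durationMs, slowest first."""
    durations = defaultdict(list)
    for entry in entries:
        duration = entry.get("durationMs")
        path = entry.get("httpPath")
        if duration is None or path is None:
            continue  # entries without request timing cannot be ranked
        durations[(entry.get("httpMethod"), path)].append(duration)
    ranked = [
        {"method": method, "path": path, "meanMs": mean(values), "count": len(values)}
        for (method, path), values in durations.items()
    ]
    ranked.sort(key=lambda row: row["meanMs"], reverse=True)
    return ranked[:top]
```

Paths that embed IDs, such as `/api/agent/runs/42/start`, are grouped separately per ID here; normalising them into route templates before grouping would be a sensible refinement.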

Agent Cost Tracking

Agent runs track detailed cost metrics:

PromptTokens / CompletionTokens per run
CostUsd computed from model pricing
TotalTokensWithChildren / TotalCostWithChildren aggregated across run trees
Per-event TokensUsed for fine-grained cost attribution
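The arithmetic behind these metrics can be sketched as below. The per-million-token pricing model, the function names, and the nested `children` list are assumptions for illustration; the actual pricing table and storage layout are not described here.

```python
from decimal import Decimal
from typing import Tuple


def run_cost_usd(
    prompt_tokens: int,
    completion_tokens: int,
    prompt_price_per_million: Decimal,
    completion_price_per_million: Decimal,
) -> Decimal:
    """Cost of one run, assuming separate per-million-token prices."""
    total = (
        prompt_tokens * prompt_price_per_million
        + completion_tokens * completion_price_per_million
    )
    return total / Decimal(1_000_000)


def totals_with_children(run: dict) -> Tuple[int, Decimal]:
    """Sum tokens and cost over a run and all of its descendants."""
    tokens = run["promptTokens"] + run["completionTokens"]
    cost = run["costUsd"]
    for child in run.get("children", []):
        child_tokens, child_cost = totals_with_children(child)
        tokens += child_tokens
        cost += child_cost
    return tokens, cost
```

`Decimal` is used instead of floating point so that small per-run costs add up exactly across a large run tree.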

Provider Health

The ProviderHealthTracker (singleton) maintains real-time health state for all AI providers:

Consecutive failure counts
Circuit breaker state (healthy, open, recovery)
Last failure timestamp

This data is in-memory (not persisted), designed for runtime routing decisions rather than historical analysis.
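A tracker of this kind could be sketched as follows. This is not the ProviderHealthTracker implementation: the failure threshold, the cooldown period, and the rule for moving between states are assumptions chosen to show how the three listed pieces of state interact.

```python
import time
from typing import Callable, Optional


class ProviderHealth:
    """In-memory health state for one provider (illustrative only)."""

    def __init__(
        self,
        failure_threshold: int = 3,       # assumed value
        cooldown_seconds: float = 30.0,   # assumed value
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.consecutive_failures = 0
        self.last_failure_at: Optional[float] = None
        self._clock = clock

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        self.last_failure_at = self._clock()

    def record_success(self) -> None:
        self.consecutive_failures = 0

    @property
    def state(self) -> str:
        """healthy: route normally; open: skip; recovery: allow a trial call."""
        if self.consecutive_failures < self.failure_threshold:
            return "healthy"
        if self._clock() - self.last_failure_at < self.cooldown_seconds:
            return "open"
        return "recovery"
```

In this sketch a failed trial call during recovery refreshes the last-failure timestamp, which reopens the circuit for another cooldown, while a successful one resets the counter and returns the provider to healthy.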

Soft Delete Alignment

Design decision D15: soft delete only, never hard delete. This has direct implications for observability:

System logs reference entities that still exist. Even if an entity is "deleted", its data is still in the database with IsDeleted = true. Logs referencing that entity's GUID will always resolve.
Audit trails are permanent. There is no way to delete a system log entry: beyond tenant scoping, the global query filter hides only rows marked IsDeleted, and system logs should never be soft-deleted.
Agent event trails are append-only. AgentEvent records are never modified or deleted. They form a permanent, sequenced audit trail of every decision an agent made.

Autonomic Layer Integration

The logging infrastructure feeds into the autonomic layer (Wingman pattern).

Pattern Detection

System logs and agent events provide the raw data for:

Error frequency detection: Repeated failures on the same entity or endpoint
Cost anomaly detection: Runs that consume significantly more tokens than expected
Behavioral pattern detection: Agents repeatedly escalating on similar situations
Performance degradation: Increasing response times from specific providers
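The first of these, error frequency detection, can be approximated with a simple count over fetched log entries. The sketch below is illustrative; the threshold is an arbitrary assumption and the function name is hypothetical.

```python
from collections import Counter


def repeated_failures(entries: list, threshold: int = 3) -> dict:
    """Return entities with at least `threshold` Error-level log entries."""
    counts = Counter(
        (entry.get("entityType"), entry["entityGuid"])
        for entry in entries
        if entry.get("level") == "Error" and entry.get("entityGuid")
    )
    return {key: count for key, count in counts.items() if count >= threshold}
```

Restricting the input with the `from` and `to` filters turns this into a sliding-window check rather than an all-time count.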

Feedback Loop

Agent feedback (Good/Partial/Bad ratings) combined with execution events enables:

Confidence scoring: How reliable is a specific model + role combination?
Model selection optimization: Which model performs best for which task type?
Tool profile tuning: Are blocked tools causing unnecessary escalations?
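A very simple form of confidence scoring is the mean rating per model and role. In the sketch below the numeric weights for Good, Partial, and Bad are assumptions, as are the `model`, `role`, and `rating` keys on each feedback item.

```python
from collections import defaultdict

# Assumed weights; the platform's actual scoring scheme is not specified here.
RATING_WEIGHTS = {"Good": 1.0, "Partial": 0.5, "Bad": 0.0}


def confidence_scores(feedback: list) -> dict:
    """Mean rating weight per (model, role) pair, between 0.0 and 1.0."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [weight sum, rating count]
    for item in feedback:
        key = (item["model"], item["role"])
        totals[key][0] += RATING_WEIGHTS[item["rating"]]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```

A plain mean treats a pair with one rating the same as a pair with hundreds, so a production version would likely also track the sample size or apply smoothing.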

API Reference

Method	Endpoint	Permission	Description
GET	`/api/system-logs`	SystemViewAuditLog	Query system logs with filters
GET	`/api/system-logs/{id}`	SystemViewAuditLog	Get specific log entry
GET	`/api/agent/runs/{runId}/events`	AgentRunView	Get agent execution events
GET	`/api/agent/analytics`	AgentRunView	Get aggregated agent metrics
GET	`/api/agent/runs/{runId}/feedback`	AgentRunView	Get run feedback