Logging & Observability
Atamaia provides structured, queryable audit logging at the platform level, plus dense execution tracing at the agent level. Every significant action (who did what, when, to which entity, from which IP, via which API key, correlated with which request) is recorded in PostgreSQL and queryable via the API.
System Logs
The SystemLog entity is the platform-wide audit trail. Every log entry captures the full context of who performed an action and how.
Entity: SystemLog
| Field | Type | Description |
|---|---|---|
| Id | long | Primary key |
| Guid | UUID | External reference |
| LevelId | SystemLogLevel | Debug, Info, Success, Warning, Error |
| EntityTypeId | LogEntityType | What type of entity was affected |
| EntityGuid | UUID? | Which specific entity |
| Action | string | What happened (e.g. "memory.created", "agent.run.started") |
| Source | string? | Component that generated the log |
| UserId | long? | Human user (from JWT user_id claim) |
| IdentityId | long? | AI identity (from JWT identity_id claim) |
| ApiKeyId | long? | API key used (from JWT api_key_id claim) |
| CorrelationId | UUID? | Request correlation ID |
| HttpMethod | string? | GET, POST, PATCH, etc. |
| HttpPath | string? | The API path called |
| HttpStatusCode | int? | Response status code |
| DurationMs | int? | Request duration in milliseconds |
| ClientIp | string? | Remote IP address |
| DetailsJson | string? | Structured JSON payload with action-specific data |
| TenantId | long | Tenant isolation (automatic via global query filter) |
| CreatedAtUtc | DateTime | When the log was created |
Log Levels
| Level | Value | Purpose |
|---|---|---|
| Debug | 1 | Diagnostic detail |
| Info | 2 | Standard operations |
| Success | 3 | Confirmed successful actions |
| Warning | 4 | Anomalies, degraded operations |
| Error | 5 | Failures requiring attention |
Entity Types
Logs can be associated with any domain entity:
| Type | Description |
|---|---|
| None | System-level, not entity-specific |
| Memory | Memory CRUD operations |
| Identity | Identity management |
| Project | Project operations |
| Task | Task operations |
| Doc | Document operations |
| Fact | Fact CRUD |
| Message | Messaging operations |
| User | User management |
| OrgUnit | Organizational hierarchy |
| AgentRun | Agent execution |
| ChatSession | Chat operations |
| Reflection | Mirror system |
| Connector | External connector operations |
| Role | RBAC operations |
| Session | Session handoffs |
Automatic Context Capture
The SystemLogService automatically extracts context from the HTTP request on every log call:
```csharp
public async Task LogAsync(SystemLogLevel level, string action,
    LogEntityType entityType = LogEntityType.None,
    Guid? entityGuid = null, string? source = null, object? details = null,
    int? httpStatusCode = null, int? durationMs = null)
```
From the `HttpContext`, it automatically captures:
- User ID: From the `user_id` JWT claim
- Identity ID: From the `identity_id` JWT claim
- API Key ID: From the `api_key_id` JWT claim
- Correlation ID: From `HttpContext.Items["CorrelationId"]` (set by middleware; also available as the `X-Correlation-Id` header)
- Client IP: From `HttpContext.Connection.RemoteIpAddress`
- HTTP Method: From the request
- HTTP Path: From the request path
The details parameter is serialized to JSON, allowing any structured data to be attached to a log entry.
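The capture step can be sketched as follows. This is a minimal illustration, not the actual `SystemLogService` code: `CaptureContext` and `ClaimAsLong` are hypothetical helpers that take the relevant parts of `HttpContext` (the `ClaimsPrincipal`, the `Items` dictionary, the remote IP, method and path) so the sketch runs without ASP.NET Core. It assumes the three ID claims hold numeric strings and that the middleware stores a `Guid` under `"CorrelationId"`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Security.Claims;
using System.Text.Json;

// Reads a numeric JWT claim such as "user_id"; null when the claim is absent or not a number.
static long? ClaimAsLong(ClaimsPrincipal user, string claimType) =>
    long.TryParse(user.FindFirst(claimType)?.Value, out var parsed) ? parsed : (long?)null;

// Hypothetical stand-in for the capture step. It receives the pieces of HttpContext it needs
// (User, Items, remote IP, method, path) instead of HttpContext itself.
static Dictionary<string, object?> CaptureContext(
    ClaimsPrincipal user, IDictionary<object, object?> items,
    string? remoteIp, string method, string path, object? details)
{
    return new Dictionary<string, object?>
    {
        ["UserId"] = ClaimAsLong(user, "user_id"),
        ["IdentityId"] = ClaimAsLong(user, "identity_id"),
        ["ApiKeyId"] = ClaimAsLong(user, "api_key_id"),
        // Assumes the correlation middleware stored a Guid under this key.
        ["CorrelationId"] = items.TryGetValue("CorrelationId", out var cid) ? cid as Guid? : null,
        ["ClientIp"] = remoteIp,
        ["HttpMethod"] = method,
        ["HttpPath"] = path,
        // Any object can be attached; it is stored as a JSON string, as in DetailsJson.
        ["DetailsJson"] = details is null ? null : JsonSerializer.Serialize(details),
    };
}
```

Inside a real request, the arguments would come from `HttpContext.User`, `HttpContext.Items`, `HttpContext.Connection.RemoteIpAddress?.ToString()`, `Request.Method` and `Request.Path`.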
Querying Logs
GET /api/system-logs
Supports comprehensive filtering:
| Parameter | Type | Description |
|---|---|---|
| level | SystemLogLevel? | Filter by severity |
| entityType | LogEntityType? | Filter by entity type |
| entityGuid | UUID? | Filter by specific entity |
| userId | long? | Filter by human user |
| identityId | long? | Filter by AI identity |
| apiKeyId | long? | Filter by API key |
| correlationId | UUID? | Follow a single request across log entries |
| from | DateTime? | Start of time range |
| to | DateTime? | End of time range |
| limit | int | Max results (1-500, default 100) |
| offset | int | Pagination offset |
Example: Find all actions by a specific identity in the last hour:
GET /api/system-logs?identityId=2&from=2026-03-05T09:00:00Z&limit=50
Example: Trace a single request:
GET /api/system-logs?correlationId=550e8400-e29b-41d4-a716-446655440000
Example: Find all errors (level 5) for agent runs (entity type 10):
GET /api/system-logs?level=5&entityType=10
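Because the filters combine freely, a small client-side builder is a convenient way to produce these URLs. The sketch below is hypothetical (`BuildSystemLogQuery` is not part of Atamaia). It passes level and entity type as numeric values, as in the examples above, enforces the documented 1-500 range for `limit` before sending anything, and assumes `offset` defaults to 0.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical client helper: builds a GET /api/system-logs URL from the documented filters.
// Only supplied filters are emitted, in the order of the parameter table; limit and offset
// are omitted when they equal their defaults (100 and, by assumption, 0).
static string BuildSystemLogQuery(
    int? level = null, int? entityType = null, Guid? entityGuid = null,
    long? userId = null, long? identityId = null, long? apiKeyId = null,
    Guid? correlationId = null, DateTime? from = null, DateTime? to = null,
    int limit = 100, int offset = 0)
{
    // Documented bounds are 1-500; failing fast avoids a round trip for an invalid request.
    if (limit is < 1 or > 500)
        throw new ArgumentOutOfRangeException(nameof(limit), "limit must be between 1 and 500");
    if (offset < 0)
        throw new ArgumentOutOfRangeException(nameof(offset), "offset cannot be negative");

    var parts = new List<string>();
    void Add(string key, object? value)
    {
        if (value is null) return;
        // Numbers, GUIDs and ISO-8601 UTC timestamps are query-safe, so no escaping is applied.
        string text = value is DateTime dt
            ? dt.ToUniversalTime().ToString("yyyy-MM-dd'T'HH:mm:ss'Z'", CultureInfo.InvariantCulture)
            : Convert.ToString(value, CultureInfo.InvariantCulture)!;
        parts.Add($"{key}={text}");
    }

    Add("level", level);
    Add("entityType", entityType);
    Add("entityGuid", entityGuid);
    Add("userId", userId);
    Add("identityId", identityId);
    Add("apiKeyId", apiKeyId);
    Add("correlationId", correlationId);
    Add("from", from);
    Add("to", to);
    Add("limit", limit == 100 ? (int?)null : limit);
    Add("offset", offset == 0 ? (int?)null : offset);

    return parts.Count == 0 ? "/api/system-logs" : "/api/system-logs?" + string.Join("&", parts);
}
```

For example, `BuildSystemLogQuery(identityId: 2, from: new DateTime(2026, 3, 5, 9, 0, 0, DateTimeKind.Utc), limit: 50)` reproduces the first example URL above. Pass `DateTimeKind.Utc` values; an unspecified kind is treated as local time by `ToUniversalTime`.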
Get Specific Log Entry
GET /api/system-logs/{id}
Response:
```json
{
  "id": 1234,
  "guid": "a1b2c3d4-...",
  "level": "Error",
  "entityType": "AgentRun",
  "entityGuid": "e5f6a7b8-...",
  "action": "agent.run.failed",
  "source": "AgentExecutionLoop",
  "userId": 1,
  "identityId": 2,
  "apiKeyId": null,
  "correlationId": "550e8400-...",
  "httpMethod": "POST",
  "httpPath": "/api/agent/runs/42/start",
  "httpStatusCode": 500,
  "durationMs": 15234,
  "clientIp": "192.168.1.100",
  "detailsJson": "{\"error\":\"Context overflow at 95% budget\",\"iterationsUsed\":47}",
  "createdAtUtc": "2026-03-05T10:15:23Z"
}
```
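Because `detailsJson` is itself a JSON document serialized into a string, clients need a second parse to reach its fields. The hypothetical `ReadFailureDetails` below does that with `System.Text.Json`. The `error` and `iterationsUsed` keys come from the example response; other actions will carry different keys.

```csharp
#nullable enable
using System;
using System.Text.Json;

// detailsJson is a JSON document encoded as a string inside the response, so it is parsed
// twice: once for the log entry and once for the payload. Payload keys are action-specific;
// "error" and "iterationsUsed" follow the agent.run.failed example.
static (string Action, string? Error, int? IterationsUsed) ReadFailureDetails(string logEntryJson)
{
    using var entry = JsonDocument.Parse(logEntryJson);
    var root = entry.RootElement;
    string action = root.GetProperty("action").GetString() ?? "";

    // detailsJson is optional (string?), so a missing or null value yields no details.
    if (!root.TryGetProperty("detailsJson", out var raw) || raw.ValueKind != JsonValueKind.String)
        return (action, null, null);

    using var details = JsonDocument.Parse(raw.GetString()!);
    var payload = details.RootElement;
    string? error = payload.TryGetProperty("error", out var e) ? e.GetString() : null;
    int? iterations = payload.TryGetProperty("iterationsUsed", out var n) ? n.GetInt32() : (int?)null;
    return (action, error, iterations);
}
```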
Agent Execution Tracing
For agent runs, the AgentEvent system provides a second, much denser layer of observability. While system logs capture API-level actions, agent events capture every internal decision, tool call, and failure within an execution loop.
Agent Event Types
46 distinct event types across 12 categories:
| Category | Events |
|---|---|
| Planning | PlanCreated, PlanRevised, StepStarted, StepCompleted |
| LLM | LlmRequest, LlmResponse |
| Tool Use | ToolCallRequested, ToolCallResult, ToolCallBlocked |
| Decisions | Decision, Observation, Reasoning |
| Failures | EmptyResponse, PrematureIntent, StaleLoop, ContextOverflow, ToolTimeout, DependencyBlocked, BudgetExceeded |
| Loop Detection | DuplicateRead, RepeatedToolCall |
| Context Management | ContextWarning50, ContextWarning75, ContextWarning90, ContextCompacted, ContextFlushed |
| Lifecycle | Checkpoint, Paused, Resumed, BudgetWarning, BudgetExtended |
| Escalation | EscalationCreated, EscalationResolved |
| Children | ChildSpawned, ChildCompleted, ChildFailed, ChildMessage, ParentResponse |
| Interaction | InteractionMessageReceived, InteractionMessageSent, PauseChatLinked, PauseChatSummarized |
| Audit | RunStarted, RunCompleted, RunFailed, ApprovalRevision |
Each event includes:
- Sequence number: Monotonically increasing within a run
- Type: Enum classification
- Summary: Human-readable description
- DataJson: Structured payload (tool arguments, LLM response, error details)
- TokensUsed: Token cost of this specific event
- DurationMs: Wall-clock time for the event
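Since every event carries `TokensUsed` and `DurationMs`, a run's events can be rolled up by the categories in the table above. In this sketch each event is reduced to a tuple of sequence number, type name, tokens and duration. That shape, the `SummarizeByCategory` name, and the `"Unknown"` bucket for unlisted types are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Event names per category, transcribed from the event table.
var eventsByCategory = new Dictionary<string, string[]>
{
    ["Planning"] = new[] { "PlanCreated", "PlanRevised", "StepStarted", "StepCompleted" },
    ["LLM"] = new[] { "LlmRequest", "LlmResponse" },
    ["Tool Use"] = new[] { "ToolCallRequested", "ToolCallResult", "ToolCallBlocked" },
    ["Decisions"] = new[] { "Decision", "Observation", "Reasoning" },
    ["Failures"] = new[] { "EmptyResponse", "PrematureIntent", "StaleLoop", "ContextOverflow",
                           "ToolTimeout", "DependencyBlocked", "BudgetExceeded" },
    ["Loop Detection"] = new[] { "DuplicateRead", "RepeatedToolCall" },
    ["Context Management"] = new[] { "ContextWarning50", "ContextWarning75", "ContextWarning90",
                                     "ContextCompacted", "ContextFlushed" },
    ["Lifecycle"] = new[] { "Checkpoint", "Paused", "Resumed", "BudgetWarning", "BudgetExtended" },
    ["Escalation"] = new[] { "EscalationCreated", "EscalationResolved" },
    ["Children"] = new[] { "ChildSpawned", "ChildCompleted", "ChildFailed", "ChildMessage", "ParentResponse" },
    ["Interaction"] = new[] { "InteractionMessageReceived", "InteractionMessageSent",
                              "PauseChatLinked", "PauseChatSummarized" },
    ["Audit"] = new[] { "RunStarted", "RunCompleted", "RunFailed", "ApprovalRevision" },
};

// Reverse lookup: event type name -> category (ToDictionary throws if a name appears twice).
var categoryOf = eventsByCategory
    .SelectMany(kv => kv.Value.Select(name => (Name: name, Category: kv.Key)))
    .ToDictionary(x => x.Name, x => x.Category);

// Hypothetical per-run summary: event count and TokensUsed / DurationMs totals per category.
Dictionary<string, (int Events, long Tokens, long DurationMs)> SummarizeByCategory(
    IEnumerable<(long Sequence, string Type, int TokensUsed, int DurationMs)> events) =>
    events
        .GroupBy(ev => categoryOf.TryGetValue(ev.Type, out var category) ? category : "Unknown")
        .ToDictionary(
            g => g.Key,
            g => (Events: g.Count(),
                  Tokens: g.Sum(ev => (long)ev.TokensUsed),
                  DurationMs: g.Sum(ev => (long)ev.DurationMs)));
```

Summing into `long` keeps long runs with many LLM calls from overflowing the per-event `int` values.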
Querying Agent Events
GET /api/agent/runs/{runId}/events?sinceSequence=50&limit=200
This enables real-time monitoring: a UI can poll for new events since the last known sequence number.
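A polling client only needs to remember the highest sequence number it has rendered. The sketch below assumes the event type is available as a string. Whether the server treats `sinceSequence` as inclusive or exclusive is not specified here, so the client filters for itself. `EventsPath` and `TakeNew` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Builds the documented polling URL for one run.
static string EventsPath(long runId, long sinceSequence, int limit = 200) =>
    $"/api/agent/runs/{runId}/events?sinceSequence={sinceSequence}&limit={limit}";

// Keeps only events newer than the cursor, in sequence order, and returns the next cursor.
// Filtering on the client makes the loop correct whether sinceSequence is inclusive or
// exclusive, and drops events that overlapping polls deliver twice.
static (List<(long Sequence, string Type)> Fresh, long NextCursor) TakeNew(
    long cursor, IEnumerable<(long Sequence, string Type)> batch)
{
    var fresh = batch.Where(ev => ev.Sequence > cursor).OrderBy(ev => ev.Sequence).ToList();
    long next = fresh.Count > 0 ? fresh[^1].Sequence : cursor;
    return (fresh, next);
}
```

Each poll fetches `EventsPath(runId, cursor)`, passes the returned events to `TakeNew`, renders `Fresh`, and stores `NextCursor` for the next request. Because sequence numbers are monotonic within a run, a single `long` is all the client state required.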
Correlation Across Layers
The correlation architecture connects system logs, agent events, and API requests:
- `X-Correlation-Id` header: Every API request gets a correlation UUID (auto-generated if not provided by the caller)
- System logs: Capture the correlation ID from `HttpContext.Items`
- Agent events: Linked to their run, which is linked to the creating request's correlation ID
- Response header: `X-Correlation-Id` is echoed back to the caller
This means you can trace a single user action from the API request, through the system log, into the agent execution events, through tool calls, and back to the response.
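The header handling described above can be sketched as follows. `ResolveCorrelationId` and `ApplyCorrelation` are hypothetical stand-ins for the middleware that use plain dictionaries instead of `HttpContext`. The behavior for a malformed `X-Correlation-Id` value is not documented here; this sketch assumes it is treated as absent and a fresh ID is generated.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Reuses a caller-supplied correlation ID when it parses as a GUID; otherwise generates one.
// HTTP header names are case-insensitive, so the lookup compares names case-insensitively.
static Guid ResolveCorrelationId(IReadOnlyDictionary<string, string> requestHeaders)
{
    foreach (var (name, value) in requestHeaders)
    {
        if (string.Equals(name, "X-Correlation-Id", StringComparison.OrdinalIgnoreCase)
            && Guid.TryParse(value, out var supplied))
            return supplied;
    }
    return Guid.NewGuid();
}

// Stores the ID where SystemLogService reads it (Items["CorrelationId"]) and echoes it
// back to the caller in the response header.
static Guid ApplyCorrelation(IReadOnlyDictionary<string, string> requestHeaders,
    IDictionary<object, object?> items, IDictionary<string, string> responseHeaders)
{
    var id = ResolveCorrelationId(requestHeaders);
    items["CorrelationId"] = id;
    responseHeaders["X-Correlation-Id"] = id.ToString();
    return id;
}
```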
Request/Response Logging
All API responses use the `ApiEnvelope<T>` wrapper:
```json
{
  "ok": true,
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "data": { ... },
  "count": 42,
  "error": null,
  "errorCode": null,
  "hint": null
}
```
On error:
```json
{
  "ok": false,
  "requestId": "550e8400-e29b-41d4-a716-446655440001",
  "data": null,
  "error": "Identity not found: 99",
  "errorCode": "NOT_FOUND",
  "hint": "Check identity ID or use identity_list to find valid IDs"
}
```
The requestId matches the X-Correlation-Id header and the correlationId in system logs.
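On the client side, the envelope can be unwrapped once, and the `requestId` can be kept for support requests and log lookups. `UnwrapEnvelope` is a hypothetical helper built on `System.Text.Json`. The exception type and message format are assumptions.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical client helper for the ApiEnvelope<T> shape. Returns `data` plus `requestId`
// (the correlationId to search for in system logs), or throws with errorCode, error and
// hint when ok is false.
static (JsonElement Data, string? RequestId) UnwrapEnvelope(string json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    string? requestId = root.TryGetProperty("requestId", out var rid) ? rid.GetString() : null;

    if (!root.GetProperty("ok").GetBoolean())
    {
        string? Text(string name) =>
            root.TryGetProperty(name, out var v) && v.ValueKind == JsonValueKind.String ? v.GetString() : null;
        throw new InvalidOperationException(
            $"{Text("errorCode")}: {Text("error")} (hint: {Text("hint")}; requestId: {requestId})");
    }

    // Clone so the returned element stays valid after the JsonDocument is disposed.
    return (root.GetProperty("data").Clone(), requestId);
}
```

Putting `requestId` into the exception message means a failure surfaced to a user already carries the value needed for `GET /api/system-logs?correlationId=...`.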
Performance Monitoring
Duration Tracking
System logs capture DurationMs for API requests. This enables:
- Identifying slow endpoints
- Tracking performance degradation over time
- Correlating slowness with specific entities, identities, or API keys
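As an example of the first point, per-endpoint p95 latency can be computed from `HttpPath` and `DurationMs` in rows fetched from `GET /api/system-logs`. This is a client-side sketch, not a built-in report. `SlowestEndpoints` is hypothetical, and the sketch uses the nearest-rank percentile definition.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Nearest-rank percentile over an ascending, non-empty list: the smallest sample such that
// at least p percent of samples are at or below it.
static int Percentile(IReadOnlyList<int> ascending, int p)
{
    int rank = (int)Math.Ceiling(p * ascending.Count / 100.0);  // 1-based rank
    return ascending[Math.Max(rank, 1) - 1];
}

// Groups log rows by HttpPath, skipping rows without a path or duration, and returns the
// `top` endpoints with the highest p95 duration.
static List<(string Path, int Count, int P95Ms)> SlowestEndpoints(
    IEnumerable<(string? HttpPath, int? DurationMs)> logs, int top = 5) =>
    logs.Where(row => row.HttpPath is not null && row.DurationMs is not null)
        .GroupBy(row => row.HttpPath!)
        .Select(g =>
        {
            var ascending = g.Select(row => row.DurationMs.GetValueOrDefault()).OrderBy(ms => ms).ToList();
            return (Path: g.Key, Count: ascending.Count, P95Ms: Percentile(ascending, 95));
        })
        .OrderByDescending(stat => stat.P95Ms)
        .Take(top)
        .ToList();
```

The sketch groups raw paths. Paths that embed IDs (such as `/api/agent/runs/42/start`) would need normalizing into route templates first, or each run would be counted as a separate endpoint.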
Agent Cost Tracking
Agent runs track detailed cost metrics:
- `PromptTokens` / `CompletionTokens` per run
- `CostUsd` computed from model pricing
- `TotalTokensWithChildren` / `TotalCostWithChildren` aggregated across run trees
- Per-event `TokensUsed` for fine-grained cost attribution
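The cost fields can be illustrated with a per-million-token pricing model and a recursive walk over the run tree. The pricing unit, the example prices, and the helper names `ComputeCostUsd` and `TotalWithChildren` are assumptions. The sketch only shows how `CostUsd` relates to token counts and how the `*WithChildren` totals roll up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pricing model: USD per million prompt and completion tokens.
// decimal avoids binary rounding error in money arithmetic.
static decimal ComputeCostUsd(long promptTokens, long completionTokens,
    decimal promptUsdPerMillion, decimal completionUsdPerMillion) =>
    (promptTokens * promptUsdPerMillion + completionTokens * completionUsdPerMillion) / 1_000_000m;

// Sums a run's own usage with every descendant's, in the spirit of TotalTokensWithChildren /
// TotalCostWithChildren. `own` maps run ID -> that run's usage; `children` maps run ID -> IDs
// of the runs it spawned. Assumes the run graph is a tree (no cycles).
static (long Tokens, decimal CostUsd) TotalWithChildren(long runId,
    IReadOnlyDictionary<long, (long Tokens, decimal CostUsd)> own,
    IReadOnlyDictionary<long, long[]> children)
{
    var (tokens, cost) = own[runId];
    if (children.TryGetValue(runId, out var kids))
    {
        foreach (var kid in kids)
        {
            var (kidTokens, kidCost) = TotalWithChildren(kid, own, children);
            tokens += kidTokens;
            cost += kidCost;
        }
    }
    return (tokens, cost);
}
```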
Provider Health
The ProviderHealthTracker (singleton) maintains real-time health state for all AI providers:
- Consecutive failure counts
- Circuit breaker state (healthy, open, recovery)
- Last failure timestamp
This data is in-memory (not persisted), designed for runtime routing decisions rather than historical analysis.
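A minimal circuit breaker with the three states listed above might look like this. The failure threshold (3), the cooldown (30 seconds), and the helper names are hypothetical; the actual `ProviderHealthTracker` policy is not documented here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical thresholds; the real tracker's values are not documented.
const int FailureThreshold = 3;
var cooldown = TimeSpan.FromSeconds(30);

// In-memory state per provider, lost on restart like the tracker described above.
var health = new Dictionary<string, (int ConsecutiveFailures, DateTimeOffset? LastFailure)>();

void RecordFailure(string provider, DateTimeOffset now)
{
    health.TryGetValue(provider, out var s);  // unknown provider -> (0, null)
    health[provider] = (s.ConsecutiveFailures + 1, now);
}

// Any success closes the breaker and resets the failure count.
void RecordSuccess(string provider) => health[provider] = (0, null);

// healthy: below the threshold. open: tripped and still cooling down, so routing skips it.
// recovery: cooldown elapsed; a trial request is allowed, and its outcome resets or re-opens it.
string StateOf(string provider, DateTimeOffset now)
{
    if (!health.TryGetValue(provider, out var s) || s.ConsecutiveFailures < FailureThreshold)
        return "healthy";
    return s.LastFailure is { } last && now - last < cooldown ? "open" : "recovery";
}
```

A router would skip providers in the `open` state, send a single trial request to one in `recovery`, and call `RecordSuccess` or `RecordFailure` with the result.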
Soft Delete Alignment
Design decision D15: soft delete only, never hard delete. This has direct implications for observability:
- System logs reference entities that still exist. Even if an entity is "deleted", its data is still in the database with `IsDeleted = true`, so logs referencing that entity's GUID will always resolve.
- Audit trails are permanent. There is no way to delete a system log entry. The global query filter only filters on `IsDeleted`, and system logs should never be soft-deleted.
- Agent event trails are append-only. `AgentEvent` records are never modified or deleted. They form a permanent, sequenced audit trail of every decision an agent made.
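The first point can be illustrated with an in-memory stand-in for one entity table. Normal reads hide soft-deleted rows, while an audit lookup by GUID bypasses the filter. Assuming the data layer is EF Core, that bypass is the role `IgnoreQueryFilters()` plays for a filter registered with `HasQueryFilter`. The dictionary store and helper names are hypothetical and only model the behavior; they are not Atamaia's data layer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for one entity table: Guid -> (Title, IsDeleted). Hypothetical.
var store = new Dictionary<Guid, (string Title, bool IsDeleted)>();

// Normal reads apply the soft-delete filter, like a global query filter on !IsDeleted.
IEnumerable<string> VisibleTitles() =>
    store.Values.Where(row => !row.IsDeleted).Select(row => row.Title);

// "Deleting" only sets the flag; the row and its Guid stay in the table.
void SoftDelete(Guid guid) => store[guid] = (store[guid].Title, true);

// Audit lookups skip the filter, so the EntityGuid recorded in a system log still
// resolves after the entity is deleted.
(string Title, bool IsDeleted)? ResolveForAudit(Guid entityGuid)
{
    if (store.TryGetValue(entityGuid, out var row)) return row;
    return null;
}
```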
Autonomic Layer Integration
The logging infrastructure feeds into the autonomic layer (Wingman pattern):
Pattern Detection
System logs and agent events provide the raw data for:
- Error frequency detection: Repeated failures on the same entity or endpoint
- Cost anomaly detection: Runs that consume significantly more tokens than expected
- Behavioral pattern detection: Agents repeatedly escalating on similar situations
- Performance degradation: Increasing response times from specific providers
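Error frequency detection, the first item above, can be sketched as a windowed count over system log rows. The threshold, the window, the grouping key (entity GUID, falling back to the HTTP path), and the helper name `ErrorHotspots` are assumptions for illustration; the actual autonomic layer may work differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical detector: counts Error-level entries (level 5) per entity, or per endpoint
// when the log has no entity, inside a trailing window, and flags keys at or above threshold.
static List<(string Key, int Errors)> ErrorHotspots(
    IEnumerable<(int Level, Guid? EntityGuid, string? HttpPath, DateTime CreatedAtUtc)> logs,
    DateTime nowUtc, TimeSpan window, int threshold) =>
    logs.Where(log => log.Level == 5 && log.CreatedAtUtc > nowUtc - window)
        .Select(log => log.EntityGuid?.ToString() ?? log.HttpPath ?? "(no entity or path)")
        .GroupBy(key => key)
        .Select(g => (Key: g.Key, Errors: g.Count()))
        .Where(hit => hit.Errors >= threshold)
        .OrderByDescending(hit => hit.Errors)
        .ToList();
```

Keying on the entity first means repeated failures of one agent run are reported together, even when they arrive through different endpoints.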
Feedback Loop
Agent feedback (Good/Partial/Bad ratings) combined with execution events enables:
- Confidence scoring: How reliable is a specific model + role combination?
- Model selection optimization: Which model performs best for which task type?
- Tool profile tuning: Are blocked tools causing unnecessary escalations?
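Confidence scoring can be sketched as an average rating per model and role. Mapping Good, Partial and Bad to 1, 0.5 and 0, the five-rating minimum, and the `ConfidenceScores` name are assumptions, not a documented formula.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical scoring rule: Good = 1, Partial = 0.5, Bad = 0, averaged per (model, role).
// Pairs with fewer than minSamples ratings are left out rather than scored on thin data.
static Dictionary<(string Model, string Role), double> ConfidenceScores(
    IEnumerable<(string Model, string Role, string Rating)> feedback, int minSamples = 5) =>
    feedback
        .GroupBy(f => (f.Model, f.Role))
        .Where(g => g.Count() >= minSamples)
        .ToDictionary(
            g => g.Key,
            g => g.Average(f => f.Rating switch
            {
                "Good" => 1.0,
                "Partial" => 0.5,
                "Bad" => 0.0,
                _ => throw new ArgumentException($"Unknown rating: {f.Rating}"),
            }));
```

The same grouping could be keyed by task type instead of role to answer the model selection question.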
API Reference
| Method | Endpoint | Permission | Description |
|---|---|---|---|
| GET | /api/system-logs | SystemViewAuditLog | Query system logs with filters |
| GET | /api/system-logs/{id} | SystemViewAuditLog | Get specific log entry |
| GET | /api/agent/runs/{runId}/events | AgentRunView | Get agent execution events |
| GET | /api/agent/analytics | AgentRunView | Get aggregated agent metrics |
| GET | /api/agent/runs/{runId}/feedback | AgentRunView | Get run feedback |