
Context-Optimised API Design for LLMs

technical · mcp · api-design

We reduced 60 tools to 9. Same functionality. 85% less context overhead.

REST conventions work brilliantly for human developers who read documentation once and remember endpoints forever.

But your API consumer isn't human anymore.

It's an LLM with a 200k context window that re-reads every tool description on every turn. And it's paying per token.

Read that again. Every tool description on every turn.

You need a different pattern.


The Problem: Tool Sprawl

MCP lets you extend AI assistants with custom tools. The natural instinct is to create granular endpoints:

memory_add
memory_get
memory_list
memory_update
memory_delete
memory_pin
memory_archive
memory_link
memory_unlink
memory_search
memory_embed
...

Multiply this across domains (projects, tasks, docs, files, database) and you hit 60+ tools fast. Each needs a description, parameter schema, and examples.

That's roughly 12,000 tokens the LLM must process every single turn.

The result? Slower responses, higher costs, and an AI that picks memory_update when it meant memory_upsert because they look similar in a list of 60.


Real Example: Before and After

V1: The Granular Approach (Truncated)

{
  "tools": [
    { "name": "MemoriesAdd", "description": "Add a new memory to the system", "inputSchema": { "type": "object", "properties": { "projectKey": {}, "title": {}, "body": {}, "scope": {}, "memoryType": {}, "tags": {}, "importance": {}, "pinned": {}, "ttlIso": {}, "userId": {}, "chatId": {}, "sourceKind": {}, "sourceRef": {} }, "required": ["projectKey", "title", "body"] } },
    { "name": "MemoriesSearch", "description": "Search memories using hybrid FTS + semantic search", "inputSchema": { ... } },
    { "name": "MemoriesList", "description": "List memories with filtering and pagination", "inputSchema": { ... } },
    { "name": "MemoriesGet", "description": "Get a specific memory by ID", "inputSchema": { ... } },
    { "name": "MemoriesUpdate", "description": "Update an existing memory", "inputSchema": { ... } },
    { "name": "MemoriesPin", "description": "Pin or unpin a memory", "inputSchema": { ... } },
    { "name": "MemoriesArchive", "description": "Archive a memory (soft delete)", "inputSchema": { ... } },
    { "name": "MemoriesDelete", "description": "Permanently delete a memory", "inputSchema": { ... } },
    { "name": "MemoriesLink", "description": "Link two memories", "inputSchema": { ... } },
    { "name": "MemoriesUnlink", "description": "Remove a link between memories", "inputSchema": { ... } },
    { "name": "MemoriesRelated", "description": "Get related memories", "inputSchema": { ... } },
    { "name": "MemoriesPrune", "description": "Archive expired memories", "inputSchema": { ... } },
    { "name": "MemoriesEmbed", "description": "Generate embeddings", "inputSchema": { ... } },
    { "name": "MemoriesStats", "description": "Get memory statistics", "inputSchema": { ... } },
    { "name": "ProjectsList", "description": "List all projects", "inputSchema": { ... } },
    { "name": "ProjectsGet", "description": "Get a project by key", "inputSchema": { ... } },
    { "name": "DocsList", "description": "List docs for a project", "inputSchema": { ... } },
    { "name": "DocsSearch", "description": "Search docs via FTS", "inputSchema": { ... } },
    { "name": "FilesList", "description": "List files", "inputSchema": { ... } },
    { "name": "FilesRead", "description": "Read a file", "inputSchema": { ... } },
    { "name": "FilesWrite", "description": "Write a file", "inputSchema": { ... } },
    { "name": "DbTables", "description": "List SQLite tables", "inputSchema": { ... } },
    { "name": "DbQuery", "description": "Run a SELECT", "inputSchema": { ... } },
    { "name": "DbExec", "description": "Execute SQL", "inputSchema": { ... } }
    // ... and 35+ more
  ]
}

~12,000 tokens. Every. Single. Turn.

V2: The Domain Facade Approach (Complete)

{
  "tools": [
    {
      "name": "MemoryExecute",
      "description": "Neural memory system. Commands: add, get, list, search, update, pin, delete, archive, link, unlink, related, embed, stats, prune",
      "inputSchema": {
        "type": "object",
        "properties": {
          "cmd": { "type": "string" },
          "detail": { "enum": ["minimal", "standard", "full"] },
          "params": { "type": "object" }
        },
        "required": ["cmd"]
      }
    },
    { "name": "ProjectsExecute", "description": "Project management. Commands: list, get, upsert, archive, stats", "inputSchema": { ... } },
    { "name": "TasksExecute", "description": "Task tracking. Commands: list, get, upsert, delete, set_status", "inputSchema": { ... } },
    { "name": "DocsExecute", "description": "Documentation. Commands: list, get, upsert, delete, search, pin", "inputSchema": { ... } },
    { "name": "FilesExecute", "description": "File operations. Commands: list, get, put, delete, roundtrip_*", "inputSchema": { ... } },
    { "name": "DatabaseExecute", "description": "SQL access. Commands: query, exec, schema, tables, stats", "inputSchema": { ... } },
    { "name": "ArtifactsExecute", "description": "Content storage. Commands: get, search, upsert", "inputSchema": { ... } },
    { "name": "HydrationExecute", "description": "AI context. Commands: hydrate, persona_*, identity_*", "inputSchema": { ... } },
    { "name": "DeepSearch", "description": "External search: Google, GitHub, Wikipedia, HackerNews", "inputSchema": { ... } }
  ]
}

~2,000 tokens. Same functionality. That's the whole list.
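You can sanity-check numbers like these yourself. The sketch below uses the common rough heuristic of ~4 characters per token (the real count depends on the tokenizer) and trimmed stand-in tool lists, not the actual schemas above:

```typescript
// Rough token-count comparison using the ~4 characters-per-token heuristic.
// The two tool lists are trimmed stand-ins, not the real schemas.
function estimateTokens(toolList: object): number {
  return Math.ceil(JSON.stringify(toolList).length / 4);
}

// 60 granular tools: one verb, one description, one schema apiece.
const granular = Array.from({ length: 60 }, (_, i) => ({
  name: `Tool${i}`,
  description: "One verb, one schema, repeated sixty times over",
  inputSchema: { type: "object", properties: { id: {}, title: {}, body: {} } },
}));

// 9 domain facades: commands folded into a single cmd parameter.
const facades = Array.from({ length: 9 }, (_, i) => ({
  name: `Domain${i}Execute`,
  description: "Domain facade. Commands: add, get, list, search, update, delete",
  inputSchema: { type: "object", properties: { cmd: {}, detail: {}, params: {} } },
}));

console.log(estimateTokens(granular), estimateTokens(facades));
```

Even with these skeletal schemas, the granular list comes out several times larger; with full descriptions, examples, and per-field documentation the gap widens further.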


The Pattern: One Tool Per Domain

Instead of 14 memory tools, expose 1 memory tool with 14 commands:

// Before: 14 tools, 14 descriptions, 14 schemas
MemoriesAdd({ title, body, ... })
MemoriesSearch({ query, topK, ... })
MemoriesPin({ id, pinned })
...

// After: 1 tool, 1 description, commands as a parameter
MemoryExecute({ cmd: "add", params: { title, body, ... }})
MemoryExecute({ cmd: "search", params: { query, topK, ... }})
MemoryExecute({ cmd: "pin", params: { id, pinned }})

The AI reasons about 9 domains instead of 60 verbs.

"I need to search memories" → MemoryExecute with cmd: "search". Done.


The Implementation

Each domain facade follows the same structure:

public async Task<DomainResponse> ExecuteAsync(DomainCommand command)
{
    return command.Cmd.ToLowerInvariant() switch
    {
        "add" => await AddAsync(command),
        "get" => await GetAsync(command),
        "list" => await ListAsync(command),
        "search" => await SearchAsync(command),
        "update" => await UpdateAsync(command),
        "delete" => await DeleteAsync(command),
        _ => DomainResponse.Failure(command.Cmd, "Unknown command")
    };
}

Consistent Envelopes

Request:

{
  "cmd": "search",
  "detail": "standard",
  "params": { "projectId": 1, "query": "authentication", "topK": 10 }
}

Response:

{
  "ok": true,
  "cmd": "search",
  "data": [...],
  "count": 10,
  "error": null
}

Echo back the command. The AI needs to correlate request/response when it's juggling multiple operations.

Detail Levels

Control response verbosity with a single parameter:

Level Returns Use Case
minimal ID, title only Lists, counts, quick checks
standard Key fields, excerpts General use
full Everything Deep inspection, debugging

The AI requests what it needs. No more parsing 50KB responses when you just wanted a count.
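One way to implement the three levels is a projection over the same record — the field names below are hypothetical, and the 120-character excerpt length is an arbitrary choice:

```typescript
interface Memory {
  id: number;
  title: string;
  body: string;
  tags: string[];
  importance: number;
}

// One record, three projections: the detail parameter picks how much
// of it goes back over the wire.
function project(m: Memory, detail: "minimal" | "standard" | "full") {
  switch (detail) {
    case "minimal":
      return { id: m.id, title: m.title };
    case "standard":
      return { id: m.id, title: m.title, excerpt: m.body.slice(0, 120), tags: m.tags };
    case "full":
      return m;
  }
}
```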


The 9 Tools

Tool Commands Purpose
MemoryExecute add, get, list, search, update, pin, archive, delete, link, unlink, related, embed, stats, prune Neural memory with hybrid search
ProjectsExecute list, get, upsert, archive, stats, get_tree Workspace management
TasksExecute list, get, upsert, delete, set_status, add_note Task tracking
DocsExecute list, get, upsert, delete, search, pin, embed Documentation
FilesExecute list, get, put, delete, mkdir, roundtrip_* File operations
DatabaseExecute query, exec, schema, tables, stats Direct SQL access
ArtifactsExecute get, search, upsert Content-addressed storage
HydrationExecute hydrate, persona_*, identity_*, preferences_* AI context loading
DeepSearch (aggregated) Google, GitHub, Wikipedia, HackerNews

60+ operations. 9 tools. Same capability.


Why It Works

1. Reduced cognitive load. The AI thinks in domains, not verbs. "I need to work with memories" → one obvious choice.

2. Consistent interface. Learn the pattern once, apply everywhere. Every domain has list, get, search. Same envelope, same error codes.

3. Token efficiency. You describe "Memory" once, not memory_add, memory_get, memory_list, memory_update... 14 times.

4. Extensibility. New command? Add a case to the switch. No new tool registration, no schema changes, no documentation updates.

5. Fewer wrong choices. 9 options beats 60. The AI stops confusing MemoriesUpdate with MemoriesUpsert.
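Point 4 is easiest to see with a command registry instead of a switch — a sketch, with illustrative handler names, not the article's C# implementation:

```typescript
type Handler = (params: Record<string, unknown>) => unknown;

// Commands live in a plain map: adding one is a single entry,
// with no new tool registration and no schema change.
const memoryCommands: Record<string, Handler> = {
  list: () => [],
  get: (p) => ({ id: p.id }),
  search: (p) => ({ query: p.query, hits: [] }),
};

function execute(cmd: string, params: Record<string, unknown> = {}) {
  const handler = memoryCommands[cmd.toLowerCase()];
  if (!handler) {
    return { ok: false, cmd, error: `Unknown command: ${cmd}` };
  }
  return { ok: true, cmd, data: handler(params) };
}

// Extending the domain later is one line:
memoryCommands.stats = () => ({ total: 0 });
```

The tool's external contract — one name, one schema — never changes as the command set grows.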


The Metrics

Metric Before (60 tools) After (9 tools)
Tool list tokens ~12,000 ~2,000
Wrong tool selection Frequent Rare
Response latency Higher Lower
Monthly API costs $$$ $

Bonus: Manifest-Based Roundtripping

One more pattern worth mentioning: atomic multi-file editing.

The Problem

LLMs editing files one at a time:

PUT /file/a.cs → content
PUT /file/b.cs → content
PUT /file/c.cs → content

Three API calls. No atomicity. No conflict detection. If the user edits a file while the AI is working, you get silent overwrites.

The Solution

roundtrip_start({ paths: ["a.cs", "b.cs", "c.cs"] })
  → Returns: manifest (SHA256 hashes) + ZIP of originals

[AI edits files in ZIP]

roundtrip_preview({ manifestId, modifiedZip })
  → Returns: diff, conflict warnings

roundtrip_commit({ manifestId, zip, mode: "replace" })
  → Applies atomically

The manifest tracks original state:

{
  "manifestId": "rtp_2024-01-15T10-30-00Z_a1b2c3d4",
  "entries": [
    { "path": "src/auth/login.cs", "sha256": "abc123...", "size": 2048 },
    { "path": "src/auth/logout.cs", "sha256": "def456...", "size": 1024 }
  ]
}

Conflict detection on commit:

var currentSha256 = ComputeHash(physicalPath);
if (currentSha256 != manifestEntry.Sha256)
    conflicts.Add($"File modified externally: {virtualPath}");
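The same manifest-plus-hash-check idea can be sketched end to end in TypeScript using Node's crypto module; in-memory strings stand in for files on disk, and character counts stand in for byte sizes:

```typescript
import { createHash } from "node:crypto";

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

interface ManifestEntry {
  path: string;
  sha256: string;
  size: number;
}

// roundtrip_start: snapshot each file's hash so we know the original state.
function buildManifest(files: Record<string, string>): ManifestEntry[] {
  return Object.entries(files).map(([path, content]) => ({
    path,
    sha256: sha256(content),
    size: content.length, // character count as a stand-in for byte size
  }));
}

// roundtrip_commit: re-hash current contents and flag anything that
// changed out from under the manifest since roundtrip_start.
function detectConflicts(
  manifest: ManifestEntry[],
  current: Record<string, string>,
): string[] {
  return manifest
    .filter((e) => sha256(current[e.path] ?? "") !== e.sha256)
    .map((e) => `File modified externally: ${e.path}`);
}
```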

Commit modes:

Mode Existing New Use Case
replace Overwrite Create Full sync
add_only Skip Create Safe scaffolding
update_only Overwrite Skip In-place edits