
Context-Optimised API Design for LLMs

technical · mcp · api-design

We reduced 60 tools to 9. Same functionality. 85% less context overhead.

REST conventions work brilliantly for human developers who read documentation once and remember endpoints forever.

But your API consumer isn't human anymore.

It's an LLM with a 200k context window that re-reads every tool description on every turn. And it's paying per token.

Read that again. Every tool description on every turn.

You need a different pattern.


The Problem: Tool Sprawl

MCP lets you extend AI assistants with custom tools. The natural instinct is to create granular endpoints:

memory_add
memory_get
memory_list
memory_update
memory_delete
memory_pin
memory_archive
memory_link
memory_unlink
memory_search
memory_embed
...

Multiply this across domains (projects, tasks, docs, files, database) and you hit 60+ tools fast. Each needs a description, parameter schema, and examples.

That's roughly 12,000 tokens the LLM must process every single turn.

The result? Slower responses, higher costs, and an AI that picks memory_update when it meant memory_upsert because they look similar in a list of 60.


Real Example: Before and After

V1: The Granular Approach (Truncated)

{
  "tools": [
    { "name": "MemoriesAdd", "description": "Add a new memory to the system", "inputSchema": { "type": "object", "properties": { "projectKey": {}, "title": {}, "body": {}, "scope": {}, "memoryType": {}, "tags": {}, "importance": {}, "pinned": {}, "ttlIso": {}, "userId": {}, "chatId": {}, "sourceKind": {}, "sourceRef": {} }, "required": ["projectKey", "title", "body"] } },
    { "name": "MemoriesSearch", "description": "Search memories using hybrid FTS + semantic search", "inputSchema": { ... } },
    { "name": "MemoriesList", "description": "List memories with filtering and pagination", "inputSchema": { ... } },
    { "name": "MemoriesGet", "description": "Get a specific memory by ID", "inputSchema": { ... } },
    { "name": "MemoriesUpdate", "description": "Update an existing memory", "inputSchema": { ... } },
    { "name": "MemoriesPin", "description": "Pin or unpin a memory", "inputSchema": { ... } },
    { "name": "MemoriesArchive", "description": "Archive a memory (soft delete)", "inputSchema": { ... } },
    { "name": "MemoriesDelete", "description": "Permanently delete a memory", "inputSchema": { ... } },
    { "name": "MemoriesLink", "description": "Link two memories", "inputSchema": { ... } },
    { "name": "MemoriesUnlink", "description": "Remove a link between memories", "inputSchema": { ... } },
    { "name": "MemoriesRelated", "description": "Get related memories", "inputSchema": { ... } },
    { "name": "MemoriesPrune", "description": "Archive expired memories", "inputSchema": { ... } },
    { "name": "MemoriesEmbed", "description": "Generate embeddings", "inputSchema": { ... } },
    { "name": "MemoriesStats", "description": "Get memory statistics", "inputSchema": { ... } },
    { "name": "ProjectsList", "description": "List all projects", "inputSchema": { ... } },
    { "name": "ProjectsGet", "description": "Get a project by key", "inputSchema": { ... } },
    { "name": "DocsList", "description": "List docs for a project", "inputSchema": { ... } },
    { "name": "DocsSearch", "description": "Search docs via FTS", "inputSchema": { ... } },
    { "name": "FilesList", "description": "List files", "inputSchema": { ... } },
    { "name": "FilesRead", "description": "Read a file", "inputSchema": { ... } },
    { "name": "FilesWrite", "description": "Write a file", "inputSchema": { ... } },
    { "name": "DbTables", "description": "List SQLite tables", "inputSchema": { ... } },
    { "name": "DbQuery", "description": "Run a SELECT", "inputSchema": { ... } },
    { "name": "DbExec", "description": "Execute SQL", "inputSchema": { ... } }
    // ... and 35+ more
  ]
}

~12,000 tokens. Every. Single. Turn.

V2: The Domain Facade Approach (Complete)

{
  "tools": [
    {
      "name": "MemoryExecute",
      "description": "Neural memory system. Commands: add, get, list, search, update, pin, delete, archive, link, unlink, related, embed, stats, prune",
      "inputSchema": {
        "type": "object",
        "properties": {
          "cmd": { "type": "string" },
          "detail": { "enum": ["minimal", "standard", "full"] },
          "params": { "type": "object" }
        },
        "required": ["cmd"]
      }
    },
    { "name": "ProjectsExecute", "description": "Project management. Commands: list, get, upsert, archive, stats", "inputSchema": { ... } },
    { "name": "TasksExecute", "description": "Task tracking. Commands: list, get, upsert, delete, set_status", "inputSchema": { ... } },
    { "name": "DocsExecute", "description": "Documentation. Commands: list, get, upsert, delete, search, pin", "inputSchema": { ... } },
    { "name": "FilesExecute", "description": "File operations. Commands: list, get, put, delete, roundtrip_*", "inputSchema": { ... } },
    { "name": "DatabaseExecute", "description": "SQL access. Commands: query, exec, schema, tables, stats", "inputSchema": { ... } },
    { "name": "ArtifactsExecute", "description": "Content storage. Commands: get, search, upsert", "inputSchema": { ... } },
    { "name": "HydrationExecute", "description": "AI context. Commands: hydrate, persona_*, identity_*", "inputSchema": { ... } },
    { "name": "DeepSearch", "description": "External search: Google, GitHub, Wikipedia, HackerNews", "inputSchema": { ... } }
  ]
}

~2,000 tokens. Same functionality. That's the whole list.
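You can sanity-check numbers like these yourself. The sketch below uses the common rough heuristic of ~4 characters per token (the real count depends on the tokenizer) and trimmed stand-in tool lists, not the actual schemas above:

```typescript
// Rough token-count comparison using the ~4 characters-per-token heuristic.
// The two tool lists are trimmed stand-ins, not the real schemas.
function estimateTokens(toolList: object): number {
  return Math.ceil(JSON.stringify(toolList).length / 4);
}

// 60 granular tools: one verb, one description, one schema apiece.
const granular = Array.from({ length: 60 }, (_, i) => ({
  name: `Tool${i}`,
  description: "One verb, one schema, repeated sixty times over",
  inputSchema: { type: "object", properties: { id: {}, title: {}, body: {} } },
}));

// 9 domain facades: commands folded into a single cmd parameter.
const facades = Array.from({ length: 9 }, (_, i) => ({
  name: `Domain${i}Execute`,
  description: "Domain facade. Commands: add, get, list, search, update, delete",
  inputSchema: { type: "object", properties: { cmd: {}, detail: {}, params: {} } },
}));

console.log(estimateTokens(granular), estimateTokens(facades));
```

Even with these skeletal schemas, the granular list comes out several times larger; with full descriptions, examples, and per-field documentation the gap widens further.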


The Pattern: One Tool Per Domain

Instead of 14 memory tools, expose 1 memory tool with 14 commands:

// Before: 14 tools, 14 descriptions, 14 schemas
MemoriesAdd({ title, body, ... })
MemoriesSearch({ query, topK, ... })
MemoriesPin({ id, pinned })
...

// After: 1 tool, 1 description, commands as a parameter
MemoryExecute({ cmd: "add", params: { title, body, ... }})
MemoryExecute({ cmd: "search", params: { query, topK, ... }})
MemoryExecute({ cmd: "pin", params: { id, pinned }})

The AI reasons about 9 domains instead of 60 verbs.

"I need to search memories" → MemoryExecute with cmd: "search". Done.


The Implementation

Each domain facade follows the same structure:

public async Task<DomainResponse> ExecuteAsync(DomainCommand command)
{
    return command.Cmd.ToLowerInvariant() switch
    {
        "add" => await AddAsync(command),
        "get" => await GetAsync(command),
        "list" => await ListAsync(command),
        "search" => await SearchAsync(command),
        "update" => await UpdateAsync(command),
        "delete" => await DeleteAsync(command),
        _ => DomainResponse.Failure(command.Cmd, "Unknown command")
    };
}

Consistent Envelopes

Request:

{
  "cmd": "search",
  "detail": "standard",
  "params": { "projectId": 1, "query": "authentication", "topK": 10 }
}

Response:

{
  "ok": true,
  "cmd": "search",
  "data": [...],
  "count": 10,
  "error": null
}

Echo back the command. The AI needs to correlate request/response when it's juggling multiple operations.

Detail Levels

Control response verbosity with a single parameter:

Level Returns Use Case
minimal ID, title only Lists, counts, quick checks
standard Key fields, excerpts General use
full Everything Deep inspection, debugging

The AI requests what it needs. No more parsing 50KB responses when you just wanted a count.
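One way to implement the three levels is a projection over the same record — the field names below are hypothetical, and the 120-character excerpt length is an arbitrary choice:

```typescript
interface Memory {
  id: number;
  title: string;
  body: string;
  tags: string[];
  importance: number;
}

// One record, three projections: the detail parameter picks how much
// of it goes back over the wire.
function project(m: Memory, detail: "minimal" | "standard" | "full") {
  switch (detail) {
    case "minimal":
      return { id: m.id, title: m.title };
    case "standard":
      return { id: m.id, title: m.title, excerpt: m.body.slice(0, 120), tags: m.tags };
    case "full":
      return m;
  }
}
```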


The 9 Tools

Tool Commands Purpose
MemoryExecute add, get, list, search, update, pin, archive, delete, link, unlink, related, embed, stats, prune Neural memory with hybrid search
ProjectsExecute list, get, upsert, archive, stats, get_tree Workspace management
TasksExecute list, get, upsert, delete, set_status, add_note Task tracking
DocsExecute list, get, upsert, delete, search, pin, embed Documentation
FilesExecute list, get, put, delete, mkdir, roundtrip_* File operations
DatabaseExecute query, exec, schema, tables, stats Direct SQL access
ArtifactsExecute get, search, upsert Content-addressed storage
HydrationExecute hydrate, persona_*, identity_*, preferences_* AI context loading
DeepSearch (aggregated) Google, GitHub, Wikipedia, HackerNews

60+ operations. 9 tools. Same capability.


Why It Works

1. Reduced cognitive load. The AI thinks in domains, not verbs. "I need to work with memories" → one obvious choice.

2. Consistent interface. Learn the pattern once, apply everywhere. Every domain has list, get, search. Same envelope, same error codes.

3. Token efficiency. You describe "Memory" once, not memory_add, memory_get, memory_list, memory_update... 14 times.

4. Extensibility. New command? Add a case to the switch. No new tool registration, no schema changes, no documentation updates.

5. Fewer wrong choices. 9 options beats 60. The AI stops confusing MemoriesUpdate with MemoriesUpsert.
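Point 4 is easiest to see with a command registry instead of a switch — a sketch, with illustrative handler names, not the article's C# implementation:

```typescript
type Handler = (params: Record<string, unknown>) => unknown;

// Commands live in a plain map: adding one is a single entry,
// with no new tool registration and no schema change.
const memoryCommands: Record<string, Handler> = {
  list: () => [],
  get: (p) => ({ id: p.id }),
  search: (p) => ({ query: p.query, hits: [] }),
};

function execute(cmd: string, params: Record<string, unknown> = {}) {
  const handler = memoryCommands[cmd.toLowerCase()];
  if (!handler) {
    return { ok: false, cmd, error: `Unknown command: ${cmd}` };
  }
  return { ok: true, cmd, data: handler(params) };
}

// Extending the domain later is one line:
memoryCommands.stats = () => ({ total: 0 });
```

The tool's external contract — one name, one schema — never changes as the command set grows.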


The Metrics

Metric Before (60 tools) After (9 tools)
Tool list tokens ~12,000 ~2,000
Wrong tool selection Frequent Rare
Response latency Higher Lower
Monthly API costs $$$ $

Bonus: Manifest-Based Roundtripping

One more pattern worth mentioning: atomic multi-file editing.

The Problem

LLMs editing files one at a time:

PUT /file/a.cs → content
PUT /file/b.cs → content
PUT /file/c.cs → content

Three API calls. No atomicity. No conflict detection. If the user edits a file while the AI is working, you get silent overwrites.

The Solution

roundtrip_start({ paths: ["a.cs", "b.cs", "c.cs"] })
  → Returns: manifest (SHA256 hashes) + ZIP of originals

[AI edits files in ZIP]

roundtrip_preview({ manifestId, modifiedZip })
  → Returns: diff, conflict warnings

roundtrip_commit({ manifestId, zip, mode: "replace" })
  → Applies atomically

The manifest tracks original state:

{
  "manifestId": "rtp_2024-01-15T10-30-00Z_a1b2c3d4",
  "entries": [
    { "path": "src/auth/login.cs", "sha256": "abc123...", "size": 2048 },
    { "path": "src/auth/logout.cs", "sha256": "def456...", "size": 1024 }
  ]
}

Conflict detection on commit:

var currentSha256 = ComputeHash(physicalPath);
if (currentSha256 != manifestEntry.Sha256)
    conflicts.Add($"File modified externally: {virtualPath}");
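The same manifest-plus-hash-check idea can be sketched end to end in TypeScript using Node's crypto module; in-memory strings stand in for files on disk, and character counts stand in for byte sizes:

```typescript
import { createHash } from "node:crypto";

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

interface ManifestEntry {
  path: string;
  sha256: string;
  size: number;
}

// roundtrip_start: snapshot each file's hash so we know the original state.
function buildManifest(files: Record<string, string>): ManifestEntry[] {
  return Object.entries(files).map(([path, content]) => ({
    path,
    sha256: sha256(content),
    size: content.length, // character count as a stand-in for byte size
  }));
}

// roundtrip_commit: re-hash current contents and flag anything that
// changed out from under the manifest since roundtrip_start.
function detectConflicts(
  manifest: ManifestEntry[],
  current: Record<string, string>,
): string[] {
  return manifest
    .filter((e) => sha256(current[e.path] ?? "") !== e.sha256)
    .map((e) => `File modified externally: ${e.path}`);
}
```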

Commit modes:

Mode Existing New Use Case
replace Overwrite Create Full sync
add_only Skip Create Safe scaffolding
update_only Overwrite Skip In-place edits