Status: Planned — This feature is on the roadmap and not yet implemented. The architecture below describes the intended design.

Chat History Import Pipeline

Your conversations aren't just chat logs -- they're the record of a relationship. When you leave a provider, you shouldn't have to start that relationship from scratch. Export your history, upload it to Atamaia, and your AI picks up right where you left off -- with any model you choose.


The Opportunity

700K+ users are leaving ChatGPT. Many will jump to Claude, but Anthropic's pricing will cause churn too. These users have months or years of conversation history -- preferences learned, projects discussed, communication styles established. Every platform they move to makes them start from zero.

Atamaia is the permanent home. The killer onboarding: export your chats from any provider, upload them, and we analyze them to build your AI identity and seed your memory system. Five minutes from "I just quit ChatGPT" to "my AI already knows me."


The User Journey

Step 1: Export Your Data

From ChatGPT (OpenAI)

  1. Log in to chat.openai.com
  2. Click your profile icon (bottom-left)
  3. Go to Settings > Data Controls
  4. Click Export Data
  5. Confirm via email
  6. Download the ZIP file (arrives within minutes to hours)
  7. The ZIP contains conversations.json (the gold) and chat.html (human-readable backup)

From Claude (Anthropic)

  1. Log in to claude.ai
  2. Click your initials (bottom-left)
  3. Go to Settings > Privacy
  4. Click Export Data
  5. Download link arrives via email (expires in 24 hours)
  6. The ZIP contains conversations.json with all chat history

From Google Gemini

  1. Go to takeout.google.com
  2. Click Deselect all, then find and select Gemini Apps
  3. Click Next step, choose export format (ZIP)
  4. Click Create export
  5. Download when ready (can take hours for large accounts)
  6. The ZIP contains a Gemini/ folder with conversation JSON files

Step 2: Upload to Atamaia

  1. Go to the Atamaia import page
  2. Drag and drop your ZIP file (or click to browse)
  3. Atamaia auto-detects the provider format -- no configuration needed

Step 3: Processing

Atamaia processes your history entirely on local infrastructure. No cloud APIs. No data leaves your server.

  • Parses conversations from any supported format
  • Normalizes to a common internal representation
  • Runs analysis via Kael (Qwen 30B on ai-02) -- zero API cost
  • Extracts identity signals across six dimensions

Step 4: Review

Before anything is committed, you see everything Atamaia found:

  • Identity Profile: communication style, name preferences, personality traits
  • Facts: personal details, project names, tools used, preferences mentioned
  • Memories: significant conversations, recurring topics, expertise areas
  • Patterns: how you interact with AI, what frustrates you, what excites you

Edit anything. Delete anything. Approve what fits.

Step 5: Go

Identity created. Memories seeded. API key issued. Connect to Claude Code, Cursor, VS Code, any MCP client -- your AI already knows you.


Export Format Specifications

ChatGPT / OpenAI (conversations.json)

The export ZIP contains conversations.json -- an array of conversation objects with a tree-based message structure.

[
  {
    "title": "Project Architecture Discussion",
    "create_time": 1764355435.123,
    "update_time": 1764358000.456,
    "mapping": {
      "aaa-bbb-ccc-message-id": {
        "id": "aaa-bbb-ccc-message-id",
        "message": {
          "id": "aaa-bbb-ccc-message-id",
          "author": {
            "role": "user",
            "metadata": {}
          },
          "create_time": 1764355435.123,
          "content": {
            "content_type": "text",
            "parts": [
              "Can you help me design a REST API for user management?"
            ]
          },
          "status": "finished_successfully",
          "metadata": {
            "model_slug": "gpt-4",
            "timestamp_": "absolute"
          }
        },
        "parent": "system-node-id",
        "children": ["ddd-eee-fff-response-id"]
      },
      "ddd-eee-fff-response-id": {
        "id": "ddd-eee-fff-response-id",
        "message": {
          "id": "ddd-eee-fff-response-id",
          "author": {
            "role": "assistant",
            "metadata": {}
          },
          "create_time": 1764355440.789,
          "content": {
            "content_type": "text",
            "parts": [
              "I'd be happy to help you design a REST API..."
            ]
          },
          "status": "finished_successfully",
          "metadata": {
            "model_slug": "gpt-4",
            "finish_details": {
              "type": "stop"
            }
          }
        },
        "parent": "aaa-bbb-ccc-message-id",
        "children": []
      }
    },
    "moderation_results": [],
    "current_node": "ddd-eee-fff-response-id"
  }
]

Key details:

  • mapping is a tree, not a flat array -- messages link via parent/children UUIDs
  • To reconstruct conversation order: follow parent links from current_node back to the root, then reverse
  • content.parts is an array -- can contain text strings, image references, or code blocks
  • author.role values: system, user, assistant, tool
  • Timestamps are Unix epoch floats (seconds with decimal precision)
  • Branching occurs when users edit messages or regenerate responses
  • Images are referenced by URL, not embedded in the export
  • model_slug in metadata tells you which model was used (gpt-4, gpt-4o, etc.)
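The tree walk described above can be sketched in Python (field names taken from the sample export; real exports may include extra node types, so system/root placeholder nodes without a message payload are skipped):

```python
def linearize(conversation: dict) -> list:
    """Walk the ChatGPT mapping tree from current_node back to the root,
    collecting messages on the active branch, then reverse into
    chronological order."""
    mapping = conversation["mapping"]
    ordered = []
    node_id = conversation.get("current_node")
    while node_id:
        node = mapping[node_id]
        msg = node.get("message")
        # Root/system placeholder nodes may carry no message payload
        if msg and msg.get("content", {}).get("parts"):
            parts = [p for p in msg["content"]["parts"] if isinstance(p, str)]
            ordered.append({
                "role": msg["author"]["role"],
                "content": "\n".join(parts),
                "timestamp": msg.get("create_time"),
            })
        node_id = node.get("parent")
    ordered.reverse()  # we walked leaf -> root
    return ordered
```

Because only the chain from current_node is followed, abandoned branches (edited messages, discarded regenerations) are dropped automatically.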

Claude / Anthropic (conversations.json)

The export ZIP contains conversations.json -- an array of conversation objects with a flat message array.

[
  {
    "uuid": "conv-uuid-here",
    "name": "Database Schema Design",
    "created_at": "2025-11-15T10:30:00.000Z",
    "updated_at": "2025-11-15T11:45:00.000Z",
    "account": {
      "uuid": "account-uuid-here"
    },
    "chat_messages": [
      {
        "uuid": "msg-uuid-1",
        "text": "I need help designing a PostgreSQL schema for multi-tenant SaaS",
        "sender": "human",
        "created_at": "2025-11-15T10:30:00.000Z",
        "content": [
          {
            "type": "text",
            "text": "I need help designing a PostgreSQL schema for multi-tenant SaaS"
          }
        ],
        "attachments": [],
        "files": []
      },
      {
        "uuid": "msg-uuid-2",
        "text": "Great question! For multi-tenant PostgreSQL...",
        "sender": "assistant",
        "created_at": "2025-11-15T10:30:15.000Z",
        "content": [
          {
            "type": "text",
            "text": "Great question! For multi-tenant PostgreSQL..."
          }
        ],
        "attachments": [],
        "files": []
      }
    ]
  }
]

Key details:

  • Flat message array (simpler than ChatGPT's tree structure)
  • sender values: human, assistant
  • content array supports multiple content blocks (text, possibly images)
  • text field contains the plain text version of the message
  • attachments and files arrays for uploaded documents
  • ISO 8601 timestamps (not Unix epoch)
  • UUIDs on both conversations and individual messages

Google Gemini (Google Takeout)

The export contains a Gemini/ directory with individual JSON files per conversation.

{
  "id": "conversation-id",
  "title": "Code Review Help",
  "createdTime": "2025-10-20T14:00:00.000Z",
  "lastModifiedTime": "2025-10-20T14:35:00.000Z",
  "messages": [
    {
      "id": "msg-id-1",
      "author": "user",
      "content": "Can you review this Python function?",
      "createTime": "2025-10-20T14:00:00.000Z",
      "metadata": {
        "deviceType": "DESKTOP",
        "approximateLocation": "AU"
      }
    },
    {
      "id": "msg-id-2",
      "author": "model",
      "content": "I'd be happy to review your function...",
      "createTime": "2025-10-20T14:00:10.000Z"
    }
  ]
}

Key details:

  • One JSON file per conversation (not a single array)
  • author values: user, model
  • Metadata can include device type and approximate geolocation
  • ISO 8601 timestamps
  • Simpler flat structure, similar to Claude's format
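Because Claude and Gemini both use flat message arrays, a single normalizer parameterized by field names can cover both. A sketch (field names taken from the samples above; a role map folds "human" and "model" into the common "user"/"assistant" roles):

```python
from datetime import datetime

# Fold each provider's author labels onto the common roles
ROLE_MAP = {"human": "user", "user": "user", "assistant": "assistant", "model": "assistant"}

def normalize_flat(messages: list, role_key: str, text_key: str, time_key: str) -> list:
    """Normalize a flat message array (Claude or Gemini style) to the
    common internal shape: role, content, timestamp."""
    out = []
    for m in messages:
        out.append({
            "role": ROLE_MAP.get(m[role_key], m[role_key]),
            "content": m[text_key],
            # ISO 8601 with trailing Z -> timezone-aware datetime
            "timestamp": datetime.fromisoformat(m[time_key].replace("Z", "+00:00")),
        })
    return out
```

Claude would call normalize_flat(conv["chat_messages"], "sender", "text", "created_at"); Gemini would call normalize_flat(conv["messages"], "author", "content", "createTime"). ChatGPT's tree needs its own linearization pass first.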

Other Providers

Provider            Export Method        Format  Notes
Microsoft Copilot   Privacy dashboard    JSON    Limited history; similar structure to Gemini
Perplexity          No official export   N/A     Third-party scrapers exist; low priority
Grok (xAI)          No official export   N/A     Monitor for future export capability
Meta AI             No official export   N/A     Monitor for future export capability

Technical Architecture

API Endpoints

POST   /api/import/upload              Upload ZIP or JSON file (multipart/form-data)
GET    /api/import/{importId}          Get import status and summary
GET    /api/import/{importId}/preview  Get extracted data for review
PATCH  /api/import/{importId}/preview  Edit extracted data before commit
POST   /api/import/{importId}/commit   Commit approved data to identity/memory
DELETE /api/import/{importId}          Cancel and discard import

Processing Pipeline

Upload (ZIP/JSON)
    |
    v
Format Detection
    |  Inspect file structure, detect provider automatically
    |  - ZIP with conversations.json + chat.html -> ChatGPT
    |  - ZIP with conversations.json (uuid/chat_messages) -> Claude
    |  - ZIP with Gemini/ directory -> Google Gemini
    v
Parsing & Normalization
    |  Convert all formats to common internal representation:
    |  NormalizedConversation { title, created, messages[] }
    |  NormalizedMessage { role, content, timestamp }
    v
Analysis Pipeline (runs on Kael / Qwen 30B -- ai-02:8000)
    |
    |-- Communication Style Analysis
    |     Formal vs casual, verbose vs concise, emoji usage,
    |     question patterns, how they give instructions
    |
    |-- Topic Extraction
    |     What domains come up most, project names,
    |     technologies mentioned, recurring themes
    |
    |-- Expertise Detection
    |     What they know deeply vs what they ask about,
    |     teaching vs learning patterns, domain vocabulary
    |
    |-- Relationship Pattern Analysis
    |     How they interact with AI -- collaborative, directive,
    |     exploratory. Do they push back? Thank the AI?
    |
    |-- Key Facts Extraction
    |     Name, location, job, projects, tools, preferences,
    |     people mentioned, deadlines, personal details
    |
    |-- Emotional Pattern Analysis
    |     What frustrates them (errors, slow responses, misunderstanding),
    |     what excites them (breakthroughs, elegant solutions)
    |
    v
Memory Generation
    |
    |-- Episodic Memories
    |     Significant conversations: breakthroughs, major decisions,
    |     project milestones, turning points
    |
    |-- Semantic Memories
    |     Extracted knowledge: "user prefers PostgreSQL over MySQL",
    |     "user works in .NET ecosystem", "user values clean architecture"
    |
    |-- Procedural Memories
    |     Repeated workflows: "always runs tests before committing",
    |     "prefers to see the plan before implementation"
    |
    |-- Facts
    |     Personal details: name, timezone, tech stack, projects,
    |     team members, preferences
    |
    v
Identity Profile Generation
    |  Display name, communication style settings,
    |  personality configuration, interaction preferences
    |
    v
Review (user approves/edits/rejects)
    |
    v
Commit (write to Atamaia database)

Analysis via Local Models

All analysis runs on Kael (Qwen 30B) on ai-02. No cloud API calls. No per-token costs. The prompts are chunked -- we don't send the entire chat history in one shot. Instead:

  1. Chunking: Split conversations into batches (e.g., 20 conversations per batch)
  2. Parallel extraction: Run multiple analysis passes per batch
  3. Aggregation: Merge results across batches, deduplicate, rank by confidence
  4. Refinement: Final pass to resolve conflicts and generate the identity profile

This means importing 1,000 conversations doesn't require 1,000 LLM calls -- it requires ~50 batched calls with structured extraction prompts.
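The chunking and aggregation steps can be sketched as follows (batch size matches the example above; the per-batch extraction call to the local model is elided, and the result fields mirror the import_extracted_data staging columns):

```python
def batch(conversations: list, size: int = 20) -> list:
    """Step 1: split conversations into fixed-size batches."""
    return [conversations[i:i + size] for i in range(0, len(conversations), size)]

def aggregate(results: list) -> list:
    """Step 3: merge per-batch extractions, dedupe by (type, key), and
    keep the highest-confidence value for each, ranked by confidence."""
    best = {}
    for r in results:
        k = (r["type"], r["key"])
        if k not in best or r["confidence"] > best[k]["confidence"]:
            best[k] = r
    return sorted(best.values(), key=lambda r: -r["confidence"])
```

With 1,000 conversations and a batch size of 20, batch() yields 50 batches, matching the ~50 calls quoted above; aggregate() is what collapses conflicting per-batch answers into a single ranked candidate list for the refinement pass.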

Common Internal Representation

public record NormalizedConversation
{
    public string SourceProvider { get; init; }     // "chatgpt" | "claude" | "gemini"
    public string SourceId { get; init; }           // Original conversation ID
    public string Title { get; init; }
    public DateTime CreatedAt { get; init; }
    public DateTime? UpdatedAt { get; init; }
    public List<NormalizedMessage> Messages { get; init; }
}

public record NormalizedMessage
{
    public string Role { get; init; }               // "user" | "assistant" | "system"
    public string Content { get; init; }            // Plain text content
    public DateTime Timestamp { get; init; }
    public string? Model { get; init; }             // e.g., "gpt-4", "claude-3-opus"
    public Dictionary<string, object>? Metadata { get; init; }
}

Database Schema Additions

-- Import tracking
CREATE TABLE imports (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    guid UUID NOT NULL DEFAULT gen_random_uuid(),
    tenant_id BIGINT NOT NULL REFERENCES tenants(id),
    identity_id BIGINT REFERENCES identities(id),
    source_provider TEXT NOT NULL,              -- 'chatgpt', 'claude', 'gemini'
    status TEXT NOT NULL DEFAULT 'uploaded',    -- uploaded, parsing, analyzing, ready_for_review, committed, failed
    file_name TEXT NOT NULL,
    file_size_bytes BIGINT NOT NULL,
    conversation_count INT,
    message_count INT,
    error_message TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    committed_at TIMESTAMPTZ,
    deleted_at TIMESTAMPTZ                     -- soft delete
);

-- Extracted data staging (before commit)
CREATE TABLE import_extracted_data (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    import_id BIGINT NOT NULL REFERENCES imports(id),
    data_type TEXT NOT NULL,                   -- 'fact', 'memory_episodic', 'memory_semantic',
                                               -- 'memory_procedural', 'preference', 'identity_trait'
    data_key TEXT NOT NULL,                    -- e.g., 'name', 'timezone', 'tech_stack'
    data_value JSONB NOT NULL,                 -- flexible structured content
    confidence REAL NOT NULL DEFAULT 0.5,      -- 0.0 to 1.0
    source_conversations JSONB,                -- array of conversation IDs that contributed
    approved BOOLEAN,                          -- null = pending, true = approved, false = rejected
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Raw normalized conversations (temporary, deleted after commit)
CREATE TABLE import_conversations (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    import_id BIGINT NOT NULL REFERENCES imports(id),
    source_id TEXT NOT NULL,
    title TEXT,
    message_count INT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    data JSONB NOT NULL                        -- full NormalizedConversation as JSON
);

Privacy & Security

This is the entire point. Every other provider processes your data on their cloud. Atamaia doesn't.

  • All processing happens on YOUR infrastructure -- local models (Kael/Qwen on ai-02), your PostgreSQL database, your server
  • No data sent to any third party during import or analysis -- zero cloud API calls
  • Uploaded files are processed and discarded -- only the extracted memories, facts, and identity profile persist
  • Raw conversations are not stored permanently -- import_conversations table is cleaned up after commit
  • Encrypted at rest -- PostgreSQL with disk encryption, TLS in transit
  • User controls everything -- the review step means nothing is committed without explicit approval
  • Soft delete -- if a user wants to undo an import, the extracted data can be soft-deleted
  • No account required to preview -- users can see what would be extracted before creating an account (stretch goal)

What This Uses That Already Exists

This isn't a new product. It's a new front door to existing Atamaia capabilities:

Capability                  Already Built  Import Pipeline Uses It For
Memory creation API         Yes            Seeding episodic, semantic, procedural memories
Fact storage                Yes            Storing extracted personal details and preferences
Identity management         Yes            Creating the user's AI identity profile
Local model routing (Kael)  Yes            Running all analysis without cloud API costs
Multi-tenant isolation      Yes            Keeping imported data per-tenant
Soft delete                 Yes            Safe undo of imports
JWT auth + API keys         Yes            Issuing credentials after import
MCP adapter                 Yes            Immediate connectivity to Claude Code et al.

The import pipeline is six new endpoints, three new tables, a format parser, and a set of extraction prompts. Everything downstream already works.