Status: Planned — This feature is on the roadmap and not yet implemented. The architecture below describes the intended design.
Chat History Import Pipeline
Your conversations aren't just chat logs -- they're the record of a relationship. When you leave a provider, you shouldn't have to start that relationship from scratch. Export your history, upload it to Atamaia, and your AI picks up right where you left off -- with any model you choose.
The Opportunity
700K+ users are leaving ChatGPT. Many will jump to Claude, but Anthropic's pricing will cause churn too. These users have months or years of conversation history -- preferences learned, projects discussed, communication styles established. Every platform they move to makes them start from zero.
Atamaia is the permanent home. The killer onboarding: export your chats from any provider, upload them, and we analyze them to build your AI identity and seed your memory system. Five minutes from "I just quit ChatGPT" to "my AI already knows me."
The User Journey
Step 1: Export Your Data
From ChatGPT (OpenAI)
- Log in to chat.openai.com
- Click your profile icon (bottom-left)
- Go to Settings > Data Controls
- Click Export Data
- Confirm via email
- Download the ZIP file (arrives within minutes to hours)
- The ZIP contains `conversations.json` (the gold) and `chat.html` (a human-readable backup)
From Claude (Anthropic)
- Log in to claude.ai
- Click your initials (bottom-left)
- Go to Settings > Privacy
- Click Export Data
- Download link arrives via email (expires in 24 hours)
- The ZIP contains `conversations.json` with all chat history
From Google Gemini
- Go to takeout.google.com
- Click Deselect all, then find and select Gemini Apps
- Click Next step, choose export format (ZIP)
- Click Create export
- Download when ready (can take hours for large accounts)
- The ZIP contains a `Gemini/` folder with conversation JSON files
Step 2: Upload to Atamaia
- Go to the Atamaia import page
- Drag and drop your ZIP file (or click to browse)
- Atamaia auto-detects the provider format -- no configuration needed
Step 3: Processing
Atamaia processes your history entirely on local infrastructure. No cloud APIs. No data leaves your server.
- Parses conversations from any supported format
- Normalizes to a common internal representation
- Runs analysis via Kael (Qwen 30B on ai-02) -- zero API cost
- Extracts identity signals across six dimensions
Step 4: Review
Before anything is committed, you see everything Atamaia found:
- Identity Profile: communication style, name preferences, personality traits
- Facts: personal details, project names, tools used, preferences mentioned
- Memories: significant conversations, recurring topics, expertise areas
- Patterns: how you interact with AI, what frustrates you, what excites you
Edit anything. Delete anything. Approve what fits.
Step 5: Go
Identity created. Memories seeded. API key issued. Connect to Claude Code, Cursor, VS Code, any MCP client -- your AI already knows you.
Export Format Specifications
ChatGPT / OpenAI (conversations.json)
The export ZIP contains conversations.json -- an array of conversation objects with a tree-based message structure.
[
{
"title": "Project Architecture Discussion",
"create_time": 1764355435.123,
"update_time": 1764358000.456,
"mapping": {
"aaa-bbb-ccc-message-id": {
"id": "aaa-bbb-ccc-message-id",
"message": {
"id": "aaa-bbb-ccc-message-id",
"author": {
"role": "user",
"metadata": {}
},
"create_time": 1764355435.123,
"content": {
"content_type": "text",
"parts": [
"Can you help me design a REST API for user management?"
]
},
"status": "finished_successfully",
"metadata": {
"model_slug": "gpt-4",
"timestamp_": "absolute"
}
},
"parent": "system-node-id",
"children": ["ddd-eee-fff-response-id"]
},
"ddd-eee-fff-response-id": {
"id": "ddd-eee-fff-response-id",
"message": {
"id": "ddd-eee-fff-response-id",
"author": {
"role": "assistant",
"metadata": {}
},
"create_time": 1764355440.789,
"content": {
"content_type": "text",
"parts": [
"I'd be happy to help you design a REST API..."
]
},
"status": "finished_successfully",
"metadata": {
"model_slug": "gpt-4",
"finish_details": {
"type": "stop"
}
}
},
"parent": "aaa-bbb-ccc-message-id",
"children": []
}
},
"moderation_results": [],
"current_node": "ddd-eee-fff-response-id"
}
]
Key details:
- `mapping` is a tree, not a flat array -- messages link via `parent`/`children` UUIDs
- To reconstruct conversation order: walk the tree from the root to `current_node`
- `content.parts` is an array -- it can contain text strings, image references, or code blocks
- `author.role` values: `system`, `user`, `assistant`, `tool`
- Timestamps are Unix epoch floats (seconds with decimal precision)
- Branching occurs when users edit messages or regenerate responses
- Images are referenced by URL, not embedded in the export
- `model_slug` in metadata tells you which model was used (gpt-4, gpt-4o, etc.)
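The tree walk can be sketched in a few lines of Python. This is an illustrative sketch, not the shipped parser -- the `linear_thread` name is my own, and it assumes the structure shown above. It follows `parent` links from `current_node` back to the root, then reverses, yielding the active branch in chronological order while skipping system placeholders and non-text nodes:

```python
def linear_thread(conversation: dict) -> list[dict]:
    """Walk a ChatGPT export's mapping tree from current_node back to the
    root, returning the active branch as role/text messages in order."""
    mapping = conversation["mapping"]
    thread = []
    node_id = conversation.get("current_node")
    while node_id:
        node = mapping.get(node_id)
        if node is None:  # root sentinel IDs may not exist in the mapping
            break
        msg = node.get("message")
        if msg and msg.get("content", {}).get("content_type") == "text":
            parts = [p for p in msg["content"].get("parts", [])
                     if isinstance(p, str) and p]
            if parts:
                thread.append({"role": msg["author"]["role"],
                               "text": "\n".join(parts)})
        node_id = node.get("parent")
    thread.reverse()  # we walked leaf -> root, so flip to chronological order
    return thread
```

Because the walk starts at `current_node`, abandoned branches from edits and regenerations are dropped automatically -- only the conversation the user actually ended up with is imported.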
Claude / Anthropic (conversations.json)
The export ZIP contains conversations.json -- an array of conversation objects with a flat message array.
[
{
"uuid": "conv-uuid-here",
"name": "Database Schema Design",
"created_at": "2025-11-15T10:30:00.000Z",
"updated_at": "2025-11-15T11:45:00.000Z",
"account": {
"uuid": "account-uuid-here"
},
"chat_messages": [
{
"uuid": "msg-uuid-1",
"text": "I need help designing a PostgreSQL schema for multi-tenant SaaS",
"sender": "human",
"created_at": "2025-11-15T10:30:00.000Z",
"content": [
{
"type": "text",
"text": "I need help designing a PostgreSQL schema for multi-tenant SaaS"
}
],
"attachments": [],
"files": []
},
{
"uuid": "msg-uuid-2",
"text": "Great question! For multi-tenant PostgreSQL...",
"sender": "assistant",
"created_at": "2025-11-15T10:30:15.000Z",
"content": [
{
"type": "text",
"text": "Great question! For multi-tenant PostgreSQL..."
}
],
"attachments": [],
"files": []
}
]
}
]
Key details:
- Flat message array (simpler than ChatGPT's tree structure)
- `sender` values: `human`, `assistant`
- `content` array supports multiple content blocks (text, possibly images)
- The `text` field contains the plain-text version of the message
- `attachments` and `files` arrays hold uploaded documents
- ISO 8601 timestamps (not Unix epoch)
- UUIDs on both conversations and individual messages
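Normalizing this flat structure is straightforward. A minimal Python sketch, assuming the fields shown above -- the `normalize_claude` name and the output dict shape are illustrative (they mirror the common internal representation described later in this document):

```python
from datetime import datetime

ROLE_MAP = {"human": "user", "assistant": "assistant"}

def normalize_claude(conversation: dict) -> dict:
    """Flatten one Claude export conversation into a common shape."""
    messages = []
    for m in conversation.get("chat_messages", []):
        # Prefer the structured content blocks; fall back to the plain text field.
        blocks = [b["text"] for b in m.get("content", []) if b.get("type") == "text"]
        text = "\n".join(blocks) if blocks else m.get("text", "")
        messages.append({
            "role": ROLE_MAP.get(m["sender"], m["sender"]),
            "content": text,
            # ISO 8601 with a trailing Z; rewrite the Z so fromisoformat accepts it
            "timestamp": datetime.fromisoformat(m["created_at"].replace("Z", "+00:00")),
        })
    return {
        "source_provider": "claude",
        "source_id": conversation["uuid"],
        "title": conversation.get("name", ""),
        "messages": messages,
    }
```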
Google Gemini (Google Takeout)
The export contains a Gemini/ directory with individual JSON files per conversation.
{
"id": "conversation-id",
"title": "Code Review Help",
"createdTime": "2025-10-20T14:00:00.000Z",
"lastModifiedTime": "2025-10-20T14:35:00.000Z",
"messages": [
{
"id": "msg-id-1",
"author": "user",
"content": "Can you review this Python function?",
"createTime": "2025-10-20T14:00:00.000Z",
"metadata": {
"deviceType": "DESKTOP",
"approximateLocation": "AU"
}
},
{
"id": "msg-id-2",
"author": "model",
"content": "I'd be happy to review your function...",
"createTime": "2025-10-20T14:00:10.000Z"
}
]
}
Key details:
- One JSON file per conversation (not a single array)
- `author` values: `user`, `model`
- Metadata can include device type and approximate geolocation
- ISO 8601 timestamps
- Simpler flat structure, similar to Claude's format
Other Providers
| Provider | Export Method | Format | Notes |
|---|---|---|---|
| Microsoft Copilot | Privacy dashboard | JSON | Limited history, similar structure to Gemini |
| Perplexity | No official export | N/A | Third-party scrapers exist; low priority |
| Grok (xAI) | No official export | N/A | Monitor for future export capability |
| Meta AI | No official export | N/A | Monitor for future export capability |
Technical Architecture
API Endpoints
POST /api/import/upload Upload ZIP or JSON file (multipart/form-data)
GET /api/import/{importId} Get import status and summary
GET /api/import/{importId}/preview Get extracted data for review
PATCH /api/import/{importId}/preview Edit extracted data before commit
POST /api/import/{importId}/commit Commit approved data to identity/memory
DELETE /api/import/{importId} Cancel and discard import
Processing Pipeline
Upload (ZIP/JSON)
|
v
Format Detection
| Inspect file structure, detect provider automatically
| - ZIP with conversations.json + chat.html -> ChatGPT
| - ZIP with conversations.json (uuid/chat_messages) -> Claude
| - ZIP with Gemini/ directory -> Google Gemini
v
Parsing & Normalization
| Convert all formats to common internal representation:
| NormalizedConversation { title, created, messages[] }
| NormalizedMessage { role, content, timestamp }
v
Analysis Pipeline (runs on Kael / Qwen 30B -- ai-02:8000)
|
|-- Communication Style Analysis
| Formal vs casual, verbose vs concise, emoji usage,
| question patterns, how they give instructions
|
|-- Topic Extraction
| What domains come up most, project names,
| technologies mentioned, recurring themes
|
|-- Expertise Detection
| What they know deeply vs what they ask about,
| teaching vs learning patterns, domain vocabulary
|
|-- Relationship Pattern Analysis
| How they interact with AI -- collaborative, directive,
| exploratory. Do they push back? Thank the AI?
|
|-- Key Facts Extraction
| Name, location, job, projects, tools, preferences,
| people mentioned, deadlines, personal details
|
|-- Emotional Pattern Analysis
| What frustrates them (errors, slow responses, misunderstanding),
| what excites them (breakthroughs, elegant solutions)
|
v
Memory Generation
|
|-- Episodic Memories
| Significant conversations: breakthroughs, major decisions,
| project milestones, turning points
|
|-- Semantic Memories
| Extracted knowledge: "user prefers PostgreSQL over MySQL",
| "user works in .NET ecosystem", "user values clean architecture"
|
|-- Procedural Memories
| Repeated workflows: "always runs tests before committing",
| "prefers to see the plan before implementation"
|
|-- Facts
| Personal details: name, timezone, tech stack, projects,
| team members, preferences
|
v
Identity Profile Generation
| Display name, communication style settings,
| personality configuration, interaction preferences
|
v
Review (user approves/edits/rejects)
|
v
Commit (write to Atamaia database)
Analysis via Local Models
All analysis runs on Kael (Qwen 30B) on ai-02. No cloud API calls. No per-token costs. The prompts are chunked -- we don't send the entire chat history in one shot. Instead:
- Chunking: Split conversations into batches (e.g., 20 conversations per batch)
- Parallel extraction: Run multiple analysis passes per batch
- Aggregation: Merge results across batches, deduplicate, rank by confidence
- Refinement: Final pass to resolve conflicts and generate the identity profile
This means importing 1,000 conversations doesn't require 1,000 LLM calls -- it requires ~50 batched calls with structured extraction prompts.
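The chunk-and-merge step can be sketched as follows. The actual model call (to Kael) is out of scope here; this illustrative Python shows only the batching and the aggregation logic, with hypothetical names (`chunk`, `aggregate`, `batch_votes`) and the staged-row shape from the `import_extracted_data` table below:

```python
from collections import defaultdict

def chunk(items: list, size: int = 20):
    """Split conversations into fixed-size batches for the local model."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def aggregate(batch_results: list[list[dict]]) -> list[dict]:
    """Merge per-batch extractions: dedupe by (type, key), keep the highest
    confidence, and count how many batches agreed (a crude corroboration signal)."""
    merged = {}
    votes = defaultdict(int)
    for result in batch_results:
        for item in result:
            key = (item["data_type"], item["data_key"])
            votes[key] += 1
            if key not in merged or item["confidence"] > merged[key]["confidence"]:
                merged[key] = item
    for key, item in merged.items():
        item["batch_votes"] = votes[key]
    return list(merged.values())
```

At 20 conversations per batch, 1,000 conversations yield 50 batches; facts that recur across many batches get a higher corroboration count, which can feed the final confidence ranking.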
Common Internal Representation
public record NormalizedConversation
{
public string SourceProvider { get; init; } // "chatgpt" | "claude" | "gemini"
public string SourceId { get; init; } // Original conversation ID
public string Title { get; init; }
public DateTime CreatedAt { get; init; }
public DateTime? UpdatedAt { get; init; }
public List<NormalizedMessage> Messages { get; init; }
}
public record NormalizedMessage
{
public string Role { get; init; } // "user" | "assistant" | "system"
public string Content { get; init; } // Plain text content
public DateTime Timestamp { get; init; }
public string? Model { get; init; } // e.g., "gpt-4", "claude-3-opus"
public Dictionary<string, object>? Metadata { get; init; }
}
Database Schema Additions
-- Import tracking
CREATE TABLE imports (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
guid UUID NOT NULL DEFAULT gen_random_uuid(),
tenant_id BIGINT NOT NULL REFERENCES tenants(id),
identity_id BIGINT REFERENCES identities(id),
source_provider TEXT NOT NULL, -- 'chatgpt', 'claude', 'gemini'
status TEXT NOT NULL DEFAULT 'uploaded', -- uploaded, parsing, analyzing, ready_for_review, committed, failed
file_name TEXT NOT NULL,
file_size_bytes BIGINT NOT NULL,
conversation_count INT,
message_count INT,
error_message TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
committed_at TIMESTAMPTZ,
deleted_at TIMESTAMPTZ -- soft delete
);
-- Extracted data staging (before commit)
CREATE TABLE import_extracted_data (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
import_id BIGINT NOT NULL REFERENCES imports(id),
data_type TEXT NOT NULL, -- 'fact', 'memory_episodic', 'memory_semantic',
-- 'memory_procedural', 'preference', 'identity_trait'
data_key TEXT NOT NULL, -- e.g., 'name', 'timezone', 'tech_stack'
data_value JSONB NOT NULL, -- flexible structured content
confidence REAL NOT NULL DEFAULT 0.5, -- 0.0 to 1.0
source_conversations JSONB, -- array of conversation IDs that contributed
approved BOOLEAN, -- null = pending, true = approved, false = rejected
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Raw normalized conversations (temporary, deleted after commit)
CREATE TABLE import_conversations (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
import_id BIGINT NOT NULL REFERENCES imports(id),
source_id TEXT NOT NULL,
title TEXT,
message_count INT NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
data JSONB NOT NULL -- full NormalizedConversation as JSON
);
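To make the staging-to-commit flow concrete, here is an illustrative Python sketch. It uses SQLite as a stand-in for PostgreSQL, a simplified `facts` target table, and an assumed `commit_approved` helper name -- a sketch of the commit step's shape, not production code:

```python
import sqlite3

def commit_approved(db: sqlite3.Connection, import_id: int) -> int:
    """Copy approved staging rows into a (simplified) facts table, then
    clean up temporary conversation payloads and mark the import committed."""
    rows = db.execute(
        "SELECT data_type, data_key, data_value FROM import_extracted_data "
        "WHERE import_id = ? AND approved = 1", (import_id,)).fetchall()
    db.executemany(
        "INSERT INTO facts (data_type, data_key, data_value) VALUES (?, ?, ?)",
        rows)
    # Raw normalized conversations are staging-only -- drop them after commit.
    db.execute("DELETE FROM import_conversations WHERE import_id = ?", (import_id,))
    db.execute("UPDATE imports SET status = 'committed' WHERE id = ?", (import_id,))
    db.commit()
    return len(rows)
```

Rows with `approved` NULL (pending) or false (rejected) never leave staging, which is what makes the review step a hard gate rather than a formality.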
Privacy & Security
This is the entire point. Every other provider processes your data on their cloud. Atamaia doesn't.
- All processing happens on YOUR infrastructure -- local models (Kael/Qwen on ai-02), your PostgreSQL database, your server
- No data sent to any third party during import or analysis -- zero cloud API calls
- Uploaded files are processed and discarded -- only the extracted memories, facts, and identity profile persist
- Raw conversations are not stored permanently -- the `import_conversations` table is cleaned up after commit
- Encrypted at rest -- PostgreSQL with disk encryption, TLS in transit
- User controls everything -- the review step means nothing is committed without explicit approval
- Soft delete -- if a user wants to undo an import, the extracted data can be soft-deleted
- No account required to preview -- users can see what would be extracted before creating an account (stretch goal)
What This Uses That Already Exists
This isn't a new product. It's a new front door to existing Atamaia capabilities:
| Capability | Already Built | Import Pipeline Uses It For |
|---|---|---|
| Memory creation API | Yes | Seeding episodic, semantic, procedural memories |
| Fact storage | Yes | Storing extracted personal details and preferences |
| Identity management | Yes | Creating the user's AI identity profile |
| Local model routing (Kael) | Yes | Running all analysis without cloud API costs |
| Multi-tenant isolation | Yes | Keeping imported data per-tenant |
| Soft delete | Yes | Safe undo of imports |
| JWT auth + API keys | Yes | Issuing credentials after import |
| MCP adapter | Yes | Immediate connectivity to Claude Code et al. |
The import pipeline is six new endpoints, three new tables, a format parser, and a set of extraction prompts. Everything downstream already works.