Batch Operations Guide

Home > Docs > Usage > Batch Operations

This guide explains how to efficiently process multiple messages using EverCore's batch operations.

Overview
Conversation Format
Batch Storage Script
Data Format Specification
Examples
Best Practices
Troubleshooting

Overview

EverCore supports batch processing for efficiently storing multiple messages at once. This is particularly useful for:

Processing historical conversation data
Importing chat logs from other platforms
Group chat conversations with multiple participants
Bulk data migration

Conversation Format

EverCore uses a standardized ConversationFormat for batch operations. This format supports:

Conversation metadata (group info, user details)
Multi-speaker conversations
Timestamps and message IDs

For complete format specifications, see Conversation Format Specification.

Batch Storage Script

Basic Usage

# Store group chat messages (Chinese data)
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/team_chat_zh.json \
  --api-url http://localhost:1995/api/v0/memories \
  --scene team

# Store group chat messages (English data)
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/team_chat_en.json \
  --api-url http://localhost:1995/api/v0/memories \
  --scene team

# Validate file format without storing
uv run python src/bootstrap.py src/run_memorize.py \
  --input data/team_chat_en.json \
  --scene team \
  --validate-only

Script Parameters

Parameter	Required	Description
`--input`	Yes	Path to the conversation data file (JSON format)
`--api-url`	No	API endpoint (default: http://localhost:1995/api/v0/memories)
`--scene`	Yes	Scene type: `solo` or `team`
`--validate-only`	No	Validate format without sending to API

Scene Parameter Explanation

The --scene parameter specifies the memory extraction strategy:

solo - Use for one-on-one conversations with AI assistant
team - Use for multi-person group discussions

Important Note: In your data files, you may see scene values like work, company, or social - these are internal scene descriptors in the data format. The --scene command-line parameter uses different values (solo/team) to specify which extraction pipeline to apply.

Data Format Specification

ConversationFormat Structure

{
  "version": "1.0.0",
  "session_meta": {
    "group_id": "group_001",
    "name": "Project Discussion Group",
    "description": "Team project planning and updates",
    "scene": "team",
    "timezone": "Asia/Shanghai",
    "user_details": {
      "user_101": {
        "full_name": "Alice",
        "role": "Product Manager",
        "nickname": "Ali"
      },
      "user_102": {
        "full_name": "Bob",
        "role": "Engineer"
      }
    }
  },
  "conversation_list": [
    {
      "message_id": "msg_001",
      "create_time": "2025-02-01T10:00:00+00:00",
      "sender": "user_101",
      "content": "Good morning everyone, let's discuss the new feature"
    },
    {
      "message_id": "msg_002",
      "create_time": "2025-02-01T10:05:00+00:00",
      "sender": "user_102",
      "content": "Sure! I've prepared the technical spec"
    }
  ]
}

Required Fields

session_meta:

group_id (string) - Unique identifier for the conversation group
name (string) - Human-readable name for the group
user_details (object) - Map of user IDs to user information

conversation_list:

message_id (string) - Unique identifier for each message
create_time (string) - ISO 8601 timestamp with timezone
sender (string) - User ID (must exist in user_details)
content (string) - Message content

Optional Fields

session_meta:

description (string) - Group description
scene (string) - Internal scene descriptor (team or solo)
timezone (string) - Timezone for the conversation

conversation_list:

sender_name (string) - Override sender's display name

Examples

Example 1: Simple Group Chat

{
  "version": "1.0.0",
  "session_meta": {
    "group_id": "team_standup",
    "name": "Daily Standup",
    "user_details": {
      "alice": {"full_name": "Alice Smith"},
      "bob": {"full_name": "Bob Jones"}
    }
  },
  "conversation_list": [
    {
      "message_id": "msg_1",
      "create_time": "2025-02-01T09:00:00+00:00",
      "sender": "alice",
      "content": "Yesterday I completed the login feature"
    },
    {
      "message_id": "msg_2",
      "create_time": "2025-02-01T09:01:00+00:00",
      "sender": "bob",
      "content": "Great! I'm working on the dashboard today"
    }
  ]
}

###Example 2: Family Chat with Rich Metadata

{
  "version": "1.0.0",
  "session_meta": {
    "group_id": "family_chat_001",
    "name": "Smith Family",
    "description": "Family group chat",
    "scene": "team",
    "timezone": "America/New_York",
    "user_details": {
      "mom": {
        "full_name": "Jane Smith",
        "nickname": "Mom",
        "role": "Parent"
      },
      "dad": {
        "full_name": "John Smith",
        "nickname": "Dad",
        "role": "Parent"
      },
      "daughter": {
        "full_name": "Emily Smith",
        "age": 16
      }
    }
  },
  "conversation_list": [
    {
      "message_id": "fam_001",
      "create_time": "2025-02-01T18:00:00-05:00",
      "sender": "mom",
      "content": "Dinner is ready! Come down please.",
    },
    {
      "message_id": "fam_002",
      "create_time": "2025-02-01T18:02:00-05:00",
      "sender": "daughter",
      "content": "Coming! Just finishing homework."
    }
  ]
}

Example 3: One-on-One Assistant Chat

{
  "version": "1.0.0",
  "session_meta": {
    "group_id": "user_assistant_001",
    "name": "Personal Assistant",
    "scene": "solo",
    "user_details": {
      "user_001": {
        "full_name": "Alex"
      }
    }
  },
  "conversation_list": [
    {
      "message_id": "chat_001",
      "create_time": "2025-02-01T10:00:00+00:00",
      "sender": "user_001",
      "content": "I love playing soccer on weekends"
    },
    {
      "message_id": "chat_002",
      "create_time": "2025-02-01T10:30:00+00:00",
      "sender": "user_001",
      "content": "My favorite team is Barcelona"
    }
  ]
}

Command for solo chat:

uv run python src/bootstrap.py src/run_memorize.py \
  --input my_solo_chat.json \
  --scene solo

Best Practices

1. Data Preparation

Validate before importing: Use --validate-only to check format
Use consistent IDs: Ensure message_id and user IDs are unique
Include timestamps: Always use ISO 8601 format with timezone
Provide user details: Include at least full_name for each user

2. Performance Optimization

Batch size: Process 100-1000 messages at a time for optimal performance
Sequential processing: Script processes messages sequentially to maintain order
Monitor progress: Watch for errors in terminal output
Wait for indexing: Allow 10-15 seconds after completion for search indexes to update

3. Data Quality

Clean content: Remove formatting artifacts or special characters
Accurate timestamps: Ensure chronological order
Complete metadata: Fill in all available user information
Meaningful group IDs: Use descriptive, stable identifiers

4. Scene Selection

Use solo for:
- One-on-one conversations
- Personal AI assistant chats
- Individual user interactions
Use team for:
- Multi-participant discussions
- Team conversations
- Family or social group chats

Troubleshooting

Validation Errors

Problem: --validate-only reports format errors

Solutions:

Check JSON syntax is valid
Verify all required fields are present
Ensure timestamps are in ISO 8601 format
Confirm sender IDs exist in user_details

API Errors

Problem: Script reports API errors when storing

Solutions:

Verify API server is running: curl http://localhost:1995/health
Check API URL is correct (default: http://localhost:1995/api/v0/memories)
Ensure .env has required API keys (LLM_API_KEY, VECTORIZE_API_KEY)
Review error messages for specific issues

Slow Processing

Problem: Batch processing is very slow

Solutions:

This is normal for large batches (each message requires LLM extraction)
Reduce batch size if memory issues occur
Ensure Docker services have adequate resources
Check LLM API rate limits

Missing Memories

Problem: Messages processed but not searchable

Solutions:

Wait 10-15 seconds for indexing to complete
Verify Elasticsearch and Milvus are running
Check MongoDB for stored data
Ensure embeddings were created (requires VECTORIZE_API_KEY)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batch Operations Guide

Table of Contents

Overview

Conversation Format

Batch Storage Script

Basic Usage

Script Parameters

Scene Parameter Explanation

Data Format Specification

ConversationFormat Structure

Required Fields

Optional Fields

Examples

Example 1: Simple Group Chat

Example 3: One-on-One Assistant Chat

Best Practices

1. Data Preparation

2. Performance Optimization

3. Data Quality

4. Scene Selection

Troubleshooting

Validation Errors

API Errors

Slow Processing

Missing Memories

See Also

FilesExpand file tree

BATCH_OPERATIONS.md

Latest commit

History

BATCH_OPERATIONS.md

File metadata and controls

Batch Operations Guide

Table of Contents

Overview

Conversation Format

Batch Storage Script

Basic Usage

Script Parameters

Scene Parameter Explanation

Data Format Specification

ConversationFormat Structure

Required Fields

Optional Fields

Examples

Example 1: Simple Group Chat

Example 3: One-on-One Assistant Chat

Best Practices

1. Data Preparation

2. Performance Optimization

3. Data Quality

4. Scene Selection

Troubleshooting

Validation Errors

API Errors

Slow Processing

Missing Memories

See Also