Ingesting Events

Batch size, metadata, timestamps, scoping — everything beyond the quickstart.

ingest() takes a list of events and returns their IDs.

Note

Field naming. The JSON wire format uses snake_case (actor_id, session_id, role_id, team_id). The Python SDK accepts snake_case. The Node SDK accepts camelCase (actorId, sessionId, …) and converts on send. Don't mix conventions across SDKs — pasting actor_id into a Node call (or actorId into the cURL body) returns 422.

Role and team scoping

This is what sets Memsy apart: every event can be scoped to a role and team in your onboarding hierarchy. Extracted memories are stored at the right level of your org — not just a flat blob per user — so searches respect organizational context automatically.

from memsy import EventPayload

client.ingest([
    EventPayload(
        actor_id="user_42",
        session_id="session_1",
        kind="user_message",
        content="I always prefer dark mode in my dev environment.",
        role_id="role_frontend_dev",
        team_id="team_platform",
    )
])

The memory extractor uses the role's and team's promotion_prompt to bias extraction toward what's relevant for that context. Memories then flow up the hierarchy: actor → role → team → org.

role_id/roleId and team_id/teamId are optional. Events without them are stored at the actor scope.


The minimum viable event

from memsy import EventPayload

EventPayload(
    actor_id="user_42",
    session_id="session_1",
    kind="user_message",
    content="Remember my flight is on Friday.",
)

Four fields. Everything else is optional.

Event kinds

kind is a closed enum. The API returns 422 Unprocessable Entity for any other value.

Value              Use for
user_message       Something the user said or typed. Most common.
assistant_message  Something the assistant said back. Useful so the model can recall its own past statements.
tool_result        The output of a tool/function call (search result, calculator answer, API response).
app_event          App-level signals that aren't a chat turn — e.g. "user changed plan to Pro", "deployment finished".

If you have a generic CRM event or a webhook payload, app_event is almost always the right answer. The extractor still pulls facts and preferences from it, just without conversational framing.

There are no chat_turn, tool_call, system, or note kinds. Mapping notes (a helper sketch follows this list):

  • OpenAI/Anthropic message role user → user_message
  • role assistant → assistant_message
  • role tool (or tool/function response) → tool_result
  • system messages → don't ingest them; they're rarely worth storing as memory
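As a sketch of those notes, here is a small converter from OpenAI-style message dicts (role/content keys) to events. The ROLE_TO_KIND table and to_events helper are illustrative names, not part of the SDK:

from memsy import EventPayload

# Illustrative mapping from OpenAI-style roles to Memsy event kinds.
ROLE_TO_KIND = {
    "user": "user_message",
    "assistant": "assistant_message",
    "tool": "tool_result",
}

def to_events(actor_id: str, session_id: str, messages: list[dict]) -> list[EventPayload]:
    events: list[EventPayload] = []
    for msg in messages:
        kind = ROLE_TO_KIND.get(msg["role"])
        if kind is None:
            continue  # system messages (and anything unmapped) are skipped
        events.append(EventPayload(
            actor_id=actor_id,
            session_id=session_id,
            kind=kind,
            content=msg["content"],
        ))
    return events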

Timestamps

If your events are live, omit ts and the server stamps them. If you're backfilling historical data, always pass an ISO-8601 timestamp:

EventPayload(
    actor_id="user_42",
    session_id="session_1",
    kind="user_message",
    content="...",
    ts="2026-03-15T14:22:10Z",
)

Backfilled events with explicit timestamps get the same treatment as live events during extraction, but the timestamp is used for recency weighting in search.
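If your historical records hold Python datetime objects rather than strings, a minimal sketch of producing that form (the created_at variable is illustrative):

from datetime import datetime, timezone

# Assume the historical record carries a naive UTC datetime.
created_at = datetime(2026, 3, 15, 14, 22, 10)
ts = created_at.replace(tzinfo=timezone.utc).isoformat().replace("+00:00", "Z")
# ts == "2026-03-15T14:22:10Z"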

Metadata

metadata is a JSON-serialised string (not a dict/object). Use it for application-level fields you want back with search results — tags, source identifiers, custom flags:

import json

EventPayload(
    actor_id="user_42",
    session_id="session_1",
    kind="app_event",
    content="User upgraded to pro plan.",
    metadata=json.dumps({"plan": "pro", "source": "billing-webhook"}),
)

Memsy echoes the raw metadata string back in each SearchResult so you can deserialise it client-side.

Extracted memories carry the metadata of the source event(s) they were derived from under metadata.source_metadata on each search result — up to 5 entries per memory. Each entry is {event_id, metadata: <parsed dict>} when the original string parsed as a JSON object, or {event_id, raw: <original string>} otherwise. You will not lose the metadata you sent, even if it wasn't valid JSON: the server preserves the original under raw instead of rejecting the ingest.
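A sketch of walking those entries on a memory result. It assumes the result's metadata deserialises to a dict carrying the source_metadata list described above; the helper name and exact shape are assumptions, not SDK API:

import json

def collect_sources(result) -> list[dict]:
    meta = json.loads(result.metadata) if result.metadata else {}
    sources = []
    for entry in meta.get("source_metadata", []):
        if "metadata" in entry:
            # The original string parsed as a JSON object.
            sources.append({"event_id": entry["event_id"], **entry["metadata"]})
        else:
            # The original string wasn't valid JSON; it's preserved verbatim.
            sources.append({"event_id": entry["event_id"], "raw": entry["raw"]})
    return sources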

Batching

ingest() is designed for batches. Send many events in one call whenever you can:

batch: list[EventPayload] = []

for msg in conversation_turns:
    batch.append(EventPayload(
        actor_id=user_id,
        session_id=session_id,
        kind="user_message" if msg.role == "user" else "assistant_message",
        content=msg.text,
    ))

result = client.ingest(batch)

Recommended batch size: tens to low hundreds. Very large batches (thousands) increase tail latency and risk hitting request-size limits.
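A minimal sketch of chunking a large backlog, with 100 as an illustrative size inside that range:

CHUNK = 100  # illustrative; anywhere in the tens-to-low-hundreds range is fine

all_ids: list[str] = []
for i in range(0, len(batch), CHUNK):
    result = client.ingest(batch[i:i + CHUNK])
    all_ids.extend(result.event_ids)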

Return value

event_ids / eventIds gives you the IDs the server assigned, in the same order as the input. Pair them up if you need to track processing per event:

result = client.ingest(batch)
pairs = dict(zip(result.event_ids, batch))

Idempotency

  • Memsy de-duplicates within a rolling 60-second window by (actor_id, session_id, kind, content) hash. Identical re-posts within the window are collapsed and the response returns the original event_id for each duplicate, so SDK retries and status() lookups continue to work.
  • Dedup is best-effort — under platform degradation you may still see fresh event_ids for duplicate submissions. Treat the window as a safety net for transient retries, not a guarantee.
  • If you genuinely need to re-ingest after the window, change any of the four fields — typically by appending ts to the content or sending a fresh session_id, as in the sketch below.
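For instance, a replay with a fresh session_id (the suffix scheme here is purely illustrative):

import uuid

client.ingest([
    EventPayload(
        actor_id="user_42",
        # A fresh session_id changes the dedup hash, so the event is stored again.
        session_id=f"session_1-replay-{uuid.uuid4().hex[:8]}",
        kind="user_message",
        content="I always prefer dark mode in my dev environment.",
    )
])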

Validation rules

POST /ingest returns 422 if any event fails validation:

  • content must contain at least one non-whitespace character and stay under 8000 characters.
  • actor_id and session_id must be non-empty after trimming whitespace; both are capped at 256 characters.
  • role_id and team_id are also capped at 256 characters and collapse to null when only whitespace.
  • metadata is capped at 4096 characters.
  • NUL bytes are silently stripped from all string fields before persistence.

Invalid JSON in metadata is not a validation error: the original string is preserved as {"raw": <original_str>} on the event and surfaces back under source_metadata on derived memories.
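If you want to catch these before the round trip, a client-side mirror of the rules is easy to write. The validation_errors helper below is a sketch, not part of the SDK, and assumes EventPayload fields are plain attributes:

from memsy import EventPayload

def validation_errors(e: EventPayload) -> list[str]:
    # Mirrors the server-side 422 rules listed above.
    errors = []
    if not e.content.strip():
        errors.append("content needs at least one non-whitespace character")
    if len(e.content) >= 8000:
        errors.append("content must stay under 8000 characters")
    for name in ("actor_id", "session_id"):
        value = getattr(e, name)
        if not value.strip():
            errors.append(f"{name} must be non-empty after trimming")
        if len(value) > 256:
            errors.append(f"{name} is capped at 256 characters")
    metadata = getattr(e, "metadata", None)
    if metadata is not None and len(metadata) > 4096:
        errors.append("metadata is capped at 4096 characters")
    return errors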

Failure modes

ingest() itself can fail with the following (Python / Node names; see the sketch below):

  • AuthenticationError / MemsyAuthError — bad API key.
  • UsageLimitExceeded / MemsyAPIError (status 403) — you've hit a quota (events, tokens, storage).
  • RateLimitExceeded / MemsyRateLimitError — the SDK retried automatically, but you still exhausted the budget.
  • MemsyConnectionError — network or timeout.

Individual events can also fail during extraction after a successful ingest. Those show up in status().failed_ids / status().failedIds later. See Async Processing.
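A sketch of handling the Python exceptions around a call. The import path is an assumption, and alert_quota / requeue are hypothetical hooks for your own alerting and queueing:

from memsy import (
    AuthenticationError,
    MemsyConnectionError,
    RateLimitExceeded,
    UsageLimitExceeded,
)

try:
    result = client.ingest(batch)
except AuthenticationError:
    raise  # misconfigured API key; retrying won't help
except UsageLimitExceeded:
    alert_quota(batch)  # hypothetical hook: quota exhausted, stop sending
except RateLimitExceeded:
    requeue(batch)  # hypothetical hook: SDK retries are spent, try again later
except MemsyConnectionError:
    requeue(batch)  # hypothetical hook: the dedup window makes quick retries safe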
