Ingesting Events

Batch size, metadata, timestamps, scoping — everything beyond the quickstart.

ingest() takes a list of events and returns their IDs.

Note

Field naming. The JSON wire format uses snake_case (actor_id, session_id, role_id, team_id). The Python SDK accepts snake_case. The Node SDK accepts camelCase (actorId, sessionId, …) and converts on send. Don't mix conventions across SDKs — pasting actor_id into a Node call (or actorId into the cURL body) returns 422.

Role and team scoping

This is what sets Memsy apart: every event can be scoped to a role and team in your onboarding hierarchy. Extracted memories are stored at the right level of your org — not just a flat blob per user — so searches respect organizational context automatically.

from memsy import EventPayload

client.ingest([
    EventPayload(
        actor_id="user_42",
        session_id="session_1",
        kind="user_message",
        content="I always prefer dark mode in my dev environment.",
        role_id="role_frontend_dev",
        team_id="team_platform",
    )
])

The memory extractor uses the role's and team's promotion_prompt to bias extraction toward what's relevant for that context. Memories then flow up the hierarchy: actor → role → team → org.

role_id/roleId and team_id/teamId are optional. Events without them are stored at the actor scope.


The minimum viable event

from memsy import EventPayload

EventPayload(
    actor_id="user_42",
    session_id="session_1",
    kind="user_message",
    content="Remember my flight is on Friday.",
)

Four fields. Everything else is optional.

Event kinds

kind is a closed enum. The API returns 422 Unprocessable Entity for any other value.

Value              Use for
user_message       Something the user said or typed. Most common.
assistant_message  Something the assistant said back. Useful so the model can recall its own past statements.
tool_result        The output of a tool/function call (search result, calculator answer, API response).
app_event          App-level signals that aren't a chat turn — e.g. "user changed plan to Pro", "deployment finished".

If you have a generic CRM event or a webhook payload, app_event is almost always the right answer. The extractor still pulls facts and preferences from it, just without conversational framing.

There are no chat_turn, tool_call, system, or note kinds. Mapping notes (a helper sketch follows this list):

  • OpenAI/Anthropic message role user → user_message
  • role assistant → assistant_message
  • role tool (or tool/function response) → tool_result
  • system messages → don't ingest them; they're rarely worth storing as memory
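As a sketch of those notes, here is a small converter from OpenAI-style message dicts (role/content keys) to events. The ROLE_TO_KIND table and to_events helper are illustrative names, not part of the SDK:

from memsy import EventPayload

# Illustrative mapping from OpenAI-style roles to Memsy event kinds.
ROLE_TO_KIND = {
    "user": "user_message",
    "assistant": "assistant_message",
    "tool": "tool_result",
}

def to_events(actor_id: str, session_id: str, messages: list[dict]) -> list[EventPayload]:
    events: list[EventPayload] = []
    for msg in messages:
        kind = ROLE_TO_KIND.get(msg["role"])
        if kind is None:
            continue  # system messages (and anything unmapped) are skipped
        events.append(EventPayload(
            actor_id=actor_id,
            session_id=session_id,
            kind=kind,
            content=msg["content"],
        ))
    return events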

Timestamps

If your events are live, omit ts and the server stamps them. If you're backfilling historical data, always pass an ISO-8601 timestamp:

EventPayload(
    actor_id="user_42",
    session_id="session_1",
    kind="user_message",
    content="...",
    ts="2026-03-15T14:22:10Z",
)

Backfilled events with explicit timestamps get the same treatment as live events during extraction, but the timestamp is used for recency weighting in search.
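If your historical records hold Python datetime objects rather than strings, a minimal sketch of producing that form (the created_at variable is illustrative):

from datetime import datetime, timezone

# Assume the historical record carries a naive UTC datetime.
created_at = datetime(2026, 3, 15, 14, 22, 10)
ts = created_at.replace(tzinfo=timezone.utc).isoformat().replace("+00:00", "Z")
# ts == "2026-03-15T14:22:10Z"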

Metadata

metadata is a JSON-serialised string (not a dict/object). Use it for application-level fields you want back with search results — tags, source identifiers, custom flags:

import json

EventPayload(
    actor_id="user_42",
    session_id="session_1",
    kind="app_event",
    content="User upgraded to pro plan.",
    metadata=json.dumps({"plan": "pro", "source": "billing-webhook"}),
)

Memsy echoes the raw metadata string back in each SearchResult so you can deserialise it client-side.

Extracted memories carry the metadata of the source event(s) they were derived from under metadata.source_metadata on each search result — up to 5 entries per memory. Each entry is {event_id, metadata: <parsed dict>} when the original string parsed as a JSON object, or {event_id, raw: <original string>} otherwise. You will not lose the metadata you sent, even if it wasn't valid JSON: the server preserves the original under raw instead of rejecting the ingest.
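A sketch of walking those entries on a memory result. It assumes the result's metadata deserialises to a dict carrying the source_metadata list described above; the helper name and exact shape are assumptions, not SDK API:

import json

def collect_sources(result) -> list[dict]:
    meta = json.loads(result.metadata) if result.metadata else {}
    sources = []
    for entry in meta.get("source_metadata", []):
        if "metadata" in entry:
            # The original string parsed as a JSON object.
            sources.append({"event_id": entry["event_id"], **entry["metadata"]})
        else:
            # The original string wasn't valid JSON; it's preserved verbatim.
            sources.append({"event_id": entry["event_id"], "raw": entry["raw"]})
    return sources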

Batching

ingest() is designed for batches. Send many events in one call whenever you can:

batch: list[EventPayload] = []

for msg in conversation_turns:
    batch.append(EventPayload(
        actor_id=user_id,
        session_id=session_id,
        kind="user_message" if msg.role == "user" else "assistant_message",
        content=msg.text,
    ))

result = client.ingest(batch)

Recommended batch size: tens to low hundreds. Very large batches (thousands) increase tail latency and risk hitting request-size limits.
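A minimal sketch of chunking a large backlog, with 100 as an illustrative size inside that range:

CHUNK = 100  # illustrative; anywhere in the tens-to-low-hundreds range is fine

all_ids: list[str] = []
for i in range(0, len(batch), CHUNK):
    result = client.ingest(batch[i:i + CHUNK])
    all_ids.extend(result.event_ids)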

Return value

event_ids / eventIds gives you the IDs the server assigned, in the same order as the input. Pair them up if you need to track processing per event:

result = client.ingest(batch)
pairs = dict(zip(result.event_ids, batch))

Idempotency

  • Memsy de-duplicates within a rolling 60-second window by (actor_id, session_id, kind, content) hash. Identical re-posts within the window are collapsed and the response returns the original event_id for each duplicate, so SDK retries and status() lookups continue to work.
  • Dedup is best-effort — under platform degradation you may still see fresh event_ids for duplicate submissions. Treat the window as a safety net for transient retries, not a guarantee.
  • If you genuinely need to re-ingest after the window, change any of the four fields — typically by appending ts to the content or sending a fresh session_id, as in the sketch below.
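For instance, a replay with a fresh session_id (the suffix scheme here is purely illustrative):

import uuid

client.ingest([
    EventPayload(
        actor_id="user_42",
        # A fresh session_id changes the dedup hash, so the event is stored again.
        session_id=f"session_1-replay-{uuid.uuid4().hex[:8]}",
        kind="user_message",
        content="I always prefer dark mode in my dev environment.",
    )
])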

Validation rules

POST /ingest returns 422 if any event fails validation:

  • content must contain at least one non-whitespace character and stay under 8000 characters.
  • actor_id and session_id must be non-empty after trimming whitespace; both are capped at 256 characters.
  • role_id and team_id are also capped at 256 characters and collapse to null when only whitespace.
  • metadata is capped at 4096 characters.
  • NUL bytes are silently stripped from all string fields before persistence.

Invalid JSON in metadata is not a validation error: the original string is preserved as {"raw": <original_str>} on the event and surfaces back under source_metadata on derived memories.
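If you want to catch these before the round trip, a client-side mirror of the rules is easy to write. The validation_errors helper below is a sketch, not part of the SDK, and assumes EventPayload fields are plain attributes:

from memsy import EventPayload

def validation_errors(e: EventPayload) -> list[str]:
    # Mirrors the server-side 422 rules listed above.
    errors = []
    if not e.content.strip():
        errors.append("content needs at least one non-whitespace character")
    if len(e.content) >= 8000:
        errors.append("content must stay under 8000 characters")
    for name in ("actor_id", "session_id"):
        value = getattr(e, name)
        if not value.strip():
            errors.append(f"{name} must be non-empty after trimming")
        if len(value) > 256:
            errors.append(f"{name} is capped at 256 characters")
    metadata = getattr(e, "metadata", None)
    if metadata is not None and len(metadata) > 4096:
        errors.append("metadata is capped at 4096 characters")
    return errors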

Failure modes

ingest() itself can fail with the following (Python / Node names; see the sketch below):

  • AuthenticationError / MemsyAuthError — bad API key.
  • UsageLimitExceeded / MemsyAPIError (status 403) — you've hit a quota (events, tokens, storage).
  • RateLimitExceeded / MemsyRateLimitError — the SDK retried automatically, but you still exhausted the budget.
  • MemsyConnectionError — network or timeout.

Individual events can also fail during extraction after a successful ingest. Those show up in status().failed_ids / status().failedIds later. See Async Processing.
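A sketch of handling the Python exceptions around a call. The import path is an assumption, and alert_quota / requeue are hypothetical hooks for your own alerting and queueing:

from memsy import (
    AuthenticationError,
    MemsyConnectionError,
    RateLimitExceeded,
    UsageLimitExceeded,
)

try:
    result = client.ingest(batch)
except AuthenticationError:
    raise  # misconfigured API key; retrying won't help
except UsageLimitExceeded:
    alert_quota(batch)  # hypothetical hook: quota exhausted, stop sending
except RateLimitExceeded:
    requeue(batch)  # hypothetical hook: SDK retries are spent, try again later
except MemsyConnectionError:
    requeue(batch)  # hypothetical hook: the dedup window makes quick retries safe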
