If you’re building AI agents that talk to people on WhatsApp, you’ve probably thought about memory. How does your agent remember what happened three days ago? How does it know the customer already rejected your offer? How does it avoid asking the same question twice?
The default answer in 2024 was RAG: Retrieval-Augmented Generation. Embed your messages, throw them in a vector database, and retrieve the relevant ones before generating a response.
We tried that. It doesn’t work for conversations.
Instead, we designed a three-layer system. Each layer serves a different purpose, and together they give an AI agent complete conversational awareness.
```
┌────────────────────────────────────────────────┐
│ Layer 3: CONVERSATION STATE                    │
│ Structured truth. LLM-extracted.               │
│ Intent, sentiment, objections, commitments     │
│ Updated async after each message batch         │
├────────────────────────────────────────────────┤
│ Layer 2: ATOMIC MEMORIES                       │
│ Facts extracted from conversation windows      │
│ Embedded, tagged, bi-temporally timestamped    │
│ Linked back to source chunk for detail         │
│ ADD / UPDATE / DELETE / NOOP lifecycle         │
├────────────────────────────────────────────────┤
│ Layer 1: CONVERSATION CHUNKS                   │
│ 3-6 message windows, overlapping               │
│ NOT embedded: these are source material        │
│ Retrieved by reference when detail is needed   │
├────────────────────────────────────────────────┤
│ Layer 0: RAW MESSAGES                          │
│ Source of truth, immutable                     │
└────────────────────────────────────────────────┘
```
Layer 0: Raw Messages
Your message store. Every message with full metadata: sender, timestamp, type, read status. This is the immutable source of truth. No intelligence here, just data.
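As a rough sketch, a Layer 0 record could look like the following. The field names (`msg_type`, `read`, etc.) are illustrative guesses at the shape, not the actual schema; `frozen=True` encodes the immutability the post describes:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: raw messages are never mutated
class RawMessage:
    message_id: str
    conversation_id: str
    sender: str
    timestamp: datetime
    msg_type: str          # "text", "image", "audio", ...
    read: bool
    body: str

msg = RawMessage(
    message_id="wamid.123",          # hypothetical ID format
    conversation_id="conv-42",
    sender="+5491122334455",
    timestamp=datetime(2025, 3, 28, tzinfo=timezone.utc),
    msg_type="text",
    read=True,
    body="Is $800 your best price?",
)
```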
Layer 1: Conversation Chunks
Groups of 3-6 messages, overlapping, with timestamps and participant info. These capture the narrative flow: the mini-stories within a conversation. When an agent needs to understand how a negotiation unfolded (not just what was decided), it reads the relevant chunks.
Crucially, chunks are not embedded. They exist as source material that memories link back to. This keeps your vector index clean and focused.
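Overlapping windows like these can be produced with a simple sliding window. This sketch steps forward by `size - overlap` messages so adjacent chunks share context; the specific window size and overlap are parameters, not values the post states:

```python
def chunk_messages(messages, size=5, overlap=2):
    """Split a message list into overlapping windows of `size`,
    advancing by `size - overlap` each step."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(messages), step):
        chunks.append(messages[start:start + size])
        if start + size >= len(messages):
            break  # last window already reached the end
    return chunks

# Ten messages, windows of 4 with an overlap of 2:
msgs = list(range(10))
chunks = chunk_messages(msgs, size=4, overlap=2)
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Because chunks are retrieved by reference rather than searched, the only requirement is that every chunk keeps a stable ID that memories can point back to.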
Layer 2: Atomic Memories
This is the search layer. Each memory is a single, self-contained fact extracted from a conversation chunk:
- Facts: “Customer owns a flower shop in Palermo”
- Preferences: “Prefers WhatsApp over email for communication”
- Objections: “Said $800 is too expensive, budget is ~$500”
- Commitments: “We promised to send a revised proposal by Monday”
- Events: “Customer was referred by Juan on March 28”
Each memory is embedded for vector search, tagged for filtering, and linked to its source chunk for when you need the full context. Memories follow the ADD/UPDATE/DELETE/NOOP lifecycle: no duplicates, no stale facts.
Memories exist at three scopes: conversation-level (facts about this specific contact), number-level (business context shared across all conversations on a WhatsApp line), and user-level (knowledge that spans all numbers).
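Putting those pieces together, a memory record would carry the fact text, a type tag, a scope, a pointer to its source chunk, and the two bi-temporal timestamps. The lifecycle decision is made by an LLM in the described system, so the matcher below is a deliberately naive stand-in; all names here are assumptions for illustration:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class Op(Enum):
    """Lifecycle decision for an incoming candidate fact."""
    ADD = "add"
    UPDATE = "update"
    DELETE = "delete"
    NOOP = "noop"

@dataclass
class Memory:
    text: str              # e.g. "Said $800 is too expensive, budget is ~$500"
    kind: str              # fact | preference | objection | commitment | event
    scope: str             # conversation | number | user
    source_chunk_id: str   # pointer back to the Layer 1 chunk
    valid_at: datetime     # when the fact held true (event time)
    recorded_at: datetime  # when the system learned it (ingestion time)

def decide_op(candidate: str, neighbours: list[str]) -> Op:
    # Toy stand-in for the LLM's ADD/UPDATE/DELETE/NOOP call: an exact
    # duplicate is a NOOP, anything else is an ADD. The real decision
    # also emits UPDATE or DELETE when a new fact supersedes or
    # invalidates an existing memory.
    return Op.NOOP if candidate in neighbours else Op.ADD

now = datetime.now(timezone.utc)
budget = Memory(
    text="Said $800 is too expensive, budget is ~$500",
    kind="objection",
    scope="conversation",
    source_chunk_id="chunk-17",
    valid_at=now,
    recorded_at=now,
)
```

The bi-temporal pair matters for corrections: if the customer later says their budget changed last month, `valid_at` can point to last month while `recorded_at` stays at ingestion time.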
Layer 3: Conversation State
The structured truth about where a conversation stands right now. Updated asynchronously after each message batch by an LLM that reads the recent messages and extracts:
- Intent: What is this conversation about? (pricing inquiry, support, onboarding)
- Sentiment: How does the contact feel? (positive, neutral, frustrated)
- Status: Where are we? (negotiating, waiting for response, closed)
- Objections: What has the contact pushed back on?
- Commitments: What has been promised, by whom, and by when?
- Decision history: Key yes/no moments and what triggered them
This is the first thing an agent reads when stepping into a conversation. No searching, no retrieval: just a single row with the current truth.
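That single row could be modeled as a flat structure like this; the field names are a guess at the shape the post describes, not the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class ConversationState:
    intent: str                        # "pricing inquiry", "support", "onboarding"
    sentiment: str                     # "positive" | "neutral" | "frustrated"
    status: str                        # "negotiating" | "waiting" | "closed"
    objections: list[str] = field(default_factory=list)
    commitments: list[dict] = field(default_factory=list)
    decisions: list[dict] = field(default_factory=list)  # key yes/no moments

# What the agent would see when stepping into the example conversation:
state = ConversationState(
    intent="pricing inquiry",
    sentiment="neutral",
    status="negotiating",
    objections=["$800 is too expensive; budget is ~$500"],
    commitments=[{"what": "revised proposal", "who": "us", "by": "Monday"}],
)
```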
Read more:
https://wpp.opero.so/blog/why-rag-fails-for-whatsapp-and-what-we-built-instead?utm_source=linkedin