As your network grows, so does the volume of your meeting notes. We realized that as the dataset gets larger, finding specific details about a contact means too much manual scrolling or a reliance on exact keyword matches.
To solve this, we are thrilled to announce a new natural language chat interface. Instead of searching for "startup", you can simply ask, "What did Andy say about his startup?" and receive a concise answer synthesized exclusively from your own notes, complete with links back to the original source.
Here is a look at the problem, the solution, and the technical architecture powering this new feature.
The Magic: Natural Language Memory Retrieval
The core goal of this feature is to close the semantic gap between conversational questions and your short, factual notes.
When you ask a question, the system does not just search for exact words. It rewrites your query into an optimized search, finds the most relevant notes, and synthesizes a conversational answer. The backend remains completely stateless: the frontend manages the conversation history and sends it along with each question, while the backend focuses purely on fast, accurate retrieval.
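To make the stateless design concrete, here is a minimal Go sketch of how a handler could assemble the prompt from the retrieved notes, the client-supplied history, and the raw question. The type names (`Message`, `ChatRequest`), the `buildMessages` helper, the system prompt wording, and the sample note are all hypothetical; only the `question` and `history` fields come from the request shape described in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is one chat turn. The role/content shape is an assumption
// modeled on common chat APIs, not the production schema.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest is a hypothetical request body. Because the backend keeps
// no session state, the full history travels with every request.
type ChatRequest struct {
	Question string    `json:"question"`
	History  []Message `json:"history"`
}

// buildMessages returns a system message containing the retrieved notes,
// followed by the client-supplied history, followed by the raw question.
func buildMessages(notes []string, req ChatRequest) []Message {
	var sb strings.Builder
	sb.WriteString("Answer using only the notes below.\n")
	for i, n := range notes {
		fmt.Fprintf(&sb, "[%d] %s\n", i+1, n)
	}
	msgs := make([]Message, 0, len(req.History)+2)
	msgs = append(msgs, Message{Role: "system", Content: sb.String()})
	msgs = append(msgs, req.History...)
	msgs = append(msgs, Message{Role: "user", Content: req.Question})
	return msgs
}

func main() {
	req := ChatRequest{
		Question: "What did Andy say about his startup?",
		History: []Message{
			{Role: "user", Content: "Who is Andy?"},
			{Role: "assistant", Content: "..."},
		},
	}
	// The note text here is a made-up sample, not real data.
	notes := []string{"Andy talked about his startup over coffee."}
	for _, m := range buildMessages(notes, req) {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```

Since nothing is stored between calls, any backend instance can serve any request, at the cost of a slightly larger request body as the conversation grows.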
Early design
** THIS IS EARLY EXPERIMENTATION **
** PRODUCTION VERSION IS NOT THE SAME **
Client
│
│ POST /chat?personId=<uuid>
│ { "question": "...", "history": [...] }
│
▼
API Handler
│
├─ 1. Authenticate & parse request
│
├─ 2. Rewrite Query
│ │
│ └─► POST localhost:11434/api/chat [llama3.2:3b]
│ prompt: "rewrite as 5-10 word search query"
│ response: "Andy startup company product focus"
│
├─ 3. Embed Query
│ │
│ └─► POST localhost:11434/api/embed [nomic-embed-text]
│ response: [0.123, -0.456, ...] (768 floats)
│
├─ 4. Search Notes
│ │
│ └─► SELECT from pgvector DB
│ ORDER BY similarity LIMIT 5
│ response: Top 5 relevant notes
│
├─ 5. Build RAG Prompt
│ │
│ └─► messages = [ system prompt with notes, history, raw question ]
│
├─ 6. Chat Completion
│ │
│ └─► POST localhost:11434/api/chat [llama3.2:3b]
│ response: "Based on your notes, Andy mentioned..."
│
└─ 7. Return ChatResponse
{ answer, source_note_ids, role: "assistant" }
Under the Hood: Infrastructure & Architecture
To keep infrastructure complexity low while maintaining high performance, we integrated this feature directly into our existing stack.
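Because Ollama runs on the same host as the Go API, the embedding step can be a plain HTTP call over localhost. The sketch below shows roughly what that call could look like against Ollama's `/api/embed` endpoint. The `embed` and `fakeOllama` helpers and the struct names are hypothetical, and a stub server stands in for Ollama so the example runs anywhere; in the deployment described here the base URL would be `http://localhost:11434`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// embedRequest and embedResponse cover only the fields this sketch needs.
type embedRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// embed posts text to /api/embed and returns the first vector.
func embed(baseURL, text string) ([]float32, error) {
	body, err := json.Marshal(embedRequest{Model: "nomic-embed-text", Input: text})
	if err != nil {
		return nil, err
	}
	resp, err := http.Post(baseURL+"/api/embed", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s", resp.Status)
	}
	var out embedResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	if len(out.Embeddings) == 0 {
		return nil, fmt.Errorf("ollama returned no embeddings")
	}
	return out.Embeddings[0], nil
}

// fakeOllama is a stub that returns a tiny fixed vector; the real model
// would return 768 floats.
func fakeOllama() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		json.NewEncoder(w).Encode(embedResponse{Embeddings: [][]float32{{0.123, -0.456}}})
	}))
}

func main() {
	srv := fakeOllama()
	defer srv.Close()
	vec, err := embed(srv.URL, "Andy startup company product focus")
	fmt.Println(len(vec), err)
}
```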
┌──────────────────────────────────────────────────────────────────────┐
│ EC2 │
│ │
│ ┌─────────────────────┐ ┌───────────────────────────────┐ │
│ │ Go API (port 8080) │ │ Ollama (port 11434) │ │
│ │ │ │ │ │
│ │ handler/chat.go │◄────────►│ nomic-embed-text (768 dims) │ │
│ │ lib/rag.go │localhost │ llama3.2:3b (synthesis) │ │
│ │ lib/ollama.go │ │ │ │
│ └──────────┬──────────┘ └───────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────┐ │
│ │ PostgreSQL + pgvector │
│ │ - notes (+ embedding_status) │
│ │ - note_embeddings (vector(768)) │
│ └─────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│ SQS publish (async)
▼
┌─────────────────────┐
│ AWS SQS Queue │
└──────────┬──────────┘
│ triggers
▼
┌──────────────────────────────────────────────────────────────────────┐
│ AWS Lambda (Go, linux/arm64) │
│ lambda/embedding-worker/ │
│ │
│ 1. Consumes SQS messages (NoteEmbeddingEvent) │
│ 2. Calls Ollama via EC2 private IP → nomic-embed-text │
│ 3. Writes vector to note_embeddings table │
│ 4. Updates notes.embedding_status = 'completed' or 'failed' │
└──────────────────────────────────────────────────────────────────────┘
▲
│ re-enqueues pending/failed every 5 min
┌─────────────────────┐
│ Cron Job │
│ retry_embeddings │
└─────────────────────┘
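For the retrieval side of this diagram, a similarity lookup against `note_embeddings` might look like the sketch below. The table name and the `vector(768)` type come from the diagram; the column names (`note_id`, `person_id`, `embedding`), the `topNotesSQL` constant, and the `vectorLiteral` helper are assumptions for illustration. The `<=>` operator is pgvector's cosine distance, so ascending order returns the closest notes first.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// topNotesSQL is a hypothetical query. Column names are assumed; the
// query vector is passed as text and cast to pgvector's vector type.
const topNotesSQL = `
SELECT note_id
FROM note_embeddings
WHERE person_id = $1
ORDER BY embedding <=> $2::vector
LIMIT 5`

// vectorLiteral renders a slice in pgvector's text input format,
// a bracketed, comma-separated list such as "[0.5,-1]".
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	// A real query vector would hold 768 values from nomic-embed-text.
	fmt.Println(vectorLiteral([]float32{0.5, -1}))
	fmt.Println(topNotesSQL)
}
```

With `database/sql`, the two placeholders would be bound to the person's ID and the result of `vectorLiteral`, for example via `db.Query(topNotesSQL, personID, vectorLiteral(vec))`.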
The Asynchronous Ingestion Pipeline
When you type a note, it autosaves every few seconds. If we generated an embedding on every autosave, we would flood our queue and waste compute resources.
Instead, we designed a smart ingestion flow. Autosaves quietly flag the note's embedding status as pending in the database. Only explicit user actions (like hitting a save button or deleting a note) trigger an immediate SQS event. A background cron job sweeps up any lingering pending notes every five minutes.
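The publish rule described above can be sketched in a few lines of Go. The action names mirror the handlers in this post; the `Publisher` interface, the `afterWrite` helper, and the `recorder` stub are hypothetical stand-ins so the example runs without AWS.

```go
package main

import "fmt"

// Action identifies which handler wrote the note.
type Action string

const (
	AddNote      Action = "AddNote"
	EditNote     Action = "EditNote"
	AutosaveNote Action = "AutosaveNote"
	DeleteNote   Action = "DeleteNote"
)

// Publisher abstracts the SQS client (an assumption for this sketch).
type Publisher interface {
	Publish(noteID string) error
}

// afterWrite runs once the SQL write has already happened. It reports
// whether an SQS event was attempted for this action.
func afterWrite(a Action, noteID string, p Publisher) bool {
	if a == AutosaveNote {
		// No publish: the note stays 'pending' until the cron sweep.
		return false
	}
	// Fire-and-forget: a publish error is ignored here because the
	// cron job re-enqueues anything still marked 'pending'.
	_ = p.Publish(noteID)
	return true
}

// recorder is a stub publisher that remembers what was sent.
type recorder struct{ ids []string }

func (r *recorder) Publish(id string) error {
	r.ids = append(r.ids, id)
	return nil
}

func main() {
	r := &recorder{}
	for _, a := range []Action{AddNote, AutosaveNote, AutosaveNote, EditNote, DeleteNote} {
		fmt.Println(a, afterWrite(a, "note-1", r))
	}
	fmt.Println("events published:", len(r.ids))
}
```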
──── AddNote / EditNote (explicit user save) ────────────────────────
User saves / edits a note
│
▼
API Handler
│
├─ SQL sets embedding_status = 'pending'
│
└─ Publish SQS Event (fire-and-forget)
│
▼
AWS SQS Queue
│
▼
Lambda (embedding-worker)
│
├─ Calls EC2 private IP → Ollama nomic-embed-text
│
├─ SUCCESS:
│ INSERT/UPDATE note_embeddings
│ UPDATE notes SET embedding_status = 'completed'
│
└─ FAILURE:
UPDATE notes SET embedding_status = 'failed'
SQS retries (up to 3 times) before Dead Letter Queue
──── AutosaveNote (fires every ~2s while user is typing) ────────────
AutosaveNote
│
├─ SQL sets embedding_status = 'pending'
│
└─ (NO SQS publish to prevent queue flooding)
Cron picks up the final version within 5 minutes
──── DeleteNote ─────────────────────────────────────────────────────
DeleteNote
│
├─ Note row removed from DB (CASCADE handles vector deletion)
│
└─ Publish SQS Event to Lambda for belt-and-suspenders cleanup
──── Cron (every 5 minutes) ─────────────────────────────────────────
SELECT notes WHERE embedding_status IN ('pending', 'failed')
│
└─► Publish SQS Event for each
└─► Lambda processes → status transitions to 'completed'
Bulletproof Reliability
To track this asynchronous process, we introduced a simple state machine directly on the notes table. Notes cycle between pending, completed, and failed. If the Ollama service briefly goes offline or a Lambda timeout occurs, the system gracefully marks the note as failed and our cron job automatically retries it later.
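As a rough illustration, the transitions can be written as a small Go function. The status values are the ones stored in `embedding_status`; the `Event` names and the `next` helper are hypothetical, and the empty status stands for a note that has no row yet.

```go
package main

import "fmt"

// Status mirrors the values stored in notes.embedding_status.
type Status string

const (
	Pending   Status = "pending"
	Completed Status = "completed"
	Failed    Status = "failed"
)

// Event names are illustrative labels for what triggers a transition.
type Event string

const (
	NoteSaved     Event = "note_saved"
	LambdaSuccess Event = "lambda_success"
	LambdaFailure Event = "lambda_failure"
	CronRetry     Event = "cron_retry"
)

// next returns the status after an event. The boolean is false when the
// event is not a valid transition from the current status.
func next(s Status, e Event) (Status, bool) {
	switch {
	case e == NoteSaved:
		return Pending, true // any save or edit needs a fresh embedding
	case s == Pending && e == LambdaSuccess:
		return Completed, true
	case s == Pending && e == LambdaFailure:
		return Failed, true
	case s == Failed && e == CronRetry:
		return Pending, true
	}
	return s, false
}

func main() {
	s := Status("") // no row yet
	for _, e := range []Event{NoteSaved, LambdaFailure, CronRetry, LambdaSuccess} {
		var ok bool
		s, ok = next(s, e)
		fmt.Println(e, "->", s, ok)
	}
}
```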
┌──────────┐ note saved/updated ┌─────────┐
│ (no row) │ ─────────────────────► │ pending │
└──────────┘ └────┬────┘
│
┌────────────────┴────────────────┐
│ Lambda success │ Lambda failure
▼ ▼
┌───────────┐ ┌────────┐
│ completed │ │ failed │
└───────────┘ └───┬────┘
│
cron re-enqueues to SQS
│
▼
┌─────────┐
│ pending │
└─────────┘
This keeps notes from being silently dropped from the vector search pipeline: anything left pending or failed is re-enqueued within about five minutes, so your chat catches up with the most up-to-date context on your connections.