As your network grows, so does the volume of your meeting notes. As that dataset gets larger, finding a specific detail about a contact means either scrolling manually or relying on exact keyword matches.

To solve this, we are thrilled to announce a new natural language chat interface. Instead of searching for "startup", you can simply ask, "What did Andy say about his startup?" and receive a concise answer synthesized exclusively from your own notes, complete with links back to the original source.

Here is a look at the problem, the solution, and the technical architecture powering this new feature.

The Magic: Natural Language Memory Retrieval

The core goal of this feature is to close the semantic gap between conversational questions and your short, factual notes.

When you ask a question, the system does not just search for exact words. It rewrites your query into an optimized search, finds the most relevant notes, and synthesizes a conversational answer. The backend stays completely stateless: the frontend sends the conversation history with each request, so the backend can focus purely on fast, accurate retrieval.
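
To make the stateless contract concrete, here is a minimal Go sketch of the request and response shapes. The field names question, history, answer, source_note_ids, and role come from the early design below; the role/content shape of each history entry and the appendTurn helper are assumptions for illustration, not the production types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is one turn of conversation. The role/content shape is an
// assumption; the post only says that the client sends the history.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest carries everything the backend needs for a single answer.
type ChatRequest struct {
	Question string    `json:"question"`
	History  []Message `json:"history"`
}

// ChatResponse mirrors the fields named in the early design diagram.
type ChatResponse struct {
	Answer        string   `json:"answer"`
	SourceNoteIDs []string `json:"source_note_ids"`
	Role          string   `json:"role"`
}

// appendTurn (hypothetical helper) returns the history the client should
// send with its next request. The input slice is not modified, and the
// server stores nothing between calls.
func appendTurn(history []Message, question string, resp ChatResponse) []Message {
	next := make([]Message, 0, len(history)+2)
	next = append(next, history...)
	next = append(next,
		Message{Role: "user", Content: question},
		Message{Role: resp.Role, Content: resp.Answer},
	)
	return next
}

func main() {
	// Placeholder answer text; a real answer would come from the backend.
	resp := ChatResponse{Answer: "(answer text)", SourceNoteIDs: []string{"note-1"}, Role: "assistant"}
	history := appendTurn(nil, "What did Andy say about his startup?", resp)

	// The follow-up request carries the whole conversation so far.
	body, err := json.Marshal(ChatRequest{Question: "When did we last talk?", History: history})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because the client owns the history, any backend instance can serve any request without a session store.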

Early design

** THIS IS EARLY EXPERIMENTATION **
** PRODUCTION VERSION IS NOT THE SAME **

Client
  │
  │  POST /chat?personId=<uuid>
  │  { "question": "...", "history": [...] }
  │
  ▼
API Handler
  │
  ├─ 1. Authenticate & parse request
  │
  ├─ 2. Rewrite Query
  │       │
  │       └─► POST localhost:11434/api/chat  [llama3.2:3b] 
  │            prompt: "rewrite as 5-10 word search query"
  │            response: "Andy startup company product focus" 
  │
  ├─ 3. Embed Query
  │       │
  │       └─► POST localhost:11434/api/embed  [nomic-embed-text]
  │            response: [0.123, -0.456, ...]  (768 floats)
  │
  ├─ 4. Search Notes
  │       │
  │       └─► SELECT from pgvector DB 
  │            ORDER BY similarity LIMIT 5
  │            response: Top 5 relevant notes
  │
  ├─ 5. Build RAG Prompt
  │       │
  │       └─► messages = [ system prompt with notes, history, raw question ]
  │
  ├─ 6. Chat Completion
  │       │
  │       └─► POST localhost:11434/api/chat  [llama3.2:3b]
  │            response: "Based on your notes, Andy mentioned..."
  │
  └─ 7. Return ChatResponse
          { answer, source_note_ids, role: "assistant" }
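
Steps 4 and 5 of the flow above can be sketched in Go as follows. This is an illustration under stated assumptions, not the code in lib/rag.go: the column names, the person_id filter, the system prompt wording, and the helper names vectorLiteral and buildRAGMessages are all hypothetical. Only the note_embeddings table, the top-5 limit, and the message order (system prompt with notes, then history, then the raw question) come from the diagram.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Message is one chat turn in the role/content shape used by chat APIs.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Note is a retrieved note. The field names are illustrative.
type Note struct {
	ID   string
	Text string
}

// searchSQL is a hypothetical step-4 query. The column names and the
// person_id filter are assumptions. <=> is pgvector's cosine-distance
// operator, so ascending order returns the most similar notes first.
const searchSQL = `
SELECT n.id, n.content
FROM note_embeddings e
JOIN notes n ON n.id = e.note_id
WHERE n.person_id = $1
ORDER BY e.embedding <=> $2::vector
LIMIT 5`

// vectorLiteral formats an embedding in pgvector's text form, "[a,b,c]",
// so it can be bound as the $2 parameter above.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, f := range v {
		parts[i] = strconv.FormatFloat(float64(f), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

// buildRAGMessages assembles step 5: a system prompt containing the
// retrieved notes, then the client-supplied history, then the raw
// (not rewritten) question.
func buildRAGMessages(notes []Note, history []Message, question string) []Message {
	var sb strings.Builder
	sb.WriteString("Answer using only the notes below. If they do not contain the answer, say so.\n\n")
	for i, n := range notes {
		fmt.Fprintf(&sb, "[%d] (note %s) %s\n", i+1, n.ID, n.Text)
	}
	msgs := make([]Message, 0, len(history)+2)
	msgs = append(msgs, Message{Role: "system", Content: sb.String()})
	msgs = append(msgs, history...)
	msgs = append(msgs, Message{Role: "user", Content: question})
	return msgs
}

func main() {
	fmt.Println(vectorLiteral([]float32{0.5, -1}))
	msgs := buildRAGMessages([]Note{{ID: "note-1", Text: "(note text)"}}, nil, "What did Andy say about his startup?")
	for _, m := range msgs {
		fmt.Printf("%s: %s\n", m.Role, m.Content)
	}
}
```

The rewritten query is used only for retrieval; the model answers the question as the user phrased it.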

Under the Hood: Infrastructure & Architecture

To keep infrastructure complexity low while maintaining high performance, we integrated this feature directly into our existing stack.


┌──────────────────────────────────────────────────────────────────────┐
│                         EC2                                          │
│                                                                      │
│  ┌─────────────────────┐          ┌───────────────────────────────┐  │
│  │  Go API (port 8080) │          │  Ollama (port 11434)          │  │
│  │                     │          │                               │  │
│  │  handler/chat.go    │◄────────►│  nomic-embed-text (768 dims)  │  │
│  │  lib/rag.go         │localhost │  llama3.2:3b (synthesis)      │  │
│  │  lib/ollama.go      │          │                               │  │
│  └──────────┬──────────┘          └───────────────────────────────┘  │
│             │                                                        │
│             ▼                                                        │
│  ┌───────────────────────────────────┐                               │
│  │  PostgreSQL + pgvector            │                               │
│  │  - notes (+ embedding_status)     │                               │
│  │  - note_embeddings (vector(768))  │                               │
│  └───────────────────────────────────┘                               │
└──────────────────────────────────────────────────────────────────────┘
         │ SQS publish (async)
         ▼
┌─────────────────────┐
│   AWS SQS Queue     │
└──────────┬──────────┘
           │ triggers
           ▼
┌──────────────────────────────────────────────────────────────────────┐
│  AWS Lambda (Go, linux/arm64)                                        │
│  lambda/embedding-worker/                                            │
│                                                                      │
│  1. Consumes SQS messages (NoteEmbeddingEvent)                       │
│  2. Calls Ollama via EC2 private IP → nomic-embed-text               │
│  3. Writes vector to note_embeddings table                           │
│  4. Updates notes.embedding_status = 'completed' or 'failed'         │
└──────────────────────────────────────────────────────────────────────┘
           ▲
           │ re-enqueues pending/failed every 5 min
┌─────────────────────┐
│  Cron Job           │
│  retry_embeddings   │
└─────────────────────┘
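
Both the Go API and the Lambda worker call the same Ollama embedding endpoint. The sketch below shows roughly what that call could look like. It assumes Ollama's /api/embed request and response shape (a model and an input going in, an embeddings array coming back); the function names, the dims parameter, and the error handling are illustrative and are not taken from lib/ollama.go.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// embedRequest and embedResponse follow Ollama's /api/embed JSON shape.
type embedRequest struct {
	Model string `json:"model"`
	Input string `json:"input"`
}

type embedResponse struct {
	Embeddings [][]float32 `json:"embeddings"`
}

// parseEmbedResponse extracts the single vector and checks its length,
// so a model mix-up fails here rather than at the vector(768) column.
func parseEmbedResponse(body []byte, dims int) ([]float32, error) {
	var r embedResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("decode embed response: %w", err)
	}
	if len(r.Embeddings) != 1 {
		return nil, fmt.Errorf("expected 1 embedding, got %d", len(r.Embeddings))
	}
	if got := len(r.Embeddings[0]); got != dims {
		return nil, fmt.Errorf("expected %d dimensions, got %d", dims, got)
	}
	return r.Embeddings[0], nil
}

// embed posts text to Ollama. baseURL would be http://localhost:11434 on
// the EC2 host, or the instance's private IP when called from Lambda.
func embed(ctx context.Context, client *http.Client, baseURL, text string, dims int) ([]float32, error) {
	payload, err := json.Marshal(embedRequest{Model: "nomic-embed-text", Input: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/api/embed", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseEmbedResponse(body, dims)
}

func main() {
	// Offline demo with a shortened canned response (2 dims instead of 768),
	// so the example runs without an Ollama server.
	vec, err := parseEmbedResponse([]byte(`{"embeddings":[[0.123,-0.456]]}`), 2)
	fmt.Println(vec, err)
}
```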

The Asynchronous Ingestion Pipeline

When you type a note, it autosaves every few seconds. If we generated an embedding on every autosave, we would flood our queue and waste compute on drafts that are about to change.

Instead, we designed a smart ingestion flow. Autosaves quietly flag the note's embedding status as pending in the database. Only explicit user actions (like hitting a save button or deleting a note) trigger an immediate SQS event. A background cron job sweeps up any lingering pending notes every five minutes.
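
The routing rule in this paragraph can be sketched as a single Go function. NoteEmbeddingEvent is the event name used by the worker; its fields, the ChangeKind type, and the function-typed markPending and publish dependencies are assumptions made to keep the example self-contained.

```go
package main

import (
	"fmt"
	"log"
)

// ChangeKind distinguishes the three write paths described above.
type ChangeKind int

const (
	Autosave ChangeKind = iota
	ExplicitSave
	Delete
)

// NoteEmbeddingEvent is the SQS payload. The fields are illustrative.
type NoteEmbeddingEvent struct {
	NoteID  string
	Deleted bool
}

// onNoteChange decides whether a change only flags the note as pending or
// also publishes an event right away.
func onNoteChange(
	kind ChangeKind,
	noteID string,
	markPending func(noteID string) error,
	publish func(NoteEmbeddingEvent) error,
) error {
	switch kind {
	case Autosave:
		// No publish: the cron sweep picks up the final version later.
		return markPending(noteID)
	case ExplicitSave:
		if err := markPending(noteID); err != nil {
			return err
		}
		// Fire-and-forget: the note is already pending, so a failed
		// publish is logged and left for the cron sweep to retry.
		if err := publish(NoteEmbeddingEvent{NoteID: noteID}); err != nil {
			log.Printf("publish failed for note %s: %v", noteID, err)
		}
		return nil
	case Delete:
		// The row (and its vector, via CASCADE) is removed elsewhere;
		// the event only triggers the extra cleanup pass.
		if err := publish(NoteEmbeddingEvent{NoteID: noteID, Deleted: true}); err != nil {
			log.Printf("publish failed for deleted note %s: %v", noteID, err)
		}
		return nil
	default:
		return fmt.Errorf("unknown change kind %d", kind)
	}
}

func main() {
	markPending := func(id string) error { fmt.Println("pending:", id); return nil }
	publish := func(e NoteEmbeddingEvent) error { fmt.Printf("publish: %+v\n", e); return nil }
	for _, k := range []ChangeKind{Autosave, ExplicitSave, Delete} {
		if err := onNoteChange(k, "note-1", markPending, publish); err != nil {
			log.Fatal(err)
		}
	}
}
```

Writing the pending status before publishing is what makes fire-and-forget safe in this sketch: a lost message still leaves a row the sweep can find.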

──── AddNote / EditNote (explicit user save) ────────────────────────

User saves / edits a note
  │
  ▼
API Handler
  │
  ├─ SQL sets embedding_status = 'pending'
  │
  └─ Publish SQS Event (fire-and-forget)
          │
          ▼
     AWS SQS Queue
          │
          ▼
     Lambda (embedding-worker)
          │
          ├─ Calls EC2 private IP → Ollama nomic-embed-text
          │
          ├─ SUCCESS:
          │    INSERT/UPDATE note_embeddings
          │    UPDATE notes SET embedding_status = 'completed'
          │
          └─ FAILURE:
               UPDATE notes SET embedding_status = 'failed'
               SQS retries (up to 3 times) before Dead Letter Queue

──── AutosaveNote (fires every ~2s while user is typing) ────────────

AutosaveNote
  │
  ├─ SQL sets embedding_status = 'pending'
  │
  └─ (NO SQS publish to prevent queue flooding)
       Cron picks up the final version within 5 minutes

──── DeleteNote ─────────────────────────────────────────────────────

DeleteNote
  │
  ├─ Note row removed from DB (CASCADE handles vector deletion)
  │
  └─ Publish SQS Event to Lambda for belt-and-suspenders cleanup

──── Cron (every 5 minutes) ─────────────────────────────────────────

  SELECT notes WHERE embedding_status IN ('pending', 'failed')
   │
   └─► Publish SQS Event for each
        └─► Lambda processes → status transitions to 'completed'
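
The worker side of these flows might look like the Go sketch below. It replaces Postgres with an in-memory Store and takes the embedding call as a function so the example runs on its own; the Store type, the event fields, and processEvent are hypothetical names. Returning a non-nil error is what lets SQS redeliver the message and eventually route it to the dead letter queue.

```go
package main

import (
	"errors"
	"fmt"
)

// NoteEmbeddingEvent is the queue payload. Carrying the note text in the
// event is an assumption; the worker could equally load it from the DB.
type NoteEmbeddingEvent struct {
	NoteID string
	Text   string
}

// Store is an in-memory stand-in for the notes and note_embeddings tables.
type Store struct {
	Vectors map[string][]float32
	Status  map[string]string
}

// errOllamaDown simulates an unreachable embedding service in the demo.
var errOllamaDown = errors.New("ollama unreachable")

// processEvent embeds one note and records the outcome. On failure it
// marks the note as failed and returns the error so the message is retried.
func processEvent(ev NoteEmbeddingEvent, embed func(string) ([]float32, error), s *Store) error {
	vec, err := embed(ev.Text)
	if err != nil {
		s.Status[ev.NoteID] = "failed"
		return fmt.Errorf("embed note %s: %w", ev.NoteID, err)
	}
	s.Vectors[ev.NoteID] = vec // INSERT/UPDATE note_embeddings
	s.Status[ev.NoteID] = "completed"
	return nil
}

func main() {
	s := &Store{Vectors: map[string][]float32{}, Status: map[string]string{}}
	ok := func(string) ([]float32, error) { return []float32{0.1, 0.2}, nil }
	down := func(string) ([]float32, error) { return nil, errOllamaDown }

	fmt.Println(processEvent(NoteEmbeddingEvent{NoteID: "a", Text: "hello"}, ok, s), s.Status["a"])
	fmt.Println(processEvent(NoteEmbeddingEvent{NoteID: "b", Text: "hello"}, down, s), s.Status["b"])
}
```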

Bulletproof Reliability

To track this asynchronous process, we introduced a simple state machine directly on the notes table. Notes cycle between pending, completed, and failed. If the Ollama service briefly goes offline or a Lambda invocation fails, the note is marked as failed (or, if the worker times out before it can write a status, stays pending), and our cron job automatically re-enqueues it within five minutes.

  ┌──────────┐   note saved/updated   ┌─────────┐
  │ (no row) │ ─────────────────────► │ pending │
  └──────────┘                        └────┬────┘
                                           │
                          ┌────────────────┴────────────────┐
                          │ Lambda success                  │ Lambda failure
                          ▼                                 ▼
                    ┌───────────┐                     ┌────────┐
                    │ completed │                     │ failed │
                    └───────────┘                     └───┬────┘
                                                          │
                                          cron re-enqueues to SQS
                                                          │
                                                          ▼
                                                     ┌─────────┐
                                                     │ pending │
                                                     └─────────┘

Because every note that is not completed keeps getting re-enqueued until it succeeds, no note is silently dropped from the vector search pipeline, and your chat catches up with your latest notes within minutes.
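
The state diagram above translates into a small transition function. The Go sketch below is one possible encoding; the Event names and the next function are illustrative, and the rule that the cron job may also re-enqueue a note that is still pending follows from the cron query shown earlier.

```go
package main

import "fmt"

// Status is the value stored in notes.embedding_status.
type Status string

const (
	NoRow     Status = "" // note does not exist yet
	Pending   Status = "pending"
	Completed Status = "completed"
	Failed    Status = "failed"
)

// Event is something that can move a note between statuses.
type Event int

const (
	NoteSaved Event = iota
	EmbedSucceeded
	EmbedFailed
	CronRequeued
)

// next returns the status after applying an event, or an error if the
// diagram has no such transition.
func next(s Status, e Event) (Status, error) {
	switch {
	case e == NoteSaved:
		// Any save or edit resets the note to pending.
		return Pending, nil
	case s == Pending && e == EmbedSucceeded:
		return Completed, nil
	case s == Pending && e == EmbedFailed:
		return Failed, nil
	case (s == Failed || s == Pending) && e == CronRequeued:
		return Pending, nil
	}
	return s, fmt.Errorf("invalid transition from %q on event %d", s, e)
}

func main() {
	s := NoRow
	for _, e := range []Event{NoteSaved, EmbedFailed, CronRequeued, EmbedSucceeded} {
		var err error
		if s, err = next(s, e); err != nil {
			panic(err)
		}
		fmt.Println(s)
	}
}
```

Rejecting unknown transitions makes bugs such as a completed note being re-enqueued show up as errors instead of wasted embedding calls.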