Build AI Memory with Your Existing Database
You don't need Pinecone to give your AI app memory. If you're already on Postgres, Supabase, or Convex, you have everything required to build a working memory layer today — no new vendor, no new pricing tier, no new category to learn.
This is the DIY path: semantic search over a memories table you own and control. It won't replace a dedicated memory service for complex cases (temporal knowledge graphs, entity deduplication, multi-hop reasoning). But for the most common use case — remembering what users told you and surfacing the relevant context at the start of each session — it works, it's fast, and it costs almost nothing.
For "pgvector AI memory," "Convex vector search," or "Supabase AI memory," the rule is the same: build AI memory as a small database feature. The right DIY AI memory database is usually your existing stack.
The 4-Step Pattern
Every AI memory system, DIY or managed, reduces to four steps:
- Embed — convert a piece of text (a user message, an extracted fact) into a vector using an embedding model
- Store — write the vector + original text +
user_idto your database - Retrieve — at the start of each new session, embed the incoming query and find the k nearest stored vectors for that user
- Inject — prepend the retrieved memories to the system prompt before calling the LLM
That's it. The implementation differs by database, but the pattern is identical across all three stacks below.
Choosing an Embedding Model
Pick your embedding model before your stack. Changing dimensions later means re-embedding everything.
| Model | Dims | Cost / 1M tokens | MTEB Score | Best for |
|---|---|---|---|---|
text-embedding-3-small |
1536 | $0.02 | 62.3 | Most indie apps |
text-embedding-3-large |
3072 | $0.13 | 64.6 | When quality matters more than cost |
For most apps, text-embedding-3-small at 1536 dims is the right default. OpenAI lists it at $0.02 per 1M tokens, and 1536 dimensions fits comfortably in pgvector, Supabase, and Convex indexes (OpenAI API pricing).
Current pgvector is stronger on filtered memory search than older advice suggests. The official release notes list 0.8.2 as current; 0.8.0 added iterative scans and better cost estimation for filtered HNSW/IVFFlat queries (pgvector 0.8.2 release). Use HNSW for most memory search, and IVFFlat only when faster builds and lower memory matter more than recall.
Stack 1: Convex Vector Search
Convex ships vector search as a first-class feature — no extension to install, no index tuning, just a TypeScript schema declaration and a query in an action.
Schema
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";
export default defineSchema({
memories: defineTable({
userId: v.string(),
content: v.string(),
embedding: v.array(v.float64()),
}).vectorIndex("by_embedding", {
vectorField: "embedding",
dimensions: 1536,
filterFields: ["userId"],
}),
});
Convex supports dimensions from 2 to 4096, up to 4 vector indexes per table, and up to 256 results per query (Convex limits docs).
Store a memory
// convex/memories.ts
"use node";
import { action, internalMutation, internalQuery } from "./_generated/server";
import { v } from "convex/values";
import OpenAI from "openai";
import { internal } from "./_generated/api";
const openai = new OpenAI();
export const insertMemory = internalMutation({
args: {
userId: v.string(),
content: v.string(),
embedding: v.array(v.float64()),
},
handler: async (ctx, { userId, content, embedding }) => {
return await ctx.db.insert("memories", { userId, content, embedding });
},
});
export const getMemory = internalQuery({
args: { id: v.id("memories") },
handler: async (ctx, { id }) => {
return await ctx.db.get(id);
},
});
export const storeMemory = action({
args: { userId: v.string(), content: v.string() },
handler: async (ctx, { userId, content }) => {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: content,
});
const embedding = response.data[0].embedding;
await ctx.runMutation(internal.memories.insertMemory, {
userId,
content,
embedding,
});
},
});
Retrieve relevant memories
export const retrieveMemories = action({
args: { userId: v.string(), query: v.string() },
handler: async (ctx, { userId, query }) => {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: query,
});
const queryEmbedding = response.data[0].embedding;
const results = await ctx.vectorSearch("memories", "by_embedding", {
vector: queryEmbedding,
limit: 5,
filter: (q) => q.eq("userId", userId),
});
// Fetch full documents for the top matches
const memories = await Promise.all(
results.map(({ _id }) => ctx.runQuery(internal.memories.getMemory, { id: _id }))
);
return memories.filter(Boolean);
},
});
The filter on userId is handled by the index — no full table scan. Convex vector search returns _id and _score (cosine similarity from -1 to 1) (Convex vector search docs).
Stack 2: Supabase pgvector
Supabase ships pgvector by default. The pattern is SQL for schema, a Postgres function for the similarity query, and an Edge Function for embedding generation.
Schema
-- Enable pgvector (already available in Supabase)
create extension if not exists vector with schema extensions;
-- Memories table
create table memories (
id bigserial primary key,
user_id uuid references auth.users(id) not null,
content text not null,
embedding extensions.halfvec(1536) not null,
created_at timestamptz default now()
);
alter table memories enable row level security;
create policy "Users can manage own memories"
on memories
for all
using (auth.uid() = user_id)
with check (auth.uid() = user_id);
-- HNSW index for fast ANN search
create index on memories
using hnsw (embedding extensions.halfvec_cosine_ops)
with (m = 16, ef_construction = 64);
halfvec stores each dimension as 2 bytes instead of 4, halving storage cost with negligible quality loss. m = 16 and ef_construction = 64 are the pgvector defaults — they achieve >95% recall at most scales below 5M vectors (AWS pgvector indexing deep dive).
Similarity search function
create or replace function match_memories(
query_embedding extensions.halfvec(1536),
match_user_id uuid,
match_count int default 5
)
returns table (
id bigint,
content text,
similarity float
)
language sql stable
as $$
select
id,
content,
1 - (embedding <=> query_embedding) as similarity
from memories
where user_id = match_user_id
order by embedding <=> query_embedding
limit match_count;
$$;
Store and retrieve from a Next.js route
// app/api/memory/route.ts
import { createClient } from "@/utils/supabase/server";
import OpenAI from "openai";
const openai = new OpenAI();
async function embed(text: string) {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text,
});
return response.data[0].embedding;
}
// POST /api/memory — store a new memory
export async function POST(request: Request) {
const supabase = await createClient();
const { data: { user } } = await supabase.auth.getUser();
if (!user) {
return Response.json({ error: "Unauthorized" }, { status: 401 });
}
const { content } = await request.json();
if (typeof content !== "string" || content.trim().length === 0) {
return Response.json({ error: "Content is required" }, { status: 400 });
}
const embedding = await embed(content);
await supabase.from("memories").insert({
user_id: user.id,
content,
embedding: JSON.stringify(embedding),
});
return Response.json({ ok: true });
}
// GET /api/memory?q=... — retrieve relevant memories
export async function GET(request: Request) {
const supabase = await createClient();
const { data: { user } } = await supabase.auth.getUser();
if (!user) {
return Response.json({ error: "Unauthorized" }, { status: 401 });
}
const q = new URL(request.url).searchParams.get("q") ?? "";
const queryEmbedding = await embed(q);
const { data: memories } = await supabase.rpc("match_memories", {
query_embedding: JSON.stringify(queryEmbedding),
match_user_id: user.id,
match_count: 5,
});
return Response.json({ memories });
}
For high-write workloads, Supabase's automatic embeddings guide shows a queue-based pattern using pgmq + pg_cron + an Edge Function — decoupling the embedding API call from the database write.
Stack 3: Self-Hosted Postgres with Drizzle
If you're running Postgres yourself on Fly.io or Railway, you can use Drizzle's native pgvector support.
Schema
Add the extension in a migration before Drizzle creates the table:
CREATE EXTENSION IF NOT EXISTS vector;
// db/schema.ts
import { index, pgTable, bigserial, uuid, text, vector, timestamp } from "drizzle-orm/pg-core";
export const memories = pgTable(
"memories",
{
id: bigserial("id", { mode: "number" }).primaryKey(),
userId: uuid("user_id").notNull(),
content: text("content").notNull(),
embedding: vector("embedding", { dimensions: 1536 }),
createdAt: timestamp("created_at").defaultNow(),
},
(table) => [
index("memories_embedding_idx").using(
"hnsw",
table.embedding.op("vector_cosine_ops")
),
]
);
Similarity search
// lib/memory.ts
import { db } from "@/db";
import { memories } from "@/db/schema";
import { cosineDistance, desc, gt, sql, and, eq } from "drizzle-orm";
import OpenAI from "openai";
const openai = new OpenAI();
async function embed(text: string) {
const response = await openai.embeddings.create({
model: "text-embedding-3-small",
input: text,
});
return response.data[0].embedding;
}
export async function storeMemory(userId: string, content: string) {
const embedding = await embed(content);
await db.insert(memories).values({ userId, content, embedding });
}
export async function retrieveMemories(userId: string, query: string, k = 5) {
const queryEmbedding = await embed(query);
const similarity = sql<number>`1 - (${cosineDistance(memories.embedding, queryEmbedding)})`;
return db
.select({ content: memories.content, similarity })
.from(memories)
.where(and(eq(memories.userId, userId), gt(similarity, 0.5)))
.orderBy(desc(similarity))
.limit(k);
}
Drizzle's cosineDistance compiles to pgvector's <=> operator (Drizzle vector guide).
Fact Extraction: What to Actually Store
The biggest mistake with DIY memory: storing raw conversation turns verbatim. A user message like "can you help me fix this TypeScript error?" has near-zero memory value — it tells you nothing durable about the user.
What you want to store are extracted facts: durable, first-person statements about the user.
Here's a fact-extraction prompt you can ship:
You are a memory extraction system. Extract only atomic, verifiable facts about the user.
Rules:
- Output ONLY this JSON object: { "facts": string[] }.
- Include preferences, plans, relationships, expertise, and durable stated facts.
- Exclude questions, opinions, chit-chat, secrets, and one-off tasks.
- Write each fact as a short first-person statement: "I prefer...", "I am...", "I use...".
- If nothing is worth storing, return { "facts": [] }.
Conversation turn:
<turn>
{{message}}
</turn>
This pattern — LLM-driven fact extraction into atomic statements — is how Mem0 works at its core (Mem0 LangChain integration). The open-source library is worth reading even if you're not using the managed service: it shows how to handle deduplication, contradiction resolution, and fact eviction.
Call this extraction function after each user message, then store any returned facts as individual memories rows. Retrieve on the next session's first message.
Retrieval Injection: Building the System Prompt
Once you have retrieved memories, inject them at the top of the system prompt:
async function buildSystemPrompt(userId: string, userMessage: string): Promise<string> {
const memories = await retrieveMemories(userId, userMessage, 5);
const memoryBlock = memories.length > 0
? `## What I know about you\n${memories.map(m => `- ${m.content}`).join("\n")}`
: "";
// Cap memory context at ~500 tokens to protect context window
const cappedMemory = memoryBlock.slice(0, 2000);
return [
"You are a helpful assistant with memory of past conversations.",
cappedMemory,
"\nUse the above context to personalize your response. If you reference something from memory, do it naturally — don't announce it."
].filter(Boolean).join("\n\n");
}
Keep retrieved memory under ~500 tokens. At 5 memories × 100 tokens each, you're at 500 tokens — safe at any context window size. If you retrieve more (for richer personalization), summarize them before injection.
Cost Math
Embedding cost at the rates above (text-embedding-3-small, $0.02/M tokens):
- 1 message ≈ 100 tokens → $0.000002 per message
- 10 messages/session, 2 sessions/month → 20 messages → $0.00004/user/month
- At 1,000 MAU → $0.04/month in embedding costs
Infrastructure cost dominates, not embedding API cost:
| Stack | 1K MAU | 10K MAU | 100K MAU |
|---|---|---|---|
| Supabase pgvector (Pro) | ~$25-35/mo | ~$50-75/mo | ~$150/mo |
| Convex vector search (Professional) | $25/developer/mo | ~$50/mo + usage | ~$200/mo + usage |
| Fly.io Postgres | ~$10-40/mo | ~$40-90/mo | ~$150/mo + compute |
| Mem0 managed (for reference) | free or $19/mo | $249/mo | enterprise/custom |
| Zep Flex (for reference) | free or $25/mo | $25/mo + credits | Flex Plus ($475/mo) or enterprise |
At 1K–10K MAU, DIY beats managed services by 5–10x on cost. The gap narrows above 100K MAU when Postgres infrastructure starts scaling. Treat the table as an order-of-magnitude model and check official pages before budgeting (Supabase billing docs, Convex pricing, Fly Managed Postgres pricing, OpenAI pricing, Mem0 pricing, Zep pricing).
When DIY Breaks Down
pgvector works until it doesn't. Watch for these signals:
Above 5–10M vectors: HNSW index builds take hours and consume significant RAM. One production incident report describes queries degrading from sub-100ms to multi-second timeouts as the index stopped fitting in memory (The Case Against pgvector). Plan your index parameters or consider partitioning by user.
High write volume: Every vector insertion updates the HNSW graph under lock. If you're ingesting thousands of memories per minute, contention becomes a problem. The Supabase async queue pattern (pgmq + pg_cron) exists precisely to decouple write volume from index updates.
Filtered search at scale: Postgres's query planner wasn't built for "filter by user, then rank by vector." At small scale (thousands of users, each with hundreds of memories), the user filter is selective enough that this works fine. At millions of users or millions of memories per user, pre-filtering becomes expensive and post-filtering misses results. Dedicated vector databases handle this natively.
Multi-hop reasoning and temporal graphs: If you need to answer "has the user's goal changed over the last 6 months?" or "which entities are related to this topic the user mentioned in March?", you've outgrown a flat vector store. That's what Zep's temporal knowledge graph and Graphiti are built for.
The rule of thumb: DIY pgvector is the right choice up to ~1M total vectors per deployment, or roughly 10K users with 100 memories each. Beyond that, start benchmarking.
Summary
All three stacks implement the same 4-step pattern — embed, store, retrieve, inject — with different tradeoffs:
- Convex: TypeScript-native, zero config, production-ready out of the box. Best if you're already on Convex.
- Supabase pgvector: Full Postgres power, HNSW tuning, halfvec compression. Best for data-control and hybrid (vector + relational) queries.
- Self-hosted Drizzle + pgvector: Cheapest at scale, maximum flexibility. Best if you're already managing Postgres on Fly or Railway.
Start with whichever matches your existing stack. The extraction prompt and injection pattern are identical across all three.
Article 4 of 5 in the "Make Your AI App Remember" series.
- Article 1: Why Your AI App Forgets Everything
- Article 2: AI Memory Architectures: A Decision Framework
- Article 3: Add Persistent Memory to Any AI App with Mem0
- Article 5: Temporal Memory with Zep
Related: Vector Databases · Supabase AI Docs · Convex Vector Search
Primary sources: pgvector repo, pgvector 0.8.2 release, PostgreSQL pgvector 0.8.2 release note, Convex vector search docs, Convex limits, Convex pricing, Supabase pgvector docs, Supabase automatic embeddings, Supabase HNSW indexes, Supabase billing docs, OpenAI embeddings guide, OpenAI pricing, Drizzle pgvector guide, Fly Managed Postgres, Mem0 pricing, Zep pricing.