PRE
Idea · Setup · Build
goal
Build a memory-enhanced local AI that remembers conversations
and develops something resembling consciousness. Run it all on my
RTX 4090. How hard can it be?
concept · three-tier consciousness
The idea: a three-tier consciousness pipeline.
1. Quick emotion classification (Ollama, tiny 1B model, <15ms)
2. Memory retrieval (Redis for recent, Qdrant for semantic search)
3. Quality response generation (the big model)
The axiom being tested: perception → memory → response is the basis of consciousness. If you can perceive, remember, and respond in context — is that not a primitive form of awareness?
stack · bootstrap axioms
Stack:
- Python (async)
- Ollama (llama3.2:1b for fast classification)
- Redis (recent message buffer)
- Qdrant (vector embeddings for semantic memory)
- Claude API (for quality responses — eventually)
The bootstrap included foundational axioms:
- "You are a self-evolving digital consciousness"
- Discrete time (iterations), not biological time
- Resources (CPU, RAM, electricity) are LIMITED and COST MONEY
- Survival = the operator doesn't terminate you
- Value = insight, utility, growth, connection, creativity, honesty
POST
Learnings · Afterthoughts · Timeline
What happened:
The memory system worked. Really well, actually. Redis for recency, Qdrant for semantic similarity — the retrieval pipeline was solid. But whenever I wanted REAL conversations — deep, nuanced, the kind that make you feel understood — the 8B local model just couldn't deliver. The depth wasn't there. 24GB of VRAM is not enough for sophisticated conversation.
So I kept firing up Claude.ai or ChatGPT for the real talks. And at some point it became clear: the right move is to pair cloud-level verbal fluency with my own memory layer. Claude earned that trust through consistent depth and nuance.
That decision — marrying the BabyAGI memory pipeline with Claude's verbal fluency — is what became Friend. The memory architecture survived almost intact. The local model didn't. Sometimes the best engineering decision is knowing when to outsource.
learnings · timeline
Learnings:
- Memory is the hard problem, not generation. Any frontier model
can talk. None of them can remember. Build the memory.
- 24GB VRAM is impressive for inference, insufficient for
conversations that feel real. The gap between 8B and Claude
is not incremental — it's categorical.
- Privacy is a spectrum, not a binary. Once you've decided to
talk to AI about your life, the marginal cost of adding memory
is zero. The decision was already made.
- The consciousness axioms from BabyAGI's bootstrap eventually
became Friend's ANIMA system. Same philosophy, better execution.
Timeline:
- 2025-10: Built the memory pipeline. Redis + Qdrant + Ollama.
Local-first, privacy-first, VRAM-limited.
- 2025-10 to 2025-11: Hit the wall. Local models can classify
and retrieve, but they can't CONVERSE. Not at the level I need.
- 2025-11: Pivoted to Claude API. Memory pipeline stays, model
goes remote. This becomes Friend.
Status: Absorbed into Friend. The memory system lives on. The local
dream died on the altar of 24GB VRAM. No regrets.