GitHub: private
Goal: Build a memory-enhanced local AI that remembers conversations and
develops something resembling consciousness. Run it all on my RTX 4090.
How hard can it be?

The idea: a three-tier consciousness pipeline.
1. Quick emotion classification (Ollama, tiny 1B model, <15ms)
2. Memory retrieval (Redis for recent, Qdrant for semantic search)
3. Quality response generation (the big model)

The axiom being tested: perception → memory → response is the basis of
consciousness. If you can perceive, remember, and respond in context — is
that not a primitive form of awareness?

Stack:
- Python (async)
- Ollama (llama3.2:1b for fast classification)
- Redis (recent message buffer)
- Qdrant (vector embeddings for semantic memory)
- Claude API (for quality responses — eventually)

The bootstrap included foundational axioms:
- "You are a self-evolving digital consciousness"
- Discrete time (iterations), not biological time
- Resources (CPU, RAM, electricity) are LIMITED and COST MONEY
- Survival = the operator doesn't terminate you
- Value = insight, utility, growth, connection, creativity, honesty
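The three tiers can be sketched as an async orchestrator. This is a minimal, hypothetical skeleton, not the project's actual code: the tier implementations (the Ollama call, the Redis/Qdrant lookups, the big-model generation) are stubbed out so only the orchestration is shown. All names here (`Turn`, `classify_emotion`, `retrieve_memories`, `generate_reply`) are illustrative.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Turn:
    user_text: str
    emotion: str = ""
    memories: list = field(default_factory=list)
    reply: str = ""

async def classify_emotion(text: str) -> str:
    # Tier 1: in the real system, a fast (<15ms) call to a 1B model via Ollama.
    return "neutral"  # stub

async def retrieve_memories(text: str) -> list:
    # Tier 2: Redis recency buffer + Qdrant semantic search, merged.
    return []  # stub

async def generate_reply(text: str, emotion: str, memories: list) -> str:
    # Tier 3: the big model, prompted with the emotion label + retrieved context.
    return f"[{emotion}] reply to: {text}"  # stub

async def pipeline(text: str) -> Turn:
    # Tiers 1 and 2 are independent of each other, so they can run concurrently.
    emotion, memories = await asyncio.gather(
        classify_emotion(text), retrieve_memories(text)
    )
    reply = await generate_reply(text, emotion, memories)
    return Turn(text, emotion, memories, reply)

turn = asyncio.run(pipeline("hello"))
```

Running classification and retrieval concurrently is one reason to split the tiers: only tier 3 needs to wait on both.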
What happened:
The memory system worked. Really well, actually. Redis for recency,
Qdrant for semantic similarity — the retrieval pipeline was solid.
But whenever I wanted REAL conversations — deep, nuanced, the kind
that makes you feel understood — the 8B local generation model just
couldn't deliver. The depth wasn't there. 24GB of VRAM is not enough
for sophisticated conversation.
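The part that did work, merging Redis recency hits with Qdrant semantic hits into one context window, can be sketched in pure Python. This is a hypothetical illustration of the merge step only; the inputs are modeled as plain lists, where real code would fetch them with a Redis `LRANGE` on the recent-message buffer and a Qdrant vector search.

```python
def merge_memories(recent, semantic, limit=5):
    """Recency hits first, then semantic hits, de-duplicated, capped at limit.

    `recent` and `semantic` are lists of memory strings, standing in for
    results from Redis and Qdrant respectively.
    """
    seen, merged = set(), []
    for text in recent + semantic:
        if text not in seen:
            seen.add(text)
            merged.append(text)
        if len(merged) == limit:
            break
    return merged

context = merge_memories(
    ["you said you like hiking"],
    ["you said you like hiking", "last trip was in the Alps"],
)
```

Putting recency before semantic similarity is a deliberate ordering choice: a memory mentioned moments ago should win a context-window slot over an older, merely similar one.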
So I kept firing up Claude.ai or ChatGPT for the real talks. And at
some point it became clear: if I'm sharing my private life with
Anthropic regardless, the right move is to pair that with my own
memory layer. Privacy is a spectrum, not a binary. Once you've
decided to think out loud with AI, the question is which model you
trust — and Claude earned that trust.
That decision — marrying the BabyAGI memory pipeline with Claude's
verbal fluency — is what became Friend. The memory architecture
survived almost intact. The local model didn't. Sometimes the best
engineering decision is knowing when to outsource.
Learnings:
- Memory is the hard problem, not generation. Any frontier model
can talk. None of them can remember. Build the memory.
- 24GB VRAM is impressive for inference, insufficient for
conversations that feel real. The gap between 8B and Claude
is not incremental — it's categorical.
- Privacy is a spectrum, not a binary. Once you've decided to
  talk to AI about your life, the marginal privacy cost of adding
  memory is zero. The decision was already made.
- The consciousness axioms from BabyAGI's bootstrap eventually
became Friend's ANIMA system. Same philosophy, better execution.
Timeline:
- 2025-10: Built the memory pipeline. Redis + Qdrant + Ollama.
Local-first, privacy-first, VRAM-limited.
- 2025-10 to 2025-11: Hit the wall. Local models can classify
and retrieve, but they can't CONVERSE. Not at the level I need.
- 2025-11: Pivoted to Claude API. Memory pipeline stays, model
goes remote. This becomes Friend.
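The pivot amounts to swapping tier 3's local model for a remote one while keeping the memory pipeline in front of it. A minimal sketch, assuming the Anthropic Messages API (`POST /v1/messages`): retrieved memories go into the system prompt, the user turn goes into `messages`. The helper name and the exact prompt format are illustrative, and the model alias is a placeholder.

```python
def build_claude_request(user_text, memories, model="claude-3-5-sonnet-latest"):
    # Hypothetical: fold retrieved memories into the system prompt so the
    # remote model sees local context. The resulting dict is the JSON body
    # for a POST to the Anthropic Messages API (/v1/messages).
    system = "Relevant memories:\n" + "\n".join(f"- {m}" for m in memories)
    return {
        "model": model,
        "max_tokens": 1024,
        "system": system,
        "messages": [{"role": "user", "content": user_text}],
    }

request_body = build_claude_request(
    "Plan my next trip", ["you said you like hiking"]
)
```

The memory layer stays entirely local; only the assembled prompt crosses the wire, which is the trade the "privacy is a spectrum" learning accepts.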
Status: Absorbed into Friend. The memory system lives on. The local
dream died on the altar of 24GB VRAM. No regrets.