RAG-on-Me

Outcome

Built a production-ready RAG system with persistent memory and turned my portfolio into an interactive AI assistant

Timeline

1 week (learning → deployment)

Role

AI Engineer & Backend Developer

Technologies

Python LangChain LangGraph FastAPI PostgreSQL pgvector OpenAI API

Project Overview

RAG-on-Me is a minimal Retrieval-Augmented Generation (RAG) system I built to demonstrate the core moving parts of a modern AI pipeline—without hiding behind heavy frameworks. The project answers questions about my experience, projects, and skills, grounded strictly in my documents (CV, case studies, READMEs).

Learning Exercise

Deeply understand RAG mechanics from scratch

Portfolio Showcase

Interactive resume that recruiters can query directly

Problem & Opportunity

RAG has become a standard design for grounding LLMs with external knowledge. Yet many examples are over-engineered or tied to a single library. I wanted to create something compact, transparent, and reusable that could serve as both a personal assistant and a code reference for others exploring RAG.

The Opportunity

A system that not only demonstrates technical skill but also functions as an interactive resume.

Scope & Features

Core Features

Markdown ingestion: Load structured documents like cv.md into pgvector store
Threaded chat: Each thread_id maintains conversation state
Postgres-backed checkpoints: Persistent chat memory
Minimal graph design: Clear retrieve → generate flow using LangGraph

API Endpoints

POST /initialize

Ingest documents into vector store

POST /chat

Query with conversation memory

GET /graph/state

Inspect graph state for debugging

Technical Architecture

Data Flow

Client → FastAPI → LangGraph → Vector Store + LLM → Postgres (checkpoints)

Stack Components

FastAPI for REST endpoints
LangGraph for pipeline orchestration
PostgreSQL + pgvector for storage
OpenAI API for embeddings & generation

Module Structure

main.py API lifecycle + graph compile
graph_runtime.py Retrieve → generate flow
nodes.py Retrieval & generation logic
adapters.py LLM, embeddings, vector store
ingest.py Markdown document indexing

Challenges & Trade-offs

Scope Control

Kept it under ~1K lines to ensure readability, at the cost of advanced features (e.g., reranking, hybrid search).

State Management

Designing checkpoints to reliably track multi-turn conversations required careful persistence logic.

Generality vs Personal Use

Balancing a system tailored to my CV with code flexible enough for others to adapt.

Outcomes & Learnings

Technical Achievements

Built a clean RAG service from scratch
Gained hands-on clarity of ingestion, retrieval, generation, and memory management
Integrated LangGraph pipelines with FastAPI

Validation Results

Successfully validated the system by querying my portfolio documents through conversation, proving both technical functionality and practical utility for recruiters.

Proudest Achievement

Turning my resume into an interactive chatbot that demonstrates RAG in a recruiter-friendly way.

Future Directions

Hybrid Search

Experiment with BM25 + vector search for better grounding

Guardrails

Handle hallucinations and off-topic queries

Multi-Document Ingestion

Expand to case studies, blog posts, academic notes

Reflection

RAG-on-Me taught me how to go beyond toy demos and build a real, production-style pipeline with persistence, modularity, and API design. It is also a personal branding tool: instead of reading my CV, recruiters can chat with it.

Advice for Others

"Strip down RAG to its essentials first. Understand the flow from request → retrieval → generation → memory. Once you've nailed that, scaling features becomes easier."