Approach
University content is hierarchical and deeply context-dependent — a CS algorithms question should never pull from a different faculty's material. The architectural move was three-level tenancy: every document carries institution/faculty/department tags, and the retriever filters on them before embedding search. That keeps answers relevant and makes the platform safe to deploy across multiple schools on one database.
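The tenancy-first ordering described above can be sketched in a few lines: documents carry the three scope tags, and retrieval hard-filters on the full scope before any similarity ranking. This is a minimal illustration, not the actual implementation; all names (`TenancyScope`, `retrieve`, the toy cosine ranking) are assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class TenancyScope:
    """Three-level tenancy tag carried by every document (assumed shape)."""
    institution: str
    faculty: str
    department: str

@dataclass
class Document:
    text: str
    scope: TenancyScope
    embedding: list[float] = field(default_factory=list)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(docs: list[Document], scope: TenancyScope,
             query_emb: list[float], k: int = 3) -> list[Document]:
    # Hard tenancy filter FIRST: a document outside the caller's
    # institution/faculty/department can never be a candidate.
    pool = [d for d in docs if d.scope == scope]
    # Only then rank the surviving pool by embedding similarity.
    pool.sort(key=lambda d: cosine(d.embedding, query_emb), reverse=True)
    return pool[:k]
```

Because the scope filter runs before ranking, a highly similar document from another faculty simply never enters the candidate set, which is what makes one shared database safe across schools.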
Problem
Students drown in PDFs, recorded lectures, and slide decks. Generic LLM chat has no idea what's actually on their syllabus, and generic RAG ignores the institutional context that makes an answer correct.
How I built it
- Three-level tenancy model (Institution → Faculty → Department) scoped into every retrieval query.
- Multi-modal ingestion: PDFs via parser + OCR, audio via faster-whisper, slides via structured extraction.
- Hybrid retrieval (BM25 + vector) with citation pointers back to the source document.
- Role-based access: students see their own department, admins see their own institution.
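One way the hybrid retrieval step can merge the BM25 and vector result lists is reciprocal rank fusion, a standard rank-combination technique. The source doesn't specify the fusion method, so this is a hedged sketch under that assumption; `reciprocal_rank_fusion` and the sample document IDs are illustrative names.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) per list it appears in,
    so documents ranked well by BOTH retrievers rise to the top.
    k=60 is the commonly used damping constant.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Illustrative: one list from BM25, one from the vector index.
bm25_hits = ["lecture-07.pdf", "syllabus.pdf", "slides-02.pdf"]
vector_hits = ["lecture-07.pdf", "transcript-03.txt", "syllabus.pdf"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Keeping the fused list as document IDs makes the citation step trivial: each ID already points back at the source document the answer quotes.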
Outcome
- A deployable SaaS shell for campus AI assistants.
- Answers grounded in real uploaded material, with citations back to the source to keep hallucination in check.
- Approaching public launch with a pilot institution.
Stack