RAG Implementation, Neon PostgreSQL Migration & GenAI SDK Changes
Date: February 27, 2026
Scope: stemblockai-backend
Status: Architecture Review & Migration Planning
Table of Contents
- RAG Architecture Overview
- RAG Detailed Workflow
- RAG Implementation Details (File-by-File)
- Initial Data Ingestion Guide
- Evaluating RAG Results
- Neon PostgreSQL Migration
- Google GenAI SDK Changes
- Action Items Summary
1. RAG Architecture Overview
Pipeline Flow
┌──────────────────────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
│ │
│ Document Input (curriculum, rubrics, feedback corrections) │
│ │ │
│ ▼ │
│ IngestionService.ingestDocument() │
│ │ Chunks text: 1000 chars, 200 overlap │
│ │ Breaks at sentence/paragraph boundaries │
│ ▼ │
│ RagService.ingestDocuments() │
│ │ Batch size: 20 documents per cycle │
│ ▼ │
│ EmbeddingService.embedBatch() │
│ │ Model: gemini-embedding-001 │
│ │ Dimensions: 768 │
│ │ Task type: RETRIEVAL_DOCUMENT │
│ │ SDK: @google/genai (Vertex AI backend) │
│ ▼ │
│ VectorStoreService.addDocuments() │
│ │ Storage: PostgreSQL + pgvector extension │
│ │ Table: document_embeddings │
│ │ Index: HNSW (cosine distance, m=16, ef_construction=64) │
│ ▼ │
│ ✅ Stored in pgvector │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ RETRIEVAL PIPELINE │
│ │
│ User Query (e.g., evaluation rubric request) │
│ │ │
│ ▼ │
│ RagService.query() / RagService.retrieveContext() │
│ │ │
│ ▼ │
│ EmbeddingService.embedText(question) │
│ │ Same model: gemini-embedding-001, 768D │
│ │ Task type: RETRIEVAL_QUERY │
│ ▼ │
│ VectorStoreService.search(queryEmbedding, topK, filter) │
│ │ SQL: 1 - (embedding <=> query::vector) AS score │
│ │ Operator: <=> (cosine distance) │
│ │ Optional filter: WHERE category = ? │
│ │ LIMIT: topK (default 5 for query, 3 for retrieveContext) │
│ ▼ │
│ Context Assembly │
│ │ "[Source 1] content...\n\n[Source 2] content..." │
│ ▼ │
│ GenAIService.generateContent() │
│ │ Model: gemini-2.5-flash (default) │
│ │ System: STEM education assistant prompt │
│ │ User: Context + Question │
│ ▼ │
│ RagResponse { answer, sources[], model } │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ FEEDBACK & FINE-TUNING LOOP │
│ │
│ User Feedback (rating + correction) │
│ │ │
│ ▼ │
│ FeedbackLoopService.recordFeedback() │
│ │ Stores in ai_feedback table │
│ │ │
│ ├─── Rating ≥ 4 + has correction ──▶ Auto-ingest to RAG │
│ │ (immediate improvement via RagService.ingestDocument) │
│ │ │
│ ▼ │
│ checkTrainingReadiness() — 50+ examples threshold │
│ │ │
│ ▼ │
│ FineTuningService.submitTuningJob() │
│ │ Base: gemini-2.0-flash-001 │
│ │ API: client.tunings.tune() │
│ │ Tracks: fine_tuning_jobs table │
│ ▼ │
│ Tuned model available for future generations │
└──────────────────────────────────────────────────────────────────────┘
Where RAG Is Consumed
| Consumer | File | How RAG Is Used |
|---|---|---|
| STEM Evaluations | src/evaluations/providers/gemini-llm.provider.ts:78-89 | Retrieves rubric context before evaluation. Prepends Relevant Evaluation Guidelines: to the user prompt. Category filter: rubrics, topK: 3 |
| Assignment Creation | src/workflows/assignment-creation/ | Retrieves curriculum standards context when generating assignments |
| Learning Paths | src/workflows/learning-paths/ | Retrieves learning context for path generation |
| Feedback Loop | src/rag/feedback-loop.service.ts:60-79 | High-quality corrections (rating ≥ 4) auto-ingested into RAG for immediate improvement |
Database Tables (3 new tables)
| Table | Purpose | Key Columns |
|---|---|---|
| document_embeddings | Vector store for RAG documents | id, content, metadata (JSONB), category, embedding (vector(768)) |
| ai_feedback | User corrections for model improvement | userId, sessionType, originalPrompt, originalOutput, userCorrection, rating, usedForTraining |
| fine_tuning_jobs | Tracks Gemini fine-tuning jobs | geminiJobName, baseModel, tunedModelName, status, sampleCount |
2. RAG Detailed Workflow
2.1 What RAG Actually Does
RAG = "Before asking the AI a question, first search a knowledge base for relevant context and include it in the prompt."
Without RAG:
Prompt: "Evaluate this robotics submission"
→ Gemini answers based only on its training data (generic)
With RAG:
Step 1: Search your knowledge base for relevant rubrics
Step 2: Prompt: "Here are the evaluation guidelines: [retrieved rubrics]
Now evaluate this robotics submission"
→ Gemini answers with your specific rubric criteria (precise)
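The augmentation step can be sketched as a pure function. This is an illustrative helper, not the actual provider code; the real logic lives in `gemini-llm.provider.ts` and the `[Source N]` format comes from the retrieval pipeline above.

```typescript
// Hypothetical sketch of RAG prompt augmentation. Retrieved chunks are
// prepended so the model answers against your rubrics instead of only
// its generic training data.
interface RetrievedChunk {
  content: string;
  score: number;
}

function augmentPrompt(basePrompt: string, chunks: RetrievedChunk[]): string {
  // Graceful fallback: with an empty knowledge base, use the plain prompt.
  if (chunks.length === 0) return basePrompt;
  const context = chunks
    .map((c, i) => `[Source ${i + 1}] ${c.content}`)
    .join('\n\n');
  return `Relevant Evaluation Guidelines:\n${context}\n\n${basePrompt}`;
}
```

Note the empty-array fallback: the real provider likewise degrades to a plain prompt when retrieval fails or returns nothing.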
2.2 End-to-End: What Happens During an AI Evaluation
When a coach triggers POST /api/v1/evaluations/generate/:submissionId:
1. EVALUATION REQUEST
gemini-llm.provider.ts:59 → evaluateSubmission(request)
2. RAG CONTEXT RETRIEVAL (lines 78-89)
┌─────────────────────────────────────────────────────────────────┐
│ ragService.retrieveContext( │
│ "Robotics Challenge evaluation rubric", ← query │
│ "rubrics", ← category filter │
│ 3 ← top 3 results │
│ ) │
│ │
│ Inside retrieveContext (rag.service.ts:126): │
│ │
│ a) EMBED THE QUERY │
│ embeddingService.embedText("Robotics Challenge...") │
│ → Calls Google: gemini-embedding-001 (RETRIEVAL_QUERY) │
│ → Returns: [0.12, -0.45, 0.78, ...] (768 numbers) │
│ │
│ b) SEARCH PGVECTOR (cosine similarity) │
│ vectorStore.search(queryEmbedding, 3, {category: 'rubrics'}) │
│ → SQL: SELECT content, │
│ 1 - (embedding <=> query::vector) AS score │
│ FROM document_embeddings │
│ WHERE category = 'rubrics' │
│ ORDER BY embedding <=> query::vector │
│ LIMIT 3 │
│ → Returns top 3 most similar rubric chunks │
│ │
│ c) BUILD CONTEXT STRING │
│ "[Source 1] Evaluation Rubric: Robot Design..." │
│ "[Source 2] Code Quality Criteria: Score 5..." │
│ "[Source 3] Documentation Standards..." │
└─────────────────────────────────────────────────────────────────┘
3. AUGMENT THE PROMPT (line 85)
userPrompt = "Relevant Evaluation Guidelines:\n"
+ [the 3 retrieved rubric chunks]
+ "\n\n"
+ [the original evaluation prompt with student files]
4. GENERATE WITH CONTEXT CACHE (lines 97-106)
contextCache.generateWithCache({
model: "gemini-2.5-flash",
systemInstruction: [evaluation system prompt], ← cached (90% token savings)
contents: userPrompt, ← includes RAG context
responseMimeType: "application/json",
})
5. PARSE & RETURN
→ JSON: { overallScore, confidence, categories, summary, nextSteps }
2.3 End-to-End: Assignment Creation with RAG
When a coach triggers POST /api/v1/workflows/assignment-creation/generate:
assignment-creation.service.ts:618
ragService.retrieveContext(
"Quadratic Equations Project grade 9 STEM curriculum standards",
"curriculum-standards", ← searches the curriculum-standards category
3
)
→ Retrieves NGSS/Common Core standards for that topic and grade level
→ Prepends "Relevant Curriculum Standards:\n[context]" to the prompt
→ Gemini generates assignment aligned with real standards
2.4 End-to-End: Learning Path Generation with RAG
When a coach triggers POST /api/v1/workflows/learning-paths/generate:
learning-paths.service.ts:594
ragService.retrieveContext(
"Programming, Robotics STEM learning progression curriculum",
"curriculum-standards", ← same category as assignments
3
)
→ Retrieves learning progressions for the student's skill gaps
→ Prepends "Relevant Curriculum Standards:\n[context]" to the prompt
→ Gemini generates a personalized 8-12 week learning plan
2.5 How Documents Enter the Knowledge Base
Path A — Manual ingestion (curriculum standards, rubrics):
ingestionService.ingestCurriculumStandards(standards)
→ Chunks text (500 chars, 100 overlap for standards)
→ Each chunk gets ID: "standard-NGSS-MS-PS1-1-chunk-0"
→ ragService.ingestDocuments(chunks)
→ embeddingService.embedBatch(texts) ← vectors generated
→ vectorStore.addDocuments(docs) ← stored in PostgreSQL/pgvector
Path B — Automatic from user feedback (adaptive learning):
User gives rating ≥ 4 with a correction
→ feedbackLoopService.recordFeedback(input)
→ Saves to ai_feedback table
→ ragService.ingestDocument({
content: "Corrected AI Output for evaluation:
Original prompt: [truncated]
Corrected response: [user's correction]",
metadata: { category: "feedback-corrections" }
})
→ Now future queries can retrieve this correction as context
2.6 The Feedback → Fine-Tuning Loop
50+ feedback corrections accumulate
→ feedbackLoopService.triggerFineTuningIfReady()
→ fineTuningService.collectTrainingData()
→ Pulls unused corrections from ai_feedback table
→ fineTuningService.submitTuningJob(examples)
→ Calls Google: client.tunings.tune({
baseModel: "gemini-2.0-flash-001",
trainingDataset: { examples: [...] },
config: { epochCount: 5 }
})
→ Marks feedback as usedForTraining = true
→ Tracks job in fine_tuning_jobs table
→ When complete: tuned model available for future use
3. RAG Implementation Details
3.1 RagService — The Orchestrator
File: src/rag/rag.service.ts (162 lines)
This is the central coordinator. It exposes two retrieval modes:
// Mode 1: Full RAG — retrieves context AND generates an answer
async query(ragQuery: RagQuery): Promise<RagResponse>
// Mode 2: Context-only — retrieves context for other services to use
async retrieveContext(question, category?, topK?): Promise<{ context, sources }>
query() flow (lines 83-119):
- Embed the question → EmbeddingService.embedText()
- Search vector store → VectorStoreService.search() with optional category filter
- Build context string from top-K results: [Source 1] content...
- Generate answer → GenAIService.generateContent() with gemini-2.5-flash
- Return answer + sources (content truncated to 200 chars) + model name
retrieveContext() flow (lines 126-151):
- Same as query() steps 1-3, but skips generation
- Used by GeminiLLMProvider.evaluateSubmission() to augment evaluation prompts
Ingestion (lines 38-78):
- Single document: embed → store
- Batch: processes in groups of 20, uses embedBatch() for efficiency
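The embed → search → assemble → generate sequence that query() runs can be sketched with plain interfaces. All names here are hypothetical stand-ins for the NestJS providers; the sketch only shows the data flow, not the framework wiring.

```typescript
// Minimal sketch of the query() orchestration, with the three
// dependencies reduced to hypothetical interfaces.
interface Embedder { embed(text: string): Promise<number[]>; }
interface VectorStore {
  search(q: number[], topK: number, category?: string): Promise<{ content: string; score: number }[]>;
}
interface Generator { generate(prompt: string): Promise<string>; }

async function ragQuery(
  question: string,
  deps: { embedder: Embedder; store: VectorStore; generator: Generator },
  topK = 5,
  category?: string,
): Promise<{ answer: string; sources: { content: string; score: number }[] }> {
  const queryEmbedding = await deps.embedder.embed(question);              // 1. embed
  const sources = await deps.store.search(queryEmbedding, topK, category); // 2. search
  const context = sources                                                  // 3. assemble
    .map((s, i) => `[Source ${i + 1}] ${s.content}`)
    .join('\n\n');
  const answer = await deps.generator.generate(                            // 4. generate
    `${context}\n\nQuestion: ${question}`,
  );
  return { answer, sources };
}
```

retrieveContext() is the same function stopped after step 3.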
3.2 EmbeddingService — Vector Generation
File: src/rag/embedding.service.ts (58 lines)
// Uses Google GenAI SDK
private readonly embeddingModel = 'gemini-embedding-001';
// Single text (taskType: RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search)
async embedText(text: string, taskType: TaskType = 'RETRIEVAL_DOCUMENT'): Promise<EmbeddingResult>
// Batch texts
async embedBatch(texts: string[], taskType: TaskType = 'RETRIEVAL_DOCUMENT'): Promise<EmbeddingResult[]>
Key implementation detail (lines 18-24):
const response = await client.models.embedContent({
model: this.embeddingModel,
contents: text, // string or string[]
config: {
outputDimensionality: 768, // Google's recommended sweet spot (0.26% loss vs full 3,072D)
taskType, // RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search
},
});
- Model: gemini-embedding-001 (Google's latest embedding model, #1 on the MTEB leaderboard)
- Dimensions: 768 (configurable via outputDimensionality; sweet spot for quality/cost)
- Task types: RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search queries
- Both single and batch use the same embedContent API; batch passes string[]
3.3 VectorStoreService — pgvector Storage & Search
File: src/rag/vector-store.service.ts (119 lines)
This is the persistence layer. Uses raw SQL via Prisma's $executeRawUnsafe / $queryRawUnsafe because Prisma doesn't natively support the vector type.
Storing documents (lines 28-46):
INSERT INTO document_embeddings (id, content, metadata, category, embedding, created_at, updated_at)
VALUES ($1, $2, $3::jsonb, $4, $5::vector, NOW(), NOW())
ON CONFLICT (id) DO UPDATE SET
content = EXCLUDED.content, metadata = EXCLUDED.metadata,
category = EXCLUDED.category, embedding = EXCLUDED.embedding, updated_at = NOW()
- Upsert pattern: inserts or updates on ID conflict
- Embedding serialized as
[0.1,0.2,...]string, cast to::vector
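The serialization step can be sketched in a few lines. The helper name is hypothetical; pgvector accepts a `[x,y,z]` text literal that SQL then casts with `::vector`.

```typescript
// Sketch of the embedding → pgvector literal serialization described
// above (hypothetical helper name). The resulting string is passed as
// an ordinary query parameter and cast via `$n::vector` in the SQL.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}
```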
Similarity search (lines 55-98):
SELECT id, content, metadata, category,
1 - (embedding <=> $1::vector) AS score
FROM document_embeddings
WHERE category = $3 -- optional filter
ORDER BY embedding <=> $1::vector
LIMIT $2
- Operator <=> = cosine distance (pgvector)
- Score = 1 - cosine_distance (higher = more similar)
- HNSW index used for approximate nearest-neighbor search
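Since pgvector's `<=>` returns 1 minus cosine similarity, the score the query computes is exactly the cosine similarity of the two vectors. A reference implementation for sanity-checking scores offline:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). pgvector's <=> operator
// returns 1 - this value, and the SELECT flips it back, so the `score`
// column equals this function applied to the stored and query vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```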
Other operations:
Other operations:
- deleteDocument(id) — DELETE by ID
- clear() — TRUNCATE table
- getDocumentCount() — SELECT COUNT(*)
3.4 IngestionService — Document Chunking
File: src/rag/ingestion.service.ts (182 lines)
Handles pre-processing before documents enter the RAG pipeline.
Chunking strategy (lines 148-181):
Default: chunkSize=1000, chunkOverlap=200
Curriculum standards: chunkSize=500, chunkOverlap=100
Rubrics: chunkSize=800, chunkOverlap=150
- Tries to break at natural boundaries, in priority order: \n\n → .\n → . → \n
- Only breaks at a boundary if it falls past 50% of the chunk size
- Each chunk gets metadata: sourceId, title, category, chunkIndex, totalChunks
Specialized ingestion methods:
- ingestCurriculumStandards() — parses framework/gradeLevel/standards structure
- ingestRubrics() — parses rubric criteria with score level descriptions
3.5 FeedbackLoopService — Adaptive Learning
File: src/rag/feedback-loop.service.ts (204 lines)
Two-path improvement system:
Path 1 — Immediate RAG improvement (lines 57-79): When a user provides a correction with rating ≥ 4:
await this.ragService.ingestDocument({
id: `feedback-${feedback.id}`,
content: `Corrected AI Output for ${input.sessionType}:
Original prompt: ${input.originalPrompt.substring(0, 500)}
Corrected response: ${input.userCorrection}`,
metadata: { category: 'feedback-corrections', ... },
});
→ Future queries can retrieve these corrections as context
Path 2 — Fine-tuning trigger (lines 113-157):
- Threshold: 50+ unused training examples
- Calls FineTuningService.collectTrainingData() → submitTuningJob()
- Marks feedback as usedForTraining = true after submission
3.6 FineTuningService — Model Customization
File: src/rag/fine-tuning.service.ts (275 lines)
Manages Gemini supervised fine-tuning pipeline.
Job submission (lines 69-148):
const tuningJob = await client.tunings.tune({
baseModel, // default: 'gemini-2.0-flash-001'
trainingDataset: {
examples: trainingExamples.map(e => ({
textInput: e.textInput, // original prompt
output: e.output, // corrected response
})),
},
config: {
epochCount: config?.epochCount ?? 5,
learningRateMultiplier: config?.learningRateMultiplier ?? 1.0,
},
});
Job status polling (lines 153-209):
- Checks client.tunings.get({ name: job.geminiJobName })
- Maps Gemini states: JOB_STATE_SUCCEEDED → COMPLETED, JOB_STATE_FAILED → FAILED
- Updates local DB record with status and tuned model name
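The state mapping can be sketched as a small lookup. Only the SUCCEEDED/FAILED mappings are stated in this document; treating every other state as still running is an assumption about how the service handles intermediate states.

```typescript
// Sketch of the Gemini job-state → local status mapping. The handling
// of states other than SUCCEEDED/FAILED is an assumption; the exact set
// covered in fine-tuning.service.ts may differ.
type LocalStatus = 'RUNNING' | 'COMPLETED' | 'FAILED';

function mapTuningState(geminiState: string): LocalStatus {
  switch (geminiState) {
    case 'JOB_STATE_SUCCEEDED':
      return 'COMPLETED';
    case 'JOB_STATE_FAILED':
      return 'FAILED';
    default:
      return 'RUNNING'; // assumed: pending/running states keep polling
  }
}
```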
Cost: ~$3/1M training tokens (Gemini 2.0 Flash base)
3.7 RagModule — NestJS Wiring
File: src/rag/rag.module.ts (29 lines)
@Module({
imports: [DatabaseModule],
providers: [
EmbeddingService, VectorStoreService, RagService,
IngestionService, FineTuningService, FeedbackLoopService,
],
exports: [
RagService, EmbeddingService, VectorStoreService,
IngestionService, FineTuningService, FeedbackLoopService,
],
})
export class RagModule {}
Note: GenAIService is not imported here — it's provided by the SharedModule which is @Global(). RagModule imports DatabaseModule for PrismaService.
3.8 Database Migration
File: prisma/migrations/20260227000000_add_rag_pgvector_system/migration.sql (65 lines)
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- 3 tables: document_embeddings, ai_feedback, fine_tuning_jobs
-- Key index: HNSW for fast approximate nearest-neighbor (upgraded from IVFFlat)
CREATE INDEX "document_embeddings_embedding_idx" ON "document_embeddings"
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
Prisma schema note (schema.prisma:963):
embedding Unsupported("vector(768)")
Prisma uses Unsupported() for pgvector types, which is why all vector operations use raw SQL.
4. Initial Data Ingestion Guide
The RAG system is only useful if it has documents to search. This section explains how to populate the knowledge base for the first time.
4.1 What to Ingest
The RAG system uses categories to filter searches. Each consumer searches a specific category:
| Category | Used By | What to Ingest |
|---|---|---|
| rubrics | STEM Evaluations (gemini-llm.provider.ts) | Evaluation rubrics with score criteria |
| curriculum-standards | Assignment Creation, Learning Paths | NGSS, Common Core, state standards |
| feedback-corrections | Automatic (feedback loop) | User corrections — auto-ingested, no manual action needed |
4.2 Create a Seeding Script
Create src/scripts/seed-rag.ts following the pattern of the existing src/scripts/seed-users.ts:
import { NestFactory } from '@nestjs/core';
import { AppModule } from '../app.module';
import { IngestionService } from '../rag/ingestion.service';
import { RagService } from '../rag/rag.service';
async function seedRag() {
const app = await NestFactory.createApplicationContext(AppModule);
const ingestion = app.get(IngestionService);
const rag = app.get(RagService);
console.log('🌱 Starting RAG knowledge base seed...\n');
// ── Step 1: Ingest Evaluation Rubrics ──────────────────────────
console.log('📋 Ingesting evaluation rubrics...');
const rubricCount = await ingestion.ingestRubrics([
{
name: 'Robot Design Rubric',
assignmentType: 'DESIGN',
criteria: [
{
name: 'Mechanical Structure',
levels: [
{ score: 5, description: 'Innovative design with excellent structural integrity, creative use of components, and optimal weight distribution' },
{ score: 4, description: 'Solid structure with good component usage and functional design' },
{ score: 3, description: 'Adequate structure that functions but has room for optimization' },
{ score: 2, description: 'Basic structure with stability issues or poor component choices' },
{ score: 1, description: 'Incomplete or non-functional mechanical design' },
],
},
{
name: 'Functionality',
levels: [
{ score: 5, description: 'Robot completes all tasks efficiently with consistent performance' },
{ score: 4, description: 'Robot completes most tasks with good reliability' },
{ score: 3, description: 'Robot completes some tasks but with inconsistencies' },
{ score: 2, description: 'Robot has limited functionality or frequent failures' },
{ score: 1, description: 'Robot does not function as intended' },
],
},
{
name: 'Creativity and Innovation',
levels: [
{ score: 5, description: 'Highly original approach with novel solutions to design challenges' },
{ score: 4, description: 'Creative elements with some original thinking' },
{ score: 3, description: 'Standard design approach with minor creative touches' },
{ score: 2, description: 'Mostly follows examples with little original thought' },
{ score: 1, description: 'No creative effort, direct copy of examples' },
],
},
],
},
{
name: 'Code Quality Rubric',
assignmentType: 'CODE',
criteria: [
{
name: 'Logic and Efficiency',
levels: [
{ score: 5, description: 'Elegant, efficient algorithms with optimal time/space complexity' },
{ score: 4, description: 'Well-structured logic with good efficiency' },
{ score: 3, description: 'Functional logic but with unnecessary redundancy' },
{ score: 2, description: 'Logic works partially or has significant inefficiency' },
{ score: 1, description: 'Broken logic or code that does not compile/run' },
],
},
{
name: 'Code Organization',
levels: [
{ score: 5, description: 'Clean separation of concerns, meaningful naming, consistent style throughout' },
{ score: 4, description: 'Good organization with clear naming conventions' },
{ score: 3, description: 'Adequate organization but inconsistent in places' },
{ score: 2, description: 'Poor organization with confusing variable names' },
{ score: 1, description: 'No discernible organization, unreadable code' },
],
},
{
name: 'Comments and Documentation',
levels: [
{ score: 5, description: 'Comprehensive comments explaining why, clear function documentation' },
{ score: 4, description: 'Good comments on complex sections' },
{ score: 3, description: 'Some comments but missing on key logic' },
{ score: 2, description: 'Minimal or unhelpful comments' },
{ score: 1, description: 'No comments at all' },
],
},
],
},
{
name: 'Documentation Rubric',
assignmentType: 'NOTEBOOK',
criteria: [
{
name: 'Process Documentation',
levels: [
{ score: 5, description: 'Detailed chronicle of design process including failures, iterations, and reasoning' },
{ score: 4, description: 'Good documentation of process with clear progression' },
{ score: 3, description: 'Basic documentation that covers main steps' },
{ score: 2, description: 'Incomplete documentation with major gaps' },
{ score: 1, description: 'Little to no process documentation' },
],
},
{
name: 'Clarity and Organization',
levels: [
{ score: 5, description: 'Exceptionally clear writing with logical flow, headings, and visual aids' },
{ score: 4, description: 'Well-organized with clear sections and good readability' },
{ score: 3, description: 'Readable but could benefit from better structure' },
{ score: 2, description: 'Difficult to follow or poorly organized' },
{ score: 1, description: 'Incomprehensible or no organization' },
],
},
],
},
{
name: 'Technical Writing Rubric',
assignmentType: 'ESSAY',
criteria: [
{
name: 'Technical Accuracy',
levels: [
{ score: 5, description: 'All technical claims are accurate and well-supported with evidence' },
{ score: 4, description: 'Mostly accurate with minor oversimplifications' },
{ score: 3, description: 'Generally accurate but with some errors or vague claims' },
{ score: 2, description: 'Multiple technical inaccuracies' },
{ score: 1, description: 'Fundamentally incorrect technical understanding' },
],
},
{
name: 'Writing Quality',
levels: [
{ score: 5, description: 'Excellent grammar, vocabulary, and sentence structure appropriate for age level' },
{ score: 4, description: 'Good writing with minor errors' },
{ score: 3, description: 'Adequate writing but with noticeable errors' },
{ score: 2, description: 'Frequent grammatical errors affecting readability' },
{ score: 1, description: 'Very poor writing quality' },
],
},
],
},
]);
console.log(`✅ Rubrics ingested: ${rubricCount} chunks\n`);
// ── Step 2: Ingest Curriculum Standards ─────────────────────────
console.log('📚 Ingesting curriculum standards...');
const ngssCount = await ingestion.ingestCurriculumStandards({
framework: 'NGSS',
gradeLevel: 'Middle School (6-8)',
standards: [
{
id: 'MS-ETS1-1',
title: 'Define Design Problems',
description: 'Define the criteria and constraints of a design problem with sufficient precision to ensure a successful solution, taking into account relevant scientific principles and potential impacts on people and the natural environment.',
},
{
id: 'MS-ETS1-2',
title: 'Evaluate Competing Solutions',
description: 'Evaluate competing design solutions using a systematic process to determine how well they meet the criteria and constraints of the problem.',
},
{
id: 'MS-ETS1-3',
title: 'Analyze Data from Tests',
description: 'Analyze data from tests to determine similarities and differences among several design solutions to identify the best characteristics of each that can be combined into a new solution.',
},
{
id: 'MS-ETS1-4',
title: 'Develop and Iterate Models',
description: 'Develop a model to generate data for iterative testing and modification of a proposed object, tool, or process such that an optimal design can be achieved.',
},
{
id: 'MS-PS2-1',
title: 'Forces and Motion',
description: 'Apply Newton\'s Third Law to design a solution to a problem involving the motion of two colliding objects. Emphasis is on the change in motion and forces during collision.',
},
{
id: 'MS-PS2-2',
title: 'Plan Investigation on Forces',
description: 'Plan an investigation to provide evidence that the change in an object\'s motion depends on the sum of the forces acting on the object and the mass of the object.',
},
],
});
const csMathCount = await ingestion.ingestCurriculumStandards({
framework: 'Common Core Math',
gradeLevel: 'Middle School (6-8)',
standards: [
{
id: '6.RP.3',
title: 'Ratios and Proportional Relationships',
description: 'Use ratio and rate reasoning to solve real-world and mathematical problems, including those involving unit rates, percentages, and proportional relationships in robotics contexts like gear ratios and speed calculations.',
},
{
id: '7.G.6',
title: 'Geometry - Area and Volume',
description: 'Solve real-world and mathematical problems involving area, volume, and surface area of two- and three-dimensional objects. Applied to robot chassis design, workspace planning, and component fitting.',
},
{
id: '8.F.4',
title: 'Functions - Model Relationships',
description: 'Construct a function to model a linear relationship between two quantities. Applied to sensor calibration, motor speed curves, and PID control in robotics.',
},
{
id: '8.EE.7',
title: 'Expressions and Equations',
description: 'Solve linear equations in one variable, including those with rational number coefficients. Applied to calculating distances, speeds, and timing in robot navigation.',
},
],
});
console.log(`✅ NGSS standards ingested: ${ngssCount} chunks`);
console.log(`✅ Common Core Math standards ingested: ${csMathCount} chunks\n`);
// ── Step 3: Verify ─────────────────────────────────────────────
const stats = await rag.getStats();
console.log(`📊 RAG Knowledge Base Stats:`);
console.log(` Total documents: ${stats.documentCount}`);
console.log('\n🎉 RAG seed completed successfully!');
await app.close();
}
seedRag()
.then(() => process.exit(0))
.catch((error) => {
console.error('❌ Error seeding RAG:', error);
process.exit(1);
});
4.3 Run the Seeding Script
cd stemblockai-backend
# Ensure GenAI is configured (embeddings require Google API)
export LLM_PROVIDER=gemini
export GOOGLE_CLOUD_PROJECT=your-project-id
# ... other GenAI env vars
# Run the seed
npx ts-node src/scripts/seed-rag.ts
Expected output:
🌱 Starting RAG knowledge base seed...
📋 Ingesting evaluation rubrics...
✅ Rubrics ingested: 12 chunks
📚 Ingesting curriculum standards...
✅ NGSS standards ingested: 6 chunks
✅ Common Core Math standards ingested: 4 chunks
📊 RAG Knowledge Base Stats:
Total documents: 22
🎉 RAG seed completed successfully!
4.4 Add to package.json
{
"scripts": {
"seed:users": "ts-node src/scripts/seed-users.ts",
"seed:rag": "ts-node src/scripts/seed-rag.ts",
"seed:all": "npm run seed:users && npm run seed:rag"
}
}
4.5 Verify Data in Database
-- Check document counts by category
SELECT category, COUNT(*) as count
FROM document_embeddings
GROUP BY category
ORDER BY count DESC;
-- Expected output:
-- rubrics | 12
-- curriculum-standards | 10
4.6 Ingestion for Additional Content
You can ingest additional content at any time using the IngestionService methods:
Custom documents (any category):
await ingestionService.ingestDocument({
id: 'custom-doc-001',
title: 'VEX Robotics Competition Rules 2026',
content: '... full text ...',
category: 'competition-rules',
source: 'vex-robotics.com',
});
Additional standards (e.g., state-specific):
await ingestionService.ingestCurriculumStandards({
framework: 'Texas TEKS',
gradeLevel: 'Grade 8',
standards: [
{ id: 'TEKS-8.6A', title: '...', description: '...' },
],
});
4.7 Chunking Behavior
Understanding how documents get split is important for quality:
| Content Type | Chunk Size | Overlap | Why |
|---|---|---|---|
| General documents | 1000 chars | 200 | Standard — balances context window usage and retrieval precision |
| Curriculum standards | 500 chars | 100 | Shorter — each standard is a self-contained concept |
| Rubrics | 800 chars | 150 | Medium — criteria need enough context to be useful |
Overlap means consecutive chunks share text at their boundaries. This prevents a relevant sentence from being split across two chunks where neither chunk alone has enough context.
Example with chunkSize=20, overlap=5:
Original: "The robot must navigate the obstacle course within 60 seconds"
Chunk 1 (chars 0-19): "The robot must navig"
Chunk 2 (chars 15-34): "navigate the obstacl" ← starts 5 chars back (the overlap)
Chunk 3 (chars 30-49): "stacle course within"
Chunk 4 (chars 45-60): "ithin 60 seconds"
(Fixed-size windows shown for clarity; the real chunker also snaps to sentence/paragraph boundaries.)
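The sliding-window behaviour can be sketched directly. This deliberately omits the boundary snapping that IngestionService adds; each step advances by chunkSize minus overlap.

```typescript
// Fixed-size sliding-window chunking sketch (boundary snapping omitted;
// the real IngestionService also prefers paragraph/sentence breaks).
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
    start += chunkSize - overlap; // step back by `overlap` chars
  }
  return chunks;
}
```

With chunkSize=20 and overlap=5, each chunk begins 5 characters before the previous one ended, so no sentence fragment is stranded at a boundary without context.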
5. Evaluating RAG Results
5.1 Quick Smoke Test
After ingesting data, verify the RAG pipeline end-to-end:
// In a NestJS test or script
const ragService = app.get(RagService);
// Test 1: Query the rubrics category
const rubricResult = await ragService.query({
question: 'What are the criteria for evaluating robot design?',
category: 'rubrics',
topK: 3,
});
console.log('Answer:', rubricResult.answer);
console.log('Sources:', rubricResult.sources.length);
console.log('Top score:', rubricResult.sources[0]?.score);
// Test 2: Query curriculum standards
const curriculumResult = await ragService.query({
question: 'What NGSS standards apply to engineering design?',
category: 'curriculum-standards',
topK: 3,
});
console.log('Answer:', curriculumResult.answer);
console.log('Sources:', curriculumResult.sources.length);
5.2 What Good Results Look Like
Similarity scores (score = 1 - cosine distance; for text embeddings typically 0.0 to 1.0):
| Score Range | Meaning | Action |
|---|---|---|
| 0.85 - 1.0 | Excellent match | Content is highly relevant |
| 0.70 - 0.85 | Good match | Content is relevant, usable as context |
| 0.50 - 0.70 | Weak match | Content is tangentially related, may add noise |
| Below 0.50 | Poor match | Content is not relevant, should not be used |
For your system: Scores above 0.70 are typical for rubric/standard retrieval. If top scores are consistently below 0.60, the ingested content may not match the queries well — check chunking or add more specific content.
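One way to enforce these thresholds at retrieval time is to drop weak matches before prompt assembly. The current code does not do this (it keeps all topK results); the helper below is a hypothetical sketch of that policy.

```typescript
// Hypothetical post-retrieval filter applying the 0.70 "good match"
// threshold from the table above, so tangential chunks don't add noise.
interface Source { content: string; score: number; }

function usableSources(sources: Source[], minScore = 0.7): Source[] {
  return sources.filter((s) => s.score >= minScore);
}
```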
5.3 Evaluate RAG Impact on Evaluations
Compare evaluation quality with and without RAG:
# Step 1: Run evaluation WITHOUT RAG (empty knowledge base)
# The system gracefully handles this — gemini-llm.provider.ts:87 catches errors
# Step 2: Ingest rubrics
npm run seed:rag
# Step 3: Run the SAME evaluation again WITH RAG
# Step 4: Compare the two outputs:
# - Are category scores more consistent with rubric criteria?
# - Does feedback reference specific rubric language?
# - Are improvements more actionable?
5.4 Monitoring in Production
Key metrics to track:
| Metric | How to Measure | Target |
|---|---|---|
| RAG retrieval latency | Time from query to context returned | < 500ms |
| Top-1 similarity score | sources[0].score from retrieval | > 0.70 average |
| Context utilization | Does the AI response reference retrieved sources? | Qualitative review |
| Feedback loop growth | SELECT COUNT(*) FROM ai_feedback | Steady growth |
| Knowledge base size | ragService.getStats().documentCount | Growing via feedback |
Check via SQL:
-- Knowledge base growth over time
SELECT DATE(created_at) as date, COUNT(*) as new_docs
FROM document_embeddings
GROUP BY DATE(created_at)
ORDER BY date DESC
LIMIT 30;
-- Feedback corrections ingested to RAG
SELECT COUNT(*) as feedback_in_rag
FROM document_embeddings
WHERE category = 'feedback-corrections';
-- Feedback waiting for fine-tuning
SELECT COUNT(*) as unused_feedback
FROM ai_feedback
WHERE used_for_training = false AND user_correction IS NOT NULL;
5.5 Debugging Poor Results
Problem: AI evaluation doesn't use rubric criteria
- Check if documents exist: SELECT COUNT(*) FROM document_embeddings WHERE category = 'rubrics';
- Check retrieval scores — run a test query and inspect sources[].score
- If scores are low, the query terms may not match the ingested content. Consider:
  - Adding more rubric variations
  - Making rubric descriptions more detailed
  - Adjusting topK (try 5 instead of 3)
Problem: Wrong category documents retrieved
- Verify the category filter is working: SELECT category, COUNT(*) FROM document_embeddings GROUP BY category;
- Check that ingestion uses the correct category strings (they must exactly match the filter in the consumer code)
Problem: Retrieval is slow (>2s)
- Verify the HNSW index exists: SELECT indexdef FROM pg_indexes WHERE indexname = 'document_embeddings_embedding_idx';
- If missing, create it (see migration 20260227100000_upgrade_ivfflat_to_hnsw_index)
- If on Neon, check whether compute was cold-started (the first query after idle is slow)
5.6 RAG Quality Improvement Cycle
Week 1: Seed initial rubrics + standards
→ Run evaluations, collect baseline quality
Week 2: Coaches review AI evaluations, provide corrections
→ High-quality corrections auto-ingest to RAG (rating ≥ 4)
→ RAG results improve immediately
Week 4: 50+ corrections accumulated
→ Fine-tuning triggered automatically
→ Tuned model available for future evaluations
Ongoing: More corrections → better RAG context → better evaluations
→ fewer corrections needed → system stabilizes
6. Neon PostgreSQL Migration
The existing NEON_MIGRATION_GUIDE.md covers the general migration steps. Below are RAG-specific concerns that require additional attention when migrating to Neon.
6.1 pgvector Support on Neon
Neon has native pgvector support — no extra setup needed. However:
| Concern | Current (DO/Self-hosted) | Neon | Action Needed |
|---|---|---|---|
| pgvector extension | Manually installed | Pre-installed | None — CREATE EXTENSION IF NOT EXISTS vector still works |
| pgvector version | Depends on PG version | Latest (0.7+) | Verify SELECT extversion FROM pg_extension WHERE extname = 'vector' |
| IVFFlat index | Works | Works | None — supported natively |
| HNSW index | May not be available | Supported | Consider upgrading from IVFFlat to HNSW for better recall (see 3.3) |
| Max dimensions (indexed) | 2,000 | 2,000 | None — we use 768 |
| vector type in raw SQL | Works | Works | None |
6.2 Connection Pooling Impact on RAG
Neon uses PgBouncer for connection pooling. This affects raw SQL queries:
Potential issue: $executeRawUnsafe with parameterized queries may conflict with PgBouncer's transaction pooling mode.
Current code pattern in vector-store.service.ts:
await this.prisma.$executeRawUnsafe(
`INSERT INTO document_embeddings ... VALUES ($1, $2, $3::jsonb, $4, $5::vector, ...)`,
doc.id, doc.content, JSON.stringify(doc.metadata), ...
);
Recommendation:
- Use Neon's direct connection string (not pooled) for the DATABASE_URL used by Prisma migrations
- Use the pooled connection string for application runtime
- Add to Prisma schema:
datasource db {
provider = "postgresql"
url = env("DATABASE_URL") // pooled (runtime)
directUrl = env("DIRECT_DATABASE_URL") // direct (migrations)
}
Environment variables to add:
# Neon pooled connection (for app runtime)
DATABASE_URL="postgresql://user:pass@ep-xyz-pooler.us-east-2.aws.neon.tech/stemblock_db?sslmode=require"
# Neon direct connection (for migrations)
DIRECT_DATABASE_URL="postgresql://user:pass@ep-xyz.us-east-2.aws.neon.tech/stemblock_db?sslmode=require"
6.3 Index Migration: IVFFlat → HNSW (Recommended)
Current migration uses IVFFlat:
CREATE INDEX ... USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Neon supports HNSW which has better recall and doesn't require training on existing data:
-- Drop old index
DROP INDEX IF EXISTS document_embeddings_embedding_idx;
-- Create HNSW index (better for dynamic datasets)
CREATE INDEX document_embeddings_embedding_idx ON document_embeddings
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
| Factor | IVFFlat | HNSW |
|---|---|---|
| Recall | ~95% | ~99% |
| Build time | Fast | Slower |
| Query speed | Fast | Slightly slower |
| Dynamic inserts | Needs periodic re-indexing after bulk inserts | Handles inserts well |
| Best for | Static datasets | Growing datasets (RAG with feedback loop) |
Since the feedback loop continuously ingests new documents, HNSW is the better choice for Neon.
6.4 Cold Start Impact on RAG
Neon scales to zero after 5 min idle. The first RAG query after cold start will:
- Wake the database (~500ms-2s)
- Load pgvector extension
- Execute vector search
Mitigation options:
- Accept ~2s latency on first query (acceptable for background AI evaluation)
- Set minimum compute to 0.25 CU to prevent full cold start
- The existing health check endpoint can serve as a keep-alive
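If the keep-alive route is chosen, the timing logic is simple. This is a minimal sketch, not existing code: the 5-minute idle window reflects Neon's default autosuspend, and the 0.8 safety margin is an arbitrary assumption. The `ping` callback would be wired to the existing health check (e.g. a `SELECT 1` via Prisma).

```typescript
// Neon suspends compute after ~5 minutes idle (default autosuspend).
const IDLE_WINDOW_MS = 5 * 60 * 1000;

// Ping somewhat more often than the idle window; the 0.8 margin is an
// arbitrary safety factor, not a Neon requirement.
function keepAliveIntervalMs(idleWindowMs: number, margin = 0.8): number {
  return Math.floor(idleWindowMs * margin);
}

// Hypothetical wiring: in the NestJS app this could be an @nestjs/schedule
// cron job instead of a bare setInterval.
function startKeepAlive(ping: () => Promise<void>): NodeJS.Timeout {
  return setInterval(() => {
    void ping().catch(() => {
      // A failed ping is non-fatal; the next tick retries.
    });
  }, keepAliveIntervalMs(IDLE_WINDOW_MS));
}
```

Note that a keep-alive defeats scale-to-zero billing, so it only makes sense if cold-start latency is actually a problem in practice.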
6.5 Storage Considerations
RAG data storage per document:
- content: TEXT (~1 KB per chunk)
- metadata: JSONB (~200 bytes)
- embedding: vector(768) = 768 × 4 bytes = 3,072 bytes
- Total per chunk: ~4.3 KB
Neon storage tiers:
| Plan | Storage | Est. Document Capacity |
|---|---|---|
| Free | 512 MB | ~120K chunks |
| Launch ($19/mo) | 10 GB | ~2.3M chunks |
| Scale ($69/mo) | 50 GB | ~11.6M chunks |
For a STEM education platform, even the Free tier can hold substantial curriculum + rubric data.
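The capacity estimates above follow directly from the per-chunk arithmetic; a small sketch makes the calculation reproducible for other tiers. Note it deliberately ignores per-row and index overhead, which reduces real capacity.

```typescript
// Per-chunk storage from the breakdown above:
// ~1 KB content + ~200 B metadata + 768 × 4 B embedding ≈ 4.3 KB.
const BYTES_PER_CHUNK = 1024 + 200 + 768 * 4; // = 4,296 bytes

// Rough chunk capacity for a given storage budget (overhead ignored).
function chunkCapacity(storageBytes: number): number {
  return Math.floor(storageBytes / BYTES_PER_CHUNK);
}

// Free tier example: chunkCapacity(512e6) is roughly 120K chunks.
```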
6.6 Migration Steps Specific to RAG
After completing the general migration from NEON_MIGRATION_GUIDE.md:
# 1. Verify pgvector extension exists on Neon
psql "$NEON_URL" -c "SELECT * FROM pg_extension WHERE extname = 'vector';"
# 2. If missing (shouldn't be), enable it
psql "$NEON_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"
# 3. Verify RAG tables migrated
psql "$NEON_URL" -c "SELECT COUNT(*) FROM document_embeddings;"
psql "$NEON_URL" -c "SELECT COUNT(*) FROM ai_feedback;"
psql "$NEON_URL" -c "SELECT COUNT(*) FROM fine_tuning_jobs;"
# 4. Verify vector index exists
psql "$NEON_URL" -c "
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'document_embeddings'
AND indexdef LIKE '%vector%';
"
# 5. Test vector search
psql "$NEON_URL" -c "
SELECT id, 1 - (embedding <=> (SELECT embedding FROM document_embeddings LIMIT 1)) AS score
FROM document_embeddings
ORDER BY embedding <=> (SELECT embedding FROM document_embeddings LIMIT 1)
LIMIT 5;
"
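For sanity-checking results from step 5, it helps to remember that pgvector's <=> operator returns cosine *distance*, and the SQL converts it to a similarity score via 1 - distance. The same math in TypeScript, useful for verifying a returned score against raw embedding values:

```typescript
// Cosine distance, mirroring pgvector's <=> operator:
// 1 - (a · b) / (|a| |b|).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The score column in the SQL above is 1 - distance.
const similarity = (a: number[], b: number[]): number => 1 - cosineDistance(a, b);
```

A vector compared with itself should score 1.0, which is why step 5 uses the first stored embedding as its own query: the top result's score should be (numerically close to) 1.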
6.7 Prisma Schema Change Required
Add directUrl to support Neon's dual-connection architecture:
File: prisma/schema.prisma (line 8-11)
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
+ directUrl = env("DIRECT_DATABASE_URL")
}
7. Google GenAI SDK Changes
7.1 Current State: Already on @google/genai
The codebase has already migrated from @google-cloud/aiplatform (old Vertex AI SDK) to @google/genai (new unified GenAI SDK). Here's what's in place:
Package: @google/genai v1.43.0 (in package.json)
Initialization (genai.service.ts:38-52):
import { GoogleGenAI } from '@google/genai';
const options = {
vertexai: true, // Uses Vertex AI backend (not AI Studio)
project: projectId,
location,
};
if (serviceAccountKeyBase64) {
const credentials = JSON.parse(Buffer.from(serviceAccountKeyBase64, 'base64').toString('utf-8'));
options.googleAuthOptions = { credentials };
}
this.client = new GoogleGenAI(options);
7.2 Key Differences from Old SDK
The existing VERTEX_AI_SETUP.md references the old SDK patterns. Here's what changed:
| Aspect | Old (@google-cloud/aiplatform) | Current (@google/genai) |
|---|---|---|
| Package | @google-cloud/aiplatform | @google/genai |
| Import | const \{ VertexAI \} = require(...) | import \{ GoogleGenAI \} from '@google/genai' |
| Init | new VertexAI(\{ project, location \}) | new GoogleGenAI(\{ vertexai: true, project, location \}) |
| Generate | model.generateContent(\{ contents: [...] \}) | client.models.generateContent(\{ model, contents, config \}) |
| System prompt | Separate parameter | Via config.systemInstruction |
| Embeddings | const model = vertex.preview.getGenerativeModel(...) | client.models.embedContent(\{ model, contents, config \}) |
| Fine-tuning | Not available in SDK | client.tunings.tune(\{ baseModel, trainingDataset, config \}) |
| Context caching | Not available | client.caches.create(\{ model, config \}) |
| Models | gemini-1.5-flash, gemini-1.5-pro | gemini-2.5-flash, gemini-2.5-pro, gemini-2.5-flash-lite |
| Auth | GOOGLE_APPLICATION_CREDENTIALS file only | File, base64-encoded, or default credentials |
7.3 API Surface Used (All @google/genai Methods)
Here's every GenAI SDK method used across the codebase:
| Method | Used In | Purpose |
|---|---|---|
| client.models.generateContent() | genai.service.ts:79, context-cache.service.ts:118 | Text generation (evaluations, feedback, writing) |
| client.models.embedContent() | embedding.service.ts:18, embedding.service.ts:40 | Vector embeddings for RAG |
| client.tunings.tune() | fine-tuning.service.ts:107 | Submit fine-tuning jobs |
| client.tunings.get() | fine-tuning.service.ts:170 | Check fine-tuning job status |
| client.caches.create() | context-cache.service.ts:57 | Create server-side context cache |
| client.caches.delete() | context-cache.service.ts:156 | Delete context cache |
7.4 Changes Needed If Updating SDK Version
If upgrading @google/genai from v1.43.0 to a newer version, watch for:
1. Breaking changes in embedContent response shape:
// Current (v1.43.0) — embedding.service.ts:26
const embedding = response.embeddings?.[0]?.values;
Check if the response structure changes (e.g., response.embedding.values vs response.embeddings[0].values).
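One way to make such a shape change fail loudly rather than silently is a defensive accessor. This is a hypothetical helper, not existing code; it tolerates both the current plural shape and a possible singular shape, and throws if neither is present:

```typescript
// Hypothetical accessor for the embedContent response. Handles both the
// current shape (response.embeddings[0].values) and a possible future
// singular shape (response.embedding.values); anything else throws.
function extractEmbedding(response: any): number[] {
  const values =
    response?.embeddings?.[0]?.values ?? response?.embedding?.values;
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error('Unexpected embedContent response shape');
  }
  return values;
}
```

Swapping this in for the direct property access in embedding.service.ts would turn a silent `undefined` after an SDK upgrade into an immediate, descriptive error.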
2. Tuning API changes:
// Current — fine-tuning.service.ts:107
const tuningJob = await client.tunings.tune({ ... });
The tuning API is relatively new. Method names and parameters may evolve (e.g., tune() → create()).
3. Context caching TTL format:
// Current — context-cache.service.ts:67
ttl: `${ttl}s`, // String format "3600s"
Google has been inconsistent with duration formats. Verify s suffix is still required.
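Centralizing the TTL string in one helper would make a future format change a one-line fix. A minimal sketch (the validation rules are assumptions; the "<seconds>s" format matches the current code):

```typescript
// Build the cache TTL string in one place so a format change after an SDK
// upgrade is a single edit. The "<seconds>s" format matches the current
// code at context-cache.service.ts:67.
function toTtlString(seconds: number): string {
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new Error(`Invalid cache TTL: ${seconds}`);
  }
  return `${seconds}s`;
}
```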
4. Model name changes:
// Current models
flashModel = 'gemini-2.5-flash'
proModel = 'gemini-2.5-pro'
liteModel = 'gemini-2.5-flash-lite'
These are configured via env vars, so model upgrades (e.g., to gemini-3.0-flash) only need env var changes.
7.5 Authentication Modes
The current GenAIService supports three auth modes (genai.service.ts:34-51):
| Mode | When Used | Env Var |
|---|---|---|
| Base64 service account key | Non-GCP environments (DigitalOcean) | GCP_SERVICE_ACCOUNT_KEY_BASE64 |
| Service account JSON file | Local development | GOOGLE_APPLICATION_CREDENTIALS |
| Default credentials | GCP environments (Cloud Run, GKE) | None needed |
No changes needed for Neon migration — auth is independent of the database provider.
7.6 Documents Needing Updates
The VERTEX_AI_SETUP.md is outdated and references:
- Old models: gemini-1.5-flash-8b, gemini-1.5-pro-002
- Old SDK patterns
- No mention of RAG, embeddings, fine-tuning, or context caching
Recommended updates:
- Update model references to the gemini-2.5-* family
- Add RAG embedding model (gemini-embedding-001) documentation
- Add context caching setup notes
- Add fine-tuning API documentation
- Update IAM roles: add Vertex AI Tuning User if using fine-tuning
8. Action Items Summary
For Neon Migration (RAG-specific)
| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | Add directUrl to prisma/schema.prisma | High | 5 min |
| 2 | Add DIRECT_DATABASE_URL to all environments | High | 15 min |
| 3 | Verify pgvector extension on Neon after migration | High | 5 min |
| 4 | Test vector search queries on Neon | High | 30 min |
| 5 | Consider upgrading IVFFlat → HNSW index | Medium | 1 hour |
| 6 | Benchmark RAG query latency on Neon (including cold start) | Medium | 2 hours |
| 7 | Set minimum compute to 0.25 CU if cold start latency unacceptable | Low | 5 min |
For GenAI SDK
| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | Update VERTEX_AI_SETUP.md to reflect @google/genai SDK and 2.5 models | Medium | 2 hours |
| 2 | Pin @google/genai version in package.json (avoid ^ prefix) | Medium | 5 min |
| 3 | Add Vertex AI Tuning User IAM role if fine-tuning is used in production | Medium | 10 min |
| 4 | Monitor @google/genai changelog for breaking changes in embedding/tuning APIs | Low | Ongoing |
For RAG System
| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | No public API controller exists — add REST endpoints if admin needs RAG management | Low | 4 hours |
| 2 | Add monitoring/metrics for RAG query latency and cache hit rates | Medium | 3 hours |
| 3 | Consider adding a seeding script for initial curriculum/rubric data | Medium | 2 hours |
Document Version: 2.0 Last Updated: February 27, 2026 Author: StemBlock AI Engineering Team