
RAG Implementation, Neon PostgreSQL Migration & GenAI SDK Changes

Date: February 27, 2026 · Scope: stemblockai-backend · Status: Architecture Review & Migration Planning


Table of Contents

  1. RAG Architecture Overview
  2. RAG Detailed Workflow
  3. RAG Implementation Details (File-by-File)
  4. Initial Data Ingestion Guide
  5. Evaluating RAG Results
  6. Neon PostgreSQL Migration
  7. Google GenAI SDK Changes
  8. Action Items Summary

1. RAG Architecture Overview

Pipeline Flow

INGESTION PIPELINE

Document Input (curriculum, rubrics, feedback corrections)
    │
    ▼
IngestionService.ingestDocument()
    │  Chunks text: 1000 chars, 200 overlap
    │  Breaks at sentence/paragraph boundaries
    ▼
RagService.ingestDocuments()
    │  Batch size: 20 documents per cycle
    ▼
EmbeddingService.embedBatch()
    │  Model: gemini-embedding-001
    │  Dimensions: 768
    │  Task type: RETRIEVAL_DOCUMENT
    │  SDK: @google/genai (Vertex AI backend)
    ▼
VectorStoreService.addDocuments()
    │  Storage: PostgreSQL + pgvector extension
    │  Table: document_embeddings
    │  Index: HNSW (cosine distance, m=16, ef=64)
    ▼
✅ Stored in pgvector

RETRIEVAL PIPELINE

User Query (e.g., evaluation rubric request)
    │
    ▼
RagService.query() / RagService.retrieveContext()
    │
    ▼
EmbeddingService.embedText(question)
    │  Same model: gemini-embedding-001, 768D
    │  Task type: RETRIEVAL_QUERY
    ▼
VectorStoreService.search(queryEmbedding, topK, filter)
    │  SQL: 1 - (embedding <=> query::vector) AS score
    │  Operator: <=> (cosine distance)
    │  Optional filter: WHERE category = ?
    │  LIMIT: topK (default 5 for query, 3 for retrieveContext)
    ▼
Context Assembly
    │  "[Source 1] content...\n\n[Source 2] content..."
    ▼
GenAIService.generateContent()
    │  Model: gemini-2.5-flash (default)
    │  System: STEM education assistant prompt
    │  User: Context + Question
    ▼
RagResponse { answer, sources[], model }

FEEDBACK & FINE-TUNING LOOP

User Feedback (rating + correction)
    │
    ▼
FeedbackLoopService.recordFeedback()
    │  Stores in ai_feedback table
    │
    ├── Rating ≥ 4 + has correction ──▶ Auto-ingest to RAG
    │   (immediate improvement via RagService.ingestDocument)
    │
    ▼
checkTrainingReadiness() — 50+ examples threshold
    │
    ▼
FineTuningService.submitTuningJob()
    │  Base: gemini-2.0-flash-001
    │  API: client.tunings.tune()
    │  Tracks: fine_tuning_jobs table
    ▼
Tuned model available for future generations

Where RAG Is Consumed

| Consumer | File | How RAG Is Used |
|---|---|---|
| STEM Evaluations | src/evaluations/providers/gemini-llm.provider.ts:78-89 | Retrieves rubric context before evaluation. Prepends "Relevant Evaluation Guidelines:" to the user prompt. Category filter: rubrics, topK: 3 |
| Assignment Creation | src/workflows/assignment-creation/ | Retrieves curriculum standards context when generating assignments |
| Learning Paths | src/workflows/learning-paths/ | Retrieves learning context for path generation |
| Feedback Loop | src/rag/feedback-loop.service.ts:60-79 | High-quality corrections (rating ≥ 4) auto-ingested into RAG for immediate improvement |

Database Tables (3 new tables)

| Table | Purpose | Key Columns |
|---|---|---|
| document_embeddings | Vector store for RAG documents | id, content, metadata (JSONB), category, embedding (vector(768)) |
| ai_feedback | User corrections for model improvement | userId, sessionType, originalPrompt, originalOutput, userCorrection, rating, usedForTraining |
| fine_tuning_jobs | Tracks Gemini fine-tuning jobs | geminiJobName, baseModel, tunedModelName, status, sampleCount |

2. RAG Detailed Workflow

2.1 What RAG Actually Does

RAG = "Before asking the AI a question, first search a knowledge base for relevant context and include it in the prompt."

Without RAG:

Prompt: "Evaluate this robotics submission"
→ Gemini answers based only on its training data (generic)

With RAG:

Step 1: Search your knowledge base for relevant rubrics
Step 2: Prompt: "Here are the evaluation guidelines: [retrieved rubrics]
Now evaluate this robotics submission"
→ Gemini answers with your specific rubric criteria (precise)
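
A minimal sketch of that augmentation step, assuming the retrieveContext() signature described in section 3.1 (the function and type names here are illustrative, not the actual provider code):

```typescript
// Hypothetical illustration of RAG prompt augmentation — not the actual provider code.
type ContextRetriever = {
  retrieveContext(q: string, category?: string, topK?: number): Promise<{ context: string }>;
};

async function buildAugmentedPrompt(rag: ContextRetriever, basePrompt: string): Promise<string> {
  const { context } = await rag.retrieveContext(
    'Robotics Challenge evaluation rubric', // query describing what we need
    'rubrics',                              // restrict the search to rubric chunks
    3,                                      // top 3 matches
  );
  // Prepend the retrieved guidelines so the model grades against our rubric,
  // not its generic training data.
  return `Relevant Evaluation Guidelines:\n${context}\n\n${basePrompt}`;
}
```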

2.2 End-to-End: What Happens During an AI Evaluation

When a coach triggers POST /api/v1/evaluations/generate/:submissionId:

1. EVALUATION REQUEST
   gemini-llm.provider.ts:59 → evaluateSubmission(request)

2. RAG CONTEXT RETRIEVAL (lines 78-89)

   ragService.retrieveContext(
     "Robotics Challenge evaluation rubric",   ← query
     "rubrics",                                ← category filter
     3                                         ← top 3 results
   )

   Inside retrieveContext (rag.service.ts:126):

   a) EMBED THE QUERY
      embeddingService.embedText("Robotics Challenge...")
      → Calls Google: gemini-embedding-001 (RETRIEVAL_QUERY)
      → Returns: [0.12, -0.45, 0.78, ...] (768 numbers)

   b) SEARCH PGVECTOR (cosine similarity)
      vectorStore.search(queryEmbedding, 3, { category: 'rubrics' })
      → SQL: SELECT content,
                    1 - (embedding <=> query::vector) AS score
             FROM document_embeddings
             WHERE category = 'rubrics'
             ORDER BY embedding <=> query::vector
             LIMIT 3
      → Returns the top 3 most similar rubric chunks

   c) BUILD CONTEXT STRING
      "[Source 1] Evaluation Rubric: Robot Design..."
      "[Source 2] Code Quality Criteria: Score 5..."
      "[Source 3] Documentation Standards..."

3. AUGMENT THE PROMPT (line 85)
   userPrompt = "Relevant Evaluation Guidelines:\n"
              + [the 3 retrieved rubric chunks]
              + "\n\n"
              + [the original evaluation prompt with student files]

4. GENERATE WITH CONTEXT CACHE (lines 97-106)
   contextCache.generateWithCache({
     model: "gemini-2.5-flash",
     systemInstruction: [evaluation system prompt],  ← cached (90% token savings)
     contents: userPrompt,                           ← includes RAG context
     responseMimeType: "application/json",
   })

5. PARSE & RETURN
   → JSON: { overallScore, confidence, categories, summary, nextSteps }

2.3 End-to-End: Assignment Creation with RAG

When a coach triggers POST /api/v1/workflows/assignment-creation/generate:

assignment-creation.service.ts:618
  ragService.retrieveContext(
    "Quadratic Equations Project grade 9 STEM curriculum standards",
    "curriculum-standards",   ← searches the curriculum-standards category
    3
  )
  → Retrieves NGSS/Common Core standards for that topic and grade level
  → Prepends "Relevant Curriculum Standards:\n[context]" to the prompt
  → Gemini generates an assignment aligned with real standards

2.4 End-to-End: Learning Path Generation with RAG

When a coach triggers POST /api/v1/workflows/learning-paths/generate:

learning-paths.service.ts:594
  ragService.retrieveContext(
    "Programming, Robotics STEM learning progression curriculum",
    "curriculum-standards",   ← same category as assignments
    3
  )
  → Retrieves learning progressions for the student's skill gaps
  → Prepends "Relevant Curriculum Standards:\n[context]" to the prompt
  → Gemini generates a personalized 8-12 week learning plan

2.5 How Documents Enter the Knowledge Base

Path A — Manual ingestion (curriculum standards, rubrics):

ingestionService.ingestCurriculumStandards(standards)
  → Chunks text (500 chars, 100 overlap for standards)
  → Each chunk gets ID: "standard-NGSS-MS-PS1-1-chunk-0"
  → ragService.ingestDocuments(chunks)
    → embeddingService.embedBatch(texts)   ← vectors generated
    → vectorStore.addDocuments(docs)       ← stored in PostgreSQL/pgvector

Path B — Automatic from user feedback (adaptive learning):

User gives rating ≥ 4 with a correction
  → feedbackLoopService.recordFeedback(input)
  → Saves to ai_feedback table
  → ragService.ingestDocument({
      content: "Corrected AI Output for evaluation:
                Original prompt: [truncated]
                Corrected response: [user's correction]",
      metadata: { category: "feedback-corrections" }
    })
  → Now future queries can retrieve this correction as context

2.6 The Feedback → Fine-Tuning Loop

50+ feedback corrections accumulate
  → feedbackLoopService.triggerFineTuningIfReady()
  → fineTuningService.collectTrainingData()
    → Pulls unused corrections from ai_feedback table
  → fineTuningService.submitTuningJob(examples)
    → Calls Google: client.tunings.tune({
        baseModel: "gemini-2.0-flash-001",
        trainingDataset: { examples: [...] },
        config: { epochCount: 5 }
      })
  → Marks feedback as usedForTraining = true
  → Tracks job in fine_tuning_jobs table
  → When complete: tuned model available for future use

3. RAG Implementation Details

3.1 RagService — The Orchestrator

File: src/rag/rag.service.ts (162 lines)

This is the central coordinator. It exposes two retrieval modes:

// Mode 1: Full RAG — retrieves context AND generates an answer
async query(ragQuery: RagQuery): Promise<RagResponse>

// Mode 2: Context-only — retrieves context for other services to use
async retrieveContext(question, category?, topK?): Promise<{ context, sources }>

query() flow (lines 83-119):

  1. Embed the question → EmbeddingService.embedText()
  2. Search vector store → VectorStoreService.search() with optional category filter
  3. Build context string from top-K results: [Source 1] content...
  4. Generate answer → GenAIService.generateContent() with gemini-2.5-flash
  5. Return answer + sources (content truncated to 200 chars) + model name

retrieveContext() flow (lines 126-151):

  • Same as query() steps 1-3, but skips generation
  • Used by GeminiLLMProvider.evaluateSubmission() to augment evaluation prompts

Ingestion (lines 38-78):

  • Single document: embed → store
  • Batch: processes in groups of 20, uses embedBatch() for efficiency (a sketch of this loop follows)
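
A minimal sketch of that batch loop, assuming the embedBatch() and addDocuments() behavior described in sections 3.2 and 3.3 (the function signatures here are illustrative, not the service's exact API):

```typescript
// Illustrative batching loop; assumes embedBatch/addDocuments behave as described in this doc.
const BATCH_SIZE = 20;

interface RagDocument { id: string; content: string; metadata: Record<string, unknown>; }

async function ingestInBatches(
  docs: RagDocument[],
  embed: (texts: string[]) => Promise<number[][]>,
  store: (docs: (RagDocument & { embedding: number[] })[]) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < docs.length; i += BATCH_SIZE) {
    const batch = docs.slice(i, i + BATCH_SIZE);
    // One embedding call per batch keeps API round-trips (and quota usage) low
    const vectors = await embed(batch.map((d) => d.content));
    await store(batch.map((d, j) => ({ ...d, embedding: vectors[j] })));
  }
}
```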

3.2 EmbeddingService — Vector Generation

File: src/rag/embedding.service.ts (58 lines)

// Uses Google GenAI SDK
private readonly embeddingModel = 'gemini-embedding-001';

// Single text (taskType: RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search)
async embedText(text: string, taskType: TaskType = 'RETRIEVAL_DOCUMENT'): Promise<EmbeddingResult>

// Batch texts
async embedBatch(texts: string[], taskType: TaskType = 'RETRIEVAL_DOCUMENT'): Promise<EmbeddingResult[]>

Key implementation detail (lines 18-24):

const response = await client.models.embedContent({
  model: this.embeddingModel,
  contents: text, // string or string[]
  config: {
    outputDimensionality: 768, // Google's recommended sweet spot (0.26% loss vs full 3,072D)
    taskType, // RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search
  },
});
  • Model: gemini-embedding-001 (Google's latest embedding model, #1 on MTEB leaderboard)
  • Dimensions: 768 (configurable via outputDimensionality, sweet spot for quality/cost)
  • Task types: RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search queries
  • Both single and batch use the same embedContent API — batch passes string[] (a usage sketch follows)
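
A short usage sketch of the task-type asymmetry, assuming the embedText() signature above (the result shape is simplified for illustration):

```typescript
// Illustrative usage: the same model embeds both sides, with different task types.
async function demoTaskTypes(embeddingService: {
  embedText(text: string, taskType: string): Promise<{ embedding: number[] }>;
}) {
  const doc = await embeddingService.embedText(
    'Score 5: Robot completes all tasks efficiently with consistent performance',
    'RETRIEVAL_DOCUMENT', // ingestion side
  );
  const query = await embeddingService.embedText(
    'robot design evaluation criteria',
    'RETRIEVAL_QUERY', // search side — the API tunes the vector per task type
  );
  // Both vectors are 768-dimensional, so they are directly comparable under cosine distance.
  return { doc, query };
}
```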

3.3 VectorStoreService — Persistence Layer

File: src/rag/vector-store.service.ts (119 lines)

This is the persistence layer. It uses raw SQL via Prisma's $executeRawUnsafe / $queryRawUnsafe because Prisma doesn't natively support the vector type.

Storing documents (lines 28-46):

INSERT INTO document_embeddings (id, content, metadata, category, embedding, created_at, updated_at)
VALUES ($1, $2, $3::jsonb, $4, $5::vector, NOW(), NOW())
ON CONFLICT (id) DO UPDATE SET
  content = EXCLUDED.content, metadata = EXCLUDED.metadata,
  category = EXCLUDED.category, embedding = EXCLUDED.embedding, updated_at = NOW()

  • Upsert pattern: inserts or updates on ID conflict
  • Embedding serialized as [0.1,0.2,...] string, cast to ::vector (a serialization sketch follows)
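
A minimal sketch of that serialization step (the helper name is hypothetical; pgvector accepts the [x,y,z] text literal):

```typescript
// Hypothetical helper: render a number[] as a pgvector text literal for the $N::vector cast.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`; // e.g. "[0.12,-0.45,0.78]"
}
```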

Similarity search (lines 55-98):

SELECT id, content, metadata, category,
       1 - (embedding <=> $1::vector) AS score
FROM document_embeddings
WHERE category = $3            -- optional filter
ORDER BY embedding <=> $1::vector
LIMIT $2

  • Operator <=> = cosine distance (pgvector)
  • Score = 1 - cosine_distance (higher = more similar)
  • HNSW index used for approximate nearest-neighbor search

Other operations:

  • deleteDocument(id) — DELETE by ID
  • clear() — TRUNCATE table
  • getDocumentCount() — SELECT COUNT(*)

3.4 IngestionService — Document Chunking

File: src/rag/ingestion.service.ts (182 lines)

Handles pre-processing before documents enter the RAG pipeline.

Chunking strategy (lines 148-181):

Default: chunkSize=1000, chunkOverlap=200
Curriculum standards: chunkSize=500, chunkOverlap=100
Rubrics: chunkSize=800, chunkOverlap=150
  • Tries to break at natural boundaries, in order of preference: "\n\n", ".\n", ". ", "\n" (paragraph breaks first, then sentence ends, then line breaks)
  • Only breaks at a boundary if it falls past 50% of the chunk size
  • Each chunk gets metadata: sourceId, title, category, chunkIndex, totalChunks (a simplified chunking sketch follows)
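
A simplified sketch of this boundary-aware chunking; the actual ingestion.service.ts logic may differ in details:

```typescript
// Simplified boundary-aware chunker — illustrative, not the exact service implementation.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const boundaries = ['\n\n', '.\n', '. ', '\n'];
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + chunkSize, text.length);
    if (end < text.length) {
      // Prefer a natural boundary, but only if it lies past 50% of the chunk size
      for (const b of boundaries) {
        const at = text.lastIndexOf(b, end);
        if (at > start + chunkSize / 2) { end = at + b.length; break; }
      }
    }
    chunks.push(text.slice(start, end));
    if (end >= text.length) break;
    start = end - overlap; // consecutive chunks share `overlap` characters
  }
  return chunks;
}
```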

Specialized ingestion methods:

  • ingestCurriculumStandards() — parses framework/gradeLevel/standards structure
  • ingestRubrics() — parses rubric criteria with score level descriptions

3.5 FeedbackLoopService — Adaptive Learning

File: src/rag/feedback-loop.service.ts (204 lines)

Two-path improvement system:

Path 1 — Immediate RAG improvement (lines 57-79): When a user provides a correction with rating ≥ 4:

await this.ragService.ingestDocument({
  id: `feedback-${feedback.id}`,
  content: `Corrected AI Output for ${input.sessionType}:
Original prompt: ${input.originalPrompt.substring(0, 500)}
Corrected response: ${input.userCorrection}`,
  metadata: { category: 'feedback-corrections', ... },
});

→ Future queries can retrieve these corrections as context

Path 2 — Fine-tuning trigger (lines 113-157):

  • Threshold: 50+ unused training examples
  • Calls FineTuningService.collectTrainingData() → submitTuningJob()
  • Marks feedback as usedForTraining = true after submission (a minimal readiness-check sketch follows)
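
A minimal sketch of that readiness check, assuming a Prisma model named aiFeedback for the ai_feedback table (model and method names are assumptions; the actual service code may differ):

```typescript
import { PrismaClient } from '@prisma/client';

// Illustrative readiness check; `aiFeedback` model name assumed from the ai_feedback table.
const TRAINING_THRESHOLD = 50;

interface TuningApi {
  collectTrainingData(): Promise<unknown[]>;
  submitTuningJob(examples: unknown[]): Promise<void>;
}

async function triggerFineTuningIfReady(prisma: PrismaClient, fineTuning: TuningApi) {
  const unused = await prisma.aiFeedback.count({
    where: { usedForTraining: false, userCorrection: { not: null } },
  });
  if (unused >= TRAINING_THRESHOLD) {
    const examples = await fineTuning.collectTrainingData();
    await fineTuning.submitTuningJob(examples); // service marks feedback usedForTraining = true
  }
}
```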

3.6 FineTuningService — Model Customization

File: src/rag/fine-tuning.service.ts (275 lines)

Manages Gemini supervised fine-tuning pipeline.

Job submission (lines 69-148):

const tuningJob = await client.tunings.tune({
  baseModel, // default: 'gemini-2.0-flash-001'
  trainingDataset: {
    examples: trainingExamples.map(e => ({
      textInput: e.textInput, // original prompt
      output: e.output,       // corrected response
    })),
  },
  config: {
    epochCount: config?.epochCount ?? 5,
    learningRateMultiplier: config?.learningRateMultiplier ?? 1.0,
  },
});

Job status polling (lines 153-209):

  • Checks client.tunings.get({ name: job.geminiJobName })
  • Maps Gemini states: JOB_STATE_SUCCEEDED → COMPLETED, JOB_STATE_FAILED → FAILED
  • Updates local DB record with status and tuned model name

Cost: ~$3/1M training tokens (Gemini 2.0 Flash base)

3.7 RagModule — NestJS Wiring

File: src/rag/rag.module.ts (29 lines)

@Module({
  imports: [DatabaseModule],
  providers: [
    EmbeddingService, VectorStoreService, RagService,
    IngestionService, FineTuningService, FeedbackLoopService,
  ],
  exports: [
    RagService, EmbeddingService, VectorStoreService,
    IngestionService, FineTuningService, FeedbackLoopService,
  ],
})
export class RagModule {}

Note: GenAIService is not imported here — it's provided by the SharedModule which is @Global(). RagModule imports DatabaseModule for PrismaService.

3.8 Database Migration

File: prisma/migrations/20260227000000_add_rag_pgvector_system/migration.sql (65 lines)

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- 3 tables: document_embeddings, ai_feedback, fine_tuning_jobs
-- Key index: HNSW for fast approximate nearest-neighbor (upgraded from IVFFlat)
CREATE INDEX "document_embeddings_embedding_idx" ON "document_embeddings"
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

Prisma schema note (schema.prisma:963):

embedding Unsupported("vector(768)")

Prisma uses Unsupported() for pgvector types, which is why all vector operations use raw SQL.


4. Initial Data Ingestion Guide

The RAG system is only useful if it has documents to search. This section explains how to populate the knowledge base for the first time.

4.1 What to Ingest

The RAG system uses categories to filter searches. Each consumer searches a specific category:

| Category | Used By | What to Ingest |
|---|---|---|
| rubrics | STEM Evaluations (gemini-llm.provider.ts) | Evaluation rubrics with score criteria |
| curriculum-standards | Assignment Creation, Learning Paths | NGSS, Common Core, state standards |
| feedback-corrections | Automatic (feedback loop) | User corrections — auto-ingested, no manual action needed |

4.2 Create a Seeding Script

Create src/scripts/seed-rag.ts following the pattern of the existing src/scripts/seed-users.ts:

import { NestFactory } from '@nestjs/core';
import { AppModule } from '../app.module';
import { IngestionService } from '../rag/ingestion.service';
import { RagService } from '../rag/rag.service';

async function seedRag() {
  const app = await NestFactory.createApplicationContext(AppModule);
  const ingestion = app.get(IngestionService);
  const rag = app.get(RagService);

  console.log('🌱 Starting RAG knowledge base seed...\n');

  // ── Step 1: Ingest Evaluation Rubrics ──────────────────────────
  console.log('📋 Ingesting evaluation rubrics...');

  const rubricCount = await ingestion.ingestRubrics([
    {
      name: 'Robot Design Rubric',
      assignmentType: 'DESIGN',
      criteria: [
        {
          name: 'Mechanical Structure',
          levels: [
            { score: 5, description: 'Innovative design with excellent structural integrity, creative use of components, and optimal weight distribution' },
            { score: 4, description: 'Solid structure with good component usage and functional design' },
            { score: 3, description: 'Adequate structure that functions but has room for optimization' },
            { score: 2, description: 'Basic structure with stability issues or poor component choices' },
            { score: 1, description: 'Incomplete or non-functional mechanical design' },
          ],
        },
        {
          name: 'Functionality',
          levels: [
            { score: 5, description: 'Robot completes all tasks efficiently with consistent performance' },
            { score: 4, description: 'Robot completes most tasks with good reliability' },
            { score: 3, description: 'Robot completes some tasks but with inconsistencies' },
            { score: 2, description: 'Robot has limited functionality or frequent failures' },
            { score: 1, description: 'Robot does not function as intended' },
          ],
        },
        {
          name: 'Creativity and Innovation',
          levels: [
            { score: 5, description: 'Highly original approach with novel solutions to design challenges' },
            { score: 4, description: 'Creative elements with some original thinking' },
            { score: 3, description: 'Standard design approach with minor creative touches' },
            { score: 2, description: 'Mostly follows examples with little original thought' },
            { score: 1, description: 'No creative effort, direct copy of examples' },
          ],
        },
      ],
    },
    {
      name: 'Code Quality Rubric',
      assignmentType: 'CODE',
      criteria: [
        {
          name: 'Logic and Efficiency',
          levels: [
            { score: 5, description: 'Elegant, efficient algorithms with optimal time/space complexity' },
            { score: 4, description: 'Well-structured logic with good efficiency' },
            { score: 3, description: 'Functional logic but with unnecessary redundancy' },
            { score: 2, description: 'Logic works partially or has significant inefficiency' },
            { score: 1, description: 'Broken logic or code that does not compile/run' },
          ],
        },
        {
          name: 'Code Organization',
          levels: [
            { score: 5, description: 'Clean separation of concerns, meaningful naming, consistent style throughout' },
            { score: 4, description: 'Good organization with clear naming conventions' },
            { score: 3, description: 'Adequate organization but inconsistent in places' },
            { score: 2, description: 'Poor organization with confusing variable names' },
            { score: 1, description: 'No discernible organization, unreadable code' },
          ],
        },
        {
          name: 'Comments and Documentation',
          levels: [
            { score: 5, description: 'Comprehensive comments explaining why, clear function documentation' },
            { score: 4, description: 'Good comments on complex sections' },
            { score: 3, description: 'Some comments but missing on key logic' },
            { score: 2, description: 'Minimal or unhelpful comments' },
            { score: 1, description: 'No comments at all' },
          ],
        },
      ],
    },
    {
      name: 'Documentation Rubric',
      assignmentType: 'NOTEBOOK',
      criteria: [
        {
          name: 'Process Documentation',
          levels: [
            { score: 5, description: 'Detailed chronicle of design process including failures, iterations, and reasoning' },
            { score: 4, description: 'Good documentation of process with clear progression' },
            { score: 3, description: 'Basic documentation that covers main steps' },
            { score: 2, description: 'Incomplete documentation with major gaps' },
            { score: 1, description: 'Little to no process documentation' },
          ],
        },
        {
          name: 'Clarity and Organization',
          levels: [
            { score: 5, description: 'Exceptionally clear writing with logical flow, headings, and visual aids' },
            { score: 4, description: 'Well-organized with clear sections and good readability' },
            { score: 3, description: 'Readable but could benefit from better structure' },
            { score: 2, description: 'Difficult to follow or poorly organized' },
            { score: 1, description: 'Incomprehensible or no organization' },
          ],
        },
      ],
    },
    {
      name: 'Technical Writing Rubric',
      assignmentType: 'ESSAY',
      criteria: [
        {
          name: 'Technical Accuracy',
          levels: [
            { score: 5, description: 'All technical claims are accurate and well-supported with evidence' },
            { score: 4, description: 'Mostly accurate with minor oversimplifications' },
            { score: 3, description: 'Generally accurate but with some errors or vague claims' },
            { score: 2, description: 'Multiple technical inaccuracies' },
            { score: 1, description: 'Fundamentally incorrect technical understanding' },
          ],
        },
        {
          name: 'Writing Quality',
          levels: [
            { score: 5, description: 'Excellent grammar, vocabulary, and sentence structure appropriate for age level' },
            { score: 4, description: 'Good writing with minor errors' },
            { score: 3, description: 'Adequate writing but with noticeable errors' },
            { score: 2, description: 'Frequent grammatical errors affecting readability' },
            { score: 1, description: 'Very poor writing quality' },
          ],
        },
      ],
    },
  ]);

  console.log(`✅ Rubrics ingested: ${rubricCount} chunks\n`);

  // ── Step 2: Ingest Curriculum Standards ─────────────────────────
  console.log('📚 Ingesting curriculum standards...');

  const ngssCount = await ingestion.ingestCurriculumStandards({
    framework: 'NGSS',
    gradeLevel: 'Middle School (6-8)',
    standards: [
      {
        id: 'MS-ETS1-1',
        title: 'Define Design Problems',
        description: 'Define the criteria and constraints of a design problem with sufficient precision to ensure a successful solution, taking into account relevant scientific principles and potential impacts on people and the natural environment.',
      },
      {
        id: 'MS-ETS1-2',
        title: 'Evaluate Competing Solutions',
        description: 'Evaluate competing design solutions using a systematic process to determine how well they meet the criteria and constraints of the problem.',
      },
      {
        id: 'MS-ETS1-3',
        title: 'Analyze Data from Tests',
        description: 'Analyze data from tests to determine similarities and differences among several design solutions to identify the best characteristics of each that can be combined into a new solution.',
      },
      {
        id: 'MS-ETS1-4',
        title: 'Develop and Iterate Models',
        description: 'Develop a model to generate data for iterative testing and modification of a proposed object, tool, or process such that an optimal design can be achieved.',
      },
      {
        id: 'MS-PS2-1',
        title: 'Forces and Motion',
        description: 'Apply Newton\'s Third Law to design a solution to a problem involving the motion of two colliding objects. Emphasis is on the change in motion and forces during collision.',
      },
      {
        id: 'MS-PS2-2',
        title: 'Plan Investigation on Forces',
        description: 'Plan an investigation to provide evidence that the change in an object\'s motion depends on the sum of the forces acting on the object and the mass of the object.',
      },
    ],
  });

  const csMathCount = await ingestion.ingestCurriculumStandards({
    framework: 'Common Core Math',
    gradeLevel: 'Middle School (6-8)',
    standards: [
      {
        id: '6.RP.3',
        title: 'Ratios and Proportional Relationships',
        description: 'Use ratio and rate reasoning to solve real-world and mathematical problems, including those involving unit rates, percentages, and proportional relationships in robotics contexts like gear ratios and speed calculations.',
      },
      {
        id: '7.G.6',
        title: 'Geometry - Area and Volume',
        description: 'Solve real-world and mathematical problems involving area, volume, and surface area of two- and three-dimensional objects. Applied to robot chassis design, workspace planning, and component fitting.',
      },
      {
        id: '8.F.4',
        title: 'Functions - Model Relationships',
        description: 'Construct a function to model a linear relationship between two quantities. Applied to sensor calibration, motor speed curves, and PID control in robotics.',
      },
      {
        id: '8.EE.7',
        title: 'Expressions and Equations',
        description: 'Solve linear equations in one variable, including those with rational number coefficients. Applied to calculating distances, speeds, and timing in robot navigation.',
      },
    ],
  });

  console.log(`✅ NGSS standards ingested: ${ngssCount} chunks`);
  console.log(`✅ Common Core Math standards ingested: ${csMathCount} chunks\n`);

  // ── Step 3: Verify ─────────────────────────────────────────────
  const stats = await rag.getStats();
  console.log(`📊 RAG Knowledge Base Stats:`);
  console.log(`   Total documents: ${stats.documentCount}`);

  console.log('\n🎉 RAG seed completed successfully!');
  await app.close();
}

seedRag()
  .then(() => process.exit(0))
  .catch((error) => {
    console.error('❌ Error seeding RAG:', error);
    process.exit(1);
  });

4.3 Run the Seeding Script

cd stemblockai-backend

# Ensure GenAI is configured (embeddings require Google API)
export LLM_PROVIDER=gemini
export GOOGLE_CLOUD_PROJECT=your-project-id
# ... other GenAI env vars

# Run the seed
npx ts-node src/scripts/seed-rag.ts

Expected output:

🌱 Starting RAG knowledge base seed...

📋 Ingesting evaluation rubrics...
✅ Rubrics ingested: 12 chunks

📚 Ingesting curriculum standards...
✅ NGSS standards ingested: 6 chunks
✅ Common Core Math standards ingested: 4 chunks

📊 RAG Knowledge Base Stats:
Total documents: 22

🎉 RAG seed completed successfully!

4.4 Add to package.json

{
  "scripts": {
    "seed:users": "ts-node src/scripts/seed-users.ts",
    "seed:rag": "ts-node src/scripts/seed-rag.ts",
    "seed:all": "npm run seed:users && npm run seed:rag"
  }
}

4.5 Verify Data in Database

-- Check document counts by category
SELECT category, COUNT(*) as count
FROM document_embeddings
GROUP BY category
ORDER BY count DESC;

-- Expected output:
-- rubrics | 12
-- curriculum-standards | 10

4.6 Ingestion for Additional Content

You can ingest additional content at any time using the IngestionService methods:

Custom documents (any category):

await ingestionService.ingestDocument({
  id: 'custom-doc-001',
  title: 'VEX Robotics Competition Rules 2026',
  content: '... full text ...',
  category: 'competition-rules',
  source: 'vex-robotics.com',
});

Additional standards (e.g., state-specific):

await ingestionService.ingestCurriculumStandards({
  framework: 'Texas TEKS',
  gradeLevel: 'Grade 8',
  standards: [
    { id: 'TEKS-8.6A', title: '...', description: '...' },
  ],
});

4.7 Chunking Behavior

Understanding how documents get split is important for quality:

| Content Type | Chunk Size | Overlap | Why |
|---|---|---|---|
| General documents | 1000 chars | 200 | Standard — balances context window usage and retrieval precision |
| Curriculum standards | 500 chars | 100 | Shorter — each standard is a self-contained concept |
| Rubrics | 800 chars | 150 | Medium — criteria need enough context to be useful |

Overlap means consecutive chunks share text at their boundaries. This prevents a relevant sentence from being split across two chunks where neither chunk alone has enough context.

Example with chunkSize=20, overlap=5:

Original: "The robot must navigate the obstacle course within 60 seconds"
Chunk 1:  "The robot must navig"   (chars 0-19)
Chunk 2:  "navigate the obstacl"   (starts at char 15 — "navig" overlaps with chunk 1)
Chunk 3:  "stacle course within"   (starts at char 30 — "stacl" overlaps with chunk 2)

5. Evaluating RAG Results

5.1 Quick Smoke Test

After ingesting data, verify the RAG pipeline end-to-end:

// In a NestJS test or script
const ragService = app.get(RagService);

// Test 1: Query the rubrics category
const rubricResult = await ragService.query({
  question: 'What are the criteria for evaluating robot design?',
  category: 'rubrics',
  topK: 3,
});

console.log('Answer:', rubricResult.answer);
console.log('Sources:', rubricResult.sources.length);
console.log('Top score:', rubricResult.sources[0]?.score);

// Test 2: Query curriculum standards
const curriculumResult = await ragService.query({
  question: 'What NGSS standards apply to engineering design?',
  category: 'curriculum-standards',
  topK: 3,
});

console.log('Answer:', curriculumResult.answer);
console.log('Sources:', curriculumResult.sources.length);

5.2 What Good Results Look Like

Similarity scores (cosine similarity, 0.0 to 1.0):

| Score Range | Meaning | Action |
|---|---|---|
| 0.85 - 1.0 | Excellent match | Content is highly relevant |
| 0.70 - 0.85 | Good match | Content is relevant, usable as context |
| 0.50 - 0.70 | Weak match | Content is tangentially related, may add noise |
| Below 0.50 | Poor match | Content is not relevant, should not be used |

For your system: Scores above 0.70 are typical for rubric/standard retrieval. If top scores are consistently below 0.60, the ingested content may not match the queries well — check chunking or add more specific content.
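
One way to act on these thresholds is a relevance gate before context assembly; a sketch, assuming the sources[] shape returned by retrieval (the threshold value is a suggestion, not existing code):

```typescript
// Illustrative relevance gate: drop weak matches before they reach the prompt.
const MIN_SCORE = 0.6; // tune against the score distribution you actually observe

function filterRelevant(sources: { content: string; score: number }[]) {
  const kept = sources.filter((s) => s.score >= MIN_SCORE);
  // Falling back to no context is safer than injecting tangential noise into the prompt
  return kept;
}
```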

5.3 Evaluate RAG Impact on Evaluations

Compare evaluation quality with and without RAG:

# Step 1: Run evaluation WITHOUT RAG (empty knowledge base)
# The system gracefully handles this — gemini-llm.provider.ts:87 catches errors

# Step 2: Ingest rubrics
npm run seed:rag

# Step 3: Run the SAME evaluation again WITH RAG

# Step 4: Compare the two outputs:
# - Are category scores more consistent with rubric criteria?
# - Does feedback reference specific rubric language?
# - Are improvements more actionable?

5.4 Monitoring in Production

Key metrics to track:

| Metric | How to Measure | Target |
|---|---|---|
| RAG retrieval latency | Time from query to context returned | < 500ms |
| Top-1 similarity score | sources[0].score from retrieval | > 0.70 average |
| Context utilization | Does the AI response reference retrieved sources? | Qualitative review |
| Feedback loop growth | SELECT COUNT(*) FROM ai_feedback | Steady growth |
| Knowledge base size | ragService.getStats().documentCount | Growing via feedback |

Check via SQL:

-- Knowledge base growth over time
SELECT DATE(created_at) as date, COUNT(*) as new_docs
FROM document_embeddings
GROUP BY DATE(created_at)
ORDER BY date DESC
LIMIT 30;

-- Feedback corrections ingested to RAG
SELECT COUNT(*) as feedback_in_rag
FROM document_embeddings
WHERE category = 'feedback-corrections';

-- Feedback waiting for fine-tuning
SELECT COUNT(*) as unused_feedback
FROM ai_feedback
WHERE used_for_training = false AND user_correction IS NOT NULL;
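
For the latency metric, a minimal application-side probe (a sketch; the wrapper name is illustrative and the 500ms target comes from the table above):

```typescript
import { performance } from 'node:perf_hooks';

// Illustrative latency probe around context retrieval (target: < 500ms).
type ContextRetriever = {
  retrieveContext(q: string, category?: string, topK?: number): Promise<{ context: string }>;
};

async function timedRetrieve(rag: ContextRetriever, question: string, category?: string) {
  const startedAt = performance.now();
  const result = await rag.retrieveContext(question, category, 3);
  const ms = performance.now() - startedAt;
  if (ms > 500) console.warn(`RAG retrieval slow: ${ms.toFixed(0)}ms`);
  return result;
}
```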

5.5 Debugging Poor Results

Problem: AI evaluation doesn't use rubric criteria

  1. Check if documents exist:
    SELECT COUNT(*) FROM document_embeddings WHERE category = 'rubrics';
  2. Check retrieval scores — run a test query and inspect sources[].score
  3. If scores are low, the query terms may not match the ingested content. Consider:
    • Adding more rubric variations
    • Making rubric descriptions more detailed
    • Adjusting topK (try 5 instead of 3)

Problem: Wrong category documents retrieved

  1. Verify the category filter is working:
    SELECT category, COUNT(*) FROM document_embeddings GROUP BY category;
  2. Check that ingestion uses the correct category strings (must exactly match the filter in the consumer code)

Problem: Retrieval is slow (>2s)

  1. Verify the HNSW index exists:
    SELECT indexdef FROM pg_indexes WHERE indexname = 'document_embeddings_embedding_idx';
  2. If missing, create it (see migration 20260227100000_upgrade_ivfflat_to_hnsw_index)
  3. If on Neon, check if compute was cold-started (first query after idle is slow)

5.6 RAG Quality Improvement Cycle

Week 1:  Seed initial rubrics + standards
→ Run evaluations, collect baseline quality

Week 2: Coaches review AI evaluations, provide corrections
→ High-quality corrections auto-ingest to RAG (rating ≥ 4)
→ RAG results improve immediately

Week 4: 50+ corrections accumulated
→ Fine-tuning triggered automatically
→ Tuned model available for future evaluations

Ongoing: More corrections → better RAG context → better evaluations
→ fewer corrections needed → system stabilizes

6. Neon PostgreSQL Migration

The existing NEON_MIGRATION_GUIDE.md covers the general migration steps. Below are RAG-specific concerns that require additional attention when migrating to Neon.

6.1 pgvector Support on Neon

Neon has native pgvector support — no extra setup needed. However:

| Concern | Current (DO/Self-hosted) | Neon | Action Needed |
|---|---|---|---|
| pgvector extension | Manually installed | Pre-installed | None — CREATE EXTENSION IF NOT EXISTS vector still works |
| pgvector version | Depends on PG version | Latest (0.7+) | Verify: SELECT extversion FROM pg_extension WHERE extname = 'vector' |
| IVFFlat index | Works | Works | None — supported natively |
| HNSW index | May not be available | Supported | Consider upgrading from IVFFlat to HNSW for better recall (see 6.3) |
| Max dimensions | 2000 | 2000 | None — we use 768 |
| vector type in raw SQL | Works | Works | None |

6.2 Connection Pooling Impact on RAG

Neon uses PgBouncer for connection pooling. This affects raw SQL queries:

Potential issue: $executeRawUnsafe with parameterized queries may conflict with PgBouncer's transaction pooling mode.

Current code pattern in vector-store.service.ts:

await this.prisma.$executeRawUnsafe(
  `INSERT INTO document_embeddings ... VALUES ($1, $2, $3::jsonb, $4, $5::vector, ...)`,
  doc.id, doc.content, JSON.stringify(doc.metadata), ...
);

Recommendation:

  • Point Prisma migrations at Neon's direct (non-pooled) connection string
  • Keep the pooled connection string for application runtime
  • Add to the Prisma schema:
datasource db {
  provider  = "postgresql"
  url       = env("DATABASE_URL")         // pooled (runtime)
  directUrl = env("DIRECT_DATABASE_URL")  // direct (migrations)
}

Environment variables to add:

# Neon pooled connection (for app runtime)
DATABASE_URL="postgresql://user:pass@ep-xyz-pooler.us-east-2.aws.neon.tech/stemblock_db?sslmode=require"

# Neon direct connection (for migrations)
DIRECT_DATABASE_URL="postgresql://user:pass@ep-xyz.us-east-2.aws.neon.tech/stemblock_db?sslmode=require"

6.3 Vector Index: IVFFlat vs HNSW

The current migration uses IVFFlat:

CREATE INDEX ... USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

Neon supports HNSW which has better recall and doesn't require training on existing data:

-- Drop old index
DROP INDEX IF EXISTS document_embeddings_embedding_idx;

-- Create HNSW index (better for dynamic datasets)
CREATE INDEX document_embeddings_embedding_idx ON document_embeddings
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

| Factor | IVFFlat | HNSW |
|---|---|---|
| Recall | ~95% | ~99% |
| Build time | Fast | Slower |
| Query speed | Fast | Slightly slower |
| Dynamic inserts | Needs periodic re-indexing after bulk inserts | Handles inserts well |
| Best for | Static datasets | Growing datasets (RAG with feedback loop) |

Since the feedback loop continuously ingests new documents, HNSW is the better choice for Neon.

6.4 Cold Start Impact on RAG

Neon scales to zero after 5 min idle. The first RAG query after cold start will:

  1. Wake the database (~500ms-2s)
  2. Load pgvector extension
  3. Execute vector search

Mitigation options:

  • Accept ~2s latency on first query (acceptable for background AI evaluation)
  • Set minimum compute to 0.25 CU to prevent full cold start
  • The existing health check endpoint can serve as a keep-alive (a sketch follows)
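
If the keep-alive route is chosen, a minimal sketch using NestJS's scheduler (assumes @nestjs/schedule is installed and a PrismaService at the path shown; both are assumptions, not existing code):

```typescript
import { Injectable } from '@nestjs/common';
import { Cron } from '@nestjs/schedule';
import { PrismaService } from '../database/prisma.service'; // path assumed

// Illustrative keep-alive: a trivial query every 4 minutes keeps Neon compute warm,
// staying inside the 5-minute scale-to-zero window.
@Injectable()
export class NeonKeepAliveService {
  constructor(private readonly prisma: PrismaService) {}

  @Cron('0 */4 * * * *') // every 4 minutes (seconds-first cron format)
  async ping(): Promise<void> {
    await this.prisma.$queryRawUnsafe('SELECT 1');
  }
}
```

Note the trade-off: keeping compute warm forfeits scale-to-zero savings, so this only pays off if cold-start latency is actually hurting.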

6.5 Storage Considerations

RAG data storage per document:

  • content: TEXT (~1KB per chunk)
  • metadata: JSONB (~200 bytes)
  • embedding: vector(768) = 768 × 4 bytes = 3,072 bytes
  • Total per chunk: ~4.3 KB

Neon storage tiers:

| Plan | Storage | Est. Document Capacity |
|---|---|---|
| Free | 512 MB | ~120K chunks |
| Launch ($19/mo) | 10 GB | ~2.3M chunks |
| Scale ($69/mo) | 50 GB | ~11M chunks |

For a STEM education platform, even the Free tier can hold substantial curriculum + rubric data.

6.6 Migration Steps Specific to RAG

After completing the general migration from NEON_MIGRATION_GUIDE.md:

# 1. Verify pgvector extension exists on Neon
psql "$NEON_URL" -c "SELECT * FROM pg_extension WHERE extname = 'vector';"

# 2. If missing (shouldn't be), enable it
psql "$NEON_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"

# 3. Verify RAG tables migrated
psql "$NEON_URL" -c "SELECT COUNT(*) FROM document_embeddings;"
psql "$NEON_URL" -c "SELECT COUNT(*) FROM ai_feedback;"
psql "$NEON_URL" -c "SELECT COUNT(*) FROM fine_tuning_jobs;"

# 4. Verify vector index exists
psql "$NEON_URL" -c "
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'document_embeddings'
AND indexdef LIKE '%vector%';
"

# 5. Test vector search
psql "$NEON_URL" -c "
SELECT id, 1 - (embedding <=> (SELECT embedding FROM document_embeddings LIMIT 1)) AS score
FROM document_embeddings
ORDER BY embedding <=> (SELECT embedding FROM document_embeddings LIMIT 1)
LIMIT 5;
"

6.7 Prisma Schema Change Required

Add directUrl to support Neon's dual-connection architecture:

File: prisma/schema.prisma (line 8-11)

 datasource db {
   provider  = "postgresql"
   url       = env("DATABASE_URL")
+  directUrl = env("DIRECT_DATABASE_URL")
 }

7. Google GenAI SDK Changes

7.1 Current State: Already on @google/genai

The codebase has already migrated from @google-cloud/aiplatform (old Vertex AI SDK) to @google/genai (new unified GenAI SDK). Here's what's in place:

Package: @google/genai v1.43.0 (in package.json)

Initialization (genai.service.ts:38-52):

import { GoogleGenAI } from '@google/genai';

const options = {
  vertexai: true, // Uses Vertex AI backend (not AI Studio)
  project: projectId,
  location,
};

if (serviceAccountKeyBase64) {
  const credentials = JSON.parse(Buffer.from(serviceAccountKeyBase64, 'base64').toString('utf-8'));
  options.googleAuthOptions = { credentials };
}

this.client = new GoogleGenAI(options);

7.2 Key Differences from Old SDK

The existing VERTEX_AI_SETUP.md references the old SDK patterns. Here's what changed:

| Aspect | Old (@google-cloud/aiplatform) | Current (@google/genai) |
|---|---|---|
| Package | @google-cloud/aiplatform | @google/genai |
| Import | const { VertexAI } = require(...) | import { GoogleGenAI } from '@google/genai' |
| Init | new VertexAI({ project, location }) | new GoogleGenAI({ vertexai: true, project, location }) |
| Generate | model.generateContent({ contents: [...] }) | client.models.generateContent({ model, contents, config }) |
| System prompt | Separate parameter | Via config.systemInstruction |
| Embeddings | const model = vertex.preview.getGenerativeModel(...) | client.models.embedContent({ model, contents, config }) |
| Fine-tuning | Not available in SDK | client.tunings.tune({ baseModel, trainingDataset, config }) |
| Context caching | Not available | client.caches.create({ model, config }) |
| Models | gemini-1.5-flash, gemini-1.5-pro | gemini-2.5-flash, gemini-2.5-pro, gemini-2.5-flash-lite |
| Auth | GOOGLE_APPLICATION_CREDENTIALS file only | File, base64-encoded, or default credentials |

7.3 API Surface Used (All @google/genai Methods)

Here's every GenAI SDK method used across the codebase:

| Method | Used In | Purpose |
|---|---|---|
| client.models.generateContent() | genai.service.ts:79, context-cache.service.ts:118 | Text generation (evaluations, feedback, writing) |
| client.models.embedContent() | embedding.service.ts:18, embedding.service.ts:40 | Vector embeddings for RAG |
| client.tunings.tune() | fine-tuning.service.ts:107 | Submit fine-tuning jobs |
| client.tunings.get() | fine-tuning.service.ts:170 | Check fine-tuning job status |
| client.caches.create() | context-cache.service.ts:57 | Create server-side context cache |
| client.caches.delete() | context-cache.service.ts:156 | Delete context cache |

7.4 Changes Needed If Updating SDK Version

If upgrading @google/genai from v1.43.0 to a newer version, watch for:

1. Breaking changes in embedContent response shape:

// Current (v1.43.0) — embedding.service.ts:26
const embedding = response.embeddings?.[0]?.values;

Check if the response structure changes (e.g., response.embedding.values vs response.embeddings[0].values).
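
A defensive parse that fails loudly if the shape drifts (a sketch; the v1.43.0 shape shown above is response.embeddings[0].values):

```typescript
// Illustrative guard around the embedContent response shape.
function extractEmbedding(response: { embeddings?: { values?: number[] }[] }): number[] {
  const values = response.embeddings?.[0]?.values;
  if (!values || values.length !== 768) {
    throw new Error(
      `Unexpected embedContent response shape or dimensionality: got ${values?.length ?? 'none'}`,
    );
  }
  return values;
}
```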

2. Tuning API changes:

// Current — fine-tuning.service.ts:107
const tuningJob = await client.tunings.tune({ ... });

The tuning API is relatively new. Method names and parameters may evolve (e.g., tune() → create()).

3. Context caching TTL format:

// Current — context-cache.service.ts:67
ttl: `${ttl}s`, // String format "3600s"

Google has been inconsistent with duration formats. Verify s suffix is still required.

4. Model name changes:

// Current models
flashModel = 'gemini-2.5-flash'
proModel = 'gemini-2.5-pro'
liteModel = 'gemini-2.5-flash-lite'

These are configured via env vars, so model upgrades (e.g., to gemini-3.0-flash) only need env var changes.

7.5 Authentication Modes

The current GenAIService supports three auth modes (genai.service.ts:34-51):

| Mode | When Used | Env Var |
|---|---|---|
| Base64 service account key | Non-GCP environments (DigitalOcean) | GCP_SERVICE_ACCOUNT_KEY_BASE64 |
| Service account JSON file | Local development | GOOGLE_APPLICATION_CREDENTIALS |
| Default credentials | GCP environments (Cloud Run, GKE) | None needed |

No changes needed for Neon migration — auth is independent of the database provider.
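
A condensed sketch of that mode precedence, mirroring the initialization shown in 7.1 (the GOOGLE_CLOUD_LOCATION variable and its default are assumptions; the actual env handling in genai.service.ts may differ):

```typescript
import { GoogleGenAI } from '@google/genai';

// Condensed sketch of the auth-mode precedence described above.
function createGenAIClient(): GoogleGenAI {
  const options: ConstructorParameters<typeof GoogleGenAI>[0] = {
    vertexai: true,
    project: process.env.GOOGLE_CLOUD_PROJECT,
    location: process.env.GOOGLE_CLOUD_LOCATION ?? 'us-central1', // env var name assumed
  };

  const keyB64 = process.env.GCP_SERVICE_ACCOUNT_KEY_BASE64;
  if (keyB64) {
    // Mode 1: base64-encoded service account key (non-GCP hosts like DigitalOcean)
    const credentials = JSON.parse(Buffer.from(keyB64, 'base64').toString('utf-8'));
    options.googleAuthOptions = { credentials };
  }
  // Modes 2 and 3 need no code: the SDK falls back to GOOGLE_APPLICATION_CREDENTIALS
  // (local JSON file) or Application Default Credentials (Cloud Run, GKE).
  return new GoogleGenAI(options);
}
```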

7.6 Documents Needing Updates

The VERTEX_AI_SETUP.md is outdated and references:

  • Old models: gemini-1.5-flash-8b, gemini-1.5-pro-002
  • Old SDK patterns
  • No mention of RAG, embeddings, fine-tuning, or context caching

Recommended updates:

  • Update model references to gemini-2.5-* family
  • Add RAG embedding model (gemini-embedding-001) documentation
  • Add context caching setup notes
  • Add fine-tuning API documentation
  • Update IAM roles: add Vertex AI Tuning User if using fine-tuning

8. Action Items Summary

For Neon Migration (RAG-specific)

| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | Add directUrl to prisma/schema.prisma | High | 5 min |
| 2 | Add DIRECT_DATABASE_URL to all environments | High | 15 min |
| 3 | Verify pgvector extension on Neon after migration | High | 5 min |
| 4 | Test vector search queries on Neon | High | 30 min |
| 5 | Consider upgrading IVFFlat → HNSW index | Medium | 1 hour |
| 6 | Benchmark RAG query latency on Neon (including cold start) | Medium | 2 hours |
| 7 | Set minimum compute to 0.25 CU if cold start latency unacceptable | Low | 5 min |

For GenAI SDK

| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | Update VERTEX_AI_SETUP.md to reflect @google/genai SDK and 2.5 models | Medium | 2 hours |
| 2 | Pin @google/genai version in package.json (avoid ^ prefix) | Medium | 5 min |
| 3 | Add Vertex AI Tuning User IAM role if fine-tuning is used in production | Medium | 10 min |
| 4 | Monitor @google/genai changelog for breaking changes in embedding/tuning APIs | Low | Ongoing |

For RAG System

| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | No public API controller exists — add REST endpoints if admin needs RAG management | Low | 4 hours |
| 2 | Add monitoring/metrics for RAG query latency and cache hit rates | Medium | 3 hours |
| 3 | Consider adding a seeding script for initial curriculum/rubric data | Medium | 2 hours |

Document Version: 2.0 · Last Updated: February 27, 2026 · Author: StemBlock AI Engineering Team