RAG Implementation, Neon PostgreSQL Migration & GenAI SDK Changes
Date: February 27, 2026
Scope: stemblockai-backend
Status: Architecture Review & Migration Planning
Table of Contents
- RAG Architecture Overview
- RAG Detailed Workflow
- RAG Implementation Details (File-by-File)
- Initial Data Ingestion Guide
- Evaluating RAG Results
- Neon PostgreSQL Migration
- Google GenAI SDK Changes
- Action Items Summary
1. RAG Architecture Overview
Pipeline Flow
┌──────────────────────────────────────────────────────────────────────┐
│ INGESTION PIPELINE │
│ │
│ Document Input (curriculum, rubrics, feedback corrections) │
│ │ │
│ ▼ │
│ IngestionService.ingestDocument() │
│ │ Chunks text: 1000 chars, 200 overlap │
│ │ Breaks at sentence/paragraph boundaries │
│ ▼ │
│ RagService.ingestDocuments() │
│ │ Batch size: 20 documents per cycle │
│ ▼ │
│ EmbeddingService.embedBatch() │
│ │ Model: gemini-embedding-001 │
│ │ Dimensions: 768 │
│ │ Task type: RETRIEVAL_DOCUMENT │
│ │ SDK: @google/genai (Vertex AI backend) │
│ ▼ │
│ VectorStoreService.addDocuments() │
│ │ Storage: PostgreSQL + pgvector extension │
│ │ Table: document_embeddings │
│ │ Index: HNSW (cosine distance, m=16, ef_construction=64) │
│ ▼ │
│ ✅ Stored in pgvector │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ RETRIEVAL PIPELINE │
│ │
│ User Query (e.g., evaluation rubric request) │
│ │ │
│ ▼ │
│ RagService.query() / RagService.retrieveContext() │
│ │ │
│ ▼ │
│ EmbeddingService.embedText(question) │
│ │ Same model: gemini-embedding-001, 768D │
│ │ Task type: RETRIEVAL_QUERY │
│ ▼ │
│ VectorStoreService.search(queryEmbedding, topK, filter) │
│ │ SQL: 1 - (embedding <=> query::vector) AS score │
│ │ Operator: <=> (cosine distance) │
│ │ Optional filter: WHERE category = ? │
│ │ LIMIT: topK (default 5 for query, 3 for retrieveContext) │
│ ▼ │
│ Context Assembly │
│ │ "[Source 1] content...\n\n[Source 2] content..." │
│ ▼ │
│ GenAIService.generateContent() │
│ │ Model: gemini-2.5-flash (default) │
│ │ System: STEM education assistant prompt │
│ │ User: Context + Question │
│ ▼ │
│ RagResponse { answer, sources[], model } │
└──────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────────┐
│ FEEDBACK & FINE-TUNING LOOP │
│ │
│ User Feedback (rating + correction) │
│ │ │
│ ▼ │
│ FeedbackLoopService.recordFeedback() │
│ │ Stores in ai_feedback table │
│ │ │
│ ├─── Rating ≥ 4 + has correction ──▶ Auto-ingest to RAG │
│ │ (immediate improvement via RagService.ingestDocument) │
│ │ │
│ ▼ │
│ checkTrainingReadiness() — 50+ examples threshold │
│ │ │
│ ▼ │
│ FineTuningService.submitTuningJob() │
│ │ Base: gemini-2.0-flash-001 │
│ │ API: client.tunings.tune() │
│ │ Tracks: fine_tuning_jobs table │
│ ▼ │
│ Tuned model available for future generations │
└──────────────────────────────────────────────────────────────────────┘
Where RAG Is Consumed
| Consumer | File | How RAG Is Used |
|---|---|---|
| STEM Evaluations | src/evaluations/providers/gemini-llm.provider.ts:78-89 | Retrieves rubric context before evaluation. Prepends Relevant Evaluation Guidelines: to the user prompt. Category filter: rubrics, topK: 3 |
| Assignment Creation | src/workflows/assignment-creation/ | Retrieves curriculum standards context when generating assignments |
| Learning Paths | src/workflows/learning-paths/ | Retrieves learning context for path generation |
| Feedback Loop | src/rag/feedback-loop.service.ts:60-79 | High-quality corrections (rating ≥ 4) auto-ingested into RAG for immediate improvement |
Database Tables (3 new tables)
| Table | Purpose | Key Columns |
|---|---|---|
| document_embeddings | Vector store for RAG documents | id, content, metadata (JSONB), category, embedding (vector(768)) |
| ai_feedback | User corrections for model improvement | userId, sessionType, originalPrompt, originalOutput, userCorrection, rating, usedForTraining |
| fine_tuning_jobs | Tracks Gemini fine-tuning jobs | geminiJobName, baseModel, tunedModelName, status, sampleCount |
2. RAG Detailed Workflow
2.1 What RAG Actually Does
RAG = "Before asking the AI a question, first search a knowledge base for relevant context and include it in the prompt."
Without RAG:
Prompt: "Evaluate this robotics submission"
→ Gemini answers based only on its training data (generic)
With RAG:
Step 1: Search your knowledge base for relevant rubrics
Step 2: Prompt: "Here are the evaluation guidelines: [retrieved rubrics]
Now evaluate this robotics submission"
→ Gemini answers with your specific rubric criteria (precise)
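The augmentation step can be sketched as a pure function. This is an illustrative helper, not the actual provider code; the real logic lives in `gemini-llm.provider.ts` and the `[Source N]` format comes from the retrieval pipeline above.

```typescript
// Hypothetical sketch of RAG prompt augmentation. Retrieved chunks are
// prepended so the model answers against your rubrics instead of only
// its generic training data.
interface RetrievedChunk {
  content: string;
  score: number;
}

function augmentPrompt(basePrompt: string, chunks: RetrievedChunk[]): string {
  // Graceful fallback: with an empty knowledge base, use the plain prompt.
  if (chunks.length === 0) return basePrompt;
  const context = chunks
    .map((c, i) => `[Source ${i + 1}] ${c.content}`)
    .join('\n\n');
  return `Relevant Evaluation Guidelines:\n${context}\n\n${basePrompt}`;
}
```

Note the empty-array fallback: the real provider likewise degrades to a plain prompt when retrieval fails or returns nothing.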
2.2 End-to-End: What Happens During an AI Evaluation
When a coach triggers POST /api/v1/evaluations/generate/:submissionId:
1. EVALUATION REQUEST
gemini-llm.provider.ts:59 → evaluateSubmission(request)
2. RAG CONTEXT RETRIEVAL (lines 78-89)
┌─────────────────────────────────────────────────────────────────┐
│ ragService.retrieveContext( │
│ "Robotics Challenge evaluation rubric", ← query │
│ "rubrics", ← category filter │
│ 3 ← top 3 results │
│ ) │
│ │
│ Inside retrieveContext (rag.service.ts:126): │
│ │
│ a) EMBED THE QUERY │
│ embeddingService.embedText("Robotics Challenge...") │
│ → Calls Google: gemini-embedding-001 (RETRIEVAL_QUERY) │
│ → Returns: [0.12, -0.45, 0.78, ...] (768 numbers) │
│ │
│ b) SEARCH PGVECTOR (cosine similarity) │
│ vectorStore.search(queryEmbedding, 3, {category: 'rubrics'}) │
│ → SQL: SELECT content, │
│ 1 - (embedding <=> query::vector) AS score │
│ FROM document_embeddings │
│ WHERE category = 'rubrics' │
│ ORDER BY embedding <=> query::vector │
│ LIMIT 3 │
│ → Returns top 3 most similar rubric chunks │
│ │
│ c) BUILD CONTEXT STRING │
│ "[Source 1] Evaluation Rubric: Robot Design..." │
│ "[Source 2] Code Quality Criteria: Score 5..." │
│ "[Source 3] Documentation Standards..." │
└─────────────────────────────────────────────────────────────────┘
3. AUGMENT THE PROMPT (line 85)
userPrompt = "Relevant Evaluation Guidelines:\n"
+ [the 3 retrieved rubric chunks]
+ "\n\n"
+ [the original evaluation prompt with student files]
4. GENERATE WITH CONTEXT CACHE (lines 97-106)
contextCache.generateWithCache({
model: "gemini-2.5-flash",
systemInstruction: [evaluation system prompt], ← cached (90% token savings)
contents: userPrompt, ← includes RAG context
responseMimeType: "application/json",
})
5. PARSE & RETURN
→ JSON: { overallScore, confidence, categories, summary, nextSteps }
2.3 End-to-End: Assignment Creation with RAG
When a coach triggers POST /api/v1/workflows/assignment-creation/generate:
assignment-creation.service.ts:618
ragService.retrieveContext(
"Quadratic Equations Project grade 9 STEM curriculum standards",
"curriculum-standards", ← searches the curriculum-standards category
3
)
→ Retrieves NGSS/Common Core standards for that topic and grade level
→ Prepends "Relevant Curriculum Standards:\n[context]" to the prompt
→ Gemini generates assignment aligned with real standards
2.4 End-to-End: Learning Path Generation with RAG
When a coach triggers POST /api/v1/workflows/learning-paths/generate:
learning-paths.service.ts:594
ragService.retrieveContext(
"Programming, Robotics STEM learning progression curriculum",
"curriculum-standards", ← same category as assignments
3
)
→ Retrieves learning progressions for the student's skill gaps
→ Prepends "Relevant Curriculum Standards:\n[context]" to the prompt
→ Gemini generates a personalized 8-12 week learning plan
2.5 How Documents Enter the Knowledge Base
Path A — Manual ingestion (curriculum standards, rubrics):
ingestionService.ingestCurriculumStandards(standards)
→ Chunks text (500 chars, 100 overlap for standards)
→ Each chunk gets ID: "standard-NGSS-MS-PS1-1-chunk-0"
→ ragService.ingestDocuments(chunks)
→ embeddingService.embedBatch(texts) ← vectors generated
→ vectorStore.addDocuments(docs) ← stored in PostgreSQL/pgvector
Path B — Automatic from user feedback (adaptive learning):
User gives rating ≥ 4 with a correction
→ feedbackLoopService.recordFeedback(input)
→ Saves to ai_feedback table
→ ragService.ingestDocument({
content: "Corrected AI Output for evaluation:
Original prompt: [truncated]
Corrected response: [user's correction]",
metadata: { category: "feedback-corrections" }
})
→ Now future queries can retrieve this correction as context
2.6 The Feedback → Fine-Tuning Loop
50+ feedback corrections accumulate
→ feedbackLoopService.triggerFineTuningIfReady()
→ fineTuningService.collectTrainingData()
→ Pulls unused corrections from ai_feedback table
→ fineTuningService.submitTuningJob(examples)
→ Calls Google: client.tunings.tune({
baseModel: "gemini-2.0-flash-001",
trainingDataset: { examples: [...] },
config: { epochCount: 5 }
})
→ Marks feedback as usedForTraining = true
→ Tracks job in fine_tuning_jobs table
→ When complete: tuned model available for future use
3. RAG Implementation Details
3.1 RagService — The Orchestrator
File: src/rag/rag.service.ts (162 lines)
This is the central coordinator. It exposes two retrieval modes:
// Mode 1: Full RAG — retrieves context AND generates an answer
async query(ragQuery: RagQuery): Promise<RagResponse>
// Mode 2: Context-only — retrieves context for other services to use
async retrieveContext(question, category?, topK?): Promise<{ context, sources }>
query() flow (lines 83-119):
- Embed the question → EmbeddingService.embedText()
- Search vector store → VectorStoreService.search() with optional category filter
- Build context string from top-K results: [Source 1] content...
- Generate answer → GenAIService.generateContent() with gemini-2.5-flash
- Return answer + sources (content truncated to 200 chars) + model name
retrieveContext() flow (lines 126-151):
- Same as query() steps 1-3, but skips generation
- Used by GeminiLLMProvider.evaluateSubmission() to augment evaluation prompts
Ingestion (lines 38-78):
- Single document: embed → store
- Batch: processes in groups of 20, uses embedBatch() for efficiency
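The embed → search → assemble → generate sequence that query() runs can be sketched with plain interfaces. All names here are hypothetical stand-ins for the NestJS providers; the sketch only shows the data flow, not the framework wiring.

```typescript
// Minimal sketch of the query() orchestration, with the three
// dependencies reduced to hypothetical interfaces.
interface Embedder { embed(text: string): Promise<number[]>; }
interface VectorStore {
  search(q: number[], topK: number, category?: string): Promise<{ content: string; score: number }[]>;
}
interface Generator { generate(prompt: string): Promise<string>; }

async function ragQuery(
  question: string,
  deps: { embedder: Embedder; store: VectorStore; generator: Generator },
  topK = 5,
  category?: string,
): Promise<{ answer: string; sources: { content: string; score: number }[] }> {
  const queryEmbedding = await deps.embedder.embed(question);              // 1. embed
  const sources = await deps.store.search(queryEmbedding, topK, category); // 2. search
  const context = sources                                                  // 3. assemble
    .map((s, i) => `[Source ${i + 1}] ${s.content}`)
    .join('\n\n');
  const answer = await deps.generator.generate(                            // 4. generate
    `${context}\n\nQuestion: ${question}`,
  );
  return { answer, sources };
}
```

retrieveContext() is the same function stopped after step 3.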
3.2 EmbeddingService — Vector Generation
File: src/rag/embedding.service.ts (58 lines)
// Uses Google GenAI SDK
private readonly embeddingModel = 'gemini-embedding-001';
// Single text (taskType: RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search)
async embedText(text: string, taskType: TaskType = 'RETRIEVAL_DOCUMENT'): Promise<EmbeddingResult>
// Batch texts
async embedBatch(texts: string[], taskType: TaskType = 'RETRIEVAL_DOCUMENT'): Promise<EmbeddingResult[]>
Key implementation detail (lines 18-24):
const response = await client.models.embedContent({
model: this.embeddingModel,
contents: text, // string or string[]
config: {
outputDimensionality: 768, // Google's recommended sweet spot (0.26% loss vs full 3,072D)
taskType, // RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search
},
});
- Model: gemini-embedding-001 (Google's latest embedding model, #1 on the MTEB leaderboard)
- Dimensions: 768 (configurable via outputDimensionality; sweet spot for quality/cost)
- Task types: RETRIEVAL_DOCUMENT for ingestion, RETRIEVAL_QUERY for search queries
- Both single and batch use the same embedContent API; batch passes string[]
3.3 VectorStoreService — pgvector Storage & Search
File: src/rag/vector-store.service.ts (119 lines)
This is the persistence layer. Uses raw SQL via Prisma's $executeRawUnsafe / $queryRawUnsafe because Prisma doesn't natively support the vector type.
Storing documents (lines 28-46):
INSERT INTO document_embeddings (id, content, metadata, category, embedding, created_at, updated_at)
VALUES ($1, $2, $3::jsonb, $4, $5::vector, NOW(), NOW())
ON CONFLICT (id) DO UPDATE SET
content = EXCLUDED.content, metadata = EXCLUDED.metadata,
category = EXCLUDED.category, embedding = EXCLUDED.embedding, updated_at = NOW()
- Upsert pattern: inserts or updates on ID conflict
- Embedding serialized as
[0.1,0.2,...]string, cast to::vector
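The serialization step can be sketched in a few lines. The helper name is hypothetical; pgvector accepts a `[x,y,z]` text literal that SQL then casts with `::vector`.

```typescript
// Sketch of the embedding → pgvector literal serialization described
// above (hypothetical helper name). The resulting string is passed as
// an ordinary query parameter and cast via `$n::vector` in the SQL.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}
```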
Similarity search (lines 55-98):
SELECT id, content, metadata, category,
1 - (embedding <=> $1::vector) AS score
FROM document_embeddings
WHERE category = $3 -- optional filter
ORDER BY embedding <=> $1::vector
LIMIT $2
- Operator <=> = cosine distance (pgvector)
- Score = 1 - cosine_distance (higher = more similar)
- HNSW index used for approximate nearest-neighbor search
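Since pgvector's `<=>` returns 1 minus cosine similarity, the score the query computes is exactly the cosine similarity of the two vectors. A reference implementation for sanity-checking scores offline:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). pgvector's <=> operator
// returns 1 - this value, and the SELECT flips it back, so the `score`
// column equals this function applied to the stored and query vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```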
Other operations:
Other operations:
- deleteDocument(id) — DELETE by ID
- clear() — TRUNCATE table
- getDocumentCount() — SELECT COUNT(*)
3.4 IngestionService — Document Chunking
File: src/rag/ingestion.service.ts (182 lines)
Handles pre-processing before documents enter the RAG pipeline.
Chunking strategy (lines 148-181):
Default: chunkSize=1000, chunkOverlap=200
Curriculum standards: chunkSize=500, chunkOverlap=100
Rubrics: chunkSize=800, chunkOverlap=150
- Tries to break at natural boundaries, in priority order: \n\n → .\n → . → \n
- Only breaks at a boundary if it falls past 50% of the chunk size
- Each chunk gets metadata: sourceId, title, category, chunkIndex, totalChunks
Specialized ingestion methods:
- ingestCurriculumStandards() — parses framework/gradeLevel/standards structure
- ingestRubrics() — parses rubric criteria with score level descriptions
3.5 FeedbackLoopService — Adaptive Learning
File: src/rag/feedback-loop.service.ts (204 lines)
Two-path improvement system:
Path 1 — Immediate RAG improvement (lines 57-79): When a user provides a correction with rating ≥ 4:
await this.ragService.ingestDocument({
id: `feedback-${feedback.id}`,
content: `Corrected AI Output for ${input.sessionType}:
Original prompt: ${input.originalPrompt.substring(0, 500)}
Corrected response: ${input.userCorrection}`,
metadata: { category: 'feedback-corrections', ... },
});
→ Future queries can retrieve these corrections as context
Path 2 — Fine-tuning trigger (lines 113-157):
- Threshold: 50+ unused training examples
- Calls FineTuningService.collectTrainingData() → submitTuningJob()
- Marks feedback as usedForTraining = true after submission
3.6 FineTuningService — Model Customization
File: src/rag/fine-tuning.service.ts (275 lines)
Manages Gemini supervised fine-tuning pipeline.
Job submission (lines 69-148):
const tuningJob = await client.tunings.tune({
baseModel, // default: 'gemini-2.0-flash-001'
trainingDataset: {
examples: trainingExamples.map(e => ({
textInput: e.textInput, // original prompt
output: e.output, // corrected response
})),
},
config: {
epochCount: config?.epochCount ?? 5,
learningRateMultiplier: config?.learningRateMultiplier ?? 1.0,
},
});
Job status polling (lines 153-209):
- Checks client.tunings.get({ name: job.geminiJobName })
- Maps Gemini states: JOB_STATE_SUCCEEDED → COMPLETED, JOB_STATE_FAILED → FAILED
- Updates local DB record with status and tuned model name
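The state mapping can be sketched as a small lookup. Only the SUCCEEDED/FAILED mappings are stated in this document; treating every other state as still running is an assumption about how the service handles intermediate states.

```typescript
// Sketch of the Gemini job-state → local status mapping. The handling
// of states other than SUCCEEDED/FAILED is an assumption; the exact set
// covered in fine-tuning.service.ts may differ.
type LocalStatus = 'RUNNING' | 'COMPLETED' | 'FAILED';

function mapTuningState(geminiState: string): LocalStatus {
  switch (geminiState) {
    case 'JOB_STATE_SUCCEEDED':
      return 'COMPLETED';
    case 'JOB_STATE_FAILED':
      return 'FAILED';
    default:
      return 'RUNNING'; // assumed: pending/running states keep polling
  }
}
```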
Cost: ~$3/1M training tokens (Gemini 2.0 Flash base)
3.7 RagModule — NestJS Wiring
File: src/rag/rag.module.ts (29 lines)
@Module({
imports: [DatabaseModule],
providers: [
EmbeddingService, VectorStoreService, RagService,
IngestionService, FineTuningService, FeedbackLoopService,
],
exports: [
RagService, EmbeddingService, VectorStoreService,
IngestionService, FineTuningService, FeedbackLoopService,
],
})
export class RagModule {}
Note: GenAIService is not imported here — it's provided by the SharedModule which is @Global(). RagModule imports DatabaseModule for PrismaService.
3.8 Database Migration
File: prisma/migrations/20260227000000_add_rag_pgvector_system/migration.sql (65 lines)
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- 3 tables: document_embeddings, ai_feedback, fine_tuning_jobs
-- Key index: HNSW for fast approximate nearest-neighbor (upgraded from IVFFlat)
CREATE INDEX "document_embeddings_embedding_idx" ON "document_embeddings"
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
Prisma schema note (schema.prisma:963):
embedding Unsupported("vector(768)")
Prisma uses Unsupported() for pgvector types, which is why all vector operations use raw SQL.
4. Initial Data Ingestion Guide
The RAG system is only useful if it has documents to search. This section explains how to populate the knowledge base for the first time.
4.1 What to Ingest
The RAG system uses categories to filter searches. Each consumer searches a specific category:
| Category | Used By | What to Ingest |
|---|---|---|
| rubrics | STEM Evaluations (gemini-llm.provider.ts) | Evaluation rubrics with score criteria |
| curriculum-standards | Assignment Creation, Learning Paths | NGSS, Common Core, state standards |
| feedback-corrections | Automatic (feedback loop) | User corrections — auto-ingested, no manual action needed |
4.2 Create a Seeding Script
Create src/scripts/seed-rag.ts following the pattern of the existing src/scripts/seed-users.ts:
import { NestFactory } from '@nestjs/core';
import { AppModule } from '../app.module';
import { IngestionService } from '../rag/ingestion.service';
import { RagService } from '../rag/rag.service';
async function seedRag() {
const app = await NestFactory.createApplicationContext(AppModule);
const ingestion = app.get(IngestionService);
const rag = app.get(RagService);
console.log('🌱 Starting RAG knowledge base seed...\n');
// ── Step 1: Ingest Evaluation Rubrics ──────────────────────────
console.log('📋 Ingesting evaluation rubrics...');
const rubricCount = await ingestion.ingestRubrics([
{
name: 'Robot Design Rubric',
assignmentType: 'DESIGN',
criteria: [
{
name: 'Mechanical Structure',
levels: [
{ score: 5, description: 'Innovative design with excellent structural integrity, creative use of components, and optimal weight distribution' },
{ score: 4, description: 'Solid structure with good component usage and functional design' },
{ score: 3, description: 'Adequate structure that functions but has room for optimization' },
{ score: 2, description: 'Basic structure with stability issues or poor component choices' },
{ score: 1, description: 'Incomplete or non-functional mechanical design' },
],
},
{
name: 'Functionality',
levels: [
{ score: 5, description: 'Robot completes all tasks efficiently with consistent performance' },
{ score: 4, description: 'Robot completes most tasks with good reliability' },
{ score: 3, description: 'Robot completes some tasks but with inconsistencies' },
{ score: 2, description: 'Robot has limited functionality or frequent failures' },
{ score: 1, description: 'Robot does not function as intended' },
],
},
{
name: 'Creativity and Innovation',
levels: [
{ score: 5, description: 'Highly original approach with novel solutions to design challenges' },
{ score: 4, description: 'Creative elements with some original thinking' },
{ score: 3, description: 'Standard design approach with minor creative touches' },
{ score: 2, description: 'Mostly follows examples with little original thought' },
{ score: 1, description: 'No creative effort, direct copy of examples' },
],
},
],
},
{
name: 'Code Quality Rubric',
assignmentType: 'CODE',
criteria: [
{
name: 'Logic and Efficiency',
levels: [
{ score: 5, description: 'Elegant, efficient algorithms with optimal time/space complexity' },
{ score: 4, description: 'Well-structured logic with good efficiency' },
{ score: 3, description: 'Functional logic but with unnecessary redundancy' },
{ score: 2, description: 'Logic works partially or has significant inefficiency' },
{ score: 1, description: 'Broken logic or code that does not compile/run' },
],
},
{
name: 'Code Organization',
levels: [
{ score: 5, description: 'Clean separation of concerns, meaningful naming, consistent style throughout' },
{ score: 4, description: 'Good organization with clear naming conventions' },
{ score: 3, description: 'Adequate organization but inconsistent in places' },
{ score: 2, description: 'Poor organization with confusing variable names' },
{ score: 1, description: 'No discernible organization, unreadable code' },
],
},
{
name: 'Comments and Documentation',
levels: [
{ score: 5, description: 'Comprehensive comments explaining why, clear function documentation' },
{ score: 4, description: 'Good comments on complex sections' },
{ score: 3, description: 'Some comments but missing on key logic' },
{ score: 2, description: 'Minimal or unhelpful comments' },
{ score: 1, description: 'No comments at all' },
],
},
],
},
{
name: 'Documentation Rubric',
assignmentType: 'NOTEBOOK',
criteria: [
{
name: 'Process Documentation',
levels: [
{ score: 5, description: 'Detailed chronicle of design process including failures, iterations, and reasoning' },
{ score: 4, description: 'Good documentation of process with clear progression' },
{ score: 3, description: 'Basic documentation that covers main steps' },
{ score: 2, description: 'Incomplete documentation with major gaps' },
{ score: 1, description: 'Little to no process documentation' },
],
},
{
name: 'Clarity and Organization',
levels: [
{ score: 5, description: 'Exceptionally clear writing with logical flow, headings, and visual aids' },
{ score: 4, description: 'Well-organized with clear sections and good readability' },
{ score: 3, description: 'Readable but could benefit from better structure' },
{ score: 2, description: 'Difficult to follow or poorly organized' },
{ score: 1, description: 'Incomprehensible or no organization' },
],
},
],
},
{
name: 'Technical Writing Rubric',
assignmentType: 'ESSAY',
criteria: [
{
name: 'Technical Accuracy',
levels: [
{ score: 5, description: 'All technical claims are accurate and well-supported with evidence' },
{ score: 4, description: 'Mostly accurate with minor oversimplifications' },
{ score: 3, description: 'Generally accurate but with some errors or vague claims' },
{ score: 2, description: 'Multiple technical inaccuracies' },
{ score: 1, description: 'Fundamentally incorrect technical understanding' },
],
},
{
name: 'Writing Quality',
levels: [
{ score: 5, description: 'Excellent grammar, vocabulary, and sentence structure appropriate for age level' },
{ score: 4, description: 'Good writing with minor errors' },
{ score: 3, description: 'Adequate writing but with noticeable errors' },
{ score: 2, description: 'Frequent grammatical errors affecting readability' },
{ score: 1, description: 'Very poor writing quality' },
],
},
],
},
]);
console.log(`✅ Rubrics ingested: ${rubricCount} chunks\n`);
// ── Step 2: Ingest Curriculum Standards ─────────────────────────
console.log('📚 Ingesting curriculum standards...');
const ngssCount = await ingestion.ingestCurriculumStandards({
framework: 'NGSS',
gradeLevel: 'Middle School (6-8)',
standards: [
{
id: 'MS-ETS1-1',
title: 'Define Design Problems',
description: 'Define the criteria and constraints of a design problem with sufficient precision to ensure a successful solution, taking into account relevant scientific principles and potential impacts on people and the natural environment.',
},
{
id: 'MS-ETS1-2',
title: 'Evaluate Competing Solutions',
description: 'Evaluate competing design solutions using a systematic process to determine how well they meet the criteria and constraints of the problem.',
},
{
id: 'MS-ETS1-3',
title: 'Analyze Data from Tests',
description: 'Analyze data from tests to determine similarities and differences among several design solutions to identify the best characteristics of each that can be combined into a new solution.',
},
{
id: 'MS-ETS1-4',
title: 'Develop and Iterate Models',
description: 'Develop a model to generate data for iterative testing and modification of a proposed object, tool, or process such that an optimal design can be achieved.',
},
{
id: 'MS-PS2-1',
title: 'Forces and Motion',
description: 'Apply Newton\'s Third Law to design a solution to a problem involving the motion of two colliding objects. Emphasis is on the change in motion and forces during collision.',
},
{
id: 'MS-PS2-2',
title: 'Plan Investigation on Forces',
description: 'Plan an investigation to provide evidence that the change in an object\'s motion depends on the sum of the forces acting on the object and the mass of the object.',
},
],
});
const csMathCount = await ingestion.ingestCurriculumStandards({
framework: 'Common Core Math',
gradeLevel: 'Middle School (6-8)',
standards: [
{
id: '6.RP.3',
title: 'Ratios and Proportional Relationships',
description: 'Use ratio and rate reasoning to solve real-world and mathematical problems, including those involving unit rates, percentages, and proportional relationships in robotics contexts like gear ratios and speed calculations.',
},
{
id: '7.G.6',
title: 'Geometry - Area and Volume',
description: 'Solve real-world and mathematical problems involving area, volume, and surface area of two- and three-dimensional objects. Applied to robot chassis design, workspace planning, and component fitting.',
},
{
id: '8.F.4',
title: 'Functions - Model Relationships',
description: 'Construct a function to model a linear relationship between two quantities. Applied to sensor calibration, motor speed curves, and PID control in robotics.',
},
{
id: '8.EE.7',
title: 'Expressions and Equations',
description: 'Solve linear equations in one variable, including those with rational number coefficients. Applied to calculating distances, speeds, and timing in robot navigation.',
},
],
});
console.log(`✅ NGSS standards ingested: ${ngssCount} chunks`);
console.log(`✅ Common Core Math standards ingested: ${csMathCount} chunks\n`);
// ── Step 3: Verify ─────────────────────────────────────────────
const stats = await rag.getStats();
console.log(`📊 RAG Knowledge Base Stats:`);
console.log(` Total documents: ${stats.documentCount}`);
console.log('\n🎉 RAG seed completed successfully!');
await app.close();
}
seedRag()
.then(() => process.exit(0))
.catch((error) => {
console.error('❌ Error seeding RAG:', error);
process.exit(1);
});
4.3 Run the Seeding Script
cd stemblockai-backend
# Ensure GenAI is configured (embeddings require Google API)
export LLM_PROVIDER=gemini
export GOOGLE_CLOUD_PROJECT=your-project-id
# ... other GenAI env vars
# Run the seed
npx ts-node src/scripts/seed-rag.ts
Expected output:
🌱 Starting RAG knowledge base seed...
📋 Ingesting evaluation rubrics...
✅ Rubrics ingested: 12 chunks
📚 Ingesting curriculum standards...
✅ NGSS standards ingested: 6 chunks
✅ Common Core Math standards ingested: 4 chunks
📊 RAG Knowledge Base Stats:
Total documents: 22
🎉 RAG seed completed successfully!
4.4 Add to package.json
{
"scripts": {
"seed:users": "ts-node src/scripts/seed-users.ts",
"seed:rag": "ts-node src/scripts/seed-rag.ts",
"seed:all": "npm run seed:users && npm run seed:rag"
}
}
4.5 Verify Data in Database
-- Check document counts by category
SELECT category, COUNT(*) as count
FROM document_embeddings
GROUP BY category
ORDER BY count DESC;
-- Expected output:
-- rubrics | 12
-- curriculum-standards | 10
4.6 Ingestion for Additional Content
You can ingest additional content at any time using the IngestionService methods:
Custom documents (any category):
await ingestionService.ingestDocument({
id: 'custom-doc-001',
title: 'VEX Robotics Competition Rules 2026',
content: '... full text ...',
category: 'competition-rules',
source: 'vex-robotics.com',
});
Additional standards (e.g., state-specific):
await ingestionService.ingestCurriculumStandards({
framework: 'Texas TEKS',
gradeLevel: 'Grade 8',
standards: [
{ id: 'TEKS-8.6A', title: '...', description: '...' },
],
});
4.7 Chunking Behavior
Understanding how documents get split is important for quality:
| Content Type | Chunk Size | Overlap | Why |
|---|---|---|---|
| General documents | 1000 chars | 200 | Standard — balances context window usage and retrieval precision |
| Curriculum standards | 500 chars | 100 | Shorter — each standard is a self-contained concept |
| Rubrics | 800 chars | 150 | Medium — criteria need enough context to be useful |
Overlap means consecutive chunks share text at their boundaries. This prevents a relevant sentence from being split across two chunks where neither chunk alone has enough context.
Example with chunkSize=20, overlap=5:
Original: "The robot must navigate the obstacle course within 60 seconds"
Chunk 1 (chars 0-19): "The robot must navig"
Chunk 2 (chars 15-34): "navigate the obstacl" ← starts 5 chars back (the overlap)
Chunk 3 (chars 30-49): "stacle course within"
Chunk 4 (chars 45-60): "ithin 60 seconds"
(Fixed-size windows shown for clarity; the real chunker also snaps to sentence/paragraph boundaries.)
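The sliding-window behaviour can be sketched directly. This deliberately omits the boundary snapping that IngestionService adds; each step advances by chunkSize minus overlap.

```typescript
// Fixed-size sliding-window chunking sketch (boundary snapping omitted;
// the real IngestionService also prefers paragraph/sentence breaks).
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
    start += chunkSize - overlap; // step back by `overlap` chars
  }
  return chunks;
}
```

With chunkSize=20 and overlap=5, each chunk begins 5 characters before the previous one ended, so no sentence fragment is stranded at a boundary without context.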
5. Evaluating RAG Results
5.1 Quick Smoke Test
After ingesting data, verify the RAG pipeline end-to-end:
// In a NestJS test or script
const ragService = app.get(RagService);
// Test 1: Query the rubrics category
const rubricResult = await ragService.query({
question: 'What are the criteria for evaluating robot design?',
category: 'rubrics',
topK: 3,
});
console.log('Answer:', rubricResult.answer);
console.log('Sources:', rubricResult.sources.length);
console.log('Top score:', rubricResult.sources[0]?.score);
// Test 2: Query curriculum standards
const curriculumResult = await ragService.query({
question: 'What NGSS standards apply to engineering design?',
category: 'curriculum-standards',
topK: 3,
});
console.log('Answer:', curriculumResult.answer);
console.log('Sources:', curriculumResult.sources.length);
5.2 What Good Results Look Like
Similarity scores (score = 1 - cosine distance; for text embeddings typically 0.0 to 1.0):
| Score Range | Meaning | Action |
|---|---|---|
| 0.85 - 1.0 | Excellent match | Content is highly relevant |
| 0.70 - 0.85 | Good match | Content is relevant, usable as context |
| 0.50 - 0.70 | Weak match | Content is tangentially related, may add noise |
| Below 0.50 | Poor match | Content is not relevant, should not be used |
For your system: Scores above 0.70 are typical for rubric/standard retrieval. If top scores are consistently below 0.60, the ingested content may not match the queries well — check chunking or add more specific content.
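One way to enforce these thresholds at retrieval time is to drop weak matches before prompt assembly. The current code does not do this (it keeps all topK results); the helper below is a hypothetical sketch of that policy.

```typescript
// Hypothetical post-retrieval filter applying the 0.70 "good match"
// threshold from the table above, so tangential chunks don't add noise.
interface Source { content: string; score: number; }

function usableSources(sources: Source[], minScore = 0.7): Source[] {
  return sources.filter((s) => s.score >= minScore);
}
```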
5.3 Evaluate RAG Impact on Evaluations
Compare evaluation quality with and without RAG:
# Step 1: Run evaluation WITHOUT RAG (empty knowledge base)
# The system gracefully handles this — gemini-llm.provider.ts:87 catches errors
# Step 2: Ingest rubrics
npm run seed:rag
# Step 3: Run the SAME evaluation again WITH RAG
# Step 4: Compare the two outputs:
# - Are category scores more consistent with rubric criteria?
# - Does feedback reference specific rubric language?
# - Are improvements more actionable?
5.4 Monitoring in Production
Key metrics to track:
| Metric | How to Measure | Target |
|---|---|---|
| RAG retrieval latency | Time from query to context returned | < 500ms |
| Top-1 similarity score | sources[0].score from retrieval | > 0.70 average |
| Context utilization | Does the AI response reference retrieved sources? | Qualitative review |
| Feedback loop growth | SELECT COUNT(*) FROM ai_feedback | Steady growth |
| Knowledge base size | ragService.getStats().documentCount | Growing via feedback |
Check via SQL:
-- Knowledge base growth over time
SELECT DATE(created_at) as date, COUNT(*) as new_docs
FROM document_embeddings
GROUP BY DATE(created_at)
ORDER BY date DESC
LIMIT 30;
-- Feedback corrections ingested to RAG
SELECT COUNT(*) as feedback_in_rag
FROM document_embeddings
WHERE category = 'feedback-corrections';
-- Feedback waiting for fine-tuning
SELECT COUNT(*) as unused_feedback
FROM ai_feedback
WHERE used_for_training = false AND user_correction IS NOT NULL;
5.5 Debugging Poor Results
Problem: AI evaluation doesn't use rubric criteria
- Check if documents exist: SELECT COUNT(*) FROM document_embeddings WHERE category = 'rubrics';
- Check retrieval scores — run a test query and inspect sources[].score
- If scores are low, the query terms may not match the ingested content. Consider:
  - Adding more rubric variations
  - Making rubric descriptions more detailed
  - Adjusting topK (try 5 instead of 3)
Problem: Wrong category documents retrieved
- Verify the category filter is working: SELECT category, COUNT(*) FROM document_embeddings GROUP BY category;
- Check that ingestion uses the correct category strings (they must exactly match the filter in the consumer code)
Problem: Retrieval is slow (>2s)
- Verify the HNSW index exists: SELECT indexdef FROM pg_indexes WHERE indexname = 'document_embeddings_embedding_idx';
- If missing, create it (see migration 20260227100000_upgrade_ivfflat_to_hnsw_index)
- If on Neon, check whether compute was cold-started (the first query after idle is slow)
5.6 RAG Quality Improvement Cycle
Week 1: Seed initial rubrics + standards
→ Run evaluations, collect baseline quality
Week 2: Coaches review AI evaluations, provide corrections
→ High-quality corrections auto-ingest to RAG (rating ≥ 4)
→ RAG results improve immediately
Week 4: 50+ corrections accumulated
→ Fine-tuning triggered automatically
→ Tuned model available for future evaluations
Ongoing: More corrections → better RAG context → better evaluations
→ fewer corrections needed → system stabilizes
6. Neon PostgreSQL Migration
The existing NEON_MIGRATION_GUIDE.md covers the general migration steps. Below are RAG-specific concerns that require additional attention when migrating to Neon.
6.1 pgvector Support on Neon
Neon has native pgvector support — no extra setup needed. However:
| Concern | Current (DO/Self-hosted) | Neon | Action Needed |
|---|---|---|---|
| pgvector extension | Manually installed | Pre-installed | None — CREATE EXTENSION IF NOT EXISTS vector still works |
| pgvector version | Depends on PG version | Latest (0.7+) | Verify SELECT extversion FROM pg_extension WHERE extname = 'vector' |
| IVFFlat index | Works | Works | None — supported natively |
| HNSW index | May not be available | Supported | Consider upgrading from IVFFlat to HNSW for better recall (see 3.3) |
| Max dimensions (indexed) | 2,000 | 2,000 | None — we use 768 |
| vector type in raw SQL | Works | Works | None |
6.2 Connection Pooling Impact on RAG
Neon uses PgBouncer for connection pooling. This affects raw SQL queries:
Potential issue: $executeRawUnsafe with parameterized queries may conflict with PgBouncer's transaction pooling mode.
Current code pattern in vector-store.service.ts:
await this.prisma.$executeRawUnsafe(
`INSERT INTO document_embeddings ... VALUES ($1, $2, $3::jsonb, $4, $5::vector, ...)`,
doc.id, doc.content, JSON.stringify(doc.metadata), ...
);
Recommendation:
- Use Neon's direct connection string (not pooled) for the DATABASE_URL used by Prisma migrations
- Use the pooled connection string for application runtime
- Add to Prisma schema:
datasource db {
provider = "postgresql"
url = env("DATABASE_URL") // pooled (runtime)
directUrl = env("DIRECT_DATABASE_URL") // direct (migrations)
}
Environment variables to add:
# Neon pooled connection (for app runtime)
DATABASE_URL="postgresql://user:pass@ep-xyz-pooler.us-east-2.aws.neon.tech/stemblock_db?sslmode=require"
# Neon direct connection (for migrations)
DIRECT_DATABASE_URL="postgresql://user:pass@ep-xyz.us-east-2.aws.neon.tech/stemblock_db?sslmode=require"
6.3 Index Migration: IVFFlat → HNSW (Recommended)
Current migration uses IVFFlat:
CREATE INDEX ... USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
Neon supports HNSW which has better recall and doesn't require training on existing data:
-- Drop old index
DROP INDEX IF EXISTS document_embeddings_embedding_idx;
-- Create HNSW index (better for dynamic datasets)
CREATE INDEX document_embeddings_embedding_idx ON document_embeddings
USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
| Factor | IVFFlat | HNSW |
|---|---|---|
| Recall | ~95% | ~99% |
| Build time | Fast | Slower |
| Query speed | Fast | Slightly slower |
| Dynamic inserts | Needs periodic re-indexing after bulk inserts | Handles inserts well |
| Best for | Static datasets | Growing datasets (RAG with feedback loop) |
Since the feedback loop continuously ingests new documents, HNSW is the better choice for Neon.
6.4 Cold Start Impact on RAG
Neon scales to zero after 5 min idle. The first RAG query after cold start will:
- Wake the database (~500ms-2s)
- Load pgvector extension
- Execute vector search
Mitigation options:
- Accept ~2s latency on first query (acceptable for background AI evaluation)
- Set minimum compute to 0.25 CU to prevent full cold start
- The existing health check endpoint can serve as a keep-alive
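If the keep-alive route is chosen, the timing logic is simple. This is a minimal sketch, not existing code: the 5-minute idle window reflects Neon's default autosuspend, and the 0.8 safety margin is an arbitrary assumption. The `ping` callback would be wired to the existing health check (e.g. a `SELECT 1` via Prisma).

```typescript
// Neon suspends compute after ~5 minutes idle (default autosuspend).
const IDLE_WINDOW_MS = 5 * 60 * 1000;

// Ping somewhat more often than the idle window; the 0.8 margin is an
// arbitrary safety factor, not a Neon requirement.
function keepAliveIntervalMs(idleWindowMs: number, margin = 0.8): number {
  return Math.floor(idleWindowMs * margin);
}

// Hypothetical wiring: in the NestJS app this could be an @nestjs/schedule
// cron job instead of a bare setInterval.
function startKeepAlive(ping: () => Promise<void>): NodeJS.Timeout {
  return setInterval(() => {
    void ping().catch(() => {
      // A failed ping is non-fatal; the next tick retries.
    });
  }, keepAliveIntervalMs(IDLE_WINDOW_MS));
}
```

Note that a keep-alive defeats scale-to-zero billing, so it only makes sense if cold-start latency is actually a problem in practice.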
6.5 Storage Considerations
RAG data storage per document:
- content: TEXT (~1 KB per chunk)
- metadata: JSONB (~200 bytes)
- embedding: vector(768) = 768 × 4 bytes = 3,072 bytes
- Total per chunk: ~4.3 KB
Neon storage tiers:
| Plan | Storage | Est. Document Capacity |
|---|---|---|
| Free | 512 MB | ~120K chunks |
| Launch ($19/mo) | 10 GB | ~2.3M chunks |
| Scale ($69/mo) | 50 GB | ~11.6M chunks |
For a STEM education platform, even the Free tier can hold substantial curriculum + rubric data.
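The capacity estimates above follow directly from the per-chunk arithmetic; a small sketch makes the calculation reproducible for other tiers. Note it deliberately ignores per-row and index overhead, which reduces real capacity.

```typescript
// Per-chunk storage from the breakdown above:
// ~1 KB content + ~200 B metadata + 768 × 4 B embedding ≈ 4.3 KB.
const BYTES_PER_CHUNK = 1024 + 200 + 768 * 4; // = 4,296 bytes

// Rough chunk capacity for a given storage budget (overhead ignored).
function chunkCapacity(storageBytes: number): number {
  return Math.floor(storageBytes / BYTES_PER_CHUNK);
}

// Free tier example: chunkCapacity(512e6) is roughly 120K chunks.
```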
6.6 Migration Steps Specific to RAG
After completing the general migration from NEON_MIGRATION_GUIDE.md:
# 1. Verify pgvector extension exists on Neon
psql "$NEON_URL" -c "SELECT * FROM pg_extension WHERE extname = 'vector';"
# 2. If missing (shouldn't be), enable it
psql "$NEON_URL" -c "CREATE EXTENSION IF NOT EXISTS vector;"
# 3. Verify RAG tables migrated
psql "$NEON_URL" -c "SELECT COUNT(*) FROM document_embeddings;"
psql "$NEON_URL" -c "SELECT COUNT(*) FROM ai_feedback;"
psql "$NEON_URL" -c "SELECT COUNT(*) FROM fine_tuning_jobs;"
# 4. Verify vector index exists
psql "$NEON_URL" -c "
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'document_embeddings'
AND indexdef LIKE '%vector%';
"
# 5. Test vector search
psql "$NEON_URL" -c "
SELECT id, 1 - (embedding <=> (SELECT embedding FROM document_embeddings LIMIT 1)) AS score
FROM document_embeddings
ORDER BY embedding <=> (SELECT embedding FROM document_embeddings LIMIT 1)
LIMIT 5;
"
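For sanity-checking results from step 5, it helps to remember that pgvector's <=> operator returns cosine *distance*, and the SQL converts it to a similarity score via 1 - distance. The same math in TypeScript, useful for verifying a returned score against raw embedding values:

```typescript
// Cosine distance, mirroring pgvector's <=> operator:
// 1 - (a · b) / (|a| |b|).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The score column in the SQL above is 1 - distance.
const similarity = (a: number[], b: number[]): number => 1 - cosineDistance(a, b);
```

A vector compared with itself should score 1.0, which is why step 5 uses the first stored embedding as its own query: the top result's score should be (numerically close to) 1.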
6.7 Prisma Schema Change Required
Add directUrl to support Neon's dual-connection architecture:
File: prisma/schema.prisma (line 8-11)
datasource db {
provider = "postgresql"
url = env("DATABASE_URL")
+ directUrl = env("DIRECT_DATABASE_URL")
}
7. Google GenAI SDK Changes
7.1 Current State: Already on @google/genai
The codebase has already migrated from @google-cloud/aiplatform (old Vertex AI SDK) to @google/genai (new unified GenAI SDK). Here's what's in place:
Package: @google/genai v1.43.0 (in package.json)
Initialization (genai.service.ts:38-52):
import { GoogleGenAI } from '@google/genai';
const options = {
vertexai: true, // Uses Vertex AI backend (not AI Studio)
project: projectId,
location,
};
if (serviceAccountKeyBase64) {
const credentials = JSON.parse(Buffer.from(serviceAccountKeyBase64, 'base64').toString('utf-8'));
options.googleAuthOptions = { credentials };
}
this.client = new GoogleGenAI(options);
7.2 Key Differences from Old SDK
The existing VERTEX_AI_SETUP.md references the old SDK patterns. Here's what changed:
| Aspect | Old (@google-cloud/aiplatform) | Current (@google/genai) |
|---|---|---|
| Package | @google-cloud/aiplatform | @google/genai |
| Import | const \{ VertexAI \} = require(...) | import \{ GoogleGenAI \} from '@google/genai' |
| Init | new VertexAI(\{ project, location \}) | new GoogleGenAI(\{ vertexai: true, project, location \}) |
| Generate | model.generateContent(\{ contents: [...] \}) | client.models.generateContent(\{ model, contents, config \}) |
| System prompt | Separate parameter | Via config.systemInstruction |
| Embeddings | const model = vertex.preview.getGenerativeModel(...) | client.models.embedContent(\{ model, contents, config \}) |
| Fine-tuning | Not available in SDK | client.tunings.tune(\{ baseModel, trainingDataset, config \}) |
| Context caching | Not available | client.caches.create(\{ model, config \}) |
| Models | gemini-1.5-flash, gemini-1.5-pro | gemini-2.5-flash, gemini-2.5-pro, gemini-2.5-flash-lite |
| Auth | GOOGLE_APPLICATION_CREDENTIALS file only | File, base64-encoded, or default credentials |
7.3 API Surface Used (All @google/genai Methods)
Here's every GenAI SDK method used across the codebase:
| Method | Used In | Purpose |
|---|---|---|
| client.models.generateContent() | genai.service.ts:79, context-cache.service.ts:118 | Text generation (evaluations, feedback, writing) |
| client.models.embedContent() | embedding.service.ts:18, embedding.service.ts:40 | Vector embeddings for RAG |
| client.tunings.tune() | fine-tuning.service.ts:107 | Submit fine-tuning jobs |
| client.tunings.get() | fine-tuning.service.ts:170 | Check fine-tuning job status |
| client.caches.create() | context-cache.service.ts:57 | Create server-side context cache |
| client.caches.delete() | context-cache.service.ts:156 | Delete context cache |
7.4 Changes Needed If Updating SDK Version
If upgrading @google/genai from v1.43.0 to a newer version, watch for:
1. Breaking changes in embedContent response shape:
// Current (v1.43.0) — embedding.service.ts:26
const embedding = response.embeddings?.[0]?.values;
Check if the response structure changes (e.g., response.embedding.values vs response.embeddings[0].values).
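One way to make such a shape change fail loudly rather than silently is a defensive accessor. This is a hypothetical helper, not existing code; it tolerates both the current plural shape and a possible singular shape, and throws if neither is present:

```typescript
// Hypothetical accessor for the embedContent response. Handles both the
// current shape (response.embeddings[0].values) and a possible future
// singular shape (response.embedding.values); anything else throws.
function extractEmbedding(response: any): number[] {
  const values =
    response?.embeddings?.[0]?.values ?? response?.embedding?.values;
  if (!Array.isArray(values) || values.length === 0) {
    throw new Error('Unexpected embedContent response shape');
  }
  return values;
}
```

Swapping this in for the direct property access in embedding.service.ts would turn a silent `undefined` after an SDK upgrade into an immediate, descriptive error.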
2. Tuning API changes:
// Current — fine-tuning.service.ts:107
const tuningJob = await client.tunings.tune({ ... });
The tuning API is relatively new. Method names and parameters may evolve (e.g., tune() → create()).
3. Context caching TTL format:
// Current — context-cache.service.ts:67
ttl: `${ttl}s`, // String format "3600s"
Google has been inconsistent with duration formats. Verify s suffix is still required.
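Centralizing the TTL string in one helper would make a future format change a one-line fix. A minimal sketch (the validation rules are assumptions; the "<seconds>s" format matches the current code):

```typescript
// Build the cache TTL string in one place so a format change after an SDK
// upgrade is a single edit. The "<seconds>s" format matches the current
// code at context-cache.service.ts:67.
function toTtlString(seconds: number): string {
  if (!Number.isInteger(seconds) || seconds <= 0) {
    throw new Error(`Invalid cache TTL: ${seconds}`);
  }
  return `${seconds}s`;
}
```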
4. Model name changes:
// Current models
flashModel = 'gemini-2.5-flash'
proModel = 'gemini-2.5-pro'
liteModel = 'gemini-2.5-flash-lite'
These are configured via env vars, so model upgrades (e.g., to gemini-3.0-flash) only need env var changes.
7.5 Authentication Modes
The current GenAIService supports three auth modes (genai.service.ts:34-51):
| Mode | When Used | Env Var |
|---|---|---|
| Base64 service account key | Non-GCP environments (DigitalOcean) | GCP_SERVICE_ACCOUNT_KEY_BASE64 |
| Service account JSON file | Local development | GOOGLE_APPLICATION_CREDENTIALS |
| Default credentials | GCP environments (Cloud Run, GKE) | None needed |
No changes needed for Neon migration — auth is independent of the database provider.
7.6 Documents Needing Updates
The VERTEX_AI_SETUP.md is outdated and references:
- Old models: gemini-1.5-flash-8b, gemini-1.5-pro-002
- Old SDK patterns
- No mention of RAG, embeddings, fine-tuning, or context caching
Recommended updates:
- Update model references to the gemini-2.5-* family
- Add RAG embedding model (gemini-embedding-001) documentation
- Add context caching setup notes
- Add fine-tuning API documentation
- Update IAM roles: add Vertex AI Tuning User if using fine-tuning
8. Action Items Summary
For Neon Migration (RAG-specific)
| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | Add directUrl to prisma/schema.prisma | High | 5 min |
| 2 | Add DIRECT_DATABASE_URL to all environments | High | 15 min |
| 3 | Verify pgvector extension on Neon after migration | High | 5 min |
| 4 | Test vector search queries on Neon | High | 30 min |
| 5 | Consider upgrading IVFFlat → HNSW index | Medium | 1 hour |
| 6 | Benchmark RAG query latency on Neon (including cold start) | Medium | 2 hours |
| 7 | Set minimum compute to 0.25 CU if cold start latency unacceptable | Low | 5 min |
For GenAI SDK
| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | Update VERTEX_AI_SETUP.md to reflect @google/genai SDK and 2.5 models | Medium | 2 hours |
| 2 | Pin @google/genai version in package.json (avoid ^ prefix) | Medium | 5 min |
| 3 | Add Vertex AI Tuning User IAM role if fine-tuning is used in production | Medium | 10 min |
| 4 | Monitor @google/genai changelog for breaking changes in embedding/tuning APIs | Low | Ongoing |
For RAG System
| # | Action | Priority | Effort |
|---|---|---|---|
| 1 | No public API controller exists — add REST endpoints if admin needs RAG management | Low | 4 hours |
| 2 | Add monitoring/metrics for RAG query latency and cache hit rates | Medium | 3 hours |
| 3 | Consider adding a seeding script for initial curriculum/rubric data | Medium | 2 hours |
Document Version: 2.0 Last Updated: February 27, 2026 Author: StemBlock AI Engineering Team