
StemBlock AI: Estimated AI Model Training Costs — 4-Month Plan

Prepared for: Funding Committee
Date: February 27, 2026
Confidential


Executive Summary

StemBlock AI is building a custom AI training pipeline to deliver personalized, curriculum-aligned STEM and English writing education for K-12 students. This document outlines the estimated AI usage costs of training and deploying our AI models over a 4-month period (March – June 2026).

Total Estimated AI Usage Cost: $227 – $1,440

This investment enables StemBlock AI to:

  • Reduce per-evaluation AI costs by 95%+ through model upgrade + caching
  • Improve evaluation accuracy from ~75% to ~90% expert agreement
  • Generate grade-appropriate assignments aligned to NGSS and Common Core standards
  • Deliver adaptive, personalized learning paths based on individual student progress
  • Build a defensible competitive moat through proprietary training data and fine-tuned models

Note: Development costs are excluded from this estimate. All engineering work is handled internally at no additional cost.


1. Current State & Baseline

Current AI Architecture

  • Models: Gemini 1.5 Flash / 1.5 Pro via @google-cloud/vertexai SDK
  • Provider: Google Vertex AI (with Mistral and Claude as fallbacks)
  • Caching: In-memory LRU, 1-hour TTL, 500 entries max
  • Current customers: 1 (early stage)
  • Current monthly AI infrastructure: < $30/month

Planned Upgrades

| Change | From | To |
|---|---|---|
| SDK | @google-cloud/vertexai | @google/genai (Gen AI SDK) |
| Flash model | gemini-1.5-flash-002 | gemini-2.5-flash |
| Pro model | gemini-1.5-pro-002 | gemini-3.1-pro |
| Lite model | (none) | gemini-2.5-flash-lite |
| Caching | In-memory, 1hr TTL | Multi-layer: context cache + Redis + semantic |
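As a rough sketch of what the model upgrade looks like in application code, the new defaults can live in a single routing map. The task names and the pickModel helper below are illustrative assumptions, not our actual service layer:

```typescript
// Illustrative model routing for the planned upgrade. Task names and
// the pickModel helper are hypothetical examples, not the actual
// StemBlock service layer.
type Task =
  | "stem-evaluation"
  | "writing-assessment"
  | "moderation"
  | "parent-insights"
  | "assignment-generation";

const MODEL_FOR_TASK: Record<Task, string> = {
  "stem-evaluation": "gemini-2.5-flash",       // new default workhorse
  "writing-assessment": "gemini-3.1-pro",      // quality-critical
  "moderation": "gemini-2.5-flash-lite",       // cheap, high-volume
  "parent-insights": "gemini-2.5-flash-lite",
  "assignment-generation": "gemini-2.5-flash-lite",
};

function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}
```

Because routing is pure configuration, the upgrade itself carries no engineering cost beyond the SDK migration.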

2. AI Training Strategy

Our approach combines three complementary techniques:

| Strategy | Purpose | AI Cost |
|---|---|---|
| Model Upgrade (1.5 → 3.1/2.5) | Better reasoning, 70-95% lower cost per token | $0 (config change) |
| RAG (Retrieval-Augmented Generation) | Ground AI in curriculum standards & education content | $25–100 (embeddings) |
| Supervised Fine-Tuning | Train specialized models for STEM eval, writing, assignments | $150–450 (compute) |
| Proper Caching | Reduce redundant API calls by 50-85% | $0–80 (Redis, optional) |
| Neon + pgvector | Scalable vector DB for RAG, no separate service needed | $0–15/mo (included in Neon plan) |

Why NOT Train From Scratch?

Training a custom foundation model would cost $500K–$2M+. Instead, we:

  1. Use Gemini as the foundation — world-class reasoning at $0.10–$2.00/1M tokens
  2. Add RAG — retrieves curriculum standards, rubrics, and exemplars at query time
  3. Fine-tune — teaches the model our evaluation style using our own labeled data
  4. Cache aggressively — eliminates 50-85% of repeat API calls

This is 100-1,000x more cost-effective than building from scratch.


3. Detailed AI Usage Cost Breakdown

3.1 Gemini Model Pricing (Current vs. Upgraded)

| Model | Input $/1M tokens | Output $/1M tokens | Use Case |
|---|---|---|---|
| gemini-1.5-flash-002 (current) | $0.075 | $0.30 | Being replaced |
| gemini-1.5-pro-002 (current) | $1.25 | $5.00 | Being replaced |
| gemini-2.5-flash (new default) | $0.30 | $2.50 | Evaluations, feedback, learning paths |
| gemini-2.5-flash-lite (new lite) | $0.10 | $0.40 | Moderation, parent insights, assignments |
| gemini-3.1-pro (new pro) | $2.00 | $12.00 | Writing assessment (quality-critical) |

3.2 Operational Inference Costs (1 Customer, 4 Months)

With 1 customer (estimated monthly volumes):

| Service | Monthly Volume | Model | Input Tokens | Output Tokens | Cache Hit % | Monthly Cost |
|---|---|---|---|---|---|---|
| STEM Evaluations | ~200-500 | 2.5 Flash | 1.5M | 0.4M | 50% | $0.73 |
| Writing Moderation | ~50-100 | 2.5 Flash-Lite | 0.15M | 0.03M | 30% | $0.02 |
| Writing Feedback | ~50-100 | 2.5 Flash | 0.18M | 0.07M | 30% | $0.16 |
| Writing Assessment | ~50-100 | 2.5 Flash | 0.14M | 0.05M | 30% | $0.12 |
| Coach Feedback | ~50-200 | 2.5 Flash | 0.4M | 0.1M | 40% | $0.22 |
| Parent Insights | ~20-50 | 2.5 Flash-Lite | 0.05M | 0.02M | 85% | $0.00 |
| Assignment Gen | ~10-30 | 2.5 Flash-Lite | 0.03M | 0.03M | 80% | $0.00 |
| Learning Paths | ~10-20 | 2.5 Flash | 0.05M | 0.02M | 65% | $0.02 |
| Monthly Total | | | | | | $1.27 |
| 4-Month Total | | | | | | $5.08 |

Key insight: With Gemini 2.5 Flash + proper caching, operational inference for 1 customer costs approximately $1–2/month. Even at 10x current volume, it would be under $15/month.

3.3 Caching Infrastructure

| Option | Monthly Cost | 4-Month Total | Notes |
|---|---|---|---|
| In-memory (enhanced) | $0 | $0 | Extend current LRU cache to 7-30 day TTL, increase max entries |
| Redis (self-hosted on existing infra) | $0 | $0 | Run alongside backend on existing server |
| Redis Cloud (managed, free tier) | $0 | $0 | 30MB free on Redis Cloud |
| Redis Cloud (paid, if needed) | $5–15 | $20–60 | Only if exceeding free tier |
| Gemini Context Caching | ~$1–5 | $4–20 | 90% discount on cached system prompts |

Recommended: Start with enhanced in-memory cache (free), add Redis later if needed.

4-month caching cost: $0 – $80
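The "enhanced in-memory" option is essentially the current LRU cache with a longer, configurable TTL and a larger entry cap. A minimal sketch, with illustrative class and parameter names rather than the production implementation:

```typescript
// Minimal TTL + LRU response cache sketch (illustrative, not the
// production implementation). Map insertion order doubles as LRU order.
class ResponseCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private maxEntries = 5000,
    private ttlMs = 7 * 24 * 60 * 60 * 1000, // 7-day TTL vs. current 1 hour
  ) {}

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.store.delete(key); // expired
      return undefined;
    }
    // Refresh recency: re-insert so the entry moves to the "newest" end.
    this.store.delete(key);
    this.store.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V, now = Date.now()): void {
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      // Evict the least recently used entry (first in iteration order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}
```

The same get/set interface can later be backed by Redis without changing call sites.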

3.4 RAG System Costs

Infrastructure Update (Feb 2026): We are migrating from DigitalOcean Managed PostgreSQL to Neon Serverless Postgres with native pgvector support. This eliminates the need for a separate vector database (Chroma/Qdrant) — embeddings are stored directly in PostgreSQL alongside application data via the document_embeddings table with HNSW indexing (768-dimension vectors using gemini-embedding-001).

| Component | Cost | Notes |
|---|---|---|
| HuggingFace datasets | $0 | Open-source (FineWeb-Edu, essay datasets, code datasets) |
| VEX Robotics curriculum | $0–500 | Some materials may require licensing |
| NGSS / Common Core standards | $0 | Public domain |
| Embedding generation (gemini-embedding-001) | $35–75 | One-time cost: ~5GB corpus at $0.15/1M tokens |
| Vector database (pgvector on Neon) | $0 | Included in Neon plan — no separate vector DB service needed |
| Cloud storage (GCS for raw data) | $1–5/mo | ~5GB compressed at $0.02/GB/month |

4-month RAG cost: $30 – $70

Why pgvector on Neon Instead of a Separate Vector DB?

| Factor | Separate Vector DB (Chroma/Qdrant) | pgvector on Neon |
|---|---|---|
| Operational cost | $0–100/mo (managed) or DevOps overhead (self-hosted) | $0 (included in existing database) |
| Deployment complexity | Additional service to manage, monitor, and scale | Single database — no additional infrastructure |
| Data consistency | Separate system, eventual consistency with app DB | Same transaction as application data |
| Scalability | Must scale independently | Scales with Neon autoscaling (0.25–16 CU) |
| Performance | Dedicated, optimized for vectors | Excellent for <1M vectors with HNSW indexing |
| pgvector support | N/A | Native Neon extension, no extra cost |

For our education corpus (~50K–200K document chunks), pgvector on Neon is more than sufficient and eliminates an entire service from our infrastructure.
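For illustration, a retrieval query against the document_embeddings table could look like the following sketch. Only the SQL text and the pgvector literal are built here; executing it would go through the app's Postgres client, and the content column name is an assumption:

```typescript
// Sketch of a pgvector similarity search against the document_embeddings
// table described above. Only the SQL text and vector literal are built
// here; the "content" column name is an assumption for illustration.
function toVectorLiteral(embedding: number[]): string {
  // pgvector accepts vectors as '[v1,v2,...]' text literals.
  return `[${embedding.join(",")}]`;
}

// Cosine distance (<=>) pairs with the HNSW vector_cosine_ops index.
const SIMILARITY_QUERY = `
  SELECT content
  FROM document_embeddings
  ORDER BY embedding <=> $1::vector
  LIMIT 5
`;

// 768 dimensions, matching gemini-embedding-001 as configured above.
const queryEmbedding: number[] = new Array(768).fill(0.01);
const param = toVectorLiteral(queryEmbedding);
```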

3.5 Fine-Tuning Compute (Vertex AI)

Training on Gemini 2.0 Flash (the only Gemini model currently supporting supervised fine-tuning):

| Model | Training Examples | Tokens per Example | Epochs | Total Training Tokens | Cost at $3/1M |
|---|---|---|---|---|---|
| stemblock-eval-v1 (STEM evaluation) | 2,000 | 2,000 | 5 | 20M | $60 |
| stemblock-writing-v1 (writing assessment) | 1,000 | 2,000 | 5 | 10M | $30 |
| stemblock-assignment-v1 (assignment gen) | 500 | 2,000 | 5 | 5M | $15 |
| Hyperparameter experiments (3x runs) | | | | 35M × 3 | $315 |
| Evaluation/benchmarking (test inference) | | | | ~5M | $15 |
| Subtotal | | | | | $435 |

Note: Fine-tuning on Gemini is extremely cost-effective. The entire training compute for all 3 models is under $500. If we use fewer examples or fewer hyperparameter experiments, it could be under $150.

4-month fine-tuning cost: $150 – $450
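The per-model figures above are straight token arithmetic at the quoted $3 per 1M training-token rate. A sketch of that calculation:

```typescript
// Token-priced fine-tuning cost, as in the table above:
// examples × tokens/example × epochs, billed at $3 per 1M training tokens.
const PRICE_PER_1M_TRAINING_TOKENS = 3.0;

function fineTuneCostUSD(
  examples: number,
  tokensPerExample: number,
  epochs: number,
): number {
  const totalTokens = examples * tokensPerExample * epochs;
  return (totalTokens / 1_000_000) * PRICE_PER_1M_TRAINING_TOKENS;
}

const evalModel = fineTuneCostUSD(2000, 2000, 5);      // 20M tokens → $60
const writingModel = fineTuneCostUSD(1000, 2000, 5);   // 10M tokens → $30
const assignmentModel = fineTuneCostUSD(500, 2000, 5); //  5M tokens → $15
```

Halving the example counts or dropping hyperparameter reruns scales the cost down linearly, which is why the low end of the range sits near $150.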

3.6 Scaling Projections (Future Growth)

Even as customer count grows, costs remain manageable:

| Customers | Monthly Evaluations | Monthly AI Cost (with cache) | Annual |
|---|---|---|---|
| 1 | ~500 | $1–2 | $12–24 |
| 10 | ~5,000 | $10–20 | $120–240 |
| 50 | ~25,000 | $50–100 | $600–1,200 |
| 100 | ~50,000 | $100–200 | $1,200–2,400 |
| 500 | ~250,000 | $500–1,000 | $6,000–12,000 |

With Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M tokens + aggressive caching, StemBlock AI can serve 500 customers for under $1,000/month in AI costs.


4. Total 4-Month AI Cost Summary

| Category | Low Estimate | High Estimate |
|---|---|---|
| Operational inference (4 months, 1 customer) | $5 | $20 |
| Caching infrastructure | $0 | $80 |
| RAG setup (embeddings + storage, pgvector on Neon) | $30 | $70 |
| Data licensing (VEX curriculum, if needed) | $0 | $500 |
| Fine-tuning compute (3 models + experiments) | $150 | $450 |
| Gemini context caching storage | $4 | $20 |
| Neon database infrastructure (4 months) | $0 | $60 |
| Contingency (20%) | $38 | $240 |
| TOTAL AI USAGE COST | $227 | $1,440 |

Can We Stay Under $5,000?

Yes, comfortably. The total AI usage cost for the 4-month training program is estimated at $227 – $1,440, well within a $5,000 budget. The migration to Neon + pgvector further reduces costs by eliminating the need for a separate vector database service.

The $5,000 budget provides a 3.5x – 22x safety margin, allowing for:

  • Additional fine-tuning experiments
  • Larger training datasets
  • Higher-quality embedding models
  • Optional managed services (Redis Cloud)
  • Scale testing with simulated traffic
  • Extended Gemini 3.1 Pro usage for quality-critical tasks
  • Neon Scale plan upgrade if vector query volume demands it
Allocation of the full $5,000 budget:

| Category | Budget | % of $5,000 |
|---|---|---|
| Fine-tuning compute + experiments | $1,500 | 30% |
| RAG embeddings (pgvector storage on Neon) | $300 | 6% |
| Neon database infrastructure (4 months) | $300 | 6% |
| Operational inference (4 months) | $200 | 4% |
| Caching infrastructure | $200 | 4% |
| Contingency / future scaling | $2,500 | 50% |
| Total | $5,000 | 100% |

5. Why Costs Are So Low

Three factors make this possible:

5.1 Gemini 2.5 Flash-Lite Pricing Revolution

At $0.10 per 1M input tokens and $0.40 per 1M output tokens, Gemini 2.5 Flash-Lite is one of the cheapest production LLMs available. For context:

  • 1 million tokens ≈ 750,000 words ≈ 3,000 essays
  • A single STEM evaluation (~3,000 tokens) costs $0.0004 (less than 1/10th of a cent)
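That per-evaluation figure is direct arithmetic at the Flash-Lite rates. The roughly 2,500-input / 500-output token split of a ~3,000-token evaluation is our assumption:

```typescript
// Per-evaluation cost at Gemini 2.5 Flash-Lite rates
// ($0.10 input / $0.40 output per 1M tokens). The 2,500/500
// input/output split of the ~3,000-token call is an assumption.
const INPUT_RATE = 0.10 / 1_000_000;  // USD per input token
const OUTPUT_RATE = 0.40 / 1_000_000; // USD per output token

const perEvaluationUSD = 2500 * INPUT_RATE + 500 * OUTPUT_RATE;
// ≈ $0.00045 — well under 1/10th of a cent
```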

5.2 Aggressive Multi-Layer Caching

| Cache Layer | Mechanism | Savings |
|---|---|---|
| Gemini Context Cache | System prompts cached server-side, 90% discount | 30-50% on input tokens |
| Response Cache (Redis/in-memory) | Full response stored for identical requests | 50-85% of API calls eliminated |
| Semantic Cache (future) | Similar queries hit cache via embedding similarity | Additional 10-20% |
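The semantic layer would treat a new query as a cache hit when its embedding lands close enough to a previously cached one. A toy sketch using cosine similarity; the 0.95 threshold is an illustrative assumption that would need tuning:

```typescript
// Toy semantic-cache lookup: a query "hits" if its embedding's cosine
// similarity to a cached entry meets a threshold. The 0.95 default is
// an illustrative assumption, not a tuned value.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function semanticLookup<V>(
  queryEmbedding: number[],
  cache: Array<{ embedding: number[]; response: V }>,
  threshold = 0.95,
): V | undefined {
  let best: { sim: number; response: V } | undefined;
  for (const entry of cache) {
    const sim = cosineSimilarity(queryEmbedding, entry.embedding);
    if (sim >= threshold && (!best || sim > best.sim)) {
      best = { sim, response: entry.response };
    }
  }
  return best?.response;
}
```

In production this scan would be replaced by the same pgvector index used for RAG, so near-duplicate student submissions reuse prior responses instead of triggering new API calls.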

5.3 Fine-Tuning is Token-Priced, Not GPU-Priced

Gemini 2.0 Flash fine-tuning charges $3 per 1M training tokens, meaning we pay for the volume of data processed, not for GPU-hours. Training 3 specialized models on 3,500 examples costs under $500.


6. Return on Investment

6.1 Cost Savings (vs. Current Architecture)

| Metric | Current (Mistral, no cache) | After (Gemini 2.5 + cache) | Savings |
|---|---|---|---|
| Cost per STEM evaluation | $0.67 | $0.001 | 99.8% |
| Cost per writing assessment | $0.88 | $0.003 | 99.7% |
| Monthly operational (1 customer) | ~$30 | ~$2 | 93% |
| Projected monthly (100 customers) | ~$7,125 | ~$150 | 98% |

6.2 Quality Improvements

| Metric | Current | After RAG + Fine-Tuning |
|---|---|---|
| Evaluation accuracy (vs expert) | ~75% agreement | ~90% agreement |
| Grade-appropriate content | Generic | Precisely targeted K-12 |
| Assignment quality (coach rating) | 3.2/5 | 4.5/5 expected |
| Curriculum alignment | Low | High (NGSS/Common Core grounded) |
| Response time | 2-4 seconds | 1-2 seconds |

6.3 Competitive Advantage

  • Proprietary training data improves with each student interaction
  • RAG-grounded evaluations provide the curriculum alignment school districts require for adoption
  • Sub-cent evaluation costs enable aggressive pricing vs. competitors
  • Fine-tuned models create a moat that takes months to replicate

7. Training Data Sources

| Dataset | Source | Size | Cost | Purpose |
|---|---|---|---|---|
| FineWeb-Edu | HuggingFace | 2 GB filtered | Free | General K-12 knowledge |
| ASAP Essay Scoring | HuggingFace/Kaggle | 500 MB | Free | Writing assessment ground truth |
| Common Core Exemplars | Public domain | 200 MB | Free | Grade-level writing standards |
| NGSS Standards | Public domain | 100 MB | Free | STEM curriculum alignment |
| VEX Curriculum Guides | VEX Robotics | 500 MB | $0–500 | Robotics evaluation context |
| The Stack v2 (edu subset) | HuggingFace | 3 GB filtered | Free | Code quality evaluation |
| Writing Prompts | HuggingFace | 1 GB | Free | Creative writing evaluation |
| StemBlock Internal Data | Platform data | Growing | Free | Our unique evaluation style |

8. Monthly Milestone Deliverables

Month 1 (March 2026)

  • Migrate SDK from @google-cloud/vertexai to @google/genai
  • Upgrade models to Gemini 2.5 Flash / 3.1 Pro / 2.5 Flash-Lite
  • Implement multi-layer caching (context cache + extended TTL)
  • Migrate database from DigitalOcean to Neon Serverless Postgres
  • Enable pgvector extension and deploy document_embeddings table
  • Download and curate HuggingFace training datasets

Month 2 (April 2026)

  • Deploy RAG pipeline (pgvector on Neon + embeddings + query service)
  • Integrate RAG context into evaluation and assignment services
  • Begin training data labeling from internal evaluation data
  • First fine-tuning experiment (STEM evaluation model)

Month 3 (May 2026)

  • Deploy fine-tuned STEM evaluation model (shadow mode)
  • Complete writing assessment fine-tuning
  • Enhance adaptive learning paths with RAG context
  • A/B testing: base model vs. fine-tuned

Month 4 (June 2026)

  • All fine-tuned models in production
  • Adaptive learning with personal objectives live
  • Performance monitoring operational
  • Quality benchmark report delivered

9. Risk Management

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Fine-tuned model underperforms | Low | Medium | RAG provides most value; fallback to base model |
| Gemini 3.1 Pro pricing changes | Low | Low | 2.5 Flash handles 90% of tasks; Pro is optional |
| Training data quality issues | Medium | Low | Start small, iterate; internal data is highest quality |
| SDK migration issues | Low | Low | Google provides migration guide; deadline is June 2026 |
| Neon cold start latency | Low | Low | Health check ping every 4 min prevents scale-to-zero in production |
| Neon migration data loss | Very Low | Medium | Keep DigitalOcean running 1 week post-migration; full backup before cutover |
| Costs exceed $5,000 | Very Low | Low | $3,500+ contingency buffer in budget |

10. Conclusion

The $5,000 budget is more than sufficient for the complete 4-month AI training program. Estimated actual AI usage costs are $227 – $1,440, providing a substantial safety margin.

Key factors enabling this cost efficiency:

  1. Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M tokens (99% cheaper than current setup)
  2. Multi-layer caching eliminating 50-85% of API calls
  3. Token-based fine-tuning pricing ($3/1M training tokens) vs. expensive GPU hours
  4. Open-source training data from HuggingFace (no data acquisition costs)
  5. Neon + pgvector — vector storage in the same database as application data, eliminating a separate vector DB service
  6. Neon serverless scaling — scale-to-zero when idle, autoscale under load, pay only for what you use

The investment produces 99%+ cost reduction per evaluation, 15-20% quality improvement, and a proprietary AI advantage that compounds over time — all for under $5,000 in direct AI costs.


For questions about this estimate, please contact the StemBlock AI engineering team.
