StemBlock AI: Estimated AI Model Training Costs — 4-Month Plan
Prepared for: Funding Committee
Date: February 27, 2026
Confidential
Executive Summary
StemBlock AI is building a custom AI training pipeline to deliver personalized, curriculum-aligned STEM and English writing education for K-12 students. This document outlines the estimated AI usage costs of training and deploying our AI models over a 4-month period (March – June 2026).
Total Estimated AI Usage Cost: $227 – $1,440 (detailed in Section 4)
This investment enables StemBlock AI to:
- Reduce per-evaluation AI costs by 95%+ through model upgrade + caching
- Improve evaluation accuracy from ~75% to ~90% expert agreement
- Generate grade-appropriate assignments aligned to NGSS and Common Core standards
- Deliver adaptive, personalized learning paths based on individual student progress
- Build a defensible competitive moat through proprietary training data and fine-tuned models
Note: Development costs are excluded from this estimate. All engineering work is handled internally at no additional cost.
1. Current State & Baseline
Current AI Architecture
- Models: Gemini 1.5 Flash / 1.5 Pro via the `@google-cloud/vertexai` SDK
- Provider: Google Vertex AI (with Mistral and Claude as fallbacks)
- Caching: In-memory LRU, 1-hour TTL, 500 entries max
- Current customers: 1 (early stage)
- Current monthly AI infrastructure: < $30/month
Planned Upgrades
| Change | From | To |
|---|---|---|
| SDK | @google-cloud/vertexai | @google/genai (Gen AI SDK) |
| Flash model | gemini-1.5-flash-002 | gemini-2.5-flash |
| Pro model | gemini-1.5-pro-002 | gemini-3.1-pro |
| Lite model | (none) | gemini-2.5-flash-lite |
| Caching | In-memory, 1hr TTL | Multi-layer: context cache + Redis + semantic |
2. AI Training Strategy
Our approach combines three complementary techniques:
| Strategy | Purpose | AI Cost |
|---|---|---|
| Model Upgrade (1.5 → 3.1/2.5) | Better reasoning, 70-95% lower cost per token | $0 (config change) |
| RAG (Retrieval-Augmented Generation) | Ground AI in curriculum standards & education content | $25–100 (embeddings) |
| Supervised Fine-Tuning | Train specialized models for STEM eval, writing, assignments | $300–1,400 (compute) |
| Proper Caching | Reduce redundant API calls by 50-85% | $0–240 (Redis, optional) |
| Neon + pgvector | Scalable vector DB for RAG, no separate service needed | $0–15/mo (included in Neon plan) |
Why NOT Train From Scratch?
Training a custom foundation model would cost $500K–$2M+. Instead, we:
- Use Gemini as the foundation — world-class reasoning at $0.10–$2.00/1M tokens
- Add RAG — retrieves curriculum standards, rubrics, and exemplars at query time
- Fine-tune — teaches the model our evaluation style using our own labeled data
- Cache aggressively — eliminates 50-85% of repeat API calls
This is 100-1,000x more cost-effective than building from scratch.
3. Detailed AI Usage Cost Breakdown
3.1 Gemini Model Pricing (Current vs. Upgraded)
| Model | Input $/1M tokens | Output $/1M tokens | Use Case |
|---|---|---|---|
| gemini-1.5-flash-002 (current) | $0.075 | $0.30 | Being replaced |
| gemini-1.5-pro-002 (current) | $1.25 | $5.00 | Being replaced |
| gemini-2.5-flash (new default) | $0.30 | $2.50 | Evaluations, feedback, learning paths |
| gemini-2.5-flash-lite (new lite) | $0.10 | $0.40 | Moderation, parent insights, assignments |
| gemini-3.1-pro (new pro) | $2.00 | $12.00 | Writing assessment (quality-critical) |
3.2 Operational Inference Costs (1 Customer, 4 Months)
With 1 customer (estimated monthly volumes):
| Service | Monthly Volume | Model | Input Tokens | Output Tokens | Cache Hit % | Monthly Cost |
|---|---|---|---|---|---|---|
| STEM Evaluations | ~200-500 | 2.5 Flash | 1.5M | 0.4M | 50% | $0.73 |
| Writing Moderation | ~50-100 | 2.5 Flash-Lite | 0.15M | 0.03M | 30% | $0.02 |
| Writing Feedback | ~50-100 | 2.5 Flash | 0.18M | 0.07M | 30% | $0.16 |
| Writing Assessment | ~50-100 | 2.5 Flash | 0.14M | 0.05M | 30% | $0.12 |
| Coach Feedback | ~50-200 | 2.5 Flash | 0.4M | 0.1M | 40% | $0.22 |
| Parent Insights | ~20-50 | 2.5 Flash-Lite | 0.05M | 0.02M | 85% | $0.00 |
| Assignment Gen | ~10-30 | 2.5 Flash-Lite | 0.03M | 0.03M | 80% | $0.00 |
| Learning Paths | ~10-20 | 2.5 Flash | 0.05M | 0.02M | 65% | $0.02 |
| Monthly Total | | | | | | $1.27 |
| 4-Month Total | | | | | | $5.08 |
Key insight: With Gemini 2.5 Flash + proper caching, operational inference for 1 customer costs approximately $1–2/month. Even at 10x current volume, it would be under $15/month.
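Each row in the table above follows the same arithmetic: token volume times per-million price, discounted by the cache-hit rate. A quick sketch reproducing the table (prices and volumes copied from above; treating the cache-hit rate as a flat discount on the whole bill is our simplifying assumption):

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price, cache_hit):
    """Monthly dollar cost for one service: token volumes (in millions of
    tokens) times $/1M prices, discounted by the fraction served from cache."""
    return (input_mtok * in_price + output_mtok * out_price) * (1.0 - cache_hit)

FLASH = (0.30, 2.50)  # gemini-2.5-flash: $/1M input, $/1M output
LITE = (0.10, 0.40)   # gemini-2.5-flash-lite

# (input Mtok, output Mtok, (in, out) prices, cache-hit rate) per table row
services = [
    (1.50, 0.40, FLASH, 0.50),  # STEM evaluations
    (0.15, 0.03, LITE,  0.30),  # writing moderation
    (0.18, 0.07, FLASH, 0.30),  # writing feedback
    (0.14, 0.05, FLASH, 0.30),  # writing assessment
    (0.40, 0.10, FLASH, 0.40),  # coach feedback
    (0.05, 0.02, LITE,  0.85),  # parent insights
    (0.03, 0.03, LITE,  0.80),  # assignment generation
    (0.05, 0.02, FLASH, 0.65),  # learning paths
]
total = sum(monthly_cost(i, o, p[0], p[1], hit) for i, o, p, hit in services)
# total comes out to roughly $1.27/month, matching the table
```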
3.3 Caching Infrastructure
| Option | Monthly Cost | 4-Month Total | Notes |
|---|---|---|---|
| In-memory (enhanced) | $0 | $0 | Extend current LRU cache to 7-30 day TTL, increase max entries |
| Redis (self-hosted on existing infra) | $0 | $0 | Run alongside backend on existing server |
| Redis Cloud (managed, free tier) | $0 | $0 | 30MB free on Redis Cloud |
| Redis Cloud (paid, if needed) | $5–15 | $20–60 | Only if exceeding free tier |
| Gemini Context Caching | ~$1–5 | $4–20 | 90% discount on cached system prompts |
Recommended: Start with enhanced in-memory cache (free), add Redis later if needed.
4-month caching cost: $0 – $80
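The recommended "enhanced in-memory" option amounts to the existing LRU cache with a longer TTL and a higher entry cap. A minimal Python sketch of the idea (the production cache lives in the Node backend; the evict-closest-to-expiry policy here is a simple stand-in for true LRU):

```python
import hashlib
import time

class ResponseCache:
    """Response cache with per-entry TTL and a bounded entry count."""

    def __init__(self, max_entries=5000, ttl_seconds=7 * 24 * 3600):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, response)

    @staticmethod
    def key_for(model, prompt):
        # Identical (model, prompt) pairs hash to the same cache key.
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if now >= expires_at:
            del self._store[key]  # lazily expire stale entries
            return None
        return response

    def put(self, key, response, now=None):
        now = time.time() if now is None else now
        if len(self._store) >= self.max_entries:
            # Evict the entry closest to expiry (a stand-in for true LRU).
            del self._store[min(self._store, key=lambda k: self._store[k][0])]
        self._store[key] = (now + self.ttl, response)
```

On a cache hit the Gemini call is skipped entirely, which is where the 50-85% call reduction comes from.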
3.4 RAG System Costs
Infrastructure Update (Feb 2026): We are migrating from DigitalOcean Managed PostgreSQL to Neon Serverless Postgres with native pgvector support. This eliminates the need for a separate vector database (Chroma/Qdrant): embeddings are stored directly in PostgreSQL alongside application data in the `document_embeddings` table with HNSW indexing (768-dimension vectors generated by `gemini-embedding-001`).
| Component | Cost | Notes |
|---|---|---|
| HuggingFace datasets | $0 | Open-source (FineWeb-Edu, essay datasets, code datasets) |
| VEX Robotics curriculum | $0–500 | Some materials may require licensing |
| NGSS / Common Core standards | $0 | Public domain |
| Embedding generation (gemini-embedding-001) | $35–75 | One-time cost: ~5GB corpus at $0.15/1M tokens |
| Vector database (pgvector on Neon) | $0 | Included in Neon plan — no separate vector DB service needed |
| Cloud storage (GCS for raw data) | $1–5/mo | ~5GB compressed at $0.02/GB/month |
4-month RAG cost: $30 – $70
Why pgvector on Neon Instead of a Separate Vector DB?
| Factor | Separate Vector DB (Chroma/Qdrant) | pgvector on Neon |
|---|---|---|
| Operational cost | $0–100/mo (managed) or DevOps overhead (self-hosted) | $0 (included in existing database) |
| Deployment complexity | Additional service to manage, monitor, and scale | Single database — no additional infrastructure |
| Data consistency | Separate system, eventual consistency with app DB | Same transaction as application data |
| Scalability | Must scale independently | Scales with Neon autoscaling (0.25–16 CU) |
| Performance | Dedicated, optimized for vectors | Excellent for <1M vectors with HNSW indexing |
| pgvector support | N/A | Native Neon extension, no extra cost |
For our education corpus (~50K–200K document chunks), pgvector on Neon is more than sufficient and eliminates an entire service from our infrastructure.
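For reference, a sketch of the pgvector setup described above. The `document_embeddings` name, 768-dimension vectors, and HNSW index come from this document; the remaining column names are illustrative assumptions:

```sql
-- Enable pgvector (a native extension on Neon).
CREATE EXTENSION IF NOT EXISTS vector;

-- Embeddings live alongside application data; columns other than the
-- 768-dimension vector are illustrative assumptions.
CREATE TABLE IF NOT EXISTS document_embeddings (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source     TEXT NOT NULL,           -- e.g. 'ngss', 'common_core', 'vex'
    chunk_text TEXT NOT NULL,
    embedding  VECTOR(768) NOT NULL     -- gemini-embedding-001 output
);

-- HNSW index for fast approximate nearest-neighbor search.
CREATE INDEX IF NOT EXISTS document_embeddings_hnsw
    ON document_embeddings USING hnsw (embedding vector_cosine_ops);

-- Retrieval at query time: the 5 chunks closest to a query embedding $1.
SELECT chunk_text
FROM document_embeddings
ORDER BY embedding <=> $1
LIMIT 5;
```

Because retrieval runs in the same database as application data, the RAG query and the evaluation record can share a single transaction.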
3.5 Fine-Tuning Compute (Vertex AI)
Training on Gemini 2.0 Flash (the only Gemini model currently supporting supervised fine-tuning):
| Model | Training Examples | Tokens per Example | Epochs | Total Training Tokens | Cost at $3/1M |
|---|---|---|---|---|---|
| stemblock-eval-v1 (STEM evaluation) | 2,000 | 2,000 | 5 | 20M | $60 |
| stemblock-writing-v1 (writing assessment) | 1,000 | 2,000 | 5 | 10M | $30 |
| stemblock-assignment-v1 (assignment gen) | 500 | 2,000 | 5 | 5M | $15 |
| Hyperparameter experiments (3x runs) | — | — | — | 35M × 3 | $315 |
| Evaluation/benchmarking (test inference) | — | — | — | ~5M | $15 |
| Subtotal | | | | 145M | $435 |
Note: Fine-tuning on Gemini is extremely cost-effective. The entire training compute for all 3 models is under $500. If we use fewer examples or fewer hyperparameter experiments, it could be under $150.
4-month fine-tuning cost: $150 – $450
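The line items above are straightforward token arithmetic and can be sanity-checked directly:

```python
PRICE_PER_M_TOKENS = 3.00  # $/1M supervised fine-tuning tokens (Gemini 2.0 Flash)

def tuning_cost(examples, tokens_per_example, epochs):
    """Each example is processed once per epoch and priced per token."""
    total_tokens = examples * tokens_per_example * epochs
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS

costs = {
    "stemblock-eval-v1": tuning_cost(2_000, 2_000, 5),
    "stemblock-writing-v1": tuning_cost(1_000, 2_000, 5),
    "stemblock-assignment-v1": tuning_cost(500, 2_000, 5),
    "hyperparameter-experiments": 35 * 3 * PRICE_PER_M_TOKENS,  # 35M tokens x 3 runs
    "benchmarking": 5 * PRICE_PER_M_TOKENS,                     # ~5M test tokens
}
subtotal = sum(costs.values())  # $435, matching the table
```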
3.6 Scaling Projections (Future Growth)
Even as customer count grows, costs remain manageable:
| Customers | Monthly Evaluations | Monthly AI Cost (with cache) | Annual |
|---|---|---|---|
| 1 | ~500 | $1–2 | $12–24 |
| 10 | ~5,000 | $10–20 | $120–240 |
| 50 | ~25,000 | $50–100 | $600–1,200 |
| 100 | ~50,000 | $100–200 | $1,200–2,400 |
| 500 | ~250,000 | $500–1,000 | $6,000–12,000 |
With Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M tokens + aggressive caching, StemBlock AI can serve 500 customers for under $1,000/month in AI costs.
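The projection assumes roughly linear scaling: about 500 evaluations per customer per month at an effective $0.002 to $0.004 per cached evaluation. Both parameters are inferred from the 1-customer table rather than measured:

```python
EVALS_PER_CUSTOMER = 500               # assumed monthly evaluations per customer
COST_PER_EVAL_RANGE = (0.002, 0.004)   # assumed $/evaluation after caching

def projected_monthly_cost(customers):
    """Low/high monthly AI cost under linear scaling of evaluation volume."""
    evals = customers * EVALS_PER_CUSTOMER
    low, high = COST_PER_EVAL_RANGE
    return evals * low, evals * high
```

Under these assumptions, `projected_monthly_cost(500)` reproduces the $500–$1,000 row of the table above.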
4. Total 4-Month AI Cost Summary
| Category | Low Estimate | High Estimate |
|---|---|---|
| Operational inference (4 months, 1 customer) | $5 | $20 |
| Caching infrastructure | $0 | $80 |
| RAG setup (embeddings + storage, pgvector on Neon) | $30 | $70 |
| Data licensing (VEX curriculum, if needed) | $0 | $500 |
| Fine-tuning compute (3 models + experiments) | $150 | $450 |
| Gemini context caching storage | $4 | $20 |
| Neon database infrastructure (4 months) | $0 | $60 |
| Contingency (20%) | $38 | $240 |
| TOTAL AI USAGE COST | $227 | $1,440 |
Can We Stay Under $5,000?
Yes, comfortably. The total AI usage cost for the 4-month training program is estimated at $227 – $1,440, well within a $5,000 budget. The migration to Neon + pgvector further reduces costs by eliminating the need for a separate vector database service.
The $5,000 budget provides a 3.5x – 22x safety margin, allowing for:
- Additional fine-tuning experiments
- Larger training datasets
- Higher-quality embedding models
- Optional managed services (Redis Cloud)
- Scale testing with simulated traffic
- Extended Gemini 3.1 Pro usage for quality-critical tasks
- Neon Scale plan upgrade if vector query volume demands it
Budget Allocation (Recommended)
| Category | Budget | % of $5,000 |
|---|---|---|
| Fine-tuning compute + experiments | $1,500 | 30% |
| RAG embeddings (pgvector storage on Neon) | $300 | 6% |
| Neon database infrastructure (4 months) | $300 | 6% |
| Operational inference (4 months) | $200 | 4% |
| Caching infrastructure | $200 | 4% |
| Contingency / future scaling | $2,500 | 50% |
| Total | $5,000 | 100% |
5. Why Costs Are So Low
Three factors make this possible:
5.1 Gemini 2.5 Flash-Lite Pricing Revolution
At $0.10 per 1M input tokens and $0.40 per 1M output tokens, Gemini 2.5 Flash-Lite is one of the cheapest production LLMs available. For context:
- 1 million tokens ≈ 750,000 words ≈ 3,000 essays
- A single STEM evaluation (~3,000 tokens) costs $0.0004 (less than 1/10th of a cent)
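Concretely, assuming the ~3,000 tokens of a single evaluation split into 2,500 input and 500 output tokens (the split is an assumed breakdown):

```python
# $/1M token prices for gemini-2.5-flash-lite
INPUT_PRICE, OUTPUT_PRICE = 0.10, 0.40

# Assumed split of a ~3,000-token evaluation: 2,500 input + 500 output.
cost = 2_500 / 1e6 * INPUT_PRICE + 500 / 1e6 * OUTPUT_PRICE
# cost is about $0.00045: under a tenth of a cent per evaluation
```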
5.2 Aggressive Multi-Layer Caching
| Cache Layer | Mechanism | Savings |
|---|---|---|
| Gemini Context Cache | System prompts cached server-side, 90% discount | 30-50% on input tokens |
| Response Cache (Redis/in-memory) | Full response stored for identical requests | 50-85% of API calls eliminated |
| Semantic Cache (future) | Similar queries hit cache via embedding similarity | Additional 10-20% |
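The future semantic-cache layer can be sketched as a nearest-neighbor lookup over stored query embeddings. In production the embeddings would come from `gemini-embedding-001` and the search would run through pgvector; here they are plain lists, and the 0.95 similarity threshold is an assumed tuning parameter:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """Reuse a stored response when a new query's embedding is close enough
    to one already answered (linear scan; pgvector would do this at scale)."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, embedding):
        best_response, best_sim = None, 0.0
        for stored, response in self.entries:
            sim = cosine(embedding, stored)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def store(self, embedding, response):
        self.entries.append((embedding, response))
```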
5.3 Fine-Tuning is Token-Priced, Not GPU-Priced
Gemini 2.0 Flash fine-tuning charges $3 per 1M training tokens, meaning you pay for the volume of training data processed, not for GPU-hours. Training 3 specialized models on 3,500 examples costs under $500.
6. Return on Investment
6.1 Cost Savings (vs. Current Architecture)
| Metric | Current (Mistral, no cache) | After (Gemini 2.5 + cache) | Savings |
|---|---|---|---|
| Cost per STEM evaluation | $0.67 | $0.001 | 99.8% |
| Cost per writing assessment | $0.88 | $0.003 | 99.7% |
| Monthly operational (1 customer) | ~$30 | ~$2 | 93% |
| Projected monthly (100 customers) | ~$7,125 | ~$150 | 98% |
6.2 Quality Improvements
| Metric | Current | After RAG + Fine-Tuning |
|---|---|---|
| Evaluation accuracy (vs expert) | ~75% agreement | ~90% agreement |
| Grade-appropriate content | Generic | Precisely targeted K-12 |
| Assignment quality (coach rating) | 3.2/5 | 4.5/5 expected |
| Curriculum alignment | Low | High (NGSS/Common Core grounded) |
| Response time | 2-4 seconds | 1-2 seconds |
6.3 Competitive Advantage
- Proprietary training data improves with each student interaction
- RAG-grounded evaluations satisfy the curriculum-alignment requirements that school districts expect for adoption
- Sub-cent evaluation costs enable aggressive pricing vs. competitors
- Fine-tuned models create a moat that takes months to replicate
7. Training Data Sources
| Dataset | Source | Size | Cost | Purpose |
|---|---|---|---|---|
| FineWeb-Edu | HuggingFace | 2 GB filtered | Free | General K-12 knowledge |
| ASAP Essay Scoring | HuggingFace/Kaggle | 500 MB | Free | Writing assessment ground truth |
| Common Core Exemplars | Public domain | 200 MB | Free | Grade-level writing standards |
| NGSS Standards | Public domain | 100 MB | Free | STEM curriculum alignment |
| VEX Curriculum Guides | VEX Robotics | 500 MB | $0–500 | Robotics evaluation context |
| The Stack v2 (edu subset) | HuggingFace | 3 GB filtered | Free | Code quality evaluation |
| Writing Prompts | HuggingFace | 1 GB | Free | Creative writing evaluation |
| StemBlock Internal Data | Platform data | Growing | Free | Our unique evaluation style |
8. Monthly Milestone Deliverables
Month 1 (March 2026)
- Migrate SDK from `@google-cloud/vertexai` to `@google/genai`
- Upgrade models to Gemini 2.5 Flash / 3.1 Pro / 2.5 Flash-Lite
- Implement multi-layer caching (context cache + extended TTL)
- Migrate database from DigitalOcean to Neon Serverless Postgres
- Enable pgvector extension and deploy the `document_embeddings` table
- Download and curate HuggingFace training datasets
Month 2 (April 2026)
- Deploy RAG pipeline (pgvector on Neon + embeddings + query service)
- Integrate RAG context into evaluation and assignment services
- Begin training data labeling from internal evaluation data
- First fine-tuning experiment (STEM evaluation model)
Month 3 (May 2026)
- Deploy fine-tuned STEM evaluation model (shadow mode)
- Complete writing assessment fine-tuning
- Enhance adaptive learning paths with RAG context
- A/B testing: base model vs. fine-tuned
Month 4 (June 2026)
- All fine-tuned models in production
- Adaptive learning with personal objectives live
- Performance monitoring operational
- Quality benchmark report delivered
9. Risk Management
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Fine-tuned model underperforms | Low | Medium | RAG provides most value; fallback to base model |
| Gemini 3.1 Pro pricing changes | Low | Low | 2.5 Flash handles 90% of tasks; Pro is optional |
| Training data quality issues | Medium | Low | Start small, iterate; internal data is highest quality |
| SDK migration issues | Low | Low | Google provides migration guide; deadline is June 2026 |
| Neon cold start latency | Low | Low | Health check ping every 4 min prevents scale-to-zero in production |
| Neon migration data loss | Very Low | Medium | Keep DigitalOcean running 1 week post-migration; full backup before cutover |
| Costs exceed $5,000 | Very Low | Low | $3,500+ contingency buffer in budget |
10. Conclusion
The $5,000 budget is more than sufficient for the complete 4-month AI training program. Estimated actual AI usage costs are $227 – $1,440, providing a substantial safety margin.
Key factors enabling this cost efficiency:
- Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M tokens (99% cheaper than current setup)
- Multi-layer caching eliminating 50-85% of API calls
- Token-based fine-tuning pricing ($3/1M training tokens) vs. expensive GPU hours
- Open-source training data from HuggingFace (no data acquisition costs)
- Neon + pgvector — vector storage in the same database as application data, eliminating a separate vector DB service
- Neon serverless scaling — scale-to-zero when idle, autoscale under load, pay only for what you use
The investment produces 99%+ cost reduction per evaluation, 15-20% quality improvement, and a proprietary AI advantage that compounds over time — all for under $5,000 in direct AI costs.
For questions about this estimate, please contact the StemBlock AI engineering team.