StemBlock AI: Estimated AI Model Training Costs — 4-Month Plan
Prepared for: Funding Committee
Date: February 27, 2026
Confidential
Executive Summary
StemBlock AI is building a custom AI training pipeline to deliver personalized, curriculum-aligned STEM and English writing education for K-12 students. This document outlines the estimated AI usage costs of training and deploying our AI models over a 4-month period (March – June 2026).
Total Estimated AI Usage Cost: $227 – $1,440 (detailed in Section 4)
This investment enables StemBlock AI to:
- Reduce per-evaluation AI costs by 95%+ through model upgrade + caching
- Improve evaluation accuracy from ~75% to ~90% expert agreement
- Generate grade-appropriate assignments aligned to NGSS and Common Core standards
- Deliver adaptive, personalized learning paths based on individual student progress
- Build a defensible competitive moat through proprietary training data and fine-tuned models
Note: Development costs are excluded from this estimate. All engineering work is handled internally at no additional cost.
1. Current State & Baseline
Current AI Architecture
- Models: Gemini 1.5 Flash / 1.5 Pro via the `@google-cloud/vertexai` SDK
- Provider: Google Vertex AI (with Mistral and Claude as fallbacks)
- Caching: In-memory LRU, 1-hour TTL, 500 entries max
- Current customers: 1 (early stage)
- Current monthly AI infrastructure: < $30/month
Planned Upgrades
| Change | From | To |
|---|---|---|
| SDK | @google-cloud/vertexai | @google/genai (Gen AI SDK) |
| Flash model | gemini-1.5-flash-002 | gemini-2.5-flash |
| Pro model | gemini-1.5-pro-002 | gemini-3.1-pro |
| Lite model | (none) | gemini-2.5-flash-lite |
| Caching | In-memory, 1hr TTL | Multi-layer: context cache + Redis + semantic |
2. AI Training Strategy
Our approach combines three complementary techniques:
| Strategy | Purpose | AI Cost |
|---|---|---|
| Model Upgrade (1.5 → 3.1/2.5) | Better reasoning, 70-95% lower cost per token | $0 (config change) |
| RAG (Retrieval-Augmented Generation) | Ground AI in curriculum standards & education content | $25–100 (embeddings) |
| Supervised Fine-Tuning | Train specialized models for STEM eval, writing, assignments | $300–1,400 (compute) |
| Proper Caching | Reduce redundant API calls by 50-85% | $0–240 (Redis, optional) |
| Neon + pgvector | Scalable vector DB for RAG, no separate service needed | $0–15/mo (included in Neon plan) |
Why NOT Train From Scratch?
Training a custom foundation model would cost $500K–$2M+. Instead, we:
- Use Gemini as the foundation — world-class reasoning at $0.10–$2.00/1M tokens
- Add RAG — retrieves curriculum standards, rubrics, and exemplars at query time
- Fine-tune — teaches the model our evaluation style using our own labeled data
- Cache aggressively — eliminates 50-85% of repeat API calls
This is 100-1,000x more cost-effective than building from scratch.
3. Detailed AI Usage Cost Breakdown
3.1 Gemini Model Pricing (Current vs. Upgraded)
| Model | Input $/1M tokens | Output $/1M tokens | Use Case |
|---|---|---|---|
| gemini-1.5-flash-002 (current) | $0.075 | $0.30 | Being replaced |
| gemini-1.5-pro-002 (current) | $1.25 | $5.00 | Being replaced |
| gemini-2.5-flash (new default) | $0.30 | $2.50 | Evaluations, feedback, learning paths |
| gemini-2.5-flash-lite (new lite) | $0.10 | $0.40 | Moderation, parent insights, assignments |
| gemini-3.1-pro (new pro) | $2.00 | $12.00 | Writing assessment (quality-critical) |
3.2 Operational Inference Costs (1 Customer, 4 Months)
With 1 customer (estimated monthly volumes):
| Service | Monthly Volume | Model | Input Tokens | Output Tokens | Cache Hit % | Monthly Cost |
|---|---|---|---|---|---|---|
| STEM Evaluations | ~200-500 | 2.5 Flash | 1.5M | 0.4M | 50% | $0.73 |
| Writing Moderation | ~50-100 | 2.5 Flash-Lite | 0.15M | 0.03M | 30% | $0.02 |
| Writing Feedback | ~50-100 | 2.5 Flash | 0.18M | 0.07M | 30% | $0.16 |
| Writing Assessment | ~50-100 | 2.5 Flash | 0.14M | 0.05M | 30% | $0.12 |
| Coach Feedback | ~50-200 | 2.5 Flash | 0.4M | 0.1M | 40% | $0.22 |
| Parent Insights | ~20-50 | 2.5 Flash-Lite | 0.05M | 0.02M | 85% | $0.00 |
| Assignment Gen | ~10-30 | 2.5 Flash-Lite | 0.03M | 0.03M | 80% | $0.00 |
| Learning Paths | ~10-20 | 2.5 Flash | 0.05M | 0.02M | 65% | $0.02 |
| Monthly Total | | | | | | $1.27 |
| 4-Month Total | | | | | | $5.08 |
Key insight: With Gemini 2.5 Flash + proper caching, operational inference for 1 customer costs approximately $1–2/month. Even at 10x current volume, it would be under $15/month.
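Each row in the table above follows the same arithmetic: token volume times per-million price, discounted by the cache-hit rate. A quick sketch reproducing the table (prices and volumes copied from above; treating the cache-hit rate as a flat discount on the whole bill is our simplifying assumption):

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price, cache_hit):
    """Monthly dollar cost for one service: token volumes (in millions of
    tokens) times $/1M prices, discounted by the fraction served from cache."""
    return (input_mtok * in_price + output_mtok * out_price) * (1.0 - cache_hit)

FLASH = (0.30, 2.50)  # gemini-2.5-flash: $/1M input, $/1M output
LITE = (0.10, 0.40)   # gemini-2.5-flash-lite

# (input Mtok, output Mtok, (in, out) prices, cache-hit rate) per table row
services = [
    (1.50, 0.40, FLASH, 0.50),  # STEM evaluations
    (0.15, 0.03, LITE,  0.30),  # writing moderation
    (0.18, 0.07, FLASH, 0.30),  # writing feedback
    (0.14, 0.05, FLASH, 0.30),  # writing assessment
    (0.40, 0.10, FLASH, 0.40),  # coach feedback
    (0.05, 0.02, LITE,  0.85),  # parent insights
    (0.03, 0.03, LITE,  0.80),  # assignment generation
    (0.05, 0.02, FLASH, 0.65),  # learning paths
]
total = sum(monthly_cost(i, o, p[0], p[1], hit) for i, o, p, hit in services)
# total comes out to roughly $1.27/month, matching the table
```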
3.3 Caching Infrastructure
| Option | Monthly Cost | 4-Month Total | Notes |
|---|---|---|---|
| In-memory (enhanced) | $0 | $0 | Extend current LRU cache to 7-30 day TTL, increase max entries |
| Redis (self-hosted on existing infra) | $0 | $0 | Run alongside backend on existing server |
| Redis Cloud (managed, free tier) | $0 | $0 | 30MB free on Redis Cloud |
| Redis Cloud (paid, if needed) | $5–15 | $20–60 | Only if exceeding free tier |
| Gemini Context Caching | ~$1–5 | $4–20 | 90% discount on cached system prompts |
Recommended: Start with enhanced in-memory cache (free), add Redis later if needed.
4-month caching cost: $0 – $80
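The recommended "enhanced in-memory" option amounts to the existing LRU cache with a longer TTL and a higher entry cap. A minimal Python sketch of the idea (the production cache lives in the Node backend; the evict-closest-to-expiry policy here is a simple stand-in for true LRU):

```python
import hashlib
import time

class ResponseCache:
    """Response cache with per-entry TTL and a bounded entry count."""

    def __init__(self, max_entries=5000, ttl_seconds=7 * 24 * 3600):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, response)

    @staticmethod
    def key_for(model, prompt):
        # Identical (model, prompt) pairs hash to the same cache key.
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if now >= expires_at:
            del self._store[key]  # lazily expire stale entries
            return None
        return response

    def put(self, key, response, now=None):
        now = time.time() if now is None else now
        if len(self._store) >= self.max_entries:
            # Evict the entry closest to expiry (a stand-in for true LRU).
            del self._store[min(self._store, key=lambda k: self._store[k][0])]
        self._store[key] = (now + self.ttl, response)
```

On a cache hit the Gemini call is skipped entirely, which is where the 50-85% call reduction comes from.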
3.4 RAG System Costs
Infrastructure Update (Feb 2026): We are migrating from DigitalOcean Managed PostgreSQL to Neon Serverless Postgres with native pgvector support. This eliminates the need for a separate vector database (Chroma/Qdrant): embeddings are stored directly in PostgreSQL alongside application data in the `document_embeddings` table with HNSW indexing (768-dimension vectors generated by `gemini-embedding-001`).
| Component | Cost | Notes |
|---|---|---|
| HuggingFace datasets | $0 | Open-source (FineWeb-Edu, essay datasets, code datasets) |
| VEX Robotics curriculum | $0–500 | Some materials may require licensing |
| NGSS / Common Core standards | $0 | Public domain |
| Embedding generation (gemini-embedding-001) | $35–75 | One-time cost: ~5GB corpus at $0.15/1M tokens |
| Vector database (pgvector on Neon) | $0 | Included in Neon plan — no separate vector DB service needed |
| Cloud storage (GCS for raw data) | $1–5/mo | ~5GB compressed at $0.02/GB/month |
4-month RAG cost: $30 – $70
Why pgvector on Neon Instead of a Separate Vector DB?
| Factor | Separate Vector DB (Chroma/Qdrant) | pgvector on Neon |
|---|---|---|
| Operational cost | $0–100/mo (managed) or DevOps overhead (self-hosted) | $0 (included in existing database) |
| Deployment complexity | Additional service to manage, monitor, and scale | Single database — no additional infrastructure |
| Data consistency | Separate system, eventual consistency with app DB | Same transaction as application data |
| Scalability | Must scale independently | Scales with Neon autoscaling (0.25–16 CU) |
| Performance | Dedicated, optimized for vectors | Excellent for <1M vectors with HNSW indexing |
| pgvector support | N/A | Native Neon extension, no extra cost |
For our education corpus (~50K–200K document chunks), pgvector on Neon is more than sufficient and eliminates an entire service from our infrastructure.
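For reference, a sketch of the pgvector setup described above. The `document_embeddings` name, 768-dimension vectors, and HNSW index come from this document; the remaining column names are illustrative assumptions:

```sql
-- Enable pgvector (a native extension on Neon).
CREATE EXTENSION IF NOT EXISTS vector;

-- Embeddings live alongside application data; columns other than the
-- 768-dimension vector are illustrative assumptions.
CREATE TABLE IF NOT EXISTS document_embeddings (
    id         BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    source     TEXT NOT NULL,           -- e.g. 'ngss', 'common_core', 'vex'
    chunk_text TEXT NOT NULL,
    embedding  VECTOR(768) NOT NULL     -- gemini-embedding-001 output
);

-- HNSW index for fast approximate nearest-neighbor search.
CREATE INDEX IF NOT EXISTS document_embeddings_hnsw
    ON document_embeddings USING hnsw (embedding vector_cosine_ops);

-- Retrieval at query time: the 5 chunks closest to a query embedding $1.
SELECT chunk_text
FROM document_embeddings
ORDER BY embedding <=> $1
LIMIT 5;
```

Because retrieval runs in the same database as application data, the RAG query and the evaluation record can share a single transaction.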
3.5 Fine-Tuning Compute (Vertex AI)
Training on Gemini 2.0 Flash (the only Gemini model currently supporting supervised fine-tuning):
| Model | Training Examples | Tokens per Example | Epochs | Total Training Tokens | Cost at $3/1M |
|---|---|---|---|---|---|
| stemblock-eval-v1 (STEM evaluation) | 2,000 | 2,000 | 5 | 20M | $60 |
| stemblock-writing-v1 (writing assessment) | 1,000 | 2,000 | 5 | 10M | $30 |
| stemblock-assignment-v1 (assignment gen) | 500 | 2,000 | 5 | 5M | $15 |
| Hyperparameter experiments (3x runs) | — | — | — | 35M × 3 | $315 |
| Evaluation/benchmarking (test inference) | — | — | — | ~5M | $15 |
| Subtotal | | | | 145M | $435 |
Note: Fine-tuning on Gemini is extremely cost-effective. The entire training compute for all 3 models is under $500. If we use fewer examples or fewer hyperparameter experiments, it could be under $150.
4-month fine-tuning cost: $150 – $450
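The line items above are straightforward token arithmetic and can be sanity-checked directly:

```python
PRICE_PER_M_TOKENS = 3.00  # $/1M supervised fine-tuning tokens (Gemini 2.0 Flash)

def tuning_cost(examples, tokens_per_example, epochs):
    """Each example is processed once per epoch and priced per token."""
    total_tokens = examples * tokens_per_example * epochs
    return total_tokens / 1_000_000 * PRICE_PER_M_TOKENS

costs = {
    "stemblock-eval-v1": tuning_cost(2_000, 2_000, 5),
    "stemblock-writing-v1": tuning_cost(1_000, 2_000, 5),
    "stemblock-assignment-v1": tuning_cost(500, 2_000, 5),
    "hyperparameter-experiments": 35 * 3 * PRICE_PER_M_TOKENS,  # 35M tokens x 3 runs
    "benchmarking": 5 * PRICE_PER_M_TOKENS,                     # ~5M test tokens
}
subtotal = sum(costs.values())  # $435, matching the table
```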
3.6 Scaling Projections (Future Growth)
Even as customer count grows, costs remain manageable:
| Customers | Monthly Evaluations | Monthly AI Cost (with cache) | Annual |
|---|---|---|---|
| 1 | ~500 | $1–2 | $12–24 |
| 10 | ~5,000 | $10–20 | $120–240 |
| 50 | ~25,000 | $50–100 | $600–1,200 |
| 100 | ~50,000 | $100–200 | $1,200–2,400 |
| 500 | ~250,000 | $500–1,000 | $6,000–12,000 |
With Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M tokens + aggressive caching, StemBlock AI can serve 500 customers for under $1,000/month in AI costs.
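The projection assumes roughly linear scaling: about 500 evaluations per customer per month at an effective $0.002 to $0.004 per cached evaluation. Both parameters are inferred from the 1-customer table rather than measured:

```python
EVALS_PER_CUSTOMER = 500               # assumed monthly evaluations per customer
COST_PER_EVAL_RANGE = (0.002, 0.004)   # assumed $/evaluation after caching

def projected_monthly_cost(customers):
    """Low/high monthly AI cost under linear scaling of evaluation volume."""
    evals = customers * EVALS_PER_CUSTOMER
    low, high = COST_PER_EVAL_RANGE
    return evals * low, evals * high
```

Under these assumptions, `projected_monthly_cost(500)` reproduces the $500–$1,000 row of the table above.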
4. Total 4-Month AI Cost Summary
| Category | Low Estimate | High Estimate |
|---|---|---|
| Operational inference (4 months, 1 customer) | $5 | $20 |
| Caching infrastructure | $0 | $80 |
| RAG setup (embeddings + storage, pgvector on Neon) | $30 | $70 |
| Data licensing (VEX curriculum, if needed) | $0 | $500 |
| Fine-tuning compute (3 models + experiments) | $150 | $450 |
| Gemini context caching storage | $4 | $20 |
| Neon database infrastructure (4 months) | $0 | $60 |
| Contingency (20%) | $38 | $240 |
| TOTAL AI USAGE COST | $227 | $1,440 |
Can We Stay Under $5,000?
Yes, comfortably. The total AI usage cost for the 4-month training program is estimated at $227 – $1,440, well within a $5,000 budget. The migration to Neon + pgvector further reduces costs by eliminating the need for a separate vector database service.
The $5,000 budget provides a 3.5x – 22x safety margin, allowing for:
- Additional fine-tuning experiments
- Larger training datasets
- Higher-quality embedding models
- Optional managed services (Redis Cloud)
- Scale testing with simulated traffic
- Extended Gemini 3.1 Pro usage for quality-critical tasks
- Neon Scale plan upgrade if vector query volume demands it
Budget Allocation (Recommended)
| Category | Budget | % of $5,000 |
|---|---|---|
| Fine-tuning compute + experiments | $1,500 | 30% |
| RAG embeddings (pgvector storage on Neon) | $300 | 6% |
| Neon database infrastructure (4 months) | $300 | 6% |
| Operational inference (4 months) | $200 | 4% |
| Caching infrastructure | $200 | 4% |
| Contingency / future scaling | $2,500 | 50% |
| Total | $5,000 | 100% |
5. Why Costs Are So Low
Three factors make this possible:
5.1 Gemini 2.5 Flash-Lite Pricing Revolution
At $0.10 per 1M input tokens and $0.40 per 1M output tokens, Gemini 2.5 Flash-Lite is one of the cheapest production LLMs available. For context:
- 1 million tokens ≈ 750,000 words ≈ 3,000 essays
- A single STEM evaluation (~3,000 tokens) costs $0.0004 (less than 1/10th of a cent)
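Concretely, assuming the ~3,000 tokens of a single evaluation split into 2,500 input and 500 output tokens (the split is an assumed breakdown):

```python
# $/1M token prices for gemini-2.5-flash-lite
INPUT_PRICE, OUTPUT_PRICE = 0.10, 0.40

# Assumed split of a ~3,000-token evaluation: 2,500 input + 500 output.
cost = 2_500 / 1e6 * INPUT_PRICE + 500 / 1e6 * OUTPUT_PRICE
# cost is about $0.00045: under a tenth of a cent per evaluation
```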
5.2 Aggressive Multi-Layer Caching
| Cache Layer | Mechanism | Savings |
|---|---|---|
| Gemini Context Cache | System prompts cached server-side, 90% discount | 30-50% on input tokens |
| Response Cache (Redis/in-memory) | Full response stored for identical requests | 50-85% of API calls eliminated |
| Semantic Cache (future) | Similar queries hit cache via embedding similarity | Additional 10-20% |
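The future semantic-cache layer can be sketched as a nearest-neighbor lookup over stored query embeddings. In production the embeddings would come from `gemini-embedding-001` and the search would run through pgvector; here they are plain lists, and the 0.95 similarity threshold is an assumed tuning parameter:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class SemanticCache:
    """Reuse a stored response when a new query's embedding is close enough
    to one already answered (linear scan; pgvector would do this at scale)."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, embedding):
        best_response, best_sim = None, 0.0
        for stored, response in self.entries:
            sim = cosine(embedding, stored)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def store(self, embedding, response):
        self.entries.append((embedding, response))
```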
5.3 Fine-Tuning is Token-Priced, Not GPU-Priced
Gemini 2.0 Flash fine-tuning charges $3 per 1M training tokens, meaning you pay for the volume of training data processed, not for GPU-hours. Training 3 specialized models on 3,500 examples costs under $500.
6. Return on Investment
6.1 Cost Savings (vs. Current Architecture)
| Metric | Current (Mistral, no cache) | After (Gemini 2.5 + cache) | Savings |
|---|---|---|---|
| Cost per STEM evaluation | $0.67 | $0.001 | 99.8% |
| Cost per writing assessment | $0.88 | $0.003 | 99.7% |
| Monthly operational (1 customer) | ~$30 | ~$2 | 93% |
| Projected monthly (100 customers) | ~$7,125 | ~$150 | 98% |
6.2 Quality Improvements
| Metric | Current | After RAG + Fine-Tuning |
|---|---|---|
| Evaluation accuracy (vs expert) | ~75% agreement | ~90% agreement |
| Grade-appropriate content | Generic | Precisely targeted K-12 |
| Assignment quality (coach rating) | 3.2/5 | 4.5/5 expected |
| Curriculum alignment | Low | High (NGSS/Common Core grounded) |
| Response time | 2-4 seconds | 1-2 seconds |
6.3 Competitive Advantage
- Proprietary training data improves with each student interaction
- RAG-grounded evaluations satisfy the curriculum-alignment requirements that school districts expect for adoption
- Sub-cent evaluation costs enable aggressive pricing vs. competitors
- Fine-tuned models create a moat that takes months to replicate
7. Training Data Sources
| Dataset | Source | Size | Cost | Purpose |
|---|---|---|---|---|
| FineWeb-Edu | HuggingFace | 2 GB filtered | Free | General K-12 knowledge |
| ASAP Essay Scoring | HuggingFace/Kaggle | 500 MB | Free | Writing assessment ground truth |
| Common Core Exemplars | Public domain | 200 MB | Free | Grade-level writing standards |
| NGSS Standards | Public domain | 100 MB | Free | STEM curriculum alignment |
| VEX Curriculum Guides | VEX Robotics | 500 MB | $0–500 | Robotics evaluation context |
| The Stack v2 (edu subset) | HuggingFace | 3 GB filtered | Free | Code quality evaluation |
| Writing Prompts | HuggingFace | 1 GB | Free | Creative writing evaluation |
| StemBlock Internal Data | Platform data | Growing | Free | Our unique evaluation style |
8. Monthly Milestone Deliverables
Month 1 (March 2026)
- Migrate SDK from `@google-cloud/vertexai` to `@google/genai`
- Upgrade models to Gemini 2.5 Flash / 3.1 Pro / 2.5 Flash-Lite
- Implement multi-layer caching (context cache + extended TTL)
- Migrate database from DigitalOcean to Neon Serverless Postgres
- Enable pgvector extension and deploy the `document_embeddings` table
- Download and curate HuggingFace training datasets
Month 2 (April 2026)
- Deploy RAG pipeline (pgvector on Neon + embeddings + query service)
- Integrate RAG context into evaluation and assignment services
- Begin training data labeling from internal evaluation data
- First fine-tuning experiment (STEM evaluation model)
Month 3 (May 2026)
- Deploy fine-tuned STEM evaluation model (shadow mode)
- Complete writing assessment fine-tuning
- Enhance adaptive learning paths with RAG context
- A/B testing: base model vs. fine-tuned
Month 4 (June 2026)
- All fine-tuned models in production
- Adaptive learning with personal objectives live
- Performance monitoring operational
- Quality benchmark report delivered
9. Risk Management
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Fine-tuned model underperforms | Low | Medium | RAG provides most value; fallback to base model |
| Gemini 3.1 Pro pricing changes | Low | Low | 2.5 Flash handles 90% of tasks; Pro is optional |
| Training data quality issues | Medium | Low | Start small, iterate; internal data is highest quality |
| SDK migration issues | Low | Low | Google provides migration guide; deadline is June 2026 |
| Neon cold start latency | Low | Low | Health check ping every 4 min prevents scale-to-zero in production |
| Neon migration data loss | Very Low | Medium | Keep DigitalOcean running 1 week post-migration; full backup before cutover |
| Costs exceed $5,000 | Very Low | Low | $3,500+ contingency buffer in budget |
10. Conclusion
The $5,000 budget is more than sufficient for the complete 4-month AI training program. Estimated actual AI usage costs are $227 – $1,440, providing a substantial safety margin.
Key factors enabling this cost efficiency:
- Gemini 2.5 Flash-Lite at $0.10/$0.40 per 1M tokens (99% cheaper than current setup)
- Multi-layer caching eliminating 50-85% of API calls
- Token-based fine-tuning pricing ($3/1M training tokens) vs. expensive GPU hours
- Open-source training data from HuggingFace (no data acquisition costs)
- Neon + pgvector — vector storage in the same database as application data, eliminating a separate vector DB service
- Neon serverless scaling — scale-to-zero when idle, autoscale under load, pay only for what you use
The investment produces 99%+ cost reduction per evaluation, 15-20% quality improvement, and a proprietary AI advantage that compounds over time — all for under $5,000 in direct AI costs.
For questions about this estimate, please contact the StemBlock AI engineering team.