StemBlock AI: Subscription & AI Usage Cost Projections
Prepared for: Funding Committee
Date: February 27, 2026
Confidential
Aligned with: 5-Year P&L — STEMBLOCK.AI (Pitch Deck Slide 16)
Executive Summary
This document provides a detailed breakdown of StemBlock AI's subscription model, AI usage economics, and 5-year cost projections. All figures align with the P&L presented in our pitch deck.
Key Takeaway: Our AI usage cost structure enables 76–86% gross margins because:
- Gemini 2.5 Flash-Lite costs $0.10 per 1M input tokens, making each AI evaluation cost less than a tenth of a cent
- Multi-layer caching eliminates 50–85% of API calls
- AI costs scale sub-linearly with revenue growth (AI/Token is 16–27% of subscription revenue)
Current Status: 1 active customer. Google Vertex AI trial credit expires in ~1 week. Transitioning to production billing with Gemini 2.5 Flash (estimated $1–2/month at current volume).
Infrastructure Update: Migrating from DigitalOcean Managed PostgreSQL to Neon Serverless Postgres with native pgvector support. This consolidates our vector database (RAG) into the same PostgreSQL instance, eliminating the need for a separate vector DB service (Chroma/Qdrant). Neon's scale-to-zero and autoscaling capabilities provide cost-efficient scalability as customer count grows.
1. Subscription Model
1.1 Pricing Tiers
| Tier | Price | Target User | AI Features Included |
|---|---|---|---|
| Community | Free | Students, trial coaches | 10 AI evaluations/mo, 3 parent insights/mo, 1 learning path/mo |
| Pro | $9/mo | Parents | Unlimited English writing evaluation, 20 AI evaluations/mo |
| Team | $29/mo per user | Coaches & Teachers | 100 AI evaluations/mo, 20 assignments/mo, 20 learning paths/mo, workspaces |
| Enterprise | Custom ($299+/mo) | Schools & Districts | Unlimited everything, SSO, API access, white-label |
1.2 AI Features by Tier
| Feature | Community | Pro | Team | Enterprise |
|---|---|---|---|---|
| STEM AI Evaluation | 10/mo | 20/mo | 100/mo | Unlimited |
| English Writing Assessment | View only | Unlimited | Unlimited | Unlimited |
| AI Coach Feedback | — | — | Included | Unlimited |
| Parent Insights | 3/mo | 10/mo | 50/mo | Unlimited |
| Assignment Generator | — | — | 20/mo | Unlimited |
| Learning Path Generator | 1/mo | 5/mo | 20/mo | Unlimited |
| Advanced Analytics | — | — | Yes | Yes |
| Custom Rubrics | — | — | Yes | Yes |
| Team Workspaces | — | — | Yes | Yes |
| API Access | — | — | — | Yes |
| SSO / White Label | — | — | — | Yes |
1.3 Revenue Streams
| Stream | Description | 2026 Projection |
|---|---|---|
| AI Subscriptions | Monthly/yearly SaaS subscriptions | $3,000 |
| Training Programs | Teacher PD workshops, onboarding, certification | $2,000 |
| Total Revenue | — | $5,000 |
2. AI Usage Economics
2.1 Cost Per AI Action (After Gemini 2.5 Upgrade + Caching)
| AI Action | Model Used | Avg Tokens | Raw Cost | With Cache (est.) | Cost to Deliver |
|---|---|---|---|---|---|
| STEM Evaluation | Gemini 2.5 Flash | ~3,500 | $0.0026 | $0.0010 | < $0.01 |
| Writing Moderation | Gemini 2.5 Flash-Lite | ~1,500 | $0.0003 | $0.0002 | < $0.01 |
| Writing Feedback (Yoshi) | Gemini 2.5 Flash | ~3,000 | $0.0028 | $0.0017 | < $0.01 |
| Writing Assessment | Gemini 2.5 Flash | ~2,500 | $0.0020 | $0.0012 | < $0.01 |
| Coach Feedback | Gemini 2.5 Flash | ~3,500 | $0.0026 | $0.0013 | < $0.01 |
| Parent Insights | Gemini 2.5 Flash-Lite | ~2,000 | $0.0004 | $0.0001 | < $0.01 |
| Assignment Generation | Gemini 2.5 Flash-Lite | ~3,000 | $0.0005 | $0.0001 | < $0.01 |
| Learning Path | Gemini 2.5 Flash | ~4,000 | $0.0036 | $0.0013 | < $0.01 |
Every AI action costs less than 1 cent. At scale, the average cost per AI interaction is approximately $0.001 (one-tenth of a cent).
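The raw-cost column can be reproduced directly from per-token rates. In the sketch below, the per-1M-token prices and the 3,000-in / 500-out split are assumptions for illustration, not authoritative Google pricing:

```typescript
// Per-action cost model. Prices per 1M tokens and the input/output split
// are illustrative assumptions for this sketch, not authoritative pricing.
interface ModelPricing {
  inputPer1M: number;  // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

const PRICING: Record<string, ModelPricing> = {
  'gemini-2.5-flash':      { inputPer1M: 0.30, outputPer1M: 2.50 },
  'gemini-2.5-flash-lite': { inputPer1M: 0.10, outputPer1M: 0.40 },
};

function rawCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  return (inputTokens / 1e6) * p.inputPer1M + (outputTokens / 1e6) * p.outputPer1M;
}

// A STEM evaluation: ~3,500 tokens total, assumed ~3,000 in / ~500 out.
const stemEvalCost = rawCost('gemini-2.5-flash', 3000, 500);
console.log(stemEvalCost < 0.01); // true (well under one cent)
```

With those assumptions a STEM evaluation lands around $0.002, in line with the ~$0.0026 shown in the table; the exact figure depends on the actual output length.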
2.2 Cost Per Subscriber (Monthly)
| Tier | Typical Monthly AI Actions | AI Cost / User / Month | Subscription Price | AI Margin |
|---|---|---|---|---|
| Community | 5–10 evaluations | $0.01 | $0 | -$0.01 (subsidized) |
| Pro | 20–40 writing assessments | $0.04 | $9/mo | 99.6% |
| Team | 50–100 mixed actions | $0.10 | $29/mo | 99.7% |
| Enterprise | 200–500 mixed actions | $0.50 | $299+/mo | 99.8% |
Insight: AI usage costs are negligible relative to subscription revenue. Even at maximum usage, AI costs are < 1% of subscription price per user. This enables 85%+ gross margins as shown in the P&L.
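The AI-margin column follows directly from price and per-user AI cost; a quick check using the figures from the table above:

```typescript
// AI margin % = (subscription price - AI cost per user) / price * 100.
function aiMarginPct(price: number, aiCost: number): number {
  return ((price - aiCost) / price) * 100;
}

console.log(aiMarginPct(9, 0.04).toFixed(1));   // "99.6" (Pro)
console.log(aiMarginPct(29, 0.10).toFixed(1));  // "99.7" (Team)
console.log(aiMarginPct(299, 0.50).toFixed(1)); // "99.8" (Enterprise)
```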
2.3 Why So Cheap?
| Factor | Impact |
|---|---|
| Gemini 2.5 Flash-Lite ($0.10/1M input tokens) | 95% cheaper than Mistral models used in December 2025 |
| Multi-layer caching (context cache + response cache) | 50–85% of API calls eliminated |
| Efficient prompt engineering | Average 3,000 tokens per evaluation (vs. industry average 5,000+) |
| Batch processing (future) | Group similar requests for additional 15–20% savings |
3. 5-Year Cost Projections (Aligned with P&L)
3.1 Revenue & AI Cost Mapping
| Year | Revenue | AI Subscriptions | AI / Token Usage | Infrastructure (Neon + Hosting) | AI as % of Subs | Gross Margin |
|---|---|---|---|---|---|---|
| 2026 | $5,000 | $3,000 | $800 | $324 | 27% | 78% |
| 2027 | $25,000 | $15,000 | $3,000 | $1,200 | 20% | 82% |
| 2028 | $120,000 | $90,000 | $14,000 | $3,600 | 16% | 84% |
| 2029 | $400,000 | $300,000 | $48,000 | $8,400 | 16% | 85% |
| 2030 | $1,000,000 | $750,000 | $120,000 | $18,000 | 16% | 86% |
Infrastructure Cost Detail (Neon Serverless Postgres + Hosting)
| Year | Neon Plan | Neon Monthly | Hosting (App) | Total Infra/Mo | Annual |
|---|---|---|---|---|---|
| 2026 | Free → Launch | $0–15 | $12 | $12–27 | $324 |
| 2027 | Launch | $15–50 | $24 | $39–74 | $1,200 |
| 2028 | Launch → Scale | $50–200 | $50 | $100–250 | $3,600 |
| 2029 | Scale | $200–500 | $100 | $300–600 | $8,400 |
| 2030 | Scale | $500–1,000 | $200 | $700–1,200 | $18,000 |
Why Neon scales well: Neon's consumption-based pricing ($0.106/CU-hour on Launch, $0.222/CU-hour on Scale) means you only pay for active compute. Scale-to-zero eliminates costs during off-hours. pgvector is included at no additional cost — vector storage is billed as standard PostgreSQL storage at $0.35/GB-month.
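Using the rates quoted above, a rough monthly estimate can be sketched. The compute-hours and storage figures below are illustrative workload assumptions, not measured usage:

```typescript
// Neon consumption-billing sketch: compute CU-hours x rate + storage.
// Rates are taken from the text above; the workload is an assumption.
const LAUNCH_RATE = 0.106;  // USD per CU-hour (Launch plan)
const STORAGE_RATE = 0.35;  // USD per GB-month

function neonMonthlyCost(cuHours: number, ratePerCuHour: number, storageGb: number): number {
  return cuHours * ratePerCuHour + storageGb * STORAGE_RATE;
}

// e.g. a 0.25-CU instance active ~8h/day for 30 days, with 5 GB of storage:
const est = neonMonthlyCost(0.25 * 8 * 30, LAUNCH_RATE, 5);
console.log(est.toFixed(2)); // roughly $8/month on the Launch plan
```

Scale-to-zero is what makes the first term small: idle hours contribute zero CU-hours.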
3.2 Customer Growth & AI Cost Scaling
| Year | Est. Subscribers | Avg Revenue/User/Mo | Avg AI Cost/User/Mo | AI Token Budget | Token Budget / User / Mo |
|---|---|---|---|---|---|
| 2026 | 5–10 | ~$25 | ~$0.10 | $800 | $6.67–$13.33 |
| 2027 | 30–60 | ~$21 | ~$0.15 | $3,000 | $4.17–$8.33 |
| 2028 | 150–300 | ~$25 | ~$0.20 | $14,000 | $3.89–$7.78 |
| 2029 | 400–800 | ~$31 | ~$0.25 | $48,000 | $5.00–$10.00 |
| 2030 | 800–1,500 | ~$42 | ~$0.30 | $120,000 | $6.67–$12.50 |
Note: The per-user AI cost in the P&L budget ($4–$13/user/month) is significantly higher than our actual AI cost (~$0.10–$0.30/user/month). This provides a 20–40x cost buffer for:
- RAG infrastructure hosting
- Fine-tuning experiments
- Model quality upgrades (using Gemini 3.1 Pro for premium features)
- Scaling overhead and contingency
- Vector database and caching infrastructure
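The token-budget column in the table above is simple division, sketched here for clarity:

```typescript
// Token Budget / User / Mo = annual AI budget / 12 months / subscriber count.
function tokenBudgetPerUser(annualBudget: number, subscribers: number): number {
  return annualBudget / 12 / subscribers;
}

// 2026: $800 budget spread across 5-10 subscribers
console.log(tokenBudgetPerUser(800, 10).toFixed(2)); // "6.67"
console.log(tokenBudgetPerUser(800, 5).toFixed(2));  // "13.33"
```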
3.3 AI Token Usage Budget Breakdown
2026 — $800 Annual AI/Token Budget
| Category | Budget | Purpose |
|---|---|---|
| Operational inference | $25–50 | Production API calls (1 customer, with caching) |
| Fine-tuning compute | $150–300 | Train 3 specialized models on Vertex AI |
| RAG embeddings | $25–50 | One-time: embed 5GB education corpus |
| Gemini context caching | $10–25 | Cache system prompts for 90% input discount |
| Model experiments | $100–200 | Testing, benchmarking, A/B comparisons |
| Contingency | $175–200 | Buffer for unexpected usage spikes |
| Total | $800 | — |
2027 — $3,000 Annual AI/Token Budget
| Category | Budget | Purpose |
|---|---|---|
| Operational inference | $200–400 | 30–60 customers, growing volume |
| Fine-tuning retraining | $300–500 | Quarterly model updates with new data |
| RAG maintenance | $100–200 | Corpus updates, re-embedding on pgvector |
| Neon database (Launch plan) | $180–600 | Serverless Postgres with pgvector — no separate vector DB needed |
| Premium model usage (3.1 Pro) | $300–500 | Enterprise customers, quality-critical tasks |
| Contingency | $200–500 | — |
| Total | $3,000 | — |
2028+ — Scaling Pattern
As customer count grows, AI costs scale sub-linearly due to:
- Higher cache hit rates — more users = more shared evaluation patterns
- Fine-tuned models — fewer tokens needed (domain-specific = concise)
- Batch processing — group evaluations by assignment for efficiency
- Volume discounts — Vertex AI committed-use pricing at higher volumes
- Neon autoscaling — database compute scales automatically (0.25–56 CU), and pgvector queries benefit from larger shared buffer pools at scale
4. Caching Strategy (Key Cost Enabler)
4.1 Current vs. Planned Caching
| Layer | Current | Planned | Impact |
|---|---|---|---|
| In-memory LRU | 1-hour TTL, 500 entries | 7–30 day TTL, 5,000 entries | 30–50% hit rate → 50–70% |
| Gemini Context Cache | Not implemented | System prompts cached server-side | 90% discount on repeated input tokens |
| Redis Response Cache | Not implemented | Persistent cache, survives restarts | 40–60% additional hit rate |
| Semantic Cache (pgvector) | Not implemented | Similar queries matched via embedding cosine similarity on Neon | 10–20% additional savings |
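A minimal sketch of the planned in-memory LRU layer. The TTL and capacity mirror the "Planned" column above; the class and key names are illustrative, not the production implementation:

```typescript
// In-memory LRU response cache with TTL, per the "Planned" column:
// 7-day TTL (low end of the 7-30 day range), 5,000 entries.
class LruResponseCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private maxEntries = 5000, private ttlMs = 7 * 24 * 3600 * 1000) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { this.store.delete(key); return undefined; }
    // Refresh recency: re-insert so the key moves to the back of the Map.
    this.store.delete(key);
    this.store.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.store.size >= this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

const cache = new LruResponseCache<string>();
cache.set('eval:sha256-of-prompt', '{"score": 4}');
console.log(cache.get('eval:sha256-of-prompt')); // {"score": 4}
```

Keying responses by a hash of the normalized prompt is what lets repeated evaluations of similar submissions skip the API call entirely.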
4.2 Expected Cache Performance by Service
| Service | Current Cache Hit | After Improvement | API Calls Saved/Month |
|---|---|---|---|
| STEM Evaluations | ~0% | 50–65% | 250–325 calls |
| Writing Pipeline | ~5% | 30–40% | 57–76 calls |
| Coach Feedback | ~0% | 40–55% | 80–110 calls |
| Parent Insights | ~85% | 90–95% | 42–47 calls |
| Assignment Gen | ~0% | 80–90% | 24–27 calls |
| Learning Paths | ~0% | 65–75% | 13–15 calls |
4.3 Caching Cost Savings
| Scenario | Monthly AI Cost | Annual AI Cost |
|---|---|---|
| No caching (raw API calls) | ~$25–30 | ~$300–360 |
| With multi-layer caching | ~$5–12 | ~$60–144 |
| Savings | ~$15–20 | ~$180–240 |
At current volume (1 customer), caching saves ~$180–240/year. As customer count grows to 100+, caching saves $5,000–15,000/year.
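The savings figures reduce to effective cost = raw cost × (1 − blended cache hit rate); the numbers below are illustrative:

```typescript
// Effective monthly AI spend after caching eliminates a share of API calls.
function effectiveMonthlyCost(rawMonthlyCost: number, hitRate: number): number {
  return rawMonthlyCost * (1 - hitRate);
}

// $30/month raw spend at a 50% blended hit rate:
console.log(effectiveMonthlyCost(30, 0.5)); // 15
```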
5. Vertex AI Trial Credit Transition
5.1 Current Status
- Trial credit: Expiring in ~1 week (early March 2026)
- Current provider: `@google-cloud/vertexai` SDK
- Current models: Gemini 1.5 Flash / 1.5 Pro
5.2 Transition Plan
| Action | Timeline | Cost Impact |
|---|---|---|
| During trial (this week): Run benchmarks comparing 1.5 vs 2.5 Flash | Now | Free (trial credit) |
| During trial: Test fine-tuning pipeline with small dataset | Now | Free (trial credit) |
| During trial: Generate embeddings for RAG corpus | Now | Free (trial credit) |
| After trial: Switch to Gemini Developer API free tier for development | Week 2 | $0 |
| Production: Use Vertex AI with pay-as-you-go billing | Ongoing | $1–2/month |
5.3 Free Tier Options (Post-Trial)
| Provider | Free Tier | Best For |
|---|---|---|
| Gemini Developer API | 15 RPM, 1M tokens/day (Flash) | Development & testing |
| Vertex AI | $300 free credits (new accounts) | Production, if needed |
| Gemini 2.5 Flash-Lite | Included in free tier | Low-cost production |
5.4 Maximum Value from Remaining Trial Credit
Priority tasks to complete before trial expires:
1. Generate RAG embeddings (~$25–50 value) — Embed the full education corpus using `text-embedding-005`. This is a one-time cost we can do for free now.
2. Run fine-tuning experiments (~$150–300 value) — Train at least the STEM evaluation model (`stemblock-eval-v1`) using Gemini 2.0 Flash supervised tuning.
3. Benchmark model quality (~$10–20 value) — Run 100 evaluations each on 1.5 Flash, 2.5 Flash, and 2.5 Flash-Lite. Compare quality scores to determine optimal model selection.
4. Test context caching (~$5–10 value) — Validate that Gemini context caching works with our system prompts.
Estimated value extracted from trial: $200–400 in compute that would otherwise be paid.
6. SDK Migration: @google-cloud/vertexai → @google/genai
6.1 Why Migrate?
| Reason | Detail |
|---|---|
| Deprecation deadline | `@google-cloud/vertexai` deprecated after June 24, 2026 |
| New features | Context caching, embeddings, and image generation only available in new SDK |
| Simplified API | `response.text` instead of `response.candidates[0].content.parts[0].text` |
| Unified SDK | Single SDK works with both Gemini Developer API (free) and Vertex AI (production) |
6.2 Migration Scope
| File | Changes Required |
|---|---|
| `package.json` | Replace `@google-cloud/vertexai` with `@google/genai` |
| `gemini-llm.provider.ts` | Update initialization, generateContent calls, response parsing |
| `gemini-writing.provider.ts` | Update initialization, model creation, system instruction format |
| `assignment-creation.service.ts` | Update Vertex AI initialization and API calls |
| `learning-paths.service.ts` | Update Vertex AI initialization and API calls |
| `parent-communication.service.ts` | Update Vertex AI initialization and API calls |
| `.env.example` | Update model names and add new config variables |
6.3 Key API Changes
Before (`@google-cloud/vertexai`):

```typescript
import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project, location });
const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-flash-002' });
const result = await model.generateContent({ contents: [...] });
const text = result.response.candidates?.[0]?.content?.parts?.[0]?.text;
```

After (`@google/genai`):

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ vertexai: true, project, location });
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'prompt',
  config: { systemInstruction: 'You are...' },
});
const text = response.text;
```
6.4 New Capabilities Unlocked
| Capability | Impact |
|---|---|
| `ai.caches.create()` | Server-side context caching — 90% discount on system prompts |
| `ai.models.embedContent()` | Native embedding support for RAG (no separate SDK needed) |
| `config.responseSchema` | Structured JSON output with guaranteed schema compliance |
| Simplified auth | Same SDK works for both Gemini API key and Vertex AI service account |
7. Unit Economics Summary
7.1 Per-Customer Profitability
| Tier | Monthly Revenue | Monthly AI Cost | Monthly Margin | Margin % |
|---|---|---|---|---|
| Community | $0 | $0.01 | -$0.01 | N/A (lead gen) |
| Pro | $9 | $0.04 | $8.96 | 99.6% |
| Team | $29 | $0.10 | $28.90 | 99.7% |
| Enterprise | $299+ | $0.50 | $298.50 | 99.8% |
7.2 Break-Even Analysis
| Scenario | Monthly AI Budget | # of Team Users to Break Even |
|---|---|---|
| P&L Budget ($800/yr = $67/mo) | $67 | 3 Team users cover AI costs |
| Actual AI cost (with caching) | $2 | 1 Pro user covers AI costs |
| With RAG + infrastructure | $20 | 1 Team user covers AI costs |
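The break-even rows reduce to a ceiling division; a quick check against the P&L budget figure:

```typescript
// Paying users needed for subscription revenue to cover a monthly AI budget.
function usersToBreakEven(monthlyBudget: number, pricePerUser: number): number {
  return Math.ceil(monthlyBudget / pricePerUser);
}

console.log(usersToBreakEven(67, 29)); // 3 Team users cover the $67/mo P&L budget
console.log(usersToBreakEven(2, 9));   // 1 Pro user covers actual cached AI cost
```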
7.3 LTV:CAC Implications
With AI costs representing < 1% of subscription revenue:
- Customer Lifetime Value (3-year): $1,044 (Team) / $10,764 (Enterprise)
- AI cost per customer lifetime: $3.60 (Team) / $18.00 (Enterprise)
- AI cost as % of LTV: 0.3% (Team) / 0.2% (Enterprise)
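The figures above are 36 months (3 years) of monthly revenue and monthly AI cost respectively:

```typescript
const MONTHS = 36; // 3-year customer lifetime assumed in the figures above

const lifetimeValue = (monthlyPrice: number) => monthlyPrice * MONTHS;
const lifetimeAiCost = (monthlyAiCost: number) => monthlyAiCost * MONTHS;

console.log(lifetimeValue(29));                  // 1044  (Team LTV)
console.log(lifetimeValue(299));                 // 10764 (Enterprise LTV)
console.log(lifetimeAiCost(0.10).toFixed(2));    // "3.60" (Team lifetime AI cost)
```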
8. Risk Factors & Mitigations
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Google increases Gemini pricing | Low | Low | Multi-provider architecture (Mistral and Claude fallbacks already built) |
| Trial credit expires before testing | Medium | High (1 week) | Prioritize embeddings + fine-tuning this week |
| Cache hit rates lower than projected | Low | Medium | Even with a 0% cache hit rate, costs are < $30/month at current scale |
| Scale faster than expected | None | Low | AI costs increase by $0.10/user/month (negligible) |
| SDK deprecation deadline (June 2026) | Medium | Certain | Migration planned for Month 1, well ahead of deadline |
9. Investor FAQ
Q: How much does it cost to run AI per student evaluation? A: Less than $0.003 (three-tenths of a cent) with Gemini 2.5 Flash + caching.
Q: What happens to AI costs as you scale to 1,000 customers? A: AI token costs would be approximately $300–600/month (~$3,600–7,200/year). Cache efficiency improves with scale, so costs grow sub-linearly.
Q: Why is the AI/Token line in the P&L higher than actual compute costs? A: The budget includes fine-tuning experiments, RAG infrastructure, model quality upgrades, and contingency. Actual token consumption is a fraction of the budget.
Q: What if Google raises prices? A: We have a multi-provider architecture with Mistral and Claude as fallbacks. We can switch providers in < 1 day via environment variable. Also, the trend is prices going down (Gemini 2.5 Flash-Lite is 95% cheaper than Gemini 1.5 Pro).
Q: Can the $5,000 (2026 AI training budget) cover everything? A: Yes. Estimated direct AI costs for the 4-month training program are $227–$1,968. The $5,000 budget provides a 2.5–22x safety margin.
Q: What's the estimated cost of training the AI model over four months? A: Direct AI usage costs: $227–$1,440 (including operational inference, fine-tuning, RAG setup, and Neon infrastructure). See the companion document AI_TRAINING_COST_ESTIMATE.md for the full breakdown.
Q: Why move from DigitalOcean to Neon for the database? A: Three reasons: (1) pgvector support — Neon includes native pgvector, allowing us to store RAG embeddings directly in PostgreSQL instead of running a separate vector database service; (2) Serverless scaling — Neon scales to zero when idle and autoscales under load, which is ideal for our current 1-customer stage while supporting growth to thousands; (3) Cost efficiency — we start on Neon's Free plan ($0/mo for 100 CU-hours, 0.5GB) and grow to Launch ($0.106/CU-hour) only when needed, vs. DigitalOcean's fixed $10+/mo regardless of usage.
Q: Does using pgvector on Neon replace the need for a dedicated vector database? A: Yes. For our use case (~50K–200K document chunks, 256-dimension embeddings), pgvector with IVFFlat indexing on Neon provides excellent performance. This eliminates the need for Chroma, Qdrant, or Pinecone — saving $0–100/month in managed vector DB costs and reducing operational complexity.
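Under the hood, the semantic matching described above compares embedding vectors by cosine similarity (pgvector's `<=>` operator returns cosine distance, i.e. 1 − similarity). A standalone sketch, with an illustrative reuse threshold that would need tuning against real queries:

```typescript
// Cosine similarity between two embedding vectors (e.g. 256-dimension).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const REUSE_THRESHOLD = 0.92; // illustrative; tune against real query traffic

// Reuse a cached response when the new query's embedding is close enough.
function shouldReuseCached(queryEmb: number[], cachedEmb: number[]): boolean {
  return cosineSimilarity(queryEmb, cachedEmb) >= REUSE_THRESHOLD;
}

console.log(shouldReuseCached([1, 0, 0], [1, 0, 0])); // true  (identical)
console.log(shouldReuseCached([1, 0, 0], [0, 1, 0])); // false (orthogonal)
```

In production the same comparison runs inside PostgreSQL as an indexed `<=>` query, so the application never loads candidate vectors into memory.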