StemBlock AI: Subscription & AI Usage Cost Projections
Prepared for: Funding Committee
Date: February 27, 2026
Confidential
Aligned with: 5-Year P&L — STEMBLOCK.AI (Pitch Deck Slide 16)
Executive Summary
This document provides a detailed breakdown of StemBlock AI's subscription model, AI usage economics, and 5-year cost projections. All figures align with the P&L presented in our pitch deck.
Key Takeaway: Our AI usage cost structure enables 76–86% gross margins because:
- Gemini 2.5 Flash-Lite costs $0.10 per 1M input tokens, making each AI evaluation cost less than a tenth of a cent
- Multi-layer caching eliminates 50–85% of API calls
- AI costs scale sub-linearly with revenue growth (AI/Token is 16–27% of subscription revenue)
Current Status: 1 active customer. Google Vertex AI trial credit expires in ~1 week. Transitioning to production billing with Gemini 2.5 Flash (estimated $1–2/month at current volume).
Infrastructure Update: Migrating from DigitalOcean Managed PostgreSQL to Neon Serverless Postgres with native pgvector support. This consolidates our vector database (RAG) into the same PostgreSQL instance, eliminating the need for a separate vector DB service (Chroma/Qdrant). Neon's scale-to-zero and autoscaling capabilities provide cost-efficient scalability as customer count grows.
1. Subscription Model
1.1 Pricing Tiers
| Tier | Price | Target User | AI Features Included |
|---|---|---|---|
| Community | Free | Students, trial coaches | 10 AI evaluations/mo, 3 parent insights/mo, 1 learning path/mo |
| Pro | $9/mo | Parents | Unlimited English writing evaluation, 20 AI evaluations/mo |
| Team | $29/mo per user | Coaches & Teachers | 100 AI evaluations/mo, 20 assignments/mo, 20 learning paths/mo, workspaces |
| Enterprise | Custom ($299+/mo) | Schools & Districts | Unlimited everything, SSO, API access, white-label |
1.2 AI Features by Tier
| Feature | Community | Pro | Team | Enterprise |
|---|---|---|---|---|
| STEM AI Evaluation | 10/mo | 20/mo | 100/mo | Unlimited |
| English Writing Assessment | View only | Unlimited | Unlimited | Unlimited |
| AI Coach Feedback | — | — | Included | Unlimited |
| Parent Insights | 3/mo | 10/mo | 50/mo | Unlimited |
| Assignment Generator | — | — | 20/mo | Unlimited |
| Learning Path Generator | 1/mo | 5/mo | 20/mo | Unlimited |
| Advanced Analytics | — | — | Yes | Yes |
| Custom Rubrics | — | — | Yes | Yes |
| Team Workspaces | — | — | Yes | Yes |
| API Access | — | — | — | Yes |
| SSO / White Label | — | — | — | Yes |
1.3 Revenue Streams
| Stream | Description | 2026 Projection |
|---|---|---|
| AI Subscriptions | Monthly/yearly SaaS subscriptions | $3,000 |
| Training Programs | Teacher PD workshops, onboarding, certification | $2,000 |
| Total Revenue | — | $5,000 |
2. AI Usage Economics
2.1 Cost Per AI Action (After Gemini 2.5 Upgrade + Caching)
| AI Action | Model Used | Avg Tokens | Raw Cost | With Cache (est.) | Cost to Deliver |
|---|---|---|---|---|---|
| STEM Evaluation | Gemini 2.5 Flash | ~3,500 | $0.0026 | $0.0010 | < $0.01 |
| Writing Moderation | Gemini 2.5 Flash-Lite | ~1,500 | $0.0003 | $0.0002 | < $0.01 |
| Writing Feedback (Yoshi) | Gemini 2.5 Flash | ~3,000 | $0.0028 | $0.0017 | < $0.01 |
| Writing Assessment | Gemini 2.5 Flash | ~2,500 | $0.0020 | $0.0012 | < $0.01 |
| Coach Feedback | Gemini 2.5 Flash | ~3,500 | $0.0026 | $0.0013 | < $0.01 |
| Parent Insights | Gemini 2.5 Flash-Lite | ~2,000 | $0.0004 | $0.0001 | < $0.01 |
| Assignment Generation | Gemini 2.5 Flash-Lite | ~3,000 | $0.0005 | $0.0001 | < $0.01 |
| Learning Path | Gemini 2.5 Flash | ~4,000 | $0.0036 | $0.0013 | < $0.01 |
Every AI action costs less than 1 cent. At scale, the average cost per AI interaction is approximately $0.001 (one-tenth of a cent).
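The raw-cost column can be reproduced directly from per-token rates. In the sketch below, the per-1M-token prices and the 3,000-in / 500-out split are assumptions for illustration, not authoritative Google pricing:

```typescript
// Per-action cost model. Prices per 1M tokens and the input/output split
// are illustrative assumptions for this sketch, not authoritative pricing.
interface ModelPricing {
  inputPer1M: number;  // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

const PRICING: Record<string, ModelPricing> = {
  'gemini-2.5-flash':      { inputPer1M: 0.30, outputPer1M: 2.50 },
  'gemini-2.5-flash-lite': { inputPer1M: 0.10, outputPer1M: 0.40 },
};

function rawCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  return (inputTokens / 1e6) * p.inputPer1M + (outputTokens / 1e6) * p.outputPer1M;
}

// A STEM evaluation: ~3,500 tokens total, assumed ~3,000 in / ~500 out.
const stemEvalCost = rawCost('gemini-2.5-flash', 3000, 500);
console.log(stemEvalCost < 0.01); // true (well under one cent)
```

With those assumptions a STEM evaluation lands around $0.002, in line with the ~$0.0026 shown in the table; the exact figure depends on the actual output length.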
2.2 Cost Per Subscriber (Monthly)
| Tier | Typical Monthly AI Actions | AI Cost / User / Month | Subscription Price | AI Margin |
|---|---|---|---|---|
| Community | 5–10 evaluations | $0.01 | $0 | -$0.01 (subsidized) |
| Pro | 20–40 writing assessments | $0.04 | $9/mo | 99.6% |
| Team | 50–100 mixed actions | $0.10 | $29/mo | 99.7% |
| Enterprise | 200–500 mixed actions | $0.50 | $299+/mo | 99.8% |
Insight: AI usage costs are negligible relative to subscription revenue. Even at maximum usage, AI costs are < 1% of subscription price per user. This enables 85%+ gross margins as shown in the P&L.
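The AI-margin column follows directly from price and per-user AI cost; a quick check using the figures from the table above:

```typescript
// AI margin % = (subscription price - AI cost per user) / price * 100.
function aiMarginPct(price: number, aiCost: number): number {
  return ((price - aiCost) / price) * 100;
}

console.log(aiMarginPct(9, 0.04).toFixed(1));   // "99.6" (Pro)
console.log(aiMarginPct(29, 0.10).toFixed(1));  // "99.7" (Team)
console.log(aiMarginPct(299, 0.50).toFixed(1)); // "99.8" (Enterprise)
```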
2.3 Why So Cheap?
| Factor | Impact |
|---|---|
| Gemini 2.5 Flash-Lite ($0.10/1M input tokens) | 95% cheaper than Mistral models used in December 2025 |
| Multi-layer caching (context cache + response cache) | 50–85% of API calls eliminated |
| Efficient prompt engineering | Average 3,000 tokens per evaluation (vs. industry average 5,000+) |
| Batch processing (future) | Group similar requests for additional 15–20% savings |
3. 5-Year Cost Projections (Aligned with P&L)
3.1 Revenue & AI Cost Mapping
| Year | Revenue | AI Subscriptions | AI / Token Usage | Infrastructure (Neon + Hosting) | AI as % of Subs | Gross Margin |
|---|---|---|---|---|---|---|
| 2026 | $5,000 | $3,000 | $800 | $324 | 27% | 78% |
| 2027 | $25,000 | $15,000 | $3,000 | $1,200 | 20% | 82% |
| 2028 | $120,000 | $90,000 | $14,000 | $3,600 | 16% | 84% |
| 2029 | $400,000 | $300,000 | $48,000 | $8,400 | 16% | 85% |
| 2030 | $1,000,000 | $750,000 | $120,000 | $18,000 | 16% | 86% |
Infrastructure Cost Detail (Neon Serverless Postgres + Hosting)
| Year | Neon Plan | Neon Monthly | Hosting (App) | Total Infra/Mo | Annual |
|---|---|---|---|---|---|
| 2026 | Free → Launch | $0–15 | $12 | $12–27 | $324 |
| 2027 | Launch | $15–50 | $24 | $39–74 | $1,200 |
| 2028 | Launch → Scale | $50–200 | $50 | $100–250 | $3,600 |
| 2029 | Scale | $200–500 | $100 | $300–600 | $8,400 |
| 2030 | Scale | $500–1,000 | $200 | $700–1,200 | $18,000 |
Why Neon scales well: Neon's consumption-based pricing ($0.106/CU-hour on Launch, $0.222/CU-hour on Scale) means you only pay for active compute. Scale-to-zero eliminates costs during off-hours. pgvector is included at no additional cost — vector storage is billed as standard PostgreSQL storage at $0.35/GB-month.
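Using the rates quoted above, a rough monthly estimate can be sketched. The compute-hours and storage figures below are illustrative workload assumptions, not measured usage:

```typescript
// Neon consumption-billing sketch: compute CU-hours x rate + storage.
// Rates are taken from the text above; the workload is an assumption.
const LAUNCH_RATE = 0.106;  // USD per CU-hour (Launch plan)
const STORAGE_RATE = 0.35;  // USD per GB-month

function neonMonthlyCost(cuHours: number, ratePerCuHour: number, storageGb: number): number {
  return cuHours * ratePerCuHour + storageGb * STORAGE_RATE;
}

// e.g. a 0.25-CU instance active ~8h/day for 30 days, with 5 GB of storage:
const est = neonMonthlyCost(0.25 * 8 * 30, LAUNCH_RATE, 5);
console.log(est.toFixed(2)); // roughly $8/month on the Launch plan
```

Scale-to-zero is what makes the first term small: idle hours contribute zero CU-hours.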
3.2 Customer Growth & AI Cost Scaling
| Year | Est. Subscribers | Avg Revenue/User/Mo | Avg AI Cost/User/Mo | AI Token Budget | Token Budget / User / Mo |
|---|---|---|---|---|---|
| 2026 | 5–10 | ~$25 | ~$0.10 | $800 | $6.67–$13.33 |
| 2027 | 30–60 | ~$21 | ~$0.15 | $3,000 | $4.17–$8.33 |
| 2028 | 150–300 | ~$25 | ~$0.20 | $14,000 | $3.89–$7.78 |
| 2029 | 400–800 | ~$31 | ~$0.25 | $48,000 | $5.00–$10.00 |
| 2030 | 800–1,500 | ~$42 | ~$0.30 | $120,000 | $6.67–$12.50 |
Note: The per-user AI cost in the P&L budget ($4–$13/user/month) is significantly higher than our actual AI cost (~$0.10–$0.30/user/month). This provides a 20–40x cost buffer for:
- RAG infrastructure hosting
- Fine-tuning experiments
- Model quality upgrades (using Gemini 3.1 Pro for premium features)
- Scaling overhead and contingency
- Vector database and caching infrastructure
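The token-budget column in the table above is simple division, sketched here for clarity:

```typescript
// Token Budget / User / Mo = annual AI budget / 12 months / subscriber count.
function tokenBudgetPerUser(annualBudget: number, subscribers: number): number {
  return annualBudget / 12 / subscribers;
}

// 2026: $800 budget spread across 5-10 subscribers
console.log(tokenBudgetPerUser(800, 10).toFixed(2)); // "6.67"
console.log(tokenBudgetPerUser(800, 5).toFixed(2));  // "13.33"
```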
3.3 AI Token Usage Budget Breakdown
2026 — $800 Annual AI/Token Budget
| Category | Budget | Purpose |
|---|---|---|
| Operational inference | $25–50 | Production API calls (1 customer, with caching) |
| Fine-tuning compute | $150–300 | Train 3 specialized models on Vertex AI |
| RAG embeddings | $25–50 | One-time: embed 5GB education corpus |
| Gemini context caching | $10–25 | Cache system prompts for 90% input discount |
| Model experiments | $100–200 | Testing, benchmarking, A/B comparisons |
| Contingency | $175–200 | Buffer for unexpected usage spikes |
| Total | $800 | — |
2027 — $3,000 Annual AI/Token Budget
| Category | Budget | Purpose |
|---|---|---|
| Operational inference | $200–400 | 30–60 customers, growing volume |
| Fine-tuning retraining | $300–500 | Quarterly model updates with new data |
| RAG maintenance | $100–200 | Corpus updates, re-embedding on pgvector |
| Neon database (Launch plan) | $180–600 | Serverless Postgres with pgvector — no separate vector DB needed |
| Premium model usage (3.1 Pro) | $300–500 | Enterprise customers, quality-critical tasks |
| Contingency | $200–500 | — |
| Total | $3,000 | — |
2028+ — Scaling Pattern
As customer count grows, AI costs scale sub-linearly due to:
- Higher cache hit rates — more users = more shared evaluation patterns
- Fine-tuned models — fewer tokens needed (domain-specific = concise)
- Batch processing — group evaluations by assignment for efficiency
- Volume discounts — Vertex AI committed-use pricing at higher volumes
- Neon autoscaling — database compute scales automatically (0.25–56 CU), and pgvector queries benefit from larger shared buffer pools at scale
4. Caching Strategy (Key Cost Enabler)
4.1 Current vs. Planned Caching
| Layer | Current | Planned | Impact |
|---|---|---|---|
| In-memory LRU | 1-hour TTL, 500 entries | 7–30 day TTL, 5,000 entries | 30–50% hit rate → 50–70% |
| Gemini Context Cache | Not implemented | System prompts cached server-side | 90% discount on repeated input tokens |
| Redis Response Cache | Not implemented | Persistent cache, survives restarts | 40–60% additional hit rate |
| Semantic Cache (pgvector) | Not implemented | Similar queries matched via embedding cosine similarity on Neon | 10–20% additional savings |
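A minimal sketch of the planned in-memory LRU layer. The TTL and capacity mirror the "Planned" column above; the class and key names are illustrative, not the production implementation:

```typescript
// In-memory LRU response cache with TTL, per the "Planned" column:
// 7-day TTL (low end of the 7-30 day range), 5,000 entries.
class LruResponseCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();
  constructor(private maxEntries = 5000, private ttlMs = 7 * 24 * 3600 * 1000) {}

  get(key: string): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) { this.store.delete(key); return undefined; }
    // Refresh recency: re-insert so the key moves to the back of the Map.
    this.store.delete(key);
    this.store.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.store.size >= this.maxEntries) {
      // Evict the least recently used entry (first key in insertion order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

const cache = new LruResponseCache<string>();
cache.set('eval:sha256-of-prompt', '{"score": 4}');
console.log(cache.get('eval:sha256-of-prompt')); // {"score": 4}
```

Keying responses by a hash of the normalized prompt is what lets repeated evaluations of similar submissions skip the API call entirely.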
4.2 Expected Cache Performance by Service
| Service | Current Cache Hit | After Improvement | API Calls Saved/Month |
|---|---|---|---|
| STEM Evaluations | ~0% | 50–65% | 250–325 calls |
| Writing Pipeline | ~5% | 30–40% | 57–76 calls |
| Coach Feedback | ~0% | 40–55% | 80–110 calls |
| Parent Insights | ~85% | 90–95% | 42–47 calls |
| Assignment Gen | ~0% | 80–90% | 24–27 calls |
| Learning Paths | ~0% | 65–75% | 13–15 calls |
4.3 Caching Cost Savings
| Scenario | Monthly AI Cost | Annual AI Cost |
|---|---|---|
| No caching (raw API calls) | ~$25–30 | ~$300–360 |
| With multi-layer caching | ~$5–12 | ~$60–144 |
| Savings | ~$15–20 | ~$180–240 |
At current volume (1 customer), caching saves ~$180–240/year. As customer count grows to 100+, caching saves $5,000–15,000/year.
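The savings figures reduce to effective cost = raw cost × (1 − blended cache hit rate); the numbers below are illustrative:

```typescript
// Effective monthly AI spend after caching eliminates a share of API calls.
function effectiveMonthlyCost(rawMonthlyCost: number, hitRate: number): number {
  return rawMonthlyCost * (1 - hitRate);
}

// $30/month raw spend at a 50% blended hit rate:
console.log(effectiveMonthlyCost(30, 0.5)); // 15
```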
5. Vertex AI Trial Credit Transition
5.1 Current Status
- Trial credit: Expiring in ~1 week (early March 2026)
- Current provider: `@google-cloud/vertexai` SDK
- Current models: Gemini 1.5 Flash / 1.5 Pro
5.2 Transition Plan
| Action | Timeline | Cost Impact |
|---|---|---|
| During trial (this week): Run benchmarks comparing 1.5 vs 2.5 Flash | Now | Free (trial credit) |
| During trial: Test fine-tuning pipeline with small dataset | Now | Free (trial credit) |
| During trial: Generate embeddings for RAG corpus | Now | Free (trial credit) |
| After trial: Switch to Gemini Developer API free tier for development | Week 2 | $0 |
| Production: Use Vertex AI with pay-as-you-go billing | Ongoing | $1–2/month |
5.3 Free Tier Options (Post-Trial)
| Provider | Free Tier | Best For |
|---|---|---|
| Gemini Developer API | 15 RPM, 1M tokens/day (Flash) | Development & testing |
| Vertex AI | $300 free credits (new accounts) | Production, if needed |
| Gemini 2.5 Flash-Lite | Included in free tier | Low-cost production |
5.4 Maximum Value from Remaining Trial Credit
Priority tasks to complete before trial expires:
1. Generate RAG embeddings (~$25–50 value) — Embed the full education corpus using `text-embedding-005`. This is a one-time cost we can do for free now.
2. Run fine-tuning experiments (~$150–300 value) — Train at least the STEM evaluation model (`stemblock-eval-v1`) using Gemini 2.0 Flash supervised tuning.
3. Benchmark model quality (~$10–20 value) — Run 100 evaluations each on 1.5 Flash, 2.5 Flash, and 2.5 Flash-Lite. Compare quality scores to determine optimal model selection.
4. Test context caching (~$5–10 value) — Validate that Gemini context caching works with our system prompts.
Estimated value extracted from trial: $200–400 in compute that would otherwise be paid.
6. SDK Migration: @google-cloud/vertexai → @google/genai
6.1 Why Migrate?
| Reason | Detail |
|---|---|
| Deprecation deadline | `@google-cloud/vertexai` deprecated after June 24, 2026 |
| New features | Context caching, embeddings, and image generation only available in new SDK |
| Simplified API | `response.text` instead of `response.candidates[0].content.parts[0].text` |
| Unified SDK | Single SDK works with both Gemini Developer API (free) and Vertex AI (production) |
6.2 Migration Scope
| File | Changes Required |
|---|---|
| `package.json` | Replace `@google-cloud/vertexai` with `@google/genai` |
| `gemini-llm.provider.ts` | Update initialization, generateContent calls, response parsing |
| `gemini-writing.provider.ts` | Update initialization, model creation, system instruction format |
| `assignment-creation.service.ts` | Update Vertex AI initialization and API calls |
| `learning-paths.service.ts` | Update Vertex AI initialization and API calls |
| `parent-communication.service.ts` | Update Vertex AI initialization and API calls |
| `.env.example` | Update model names and add new config variables |
6.3 Key API Changes
Before (`@google-cloud/vertexai`):

```typescript
import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project, location });
const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-flash-002' });
const result = await model.generateContent({ contents: [...] });
const text = result.response.candidates?.[0]?.content?.parts?.[0]?.text;
```

After (`@google/genai`):

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ vertexai: true, project, location });
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'prompt',
  config: { systemInstruction: 'You are...' },
});
const text = response.text;
```
6.4 New Capabilities Unlocked
| Capability | Impact |
|---|---|
| `ai.caches.create()` | Server-side context caching — 90% discount on system prompts |
| `ai.models.embedContent()` | Native embedding support for RAG (no separate SDK needed) |
| `config.responseSchema` | Structured JSON output with guaranteed schema compliance |
| Simplified auth | Same SDK works for both Gemini API key and Vertex AI service account |
7. Unit Economics Summary
7.1 Per-Customer Profitability
| Tier | Monthly Revenue | Monthly AI Cost | Monthly Margin | Margin % |
|---|---|---|---|---|
| Community | $0 | $0.01 | -$0.01 | N/A (lead gen) |
| Pro | $9 | $0.04 | $8.96 | 99.6% |
| Team | $29 | $0.10 | $28.90 | 99.7% |
| Enterprise | $299+ | $0.50 | $298.50 | 99.8% |
7.2 Break-Even Analysis
| Scenario | Monthly AI Budget | # of Team Users to Break Even |
|---|---|---|
| P&L Budget ($800/yr = $67/mo) | $67 | 3 Team users cover AI costs |
| Actual AI cost (with caching) | $2 | 1 Pro user covers AI costs |
| With RAG + infrastructure | $20 | 1 Team user covers AI costs |
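The break-even rows reduce to a ceiling division; a quick check against the P&L budget figure:

```typescript
// Paying users needed for subscription revenue to cover a monthly AI budget.
function usersToBreakEven(monthlyBudget: number, pricePerUser: number): number {
  return Math.ceil(monthlyBudget / pricePerUser);
}

console.log(usersToBreakEven(67, 29)); // 3 Team users cover the $67/mo P&L budget
console.log(usersToBreakEven(2, 9));   // 1 Pro user covers actual cached AI cost
```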
7.3 LTV:CAC Implications
With AI costs representing < 1% of subscription revenue:
- Customer Lifetime Value (3-year): $1,044 (Team) / $10,764 (Enterprise)
- AI cost per customer lifetime: $3.60 (Team) / $18.00 (Enterprise)
- AI cost as % of LTV: 0.3% (Team) / 0.2% (Enterprise)
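The figures above are 36 months (3 years) of monthly revenue and monthly AI cost respectively:

```typescript
const MONTHS = 36; // 3-year customer lifetime assumed in the figures above

const lifetimeValue = (monthlyPrice: number) => monthlyPrice * MONTHS;
const lifetimeAiCost = (monthlyAiCost: number) => monthlyAiCost * MONTHS;

console.log(lifetimeValue(29));                  // 1044  (Team LTV)
console.log(lifetimeValue(299));                 // 10764 (Enterprise LTV)
console.log(lifetimeAiCost(0.10).toFixed(2));    // "3.60" (Team lifetime AI cost)
```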
8. Risk Factors & Mitigations
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Google increases Gemini pricing | Low | Low | Multi-provider architecture (Mistral and Claude fallbacks already built) |
| Trial credit expires before testing | Medium | High (1 week) | Prioritize embeddings + fine-tuning this week |
| Cache hit rates lower than projected | Low | Medium | Even with a 0% cache hit rate, costs are < $30/month at current scale |
| Scale faster than expected | None | Low | AI costs increase by $0.10/user/month (negligible) |
| SDK deprecation deadline (June 2026) | Medium | Certain | Migration planned for Month 1, well ahead of deadline |
9. Investor FAQ
Q: How much does it cost to run AI per student evaluation? A: Less than $0.003 (three-tenths of a cent) with Gemini 2.5 Flash + caching.
Q: What happens to AI costs as you scale to 1,000 customers? A: AI token costs would be approximately $300–600/month (~$3,600–7,200/year). Cache efficiency improves with scale, so costs grow sub-linearly.
Q: Why is the AI/Token line in the P&L higher than actual compute costs? A: The budget includes fine-tuning experiments, RAG infrastructure, model quality upgrades, and contingency. Actual token consumption is a fraction of the budget.
Q: What if Google raises prices? A: We have a multi-provider architecture with Mistral and Claude as fallbacks. We can switch providers in < 1 day via environment variable. Also, the trend is prices going down (Gemini 2.5 Flash-Lite is 95% cheaper than Gemini 1.5 Pro).
Q: Can the $5,000 (2026 AI training budget) cover everything? A: Yes. Estimated direct AI costs for the 4-month training program are $227–$1,968. The $5,000 budget provides a 2.5–22x safety margin.
Q: What's the estimated cost of training the AI model over four months? A: Direct AI usage costs: $227–$1,440 (including operational inference, fine-tuning, RAG setup, and Neon infrastructure). See the companion document AI_TRAINING_COST_ESTIMATE.md for the full breakdown.
Q: Why move from DigitalOcean to Neon for the database? A: Three reasons: (1) pgvector support — Neon includes native pgvector, allowing us to store RAG embeddings directly in PostgreSQL instead of running a separate vector database service; (2) Serverless scaling — Neon scales to zero when idle and autoscales under load, which is ideal for our current 1-customer stage while supporting growth to thousands; (3) Cost efficiency — we start on Neon's Free plan ($0/mo for 100 CU-hours, 0.5GB) and grow to Launch ($0.106/CU-hour) only when needed, vs. DigitalOcean's fixed $10+/mo regardless of usage.
Q: Does using pgvector on Neon replace the need for a dedicated vector database? A: Yes. For our use case (~50K–200K document chunks, 256-dimension embeddings), pgvector with IVFFlat indexing on Neon provides excellent performance. This eliminates the need for Chroma, Qdrant, or Pinecone — saving $0–100/month in managed vector DB costs and reducing operational complexity.
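Under the hood, the semantic matching described above compares embedding vectors by cosine similarity (pgvector's `<=>` operator returns cosine distance, i.e. 1 − similarity). A standalone sketch, with an illustrative reuse threshold that would need tuning against real queries:

```typescript
// Cosine similarity between two embedding vectors (e.g. 256-dimension).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const REUSE_THRESHOLD = 0.92; // illustrative; tune against real query traffic

// Reuse a cached response when the new query's embedding is close enough.
function shouldReuseCached(queryEmb: number[], cachedEmb: number[]): boolean {
  return cosineSimilarity(queryEmb, cachedEmb) >= REUSE_THRESHOLD;
}

console.log(shouldReuseCached([1, 0, 0], [1, 0, 0])); // true  (identical)
console.log(shouldReuseCached([1, 0, 0], [0, 1, 0])); // false (orthogonal)
```

In production the same comparison runs inside PostgreSQL as an indexed `<=>` query, so the application never loads candidate vectors into memory.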