
StemBlock AI: Subscription & AI Usage Cost Projections

Prepared for: Funding Committee
Date: February 27, 2026
Confidential
Aligned with: 5-Year P&L — STEMBLOCK.AI (Pitch Deck Slide 16)


Executive Summary

This document provides a detailed breakdown of StemBlock AI's subscription model, AI usage economics, and 5-year cost projections. All figures align with the P&L presented in our pitch deck.

Key Takeaway: Our AI usage cost structure enables 76–86% gross margins because:

  • Gemini 2.5 Flash-Lite costs $0.10 per 1M input tokens — making each AI evaluation less than 1/10th of a cent
  • Multi-layer caching eliminates 50–85% of API calls
  • AI costs scale sub-linearly with revenue growth (AI/Token is 16–27% of subscription revenue)

Current Status: 1 active customer. Google Vertex AI trial credit expires in ~1 week. Transitioning to production billing with Gemini 2.5 Flash (estimated $1–2/month at current volume).

Infrastructure Update: Migrating from DigitalOcean Managed PostgreSQL to Neon Serverless Postgres with native pgvector support. This consolidates our vector database (RAG) into the same PostgreSQL instance, eliminating the need for a separate vector DB service (Chroma/Qdrant). Neon's scale-to-zero and autoscaling capabilities provide cost-efficient scalability as customer count grows.


1. Subscription Model

1.1 Pricing Tiers

| Tier | Price | Target User | AI Features Included |
|------|-------|-------------|----------------------|
| Community | Free | Students, trial coaches | 10 AI evaluations/mo, 3 parent insights/mo, 1 learning path/mo |
| Pro | $9/mo | Parents | Unlimited English writing evaluation, 20 AI evaluations/mo |
| Team | $29/mo per user | Coaches & Teachers | 100 AI evaluations/mo, 20 assignments/mo, 20 learning paths/mo, workspaces |
| Enterprise | Custom ($299+/mo) | Schools & Districts | Unlimited everything, SSO, API access, white-label |

1.2 AI Features by Tier

| Feature | Community | Pro | Team | Enterprise |
|---------|-----------|-----|------|------------|
| STEM AI Evaluation | 10/mo | 20/mo | 100/mo | Unlimited |
| English Writing Assessment | View only | Unlimited | Unlimited | Unlimited |
| AI Coach Feedback | — | — | Included | Unlimited |
| Parent Insights | 3/mo | 10/mo | 50/mo | Unlimited |
| Assignment Generator | — | — | 20/mo | Unlimited |
| Learning Path Generator | 1/mo | 5/mo | 20/mo | Unlimited |
| Advanced Analytics | — | — | Yes | Yes |
| Custom Rubrics | — | — | Yes | Yes |
| Team Workspaces | — | — | Yes | Yes |
| API Access | — | — | — | Yes |
| SSO / White Label | — | — | — | Yes |

1.3 Revenue Streams

| Stream | Description | 2026 Projection |
|--------|-------------|-----------------|
| AI Subscriptions | Monthly/yearly SaaS subscriptions | $3,000 |
| Training Programs | Teacher PD workshops, onboarding, certification | $2,000 |
| Total Revenue | | $5,000 |

2. AI Usage Economics

2.1 Cost Per AI Action (After Gemini 2.5 Upgrade + Caching)

| AI Action | Model Used | Avg Tokens | Raw Cost | With Cache (est.) | Cost to Deliver |
|-----------|------------|------------|----------|-------------------|-----------------|
| STEM Evaluation | Gemini 2.5 Flash | ~3,500 | $0.0026 | $0.0010 | < $0.01 |
| Writing Moderation | Gemini 2.5 Flash-Lite | ~1,500 | $0.0003 | $0.0002 | < $0.01 |
| Writing Feedback (Yoshi) | Gemini 2.5 Flash | ~3,000 | $0.0028 | $0.0017 | < $0.01 |
| Writing Assessment | Gemini 2.5 Flash | ~2,500 | $0.0020 | $0.0012 | < $0.01 |
| Coach Feedback | Gemini 2.5 Flash | ~3,500 | $0.0026 | $0.0013 | < $0.01 |
| Parent Insights | Gemini 2.5 Flash-Lite | ~2,000 | $0.0004 | $0.0001 | < $0.01 |
| Assignment Generation | Gemini 2.5 Flash-Lite | ~3,000 | $0.0005 | $0.0001 | < $0.01 |
| Learning Path | Gemini 2.5 Flash | ~4,000 | $0.0036 | $0.0013 | < $0.01 |

Every AI action costs less than 1 cent. At scale, the average cost per AI interaction is approximately $0.001 (one-tenth of a cent).
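The per-action figures above fall directly out of per-token pricing. A minimal sketch of the arithmetic (TypeScript; the prices passed in are illustrative inputs, not authoritative Gemini rates):

```typescript
// Estimate the raw cost of one AI action from token counts and
// per-1M-token prices (USD). Prices are caller-supplied inputs,
// not hard-coded Gemini rates.
function actionCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerM +
    (outputTokens / 1_000_000) * outputPricePerM
  );
}

// Example: a ~3,000-token prompt with a ~500-token response on a
// $0.10/1M-input, $0.40/1M-output model costs half a hundredth of a cent.
const cost = actionCostUSD(3000, 500, 0.10, 0.40);
console.log(cost.toFixed(4)); // "0.0005"
```

At these magnitudes, even a 10x error in the token estimate keeps each action well under a cent.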

2.2 Cost Per Subscriber (Monthly)

| Tier | Typical Monthly AI Actions | AI Cost / User / Month | Subscription Price | AI Margin |
|------|----------------------------|------------------------|--------------------|-----------|
| Community | 5–10 evaluations | $0.01 | $0 | -$0.01 (subsidized) |
| Pro | 20–40 writing assessments | $0.04 | $9/mo | 99.6% |
| Team | 50–100 mixed actions | $0.10 | $29/mo | 99.7% |
| Enterprise | 200–500 mixed actions | $0.50 | $299+/mo | 99.8% |

Insight: AI usage costs are negligible relative to subscription revenue. Even at maximum usage, AI costs are < 1% of subscription price per user. This enables 85%+ gross margins as shown in the P&L.

2.3 Why So Cheap?

| Factor | Impact |
|--------|--------|
| Gemini 2.5 Flash-Lite ($0.10/1M input tokens) | 95% cheaper than Mistral models used in December 2025 |
| Multi-layer caching (context cache + response cache) | 50–85% of API calls eliminated |
| Efficient prompt engineering | Average 3,000 tokens per evaluation (vs. industry average 5,000+) |
| Batch processing (future) | Group similar requests for additional 15–20% savings |

3. 5-Year Cost Projections (Aligned with P&L)

3.1 Revenue & AI Cost Mapping

| Year | Revenue | AI Subscriptions | AI / Token Usage | Infrastructure (Neon + Hosting) | AI as % of Subs | Gross Margin |
|------|---------|------------------|------------------|---------------------------------|-----------------|--------------|
| 2026 | $5,000 | $3,000 | $800 | $324 | 27% | 78% |
| 2027 | $25,000 | $15,000 | $3,000 | $1,200 | 20% | 82% |
| 2028 | $120,000 | $90,000 | $14,000 | $3,600 | 16% | 84% |
| 2029 | $400,000 | $300,000 | $48,000 | $8,400 | 16% | 85% |
| 2030 | $1,000,000 | $750,000 | $120,000 | $18,000 | 16% | 86% |

Infrastructure Cost Detail (Neon Serverless Postgres + Hosting)

| Year | Neon Plan | Neon Monthly | Hosting (App) | Total Infra/Mo | Annual |
|------|-----------|--------------|---------------|----------------|--------|
| 2026 | Free → Launch | $0–15 | $12 | $12–27 | $324 |
| 2027 | Launch | $15–50 | $24 | $39–74 | $1,200 |
| 2028 | Launch → Scale | $50–200 | $50 | $100–250 | $3,600 |
| 2029 | Scale | $200–500 | $100 | $300–600 | $8,400 |
| 2030 | Scale | $500–1,000 | $200 | $700–1,200 | $18,000 |

Why Neon scales well: Neon's consumption-based pricing ($0.106/CU-hour on Launch, $0.222/CU-hour on Scale) means you only pay for active compute. Scale-to-zero eliminates costs during off-hours. pgvector is included at no additional cost — vector storage is billed as standard PostgreSQL storage at $0.35/GB-month.
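Using the rates quoted above, a monthly Neon bill can be approximated as active compute plus storage. A planning sketch (assumes only the compute and storage line items cited in this section; not a full model of Neon's billing):

```typescript
// Rough monthly Neon estimate: active compute (CU-hours * $/CU-hour)
// plus storage (GB * $/GB-month). Ignores any other line items, so
// this is a planning approximation, not Neon's exact invoice logic.
function neonMonthlyUSD(
  cuHours: number,
  ratePerCuHour: number, // 0.106 on Launch, 0.222 on Scale (per this doc)
  storageGB: number,
  storageRate = 0.35,    // $/GB-month; pgvector data bills as normal storage
): number {
  return cuHours * ratePerCuHour + storageGB * storageRate;
}

// Example: 200 active CU-hours on Launch with 5 GB of data
// (Postgres rows + RAG embeddings) lands near $23/month.
console.log(neonMonthlyUSD(200, 0.106, 5).toFixed(2)); // "22.95"
```

Scale-to-zero shows up in this model as a smaller `cuHours` figure: idle hours simply contribute nothing to the compute term.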

3.2 Customer Growth & AI Cost Scaling

| Year | Est. Subscribers | Avg Revenue/User/Mo | Avg AI Cost/User/Mo | AI Token Budget | Token Budget / User / Mo |
|------|------------------|---------------------|---------------------|-----------------|--------------------------|
| 2026 | 5–10 | ~$25 | ~$0.10 | $800 | $6.67–$13.33 |
| 2027 | 30–60 | ~$21 | ~$0.15 | $3,000 | $4.17–$8.33 |
| 2028 | 150–300 | ~$25 | ~$0.20 | $14,000 | $3.89–$7.78 |
| 2029 | 400–800 | ~$31 | ~$0.25 | $48,000 | $5.00–$10.00 |
| 2030 | 800–1,500 | ~$42 | ~$0.30 | $120,000 | $6.67–$12.50 |

Note: The per-user AI cost in the P&L budget ($4–$13/user/month) is significantly higher than our actual AI cost (~$0.10–$0.30/user/month). This provides a 20–40x cost buffer for:

  • RAG infrastructure hosting
  • Fine-tuning experiments
  • Model quality upgrades (using Gemini 3.1 Pro for premium features)
  • Scaling overhead and contingency
  • Vector database and caching infrastructure

3.3 AI Token Usage Budget Breakdown

2026 — $800 Annual AI/Token Budget

| Category | Budget | Purpose |
|----------|--------|---------|
| Operational inference | $25–50 | Production API calls (1 customer, with caching) |
| Fine-tuning compute | $150–300 | Train 3 specialized models on Vertex AI |
| RAG embeddings | $25–50 | One-time: embed 5GB education corpus |
| Gemini context caching | $10–25 | Cache system prompts for 90% input discount |
| Model experiments | $100–200 | Testing, benchmarking, A/B comparisons |
| Contingency | $175–200 | Buffer for unexpected usage spikes |
| Total | $800 | |

2027 — $3,000 Annual AI/Token Budget

| Category | Budget | Purpose |
|----------|--------|---------|
| Operational inference | $200–400 | 30–60 customers, growing volume |
| Fine-tuning retraining | $300–500 | Quarterly model updates with new data |
| RAG maintenance | $100–200 | Corpus updates, re-embedding on pgvector |
| Neon database (Launch plan) | $180–600 | Serverless Postgres with pgvector — no separate vector DB needed |
| Premium model usage (3.1 Pro) | $300–500 | Enterprise customers, quality-critical tasks |
| Contingency | $200–500 | |
| Total | $3,000 | |

2028+ — Scaling Pattern

As customer count grows, AI costs scale sub-linearly due to:

  1. Higher cache hit rates — more users = more shared evaluation patterns
  2. Fine-tuned models — fewer tokens needed (domain-specific = concise)
  3. Batch processing — group evaluations by assignment for efficiency
  4. Volume discounts — Vertex AI committed-use pricing at higher volumes
  5. Neon autoscaling — database compute scales automatically (0.25–56 CU), and pgvector queries benefit from larger shared buffer pools at scale

4. Caching Strategy (Key Cost Enabler)

4.1 Current vs. Planned Caching

| Layer | Current | Planned | Impact |
|-------|---------|---------|--------|
| In-memory LRU | 1-hour TTL, 500 entries | 7–30 day TTL, 5,000 entries | 30–50% hit rate → 50–70% |
| Gemini Context Cache | Not implemented | System prompts cached server-side | 90% discount on repeated input tokens |
| Redis Response Cache | Not implemented | Persistent cache, survives restarts | 40–60% additional hit rate |
| Semantic Cache (pgvector) | Not implemented | Similar queries matched via embedding cosine similarity on Neon | 10–20% additional savings |
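The in-memory layer is the simplest of these: an LRU map keyed by a hash of the prompt, with a TTL. A minimal TTL-aware sketch (the `ResponseCache` name and defaults are illustrative, not our production implementation):

```typescript
// Minimal LRU response cache with TTL. A JavaScript Map preserves
// insertion order, so the first key is always the least recently used.
class ResponseCache {
  private store = new Map<string, { value: string; expires: number }>();

  constructor(private maxEntries = 500, private ttlMs = 3_600_000) {}

  get(key: string): string | undefined {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (Date.now() > hit.expires) {
      this.store.delete(key); // expired entry
      return undefined;
    }
    this.store.delete(key);   // re-insert to refresh recency
    this.store.set(key, hit);
    return hit.value;
  }

  set(key: string, value: string): void {
    if (this.store.size >= this.maxEntries) {
      // Evict the least recently used entry (first key in order).
      const oldest = this.store.keys().next().value;
      if (oldest !== undefined) this.store.delete(oldest);
    }
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}
```

The planned Redis layer is the same idea persisted across restarts, keyed on a hash of model + prompt + rubric version so a rubric change naturally invalidates stale entries.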

4.2 Expected Cache Performance by Service

| Service | Current Cache Hit | After Improvement | API Calls Saved/Month |
|---------|-------------------|-------------------|-----------------------|
| STEM Evaluations | ~0% | 50–65% | 250–325 calls |
| Writing Pipeline | ~5% | 30–40% | 57–76 calls |
| Coach Feedback | ~0% | 40–55% | 80–110 calls |
| Parent Insights | ~85% | 90–95% | 42–47 calls |
| Assignment Gen | ~0% | 80–90% | 24–27 calls |
| Learning Paths | ~0% | 65–75% | 13–15 calls |

4.3 Caching Cost Savings

| Scenario | Monthly AI Cost | Annual AI Cost |
|----------|-----------------|----------------|
| No caching (raw API calls) | ~$25–30 | ~$300–360 |
| With multi-layer caching | ~$5–12 | ~$60–144 |
| Savings | $15–20 | $180–240 |

At current volume (1 customer), caching saves ~$180–240/year. As customer count grows to 100+, caching saves $5,000–15,000/year.


5. Vertex AI Trial Credit Transition

5.1 Current Status

  • Trial credit: Expiring in ~1 week (early March 2026)
  • Current provider: @google-cloud/vertexai SDK
  • Current models: Gemini 1.5 Flash / 1.5 Pro

5.2 Transition Plan

| Action | Timeline | Cost Impact |
|--------|----------|-------------|
| During trial (this week): Run benchmarks comparing 1.5 vs 2.5 Flash | Now | Free (trial credit) |
| During trial: Test fine-tuning pipeline with small dataset | Now | Free (trial credit) |
| During trial: Generate embeddings for RAG corpus | Now | Free (trial credit) |
| After trial: Switch to Gemini Developer API free tier for development | Week 2 | $0 |
| Production: Use Vertex AI with pay-as-you-go billing | Ongoing | $1–2/month |

5.3 Free Tier Options (Post-Trial)

| Provider | Free Tier | Best For |
|----------|-----------|----------|
| Gemini Developer API | 15 RPM, 1M tokens/day (Flash) | Development & testing |
| Vertex AI | $300 free credits (new accounts) | Production, if needed |
| Gemini 2.5 Flash-Lite | Included in free tier | Low-cost production |

5.4 Maximum Value from Remaining Trial Credit

Priority tasks to complete before trial expires:

  1. Generate RAG embeddings (~$25–50 value) — Embed the full education corpus using text-embedding-005. This is a one-time cost we can do for free now.

  2. Run fine-tuning experiments (~$150–300 value) — Train at least the STEM evaluation model (stemblock-eval-v1) using Gemini 2.0 Flash supervised tuning.

  3. Benchmark model quality (~$10–20 value) — Run 100 evaluations each on 1.5 Flash, 2.5 Flash, and 2.5 Flash-Lite. Compare quality scores to determine optimal model selection.

  4. Test context caching (~$5–10 value) — Validate that Gemini context caching works with our system prompts.

Estimated value extracted from trial: $200–400 in compute that would otherwise be paid.


6. SDK Migration: @google-cloud/vertexai → @google/genai

6.1 Why Migrate?

| Reason | Detail |
|--------|--------|
| Deprecation deadline | `@google-cloud/vertexai` deprecated after June 24, 2026 |
| New features | Context caching, embeddings, and image generation only available in new SDK |
| Simplified API | `response.text` instead of `response.candidates[0].content.parts[0].text` |
| Unified SDK | Single SDK works with both Gemini Developer API (free) and Vertex AI (production) |

6.2 Migration Scope

| File | Changes Required |
|------|------------------|
| `package.json` | Replace `@google-cloud/vertexai` with `@google/genai` |
| `gemini-llm.provider.ts` | Update initialization, `generateContent` calls, response parsing |
| `gemini-writing.provider.ts` | Update initialization, model creation, system instruction format |
| `assignment-creation.service.ts` | Update Vertex AI initialization and API calls |
| `learning-paths.service.ts` | Update Vertex AI initialization and API calls |
| `parent-communication.service.ts` | Update Vertex AI initialization and API calls |
| `.env.example` | Update model names and add new config variables |

6.3 Key API Changes

Before (@google-cloud/vertexai):

```typescript
import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project, location });
const model = vertexAI.getGenerativeModel({ model: 'gemini-1.5-flash-002' });
const result = await model.generateContent({ contents: [...] });
const text = result.response.candidates?.[0]?.content?.parts?.[0]?.text;
```

After (@google/genai):

```typescript
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ vertexai: true, project, location });
const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'prompt',
  config: { systemInstruction: 'You are...' },
});
const text = response.text;
```

6.4 New Capabilities Unlocked

| Capability | Impact |
|------------|--------|
| `ai.caches.create()` | Server-side context caching — 90% discount on system prompts |
| `ai.models.embedContent()` | Native embedding support for RAG (no separate SDK needed) |
| `config.responseSchema` | Structured JSON output with guaranteed schema compliance |
| Simplified auth | Same SDK works for both Gemini API key and Vertex AI service account |

7. Unit Economics Summary

7.1 Per-Customer Profitability

| Tier | Monthly Revenue | Monthly AI Cost | Monthly Margin | Margin % |
|------|-----------------|-----------------|----------------|----------|
| Community | $0 | $0.01 | -$0.01 | N/A (lead gen) |
| Pro | $9 | $0.04 | $8.96 | 99.6% |
| Team | $29 | $0.10 | $28.90 | 99.7% |
| Enterprise | $299+ | $0.50 | $298.50 | 99.8% |

7.2 Break-Even Analysis

| Scenario | Monthly AI Budget | Users to Break Even |
|----------|-------------------|---------------------|
| P&L Budget ($800/yr = $67/mo) | $67 | 3 Team users cover AI costs |
| Actual AI cost (with caching) | $2 | 1 Pro user covers AI costs |
| With RAG + infrastructure | $20 | 1 Team user covers AI costs |
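The break-even counts are simple ceiling division of the AI budget by the per-user margin (price minus per-user AI cost). A one-line sketch of that arithmetic:

```typescript
// Users needed for subscription margin to cover a monthly AI budget.
// Inputs are the per-tier figures from the unit-economics tables.
const usersToCover = (monthlyAiBudget: number, price: number, aiCost: number): number =>
  Math.ceil(monthlyAiBudget / (price - aiCost));

console.log(usersToCover(67, 29, 0.10)); // 3 Team users (P&L budget case)
console.log(usersToCover(2, 9, 0.04));   // 1 Pro user (actual cost case)
```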

7.3 LTV:CAC Implications

With AI costs representing < 1% of subscription revenue:

  • Customer Lifetime Value (3-year): $1,044 (Team) / $10,764 (Enterprise)
  • AI cost per customer lifetime: $3.60 (Team) / $18.00 (Enterprise)
  • AI cost as % of LTV: 0.3% (Team) / 0.2% (Enterprise)
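These LTV figures are straightforward 36-month products; a quick arithmetic check:

```typescript
// 3-year LTV vs. lifetime AI cost, using the per-tier monthly figures above.
const months = 36;
const ltv = (pricePerMonth: number): number => pricePerMonth * months;
const lifetimeAiCost = (aiPerMonth: number): number => aiPerMonth * months;

console.log(ltv(29));              // 1044  (Team)
console.log(ltv(299));             // 10764 (Enterprise, at base price)
console.log(lifetimeAiCost(0.10)); // 3.6   (Team)
```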

8. Risk Factors & Mitigations

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Google increases Gemini pricing | Low | Low | Multi-provider architecture (Mistral, Claude fallbacks built) |
| Trial credit expires before testing | Medium | High (1 week) | Prioritize embeddings + fine-tuning this week |
| Cache hit rates lower than projected | Low | Medium | Even at 0% cache, costs are < $30/month at current scale |
| Scale faster than expected | None | Low | AI costs increase by ~$0.10/user/month (negligible) |
| SDK deprecation deadline (June 2026) | Medium | Certain | Migration planned for Month 1, well ahead of deadline |

9. Investor FAQ

Q: How much does it cost to run AI per student evaluation? A: Less than $0.003 (three-tenths of a cent) with Gemini 2.5 Flash + caching.

Q: What happens to AI costs as you scale to 1,000 customers? A: AI token costs would be approximately $300–600/month (~$3,600–7,200/year). Cache efficiency improves with scale, so costs grow sub-linearly.

Q: Why is the AI/Token line in the P&L higher than actual compute costs? A: The budget includes fine-tuning experiments, RAG infrastructure, model quality upgrades, and contingency. Actual token consumption is a fraction of the budget.

Q: What if Google raises prices? A: We have a multi-provider architecture with Mistral and Claude as fallbacks. We can switch providers in < 1 day via environment variable. Also, the trend is prices going down (Gemini 2.5 Flash-Lite is 95% cheaper than Gemini 1.5 Pro).

Q: Can the $5,000 (2026 AI training budget) cover everything? A: Yes. Estimated direct AI costs for the 4-month training program are $227–$1,968. The $5,000 budget provides a 2.5–22x safety margin.

Q: What's the estimated cost of training the AI model over four months? A: Direct AI usage costs: $227–$1,440 (including operational inference, fine-tuning, RAG setup, and Neon infrastructure). See the companion document AI_TRAINING_COST_ESTIMATE.md for the full breakdown.

Q: Why move from DigitalOcean to Neon for the database? A: Three reasons: (1) pgvector support — Neon includes native pgvector, allowing us to store RAG embeddings directly in PostgreSQL instead of running a separate vector database service; (2) Serverless scaling — Neon scales to zero when idle and autoscales under load, which is ideal for our current 1-customer stage while supporting growth to thousands; (3) Cost efficiency — we start on Neon's Free plan ($0/mo for 100 CU-hours, 0.5GB) and grow to Launch ($0.106/CU-hour) only when needed, vs. DigitalOcean's fixed $10+/mo regardless of usage.

Q: Does using pgvector on Neon replace the need for a dedicated vector database? A: Yes. For our use case (~50K–200K document chunks, 256-dimension embeddings), pgvector with IVFFlat indexing on Neon provides excellent performance. This eliminates the need for Chroma, Qdrant, or Pinecone — saving $0–100/month in managed vector DB costs and reducing operational complexity.
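The semantic-cache comparison behind this answer is a cosine-similarity test: a new query counts as a cache hit when its embedding is close enough to a cached query's embedding. pgvector performs this server-side (its `<=>` operator is cosine distance); a miniature TypeScript version of the same test (the 0.95 threshold is illustrative):

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A semantic cache treats a query as a hit when similarity to a
// cached query's embedding exceeds a threshold (0.95 here is a
// stand-in; the real cutoff would be tuned against evaluation data).
const isSemanticCacheHit = (
  query: number[],
  cached: number[],
  threshold = 0.95,
): boolean => cosineSimilarity(query, cached) >= threshold;
```

On Neon, the same check is a single SQL query ordering cached entries by `embedding <=> $1` under an IVFFlat index, so no application-side scan is needed.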

