
StemBlock AI - Comprehensive AI Usage Analysis

Date: December 21, 2025
Status: Complete Analysis of All Current and Planned AI Features
Document Version: 1.0


Executive Summary

StemBlock AI implements a multi-model AI strategy with one production LLM provider (Mistral) and two more scaffolded behind a common interface (Claude, OpenAI) across 6 major AI workflows. The platform uses a factory pattern for provider abstraction, enabling provider switching without code changes. Estimated usage for the currently active features is roughly 10-12M tokens/month, depending on usage patterns and active tiers.


1. Current AI Implementations

1.1 STEM Evaluation Engine (Production)

Location: /src/evaluations/
Status: ✅ Live/Active
Primary Provider: Mistral AI (open-mistral-7b)

Overview

Evaluates STEM robotics/coding submissions with AI-powered scoring across 4 categories:

Categories Evaluated

  1. Robot Design - Physical robotics creativity & functionality
  2. Code Quality - Programming logic, efficiency, structure, comments
  3. Documentation - Engineering notebooks, process clarity
  4. Technical Writing - Essay/explanation quality

Workflow

  1. Student submits files (photos, code, notebooks)
  2. evaluateSubmission() called
  3. Files read and prepared (text files only)
  4. Mistral AI evaluates with system + user prompts
  5. Response parsed into structured evaluation
  6. Results cached (1 hour) to prevent duplicate calls
  7. Coach can override scores & publish
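
A minimal sketch of this flow (the helper names and shapes below are illustrative assumptions, not the actual service code; only evaluateSubmission() and the provider factory appear in the codebase):

// Illustrative sketch of the STEM evaluation flow described above.
// cache, readTextFiles, and parseEvaluation are hypothetical helpers.
declare const cache: {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, ttlSeconds: number): Promise<void>;
};
declare const factory: {
  getProvider(): Promise<{ evaluateSubmission(req: unknown): Promise<unknown> }>;
};
declare function readTextFiles(submissionId: string): Promise<string[]>;

interface EvalResult { scores: Record<string, number>; feedback: string; nextSteps: string[] }
declare function parseEvaluation(raw: unknown): EvalResult;

async function runStemEvaluation(submissionId: string): Promise<EvalResult> {
  const key = `eval:${submissionId}`;
  const cached = await cache.get<EvalResult>(key);
  if (cached) return cached;                        // 1-hour cache: skip duplicate LLM calls

  const provider = await factory.getProvider();     // Mistral in production
  const raw = await provider.evaluateSubmission({
    files: await readTextFiles(submissionId),       // text files only
  });

  const evaluation = parseEvaluation(raw);          // structured scores + feedback
  await cache.set(key, evaluation, 3600);           // cache for one hour
  return evaluation;                                // coach can still override & publish
}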

Token Estimation per Submission

Typical Submission Content:

  • 1 Python file: ~200-500 lines (600-1500 tokens)
  • 1 Engineering notebook: ~1000 words (800-1200 tokens)
  • 1 Code file (C++): ~300 lines (900-1300 tokens)
  • Total input: ~2000-4000 tokens

Evaluation Response:

  • Structured JSON with scores, feedback, next steps
  • ~500-800 tokens per response

Per Submission Cost:

  • Input: ~3000 tokens
  • Output: ~600 tokens
  • Total: ~3600 tokens per evaluation

Usage Frequency

  • COMMUNITY tier: ~5-10 submissions/month per coach
  • TEAM tier: ~30-50 submissions/month per coach
  • ENTERPRISE: Unlimited

Monthly Token Usage (STEM Evaluation):

  • COMMUNITY: 5 coaches × 7 avg submissions × 3600 tokens = 126,000 tokens
  • TEAM: 20 coaches × 40 submissions × 3600 tokens = 2,880,000 tokens
  • ENTERPRISE: 10 orgs × 50 submissions × 3600 tokens = 1,800,000 tokens
  • Subtotal: ~4.8M tokens/month

Provider Implementation

File: /src/evaluations/providers/mistral-llm.provider.ts

Key Features:

  • Rate limiting: 2000ms minimum between requests
  • Retry logic: exponential backoff (2^n seconds), up to 5 retries (per MISTRAL_MAX_RETRIES below)
  • Caching: 1-hour cache per submission to prevent duplicate evaluations
  • Error handling: Graceful fallback on rate limit errors
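
A sketch of this request spacing and backoff behavior (illustrative only; the real provider's internals may differ):

// Illustrative: enforce a minimum gap between requests and retry with
// exponential backoff (2^n seconds), as described above.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const MIN_REQUEST_INTERVAL_MS = 2000;
let lastRequestAt = 0;

async function callWithBackoff<T>(call: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const wait = lastRequestAt + MIN_REQUEST_INTERVAL_MS - Date.now();
    if (wait > 0) await sleep(wait);               // rate limit: >= 2000ms between requests
    lastRequestAt = Date.now();
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxRetries) throw err;        // give up after the max retry count
      await sleep(2 ** attempt * 1000);            // 1s, 2s, 4s, ...
    }
  }
}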

Configuration:

MISTRAL_API_KEY = "<redacted>"
MISTRAL_MODEL = "open-mistral-7b"
MISTRAL_MIN_REQUEST_INTERVAL = 2000ms
MISTRAL_MAX_RETRIES = 5

System Prompts:

  • Student-focused: Age-appropriate, encouraging language
  • Coach-focused: Technical depth, pedagogical guidance
  • Parent-focused: Plain-language insights

Coach Feedback Generation

A separate AI call generates coach-specific feedback (deeper technical insights):

Additional Call per Evaluation:

  • Input: Full submission + AI evaluation context
  • Output: Detailed teaching guidance
  • Tokens: ~2000-3000 additional

Total with Coach Feedback: ~5600-6600 tokens per full evaluation


1.2 English Writing Workflow (Production)

Location: /src/workflows/english-writing/
Status: ✅ Live/Active (Phase 3 Complete)
Primary Providers: Mistral (live) + Claude (provider ready; dual-model strategy)

Overview

Multi-stage AI evaluation of student writing submissions with age-appropriate feedback.

3-Stage Evaluation Pipeline

Stage 1: Content Moderation

  • Provider: Mistral mistral-small-latest (fast, cost-effective)
  • Purpose: Flag inappropriate content before evaluation
  • Output: isAppropriate (boolean), flaggedContent (list), moderationNote (string)
  • Token Cost: 500-1000 tokens per evaluation

Stage 2: Feedback Generation

  • Provider: Mistral mistral-large-latest
  • Purpose: Encourage student with constructive feedback
  • Output: strengths, improvements, suggestions, encouragement
  • Token Cost: 1500-2000 tokens per evaluation

Stage 3: Assessment/Scoring

  • Provider: Mistral mistral-large-latest
  • Purpose: Score writing across multiple dimensions
  • Output: overallScore (0-100), grammarScore, creativityScore, structureScore, contentScore, gradeEquivalent
  • Token Cost: 1500-2000 tokens per evaluation
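
Sequenced, the pipeline looks roughly like this (a sketch under assumed names; moderate/generateFeedback/assess stand in for the provider's stage methods):

// Illustrative 3-stage pipeline with the early exit described in Stage 1.
interface ModerationResult { isAppropriate: boolean; flaggedContent: string[]; moderationNote: string }
declare function moderate(text: string): Promise<ModerationResult>;   // mistral-small-latest
declare function generateFeedback(text: string): Promise<unknown>;    // mistral-large-latest
declare function assess(text: string): Promise<unknown>;              // mistral-large-latest

async function evaluateWriting(text: string) {
  const moderation = await moderate(text);         // Stage 1: cheap, fast model
  if (!moderation.isAppropriate) {
    return { moderation };                         // flagged content stops the pipeline
  }
  const feedback = await generateFeedback(text);   // Stage 2
  const assessment = await assess(text);           // Stage 3
  return { moderation, feedback, assessment };
}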

Writing Submission Characteristics

  • Minimum word count: 50-500 words, depending on the prompt
  • Avg tokens per submission: 200-800 tokens (writing content itself)
  • Avg feedback token output: ~2000 tokens

Total Tokens per Writing Submission:

  • Input: 800 tokens (writing + prompt context)
  • Moderation: 700 output
  • Feedback: 1500 output
  • Assessment: 1500 output
  • Total: ~4500 tokens per submission

Usage Frequency

  • COMMUNITY tier: View-only, no submissions
  • TEAM tier: ~15-25 assignments/month per coach
  • ENTERPRISE: Unlimited

Monthly Token Usage (English Writing):

  • TEAM: 15 coaches × 20 submissions × 4500 tokens = 1,350,000 tokens
  • ENTERPRISE: 5 orgs × 50 submissions × 4500 tokens = 1,125,000 tokens
  • Subtotal: ~2.5M tokens/month

Provider Implementation

Mistral Writing Provider: /src/workflows/english-writing/providers/mistral-writing.provider.ts

Features:

  • Rate limiting: 2000ms between requests
  • Retry logic: 3 retries with exponential backoff
  • Model selection: Uses small for moderation, large for feedback/assessment

Claude Writing Provider (Ready): /src/workflows/english-writing/providers/claude-writing.provider.ts

Planned Models:

WRITING_MODERATION_MODEL = "claude-3-haiku-20240307" (fastest, cheapest)
WRITING_EVALUATION_MODEL = "claude-3-5-sonnet-20241022" (balanced)
WRITING_FEEDBACK_MODEL = "claude-3-5-sonnet-20241022" (quality)

Database Schema

  • WritingCategory - Prompt categories (e.g., "Narrative", "Persuasive")
  • WritingPrompt - Pre-defined prompts by grade level
  • WritingAssignment - Assignment to student
  • WritingSubmission - Student's submission
  • WritingModeration - Stage 1 results
  • WritingFeedback - Stage 2 results
  • WritingAssessment - Stage 3 results

1.3 Parent Insights Generation (Production)

Location: /src/evaluations/providers/mistral-llm.provider.ts
Status: ✅ Implemented (used in parent dashboards)
Provider: Mistral AI

Overview

Generates parent-friendly insights from student performance data (plain language, non-technical).

Insights Generated

  • What's Going Well (3-5 items)
  • Areas to Focus On (3-5 items)
  • Ways to Support (3-5 items, at-home activities)

Input Data Required

  • Student name & grade level
  • Average score percentage
  • Last 10 assignments with scores:
    • Assignment title
    • Overall score
    • Category scores (if available)
    • Submission date

Token Estimation

Input:

  • Performance history: ~1000-1500 tokens
  • Prompting context: ~400 tokens
  • Total input: ~1500 tokens

Output:

  • Parent-friendly insights with examples
  • Output: ~500-800 tokens

Total per student insights generation: ~2000-2300 tokens

Usage Frequency

  • Generated on-demand when parents view dashboard
  • Cached per student per week (avoid redundant calls)
  • Monthly: ~5-10 unique insights per parent, ~500-1000 parents
  • Estimate: 500 parents × 7 insights × 2200 tokens = 7.7M tokens

However: Caching reduces actual calls by ~80%

  • Actual monthly: ~1.5M tokens (caching)
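
A sketch of the per-student, per-week cache key this implies (hypothetical helper; the week numbering here is approximate):

// Illustrative: insights regenerate at most once per student per week.
function insightsCacheKey(studentId: string, now = new Date()): string {
  const year = now.getUTCFullYear();
  const msPerWeek = 7 * 24 * 60 * 60 * 1000;
  const week = Math.floor((now.getTime() - Date.UTC(year, 0, 1)) / msPerWeek);
  return `parent-insights:${studentId}:${year}-w${week}`;
}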

1.4 LLM Provider Factory Pattern (Architecture)

Location: /src/evaluations/providers/llm-provider.factory.ts

Design Pattern

Factory pattern enables runtime provider switching without code changes:

export type LLMProviderType = 'mistral' | 'mock' | 'openai' | 'claude';

async getProvider(): Promise<LLMProvider> {
  const providerType = this.configService.get<string>('LLM_PROVIDER', 'mock');

  switch (providerType.toLowerCase()) {
    case 'mistral':
      return this.mistralProvider;
    case 'openai':
      // Not yet implemented, falls back to mock
      return this.mockProvider;
    case 'claude':
      // Not yet implemented, falls back to mock
      return this.mockProvider;
    case 'mock':
    default:
      return this.mockProvider;
  }
}

Provider Interface

All providers must implement LLMProvider interface:

interface LLMProvider {
  isAvailable(): Promise<boolean>;
  getProviderName(): string;
  evaluateSubmission(request: EvaluationRequest): Promise<EvaluationResponse>;
  generateCoachFeedback(request, score, feedback): Promise<CoachFeedbackResponse>;
  generateParentInsights(request): Promise<ParentInsightsResponse>;
}
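
A skeleton of a conforming provider (illustrative only; types are abbreviated to any for brevity, and the real shapes live in llm-provider.interface.ts):

// Illustrative skeleton: the shape a new provider must satisfy,
// not the actual mock implementation.
class ExampleProvider implements LLMProvider {
  async isAvailable(): Promise<boolean> { return true; }
  getProviderName(): string { return 'example'; }
  async evaluateSubmission(request: any): Promise<any> {
    // Call the underlying model API here, then map to EvaluationResponse.
    return { scores: {}, feedback: '', confidence: 0 };
  }
  async generateCoachFeedback(request: any, score: any, feedback: any): Promise<any> {
    return { guidance: '' };
  }
  async generateParentInsights(request: any): Promise<any> {
    return { goingWell: [], focusAreas: [], waysToSupport: [] };
  }
}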

Available Providers

  1. Mistral (✅ Active)

    • Model: open-mistral-7b
    • Status: Production
    • Cost: $0.14/M input, $0.42/M output tokens
  2. Mock (✅ Active)

    • Status: Testing/Development
    • Returns synthetic data for testing
  3. OpenAI (⏳ Planned)

    • Model: gpt-4 (or gpt-4-turbo)
    • Status: Placeholder, not implemented
    • Cost: $30/M input, $60/M output tokens (gpt-4)
  4. Claude (⏳ Planned)

    • Model: claude-3-5-sonnet-20241022
    • Status: Placeholder, interface ready
    • Cost: $3/M input, $15/M output tokens

2. Planned AI Features (Phase 4+)

2.1 Automated Grading Workflow

Location: /src/workflows/automated-grading/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory (Mistral primary, OpenAI/Claude ready)

Overview

Streamlines grading process with AI-generated scores and bulk operations.

Features

AI-Powered Grading:

  • Auto-generate initial scores based on rubrics
  • Coaches review queue before publishing
  • Manual score override capability
  • Automatic publication (COMMUNITY) vs. review queue (TEAM+)

Review Queue System (TEAM/ENTERPRISE):

  • AI-generated scores pending coach review
  • Filter by confidence level (high/medium/low)
  • Batch review operations
  • Notes & annotations

Bulk Grading (TEAM/ENTERPRISE):

  • Grade multiple submissions with AI
  • Consistent rubric application
  • Custom rubric support

Token Estimation

Per Submission Grade Generation:

  • Input: Assignment rubric + submission content
  • ~2000-3000 tokens input
  • ~500-800 tokens output (JSON scores + feedback)
  • Total: ~2500-3800 tokens

Tier Usage:

  • COMMUNITY: 10 uses/month (auto-publish)
  • TEAM: 100 uses/month (with review queue)
  • ENTERPRISE: Unlimited

Monthly Tokens:

  • COMMUNITY: 200 coaches × 10 × 3000 = 6M tokens
  • TEAM: 100 coaches × 100 × 3000 = 30M tokens
  • ENTERPRISE: 50 orgs × ∞ (assume 500) × 3000 = 75M tokens
  • Potential Total: ~111M tokens/month (if all tiers maximize usage)

Note: These workflows currently use the placeholder (mock) provider, so no real tokens are consumed yet.


2.2 Parent Communication Generator

Location: /src/workflows/parent-communication/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory

Overview

AI-generates personalized parent communications from templates.

Communication Types

  1. Progress Report - Summary of student achievement/growth
  2. Concern Alert - Intervention needed, areas for support
  3. Achievement Celebration - Highlight strengths & wins
  4. General Update - Class-wide announcements
  5. Custom - Coach-created templates

Token Estimation

Per Communication Generation:

  • Input: Student data + template + context
  • ~1500-2000 tokens (student history, rubric, notes)
  • ~600-900 tokens output (personalized message)
  • Total: ~2200-2900 tokens per communication

Tier Usage:

  • COMMUNITY: 5 uses/month, copy-to-clipboard only
  • TEAM: 50 uses/month, email sending
  • ENTERPRISE: Unlimited, scheduled sending

Monthly Tokens:

  • COMMUNITY: 500 coaches × 5 × 2500 = 6.25M tokens
  • TEAM: 200 coaches × 50 × 2500 = 25M tokens
  • ENTERPRISE: 100 orgs × 200 × 2500 = 50M tokens
  • Potential Total: ~81.25M tokens/month

2.3 Personalized Learning Paths

Location: /src/workflows/learning-paths/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory

Overview

AI-generates customized multi-week learning plans based on student performance.

Features

Skill Gap Analysis:

  • Compare current vs. target skill levels
  • Identify 3-5 priority improvement areas
  • Recommend specific resources/exercises

Milestone Planning:

  • Weekly milestones over N weeks
  • Recommended activities per milestone
  • Difficulty progression (beginner → advanced)
  • Estimated completion time

Progress Tracking:

  • Mark milestones as complete
  • Coaches can customize path (TEAM+)
  • Visual progress indicators

Token Estimation

Per Learning Path Generation:

  • Input: Student skills, goals, history
  • ~2000-2500 tokens (comprehensive skill profile)
  • ~1500-2000 tokens output (detailed milestone plan)
  • Total: ~3500-4500 tokens per path

Tier Usage:

  • COMMUNITY: 1 generation (view-only)
  • TEAM: 20 generations/month
  • ENTERPRISE: Unlimited

Monthly Tokens:

  • COMMUNITY: 200 coaches × 1 × 4000 = 0.8M tokens (low, mostly one-time)
  • TEAM: 150 coaches × 20 × 4000 = 12M tokens
  • ENTERPRISE: 100 orgs × 100 × 4000 = 40M tokens
  • Potential Total: ~52.8M tokens/month

2.4 Assignment Creation Assistant

Location: /src/workflows/assignment-creation/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory

Overview

AI-generates complete assignments with rubrics, sample solutions, teaching notes.

Features

Assignment Generation:

  • Topic/prompt input by coach
  • Auto-generate assignment description
  • Create comprehensive rubric
  • Generate sample solution/answer key
  • Provide teaching notes & tips

Assignment Types:

  • Problem Set
  • Project
  • Essay
  • Lab
  • Presentation
  • Quiz

Library Management:

  • Save generated assignments for reuse
  • Share across workspace (TEAM+)
  • Clone and customize existing assignments

Token Estimation

Per Assignment Generation:

  • Input: Topic, difficulty level, grade level
  • ~500-800 tokens (brief prompt)
  • Output: Full assignment + rubric + solution + notes
  • ~3000-4000 tokens output (comprehensive)
  • Total: ~3500-4800 tokens per assignment

Tier Usage:

  • COMMUNITY: 0 (not available)
  • TEAM: 20 generations/month
  • ENTERPRISE: Unlimited

Monthly Tokens:

  • TEAM: 150 coaches × 20 × 4000 = 12M tokens
  • ENTERPRISE: 100 orgs × 50 × 4000 = 20M tokens
  • Potential Total: ~32M tokens/month

3. AI Cost Analysis

3.1 Current Model Costs (Mistral)

Mistral Pricing (as of Dec 2024):

  • open-mistral-7b: $0.14/M input, $0.42/M output tokens
  • mistral-large-latest: $0.27/M input, $0.81/M output tokens
  • mistral-small-latest: $0.07/M input, $0.21/M output tokens
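
These rates are per million tokens, so individual calls cost fractions of a cent. As a sanity check, one STEM evaluation (~3,000 input / ~600 output tokens, per section 1.1) on open-mistral-7b works out to:

(3,000 input ÷ 1,000,000) × $0.14 ≈ $0.00042
(600 output ÷ 1,000,000) × $0.42 ≈ $0.00025
Total: ≈ $0.0007 per evaluation

This is what keeps the <$0.10 per-evaluation target in section 11 comfortably within reach.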

3.2 Current Monthly Spend (Active Features)

STEM Evaluation (~1,335 evaluations/month, ~4.8M tokens; see 1.1):

  • Input: ~4.0M tokens × $0.14/M ≈ $0.56/month
  • Output: ~0.8M tokens × $0.42/M ≈ $0.34/month
  • Subtotal: ~$0.90/month

English Writing (Mistral, ~550 submissions/month, ~2.5M tokens; see 1.2):

  • Moderation: ~0.8M tokens on mistral-small ≈ $0.11/month
  • Feedback: ~1.3M tokens on mistral-large ≈ $0.79/month
  • Assessment: ~1.3M tokens on mistral-large ≈ $0.79/month
  • Subtotal: ~$1.70/month

Parent Insights (cached, see 1.3):

  • ~1.5M tokens at blended open-mistral-7b rates ≈ $0.34/month

Coach Feedback Generation:

  • ~1,335 calls × ~2,500 tokens ≈ 3.3M tokens ≈ $0.66/month

Current Total Monthly Spend: ≈ $3.60/month (roughly $45/year)

Note: the Mistral rates are quoted per million tokens, so the ~12M tokens/month of active usage costs single-digit dollars; the cost risk to plan for is token-volume growth from the Phase 4 workflows, not today's spend.


3.3 Projected Cost at Full Scale (All Workflows Active)

Scenario: all tiers (COMMUNITY, TEAM, ENTERPRISE) maximize usage

| Workflow | COMMUNITY | TEAM | ENTERPRISE | Total |
| --- | --- | --- | --- | --- |
| STEM Evaluation | 1.26M | 30M | 18.9M | 50.16M |
| English Writing | 6.25M | 25M | 50M | 81.25M |
| Auto-Grading | 6M | 30M | 75M | 111M |
| Parent Comm | 6.25M | 25M | 50M | 81.25M |
| Learning Paths | 0.8M | 12M | 40M | 52.8M |
| Assignment Creation | - | 12M | 20M | 32M |
| Total tokens/month | 20.56M | 134M | 253.9M | 408.46M |
| Mistral cost/month | ~$2.88 | ~$18.76 | ~$35.55 | ~$57.18 |

(Token volumes are monthly; costs use the open-mistral-7b input rate of $0.14/M and run somewhat higher once output tokens at $0.42/M are counted.)

Annual Projected Cost (All Workflows): ~$57/month × 12 ≈ $690/year

Note: This assumes maximum usage across all tiers; a realistic scenario is 30-50% of these volumes.


3.4 Cost Optimization Strategies

1. Model Tiering

Use cheaper models for simple tasks:

Moderation → mistral-small ($0.07/$0.21)
Feedback → mistral-large ($0.27/$0.81)
Complex Analysis → GPT-4 ($30/$60/M) only when needed

2. Caching Strategy

  • Current: STEM evaluations cached 1 hour; parent insights cached 1 week
  • Potential: Cache all evaluation results (1-7 days)
  • Savings: 60-80% reduction on duplicate submissions

3. Batch Processing

  • Group submissions by assignment
  • Evaluate multiple submissions in single API call
  • Savings: 30-40% reduction on per-submission costs
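
A sketch of what batching could look like (hypothetical prompt shape; the document proposes the idea, this is not existing code). The savings come from sending the rubric and system context once per batch instead of once per submission:

// Illustrative: fold N submissions into one grading request.
interface BatchItem { id: string; text: string }

function buildBatchPrompt(rubric: string, submissions: BatchItem[]): string {
  const body = submissions
    .map((s, i) => `### Submission ${i + 1} (id: ${s.id})\n${s.text}`)
    .join('\n\n');
  return [
    'Grade each submission below against this rubric:',
    rubric,
    body,
    'Return one JSON object per submission id.',
  ].join('\n\n');
}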

4. Tier-Based Limits

  • COMMUNITY: No AI-powered workflows (mock only)
  • TEAM: Limited monthly token budgets
  • ENTERPRISE: Pay-as-you-go with discounts

5. Provider Switching

  • Current: Mistral (cost-effective, ~$0.14/$0.42)
  • Switch to Claude 3 Haiku: $0.25/$1.25 (still pricier than open-mistral-7b)
  • Switch to Claude 3.5 Sonnet: $3/$15 (higher quality, expensive)
  • Keep OpenAI: $30/$60 (GPT-4, ultra-expensive)

Recommendation: Stay with Mistral for high-volume evaluations, use Claude for specialized tasks (writing feedback quality).


4. Critical AI Workloads

By Frequency (Per Year)

| Workload | Frequency | Volume | Criticality |
| --- | --- | --- | --- |
| STEM Evaluation | On submission | ~50K/year | CRITICAL (core feature) |
| English Writing | On submission | ~25K/year | HIGH (revenue driver) |
| Parent Communication | On-demand | ~10K/year | HIGH (parent engagement) |
| Parent Insights | On dashboard load | ~100K/year | MEDIUM (cached heavily) |
| Coach Feedback | On override | ~15K/year | MEDIUM (optional) |
| Learning Paths | Monthly | ~5K/year | LOW (premium feature) |
| Assignment Creation | Weekly | ~3K/year | LOW (admin use) |
| Auto-Grading | On submission | ~20K/year | MEDIUM (workflow automation) |

By Revenue Impact

  1. STEM Evaluation ⭐⭐⭐⭐⭐

    • Core differentiator
    • Impacts every user
    • Must be reliable & fast
    • 24/7 SLA needed
  2. English Writing ⭐⭐⭐⭐⭐

    • Separate product offering
    • Tier-locked (TEAM+ for coaches)
    • High conversion driver
    • Critical for PLG conversion
  3. Parent Communication ⭐⭐⭐⭐

    • Parent engagement driver
    • Reduces churn
    • Tier-locked (TEAM for coaches)
    • High user satisfaction
  4. Parent Insights ⭐⭐⭐⭐

    • Retention feature
    • Dashboard visibility
    • Heavily cached (low cost)
    • Important for parent value proposition
  5. Learning Paths ⭐⭐⭐

    • Premium feature (TEAM+)
    • Lower adoption currently
    • Growing importance for personalization
  6. Assignment Creation ⭐⭐⭐

    • Time-saver for coaches
    • Lower adoption (advanced users)
    • Nice-to-have, not must-have

5. Provider Readiness Status

Current Implementation

| Provider | Location | Status | Used For | Cost (per M in/out) |
| --- | --- | --- | --- | --- |
| Mistral | mistral-llm.provider.ts | ✅ Production | STEM, Writing, Parent Insights | $0.14/$0.42 |
| Mock | mock-llm.provider.ts | ✅ Testing | Development/Testing | Free |
| Claude | Placeholder, interface ready | ⏳ Planned | Writing (future) | $3/$15 |
| OpenAI | Placeholder, interface ready | ⏳ Planned | Fallback option | $30/$60 |

Provider Switch Capability

Switching between implemented providers is a simple configuration change:

# Current
LLM_PROVIDER=mistral
MISTRAL_API_KEY=xxx
MISTRAL_MODEL=open-mistral-7b

# To switch to Claude
LLM_PROVIDER=claude
ANTHROPIC_API_KEY=xxx
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

Implementation effort: 2-4 hours to implement a missing provider (Claude or OpenAI), test, and deploy.


6. Database Schema for AI Features

Evaluation Storage

  • Evaluation model - STEM evaluation results with AI scores & confidence
  • WritingSubmission - Student writing submission
  • WritingModeration - Content moderation results
  • WritingFeedback - AI-generated feedback
  • WritingAssessment - Writing scores

Workflow Storage

  • ReviewQueueItem - Pending coach review items
  • ParentCommunication - Generated communications
  • CommunicationTemplate - Template library
  • LearningPath - AI-generated learning paths
  • AssignmentLibraryItem - Reusable assignments

Subscription Controls

  • SubscriptionTier enum - COMMUNITY, PRO, TEAM, ENTERPRISE
  • FeatureFlag enum - Feature gates (AI_GRADING, ENGLISH_WRITING_WORKFLOW, etc.)
  • UsageRecord - Token usage tracking per user/month

7. Token Usage Tracking

Current Implementation

  • SubscriptionsService tracks usage by feature
  • Monthly limits enforced per tier:
    • COMMUNITY: Limited (0-10 uses/month)
    • TEAM: Moderate (20-100 uses/month)
    • ENTERPRISE: Unlimited

Missing: Token-Level Tracking

Currently tracking by feature uses, not tokens:

  • ✅ Know: "User made 25 grading calls"
  • ❌ Don't know: "User consumed 75K tokens"

Recommendation

Add optional token-level tracking:

interface TokenUsageRecord {
  userId: string;
  feature: FeatureFlag;
  tokensInput: number;
  tokensOutput: number;
  model: string;
  cost: number;
  timestamp: DateTime;
}
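
A sketch of how such a record could be written, with cost derived from the per-million rates in section 3.1 (the rates table and persist() are assumptions for illustration; only TokenUsageRecord comes from this document):

// Illustrative: compute cost from per-million-token rates, then persist.
type DateTime = Date;                              // stand-in for the Prisma DateTime
declare function persist(record: TokenUsageRecord): Promise<void>;

const RATES_PER_M: Record<string, { input: number; output: number }> = {
  'open-mistral-7b': { input: 0.14, output: 0.42 },
  'mistral-large-latest': { input: 0.27, output: 0.81 },
  'mistral-small-latest': { input: 0.07, output: 0.21 },
};

async function recordUsage(r: Omit<TokenUsageRecord, 'cost'>): Promise<void> {
  const rate = RATES_PER_M[r.model] ?? { input: 0, output: 0 };
  const cost =
    (r.tokensInput / 1_000_000) * rate.input +     // rates are $ per 1M tokens
    (r.tokensOutput / 1_000_000) * rate.output;
  await persist({ ...r, cost });
}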

8. Quality & Reliability

AI Confidence Metrics

STEM Evaluation:

  • AI provides confidence score (0-100%)
  • Scores <70% flagged for manual review
  • Coach override capability on all scores

English Writing:

  • Three-stage pipeline with early-exit for content issues
  • Flagged content stops evaluation before scoring
  • Assessment confidence implicit (not scored)

Parent Insights:

  • No confidence metric currently
  • Plain-language output (less critical for errors)

Error Handling

Rate Limiting:

  • Mistral: 2000ms minimum between requests
  • Exponential backoff on 429 errors (capacity exceeded)
  • Max 5 retries before failure

Caching:

  • STEM evaluation: 1-hour cache
  • Writing moderation: No cache (always fresh)
  • Parent insights: 1-week cache

Fallback Strategy:

  • If Mistral unavailable, falls back to Mock provider
  • Mock returns synthetic but reasonable data
  • No service downtime (degrades gracefully)
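
A sketch of that graceful degradation (assumed shape; the document describes the behavior, not this exact code; LLMProvider is the interface from section 1.4):

// Illustrative: prefer the configured provider, degrade to mock if it is down.
declare const factory: { getProvider(): Promise<LLMProvider> };
declare const mockProvider: LLMProvider;

async function getProviderWithFallback(): Promise<LLMProvider> {
  const primary = await factory.getProvider();     // e.g. Mistral
  if (await primary.isAvailable()) return primary;
  return mockProvider;                             // synthetic but reasonable data
}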

Prompt Engineering

STEM Evaluation Prompts:

  • Student-focused: Encouraging, age-appropriate language
  • Coach-focused: Technical depth, pedagogical guidance
  • Parent-focused: Plain-language insights, no jargon

Writing Evaluation Prompts:

  • Grade-level aware (K-2, 3-5, 6-8, 9-12)
  • Moderation: Content appropriateness check
  • Feedback: Strengths + improvements (age-appropriate)
  • Assessment: Scoring rubric with grade equivalents
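
A sketch of grade-band prompt selection (illustrative mapping only; the prompt strings below are placeholders, not the production prompts):

// Illustrative: pick a system prompt by the grade bands listed above.
type GradeBand = 'K-2' | '3-5' | '6-8' | '9-12';

function gradeBand(grade: number): GradeBand {
  if (grade <= 2) return 'K-2';
  if (grade <= 5) return '3-5';
  if (grade <= 8) return '6-8';
  return '9-12';
}

const SYSTEM_PROMPTS: Record<GradeBand, string> = {
  'K-2': 'Placeholder: very simple words, heavy encouragement...',
  '3-5': 'Placeholder: encouraging tone, concrete suggestions...',
  '6-8': 'Placeholder: constructive feedback with specific examples...',
  '9-12': 'Placeholder: detailed, rubric-referenced feedback...',
};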

9. Feature Access Control

Tier-Based AI Access

| Feature | COMMUNITY | PRO | TEAM | ENTERPRISE |
| --- | --- | --- | --- | --- |
| STEM Evaluation | View only | Full | Full | Full |
| English Writing | View only | Limited (parents) | Full | Full |
| Parent Insights | Yes (read-only) | Yes | Yes | Yes |
| Coach Feedback | No | No | Yes | Yes |
| Auto Grading | No | No | Limited | Full |
| Learning Paths | No | No | Limited | Full |
| Parent Comm | No | No | Limited | Full |
| Assignment Creation | No | No | Limited | Full |

Usage Limits

| Feature | COMMUNITY | TEAM | ENTERPRISE |
| --- | --- | --- | --- |
| STEM Evaluations | 10/month | 100/month | Unlimited |
| Writing Submissions | 0 | 20/month | Unlimited |
| Parent Communications | 5/month | 50/month | Unlimited |
| Learning Paths | 1 (one-time) | 20/month | Unlimited |
| Assignments Created | 0 | 20/month | Unlimited |

10. Recommendations & Next Steps

Immediate (Next 30 Days)

  1. Implement Token Tracking

    • Add token-level usage records
    • Monitor actual vs. projected spend
    • Alert on >$1000/month features
  2. Optimize Caching

    • Extend STEM evaluation cache to 7 days
    • Implement deduplication (same files = same score; see the hashing sketch after this list)
    • Potential savings: 40-60% cost reduction
  3. Monitor Mistral Performance

    • Track response times
    • Monitor error rates
    • Prepare fallback to Claude if needed
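
For item 2's deduplication, a content-hash key is the usual approach. A sketch (hypothetical helper; assumes a stable, deterministic file ordering):

// Illustrative: identical file bytes produce the same cache key, so a
// resubmission of unchanged work never triggers a second LLM call.
import { createHash } from 'node:crypto';

function submissionContentKey(files: Buffer[]): string {
  const hash = createHash('sha256');
  for (const file of files) hash.update(file);     // order must be deterministic
  return `eval:content:${hash.digest('hex')}`;
}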

Medium-term (60-90 Days)

  1. Implement Claude Writer

    • Port Claude provider for English writing
    • A/B test: Mistral vs. Claude feedback quality
    • Benefit: Better writing feedback, justifies premium tier
  2. Add Cost Dashboard

    • Admin view of daily/monthly spending
    • Feature-by-feature cost breakdown
    • Alerts on unusual spikes
  3. Optimize Prompt Performance

    • User testing of feedback quality
    • Refine rubrics & scoring criteria
    • Target: Same quality with 10-20% fewer tokens

Long-term (6+ Months)

  1. Multi-Model Strategy

    • Keep Mistral for bulk evaluations (cost-effective)
    • Use Claude for premium writing feedback (quality)
    • Use GPT-4 for specialized analysis (fallback)
  2. Fine-tuning Exploration

    • Fine-tune Mistral on STEM evaluation data
    • Fine-tune on writing feedback patterns
    • Potential savings: 30-50% cost reduction with improved quality
  3. Caching Layer Upgrade

    • Move to Redis for distributed caching
    • Implement semantic similarity matching (same content = same score)
    • Cache across multiple submissions if appropriate

11. Critical Success Factors

For Revenue

  • ✅ English Writing must be high quality (justifies TEAM tier)
  • ✅ Parent Communication must drive retention
  • ✅ Costs must stay <10% of revenue

For User Experience

  • ✅ Evaluation confidence must be reliable (coaches trust AI)
  • ✅ Feedback must be actionable (students improve)
  • ✅ Response time must be <5 seconds (users expect speed)

For Business Sustainability

  • ✅ Cost per evaluation < $0.10 (ensure profitability at scale)
  • ✅ Cache hit rate > 40% (reduce redundant calls)
  • ✅ Paid-tier adoption > 30% (otherwise most users stay on the free tier)

Appendix: File Locations

Core AI Evaluation

  • /src/evaluations/evaluations.service.ts - Main evaluation logic
  • /src/evaluations/providers/llm-provider.factory.ts - Provider factory
  • /src/evaluations/providers/mistral-llm.provider.ts - Mistral implementation
  • /src/evaluations/providers/llm-provider.interface.ts - Provider interface
  • /src/evaluations/providers/mock-llm.provider.ts - Mock for testing

English Writing Workflow

  • /src/workflows/english-writing/english-writing.service.ts - Main service
  • /src/workflows/english-writing/providers/mistral-writing.provider.ts - Mistral writer
  • /src/workflows/english-writing/providers/claude-writing.provider.ts - Claude (ready)
  • /src/workflows/english-writing/providers/writing-evaluator.factory.ts - Provider factory

Other Workflows (Backend Ready, Frontend Pending)

  • /src/workflows/automated-grading/ - Grading workflow
  • /src/workflows/parent-communication/ - Communication generator
  • /src/workflows/learning-paths/ - Learning path generator
  • /src/workflows/assignment-creation/ - Assignment creator

Subscription Control

  • /src/subscriptions/subscriptions.service.ts - Usage tracking & tier enforcement

Configuration

  • /.env - API keys & model selection
  • /prisma/schema.prisma - Database models (FeatureFlag, SubscriptionTier, UsageRecord)

Document Version: 1.0
Last Updated: December 21, 2025
Status: Complete & Ready for Implementation
Next Review: Quarterly (Mar 2026)