
StemBlock AI - Comprehensive AI Usage Analysis

Date: December 21, 2025
Status: Complete Analysis of All Current and Planned AI Features
Document Version: 1.0


Executive Summary

StemBlock AI implements a multi-model AI strategy with one production LLM provider (Mistral) and two more scaffolded behind a common interface (Claude, OpenAI) across 6 major AI workflows. The platform uses a factory pattern for provider abstraction, enabling provider switching without code changes. Estimated usage for the currently active features is roughly 10-12M tokens/month, depending on usage patterns and active tiers.


1. Current AI Implementations

1.1 STEM Evaluation Engine (Production)

Location: /src/evaluations/
Status: ✅ Live/Active
Primary Provider: Mistral AI (open-mistral-7b)

Overview

Evaluates STEM robotics/coding submissions with AI-powered scoring across 4 categories:

Categories Evaluated

  1. Robot Design - Physical robotics creativity & functionality
  2. Code Quality - Programming logic, efficiency, structure, comments
  3. Documentation - Engineering notebooks, process clarity
  4. Technical Writing - Essay/explanation quality

Workflow

  1. Student submits files (photos, code, notebooks)
  2. evaluateSubmission() called
  3. Files read and prepared (text files only)
  4. Mistral AI evaluates with system + user prompts
  5. Response parsed into structured evaluation
  6. Results cached (1 hour) to prevent duplicate calls
  7. Coach can override scores & publish
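
A minimal sketch of this flow (the helper names and shapes below are illustrative assumptions, not the actual service code; only evaluateSubmission() and the provider factory appear in the codebase):

// Illustrative sketch of the STEM evaluation flow described above.
// cache, readTextFiles, and parseEvaluation are hypothetical helpers.
declare const cache: {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T, ttlSeconds: number): Promise<void>;
};
declare const factory: {
  getProvider(): Promise<{ evaluateSubmission(req: unknown): Promise<unknown> }>;
};
declare function readTextFiles(submissionId: string): Promise<string[]>;

interface EvalResult { scores: Record<string, number>; feedback: string; nextSteps: string[] }
declare function parseEvaluation(raw: unknown): EvalResult;

async function runStemEvaluation(submissionId: string): Promise<EvalResult> {
  const key = `eval:${submissionId}`;
  const cached = await cache.get<EvalResult>(key);
  if (cached) return cached;                        // 1-hour cache: skip duplicate LLM calls

  const provider = await factory.getProvider();     // Mistral in production
  const raw = await provider.evaluateSubmission({
    files: await readTextFiles(submissionId),       // text files only
  });

  const evaluation = parseEvaluation(raw);          // structured scores + feedback
  await cache.set(key, evaluation, 3600);           // cache for one hour
  return evaluation;                                // coach can still override & publish
}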

Token Estimation per Submission

Typical Submission Content:

  • 1 Python file: ~200-500 lines (600-1500 tokens)
  • 1 Engineering notebook: ~1000 words (800-1200 tokens)
  • 1 Code file (C++): ~300 lines (900-1300 tokens)
  • Total input: ~2000-4000 tokens

Evaluation Response:

  • Structured JSON with scores, feedback, next steps
  • ~500-800 tokens per response

Per Submission Cost:

  • Input: ~3000 tokens
  • Output: ~600 tokens
  • Total: ~3600 tokens per evaluation

Usage Frequency

  • COMMUNITY tier: ~5-10 submissions/month per coach
  • TEAM tier: ~30-50 submissions/month per coach
  • ENTERPRISE: Unlimited

Monthly Token Usage (STEM Evaluation):

  • COMMUNITY: 5 coaches × 7 avg submissions × 3600 tokens = 126,000 tokens
  • TEAM: 20 coaches × 40 submissions × 3600 tokens = 2,880,000 tokens
  • ENTERPRISE: 10 orgs × 50 submissions × 3600 tokens = 1,800,000 tokens
  • Subtotal: ~4.8M tokens/month

Provider Implementation

File: /src/evaluations/providers/mistral-llm.provider.ts

Key Features:

  • Rate limiting: 2000ms minimum between requests
  • Retry logic: exponential backoff (2^n seconds), up to 5 retries (per MISTRAL_MAX_RETRIES below)
  • Caching: 1-hour cache per submission to prevent duplicate evaluations
  • Error handling: Graceful fallback on rate limit errors
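
A sketch of this request spacing and backoff behavior (illustrative only; the real provider's internals may differ):

// Illustrative: enforce a minimum gap between requests and retry with
// exponential backoff (2^n seconds), as described above.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const MIN_REQUEST_INTERVAL_MS = 2000;
let lastRequestAt = 0;

async function callWithBackoff<T>(call: () => Promise<T>, maxRetries = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const wait = lastRequestAt + MIN_REQUEST_INTERVAL_MS - Date.now();
    if (wait > 0) await sleep(wait);               // rate limit: >= 2000ms between requests
    lastRequestAt = Date.now();
    try {
      return await call();
    } catch (err) {
      if (attempt >= maxRetries) throw err;        // give up after the max retry count
      await sleep(2 ** attempt * 1000);            // 1s, 2s, 4s, ...
    }
  }
}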

Configuration:

MISTRAL_API_KEY = "<redacted>"
MISTRAL_MODEL = "open-mistral-7b"
MISTRAL_MIN_REQUEST_INTERVAL = 2000ms
MISTRAL_MAX_RETRIES = 5

System Prompts:

  • Student-focused: Age-appropriate, encouraging language
  • Coach-focused: Technical depth, pedagogical guidance
  • Parent-focused: Plain-language insights

Coach Feedback Generation

A separate AI call generates coach-specific feedback (deeper technical insights):

Additional Call per Evaluation:

  • Input: Full submission + AI evaluation context
  • Output: Detailed teaching guidance
  • Tokens: ~2000-3000 additional

Total with Coach Feedback: ~5600-6600 tokens per full evaluation


1.2 English Writing Workflow (Production)

Location: /src/workflows/english-writing/
Status: ✅ Live/Active (Phase 3 Complete)
Primary Providers: Mistral (live) + Claude (provider ready; dual-model strategy)

Overview

Multi-stage AI evaluation of student writing submissions with age-appropriate feedback.

3-Stage Evaluation Pipeline

Stage 1: Content Moderation

  • Provider: Mistral mistral-small-latest (fast, cost-effective)
  • Purpose: Flag inappropriate content before evaluation
  • Output: isAppropriate (boolean), flaggedContent (list), moderationNote (string)
  • Token Cost: 500-1000 tokens per evaluation

Stage 2: Feedback Generation

  • Provider: Mistral mistral-large-latest
  • Purpose: Encourage student with constructive feedback
  • Output: strengths, improvements, suggestions, encouragement
  • Token Cost: 1500-2000 tokens per evaluation

Stage 3: Assessment/Scoring

  • Provider: Mistral mistral-large-latest
  • Purpose: Score writing across multiple dimensions
  • Output: overallScore (0-100), grammarScore, creativityScore, structureScore, contentScore, gradeEquivalent
  • Token Cost: 1500-2000 tokens per evaluation
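
Sequenced, the pipeline looks roughly like this (a sketch under assumed names; moderate/generateFeedback/assess stand in for the provider's stage methods):

// Illustrative 3-stage pipeline with the early exit described in Stage 1.
interface ModerationResult { isAppropriate: boolean; flaggedContent: string[]; moderationNote: string }
declare function moderate(text: string): Promise<ModerationResult>;   // mistral-small-latest
declare function generateFeedback(text: string): Promise<unknown>;    // mistral-large-latest
declare function assess(text: string): Promise<unknown>;              // mistral-large-latest

async function evaluateWriting(text: string) {
  const moderation = await moderate(text);         // Stage 1: cheap, fast model
  if (!moderation.isAppropriate) {
    return { moderation };                         // flagged content stops the pipeline
  }
  const feedback = await generateFeedback(text);   // Stage 2
  const assessment = await assess(text);           // Stage 3
  return { moderation, feedback, assessment };
}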

Writing Submission Characteristics

  • Minimum word count: 50-500 words, depending on the prompt
  • Avg tokens per submission: 200-800 tokens (writing content itself)
  • Avg feedback token output: ~2000 tokens

Total Tokens per Writing Submission:

  • Input: 800 tokens (writing + prompt context)
  • Moderation: 700 output
  • Feedback: 1500 output
  • Assessment: 1500 output
  • Total: ~4500 tokens per submission

Usage Frequency

  • COMMUNITY tier: View-only, no submissions
  • TEAM tier: ~15-25 assignments/month per coach
  • ENTERPRISE: Unlimited

Monthly Token Usage (English Writing):

  • TEAM: 15 coaches × 20 submissions × 4500 tokens = 1,350,000 tokens
  • ENTERPRISE: 5 orgs × 50 submissions × 4500 tokens = 1,125,000 tokens
  • Subtotal: ~2.5M tokens/month

Provider Implementation

Mistral Writing Provider: /src/workflows/english-writing/providers/mistral-writing.provider.ts

Features:

  • Rate limiting: 2000ms between requests
  • Retry logic: 3 retries with exponential backoff
  • Model selection: Uses small for moderation, large for feedback/assessment

Claude Writing Provider (Ready): /src/workflows/english-writing/providers/claude-writing.provider.ts

Planned Models:

WRITING_MODERATION_MODEL = "claude-3-haiku-20240307" (fastest, cheapest)
WRITING_EVALUATION_MODEL = "claude-3-5-sonnet-20241022" (balanced)
WRITING_FEEDBACK_MODEL = "claude-3-5-sonnet-20241022" (quality)

Database Schema

  • WritingCategory - Prompt categories (e.g., "Narrative", "Persuasive")
  • WritingPrompt - Pre-defined prompts by grade level
  • WritingAssignment - Assignment to student
  • WritingSubmission - Student's submission
  • WritingModeration - Stage 1 results
  • WritingFeedback - Stage 2 results
  • WritingAssessment - Stage 3 results

1.3 Parent Insights Generation (Production)

Location: /src/evaluations/providers/mistral-llm.provider.ts
Status: ✅ Implemented (used in parent dashboards)
Provider: Mistral AI

Overview

Generates parent-friendly insights from student performance data (plain language, non-technical).

Insights Generated

  • What's Going Well (3-5 items)
  • Areas to Focus On (3-5 items)
  • Ways to Support (3-5 items, at-home activities)

Input Data Required

  • Student name & grade level
  • Average score percentage
  • Last 10 assignments with scores:
    • Assignment title
    • Overall score
    • Category scores (if available)
    • Submission date

Token Estimation

Input:

  • Performance history: ~1000-1500 tokens
  • Prompting context: ~400 tokens
  • Total input: ~1500 tokens

Output:

  • Parent-friendly insights with examples
  • Output: ~500-800 tokens

Total per student insights generation: ~2000-2300 tokens

Usage Frequency

  • Generated on-demand when parents view dashboard
  • Cached per student per week (avoid redundant calls)
  • Monthly: ~5-10 unique insights per parent, ~500-1000 parents
  • Estimate: 500 parents × 7 insights × 2200 tokens = 7.7M tokens

However: Caching reduces actual calls by ~80%

  • Actual monthly: ~1.5M tokens (caching)
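
A sketch of the per-student, per-week cache key this implies (hypothetical helper; the week numbering here is approximate):

// Illustrative: insights regenerate at most once per student per week.
function insightsCacheKey(studentId: string, now = new Date()): string {
  const year = now.getUTCFullYear();
  const msPerWeek = 7 * 24 * 60 * 60 * 1000;
  const week = Math.floor((now.getTime() - Date.UTC(year, 0, 1)) / msPerWeek);
  return `parent-insights:${studentId}:${year}-w${week}`;
}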

1.4 LLM Provider Factory Pattern (Architecture)

Location: /src/evaluations/providers/llm-provider.factory.ts

Design Pattern

Factory pattern enables runtime provider switching without code changes:

export type LLMProviderType = 'mistral' | 'mock' | 'openai' | 'claude';

async getProvider(): Promise<LLMProvider> {
  const providerType = this.configService.get<string>('LLM_PROVIDER', 'mock');

  switch (providerType.toLowerCase()) {
    case 'mistral':
      return this.mistralProvider;
    case 'openai':
      // Not yet implemented, falls back to mock
      return this.mockProvider;
    case 'claude':
      // Not yet implemented, falls back to mock
      return this.mockProvider;
    case 'mock':
    default:
      return this.mockProvider;
  }
}

Provider Interface

All providers must implement LLMProvider interface:

interface LLMProvider {
  isAvailable(): Promise<boolean>;
  getProviderName(): string;
  evaluateSubmission(request: EvaluationRequest): Promise<EvaluationResponse>;
  generateCoachFeedback(request, score, feedback): Promise<CoachFeedbackResponse>;
  generateParentInsights(request): Promise<ParentInsightsResponse>;
}
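
A skeleton of a conforming provider (illustrative only; types are abbreviated to any for brevity, and the real shapes live in llm-provider.interface.ts):

// Illustrative skeleton: the shape a new provider must satisfy,
// not the actual mock implementation.
class ExampleProvider implements LLMProvider {
  async isAvailable(): Promise<boolean> { return true; }
  getProviderName(): string { return 'example'; }
  async evaluateSubmission(request: any): Promise<any> {
    // Call the underlying model API here, then map to EvaluationResponse.
    return { scores: {}, feedback: '', confidence: 0 };
  }
  async generateCoachFeedback(request: any, score: any, feedback: any): Promise<any> {
    return { guidance: '' };
  }
  async generateParentInsights(request: any): Promise<any> {
    return { goingWell: [], focusAreas: [], waysToSupport: [] };
  }
}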

Available Providers

  1. Mistral (✅ Active)

    • Model: open-mistral-7b
    • Status: Production
    • Cost: $0.14/M input, $0.42/M output tokens
  2. Mock (✅ Active)

    • Status: Testing/Development
    • Returns synthetic data for testing
  3. OpenAI (⏳ Planned)

    • Model: gpt-4 (or gpt-4-turbo)
    • Status: Placeholder, not implemented
    • Cost: $30/M input, $60/M output tokens (gpt-4)
  4. Claude (⏳ Planned)

    • Model: claude-3-5-sonnet-20241022
    • Status: Placeholder, interface ready
    • Cost: $3/M input, $15/M output tokens

2. Planned AI Features (Phase 4+)

2.1 Automated Grading Workflow

Location: /src/workflows/automated-grading/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory (Mistral primary, OpenAI/Claude ready)

Overview

Streamlines grading process with AI-generated scores and bulk operations.

Features

AI-Powered Grading:

  • Auto-generate initial scores based on rubrics
  • Coaches review queue before publishing
  • Manual score override capability
  • Automatic publication (COMMUNITY) vs. review queue (TEAM+)

Review Queue System (TEAM/ENTERPRISE):

  • AI-generated scores pending coach review
  • Filter by confidence level (high/medium/low)
  • Batch review operations
  • Notes & annotations

Bulk Grading (TEAM/ENTERPRISE):

  • Grade multiple submissions with AI
  • Consistent rubric application
  • Custom rubric support

Token Estimation

Per Submission Grade Generation:

  • Input: Assignment rubric + submission content
  • ~2000-3000 tokens input
  • ~500-800 tokens output (JSON scores + feedback)
  • Total: ~2500-3800 tokens

Tier Usage:

  • COMMUNITY: 10 uses/month (auto-publish)
  • TEAM: 100 uses/month (with review queue)
  • ENTERPRISE: Unlimited

Monthly Tokens:

  • COMMUNITY: 200 coaches × 10 × 3000 = 6M tokens
  • TEAM: 100 coaches × 100 × 3000 = 30M tokens
  • ENTERPRISE: 50 orgs × ∞ (assume 500) × 3000 = 75M tokens
  • Potential Total: ~111M tokens/month (if all tiers maximize usage)

Note: These workflows currently use the placeholder (mock) provider, so no real tokens are consumed yet.


2.2 Parent Communication Generator

Location: /src/workflows/parent-communication/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory

Overview

AI-generates personalized parent communications from templates.

Communication Types

  1. Progress Report - Summary of student achievement/growth
  2. Concern Alert - Intervention needed, areas for support
  3. Achievement Celebration - Highlight strengths & wins
  4. General Update - Class-wide announcements
  5. Custom - Coach-created templates

Token Estimation

Per Communication Generation:

  • Input: Student data + template + context
  • ~1500-2000 tokens (student history, rubric, notes)
  • ~600-900 tokens output (personalized message)
  • Total: ~2200-2900 tokens per communication

Tier Usage:

  • COMMUNITY: 5 uses/month, copy-to-clipboard only
  • TEAM: 50 uses/month, email sending
  • ENTERPRISE: Unlimited, scheduled sending

Monthly Tokens:

  • COMMUNITY: 500 coaches × 5 × 2500 = 6.25M tokens
  • TEAM: 200 coaches × 50 × 2500 = 25M tokens
  • ENTERPRISE: 100 orgs × 200 × 2500 = 50M tokens
  • Potential Total: ~81.25M tokens/month

2.3 Personalized Learning Paths

Location: /src/workflows/learning-paths/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory

Overview

AI-generates customized multi-week learning plans based on student performance.

Features

Skill Gap Analysis:

  • Compare current vs. target skill levels
  • Identify 3-5 priority improvement areas
  • Recommend specific resources/exercises

Milestone Planning:

  • Weekly milestones over N weeks
  • Recommended activities per milestone
  • Difficulty progression (beginner → advanced)
  • Estimated completion time

Progress Tracking:

  • Mark milestones as complete
  • Coaches can customize path (TEAM+)
  • Visual progress indicators

Token Estimation

Per Learning Path Generation:

  • Input: Student skills, goals, history
  • ~2000-2500 tokens (comprehensive skill profile)
  • ~1500-2000 tokens output (detailed milestone plan)
  • Total: ~3500-4500 tokens per path

Tier Usage:

  • COMMUNITY: 1 generation (view-only)
  • TEAM: 20 generations/month
  • ENTERPRISE: Unlimited

Monthly Tokens:

  • COMMUNITY: 200 coaches × 1 × 4000 = 0.8M tokens (low, mostly one-time)
  • TEAM: 150 coaches × 20 × 4000 = 12M tokens
  • ENTERPRISE: 100 orgs × 100 × 4000 = 40M tokens
  • Potential Total: ~52.8M tokens/month

2.4 Assignment Creation Assistant

Location: /src/workflows/assignment-creation/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory

Overview

AI-generates complete assignments with rubrics, sample solutions, teaching notes.

Features

Assignment Generation:

  • Topic/prompt input by coach
  • Auto-generate assignment description
  • Create comprehensive rubric
  • Generate sample solution/answer key
  • Provide teaching notes & tips

Assignment Types:

  • Problem Set
  • Project
  • Essay
  • Lab
  • Presentation
  • Quiz

Library Management:

  • Save generated assignments for reuse
  • Share across workspace (TEAM+)
  • Clone and customize existing assignments

Token Estimation

Per Assignment Generation:

  • Input: Topic, difficulty level, grade level
  • ~500-800 tokens (brief prompt)
  • Output: Full assignment + rubric + solution + notes
  • ~3000-4000 tokens output (comprehensive)
  • Total: ~3500-4800 tokens per assignment

Tier Usage:

  • COMMUNITY: 0 (not available)
  • TEAM: 20 generations/month
  • ENTERPRISE: Unlimited

Monthly Tokens:

  • TEAM: 150 coaches × 20 × 4000 = 12M tokens
  • ENTERPRISE: 100 orgs × 50 × 4000 = 20M tokens
  • Potential Total: ~32M tokens/month

3. AI Cost Analysis

3.1 Current Model Costs (Mistral)

Mistral Pricing (as of Dec 2024):

  • open-mistral-7b: $0.14/M input, $0.42/M output tokens
  • mistral-large-latest: $0.27/M input, $0.81/M output tokens
  • mistral-small-latest: $0.07/M input, $0.21/M output tokens
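
These rates are per million tokens, so individual calls cost fractions of a cent. As a sanity check, one STEM evaluation (~3,000 input / ~600 output tokens, per section 1.1) on open-mistral-7b works out to:

(3,000 input ÷ 1,000,000) × $0.14 ≈ $0.00042
(600 output ÷ 1,000,000) × $0.42 ≈ $0.00025
Total: ≈ $0.0007 per evaluation

This is what keeps the <$0.10 per-evaluation target in section 11 comfortably within reach.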

3.2 Current Monthly Spend (Active Features)

STEM Evaluation (~1,335 evaluations/month, ~4.8M tokens; see 1.1):

  • Input: ~4.0M tokens × $0.14/M ≈ $0.56/month
  • Output: ~0.8M tokens × $0.42/M ≈ $0.34/month
  • Subtotal: ~$0.90/month

English Writing (Mistral, ~550 submissions/month, ~2.5M tokens; see 1.2):

  • Moderation: ~0.8M tokens on mistral-small ≈ $0.11/month
  • Feedback: ~1.3M tokens on mistral-large ≈ $0.79/month
  • Assessment: ~1.3M tokens on mistral-large ≈ $0.79/month
  • Subtotal: ~$1.70/month

Parent Insights (cached, see 1.3):

  • ~1.5M tokens at blended open-mistral-7b rates ≈ $0.34/month

Coach Feedback Generation:

  • ~1,335 calls × ~2,500 tokens ≈ 3.3M tokens ≈ $0.66/month

Current Total Monthly Spend: ≈ $3.60/month (roughly $45/year)

Note: the Mistral rates are quoted per million tokens, so the ~12M tokens/month of active usage costs single-digit dollars; the cost risk to plan for is token-volume growth from the Phase 4 workflows, not today's spend.


3.3 Projected Cost at Full Scale (All Workflows Active)

Scenario: all tiers (COMMUNITY, TEAM, ENTERPRISE) maximize usage

| Workflow | COMMUNITY | TEAM | ENTERPRISE | Total |
| --- | --- | --- | --- | --- |
| STEM Evaluation | 1.26M | 30M | 18.9M | 50.16M |
| English Writing | 6.25M | 25M | 50M | 81.25M |
| Auto-Grading | 6M | 30M | 75M | 111M |
| Parent Comm | 6.25M | 25M | 50M | 81.25M |
| Learning Paths | 0.8M | 12M | 40M | 52.8M |
| Assignment Creation | - | 12M | 20M | 32M |
| Total tokens/month | 20.56M | 134M | 253.9M | 408.46M |
| Mistral cost/month | ~$2.88 | ~$18.76 | ~$35.55 | ~$57.18 |

(Token volumes are monthly; costs use the open-mistral-7b input rate of $0.14/M and run somewhat higher once output tokens at $0.42/M are counted.)

Annual Projected Cost (All Workflows): ~$57/month × 12 ≈ $690/year

Note: This assumes maximum usage across all tiers; a realistic scenario is 30-50% of these volumes.


3.4 Cost Optimization Strategies

1. Model Tiering

Use cheaper models for simple tasks:

Moderation → mistral-small ($0.07/$0.21)
Feedback → mistral-large ($0.27/$0.81)
Complex Analysis → GPT-4 ($30/$60/M) only when needed

2. Caching Strategy

  • Current: STEM evaluations cached 1 hour; parent insights cached 1 week
  • Potential: Cache all evaluation results (1-7 days)
  • Savings: 60-80% reduction on duplicate submissions

3. Batch Processing

  • Group submissions by assignment
  • Evaluate multiple submissions in single API call
  • Savings: 30-40% reduction on per-submission costs
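
A sketch of what batching could look like (hypothetical prompt shape; the document proposes the idea, this is not existing code). The savings come from sending the rubric and system context once per batch instead of once per submission:

// Illustrative: fold N submissions into one grading request.
interface BatchItem { id: string; text: string }

function buildBatchPrompt(rubric: string, submissions: BatchItem[]): string {
  const body = submissions
    .map((s, i) => `### Submission ${i + 1} (id: ${s.id})\n${s.text}`)
    .join('\n\n');
  return [
    'Grade each submission below against this rubric:',
    rubric,
    body,
    'Return one JSON object per submission id.',
  ].join('\n\n');
}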

4. Tier-Based Limits

  • COMMUNITY: No AI-powered workflows (mock only)
  • TEAM: Limited monthly token budgets
  • ENTERPRISE: Pay-as-you-go with discounts

5. Provider Switching

  • Current: Mistral (cost-effective, ~$0.14/$0.42)
  • Switch to Claude 3 Haiku: $0.25/$1.25 (still pricier than open-mistral-7b)
  • Switch to Claude 3.5 Sonnet: $3/$15 (higher quality, expensive)
  • Keep OpenAI: $30/$60 (GPT-4, ultra-expensive)

Recommendation: Stay with Mistral for high-volume evaluations, use Claude for specialized tasks (writing feedback quality).


4. Critical AI Workloads

By Frequency (Per Year)

| Workload | Frequency | Volume | Criticality |
| --- | --- | --- | --- |
| STEM Evaluation | On submission | ~50K/year | CRITICAL (core feature) |
| English Writing | On submission | ~25K/year | HIGH (revenue driver) |
| Parent Communication | On-demand | ~10K/year | HIGH (parent engagement) |
| Parent Insights | On dashboard load | ~100K/year | MEDIUM (cached heavily) |
| Coach Feedback | On override | ~15K/year | MEDIUM (optional) |
| Learning Paths | Monthly | ~5K/year | LOW (premium feature) |
| Assignment Creation | Weekly | ~3K/year | LOW (admin use) |
| Auto-Grading | On submission | ~20K/year | MEDIUM (workflow automation) |

By Revenue Impact

  1. STEM Evaluation ⭐⭐⭐⭐⭐

    • Core differentiator
    • Impacts every user
    • Must be reliable & fast
    • 24/7 SLA needed
  2. English Writing ⭐⭐⭐⭐⭐

    • Separate product offering
    • Tier-locked (TEAM+ for coaches)
    • High conversion driver
    • Critical for PLG conversion
  3. Parent Communication ⭐⭐⭐⭐

    • Parent engagement driver
    • Reduces churn
    • Tier-locked (TEAM for coaches)
    • High user satisfaction
  4. Parent Insights ⭐⭐⭐⭐

    • Retention feature
    • Dashboard visibility
    • Heavily cached (low cost)
    • Important for parent value proposition
  5. Learning Paths ⭐⭐⭐

    • Premium feature (TEAM+)
    • Lower adoption currently
    • Growing importance for personalization
  6. Assignment Creation ⭐⭐⭐

    • Time-saver for coaches
    • Lower adoption (advanced users)
    • Nice-to-have, not must-have

5. Provider Readiness Status

Current Implementation

| Provider | Location | Status | Used For | Cost (per M in/out) |
| --- | --- | --- | --- | --- |
| Mistral | mistral-llm.provider.ts | ✅ Production | STEM, Writing, Parent Insights | $0.14/$0.42 |
| Mock | mock-llm.provider.ts | ✅ Testing | Development/Testing | Free |
| Claude | Placeholder, interface ready | ⏳ Planned | Writing (future) | $3/$15 |
| OpenAI | Placeholder, interface ready | ⏳ Planned | Fallback option | $30/$60 |

Provider Switch Capability

Switching between implemented providers is a simple configuration change:

# Current
LLM_PROVIDER=mistral
MISTRAL_API_KEY=xxx
MISTRAL_MODEL=open-mistral-7b

# To switch to Claude
LLM_PROVIDER=claude
ANTHROPIC_API_KEY=xxx
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022

Implementation effort: 2-4 hours to implement a missing provider (Claude or OpenAI), test, and deploy.


6. Database Schema for AI Features

Evaluation Storage

  • Evaluation model - STEM evaluation results with AI scores & confidence
  • WritingSubmission - Student writing submission
  • WritingModeration - Content moderation results
  • WritingFeedback - AI-generated feedback
  • WritingAssessment - Writing scores

Workflow Storage

  • ReviewQueueItem - Pending coach review items
  • ParentCommunication - Generated communications
  • CommunicationTemplate - Template library
  • LearningPath - AI-generated learning paths
  • AssignmentLibraryItem - Reusable assignments

Subscription Controls

  • SubscriptionTier enum - COMMUNITY, PRO, TEAM, ENTERPRISE
  • FeatureFlag enum - Feature gates (AI_GRADING, ENGLISH_WRITING_WORKFLOW, etc.)
  • UsageRecord - Token usage tracking per user/month

7. Token Usage Tracking

Current Implementation

  • SubscriptionsService tracks usage by feature
  • Monthly limits enforced per tier:
    • COMMUNITY: Limited (0-10 uses/month)
    • TEAM: Moderate (20-100 uses/month)
    • ENTERPRISE: Unlimited

Missing: Token-Level Tracking

Currently tracking by feature uses, not tokens:

  • ✅ Know: "User made 25 grading calls"
  • ❌ Don't know: "User consumed 75K tokens"

Recommendation

Add optional token-level tracking:

interface TokenUsageRecord {
  userId: string;
  feature: FeatureFlag;
  tokensInput: number;
  tokensOutput: number;
  model: string;
  cost: number;
  timestamp: DateTime;
}
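
A sketch of how such a record could be written, with cost derived from the per-million rates in section 3.1 (the rates table and persist() are assumptions for illustration; only TokenUsageRecord comes from this document):

// Illustrative: compute cost from per-million-token rates, then persist.
type DateTime = Date;                              // stand-in for the Prisma DateTime
declare function persist(record: TokenUsageRecord): Promise<void>;

const RATES_PER_M: Record<string, { input: number; output: number }> = {
  'open-mistral-7b': { input: 0.14, output: 0.42 },
  'mistral-large-latest': { input: 0.27, output: 0.81 },
  'mistral-small-latest': { input: 0.07, output: 0.21 },
};

async function recordUsage(r: Omit<TokenUsageRecord, 'cost'>): Promise<void> {
  const rate = RATES_PER_M[r.model] ?? { input: 0, output: 0 };
  const cost =
    (r.tokensInput / 1_000_000) * rate.input +     // rates are $ per 1M tokens
    (r.tokensOutput / 1_000_000) * rate.output;
  await persist({ ...r, cost });
}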

8. Quality & Reliability

AI Confidence Metrics

STEM Evaluation:

  • AI provides confidence score (0-100%)
  • Scores <70% flagged for manual review
  • Coach override capability on all scores

English Writing:

  • Three-stage pipeline with early-exit for content issues
  • Flagged content stops evaluation before scoring
  • Assessment confidence implicit (not scored)

Parent Insights:

  • No confidence metric currently
  • Plain-language output (less critical for errors)

Error Handling

Rate Limiting:

  • Mistral: 2000ms minimum between requests
  • Exponential backoff on 429 errors (capacity exceeded)
  • Max 5 retries before failure

Caching:

  • STEM evaluation: 1-hour cache
  • Writing moderation: No cache (always fresh)
  • Parent insights: 1-week cache

Fallback Strategy:

  • If Mistral unavailable, falls back to Mock provider
  • Mock returns synthetic but reasonable data
  • No service downtime (degrades gracefully)
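
A sketch of that graceful degradation (assumed shape; the document describes the behavior, not this exact code; LLMProvider is the interface from section 1.4):

// Illustrative: prefer the configured provider, degrade to mock if it is down.
declare const factory: { getProvider(): Promise<LLMProvider> };
declare const mockProvider: LLMProvider;

async function getProviderWithFallback(): Promise<LLMProvider> {
  const primary = await factory.getProvider();     // e.g. Mistral
  if (await primary.isAvailable()) return primary;
  return mockProvider;                             // synthetic but reasonable data
}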

Prompt Engineering

STEM Evaluation Prompts:

  • Student-focused: Encouraging, age-appropriate language
  • Coach-focused: Technical depth, pedagogical guidance
  • Parent-focused: Plain-language insights, no jargon

Writing Evaluation Prompts:

  • Grade-level aware (K-2, 3-5, 6-8, 9-12)
  • Moderation: Content appropriateness check
  • Feedback: Strengths + improvements (age-appropriate)
  • Assessment: Scoring rubric with grade equivalents
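
A sketch of grade-band prompt selection (illustrative mapping only; the prompt strings below are placeholders, not the production prompts):

// Illustrative: pick a system prompt by the grade bands listed above.
type GradeBand = 'K-2' | '3-5' | '6-8' | '9-12';

function gradeBand(grade: number): GradeBand {
  if (grade <= 2) return 'K-2';
  if (grade <= 5) return '3-5';
  if (grade <= 8) return '6-8';
  return '9-12';
}

const SYSTEM_PROMPTS: Record<GradeBand, string> = {
  'K-2': 'Placeholder: very simple words, heavy encouragement...',
  '3-5': 'Placeholder: encouraging tone, concrete suggestions...',
  '6-8': 'Placeholder: constructive feedback with specific examples...',
  '9-12': 'Placeholder: detailed, rubric-referenced feedback...',
};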

9. Feature Access Control

Tier-Based AI Access

| Feature | COMMUNITY | PRO | TEAM | ENTERPRISE |
| --- | --- | --- | --- | --- |
| STEM Evaluation | View only | Full | Full | Full |
| English Writing | View only | Limited (parents) | Full | Full |
| Parent Insights | Yes (read-only) | Yes | Yes | Yes |
| Coach Feedback | No | No | Yes | Yes |
| Auto Grading | No | No | Limited | Full |
| Learning Paths | No | No | Limited | Full |
| Parent Comm | No | No | Limited | Full |
| Assignment Creation | No | No | Limited | Full |

Usage Limits

| Feature | COMMUNITY | TEAM | ENTERPRISE |
| --- | --- | --- | --- |
| STEM Evaluations | 10/month | 100/month | Unlimited |
| Writing Submissions | 0 | 20/month | Unlimited |
| Parent Communications | 5/month | 50/month | Unlimited |
| Learning Paths | 1 (one-time) | 20/month | Unlimited |
| Assignments Created | 0 | 20/month | Unlimited |

10. Recommendations & Next Steps

Immediate (Next 30 Days)

  1. Implement Token Tracking

    • Add token-level usage records
    • Monitor actual vs. projected spend
    • Alert on >$1000/month features
  2. Optimize Caching

    • Extend STEM evaluation cache to 7 days
    • Implement deduplication (same files = same score; see the hashing sketch after this list)
    • Potential savings: 40-60% cost reduction
  3. Monitor Mistral Performance

    • Track response times
    • Monitor error rates
    • Prepare fallback to Claude if needed
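
For item 2's deduplication, a content-hash key is the usual approach. A sketch (hypothetical helper; assumes a stable, deterministic file ordering):

// Illustrative: identical file bytes produce the same cache key, so a
// resubmission of unchanged work never triggers a second LLM call.
import { createHash } from 'node:crypto';

function submissionContentKey(files: Buffer[]): string {
  const hash = createHash('sha256');
  for (const file of files) hash.update(file);     // order must be deterministic
  return `eval:content:${hash.digest('hex')}`;
}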

Medium-term (60-90 Days)

  1. Implement Claude Writer

    • Port Claude provider for English writing
    • A/B test: Mistral vs. Claude feedback quality
    • Benefit: Better writing feedback, justifies premium tier
  2. Add Cost Dashboard

    • Admin view of daily/monthly spending
    • Feature-by-feature cost breakdown
    • Alerts on unusual spikes
  3. Optimize Prompt Performance

    • User testing of feedback quality
    • Refine rubrics & scoring criteria
    • Target: Same quality with 10-20% fewer tokens

Long-term (6+ Months)

  1. Multi-Model Strategy

    • Keep Mistral for bulk evaluations (cost-effective)
    • Use Claude for premium writing feedback (quality)
    • Use GPT-4 for specialized analysis (fallback)
  2. Fine-tuning Exploration

    • Fine-tune Mistral on STEM evaluation data
    • Fine-tune on writing feedback patterns
    • Potential savings: 30-50% cost reduction with improved quality
  3. Caching Layer Upgrade

    • Move to Redis for distributed caching
    • Implement semantic similarity matching (same content = same score)
    • Cache across multiple submissions if appropriate

11. Critical Success Factors

For Revenue

  • ✅ English Writing must be high quality (justifies TEAM tier)
  • ✅ Parent Communication must drive retention
  • ✅ Costs must stay <10% of revenue

For User Experience

  • ✅ Evaluation confidence must be reliable (coaches trust AI)
  • ✅ Feedback must be actionable (students improve)
  • ✅ Response time must be <5 seconds (users expect speed)

For Business Sustainability

  • ✅ Cost per evaluation < $0.10 (ensure profitability at scale)
  • ✅ Cache hit rate > 40% (reduce redundant calls)
  • ✅ Paid-tier adoption > 30% (otherwise most users stay on the free tier)

Appendix: File Locations

Core AI Evaluation

  • /src/evaluations/evaluations.service.ts - Main evaluation logic
  • /src/evaluations/providers/llm-provider.factory.ts - Provider factory
  • /src/evaluations/providers/mistral-llm.provider.ts - Mistral implementation
  • /src/evaluations/providers/llm-provider.interface.ts - Provider interface
  • /src/evaluations/providers/mock-llm.provider.ts - Mock for testing

English Writing Workflow

  • /src/workflows/english-writing/english-writing.service.ts - Main service
  • /src/workflows/english-writing/providers/mistral-writing.provider.ts - Mistral writer
  • /src/workflows/english-writing/providers/claude-writing.provider.ts - Claude (ready)
  • /src/workflows/english-writing/providers/writing-evaluator.factory.ts - Provider factory

Other Workflows (Backend Ready, Frontend Pending)

  • /src/workflows/automated-grading/ - Grading workflow
  • /src/workflows/parent-communication/ - Communication generator
  • /src/workflows/learning-paths/ - Learning path generator
  • /src/workflows/assignment-creation/ - Assignment creator

Subscription Control

  • /src/subscriptions/subscriptions.service.ts - Usage tracking & tier enforcement

Configuration

  • /.env - API keys & model selection
  • /prisma/schema.prisma - Database models (FeatureFlag, SubscriptionTier, UsageRecord)

Document Version: 1.0
Last Updated: December 21, 2025
Status: Complete & Ready for Implementation
Next Review: Quarterly (Mar 2026)