StemBlock AI - Comprehensive AI Usage Analysis
Date: December 21, 2025 Status: Complete Analysis of All Current and Planned AI Features Document Version: 1.0
Executive Summary
StemBlock AI implements a multi-model AI strategy: one LLM provider is live in production (Mistral), with Claude and OpenAI integrations staged behind the same interface, across 6 major AI workflows. The platform uses a factory pattern for provider abstraction, enabling provider switching without code changes. Total estimated token usage for the active features is roughly 8-12M tokens/month depending on usage patterns and active tiers.
1. Current AI Implementations
1.1 STEM Evaluation Engine (Production)
Location: /src/evaluations/
Status: ✅ Live/Active
Primary Provider: Mistral AI (open-mistral-7b)
Overview
Evaluates STEM robotics/coding submissions with AI-powered scoring across 4 categories:
Categories Evaluated
- Robot Design - Physical robotics creativity & functionality
- Code Quality - Programming logic, efficiency, structure, comments
- Documentation - Engineering notebooks, process clarity
- Technical Writing - Essay/explanation quality
Workflow
Student submits files (photos, code, notebooks)
↓
evaluateSubmission() called
↓
Files read and prepared (text files only)
↓
Mistral AI evaluates with system + user prompts
↓
Response parsed into structured evaluation
↓
Results cached (1 hour) to prevent duplicate calls
↓
Coach can override scores & publish
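A minimal sketch of this flow (the cache shape and names below are illustrative, not the actual service code):

```typescript
// Sketch of the cache-first evaluation flow described above.
// LLMProvider mirrors the interface in section 1.4; the cache here is an
// illustrative in-memory map, not the platform's real cache layer.
interface EvaluationResult {
  scores: Record<string, number>;
  feedback: string;
}

interface LLMProvider {
  evaluateSubmission(content: string): Promise<EvaluationResult>;
}

const ONE_HOUR_MS = 60 * 60 * 1000;
const cache = new Map<string, { result: EvaluationResult; expiresAt: number }>();

async function evaluateSubmission(
  provider: LLMProvider,
  submissionId: string,
  content: string,
): Promise<EvaluationResult> {
  // 1-hour cache prevents duplicate calls for the same submission.
  const hit = cache.get(submissionId);
  if (hit && hit.expiresAt > Date.now()) return hit.result;

  const result = await provider.evaluateSubmission(content);
  cache.set(submissionId, { result, expiresAt: Date.now() + ONE_HOUR_MS });
  return result; // coach can still override scores before publishing
}
```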
Token Estimation per Submission
Typical Submission Content:
- 1 Python file: ~200-500 lines (600-1500 tokens)
- 1 Engineering notebook: ~1000 words (800-1200 tokens)
- 1 Code file (C++): ~300 lines (900-1300 tokens)
- Total input: ~2000-4000 tokens
Evaluation Response:
- Structured JSON with scores, feedback, next steps
- ~500-800 tokens per response
Per Submission Cost:
- Input: ~3000 tokens
- Output: ~600 tokens
- Total: ~3600 tokens per evaluation
Usage Frequency
- COMMUNITY tier: ~5-10 submissions/month per coach
- TEAM tier: ~30-50 submissions/month per coach
- ENTERPRISE: Unlimited
Monthly Token Usage (STEM Evaluation):
- COMMUNITY: 5 coaches × 7 avg submissions × 3600 tokens = 126,000 tokens
- TEAM: 20 coaches × 40 submissions × 3600 tokens = 2,880,000 tokens
- ENTERPRISE: 10 orgs × 50 submissions × 3600 tokens = 1,800,000 tokens
- Subtotal: ~4.8M tokens/month
Provider Implementation
File: /src/evaluations/providers/mistral-llm.provider.ts
Key Features:
- Rate limiting: 2000ms minimum between requests
- Retry logic: exponential backoff (2^n seconds), up to the configured maximum of 5 retries
- Caching: 1-hour cache per submission to prevent duplicate evaluations
- Error handling: Graceful fallback on rate limit errors
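The rate-limit-plus-backoff behavior reduces to a small wrapper. The sketch below is illustrative, using the 2000ms interval and retry ceiling from the configuration that follows:

```typescript
// Illustrative rate limiter + retry loop matching the behavior above:
// at least 2000 ms between requests, exponential backoff (2^n seconds)
// on failure, giving up once the retry budget is exhausted.
const MIN_REQUEST_INTERVAL_MS = 2000;
let lastRequestAt = 0;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function callWithBackoff<T>(
  request: () => Promise<T>,
  maxRetries = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    // Enforce the minimum spacing between consecutive requests.
    const wait = lastRequestAt + MIN_REQUEST_INTERVAL_MS - Date.now();
    if (wait > 0) await sleep(wait);
    lastRequestAt = Date.now();

    try {
      return await request();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // retry budget exhausted
      await sleep(2 ** attempt * 1000);     // 1s, 2s, 4s, ...
    }
  }
}
```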
Configuration:
MISTRAL_API_KEY = "xxx" (redacted; loaded from environment, never committed)
MISTRAL_MODEL = "open-mistral-7b"
MISTRAL_MIN_REQUEST_INTERVAL = 2000ms
MISTRAL_MAX_RETRIES = 5
System Prompts:
- Student-focused: Age-appropriate, encouraging language
- Coach-focused: Technical depth, pedagogical guidance
- Parent-focused: Plain-language insights
Coach Feedback Generation
A separate AI call generates coach-specific feedback (deeper technical insights):
Additional Call per Evaluation:
- Input: Full submission + AI evaluation context
- Output: Detailed teaching guidance
- Tokens: ~2000-3000 additional
Total with Coach Feedback: ~5600-6600 tokens per full evaluation
1.2 English Writing Workflow (Production)
Location: /src/workflows/english-writing/
Status: ✅ Live/Active (Phase 3 Complete)
Primary Providers: Mistral + Claude (dual-model strategy)
Overview
Multi-stage AI evaluation of student writing submissions with age-appropriate feedback.
3-Stage Evaluation Pipeline
Stage 1: Content Moderation
- Provider: Mistral mistral-small-latest (fast, cost-effective)
- Purpose: Flag inappropriate content before evaluation
- Output: isAppropriate (boolean), flaggedContent (list), moderationNote (string)
- Token Cost: 500-1000 tokens per evaluation
Stage 2: Feedback Generation
- Provider: Mistral mistral-large-latest
- Purpose: Encourage student with constructive feedback
- Output: strengths, improvements, suggestions, encouragement
- Token Cost: 1500-2000 tokens per evaluation
Stage 3: Assessment/Scoring
- Provider: Mistral mistral-large-latest
- Purpose: Score writing across multiple dimensions
- Output: overallScore (0-100), grammarScore, creativityScore, structureScore, contentScore, gradeEquivalent
- Token Cost: 1500-2000 tokens per evaluation
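A condensed sketch of the pipeline (types and method names are illustrative stand-ins for the real DTOs; the early exit on flagged content matches the behavior described in section 8):

```typescript
// Illustrative three-stage writing pipeline: moderation runs first and
// short-circuits the more expensive feedback/assessment calls when content
// is flagged, as the stage descriptions above imply.
interface Moderation { isAppropriate: boolean; flaggedContent: string[]; moderationNote: string; }
interface Feedback { strengths: string[]; improvements: string[]; suggestions: string[]; encouragement: string; }
interface Assessment { overallScore: number; grammarScore: number; creativityScore: number; structureScore: number; contentScore: number; gradeEquivalent: string; }

interface WritingEvaluator {
  moderate(text: string): Promise<Moderation>;    // mistral-small-latest
  giveFeedback(text: string): Promise<Feedback>;  // mistral-large-latest
  assess(text: string): Promise<Assessment>;      // mistral-large-latest
}

async function evaluateWriting(evaluator: WritingEvaluator, text: string) {
  const moderation = await evaluator.moderate(text);
  if (!moderation.isAppropriate) {
    // Early exit: flagged content never reaches the scoring stages.
    return { moderation, feedback: null, assessment: null };
  }
  // Sequential calls respect the provider-level rate limiting.
  const feedback = await evaluator.giveFeedback(text);
  const assessment = await evaluator.assess(text);
  return { moderation, feedback, assessment };
}
```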
Writing Submission Characteristics
- Min word count: 50-500 words (varies by prompt)
- Avg tokens per submission: 200-800 tokens (writing content itself)
- Avg feedback token output: ~2000 tokens
Total Tokens per Writing Submission:
- Input: 800 tokens (writing + prompt context)
- Moderation: 700 output
- Feedback: 1500 output
- Assessment: 1500 output
- Total: ~4500 tokens per submission
Usage Frequency
- COMMUNITY tier: View-only, no submissions
- TEAM tier: ~15-25 assignments/month per coach
- ENTERPRISE: Unlimited
Monthly Token Usage (English Writing):
- TEAM: 15 coaches × 20 submissions × 4500 tokens = 1,350,000 tokens
- ENTERPRISE: 5 orgs × 50 submissions × 4500 tokens = 1,125,000 tokens
- Subtotal: ~2.5M tokens/month
Provider Implementation
Mistral Writing Provider: /src/workflows/english-writing/providers/mistral-writing.provider.ts
Features:
- Rate limiting: 2000ms between requests
- Retry logic: 3 retries with exponential backoff
- Model selection: Uses small for moderation, large for feedback/assessment
Claude Writing Provider (Ready): /src/workflows/english-writing/providers/claude-writing.provider.ts
Planned Models:
WRITING_MODERATION_MODEL = "claude-3-haiku-20240307" (fastest, cheapest)
WRITING_EVALUATION_MODEL = "claude-3-5-sonnet-20241022" (balanced)
WRITING_FEEDBACK_MODEL = "claude-3-5-sonnet-20241022" (quality)
Database Schema
- WritingCategory - Prompt categories (e.g., "Narrative", "Persuasive")
- WritingPrompt - Pre-defined prompts by grade level
- WritingAssignment - Assignment to student
- WritingSubmission - Student's submission
- WritingModeration - Stage 1 results
- WritingFeedback - Stage 2 results
- WritingAssessment - Stage 3 results
1.3 Parent Insights Generation (Production)
Location: /src/evaluations/providers/mistral-llm.provider.ts
Status: ✅ Implemented (used in parent dashboards)
Provider: Mistral AI
Overview
Generates parent-friendly insights from student performance data (plain language, non-technical).
Insights Generated
- What's Going Well (3-5 items)
- Areas to Focus On (3-5 items)
- Ways to Support (3-5 items, at-home activities)
Input Data Required
- Student name & grade level
- Average score percentage
- Last 10 assignments with scores:
- Assignment title
- Overall score
- Category scores (if available)
- Submission date
Token Estimation
Input:
- Performance history: ~1000-1500 tokens
- Prompting context: ~400 tokens
- Total input: ~1500 tokens
Output:
- Parent-friendly insights with examples
- Output: ~500-800 tokens
Total per student insights generation: ~2000-2300 tokens
Usage Frequency
- Generated on-demand when parents view dashboard
- Cached per student per week (avoid redundant calls)
- Monthly: ~5-10 unique insights per parent, ~500-1000 parents
- Estimate: 500 parents × 7 insights × 2200 tokens = 7.7M tokens
However: Caching reduces actual calls by ~80%
- Actual monthly: ~1.5M tokens (caching)
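The weekly cache can be as simple as keying on a week-of-year stamp. A sketch (key format is illustrative):

```typescript
// Illustrative weekly cache key: insights regenerate at most once per
// student per week (approximate week-of-year, UTC), which is where the
// ~80% call reduction comes from.
function weeklyInsightsKey(studentId: string, now = new Date()): string {
  const jan1 = new Date(Date.UTC(now.getUTCFullYear(), 0, 1));
  const week = Math.ceil(((now.getTime() - jan1.getTime()) / 86_400_000 + 1) / 7);
  return `parent-insights:${studentId}:${now.getUTCFullYear()}-W${week}`;
}
```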
1.4 LLM Provider Factory Pattern (Architecture)
Location: /src/evaluations/providers/llm-provider.factory.ts
Design Pattern
Factory pattern enables runtime provider switching without code changes:
```typescript
export type LLMProviderType = 'mistral' | 'mock' | 'openai' | 'claude';

async getProvider(): Promise<LLMProvider> {
  const providerType = this.configService.get<string>('LLM_PROVIDER', 'mock');
  switch (providerType.toLowerCase()) {
    case 'mistral':
      return this.mistralProvider;
    case 'openai':
      // Not yet implemented; falls back to mock
      return this.mockProvider;
    case 'claude':
      // Not yet implemented; falls back to mock
      return this.mockProvider;
    case 'mock':
    default:
      return this.mockProvider;
  }
}
```
Provider Interface
All providers must implement LLMProvider interface:
```typescript
interface LLMProvider {
  isAvailable(): Promise<boolean>;
  getProviderName(): string;
  evaluateSubmission(request: EvaluationRequest): Promise<EvaluationResponse>;
  generateCoachFeedback(request, score, feedback): Promise<CoachFeedbackResponse>;
  generateParentInsights(request): Promise<ParentInsightsResponse>;
}
```
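For reference, a minimal provider satisfying this contract looks like the sketch below (request/response shapes are stand-ins for the real DTOs in llm-provider.interface.ts):

```typescript
// Minimal illustrative implementation of the LLMProvider contract.
// Shapes below are stand-ins for the real EvaluationRequest/Response DTOs.
type EvaluationRequest = { submissionText: string };
type EvaluationResponse = { scores: Record<string, number>; confidence: number };
type CoachFeedbackResponse = { guidance: string };
type ParentInsightsResponse = { goingWell: string[]; focusAreas: string[]; waysToSupport: string[] };

class MockLLMProvider {
  async isAvailable(): Promise<boolean> {
    return true; // the mock is always "up"
  }
  getProviderName(): string {
    return 'mock';
  }
  async evaluateSubmission(_req: EvaluationRequest): Promise<EvaluationResponse> {
    return { scores: { codeQuality: 85 }, confidence: 90 }; // synthetic data
  }
  async generateCoachFeedback(
    _req: EvaluationRequest, _score: number, _feedback: string,
  ): Promise<CoachFeedbackResponse> {
    return { guidance: 'Synthetic coach guidance for testing.' };
  }
  async generateParentInsights(_req: EvaluationRequest): Promise<ParentInsightsResponse> {
    return { goingWell: [], focusAreas: [], waysToSupport: [] };
  }
}
```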
Available Providers
1. Mistral (✅ Active)
   - Model: open-mistral-7b
   - Status: Production
   - Cost: $0.14/M input, $0.42/M output tokens
2. Mock (✅ Active)
   - Status: Testing/Development
   - Returns synthetic data for testing
3. OpenAI (⏳ Planned)
   - Model: gpt-4 (or gpt-4-turbo)
   - Status: Placeholder, not implemented
   - Cost: $30/M input, $60/M output tokens (gpt-4)
4. Claude (⏳ Planned)
   - Model: claude-3-5-sonnet-20241022
   - Status: Placeholder, interface ready
   - Cost: $3/M input, $15/M output tokens
2. Planned AI Features (Phase 4+)
2.1 Automated Grading Workflow
Location: /src/workflows/automated-grading/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory (Mistral primary, OpenAI/Claude ready)
Overview
Streamlines grading process with AI-generated scores and bulk operations.
Features
AI-Powered Grading:
- Auto-generate initial scores based on rubrics
- Coaches review queue before publishing
- Manual score override capability
- Automatic publication (COMMUNITY) vs. review queue (TEAM+)
Review Queue System (TEAM/ENTERPRISE):
- AI-generated scores pending coach review
- Filter by confidence level (high/medium/low)
- Batch review operations
- Notes & annotations
Bulk Grading (TEAM/ENTERPRISE):
- Grade multiple submissions with AI
- Consistent rubric application
- Custom rubric support
Token Estimation
Per Submission Grade Generation:
- Input: Assignment rubric + submission content
- ~2000-3000 tokens input
- ~500-800 tokens output (JSON scores + feedback)
- Total: ~2500-3800 tokens
Tier Usage:
- COMMUNITY: 10 uses/month (auto-publish)
- TEAM: 100 uses/month (with review queue)
- ENTERPRISE: Unlimited
Monthly Tokens:
- COMMUNITY: 200 coaches × 10 × 3000 = 6M tokens
- TEAM: 100 coaches × 100 × 3000 = 30M tokens
- ENTERPRISE: 50 orgs × 500 submissions (assumed; tier is unlimited) × 3000 = 75M tokens
- Potential Total: ~111M tokens/month (if all tiers maximize usage)
Note: These workflows currently use the placeholder (mock) provider, so no actual tokens are consumed yet.
2.2 Parent Communication Generator
Location: /src/workflows/parent-communication/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory
Overview
AI-generates personalized parent communications from templates.
Communication Types
- Progress Report - Summary of student achievement/growth
- Concern Alert - Intervention needed, areas for support
- Achievement Celebration - Highlight strengths & wins
- General Update - Class-wide announcements
- Custom - Coach-created templates
Token Estimation
Per Communication Generation:
- Input: Student data + template + context
- ~1500-2000 tokens (student history, rubric, notes)
- ~600-900 tokens output (personalized message)
- Total: ~2200-2900 tokens per communication
Tier Usage:
- COMMUNITY: 5 uses/month, copy-to-clipboard only
- TEAM: 50 uses/month, email sending
- ENTERPRISE: Unlimited, scheduled sending
Monthly Tokens:
- COMMUNITY: 500 coaches × 5 × 2500 = 6.25M tokens
- TEAM: 200 coaches × 50 × 2500 = 25M tokens
- ENTERPRISE: 100 orgs × 200 × 2500 = 50M tokens
- Potential Total: ~81.25M tokens/month
2.3 Personalized Learning Paths
Location: /src/workflows/learning-paths/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory
Overview
AI-generates customized multi-week learning plans based on student performance.
Features
Skill Gap Analysis:
- Compare current vs. target skill levels
- Identify 3-5 priority improvement areas
- Recommend specific resources/exercises
Milestone Planning:
- Weekly milestones over N weeks
- Recommended activities per milestone
- Difficulty progression (beginner → advanced)
- Estimated completion time
Progress Tracking:
- Mark milestones as complete
- Coaches can customize path (TEAM+)
- Visual progress indicators
Token Estimation
Per Learning Path Generation:
- Input: Student skills, goals, history
- ~2000-2500 tokens (comprehensive skill profile)
- ~1500-2000 tokens output (detailed milestone plan)
- Total: ~3500-4500 tokens per path
Tier Usage:
- COMMUNITY: 1 generation (view-only)
- TEAM: 20 generations/month
- ENTERPRISE: Unlimited
Monthly Tokens:
- COMMUNITY: 200 coaches × 1 × 4000 = 0.8M tokens (low, mostly one-time)
- TEAM: 150 coaches × 20 × 4000 = 12M tokens
- ENTERPRISE: 100 orgs × 100 × 4000 = 40M tokens
- Potential Total: ~52.8M tokens/month
2.4 Assignment Creation Assistant
Location: /src/workflows/assignment-creation/
Status: ✅ Backend API complete (Frontend pending)
Provider: LLM Factory
Overview
AI-generates complete assignments with rubrics, sample solutions, teaching notes.
Features
Assignment Generation:
- Topic/prompt input by coach
- Auto-generate assignment description
- Create comprehensive rubric
- Generate sample solution/answer key
- Provide teaching notes & tips
Assignment Types:
- Problem Set
- Project
- Essay
- Lab
- Presentation
- Quiz
Library Management:
- Save generated assignments for reuse
- Share across workspace (TEAM+)
- Clone and customize existing assignments
Token Estimation
Per Assignment Generation:
- Input: Topic, difficulty level, grade level
- ~500-800 tokens (brief prompt)
- Output: Full assignment + rubric + solution + notes
- ~3000-4000 tokens output (comprehensive)
- Total: ~3500-4800 tokens per assignment
Tier Usage:
- COMMUNITY: 0 (not available)
- TEAM: 20 generations/month
- ENTERPRISE: Unlimited
Monthly Tokens:
- TEAM: 150 coaches × 20 × 4000 = 12M tokens
- ENTERPRISE: 100 orgs × 50 × 4000 = 20M tokens
- Potential Total: ~32M tokens/month
3. AI Cost Analysis
3.1 Current Model Costs (Mistral)
Mistral Pricing (as of Dec 2024):
- open-mistral-7b: $0.14/M input, $0.42/M output tokens
- mistral-large-latest: $0.27/M input, $0.81/M output tokens
- mistral-small-latest: $0.07/M input, $0.21/M output tokens
3.2 Current Monthly Spend (Active Features)
STEM Evaluation (~4.8M tokens/month):
- Input: ~4.0M tokens × $0.14/M = $0.56/month
- Output: ~0.8M tokens × $0.42/M = $0.34/month
- Subtotal: ~$0.90/month
English Writing (~2.5M tokens/month, Mistral):
- Moderation (mistral-small): ~0.4M output tokens ≈ $0.08/month
- Feedback + Assessment (mistral-large, output-heavy): ~2.1M tokens ≈ $1.50/month
- Subtotal: ~$1.60/month
Parent Insights (Cached):
- ~1.5M tokens × ~$0.14/M ≈ $0.21/month
Coach Feedback Generation:
- ~1,300 evaluations × ~2,500 tokens ≈ 3.3M tokens ≈ $0.65/month
Current Total Monthly Spend: ~$3.40 Annual Current Spend: ~$40
3.3 Projected Cost at Full Scale (All Workflows Active)
Scenario: all tiers (COMMUNITY, TEAM, ENTERPRISE) at maximum usage. Figures below are tokens per month.
| Workflow | COMMUNITY | TEAM | ENTERPRISE | Total |
|---|---|---|---|---|
| STEM Evaluation | 1.26M | 30M | 18.9M | 50.16M |
| English Writing | 6.25M | 25M | 50M | 81.25M |
| Auto-Grading | 6M | 30M | 75M | 111M |
| Parent Comm | 6.25M | 25M | 50M | 81.25M |
| Learning Paths | 0.8M | 12M | 40M | 52.8M |
| Assignment Creation | - | 12M | 20M | 32M |
| TOTAL TOKENS | 20.56M | 134M | 253.9M | 408.46M |
| Mistral Cost (at $0.14/M input rate) | ~$2.90 | ~$18.80 | ~$35.50 | ~$57/month |
Annual Projected Cost (All Workflows): ~$57/month × 12 ≈ $686/year (output tokens at $0.42/M push this somewhat higher)
Note: This assumes maximum usage across all tiers. A realistic scenario is 30-50% of these numbers.
3.4 Cost Optimization Strategies
1. Model Tiering
Use cheaper models for simple tasks:
Moderation → mistral-small ($0.07/$0.21)
Feedback → mistral-large ($0.27/$0.81)
Complex Analysis → GPT-4 ($30/$60/M) only when needed
2. Caching Strategy
- Current: Parent insights cached 1 hour
- Potential: Cache all evaluation results (1-7 days)
- Savings: 60-80% reduction on duplicate submissions
3. Batch Processing
- Group submissions by assignment
- Evaluate multiple submissions in single API call
- Savings: 30-40% reduction on per-submission costs
4. Tier-Based Limits
- COMMUNITY: No AI-powered workflows (mock only)
- TEAM: Limited monthly token budgets
- ENTERPRISE: Pay-as-you-go with discounts
5. Provider Switching
- Current: Mistral (cost-effective, ~$0.14/$0.42)
- Switch to Claude 3.5 Haiku: $0.80/$4 (more expensive)
- Switch to Claude 3.5 Sonnet: $3/$15 (higher quality, expensive)
- Keep OpenAI: $30/$60 (GPT-4, ultra-expensive)
Recommendation: Stay with Mistral for high-volume evaluations, use Claude for specialized tasks (writing feedback quality).
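To make strategy 3 (batch processing) concrete: fold several submissions into one prompt and ask for a single JSON array back. A sketch, where callModel stands in for whichever provider call is configured:

```typescript
// Illustrative batch grading: N submissions, one API call, one JSON array
// back. callModel is a stand-in for the configured provider's completion call.
declare function callModel(prompt: string): Promise<string>;

interface Submission { id: string; content: string }

async function gradeBatch(rubric: string, submissions: Submission[]) {
  const prompt = [
    `Grade each submission against this rubric:\n${rubric}`,
    `Return a JSON array of {id, score, feedback} objects.`,
    ...submissions.map((s) => `--- Submission ${s.id} ---\n${s.content}`),
  ].join('\n\n');

  // One request amortizes the rubric/system tokens across the whole batch,
  // which is where the 30-40% per-submission saving comes from.
  const raw = await callModel(prompt);
  return JSON.parse(raw) as Array<{ id: string; score: number; feedback: string }>;
}
```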
4. Critical AI Workloads
By Frequency (Per Year)
| Workload | Frequency | Volume | Criticality |
|---|---|---|---|
| STEM Evaluation | On submission | ~50K/year | CRITICAL (core feature) |
| English Writing | On submission | ~25K/year | HIGH (revenue-driver) |
| Parent Communication | On-demand | ~10K/year | HIGH (parent engagement) |
| Parent Insights | On dashboard load | ~100K/year | MEDIUM (cached heavily) |
| Coach Feedback | On override | ~15K/year | MEDIUM (optional) |
| Learning Paths | Monthly | ~5K/year | LOW (premium feature) |
| Assignment Creation | Weekly | ~3K/year | LOW (admin use) |
| Auto-Grading | On submission | ~20K/year | MEDIUM (workflow automation) |
By Revenue Impact
1. STEM Evaluation ⭐⭐⭐⭐⭐
   - Core differentiator
   - Impacts every user
   - Must be reliable & fast
   - 24/7 SLA needed
2. English Writing ⭐⭐⭐⭐⭐
   - Separate product offering
   - Tier-locked (TEAM+ for coaches)
   - High conversion driver
   - Critical for PLG conversion
3. Parent Communication ⭐⭐⭐⭐
   - Parent engagement driver
   - Reduces churn
   - Tier-locked (TEAM for coaches)
   - High user satisfaction
4. Parent Insights ⭐⭐⭐⭐
   - Retention feature
   - Dashboard visibility
   - Heavily cached (low cost)
   - Important for parent value proposition
5. Learning Paths ⭐⭐⭐
   - Premium feature (TEAM+)
   - Lower adoption currently
   - Growing importance for personalization
6. Assignment Creation ⭐⭐⭐
   - Time-saver for coaches
   - Lower adoption (advanced users)
   - Nice-to-have, not must-have
5. Provider Readiness Status
Current Implementation
| Provider | Location | Status | Used For | Cost |
|---|---|---|---|---|
| Mistral | mistral-llm.provider.ts | ✅ Production | STEM, Writing, Parent Insights | $0.14/$0.42 |
| Mock | mock-llm.provider.ts | ✅ Testing | Development/Testing | Free |
| Claude | Placeholder interface ready | ⏳ Planned | Writing (future) | $3/$15 |
| OpenAI | Placeholder interface ready | ⏳ Planned | Fallback option | $30/$60 |
Provider Switch Capability
Once a target provider is implemented, switching is a pure configuration change:
# Current
LLM_PROVIDER=mistral
MISTRAL_API_KEY=xxx
MISTRAL_MODEL=open-mistral-7b
# To switch to Claude
LLM_PROVIDER=claude
ANTHROPIC_API_KEY=xxx
ANTHROPIC_MODEL=claude-3-5-sonnet-20241022
Implementation effort: ~2-4 hours per missing provider (implement against the existing interface, test, deploy)
6. Database Schema for AI Features
Evaluation Storage
- Evaluation model - STEM evaluation results with AI scores & confidence
- WritingSubmission - Student writing submission
- WritingModeration - Content moderation results
- WritingFeedback - AI-generated feedback
- WritingAssessment - Writing scores
Workflow Storage
- ReviewQueueItem - Pending coach review items
- ParentCommunication - Generated communications
- CommunicationTemplate - Template library
- LearningPath - AI-generated learning paths
- AssignmentLibraryItem - Reusable assignments
Subscription Controls
- SubscriptionTier enum - COMMUNITY, PRO, TEAM, ENTERPRISE
- FeatureFlag enum - Feature gates (AI_GRADING, ENGLISH_WRITING_WORKFLOW, etc.)
- UsageRecord - Token usage tracking per user/month
7. Token Usage Tracking
Current Implementation
- SubscriptionsService tracks usage by feature
- Monthly limits enforced per tier:
- COMMUNITY: Limited (0-10 uses/month)
- TEAM: Moderate (20-100 uses/month)
- ENTERPRISE: Unlimited
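A simplified view of that enforcement (feature names and limits below mirror the usage-limit table in section 9; the real logic lives in SubscriptionsService):

```typescript
// Illustrative tier limit check: usage is counted per feature per month
// and compared against the tier's allowance (Infinity = unlimited).
type Tier = 'COMMUNITY' | 'TEAM' | 'ENTERPRISE';

const MONTHLY_LIMITS: Record<string, Record<Tier, number>> = {
  STEM_EVALUATION: { COMMUNITY: 10, TEAM: 100, ENTERPRISE: Infinity },
  PARENT_COMMUNICATION: { COMMUNITY: 5, TEAM: 50, ENTERPRISE: Infinity },
};

function canUseFeature(feature: string, tier: Tier, usedThisMonth: number): boolean {
  const limit = MONTHLY_LIMITS[feature]?.[tier] ?? 0; // unknown feature: deny
  return usedThisMonth < limit;
}
```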
Missing: Token-Level Tracking
Currently tracking by feature uses, not tokens:
- ✅ Know: "User made 25 grading calls"
- ❌ Don't know: "User consumed 75K tokens"
Recommendation
Add optional token-level tracking:
```typescript
interface TokenUsageRecord {
  userId: string;
  feature: FeatureFlag;
  tokensInput: number;
  tokensOutput: number;
  model: string;
  cost: number;
  timestamp: Date;
}
```
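Pairing each record with the per-model rates from section 3.1 lets cost be computed at write time. A sketch:

```typescript
// Illustrative cost calculation for a TokenUsageRecord, using the
// per-million-token Mistral rates quoted in section 3.1.
const RATES_PER_MILLION: Record<string, { input: number; output: number }> = {
  'open-mistral-7b':      { input: 0.14, output: 0.42 },
  'mistral-large-latest': { input: 0.27, output: 0.81 },
  'mistral-small-latest': { input: 0.07, output: 0.21 },
};

function usageCost(model: string, tokensInput: number, tokensOutput: number): number {
  const rate = RATES_PER_MILLION[model];
  if (!rate) return 0; // unknown model: record tokens, skip cost
  return (tokensInput * rate.input + tokensOutput * rate.output) / 1_000_000;
}

// Example: one STEM evaluation (~3000 in / ~600 out on open-mistral-7b)
// costs roughly $0.0007, well under a tenth of a cent.
const perEval = usageCost('open-mistral-7b', 3000, 600);
```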
8. Quality & Reliability
AI Confidence Metrics
STEM Evaluation:
- AI provides confidence score (0-100%)
- Scores <70% flagged for manual review
- Coach override capability on all scores
English Writing:
- Three-stage pipeline with early-exit for content issues
- Flagged content stops evaluation before scoring
- Assessment confidence implicit (not scored)
Parent Insights:
- No confidence metric currently
- Plain-language output (less critical for errors)
Error Handling
Rate Limiting:
- Mistral: 2000ms minimum between requests
- Exponential backoff on 429 errors (capacity exceeded)
- Max 5 retries before failure
Caching:
- STEM evaluation: 1-hour cache
- Writing moderation: No cache (always fresh)
- Parent insights: 1-week cache
Fallback Strategy:
- If Mistral unavailable, falls back to Mock provider
- Mock returns synthetic but reasonable data
- No service downtime (degrades gracefully)
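That degradation path amounts to an availability check before each hand-off. A sketch against the provider interface from section 1.4:

```typescript
// Illustrative fallback: prefer the configured provider, degrade to the
// mock when it reports itself unavailable, so evaluations never hard-fail.
interface Provider {
  isAvailable(): Promise<boolean>;
  getProviderName(): string;
}

async function pickProvider<T extends Provider>(primary: T, mock: T): Promise<T> {
  if (await primary.isAvailable().catch(() => false)) return primary;
  console.warn(`${primary.getProviderName()} unavailable; using mock provider`);
  return mock;
}
```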
Prompt Engineering
STEM Evaluation Prompts:
- Student-focused: Encouraging, age-appropriate language
- Coach-focused: Technical depth, pedagogical guidance
- Parent-focused: Plain-language insights, no jargon
Writing Evaluation Prompts:
- Grade-level aware (K-2, 3-5, 6-8, 9-12)
- Moderation: Content appropriateness check
- Feedback: Strengths + improvements (age-appropriate)
- Assessment: Scoring rubric with grade equivalents
9. Feature Access Control
Tier-Based AI Access
| Feature | COMMUNITY | PRO | TEAM | ENTERPRISE |
|---|---|---|---|---|
| STEM Evaluation | View only | Full | Full | Full |
| English Writing | View only | Limited (parents) | Full | Full |
| Parent Insights | Yes (read-only) | Yes | Yes | Yes |
| Coach Feedback | No | No | Yes | Yes |
| Auto Grading | No | No | Limited | Full |
| Learning Paths | No | No | Limited | Full |
| Parent Comm | No | No | Limited | Full |
| Assignment Creation | No | No | Limited | Full |
Usage Limits
| Feature | COMMUNITY | TEAM | ENTERPRISE |
|---|---|---|---|
| STEM Evaluations | 10/month | 100/month | Unlimited |
| Writing Submissions | 0 | 20/month | Unlimited |
| Parent Communications | 5/month | 50/month | Unlimited |
| Learning Paths | 1 (one-time) | 20/month | Unlimited |
| Assignments Created | 0 | 20/month | Unlimited |
10. Recommendations & Next Steps
Immediate (Next 30 Days)
1. Implement Token Tracking
   - Add token-level usage records
   - Monitor actual vs. projected spend
   - Alert when any feature's spend exceeds a budget threshold (e.g., $1,000/month)
2. Optimize Caching
   - Extend STEM evaluation cache to 7 days
   - Implement deduplication (same files = same score; see the sketch after this list)
   - Potential savings: 40-60% cost reduction
3. Monitor Mistral Performance
   - Track response times
   - Monitor error rates
   - Prepare fallback to Claude if needed
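The deduplication idea in item 2 can key the evaluation cache on a hash of the submitted files rather than the submission ID. A sketch using Node's built-in crypto module:

```typescript
// Illustrative dedup key: identical file contents hash to the same key,
// so resubmissions of unchanged work reuse the cached evaluation.
import { createHash } from 'node:crypto';

function evaluationCacheKey(files: Array<{ name: string; content: string }>): string {
  const hash = createHash('sha256');
  // Sort by name so file ordering doesn't change the key.
  for (const f of [...files].sort((a, b) => a.name.localeCompare(b.name))) {
    hash.update(f.name).update('\0').update(f.content);
  }
  return `stem-eval:${hash.digest('hex')}`;
}
```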
Medium-term (60-90 Days)
1. Implement Claude Writer
   - Activate the staged Claude provider for English writing
   - A/B test: Mistral vs. Claude feedback quality
   - Benefit: Better writing feedback, justifies premium tier
2. Add Cost Dashboard
   - Admin view of daily/monthly spending
   - Feature-by-feature cost breakdown
   - Alerts on unusual spikes
3. Optimize Prompt Performance
   - User testing of feedback quality
   - Refine rubrics & scoring criteria
   - Target: Same quality with 10-20% fewer tokens
Long-term (6+ Months)
1. Multi-Model Strategy
   - Keep Mistral for bulk evaluations (cost-effective)
   - Use Claude for premium writing feedback (quality)
   - Use GPT-4 for specialized analysis (fallback)
2. Fine-tuning Exploration
   - Fine-tune Mistral on STEM evaluation data
   - Fine-tune on writing feedback patterns
   - Potential savings: 30-50% cost reduction with improved quality
3. Caching Layer Upgrade
   - Move to Redis for distributed caching
   - Implement semantic similarity matching (same content = same score)
   - Cache across multiple submissions where appropriate
11. Critical Success Factors
For Revenue
- ✅ English Writing must be high quality (justifies TEAM tier)
- ✅ Parent Communication must drive retention
- ✅ Costs must stay <10% of revenue
For User Experience
- ✅ Evaluation confidence must be reliable (coaches trust AI)
- ✅ Feedback must be actionable (students improve)
- ✅ Response time must be <5 seconds (users expect speed)
For Business Sustainability
- ✅ Cost per evaluation < $0.10 (ensure profitability at scale)
- ✅ Cache hit rate > 40% (reduce redundant calls)
- ✅ Paid-tier adoption > 30% (most users otherwise stay on the free tier)
Appendix: File Locations
Core AI Evaluation
- /src/evaluations/evaluations.service.ts - Main evaluation logic
- /src/evaluations/providers/llm-provider.factory.ts - Provider factory
- /src/evaluations/providers/mistral-llm.provider.ts - Mistral implementation
- /src/evaluations/providers/llm-provider.interface.ts - Provider interface
- /src/evaluations/providers/mock-llm.provider.ts - Mock for testing
English Writing Workflow
- /src/workflows/english-writing/english-writing.service.ts - Main service
- /src/workflows/english-writing/providers/mistral-writing.provider.ts - Mistral writer
- /src/workflows/english-writing/providers/claude-writing.provider.ts - Claude (ready)
- /src/workflows/english-writing/providers/writing-evaluator.factory.ts - Provider factory
Other Workflows (Backend Ready, Frontend Pending)
- /src/workflows/automated-grading/ - Grading workflow
- /src/workflows/parent-communication/ - Communication generator
- /src/workflows/learning-paths/ - Learning path generator
- /src/workflows/assignment-creation/ - Assignment creator
Subscription Control
- /src/subscriptions/subscriptions.service.ts - Usage tracking & tier enforcement
Configuration
- /.env - API keys & model selection
- /prisma/schema.prisma - Database models (FeatureFlag, SubscriptionTier, UsageRecord)
Document Version: 1.0 Last Updated: December 21, 2025 Status: Complete & Ready for Implementation Next Review: Quarterly (Mar 2026)