milwright
committed on
Commit
·
fed2199
1
Parent(s):
56287e6
simplify ui language and integrate dual gemma models
- use gemma-3-27b for hints, gemma-3-12b for word selection
- simplify score display and progression messages
- add passage tracking (1/2, 2/2) in header
- clarify "2 passages per round, 2 rounds per level" system
- LEADERBOARD_ROADMAP.md +0 -171
- README-testing-framework.md +217 -0
- index.html +8 -0
- model-testing.html +629 -0
- src/aiService.js +214 -107
- src/app.js +6 -8
- src/clozeGameEngine.js +3 -3
- src/modelTestingFramework.js +703 -0
- src/testAIService.js +154 -0
- src/testGameRunner.js +473 -0
- src/testReportGenerator.js +453 -0
- src/userRankingInterface.js +650 -0
- src/welcomeOverlay.js +2 -2
- test-prompts-lm-studio.md +0 -262
LEADERBOARD_ROADMAP.md
DELETED
|
@@ -1,171 +0,0 @@
|
|
| 1 |
-
# Cloze Reader Leaderboard Implementation Roadmap
|
| 2 |
-
|
| 3 |
-
## Overview
|
| 4 |
-
This document outlines the implementation plan for adding a competitive leaderboard system to the Cloze Reader game, where players can submit their scores using 3-letter acronyms.
|
| 5 |
-
|
| 6 |
-
## Phase 1: Core Infrastructure (Week 1-2)
|
| 7 |
-
|
| 8 |
-
### 1.1 Database Schema
|
| 9 |
-
- Create leaderboard table structure:
|
| 10 |
-
```sql
|
| 11 |
-
leaderboard {
|
| 12 |
-
id: UUID
|
| 13 |
-
acronym: VARCHAR(3)
|
| 14 |
-
score: INTEGER
|
| 15 |
-
level_reached: INTEGER
|
| 16 |
-
total_time: INTEGER (seconds)
|
| 17 |
-
created_at: TIMESTAMP
|
| 18 |
-
ip_hash: VARCHAR(64) // For rate limiting
|
| 19 |
-
}
|
| 20 |
-
```
|
| 21 |
-
|
| 22 |
-
### 1.2 API Endpoints
|
| 23 |
-
- `POST /api/leaderboard/submit` - Submit new score
|
| 24 |
-
- `GET /api/leaderboard/top/{period}` - Get top scores (daily/weekly/all-time)
|
| 25 |
-
- `GET /api/leaderboard/check-acronym/{acronym}` - Validate acronym availability
|
| 26 |
-
|
| 27 |
-
### 1.3 Score Calculation
|
| 28 |
-
- Base score = (correct_answers * 100) * level_multiplier
|
| 29 |
-
- Time bonus = max(0, 1000 - seconds_per_round)
|
| 30 |
-
- Streak bonus = consecutive_correct * 50
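The three scoring rules from the removed roadmap can be sketched as below. The roadmap does not say how the components are combined or how `level_multiplier` is derived, so the simple sum and the `computeScore` name are assumptions made for illustration.

```javascript
// Hypothetical helper: the roadmap lists three components but not how they
// are combined, so adding them together is an assumption.
function computeScore({ correctAnswers, levelMultiplier, secondsPerRound, consecutiveCorrect }) {
  const baseScore = correctAnswers * 100 * levelMultiplier;
  const timeBonus = Math.max(0, 1000 - secondsPerRound); // never negative
  const streakBonus = consecutiveCorrect * 50;
  return { baseScore, timeBonus, streakBonus, total: baseScore + timeBonus + streakBonus };
}

const example = computeScore({
  correctAnswers: 4,
  levelMultiplier: 2,
  secondsPerRound: 45,
  consecutiveCorrect: 3,
});
console.log(example);
```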
|
| 31 |
-
|
| 32 |
-
## Phase 2: Frontend Integration (Week 2-3)
|
| 33 |
-
|
| 34 |
-
### 2.1 UI Components
|
| 35 |
-
- **Leaderboard Modal** (`leaderboardModal.js`)
|
| 36 |
-
- Top 10 display with rank, acronym, score, level
|
| 37 |
-
- Period toggle (Today/Week/All-Time)
|
| 38 |
-
- Personal best highlight
|
| 39 |
-
|
| 40 |
-
### 2.2 Score Submission Flow
|
| 41 |
-
- End-of-game prompt for acronym entry
|
| 42 |
-
- 3-letter validation (A-Z only)
|
| 43 |
-
- Profanity filter implementation
|
| 44 |
-
- Success/error feedback
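A minimal sketch of the acronym check described in this flow, assuming validation runs on the raw input without case normalization. The `validateAcronym` name and the one-entry blocklist are placeholders; the roadmap does not specify a profanity list or service.

```javascript
// Placeholder blocklist; a real deployment would need its own list or service.
const BLOCKED_ACRONYMS = new Set(["XXX"]);

function validateAcronym(input) {
  if (typeof input !== "string") return { ok: false, reason: "not-a-string" };
  // Exactly three characters, A-Z only, as stated in the roadmap.
  if (!/^[A-Z]{3}$/.test(input)) return { ok: false, reason: "format" };
  if (BLOCKED_ACRONYMS.has(input)) return { ok: false, reason: "blocked" };
  return { ok: true };
}

console.log(validateAcronym("ABC"));
console.log(validateAcronym("ab1"));
```

Returning a reason code rather than a bare boolean makes it straightforward to drive the success/error feedback step.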
|
| 45 |
-
|
| 46 |
-
### 2.3 Visual Elements
|
| 47 |
-
- Trophy icons for top 3 positions
|
| 48 |
-
- Animated score counter
|
| 49 |
-
- Level badges display
|
| 50 |
-
|
| 51 |
-
## Phase 3: Security & Performance (Week 3-4)
|
| 52 |
-
|
| 53 |
-
### 3.1 Anti-Cheat Measures
|
| 54 |
-
- Server-side score validation
|
| 55 |
-
- Rate limiting (1 submission per 5 minutes per IP)
|
| 56 |
-
- Score feasibility checks (max possible score per level)
|
| 57 |
-
- Request signing with session tokens
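The rate-limiting rule (one submission per 5 minutes per IP) could look roughly like the in-memory sketch below. It assumes a single process and a hashed IP as the key; `createRateLimiter` is a hypothetical name, and a multi-instance deployment would need shared storage instead of a `Map`.

```javascript
const WINDOW_MS = 5 * 60 * 1000; // 5 minutes, per the roadmap

function createRateLimiter(windowMs = WINDOW_MS) {
  const lastAccepted = new Map(); // ipHash -> timestamp (ms) of last accepted submission
  return function allow(ipHash, now = Date.now()) {
    const previous = lastAccepted.get(ipHash);
    if (previous !== undefined && now - previous < windowMs) {
      return false; // still inside the window, reject
    }
    lastAccepted.set(ipHash, now);
    return true;
  };
}

const allow = createRateLimiter();
console.log(allow("hash-a", 0), allow("hash-a", 1000));
```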
|
| 58 |
-
|
| 59 |
-
### 3.2 Caching Strategy
|
| 60 |
-
- Redis cache for top 100 scores
|
| 61 |
-
- 5-minute TTL for leaderboard queries
|
| 62 |
-
- Real-time updates for top 10 changes
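The 5-minute TTL behaviour can be illustrated with a small in-memory cache. This is only a stand-in for the Redis cache named in the roadmap, and `createTtlCache` is a hypothetical helper, not part of the codebase.

```javascript
function createTtlCache(ttlMs = 5 * 60 * 1000) {
  const entries = new Map(); // key -> { value, expiresAt }
  return {
    get(key, now = Date.now()) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now >= entry.expiresAt) {
        entries.delete(key); // expired, drop it so the caller re-queries
        return undefined;
      }
      return entry.value;
    },
    set(key, value, now = Date.now()) {
      entries.set(key, { value, expiresAt: now + ttlMs });
    },
  };
}

const cache = createTtlCache();
cache.set("top:daily", [{ acronym: "ABC", score: 1905 }]);
console.log(cache.get("top:daily"));
```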
|
| 63 |
-
|
| 64 |
-
### 3.3 Data Persistence
|
| 65 |
-
- PostgreSQL for primary storage
|
| 66 |
-
- Daily backups of leaderboard data
|
| 67 |
-
- Archived monthly snapshots
|
| 68 |
-
|
| 69 |
-
## Phase 4: Advanced Features (Week 4-5)
|
| 70 |
-
|
| 71 |
-
### 4.1 Achievement System
|
| 72 |
-
- "First Timer" - First leaderboard entry
|
| 73 |
-
- "Vocabulary Master" - 10+ correct in a row
|
| 74 |
-
- "Speed Reader" - Complete round < 30 seconds
|
| 75 |
-
- "Persistent Scholar" - Play 7 days straight
|
| 76 |
-
|
| 77 |
-
### 4.2 Social Features
|
| 78 |
-
- Share score to social media
|
| 79 |
-
- Challenge link generation
|
| 80 |
-
- Friend acronym tracking
|
| 81 |
-
|
| 82 |
-
### 4.3 Analytics Dashboard
|
| 83 |
-
- Player retention metrics
|
| 84 |
-
- Popular acronym analysis
|
| 85 |
-
- Score distribution graphs
|
| 86 |
-
|
| 87 |
-
## Technical Implementation Details
|
| 88 |
-
|
| 89 |
-
### Backend Changes Required
|
| 90 |
-
|
| 91 |
-
1. **FastAPI Endpoints** (`app.py`):
|
| 92 |
-
```python
|
| 93 |
-
@app.post("/api/leaderboard/submit")
|
| 94 |
-
async def submit_score(score_data: ScoreSubmission)
|
| 95 |
-
|
| 96 |
-
@app.get("/api/leaderboard/top/{period}")
|
| 97 |
-
async def get_leaderboard(period: str, limit: int = 10)
|
| 98 |
-
```
|
| 99 |
-
|
| 100 |
-
2. **Database Models** (`models.py` - new file):
|
| 101 |
-
```python
|
| 102 |
-
class LeaderboardEntry(Base):
|
| 103 |
-
__tablename__ = "leaderboard"
|
| 104 |
-
# Schema implementation
|
| 105 |
-
```
|
| 106 |
-
|
| 107 |
-
3. **Validation Service** (`validation.py` - new file):
|
| 108 |
-
- Acronym format validation
|
| 109 |
-
- Profanity checking
|
| 110 |
-
- Score feasibility verification
|
| 111 |
-
|
| 112 |
-
### Frontend Changes Required
|
| 113 |
-
|
| 114 |
-
1. **Game Engine Integration** (`clozeGameEngine.js`):
|
| 115 |
-
- Track game metrics for scoring
|
| 116 |
-
- Call submission API on game end
|
| 117 |
-
- Store session data for validation
|
| 118 |
-
|
| 119 |
-
2. **UI Updates** (`app.js`):
|
| 120 |
-
- Add leaderboard button to main menu
|
| 121 |
-
- Integrate submission modal
|
| 122 |
-
- Handle API responses
|
| 123 |
-
|
| 124 |
-
3. **New Modules**:
|
| 125 |
-
- `leaderboardService.js` - API communication
|
| 126 |
-
- `scoreCalculator.js` - Client-side scoring logic
|
| 127 |
-
- `leaderboardUI.js` - UI component management
|
| 128 |
-
|
| 129 |
-
## Deployment Considerations
|
| 130 |
-
|
| 131 |
-
### Infrastructure Requirements
|
| 132 |
-
- Database: PostgreSQL 14+
|
| 133 |
-
- Cache: Redis 6+
|
| 134 |
-
- API rate limiting: nginx or API Gateway
|
| 135 |
-
- SSL certificate for secure submissions
|
| 136 |
-
|
| 137 |
-
### Environment Variables
|
| 138 |
-
```
|
| 139 |
-
DATABASE_URL=postgresql://...
|
| 140 |
-
REDIS_URL=redis://...
|
| 141 |
-
LEADERBOARD_SECRET=... # For request signing
|
| 142 |
-
PROFANITY_API_KEY=... # Optional external service
|
| 143 |
-
```
|
| 144 |
-
|
| 145 |
-
### Migration Strategy
|
| 146 |
-
1. Deploy database schema
|
| 147 |
-
2. Enable API endpoints (feature flagged)
|
| 148 |
-
3. Gradual UI rollout (A/B testing)
|
| 149 |
-
4. Full launch with announcement
|
| 150 |
-
|
| 151 |
-
## Success Metrics
|
| 152 |
-
|
| 153 |
-
- **Engagement**: 30% of players submit scores
|
| 154 |
-
- **Retention**: 15% return to beat their score
|
| 155 |
-
- **Performance**: <100ms leaderboard load time
|
| 156 |
-
- **Security**: Zero validated cheating incidents
|
| 157 |
-
|
| 158 |
-
## Timeline Summary
|
| 159 |
-
|
| 160 |
-
- **Week 1-2**: Backend infrastructure
|
| 161 |
-
- **Week 2-3**: Frontend integration
|
| 162 |
-
- **Week 3-4**: Security hardening
|
| 163 |
-
- **Week 4-5**: Advanced features
|
| 164 |
-
- **Week 6**: Testing & deployment
|
| 165 |
-
|
| 166 |
-
## Open Questions
|
| 167 |
-
|
| 168 |
-
1. Should we allow Unicode characters in acronyms?
|
| 169 |
-
2. Reset frequency for periodic leaderboards?
|
| 170 |
-
3. Maximum entries per player per day?
|
| 171 |
-
4. Prize/reward system for top performers?
|
README-testing-framework.md
ADDED
|
@@ -0,0 +1,217 @@
|
|
| 1 |
+
# Cloze Reader Model Testing Framework
|
| 2 |
+
|
| 3 |
+
A comprehensive testing system for evaluating AI models across all tasks in the Cloze Reader application, including both OpenRouter and local LLM (LM Studio) models.
|
| 4 |
+
|
| 5 |
+
## Features
|
| 6 |
+
|
| 7 |
+
### 🎯 Comprehensive Testing
|
| 8 |
+
- **Word Selection Testing**: Evaluates vocabulary selection accuracy, difficulty matching, and response quality
|
| 9 |
+
- **Contextualization Testing**: Tests historical and literary context generation for books and authors
|
| 10 |
+
- **Chat Hints Testing**: Assesses all 4 question types (part of speech, sentence role, word category, synonym)
|
| 11 |
+
- **Performance Monitoring**: Tracks response times, success rates, and error patterns
|
| 12 |
+
- **User Satisfaction Ratings**: Collects user feedback on model performance after each round
|
| 13 |
+
|
| 14 |
+
### 🏠 Local LLM Support
|
| 15 |
+
- **LM Studio Integration**: Auto-detects models running on port 1234
|
| 16 |
+
- **Real-time Status**: Shows connection status and available models
|
| 17 |
+
- **Response Cleaning**: Handles local LLM output artifacts automatically
|
| 18 |
+
- **Fallback Testing**: Graceful handling when local server is unavailable
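Auto-detection with a graceful fallback might be sketched as follows. It assumes LM Studio's OpenAI-compatible server, which lists loaded models at `/v1/models`; the `detectLocalModels` name and the injectable `fetchImpl` parameter are illustrative and not taken from the framework's source.

```javascript
// Sketch only: queries the OpenAI-compatible model list on the local server.
async function detectLocalModels(fetchImpl = globalThis.fetch, baseUrl = "http://localhost:1234") {
  try {
    const response = await fetchImpl(`${baseUrl}/v1/models`);
    if (!response.ok) return { connected: false, models: [] };
    const body = await response.json();
    const models = Array.isArray(body.data) ? body.data.map((m) => m.id) : [];
    return { connected: true, models };
  } catch (error) {
    // Network failure: server not running, wrong port, or blocked by CORS.
    return { connected: false, models: [] };
  }
}
```

Passing the fetch function in as a parameter keeps the check testable without a running server.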
|
| 19 |
+
|
| 20 |
+
### 📊 Advanced Analytics
|
| 21 |
+
- **Multi-format Reports**: JSON, CSV, and Markdown outputs
|
| 22 |
+
- **Performance Comparisons**: Side-by-side model analysis
|
| 23 |
+
- **Quality Scoring**: Detailed evaluation metrics for each task
|
| 24 |
+
- **Interactive Game Testing**: Real-time performance monitoring during gameplay
|
| 25 |
+
- **User Ranking Integration**: 5-star ratings for word selection, passage quality, hint helpfulness, and overall experience
|
| 26 |
+
|
| 27 |
+
## Quick Start
|
| 28 |
+
|
| 29 |
+
### 1. Start the Testing Interface
|
| 30 |
+
```bash
|
| 31 |
+
# Start development server
|
| 32 |
+
make dev
|
| 33 |
+
# or
|
| 34 |
+
python local-server.py 8000
|
| 35 |
+
|
| 36 |
+
# Open testing interface
|
| 37 |
+
open http://localhost:8000/model-testing.html
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
### 2. Setup Local LLM (Optional)
|
| 41 |
+
```bash
|
| 42 |
+
# Start LM Studio server on port 1234
|
| 43 |
+
# Load your preferred model (e.g., Gemma-3-12b, Llama-3.1-8b)
|
| 44 |
+
# The framework will auto-detect available models
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
### 3. Run Tests
|
| 48 |
+
1. Select models to test (OpenRouter and/or local models)
|
| 49 |
+
2. Click "Start Comprehensive Test" for full evaluation
|
| 50 |
+
3. Or click "Test Selected Model in Game" for interactive testing
|
| 51 |
+
4. Results are automatically saved to the `output/` folder
|
| 52 |
+
|
| 53 |
+
## Test Results
|
| 54 |
+
|
| 55 |
+
### CSV Output Format
|
| 56 |
+
Results are saved as timestamped CSV files with columns for:
|
| 57 |
+
- Model performance metrics (overall score, success rates)
|
| 58 |
+
- Response time analytics (average, min, max)
|
| 59 |
+
- Task-specific scores (word selection, contextualization, chat hints)
|
| 60 |
+
- Error rates and reliability metrics
|
| 61 |
+
- User satisfaction ratings (1-5 stars per category)
|
| 62 |
+
- User comments and feedback count
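As a rough illustration of the CSV output, the sketch below serializes result rows with standard quoting and builds a timestamped file name. The column names and the `model-test-` prefix are assumptions; the README lists column categories, not exact headers.

```javascript
// Illustrative column set; the real report may use different headers.
const COLUMNS = ["model", "overallScore", "avgResponseMs", "errorRate", "userComment"];

function escapeCsv(value) {
  const text = value === null || value === undefined ? "" : String(value);
  // Quote fields containing commas, quotes, or line breaks; double inner quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(rows) {
  const header = COLUMNS.join(",");
  const lines = rows.map((row) => COLUMNS.map((col) => escapeCsv(row[col])).join(","));
  return [header, ...lines].join("\n");
}

function timestampedName(date = new Date()) {
  // Replace characters that are awkward in file names.
  return `model-test-${date.toISOString().replace(/[:.]/g, "-")}.csv`;
}

console.log(toCsv([{ model: "a", overallScore: 90, avgResponseMs: 1200, errorRate: 0, userComment: "ok" }]));
```

Quoting matters here because free-text user comments can contain commas and quotation marks.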
|
| 63 |
+
|
| 64 |
+
### Game Testing Output
|
| 65 |
+
Interactive game sessions generate JSON reports with:
|
| 66 |
+
- Real-time AI interaction logs
|
| 67 |
+
- User performance analytics
|
| 68 |
+
- Response time breakdowns
|
| 69 |
+
- Error tracking and categorization
|
| 70 |
+
- User satisfaction ratings per round
|
| 71 |
+
- Qualitative feedback and comments
|
| 72 |
+
|
| 73 |
+
## Model Categories
|
| 74 |
+
|
| 75 |
+
### OpenRouter Models
|
| 76 |
+
- GPT-4o, GPT-4o Mini
|
| 77 |
+
- Claude 3.5 Sonnet, Claude 3 Haiku
|
| 78 |
+
- Gemini Pro 1.5
|
| 79 |
+
- Llama 3.1 (8B, 70B)
|
| 80 |
+
- Mistral 7B, Phi-3 Medium, Qwen 2 7B
|
| 81 |
+
|
| 82 |
+
### Local LLM Models (LM Studio)
|
| 83 |
+
- Auto-detected from running server
|
| 84 |
+
- Supports any OpenAI-compatible model
|
| 85 |
+
- Common options: Gemma-3-12b, Llama-3.1-8b, Mistral-7b
|
| 86 |
+
|
| 87 |
+
## Testing Methodology
|
| 88 |
+
|
| 89 |
+
### Word Selection Evaluation
|
| 90 |
+
- **Accuracy**: Words exist in source passage
|
| 91 |
+
- **Difficulty Matching**: Length and complexity appropriate for level
|
| 92 |
+
- **Quality Scoring**: Avoids overly common words at higher difficulties
|
| 93 |
+
- **Performance**: Response time and success rate tracking
|
| 94 |
+
- **User Rating**: 5-star scale for vocabulary appropriateness
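The accuracy criterion (selected words must exist in the source passage) can be sketched as a simple membership check. The tokenizer below is deliberately naive and assumes English text; `wordSelectionAccuracy` is a hypothetical name, not the framework's actual `evaluate*` method.

```javascript
function tokenize(passage) {
  // Lowercase words, keeping apostrophes so contractions stay intact.
  return new Set(passage.toLowerCase().match(/[a-z']+/g) || []);
}

function wordSelectionAccuracy(passage, selectedWords) {
  if (selectedWords.length === 0) return 0;
  const vocabulary = tokenize(passage);
  const found = selectedWords.filter((word) => vocabulary.has(word.toLowerCase()));
  return found.length / selectedWords.length; // fraction of words present in the passage
}

const passage = "The quick brown fox jumps over the lazy dog.";
console.log(wordSelectionAccuracy(passage, ["quick", "lazy", "cat"]));
```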
|
| 95 |
+
|
| 96 |
+
### Contextualization Assessment
|
| 97 |
+
- **Relevance**: Mentions book title, author, historical context
|
| 98 |
+
- **Educational Value**: Appropriate for language learners
|
| 99 |
+
- **Completeness**: Balanced length (100-500 characters)
|
| 100 |
+
- **Literary Terms**: Uses appropriate academic vocabulary
|
| 101 |
+
- **User Rating**: Passage quality and educational value scoring
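Two of these criteria are mechanical enough to sketch: the 100-500 character length window and whether the title and author are mentioned. The function name and result shape are assumptions; educational value and literary vocabulary would need a separate, less mechanical check.

```javascript
function checkContextualization(text, { title, author }) {
  const lower = text.toLowerCase();
  return {
    lengthOk: text.length >= 100 && text.length <= 500, // "balanced length" window
    mentionsTitle: lower.includes(title.toLowerCase()),
    mentionsAuthor: lower.includes(author.toLowerCase()),
  };
}

const sample =
  "Pride and Prejudice, published in 1813 by Jane Austen, follows Elizabeth Bennet " +
  "as she navigates manners, marriage, and class in Regency England.";
console.log(checkContextualization(sample, { title: "Pride and Prejudice", author: "Jane Austen" }));
```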
|
| 102 |
+
|
| 103 |
+
### Chat Hints Analysis
|
| 104 |
+
- **Question Type Coverage**: All 4 hint categories tested
|
| 105 |
+
- **Educational Appropriateness**: Helps without revealing answers
|
| 106 |
+
- **Response Quality**: Clear, concise, and helpful explanations
|
| 107 |
+
- **Consistency**: Performance across different question types
|
| 108 |
+
- **User Rating**: Helpfulness and clarity of AI hints
|
| 109 |
+
|
| 110 |
+
### User Experience Rating
|
| 111 |
+
After each round, users can rate:
|
| 112 |
+
- **Word Selection Quality** (1-5 stars)
|
| 113 |
+
- **Passage Selection** (1-5 stars)
|
| 114 |
+
- **Hint Helpfulness** (1-5 stars)
|
| 115 |
+
- **Overall Experience** (1-5 stars)
|
| 116 |
+
- **Optional Comments** for detailed feedback
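Per-round ratings could be aggregated into per-category averages along these lines. The category keys are hypothetical names for the four rating categories above, and out-of-range values are simply ignored in this sketch.

```javascript
const CATEGORIES = ["wordSelection", "passageSelection", "hintHelpfulness", "overallExperience"];

function averageRatings(rounds) {
  const averages = {};
  for (const category of CATEGORIES) {
    // Keep only valid 1-5 star ratings for this category.
    const values = rounds
      .map((round) => round[category])
      .filter((v) => Number.isInteger(v) && v >= 1 && v <= 5);
    averages[category] = values.length
      ? values.reduce((sum, v) => sum + v, 0) / values.length
      : null; // no valid ratings collected
  }
  return averages;
}

console.log(averageRatings([
  { wordSelection: 4, passageSelection: 5, hintHelpfulness: 3, overallExperience: 4 },
  { wordSelection: 2, passageSelection: 9 },
]));
```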
|
| 117 |
+
|
| 118 |
+
## Architecture
|
| 119 |
+
|
| 120 |
+
### Core Components
|
| 121 |
+
- **ModelTestingFramework**: Main testing orchestrator
|
| 122 |
+
- **TestAIService**: Performance-tracking AI service wrapper
|
| 123 |
+
- **TestGameRunner**: Real-time game session monitoring
|
| 124 |
+
- **TestReportGenerator**: Multi-format report generation
|
| 125 |
+
|
| 126 |
+
### File Structure
|
| 127 |
+
```
|
| 128 |
+
src/
|
| 129 |
+
├── modelTestingFramework.js # Main testing logic
|
| 130 |
+
├── testAIService.js # AI service wrapper
|
| 131 |
+
├── testGameRunner.js # Game monitoring
|
| 132 |
+
└── testReportGenerator.js # Report generation
|
| 133 |
+
|
| 134 |
+
model-testing.html # Testing interface UI
|
| 135 |
+
output/ # Test results folder
|
| 136 |
+
```
|
| 137 |
+
|
| 138 |
+
## Usage Examples
|
| 139 |
+
|
| 140 |
+
### Automated Testing
|
| 141 |
+
```javascript
|
| 142 |
+
import { ModelTestingFramework } from './src/modelTestingFramework.js';
|
| 143 |
+
|
| 144 |
+
const framework = new ModelTestingFramework();
|
| 145 |
+
const results = await framework.runComprehensiveTest();
|
| 146 |
+
console.log('Results saved to output folder');
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
### Custom Model Testing
|
| 150 |
+
```javascript
|
| 151 |
+
const customModel = {
|
| 152 |
+
id: 'my-local-model',
|
| 153 |
+
name: 'Custom Local Model',
|
| 154 |
+
provider: 'local'
|
| 155 |
+
};
|
| 156 |
+
|
| 157 |
+
const result = await framework.testModel(customModel);
|
| 158 |
+
```
|
| 159 |
+
|
| 160 |
+
### Report Generation
|
| 161 |
+
```javascript
|
| 162 |
+
import { TestReportGenerator } from './src/testReportGenerator.js';
|
| 163 |
+
|
| 164 |
+
const generator = new TestReportGenerator();
|
| 165 |
+
const reports = await generator.generateAllReports(testResults);
|
| 166 |
+
// Generates JSON, CSV, and Markdown reports
|
| 167 |
+
```
|
| 168 |
+
|
| 169 |
+
## Integration with Existing Codebase
|
| 170 |
+
|
| 171 |
+
The testing framework reuses the following modules from the existing Cloze Reader codebase:
|
| 172 |
+
|
| 173 |
+
- **aiService.js**: Framework uses the same AI service patterns
|
| 174 |
+
- **conversationManager.js**: Chat hint testing leverages existing conversation logic
|
| 175 |
+
- **clozeGameEngine.js**: Game testing monitors actual game interactions
|
| 176 |
+
- **bookDataService.js**: Uses same book data and quality filtering
|
| 177 |
+
|
| 178 |
+
## Troubleshooting
|
| 179 |
+
|
| 180 |
+
### Local LLM Issues
|
| 181 |
+
- Ensure LM Studio is running on port 1234
|
| 182 |
+
- Check that a model is loaded and ready
|
| 183 |
+
- Verify CORS is enabled in LM Studio settings
|
| 184 |
+
|
| 185 |
+
### API Key Issues
|
| 186 |
+
- OpenRouter API key must be set via environment variable or meta tag
|
| 187 |
+
- Local models don't require API keys
|
| 188 |
+
|
| 189 |
+
### Performance Issues
|
| 190 |
+
- Large model testing can take 10-30 minutes
|
| 191 |
+
- Consider testing fewer models or specific categories
|
| 192 |
+
- Monitor network connectivity for OpenRouter models
|
| 193 |
+
|
| 194 |
+
## Contributing
|
| 195 |
+
|
| 196 |
+
The testing framework is designed to be extensible:
|
| 197 |
+
|
| 198 |
+
1. Add new model providers in `ModelTestingFramework.constructor()`
|
| 199 |
+
2. Extend evaluation metrics in the respective `evaluate*` methods
|
| 200 |
+
3. Add new report formats in `TestReportGenerator`
|
| 201 |
+
4. Enhance UI components in `model-testing.html`
|
| 202 |
+
|
| 203 |
+
## Results Interpretation
|
| 204 |
+
|
| 205 |
+
### Overall Scores
|
| 206 |
+
- **90-100**: Excellent performance across all tasks
|
| 207 |
+
- **80-89**: Very good with minor weaknesses
|
| 208 |
+
- **70-79**: Good performance with some limitations
|
| 209 |
+
- **60-69**: Adequate but needs improvement
|
| 210 |
+
- **Below 60**: Poor performance, not recommended
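The score bands above map directly onto a small lookup. This is a sketch with a hypothetical function name, assuming scores on a 0-100 scale.

```javascript
function interpretOverallScore(score) {
  if (score >= 90) return "Excellent";
  if (score >= 80) return "Very good";
  if (score >= 70) return "Good";
  if (score >= 60) return "Adequate";
  return "Poor";
}

console.log(interpretOverallScore(84));
```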
|
| 211 |
+
|
| 212 |
+
### Success Rate Thresholds
|
| 213 |
+
- **Word Selection**: >80% for production use
|
| 214 |
+
- **Contextualization**: >90% for educational content
|
| 215 |
+
- **Chat Hints**: >85% for effective tutoring
|
| 216 |
+
|
| 217 |
+
Use these benchmarks to select the best model for your specific needs and performance requirements.
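One way to apply the success-rate thresholds when choosing a model is sketched below. The task keys and the `meetsThresholds` name are assumptions; rates are expressed as fractions, and the comparison is strict to match the ">" in the list above.

```javascript
const THRESHOLDS = { wordSelection: 0.80, contextualization: 0.90, chatHints: 0.85 };

function meetsThresholds(successRates) {
  // A task fails if its rate is missing or not strictly above the threshold.
  const failing = Object.keys(THRESHOLDS).filter(
    (task) => !(successRates[task] > THRESHOLDS[task])
  );
  return { ok: failing.length === 0, failing };
}

console.log(meetsThresholds({ wordSelection: 0.85, contextualization: 0.95, chatHints: 0.85 }));
```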
|
index.html
CHANGED
|
@@ -62,5 +62,13 @@
|
|
| 62 |
</div>
|
| 63 |
|
| 64 |
<script src="./src/app.js" type="module"></script>
|
| 65 |
</body>
|
| 66 |
</html>
|
|
|
|
| 62 |
</div>
|
| 63 |
|
| 64 |
<script src="./src/app.js" type="module"></script>
|
| 65 |
+
<script type="module">
|
| 66 |
+
// Load test runner and ranking interface only in test mode
|
| 67 |
+
const urlParams = new URLSearchParams(window.location.search);
|
| 68 |
+
if (urlParams.get('testMode') === 'true') {
|
| 69 |
+
import('./src/testGameRunner.js');
|
| 70 |
+
import('./src/userRankingInterface.js');
|
| 71 |
+
}
|
| 72 |
+
</script>
|
| 73 |
</body>
|
| 74 |
</html>
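The test-mode gate added to `index.html` can be isolated into a small function, which also makes failed dynamic imports visible. In this sketch `isTestMode`, `loadTestModules`, and the injectable `importer` parameter are hypothetical names; only the `testMode=true` query check and the two module paths come from the diff.

```javascript
function isTestMode(search) {
  // Same check as the inline script: only the exact string "true" enables test mode.
  return new URLSearchParams(search).get("testMode") === "true";
}

async function loadTestModules(search, importer) {
  if (!isTestMode(search)) return [];
  const paths = ["./src/testGameRunner.js", "./src/userRankingInterface.js"];
  // allSettled reports each module separately instead of failing on the first error.
  const results = await Promise.allSettled(paths.map((path) => importer(path)));
  return results.map((result, i) => ({ path: paths[i], loaded: result.status === "fulfilled" }));
}

console.log(isTestMode("?testMode=true"), isTestMode("?testMode=1"));
```

In the page itself this would be called as `loadTestModules(window.location.search, (path) => import(path))`.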
|
model-testing.html
ADDED
|
@@ -0,0 +1,629 @@
|
|
| 1 |
+
<!DOCTYPE html>
|
| 2 |
+
<html lang="en">
|
| 3 |
+
<head>
|
| 4 |
+
<meta charset="UTF-8">
|
| 5 |
+
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
| 6 |
+
<title>Cloze Reader - Model Testing Framework</title>
|
| 7 |
+
<style>
|
| 8 |
+
body {
|
| 9 |
+
font-family: 'Georgia', serif;
|
| 10 |
+
background: linear-gradient(135deg, #f5f3f0 0%, #e8e4df 100%);
|
| 11 |
+
margin: 0;
|
| 12 |
+
padding: 20px;
|
| 13 |
+
min-height: 100vh;
|
| 14 |
+
}
|
| 15 |
+
|
| 16 |
+
.container {
|
| 17 |
+
max-width: 1200px;
|
| 18 |
+
margin: 0 auto;
|
| 19 |
+
background: rgba(255, 255, 255, 0.95);
|
| 20 |
+
border-radius: 15px;
|
| 21 |
+
box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1);
|
| 22 |
+
padding: 40px;
|
| 23 |
+
}
|
| 24 |
+
|
| 25 |
+
h1 {
|
| 26 |
+
text-align: center;
|
| 27 |
+
color: #2c3e50;
|
| 28 |
+
font-size: 2.5rem;
|
| 29 |
+
margin-bottom: 10px;
|
| 30 |
+
text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.1);
|
| 31 |
+
}
|
| 32 |
+
|
| 33 |
+
.subtitle {
|
| 34 |
+
text-align: center;
|
| 35 |
+
color: #7f8c8d;
|
| 36 |
+
font-size: 1.2rem;
|
| 37 |
+
margin-bottom: 40px;
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
.model-selection {
|
| 41 |
+
background: #f8f9fa;
|
| 42 |
+
border-radius: 10px;
|
| 43 |
+
padding: 30px;
|
| 44 |
+
margin-bottom: 30px;
|
| 45 |
+
border: 2px solid #e9ecef;
|
| 46 |
+
}
|
| 47 |
+
|
| 48 |
+
.model-selection h2 {
|
| 49 |
+
color: #2c3e50;
|
| 50 |
+
margin-bottom: 20px;
|
| 51 |
+
font-size: 1.5rem;
|
| 52 |
+
}
|
| 53 |
+
|
| 54 |
+
.model-grid {
|
| 55 |
+
display: grid;
|
| 56 |
+
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
|
| 57 |
+
gap: 15px;
|
| 58 |
+
margin-bottom: 20px;
|
| 59 |
+
}
|
| 60 |
+
|
| 61 |
+
.model-option {
|
| 62 |
+
background: white;
|
| 63 |
+
border: 2px solid #dee2e6;
|
| 64 |
+
border-radius: 8px;
|
| 65 |
+
padding: 15px;
|
| 66 |
+
cursor: pointer;
|
| 67 |
+
transition: all 0.3s ease;
|
| 68 |
+
position: relative;
|
| 69 |
+
}
|
| 70 |
+
|
| 71 |
+
.model-option:hover {
|
| 72 |
+
border-color: #007bff;
|
| 73 |
+
box-shadow: 0 4px 8px rgba(0, 123, 255, 0.2);
|
| 74 |
+
}
|
| 75 |
+
|
| 76 |
+
.model-option.selected {
|
| 77 |
+
border-color: #28a745;
|
| 78 |
+
background: #f8fff9;
|
| 79 |
+
}
|
| 80 |
+
|
| 81 |
+
.model-option input[type="checkbox"] {
|
| 82 |
+
position: absolute;
|
| 83 |
+
top: 10px;
|
| 84 |
+
right: 10px;
|
| 85 |
+
transform: scale(1.2);
|
| 86 |
+
}
|
| 87 |
+
|
| 88 |
+
.model-name {
|
| 89 |
+
font-weight: bold;
|
| 90 |
+
color: #2c3e50;
|
| 91 |
+
margin-bottom: 5px;
|
| 92 |
+
}
|
| 93 |
+
|
| 94 |
+
.model-provider {
|
| 95 |
+
color: #6c757d;
|
| 96 |
+
font-size: 0.9rem;
|
| 97 |
+
margin-bottom: 5px;
|
| 98 |
+
}
|
| 99 |
+
|
| 100 |
+
.model-id {
|
| 101 |
+
color: #495057;
|
| 102 |
+
font-size: 0.8rem;
|
| 103 |
+
font-family: monospace;
|
| 104 |
+
background: #f1f3f4;
|
| 105 |
+
padding: 2px 6px;
|
| 106 |
+
border-radius: 4px;
|
| 107 |
+
}
|
| 108 |
+
|
| 109 |
+
.controls {
|
| 110 |
+
display: flex;
|
| 111 |
+
gap: 15px;
|
| 112 |
+
align-items: center;
|
| 113 |
+
flex-wrap: wrap;
|
| 114 |
+
}
|
| 115 |
+
|
| 116 |
+
.btn {
|
| 117 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 118 |
+
color: white;
|
| 119 |
+
border: none;
|
| 120 |
+
padding: 12px 24px;
|
| 121 |
+
border-radius: 8px;
|
| 122 |
+
font-size: 1rem;
|
| 123 |
+
cursor: pointer;
|
| 124 |
+
transition: all 0.3s ease;
|
| 125 |
+
font-weight: 500;
|
| 126 |
+
}
|
| 127 |
+
|
| 128 |
+
.btn:hover {
|
| 129 |
+
transform: translateY(-2px);
|
| 130 |
+
box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
|
| 131 |
+
}
|
| 132 |
+
|
| 133 |
+
.btn:disabled {
|
| 134 |
+
background: #6c757d;
|
| 135 |
+
cursor: not-allowed;
|
| 136 |
+
transform: none;
|
| 137 |
+
box-shadow: none;
|
| 138 |
+
}
|
| 139 |
+
|
| 140 |
+
.btn-secondary {
|
| 141 |
+
background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
|
| 142 |
+
}
|
| 143 |
+
|
| 144 |
+
.btn-success {
|
| 145 |
+
background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
|
| 146 |
+
}
|
| 147 |
+
|
| 148 |
+
.progress-section {
|
| 149 |
+
margin-top: 30px;
|
| 150 |
+
padding: 20px;
|
| 151 |
+
background: #f8f9fa;
|
| 152 |
+
border-radius: 10px;
|
| 153 |
+
display: none;
|
| 154 |
+
}
|
| 155 |
+
|
| 156 |
+
.progress-section.active {
|
| 157 |
+
display: block;
|
| 158 |
+
}
|
| 159 |
+
|
| 160 |
+
.progress-bar {
|
| 161 |
+
width: 100%;
|
| 162 |
+
height: 8px;
|
| 163 |
+
background: #e9ecef;
|
| 164 |
+
border-radius: 4px;
|
| 165 |
+
overflow: hidden;
|
| 166 |
+
margin-bottom: 10px;
|
| 167 |
+
}
|
| 168 |
+
|
| 169 |
+
.progress-fill {
|
| 170 |
+
height: 100%;
|
| 171 |
+
background: linear-gradient(90deg, #667eea, #764ba2);
|
| 172 |
+
width: 0%;
|
| 173 |
+
transition: width 0.3s ease;
|
| 174 |
+
}
|
| 175 |
+
|
| 176 |
+
.status-message {
|
| 177 |
+
color: #495057;
|
| 178 |
+
font-size: 1rem;
|
| 179 |
+
margin-bottom: 10px;
|
| 180 |
+
}
|
| 181 |
+
|
| 182 |
+
.test-log {
|
| 183 |
+
background: #2d3748;
|
| 184 |
+
color: #e2e8f0;
|
| 185 |
+
padding: 15px;
|
| 186 |
+
border-radius: 8px;
|
| 187 |
+
font-family: 'Courier New', monospace;
|
| 188 |
+
font-size: 0.9rem;
|
| 189 |
+
max-height: 300px;
|
| 190 |
+
overflow-y: auto;
|
| 191 |
+
white-space: pre-wrap;
|
| 192 |
+
}
|
| 193 |
+
|
| 194 |
+
.results-section {
|
| 195 |
+
margin-top: 30px;
|
| 196 |
+
padding: 20px;
|
| 197 |
+
background: #f8f9fa;
|
| 198 |
+
border-radius: 10px;
|
| 199 |
+
display: none;
|
| 200 |
+
}
|
| 201 |
+
|
| 202 |
+
.results-section.active {
|
| 203 |
+
display: block;
|
| 204 |
+
}
|
| 205 |
+
|
| 206 |
+
.results-grid {
|
| 207 |
+
display: grid;
|
| 208 |
+
grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
|
| 209 |
+
gap: 20px;
|
| 210 |
+
margin-top: 20px;
|
| 211 |
+
}
|
| 212 |
+
|
| 213 |
+
.result-card {
|
| 214 |
+
background: white;
|
| 215 |
+
border-radius: 8px;
|
| 216 |
+
padding: 20px;
|
| 217 |
+
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
|
| 218 |
+
}
|
| 219 |
+
|
| 220 |
+
.result-card h3 {
|
| 221 |
+
color: #2c3e50;
|
| 222 |
+
margin-bottom: 15px;
|
| 223 |
+
font-size: 1.2rem;
|
| 224 |
+
}
|
| 225 |
+
|
| 226 |
+
.metric {
|
| 227 |
+
display: flex;
|
| 228 |
+
justify-content: space-between;
|
| 229 |
+
margin-bottom: 10px;
|
| 230 |
+
padding-bottom: 8px;
|
| 231 |
+
border-bottom: 1px solid #e9ecef;
|
| 232 |
+
}
|
| 233 |
+
|
| 234 |
+
.metric:last-child {
|
| 235 |
+
border-bottom: none;
|
| 236 |
+
margin-bottom: 0;
|
| 237 |
+
}
|
| 238 |
+
|
| 239 |
+
.metric-label {
|
| 240 |
+
color: #6c757d;
|
| 241 |
+
font-weight: 500;
|
| 242 |
+
}
|
| 243 |
+
|
| 244 |
+
.metric-value {
|
| 245 |
+
color: #2c3e50;
|
| 246 |
+
font-weight: bold;
|
| 247 |
+
}
|
| 248 |
+
|
| 249 |
+
.score-high { color: #28a745; }
|
| 250 |
+
.score-medium { color: #ffc107; }
|
| 251 |
+
.score-low { color: #dc3545; }
|
| 252 |
+
|
| 253 |
+
.game-section {
|
| 254 |
+
margin-top: 30px;
|
| 255 |
+
padding: 20px;
|
| 256 |
+
background: #f8f9fa;
|
| 257 |
+
border-radius: 10px;
|
| 258 |
+
display: none;
|
| 259 |
+
}
|
| 260 |
+
|
| 261 |
+
.game-section.active {
|
| 262 |
+
display: block;
|
| 263 |
+
}
|
| 264 |
+
|
| 265 |
+
.game-frame {
|
| 266 |
+
width: 100%;
|
| 267 |
+
height: 600px;
|
| 268 |
+
border: none;
|
| 269 |
+
border-radius: 8px;
|
| 270 |
+
background: white;
|
| 271 |
+
}
|
| 272 |
+
|
| 273 |
+
@media (max-width: 768px) {
|
| 274 |
+
.container {
|
| 275 |
+
padding: 20px;
|
| 276 |
+
}
|
| 277 |
+
|
| 278 |
+
.model-grid {
|
| 279 |
+
grid-template-columns: 1fr;
|
| 280 |
+
}
|
| 281 |
+
|
| 282 |
+
.controls {
|
| 283 |
+
flex-direction: column;
|
| 284 |
+
align-items: stretch;
|
| 285 |
+
}
|
| 286 |
+
}
|
| 287 |
+
</style>
|
| 288 |
+
</head>
|
| 289 |
+
<body>
|
| 290 |
+
<div class="container">
|
| 291 |
+
<h1>Model Testing Framework</h1>
|
| 292 |
+
<p class="subtitle">Comprehensive evaluation of AI models for the Cloze Reader application</p>
|
| 293 |
+
|
| 294 |
+
<div class="model-selection">
|
| 295 |
+
<h2>Select Models to Test</h2>
|
| 296 |
+
<div id="modelGrid" class="model-grid">
|
| 297 |
+
<!-- Models will be populated by JavaScript -->
|
| 298 |
+
</div>
|
| 299 |
+
|
| 300 |
+
<div class="controls">
|
| 301 |
+
<button id="selectAllBtn" class="btn btn-secondary">Select All</button>
|
| 302 |
+
<button id="clearAllBtn" class="btn btn-secondary">Clear All</button>
|
| 303 |
+
<button id="startTestBtn" class="btn">Start Comprehensive Test</button>
|
| 304 |
+
<button id="testGameBtn" class="btn btn-success">Test Selected Model in Game</button>
|
| 305 |
+
</div>
|
| 306 |
+
</div>
|
| 307 |
+
|
| 308 |
+
<div id="progressSection" class="progress-section">
|
| 309 |
+
<h2>Testing Progress</h2>
|
| 310 |
+
<div class="progress-bar">
|
| 311 |
+
<div id="progressFill" class="progress-fill"></div>
|
| 312 |
+
</div>
|
| 313 |
+
<div id="statusMessage" class="status-message">Initializing tests...</div>
|
| 314 |
+
<div id="testLog" class="test-log"></div>
|
| 315 |
+
</div>
|
| 316 |
+
|
| 317 |
+
<div id="resultsSection" class="results-section">
|
| 318 |
+
<h2>Test Results</h2>
|
| 319 |
+
<p>Results have been saved to the output folder as CSV files.</p>
|
| 320 |
+
<div id="resultsGrid" class="results-grid">
|
| 321 |
+
<!-- Results will be populated by JavaScript -->
|
| 322 |
+
</div>
|
| 323 |
+
</div>
|
| 324 |
+
|
| 325 |
+
<div id="gameSection" class="game-section">
|
| 326 |
+
<h2>Interactive Game Testing</h2>
|
| 327 |
+
<p>Test the selected model by playing the game. Performance will be logged for analysis.</p>
|
| 328 |
+
<iframe id="gameFrame" class="game-frame" src="about:blank"></iframe>
|
| 329 |
+
</div>
|
| 330 |
+
</div>
|
| 331 |
+
|
| 332 |
+
<script type="module">
|
| 333 |
+
import { ModelTestingFramework } from './src/modelTestingFramework.js';
|
| 334 |
+
|
| 335 |
+
class ModelTestingUI {
|
| 336 |
+
constructor() {
|
| 337 |
+
this.framework = new ModelTestingFramework();
|
| 338 |
+
this.selectedModels = new Set();
|
| 339 |
+
this.isTestingInProgress = false;
|
| 340 |
+
this.localServerStatus = null;
|
| 341 |
+
|
| 342 |
+
this.initializeUI();
|
| 343 |
+
this.setupEventListeners();
|
| 344 |
+
}
|
| 345 |
+
|
| 346 |
+
async initializeUI() {
|
| 347 |
+
await this.checkLocalServer();
|
| 348 |
+
await this.populateModelGrid();
|
| 349 |
+
}
|
| 350 |
+
|
| 351 |
+
async checkLocalServer() {
|
| 352 |
+
this.localServerStatus = await this.framework.testLocalServerConnection();
|
| 353 |
+
if (this.localServerStatus.connected) {
|
| 354 |
+
console.log('Local LM Studio server detected:', this.localServerStatus.models.length, 'models available');
|
| 355 |
+
await this.framework.detectLocalModels();
|
| 356 |
+
} else {
|
| 357 |
+
console.log('Local LM Studio server not available:', this.localServerStatus.error);
|
| 358 |
+
}
|
| 359 |
+
}
|
| 360 |
+
|
| 361 |
+
populateModelGrid() {
|
| 362 |
+
const grid = document.getElementById('modelGrid');
|
| 363 |
+
grid.innerHTML = '';
|
| 364 |
+
|
| 365 |
+
// Add local server status indicator
|
| 366 |
+
if (this.localServerStatus) {
|
| 367 |
+
const statusDiv = document.createElement('div');
|
| 368 |
+
statusDiv.className = 'server-status';
|
| 369 |
+
statusDiv.style.cssText = `
|
| 370 |
+
grid-column: 1 / -1;
|
| 371 |
+
padding: 15px;
|
| 372 |
+
margin-bottom: 15px;
|
| 373 |
+
border-radius: 8px;
|
| 374 |
+
font-weight: bold;
|
| 375 |
+
text-align: center;
|
| 376 |
+
${this.localServerStatus.connected
|
| 377 |
+
? 'background: #d4edda; color: #155724; border: 1px solid #c3e6cb;'
|
| 378 |
+
: 'background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb;'
|
| 379 |
+
}
|
| 380 |
+
`;
|
| 381 |
+
|
| 382 |
+
if (this.localServerStatus.connected) {
|
| 383 |
+
statusDiv.innerHTML = `
|
| 384 |
+
✓ Local LM Studio Server Connected (Port 1234)<br>
|
| 385 |
+
<small>${this.localServerStatus.models.length} model(s) available</small>
|
| 386 |
+
`;
|
| 387 |
+
} else {
|
| 388 |
+
statusDiv.innerHTML = `
|
| 389 |
+
✗ Local LM Studio Server Not Available<br>
|
| 390 |
+
<small>Start LM Studio on port 1234 to test local models</small>
|
| 391 |
+
`;
|
| 392 |
+
}
|
| 393 |
+
|
| 394 |
+
grid.appendChild(statusDiv);
|
| 395 |
+
}
|
| 396 |
+
|
| 397 |
+
this.framework.models.forEach(model => {
|
| 398 |
+
const modelDiv = document.createElement('div');
|
| 399 |
+
modelDiv.className = 'model-option';
|
| 400 |
+
modelDiv.dataset.modelId = model.id;
|
| 401 |
+
|
| 402 |
+
// Disable local models if server is not connected
|
| 403 |
+
const isDisabled = model.provider === 'local' && !this.localServerStatus?.connected;
|
| 404 |
+
if (isDisabled) {
|
| 405 |
+
modelDiv.classList.add('disabled');
|
| 406 |
+
modelDiv.style.opacity = '0.5';
|
| 407 |
+
modelDiv.style.cursor = 'not-allowed';
|
| 408 |
+
}
|
| 409 |
+
|
| 410 |
+
const providerLabel = model.provider === 'local'
|
| 411 |
+
? `LOCAL ${this.localServerStatus?.connected ? '(✓)' : '(✗)'}`
|
| 412 |
+
: model.provider.toUpperCase();
|
| 413 |
+
|
| 414 |
+
modelDiv.innerHTML = `
|
| 415 |
+
<input type="checkbox" id="model-${model.id}" ${isDisabled ? 'disabled' : ''} />
|
| 416 |
+
<div class="model-name">${model.name}</div>
|
| 417 |
+
<div class="model-provider">${providerLabel}</div>
|
| 418 |
+
<div class="model-id">${model.id}</div>
|
| 419 |
+
`;
|
| 420 |
+
|
| 421 |
+
const checkbox = modelDiv.querySelector('input');
|
| 422 |
+
checkbox.addEventListener('change', (e) => {
|
| 423 |
+
if (e.target.checked) {
|
| 424 |
+
this.selectedModels.add(model);
|
| 425 |
+
modelDiv.classList.add('selected');
|
| 426 |
+
} else {
|
| 427 |
+
this.selectedModels.delete(model);
|
| 428 |
+
modelDiv.classList.remove('selected');
|
| 429 |
+
}
|
| 430 |
+
this.updateControlsState();
|
| 431 |
+
});
|
| 432 |
+
|
| 433 |
+
if (!isDisabled) {
|
| 434 |
+
modelDiv.addEventListener('click', (e) => {
|
| 435 |
+
if (e.target !== checkbox) {
|
| 436 |
+
checkbox.click();
|
| 437 |
+
}
|
| 438 |
+
});
|
| 439 |
+
}
|
| 440 |
+
|
| 441 |
+
grid.appendChild(modelDiv);
|
| 442 |
+
});
|
| 443 |
+
}
|
| 444 |
+
|
| 445 |
+
setupEventListeners() {
|
| 446 |
+
document.getElementById('selectAllBtn').addEventListener('click', () => {
|
| 447 |
+
this.selectAllModels();
|
| 448 |
+
});
|
| 449 |
+
|
| 450 |
+
document.getElementById('clearAllBtn').addEventListener('click', () => {
|
| 451 |
+
this.clearAllModels();
|
| 452 |
+
});
|
| 453 |
+
|
| 454 |
+
document.getElementById('startTestBtn').addEventListener('click', () => {
|
| 455 |
+
this.startComprehensiveTest();
|
| 456 |
+
});
|
| 457 |
+
|
| 458 |
+
document.getElementById('testGameBtn').addEventListener('click', () => {
|
| 459 |
+
this.startGameTest();
|
| 460 |
+
});
|
| 461 |
+
}
|
| 462 |
+
|
| 463 |
+
selectAllModels() {
|
| 464 |
+
this.framework.models.forEach(model => {
|
| 465 |
+
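if (model.provider === 'local' && !this.localServerStatus?.connected) return; // skip disabled local models
this.selectedModels.add(model);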
|
| 466 |
+
const modelDiv = document.querySelector(`[data-model-id="${model.id}"]`);
|
| 467 |
+
const checkbox = modelDiv.querySelector('input');
|
| 468 |
+
checkbox.checked = true;
|
| 469 |
+
modelDiv.classList.add('selected');
|
| 470 |
+
});
|
| 471 |
+
this.updateControlsState();
|
| 472 |
+
}
|
| 473 |
+
|
| 474 |
+
clearAllModels() {
|
| 475 |
+
this.selectedModels.clear();
|
| 476 |
+
document.querySelectorAll('.model-option').forEach(div => {
|
| 477 |
+
div.classList.remove('selected');
|
| 478 |
+
div.querySelector('input').checked = false;
|
| 479 |
+
});
|
| 480 |
+
this.updateControlsState();
|
| 481 |
+
}
|
| 482 |
+
|
| 483 |
+
updateControlsState() {
|
| 484 |
+
const hasSelection = this.selectedModels.size > 0;
|
| 485 |
+
document.getElementById('startTestBtn').disabled = !hasSelection || this.isTestingInProgress;
|
| 486 |
+
document.getElementById('testGameBtn').disabled = this.selectedModels.size !== 1 || this.isTestingInProgress;
|
| 487 |
+
}
|
| 488 |
+
|
| 489 |
+
async startComprehensiveTest() {
|
| 490 |
+
if (this.selectedModels.size === 0) {
|
| 491 |
+
alert('Please select at least one model to test.');
|
| 492 |
+
return;
|
| 493 |
+
}
|
| 494 |
+
|
| 495 |
+
this.isTestingInProgress = true;
|
| 496 |
+
this.updateControlsState();
|
| 497 |
+
|
| 498 |
+
const progressSection = document.getElementById('progressSection');
|
| 499 |
+
const progressFill = document.getElementById('progressFill');
|
| 500 |
+
const statusMessage = document.getElementById('statusMessage');
|
| 501 |
+
const testLog = document.getElementById('testLog');
|
| 502 |
+
|
| 503 |
+
progressSection.classList.add('active');
|
| 504 |
+
testLog.textContent = '';
|
| 505 |
+
|
| 506 |
+
const modelsArray = Array.from(this.selectedModels);
|
| 507 |
+
let completedTests = 0;
|
| 508 |
+
|
| 509 |
+
try {
|
| 510 |
+
for (let i = 0; i < modelsArray.length; i++) {
|
| 511 |
+
const model = modelsArray[i];
|
| 512 |
+
const progress = (i / modelsArray.length) * 100;
|
| 513 |
+
|
| 514 |
+
progressFill.style.width = `${progress}%`;
|
| 515 |
+
statusMessage.textContent = `Testing ${model.name} (${i + 1}/${modelsArray.length})...`;
|
| 516 |
+
|
| 517 |
+
this.log(`Starting test for ${model.name}...`);
|
| 518 |
+
|
| 519 |
+
try {
|
| 520 |
+
const result = await this.framework.testModel(model);
|
| 521 |
+
this.log(`✓ ${model.name} completed - Score: ${result.overallScore.toFixed(1)}`);
|
| 522 |
+
completedTests++;
|
| 523 |
+
} catch (error) {
|
| 524 |
+
this.log(`✗ ${model.name} failed: ${error.message}`);
|
| 525 |
+
}
|
| 526 |
+
|
| 527 |
+
progressFill.style.width = `${((i + 1) / modelsArray.length) * 100}%`;
|
| 528 |
+
}
|
| 529 |
+
|
| 530 |
+
statusMessage.textContent = `Testing completed! ${completedTests}/${modelsArray.length} models tested successfully.`;
|
| 531 |
+
this.log(`\nTesting completed! Results saved to output folder.`);
|
| 532 |
+
|
| 533 |
+
// Show results
|
| 534 |
+
this.displayResults();
|
| 535 |
+
|
| 536 |
+
} catch (error) {
|
| 537 |
+
this.log(`\nTesting failed: ${error.message}`);
|
| 538 |
+
statusMessage.textContent = 'Testing failed. Check the log for details.';
|
| 539 |
+
} finally {
|
| 540 |
+
this.isTestingInProgress = false;
|
| 541 |
+
this.updateControlsState();
|
| 542 |
+
}
|
| 543 |
+
}
|
| 544 |
+
|
| 545 |
+
startGameTest() {
|
| 546 |
+
if (this.selectedModels.size !== 1) {
|
| 547 |
+
alert('Please select exactly one model for game testing.');
|
| 548 |
+
return;
|
| 549 |
+
}
|
| 550 |
+
|
| 551 |
+
const selectedModel = Array.from(this.selectedModels)[0];
|
| 552 |
+
const gameSection = document.getElementById('gameSection');
|
| 553 |
+
const gameFrame = document.getElementById('gameFrame');
|
| 554 |
+
|
| 555 |
+
// Construct URL with model parameter
|
| 556 |
+
let gameUrl = `index.html?testModel=${encodeURIComponent(selectedModel.id)}&testMode=true`; // let, not const: '&local=true' may be appended below
|
| 557 |
+
if (selectedModel.provider === 'local') {
|
| 558 |
+
gameUrl += '&local=true';
|
| 559 |
+
}
|
| 560 |
+
|
| 561 |
+
gameFrame.src = gameUrl;
|
| 562 |
+
gameSection.classList.add('active');
|
| 563 |
+
|
| 564 |
+
this.log(`Starting game test with ${selectedModel.name}...`);
|
| 565 |
+
}
|
| 566 |
+
|
| 567 |
+
displayResults() {
|
| 568 |
+
const resultsSection = document.getElementById('resultsSection');
|
| 569 |
+
const resultsGrid = document.getElementById('resultsGrid');
|
| 570 |
+
|
| 571 |
+
resultsGrid.innerHTML = '';
|
| 572 |
+
|
| 573 |
+
this.framework.testResults.tests.forEach(result => {
|
| 574 |
+
const card = document.createElement('div');
|
| 575 |
+
card.className = 'result-card';
|
| 576 |
+
|
| 577 |
+
const overallScoreClass = this.getScoreClass(result.overallScore);
|
| 578 |
+
|
| 579 |
+
card.innerHTML = `
|
| 580 |
+
<h3>${result.modelName}</h3>
|
| 581 |
+
<div class="metric">
|
| 582 |
+
<span class="metric-label">Overall Score</span>
|
| 583 |
+
<span class="metric-value ${overallScoreClass}">${result.overallScore?.toFixed(1) || 'N/A'}</span>
|
| 584 |
+
</div>
|
| 585 |
+
<div class="metric">
|
| 586 |
+
<span class="metric-label">Word Selection Success</span>
|
| 587 |
+
<span class="metric-value">${result.wordSelection?.successRate != null ? (result.wordSelection.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
|
| 588 |
+
</div>
|
| 589 |
+
<div class="metric">
|
| 590 |
+
<span class="metric-label">Contextualization Success</span>
|
| 591 |
+
<span class="metric-value">${result.contextualization?.successRate != null ? (result.contextualization.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
|
| 592 |
+
</div>
|
| 593 |
+
<div class="metric">
|
| 594 |
+
<span class="metric-label">Chat Hints Success</span>
|
| 595 |
+
<span class="metric-value">${result.chatHints?.successRate != null ? (result.chatHints.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
|
| 596 |
+
</div>
|
| 597 |
+
<div class="metric">
|
| 598 |
+
<span class="metric-label">Average Response Time</span>
|
| 599 |
+
<span class="metric-value">${result.wordSelection?.averageTime?.toFixed(0) || 'N/A'}ms</span>
|
| 600 |
+
</div>
|
| 601 |
+
`;
|
| 602 |
+
|
| 603 |
+
resultsGrid.appendChild(card);
|
| 604 |
+
});
|
| 605 |
+
|
| 606 |
+
resultsSection.classList.add('active');
|
| 607 |
+
}
|
| 608 |
+
|
| 609 |
+
getScoreClass(score) {
|
| 610 |
+
if (score >= 80) return 'score-high';
|
| 611 |
+
if (score >= 60) return 'score-medium';
|
| 612 |
+
return 'score-low';
|
| 613 |
+
}
|
| 614 |
+
|
| 615 |
+
log(message) {
|
| 616 |
+
const testLog = document.getElementById('testLog');
|
| 617 |
+
const timestamp = new Date().toLocaleTimeString();
|
| 618 |
+
testLog.textContent += `[${timestamp}] ${message}\n`;
|
| 619 |
+
testLog.scrollTop = testLog.scrollHeight;
|
| 620 |
+
}
|
| 621 |
+
}
|
| 622 |
+
|
| 623 |
+
// Initialize the testing UI when the page loads
|
| 624 |
+
window.addEventListener('DOMContentLoaded', () => {
|
| 625 |
+
new ModelTestingUI();
|
| 626 |
+
});
|
| 627 |
+
</script>
|
| 628 |
+
</body>
|
| 629 |
+
</html>
|
src/aiService.js
CHANGED
|
@@ -4,12 +4,17 @@ class OpenRouterService {
|
|
| 4 |
this.isLocalMode = this.checkLocalMode();
|
| 5 |
this.apiUrl = this.isLocalMode ? 'http://localhost:1234/v1/chat/completions' : 'https://openrouter.ai/api/v1/chat/completions';
|
| 6 |
this.apiKey = this.getApiKey();
|
| 7 |
-
|
| 8 |
|
| 9 |
console.log('AI Service initialized:', {
|
| 10 |
mode: this.isLocalMode ? 'Local LLM' : 'OpenRouter',
|
| 11 |
url: this.apiUrl,
|
| 12 |
-
|
| 13 |
});
|
| 14 |
}
|
| 15 |
|
|
@@ -86,15 +91,18 @@ class OpenRouterService {
|
|
| 86 |
method: 'POST',
|
| 87 |
headers,
|
| 88 |
body: JSON.stringify({
|
| 89 |
-
model: this.
|
| 90 |
messages: [{
|
| 91 |
role: 'user',
|
| 92 |
-
content:
|
| 93 |
-
|
| 94 |
-
${prompt}`
|
| 95 |
}],
|
| 96 |
-
max_tokens:
|
| 97 |
-
temperature: 0.
|
| 98 |
})
|
| 99 |
});
|
| 100 |
|
|
@@ -104,19 +112,73 @@ ${prompt}`
|
|
| 104 |
|
| 105 |
const data = await response.json();
|
| 106 |
|
| 107 |
// Check if data and choices exist before accessing
|
| 108 |
if (!data || !data.choices || data.choices.length === 0) {
|
| 109 |
console.error('Invalid API response structure:', data);
|
| 110 |
return 'Unable to generate hint at this time';
|
| 111 |
}
|
| 112 |
|
| 113 |
-
// Check if message
|
| 114 |
-
if (!data.choices[0].message
|
| 115 |
-
console.error('No
|
| 116 |
return 'Unable to generate hint at this time';
|
| 117 |
}
|
| 118 |
|
| 119 |
-
|
| 120 |
|
| 121 |
// Clean up AI response artifacts
|
| 122 |
content = content
|
|
@@ -176,35 +238,20 @@ ${prompt}`
|
|
| 176 |
'X-Title': 'Cloze Reader'
|
| 177 |
},
|
| 178 |
body: JSON.stringify({
|
| 179 |
-
model: this.
|
| 180 |
messages: [{
|
|
| 181 |
role: 'user',
|
| 182 |
-
content: `
|
| 183 |
-
|
| 184 |
-
DIFFICULTY LEVEL ${level}:
|
| 185 |
-
${difficultyGuidance}
|
| 186 |
-
|
| 187 |
-
CLOZE DELETION PRINCIPLES:
|
| 188 |
-
- Select words that require understanding context and vocabulary to identify
|
| 189 |
-
- Choose words essential for comprehension that test language ability
|
| 190 |
-
- Target words where deletion creates meaningful cognitive gaps
|
| 191 |
-
|
| 192 |
-
REQUIREMENTS:
|
| 193 |
-
- Choose clear, properly-spelled words (no OCR errors like "andsatires")
|
| 194 |
-
- Select meaningful nouns, verbs, or adjectives (${wordLengthConstraint})
|
| 195 |
-
- Words must appear EXACTLY as written in the passage
|
| 196 |
-
- Avoid: capitalized words, ALL-CAPS words, function words, archaic terms, proper nouns, technical jargon
|
| 197 |
-
- Skip any words that look malformed or concatenated
|
| 198 |
-
- Avoid dated or potentially offensive terms
|
| 199 |
-
- PREFER words from the middle portions of the passage when possible
|
| 200 |
-
- If struggling to find ${count} perfect words, prioritize returning SOMETHING over returning nothing
|
| 201 |
-
|
| 202 |
-
Return ONLY a JSON array of the selected words.
|
| 203 |
|
| 204 |
Passage: "${passage}"`
|
| 205 |
}],
|
| 206 |
-
max_tokens:
|
| 207 |
-
temperature: 0.
|
| 208 |
})
|
| 209 |
});
|
| 210 |
|
|
@@ -220,13 +267,35 @@ Passage: "${passage}"`
|
|
| 220 |
throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
|
| 221 |
}
|
| 222 |
|
| 223 |
// Check if response has expected structure
|
| 224 |
-
if (!data.choices || !data.choices[0] || !data.choices[0].message
|
| 225 |
console.error('Invalid word selection API response structure:', data);
|
| 226 |
-
|
| 227 |
}
|
| 228 |
|
| 229 |
-
|
| 230 |
|
| 231 |
// Clean up local LLM artifacts
|
| 232 |
if (this.isLocalMode) {
|
|
@@ -237,33 +306,55 @@ Passage: "${passage}"`
|
|
| 237 |
try {
|
| 238 |
let words;
|
| 239 |
|
| 240 |
-
//
|
| 241 |
-
|
| 242 |
-
//
|
| 243 |
-
|
| 244 |
words = JSON.parse(content);
|
| 245 |
-
}
|
| 246 |
-
|
| 247 |
if (content.includes(',')) {
|
| 248 |
words = content.split(',').map(w => w.trim());
|
| 249 |
} else {
|
| 250 |
// Single word
|
| 251 |
words = [content.trim()];
|
| 252 |
}
|
| 253 |
}
|
| 254 |
-
} else {
|
| 255 |
-
words = JSON.parse(content);
|
| 256 |
}
|
| 257 |
|
| 258 |
if (Array.isArray(words)) {
|
| 259 |
-
//
|
| 260 |
-
const problematicWords = ['negro', 'retard', 'retarded', 'nigger', 'chinaman', 'jap', 'gypsy', 'savage', 'primitive', 'heathen'];
|
| 261 |
const validWords = words.filter(word => {
|
| 262 |
const cleanWord = word.replace(/[^a-zA-Z]/g, '');
|
| 263 |
-
const lowerWord = cleanWord.toLowerCase();
|
| 264 |
-
|
| 265 |
-
// Skip problematic words
|
| 266 |
-
if (problematicWords.includes(lowerWord)) return false;
|
| 267 |
|
| 268 |
// Check length constraints
|
| 269 |
if (level <= 2) {
|
|
@@ -288,14 +379,9 @@ Passage: "${passage}"`
|
|
| 288 |
const matches = content.match(/"([^"]+)"/g);
|
| 289 |
if (matches) {
|
| 290 |
const words = matches.map(m => m.replace(/"/g, ''));
|
| 291 |
-
//
|
| 292 |
-
const problematicWords = ['negro', 'retard', 'retarded', 'nigger', 'chinaman', 'jap', 'gypsy', 'savage', 'primitive', 'heathen'];
|
| 293 |
const validWords = words.filter(word => {
|
| 294 |
const cleanWord = word.replace(/[^a-zA-Z]/g, '');
|
| 295 |
-
const lowerWord = cleanWord.toLowerCase();
|
| 296 |
-
|
| 297 |
-
// Skip problematic words
|
| 298 |
-
if (problematicWords.includes(lowerWord)) return false;
|
| 299 |
|
| 300 |
// Check length constraints
|
| 301 |
if (level <= 2) {
|
|
@@ -368,45 +454,25 @@ Passage: "${passage}"`
|
|
| 368 |
headers,
|
| 369 |
signal: controller.signal,
|
| 370 |
body: JSON.stringify({
|
| 371 |
-
model: this.
|
| 372 |
messages: [{
|
| 373 |
role: 'user',
|
| 374 |
-
content: `
|
| 375 |
|
| 376 |
-
|
| 377 |
-
${
|
| 378 |
|
| 379 |
-
|
| 380 |
|
| 381 |
-
|
| 382 |
-
Title: "${book1.title}" by ${book1.author}
|
| 383 |
-
Text: "${passage1}"
|
| 384 |
-
Select ${blanksPerPassage} words for blanks.
|
| 385 |
-
|
| 386 |
-
PASSAGE 2:
|
| 387 |
-
Title: "${book2.title}" by ${book2.author}
|
| 388 |
-
Text: "${passage2}"
|
| 389 |
-
Select ${blanksPerPassage} words for blanks.
|
| 390 |
-
|
| 391 |
-
SELECTION RULES:
|
| 392 |
-
- Select EXACTLY ${blanksPerPassage} word${blanksPerPassage > 1 ? 's' : ''} per passage, no more, no less
|
| 393 |
-
- Choose meaningful nouns, verbs, or adjectives (${wordLengthConstraint})
|
| 394 |
-
- Avoid capitalized words, ALL-CAPS words, and table of contents entries
|
| 395 |
-
- Avoid dated or potentially offensive terms
|
| 396 |
-
- NEVER select words from the first or last sentence/clause of each passage
|
| 397 |
-
- Choose words from the middle portions for better context dependency
|
| 398 |
-
- Words must appear EXACTLY as written in the passage
|
| 399 |
-
|
| 400 |
-
For each passage return:
|
| 401 |
-
- "words": array of EXACTLY ${blanksPerPassage} selected word${blanksPerPassage > 1 ? 's' : ''} (exactly as they appear in the text)
|
| 402 |
-
- "context": one-sentence intro about the book/author
|
| 403 |
-
|
| 404 |
-
CRITICAL: The "words" array must contain exactly ${blanksPerPassage} element${blanksPerPassage > 1 ? 's' : ''} for each passage.
|
| 405 |
-
|
| 406 |
-
Return as JSON: {"passage1": {...}, "passage2": {...}}`
|
| 407 |
}],
|
| 408 |
max_tokens: 800,
|
| 409 |
-
temperature: 0.5
|
| 410 |
})
|
| 411 |
});
|
| 412 |
|
|
@@ -425,13 +491,34 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
|
|
| 425 |
throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
|
| 426 |
}
|
| 427 |
|
| 428 |
// Check if response has expected structure
|
| 429 |
-
if (!data.choices || !data.choices[0] || !data.choices[0].message
|
| 430 |
console.error('Invalid batch API response structure:', data);
|
| 431 |
-
|
| 432 |
}
|
| 433 |
|
| 434 |
-
|
| 435 |
|
| 436 |
try {
|
| 437 |
// Try to extract JSON from the response
|
|
@@ -491,15 +578,10 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
|
|
| 491 |
parsed.passage1.words = parsed.passage1.words.filter(word => word && word.trim() !== '');
|
| 492 |
parsed.passage2.words = parsed.passage2.words.filter(word => word && word.trim() !== '');
|
| 493 |
|
| 494 |
-
//
|
| 495 |
const validateWords = (words, passageText) => {
|
| 496 |
-
const problematicWords = ['negro', 'retard', 'retarded', 'nigger', 'chinaman', 'jap', 'gypsy', 'savage', 'primitive', 'heathen'];
|
| 497 |
return words.filter(word => {
|
| 498 |
const cleanWord = word.replace(/[^a-zA-Z]/g, '');
|
| 499 |
-
const lowerWord = cleanWord.toLowerCase();
|
| 500 |
-
|
| 501 |
-
// Skip problematic words
|
| 502 |
-
if (problematicWords.includes(lowerWord)) return false;
|
| 503 |
|
| 504 |
// Check if word appears in all caps in the passage (like "VOLUME")
|
| 505 |
if (passageText.includes(word.toUpperCase()) && word === word.toUpperCase()) {
|
|
@@ -609,13 +691,17 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
|
|
| 609 |
'X-Title': 'Cloze Reader'
|
| 610 |
},
|
| 611 |
body: JSON.stringify({
|
| 612 |
-
model: this.
|
| 613 |
messages: [{
|
| 614 |
role: 'user',
|
| 615 |
-
content: `
|
| 616 |
}],
|
| 617 |
-
max_tokens:
|
| 618 |
-
temperature: 0.
|
| 619 |
})
|
| 620 |
});
|
| 621 |
|
|
@@ -633,13 +719,34 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
|
|
| 633 |
throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
|
| 634 |
}
|
| 635 |
|
| 636 |
// Check if response has expected structure
|
| 637 |
-
if (!data.choices || !data.choices[0] || !data.choices[0].message
|
| 638 |
console.error('Invalid contextualization API response structure:', data);
|
| 639 |
-
|
| 640 |
}
|
| 641 |
|
| 642 |
-
|
| 643 |
|
| 644 |
// Clean up AI response artifacts
|
| 645 |
content = content
|
|
|
|
| 4 |
this.isLocalMode = this.checkLocalMode();
|
| 5 |
this.apiUrl = this.isLocalMode ? 'http://localhost:1234/v1/chat/completions' : 'https://openrouter.ai/api/v1/chat/completions';
|
| 6 |
this.apiKey = this.getApiKey();
|
| 7 |
+
|
| 8 |
+
// Dual model configuration: Gemma-3-27b for hints/query-answering, Gemma-3-12b for everything else (local mode uses gemma-3-12b for both)
|
| 9 |
+
this.hintModel = this.isLocalMode ? 'gemma-3-12b' : 'google/gemma-3-27b-it';
|
| 10 |
+
this.primaryModel = this.isLocalMode ? 'gemma-3-12b' : 'google/gemma-3-12b-it';
|
| 11 |
+
this.model = this.primaryModel; // Default model for backward compatibility
|
| 12 |
|
| 13 |
console.log('AI Service initialized:', {
|
| 14 |
mode: this.isLocalMode ? 'Local LLM' : 'OpenRouter',
|
| 15 |
url: this.apiUrl,
|
| 16 |
+
primaryModel: this.primaryModel,
|
| 17 |
+
hintModel: this.hintModel
|
| 18 |
});
|
| 19 |
}
|
| 20 |
|
|
|
|
| 91 |
method: 'POST',
|
| 92 |
headers,
|
| 93 |
body: JSON.stringify({
|
| 94 |
+
model: this.hintModel, // Use Gemma-3-27b for hints
|
| 95 |
messages: [{
|
| 96 |
+
role: 'system',
|
| 97 |
+
content: 'You are a helpful assistant that provides hints for word puzzles. Never reveal the answer word directly.'
|
| 98 |
+
}, {
|
| 99 |
role: 'user',
|
| 100 |
+
content: prompt
|
| 101 |
}],
|
| 102 |
+
max_tokens: 150,
|
| 103 |
+
temperature: 0.7,
|
| 104 |
+
// Try to disable reasoning mode for hints
|
| 105 |
+
response_format: { type: "text" }
|
| 106 |
})
|
| 107 |
});
|
| 108 |
|
|
|
|
| 112 |
|
| 113 |
const data = await response.json();
|
| 114 |
|
| 115 |
+
console.log('Hint API response:', JSON.stringify(data, null, 2));
|
| 116 |
+
|
| 117 |
// Check if data and choices exist before accessing
|
| 118 |
if (!data || !data.choices || data.choices.length === 0) {
|
| 119 |
console.error('Invalid API response structure:', data);
|
| 120 |
return 'Unable to generate hint at this time';
|
| 121 |
}
|
| 122 |
|
| 123 |
+
// Check if message exists
|
| 124 |
+
if (!data.choices[0].message) {
|
| 125 |
+
console.error('No message in API response');
|
| 126 |
return 'Unable to generate hint at this time';
|
| 127 |
}
|
| 128 |
|
| 129 |
+
// Some models (e.g. OSS-20B) return content in the 'reasoning' field when using reasoning mode
|
| 130 |
+
let content = data.choices[0].message.content || '';
|
| 131 |
+
|
| 132 |
+
// If content is empty, check for reasoning field
|
| 133 |
+
if (!content && data.choices[0].message.reasoning) {
|
| 134 |
+
content = data.choices[0].message.reasoning;
|
| 135 |
+
}
|
| 136 |
+
|
| 137 |
+
// Still no content? Check reasoning_details
|
| 138 |
+
if (!content && data.choices[0].message.reasoning_details?.length > 0) {
|
| 139 |
+
content = data.choices[0].message.reasoning_details[0].text;
|
| 140 |
+
}
|
| 141 |
+
|
| 142 |
+
if (!content) {
|
| 143 |
+
console.error('No content found in hint response');
|
| 144 |
+
// Provide a generic hint based on the prompt type
|
| 145 |
+
if (prompt.toLowerCase().includes('synonym')) {
|
| 146 |
+
return 'Think of a word that means something similar';
|
| 147 |
+
} else if (prompt.toLowerCase().includes('definition')) {
|
| 148 |
+
return 'Consider what this word means in context';
|
| 149 |
+
} else if (prompt.toLowerCase().includes('category')) {
|
| 150 |
+
return 'Think about what type or category this word belongs to';
|
| 151 |
+
} else {
|
| 152 |
+
return 'Consider the context around the blank';
|
| 153 |
+
}
|
| 154 |
+
}
|
| 155 |
+
|
| 156 |
+
content = content.trim();
|
| 157 |
+
|
| 158 |
+
// For reasoning-style output (e.g. from OSS-20B), extract the hint from the reasoning text if needed
|
| 159 |
+
if (content.includes('The user') || content.includes('We need to')) {
|
| 160 |
+
// This looks like reasoning text, try to extract the actual hint
|
| 161 |
+
// Look for text about synonyms, definitions, or clues
|
| 162 |
+
const hintPatterns = [
|
| 163 |
+
/synonym[s]?.*?(?:is|are|include[s]?|would be)\s+([^.]+)/i,
|
| 164 |
+
/means?\s+([^.]+)/i,
|
| 165 |
+
/refers? to\s+([^.]+)/i,
|
| 166 |
+
/describes?\s+([^.]+)/i,
|
| 167 |
+
];
|
| 168 |
+
|
| 169 |
+
for (const pattern of hintPatterns) {
|
| 170 |
+
const match = content.match(pattern);
|
| 171 |
+
if (match) {
|
| 172 |
+
content = match[1];
|
| 173 |
+
break;
|
| 174 |
+
}
|
| 175 |
+
}
|
| 176 |
+
|
| 177 |
+
// If still has reasoning markers, just return a fallback
|
| 178 |
+
if (content.includes('The user') || content.includes('We need to')) {
|
| 179 |
+
return 'Think about words that mean something similar';
|
| 180 |
+
}
|
| 181 |
+
}
|
| 182 |
|
| 183 |
// Clean up AI response artifacts
|
| 184 |
content = content
|
|
|
|
| 238 |
'X-Title': 'Cloze Reader'
|
| 239 |
},
|
| 240 |
body: JSON.stringify({
|
| 241 |
+
model: this.primaryModel, // Use Gemma-3-12b for word selection
|
| 242 |
messages: [{
|
| 243 |
+
role: 'system',
|
| 244 |
+
content: 'Select words for a cloze exercise. Return ONLY a JSON array of words, nothing else.'
|
| 245 |
+
}, {
|
| 246 |
role: 'user',
|
| 247 |
+
content: `Select ${count} ${level <= 2 ? 'easy' : level <= 4 ? 'medium' : 'challenging'} words (${wordLengthConstraint}) from this passage. Choose meaningful nouns, verbs, or adjectives. Avoid capitalized words and proper nouns.
|
| 248 |
|
| 249 |
Passage: "${passage}"`
|
| 250 |
}],
|
| 251 |
+
max_tokens: 200,
|
| 252 |
+
temperature: 0.5,
|
| 253 |
+
// Try to disable reasoning mode for word selection
|
| 254 |
+
response_format: { type: "text" }
|
| 255 |
})
|
| 256 |
});
|
| 257 |
|
|
|
|
| 267 |
throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
|
| 268 |
}
|
| 269 |
|
| 270 |
+
// Log the full response to debug structure
|
| 271 |
+
console.log('Full API response:', JSON.stringify(data, null, 2));
|
| 272 |
+
|
| 273 |
// Check if response has expected structure
|
| 274 |
+
if (!data.choices || !data.choices[0] || !data.choices[0].message) {
|
| 275 |
console.error('Invalid word selection API response structure:', data);
|
| 276 |
+
console.error('Choices[0]:', data.choices?.[0]);
|
| 277 |
+
throw new Error('API response missing expected structure');
|
| 278 |
}
|
| 279 |
|
| 280 |
+
// Some models (e.g. OSS-20B) return content in the 'reasoning' field when using reasoning mode
|
| 281 |
+
let content = data.choices[0].message.content || '';
|
| 282 |
+
|
| 283 |
+
// If content is empty, check for reasoning field
|
| 284 |
+
if (!content && data.choices[0].message.reasoning) {
|
| 285 |
+
content = data.choices[0].message.reasoning;
|
| 286 |
+
}
|
| 287 |
+
|
| 288 |
+
// Still no content? Check reasoning_details
|
| 289 |
+
if (!content && data.choices[0].message.reasoning_details?.length > 0) {
|
| 290 |
+
content = data.choices[0].message.reasoning_details[0].text;
|
| 291 |
+
}
|
| 292 |
+
|
| 293 |
+
if (!content) {
|
| 294 |
+
console.error('No content found in API response');
|
| 295 |
+
throw new Error('API response missing content');
|
| 296 |
+
}
|
| 297 |
+
|
| 298 |
+
content = content.trim();
|
| 299 |
|
| 300 |
// Clean up local LLM artifacts
|
| 301 |
if (this.isLocalMode) {
|
|
|
|
| 306 |
try {
|
| 307 |
let words;
|
| 308 |
|
| 309 |
+
// Try to parse JSON first
|
| 310 |
+
try {
|
| 311 |
+
// Check if content contains JSON array anywhere in it
|
| 312 |
+
const jsonMatch = content.match(/\[[\s\S]*?\]/);
|
| 313 |
+
if (jsonMatch) {
|
| 314 |
+
words = JSON.parse(jsonMatch[0]);
|
| 315 |
+
} else {
|
| 316 |
words = JSON.parse(content);
|
| 317 |
+
}
|
| 318 |
+
} catch {
|
| 319 |
+
// If not JSON, check if this is reasoning text from OSS-20B
|
| 320 |
+
if (content.includes('pick') || content.includes('Let\'s')) {
|
| 321 |
+
// Extract words from reasoning text
|
| 322 |
+
// Look for quoted words or words after "pick"
|
| 323 |
+
const quotedWords = content.match(/"([^"]+)"/g);
|
| 324 |
+
if (quotedWords) {
|
| 325 |
+
words = quotedWords.map(w => w.replace(/"/g, ''));
|
| 326 |
+
} else {
|
| 327 |
+
// Look for pattern like "Let's pick 'word'" or "pick word"
|
| 328 |
+
const pickMatch = content.match(/pick\s+['"]?(\w+)['"]?/i);
|
| 329 |
+
if (pickMatch) {
|
| 330 |
+
words = [pickMatch[1]];
|
| 331 |
+
} else {
|
| 332 |
+
// For local LLM, try comma-separated
|
| 333 |
+
if (this.isLocalMode && content.includes(',')) {
|
| 334 |
+
words = content.split(',').map(w => w.trim());
|
| 335 |
+
} else {
|
| 336 |
+
// Single word
|
| 337 |
+
words = [content.trim()];
|
| 338 |
+
}
|
| 339 |
+
}
|
| 340 |
+
}
|
| 341 |
+
} else if (this.isLocalMode) {
|
| 342 |
+
// For local LLM, try comma-separated
|
| 343 |
if (content.includes(',')) {
|
| 344 |
words = content.split(',').map(w => w.trim());
|
| 345 |
} else {
|
| 346 |
// Single word
|
| 347 |
words = [content.trim()];
|
| 348 |
}
|
| 349 |
+
} else {
|
| 350 |
+
throw new Error('Could not parse words from response');
|
| 351 |
}
|
| 352 |
}
|
| 353 |
|
| 354 |
if (Array.isArray(words)) {
|
| 355 |
+
// Validate word lengths based on level
|
| 356 |
const validWords = words.filter(word => {
|
| 357 |
const cleanWord = word.replace(/[^a-zA-Z]/g, '');
|
| 358 |
|
| 359 |
// Check length constraints
|
| 360 |
if (level <= 2) {
|
|
|
|
| 379 |
const matches = content.match(/"([^"]+)"/g);
|
| 380 |
if (matches) {
|
| 381 |
const words = matches.map(m => m.replace(/"/g, ''));
|
| 382 |
+
// Validate word lengths
|
| 383 |
const validWords = words.filter(word => {
|
| 384 |
const cleanWord = word.replace(/[^a-zA-Z]/g, '');
|
| 385 |
|
| 386 |
// Check length constraints
|
| 387 |
if (level <= 2) {
|
|
|
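The word-selection hunks above fall back through several parsing strategies when the model does not return clean JSON. The sketch below condenses that order into one function so the flow is easier to follow. `parseWordList` is a hypothetical name, not a function from the commit, and the logic is a simplification: it assumes the response text is already a trimmed string and returns the raw word list without the level-based length validation that follows in the real code.

```javascript
// Hypothetical helper (not in the commit): condensed version of the
// fallback chain used when parsing word-selection responses.
function parseWordList(content, isLocalMode = false) {
  const text = String(content).trim();

  // 1. Prefer a JSON array found anywhere in the text.
  const jsonMatch = text.match(/\[[\s\S]*?\]/);
  if (jsonMatch) {
    try {
      const parsed = JSON.parse(jsonMatch[0]);
      if (Array.isArray(parsed)) return parsed.map(String);
    } catch {
      // Not valid JSON; fall through to the looser heuristics.
    }
  }

  // 2. Reasoning-style text: look for quoted words, then a "pick word" phrase.
  const looksLikeReasoning = text.includes('pick') || text.includes("Let's");
  if (looksLikeReasoning) {
    const quoted = text.match(/"([^"]+)"/g);
    if (quoted) return quoted.map(w => w.replace(/"/g, ''));

    const pick = text.match(/pick\s+['"]?(\w+)['"]?/i);
    if (pick) return [pick[1]];
  }

  // 3. Local models only: comma-separated list, or a single bare word.
  if (isLocalMode) {
    return text.includes(',') ? text.split(',').map(w => w.trim()) : [text];
  }

  throw new Error('Could not parse words from response');
}

console.log(parseWordList('Here you go: ["alpha", "beta"]'));
console.log(parseWordList("Let's pick 'river'"));
console.log(parseWordList('sun, moon', true));
```

Keeping the strategies in one ordered function makes it clear which heuristic wins when several could match, which is harder to see in the nested `if`/`else` form.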
|
| 454 |
headers,
|
| 455 |
signal: controller.signal,
|
| 456 |
body: JSON.stringify({
|
| 457 |
+
model: this.primaryModel, // Use Gemma-3-12b for batch processing
|
| 458 |
messages: [{
|
| 459 |
+
role: 'system',
|
| 460 |
+
content: 'Process passages for cloze exercises. Return ONLY a JSON object.'
|
| 461 |
+
}, {
|
| 462 |
role: 'user',
|
| 463 |
+
content: `Select ${blanksPerPassage} ${level <= 2 ? 'easy' : level <= 4 ? 'medium' : 'challenging'} words (${wordLengthConstraint}) from each passage.
|
| 464 |
|
| 465 |
+
Passage 1 ("${book1.title}" by ${book1.author}):
|
| 466 |
+
${passage1}
|
| 467 |
|
| 468 |
+
Passage 2 ("${book2.title}" by ${book2.author}):
|
| 469 |
+
${passage2}
|
| 470 |
|
| 471 |
+
Return JSON: {"passage1": {"words": [${blanksPerPassage} words], "context": "one sentence about book"}, "passage2": {"words": [${blanksPerPassage} words], "context": "one sentence about book"}}`
|
| 472 |
}],
|
| 473 |
max_tokens: 800,
|
| 474 |
+
temperature: 0.5,
|
| 475 |
+
response_format: { type: "text" }
|
| 476 |
})
|
| 477 |
});
|
| 478 |
|
|
|
|
| 491 |
throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
|
| 492 |
}
|
| 493 |
|
| 494 |
+
console.log('Batch API response:', JSON.stringify(data, null, 2));
|
| 495 |
+
|
| 496 |
// Check if response has expected structure
|
| 497 |
+
if (!data.choices || !data.choices[0] || !data.choices[0].message) {
|
| 498 |
console.error('Invalid batch API response structure:', data);
|
| 499 |
+
console.error('Choices[0]:', data.choices?.[0]);
|
| 500 |
+
throw new Error('API response missing expected structure');
|
| 501 |
+
}
|
| 502 |
+
|
| 503 |
+
// OSS-20B model returns content in 'reasoning' field when using reasoning mode
|
| 504 |
+
let content = data.choices[0].message.content || '';
|
| 505 |
+
|
| 506 |
+
// If content is empty, check for reasoning field
|
| 507 |
+
if (!content && data.choices[0].message.reasoning) {
|
| 508 |
+
content = data.choices[0].message.reasoning;
|
| 509 |
+
}
|
| 510 |
+
|
| 511 |
+
// Still no content? Check reasoning_details
|
| 512 |
+
if (!content && data.choices[0].message.reasoning_details?.length > 0) {
|
| 513 |
+
content = data.choices[0].message.reasoning_details[0].text;
|
| 514 |
}
|
| 515 |
|
| 516 |
+
if (!content) {
|
| 517 |
+
console.error('No content found in batch API response');
|
| 518 |
+
throw new Error('API response missing content');
|
| 519 |
+
}
|
| 520 |
+
|
| 521 |
+
content = content.trim();
|
| 522 |
|
| 523 |
try {
|
| 524 |
// Try to extract JSON from the response
|
|
|
|
| 578 |
parsed.passage1.words = parsed.passage1.words.filter(word => word && word.trim() !== '');
|
| 579 |
parsed.passage2.words = parsed.passage2.words.filter(word => word && word.trim() !== '');
|
| 580 |
|
| 581 |
+
// Validate word lengths based on level
|
| 582 |
const validateWords = (words, passageText) => {
|
| 583 |
return words.filter(word => {
|
| 584 |
const cleanWord = word.replace(/[^a-zA-Z]/g, '');
|
| 585 |
|
| 586 |
// Check if word appears in all caps in the passage (like "VOLUME")
|
| 587 |
if (passageText.includes(word.toUpperCase()) && word === word.toUpperCase()) {
|
|
|
|
| 691 |
'X-Title': 'Cloze Reader'
|
| 692 |
},
|
| 693 |
body: JSON.stringify({
|
| 694 |
+
model: this.primaryModel, // Use Gemma-3-12b for contextualization
|
| 695 |
messages: [{
|
| 696 |
+
role: 'system',
|
| 697 |
+
content: 'Write one factual sentence about the given literary work.'
|
| 698 |
+
}, {
|
| 699 |
role: 'user',
|
| 700 |
+
content: `"${title}" by ${author}`
|
| 701 |
}],
|
| 702 |
+
max_tokens: 150,
|
| 703 |
+
temperature: 0.5,
|
| 704 |
+
response_format: { type: "text" }
|
| 705 |
})
|
| 706 |
});
|
| 707 |
|
|
|
|
| 719 |
throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
|
| 720 |
}
|
| 721 |
|
| 722 |
+
console.log('Context API response:', JSON.stringify(data, null, 2));
|
| 723 |
+
|
| 724 |
// Check if response has expected structure
|
| 725 |
+
if (!data.choices || !data.choices[0] || !data.choices[0].message) {
|
| 726 |
console.error('Invalid contextualization API response structure:', data);
|
| 727 |
+
console.error('Choices[0]:', data.choices?.[0]);
|
| 728 |
+
throw new Error('API response missing expected structure');
|
| 729 |
+
}
|
| 730 |
+
|
| 731 |
+
// OSS-20B model returns content in 'reasoning' field when using reasoning mode
|
| 732 |
+
let content = data.choices[0].message.content || '';
|
| 733 |
+
|
| 734 |
+
// If content is empty, check for reasoning field
|
| 735 |
+
if (!content && data.choices[0].message.reasoning) {
|
| 736 |
+
content = data.choices[0].message.reasoning;
|
| 737 |
+
}
|
| 738 |
+
|
| 739 |
+
// Still no content? Check reasoning_details
|
| 740 |
+
if (!content && data.choices[0].message.reasoning_details?.length > 0) {
|
| 741 |
+
content = data.choices[0].message.reasoning_details[0].text;
|
| 742 |
+
}
|
| 743 |
+
|
| 744 |
+
if (!content) {
|
| 745 |
+
console.error('No content found in context API response');
|
| 746 |
+
throw new Error('API response missing content');
|
| 747 |
}
|
| 748 |
|
| 749 |
+
content = content.trim();
|
| 750 |
|
| 751 |
// Clean up AI response artifacts
|
| 752 |
content = content
|
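The same content-extraction steps (check `message.content`, then `message.reasoning`, then `message.reasoning_details[0].text`) appear three times in the `src/aiService.js` hunks above. A minimal sketch of how they could be factored into one helper follows. `extractMessageContent` is a hypothetical name and is not part of the commit; the field names are the ones the diff itself reads. One deliberate difference, stated as an assumption: this version skips whitespace-only strings, whereas the diff accepts any truthy `content` before trimming.

```javascript
// Hypothetical helper (not in the commit): returns the first non-empty
// text found in content, reasoning, or reasoning_details[0].text.
function extractMessageContent(data) {
  const message = data?.choices?.[0]?.message;
  if (!message) {
    throw new Error('API response missing expected structure');
  }

  const candidates = [
    message.content,
    message.reasoning,
    message.reasoning_details?.[0]?.text
  ];

  // Take the first candidate that is a string with visible characters.
  const content = candidates.find(c => typeof c === 'string' && c.trim() !== '');
  if (!content) {
    throw new Error('API response missing content');
  }
  return content.trim();
}

console.log(extractMessageContent({ choices: [{ message: { content: ' hello ' } }] }));
console.log(extractMessageContent({ choices: [{ message: { content: '', reasoning: 'from reasoning' } }] }));
```

A shared helper like this would keep the word-selection, batch, and contextualization paths consistent if the fallback order ever changes.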
src/app.js
CHANGED
|
@@ -72,7 +72,8 @@ class App {
 
     // Show level information
     const blanksCount = roundData.blanks.length;
-    const
+    const passageNumber = this.game.currentPassageIndex + 1;
+    const levelInfo = `Level ${this.game.currentLevel} • Passage ${passageNumber}/2 • ${blanksCount} blank${blanksCount > 1 ? 's' : ''}`;
 
     this.elements.roundInfo.innerHTML = levelInfo;
 
@@ -155,22 +156,19 @@ class App {
   }
 
   displayResults(results) {
-    let message = `Score: ${results.correct}/${results.total}
-
-    // Show "Required" information at all levels for consistency
-    message += ` - Required: ${results.requiredCorrect}/${results.total}`;
+    let message = `Score: ${results.correct}/${results.total}`;
 
     if (results.passed) {
       // Check if this completes the requirements for level advancement
       const roundsCompleted = this.game.roundsPassedAtCurrentLevel + 1; // +1 for this round
       if (roundsCompleted >= 2) {
-        message += `
+        message += ` ✓ Level ${this.game.currentLevel + 1} unlocked!`;
       } else {
-        message += `
+        message += ` ✓ Passed (1 more round needed for next level)`;
       }
       this.elements.result.className = 'mt-4 text-center font-semibold text-green-600';
     } else {
-      message += ` -
+      message += ` - Try again (need ${results.requiredCorrect}/${results.total})`;
       this.elements.result.className = 'mt-4 text-center font-semibold text-red-600';
     }
 
|
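The simplified result message in `displayResults` can be read as a pure function of the round results and the current progression state. The sketch below restates the new-side logic that way. `buildResultMessage` is a hypothetical name, and passing `currentLevel` and `roundsPassedAtCurrentLevel` as arguments (instead of reading them from `this.game`) is an assumption made so the example runs on its own.

```javascript
// Hypothetical helper (not in the commit): builds the same message strings
// as the updated displayResults, without touching the DOM.
function buildResultMessage(results, currentLevel, roundsPassedAtCurrentLevel) {
  let message = `Score: ${results.correct}/${results.total}`;

  if (results.passed) {
    // +1 counts the round that was just passed.
    const roundsCompleted = roundsPassedAtCurrentLevel + 1;
    message += roundsCompleted >= 2
      ? ` ✓ Level ${currentLevel + 1} unlocked!`
      : ' ✓ Passed (1 more round needed for next level)';
  } else {
    message += ` - Try again (need ${results.requiredCorrect}/${results.total})`;
  }
  return message;
}

console.log(buildResultMessage({ correct: 2, total: 3, requiredCorrect: 2, passed: true }, 1, 1));
console.log(buildResultMessage({ correct: 1, total: 3, requiredCorrect: 2, passed: false }, 1, 0));
```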
src/clozeGameEngine.js
CHANGED
|
@@ -885,17 +885,17 @@ class ClozeGame {
     // Track successful rounds and advance level after 2 successful rounds
     if (roundPassed) {
       this.roundsPassedAtCurrentLevel++;
-      console.log(`Round passed
+      console.log(`Round passed at level ${this.currentLevel}`);
 
       // Advance level after 2 successful rounds
       if (this.roundsPassedAtCurrentLevel >= 2) {
         this.currentLevel++;
         this.roundsPassedAtCurrentLevel = 0; // Reset counter for new level
-        console.log(`
+        console.log(`Advanced to level ${this.currentLevel}`);
       }
     } else {
       // Failed round - do not reset the counter, user must accumulate 2 passes
-      console.log(`Round
+      console.log(`Round not passed. Need ${2 - this.roundsPassedAtCurrentLevel} more round(s) to advance`);
     }
 
     // Clear chat conversations for new round
|
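The progression rule in the `src/clozeGameEngine.js` hunk (a pass increments the counter, two passes advance the level and reset the counter, a failure leaves the counter unchanged) can be sketched as a small state transition. `applyRoundResult` and the `roundsToAdvance` parameter are hypothetical and only illustrate the rule; the commit mutates `this.currentLevel` and `this.roundsPassedAtCurrentLevel` directly.

```javascript
// Hypothetical helper (not in the commit): returns the next progression
// state instead of mutating the game object.
function applyRoundResult(state, roundPassed, roundsToAdvance = 2) {
  let { currentLevel, roundsPassedAtCurrentLevel } = state;

  if (roundPassed) {
    roundsPassedAtCurrentLevel += 1;
    if (roundsPassedAtCurrentLevel >= roundsToAdvance) {
      currentLevel += 1;
      roundsPassedAtCurrentLevel = 0; // Reset counter for the new level
    }
  }
  // A failed round keeps the counter as-is: passes accumulate.

  return { currentLevel, roundsPassedAtCurrentLevel };
}

let state = { currentLevel: 1, roundsPassedAtCurrentLevel: 0 };
for (const passed of [true, false, true]) {
  state = applyRoundResult(state, passed);
  console.log(passed ? 'pass' : 'fail', state);
}
```

Because failures do not reset the counter, the two passes needed to advance do not have to be consecutive.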
src/modelTestingFramework.js
ADDED
|
@@ -0,0 +1,703 @@
|
|
| 1 |
+
/**
|
| 2 |
+
* Comprehensive Model Testing Framework for Cloze Reader
|
| 3 |
+
* Tests all AI-powered features across different models
|
| 4 |
+
*/
|
| 5 |
+
|
| 6 |
+
class ModelTestingFramework {
|
| 7 |
+
constructor() {
|
| 8 |
+
this.models = [
|
| 9 |
+
// OpenRouter Models
|
| 10 |
+
{ id: 'openai/gpt-4o', name: 'GPT-4o', provider: 'openrouter' },
|
| 11 |
+
{ id: 'openai/gpt-4o-mini', name: 'GPT-4o Mini', provider: 'openrouter' },
|
| 12 |
+
{ id: 'anthropic/claude-3.5-sonnet', name: 'Claude 3.5 Sonnet', provider: 'openrouter' },
|
| 13 |
+
{ id: 'anthropic/claude-3-haiku', name: 'Claude 3 Haiku', provider: 'openrouter' },
|
| 14 |
+
{ id: 'google/gemini-pro-1.5', name: 'Gemini Pro 1.5', provider: 'openrouter' },
|
| 15 |
+
{ id: 'meta-llama/llama-3.1-8b-instruct', name: 'Llama 3.1 8B', provider: 'openrouter' },
|
| 16 |
+
{ id: 'meta-llama/llama-3.1-70b-instruct', name: 'Llama 3.1 70B', provider: 'openrouter' },
|
| 17 |
+
{ id: 'mistralai/mistral-7b-instruct', name: 'Mistral 7B', provider: 'openrouter' },
|
| 18 |
+
{ id: 'microsoft/phi-3-medium-4k-instruct', name: 'Phi-3 Medium', provider: 'openrouter' },
|
| 19 |
+
{ id: 'qwen/qwen-2-7b-instruct', name: 'Qwen 2 7B', provider: 'openrouter' },
|
| 20 |
+
|
| 21 |
+
// Local LLM Models (LM Studio compatible)
|
| 22 |
+
{ id: 'local-llm', name: 'Local LLM (Auto-detect)', provider: 'local' },
|
| 23 |
+
{ id: 'gemma-3-12b', name: 'Gemma 3 12B (Local)', provider: 'local' },
|
| 24 |
+
{ id: 'llama-3.1-8b', name: 'Llama 3.1 8B (Local)', provider: 'local' },
|
| 25 |
+
{ id: 'mistral-7b', name: 'Mistral 7B (Local)', provider: 'local' },
|
| 26 |
+
{ id: 'qwen-2-7b', name: 'Qwen 2 7B (Local)', provider: 'local' },
|
| 27 |
+
{ id: 'phi-3-medium', name: 'Phi-3 Medium (Local)', provider: 'local' },
|
| 28 |
+
{ id: 'custom-local', name: 'Custom Local Model', provider: 'local' }
|
| 29 |
+
];
|
| 30 |
+
|
| 31 |
+
this.testResults = {
|
| 32 |
+
timestamp: new Date().toISOString(),
|
| 33 |
+
tests: []
|
| 34 |
+
};
|
| 35 |
+
|
| 36 |
+
this.testPassages = [
|
| 37 |
+
{
|
| 38 |
+
text: "The old man sat by the fireplace, reading his favorite book. The flames danced in the hearth, casting shadows on the walls. He turned each page carefully, savoring every word of the ancient tale.",
|
| 39 |
+
difficulty: 3,
|
| 40 |
+
expectedWords: ['favorite', 'flames', 'shadows', 'carefully', 'ancient']
|
| 41 |
+
},
|
| 42 |
+
{
|
| 43 |
+
text: "In the garden, colorful flowers bloomed under the warm sunshine. Bees buzzed from blossom to blossom, collecting nectar for their hive. The gardener watched with satisfaction as his hard work flourished.",
|
| 44 |
+
difficulty: 2,
|
| 45 |
+
expectedWords: ['colorful', 'warm', 'buzzed', 'collecting', 'satisfaction']
|
| 46 |
+
},
|
| 47 |
+
{
|
| 48 |
+
text: "The protagonist's journey through the labyrinthine corridors revealed the edifice's architectural complexity. Each ornate chamber contained mysterious artifacts that suggested an ancient civilization's sophisticated understanding of mathematics and astronomy.",
|
| 49 |
+
difficulty: 8,
|
| 50 |
+
expectedWords: ['labyrinthine', 'edifice', 'architectural', 'ornate', 'artifacts', 'civilization', 'sophisticated']
|
| 51 |
+
}
|
| 52 |
+
];
|
| 53 |
+
|
| 54 |
+
this.chatQuestions = [
|
| 55 |
+
{ type: 'part_of_speech', prompt: 'What part of speech is this word?' },
|
| 56 |
+
{ type: 'sentence_role', prompt: 'What role does this word play in the sentence?' },
|
| 57 |
+
{ type: 'word_category', prompt: 'What category or type of word is this?' },
|
| 58 |
+
{ type: 'synonym', prompt: 'Can you suggest a synonym for this word?' }
|
| 59 |
+
];
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
async runComprehensiveTest(selectedModels = null) {
|
| 63 |
+
const modelsToTest = selectedModels || this.models;
|
| 64 |
+
console.log(`Starting comprehensive test of ${modelsToTest.length} models...`);
|
| 65 |
+
|
| 66 |
+
for (const model of modelsToTest) {
|
| 67 |
+
console.log(`\nTesting model: ${model.name}`);
|
| 68 |
+
const modelResults = await this.testModel(model);
|
| 69 |
+
this.testResults.tests.push(modelResults);
|
| 70 |
+
|
| 71 |
+
// Save intermediate results
|
| 72 |
+
await this.saveResults();
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
console.log('\nAll tests completed!');
|
| 76 |
+
return this.testResults;
|
| 77 |
+
}
|
| 78 |
+
|
| 79 |
+
async testModel(model) {
|
| 80 |
+
const startTime = Date.now();
|
| 81 |
+
const results = {
|
| 82 |
+
modelId: model.id,
|
| 83 |
+
modelName: model.name,
|
| 84 |
+
provider: model.provider,
|
| 85 |
+
timestamp: new Date().toISOString(),
|
| 86 |
+
totalTime: 0,
|
| 87 |
+
wordSelection: {},
|
| 88 |
+
contextualization: {},
|
| 89 |
+
chatHints: {},
|
| 90 |
+
errorRates: {},
|
| 91 |
+
overallScore: 0
|
| 92 |
+
};
|
| 93 |
+
|
| 94 |
+
try {
|
| 95 |
+
// Test word selection across different difficulty levels
|
| 96 |
+
results.wordSelection = await this.testWordSelection(model);
|
| 97 |
+
|
| 98 |
+
// Test contextualization
|
| 99 |
+
results.contextualization = await this.testContextualization(model);
|
| 100 |
+
|
| 101 |
+
// Test chat hint generation
|
| 102 |
+
results.chatHints = await this.testChatHints(model);
|
| 103 |
+
|
| 104 |
+
// Calculate overall metrics
|
| 105 |
+
results.totalTime = Date.now() - startTime;
|
| 106 |
+
results.overallScore = this.calculateOverallScore(results);
|
| 107 |
+
|
| 108 |
+
} catch (error) {
|
| 109 |
+
console.error(`Error testing model ${model.name}:`, error);
|
| 110 |
+
results.error = error.message;
|
| 111 |
+
results.overallScore = 0;
|
| 112 |
+
}
|
| 113 |
+
|
| 114 |
+
return results;
|
| 115 |
+
}
|
| 116 |
+
|
| 117 |
+
async testWordSelection(model) {
|
| 118 |
+
const results = {
|
| 119 |
+
tests: [],
|
| 120 |
+
averageTime: 0,
|
| 121 |
+
successRate: 0,
|
| 122 |
+
qualityScore: 0,
|
| 123 |
+
difficultyAccuracy: 0
|
| 124 |
+
};
|
| 125 |
+
|
| 126 |
+
let totalTime = 0;
|
| 127 |
+
let successCount = 0;
|
| 128 |
+
let qualitySum = 0;
|
| 129 |
+
let difficultySum = 0;
|
| 130 |
+
|
| 131 |
+
for (const passage of this.testPassages) {
|
| 132 |
+
const testStart = Date.now();
|
| 133 |
+
|
| 134 |
+
try {
|
| 135 |
+
const words = await this.performWordSelection(model, passage);
|
| 136 |
+
const testTime = Date.now() - testStart;
|
| 137 |
+
totalTime += testTime;
|
| 138 |
+
|
| 139 |
+
const test = {
|
| 140 |
+
passageLength: passage.text.length,
|
| 141 |
+
targetDifficulty: passage.difficulty,
|
| 142 |
+
responseTime: testTime,
|
| 143 |
+
selectedWords: words,
|
| 144 |
+
wordCount: words.length,
|
| 145 |
+
success: words.length > 0,
|
| 146 |
+
qualityScore: this.evaluateWordQuality(words, passage),
|
| 147 |
+
difficultyScore: this.evaluateDifficultyMatch(words, passage.difficulty)
|
| 148 |
+
};
|
| 149 |
+
|
| 150 |
+
results.tests.push(test);
|
| 151 |
+
|
| 152 |
+
if (test.success) {
|
| 153 |
+
successCount++;
|
| 154 |
+
qualitySum += test.qualityScore;
|
| 155 |
+
difficultySum += test.difficultyScore;
|
| 156 |
+
}
|
| 157 |
+
|
| 158 |
+
} catch (error) {
|
| 159 |
+
results.tests.push({
|
| 160 |
+
passageLength: passage.text.length,
|
| 161 |
+
targetDifficulty: passage.difficulty,
|
| 162 |
+
responseTime: Date.now() - testStart,
|
| 163 |
+
error: error.message,
|
| 164 |
+
success: false
|
| 165 |
+
});
|
| 166 |
+
}
|
| 167 |
+
|
| 168 |
+
// Brief pause between tests
|
| 169 |
+
await new Promise(resolve => setTimeout(resolve, 1000));
|
| 170 |
+
}
|
| 171 |
+
|
| 172 |
+
results.averageTime = totalTime / this.testPassages.length;
|
| 173 |
+
results.successRate = successCount / this.testPassages.length;
|
| 174 |
+
results.qualityScore = successCount > 0 ? qualitySum / successCount : 0;
|
| 175 |
+
results.difficultyAccuracy = successCount > 0 ? difficultySum / successCount : 0;
|
| 176 |
+
|
| 177 |
+
return results;
|
| 178 |
+
}
|
| 179 |
+
|
| 180 |
+
async testContextualization(model) {
|
| 181 |
+
const results = {
|
| 182 |
+
tests: [],
|
| 183 |
+
averageTime: 0,
|
| 184 |
+
successRate: 0,
|
| 185 |
+
relevanceScore: 0
|
| 186 |
+
};
|
| 187 |
+
|
| 188 |
+
const testBooks = [
|
| 189 |
+
{ title: 'Pride and Prejudice', author: 'Jane Austen' },
|
| 190 |
+
{ title: 'The Adventures of Tom Sawyer', author: 'Mark Twain' },
|
| 191 |
+
{ title: 'Moby Dick', author: 'Herman Melville' }
|
| 192 |
+
];
|
| 193 |
+
|
| 194 |
+
let totalTime = 0;
|
| 195 |
+
let successCount = 0;
|
| 196 |
+
let relevanceSum = 0;
|
| 197 |
+
|
| 198 |
+
for (const book of testBooks) {
|
| 199 |
+
const testStart = Date.now();
|
| 200 |
+
|
| 201 |
+
try {
|
| 202 |
+
const context = await this.performContextualization(model, book);
|
| 203 |
+
const testTime = Date.now() - testStart;
|
| 204 |
+
totalTime += testTime;
|
| 205 |
+
|
| 206 |
+
const test = {
|
| 207 |
+
bookTitle: book.title,
|
| 208 |
+
author: book.author,
|
| 209 |
+
responseTime: testTime,
|
| 210 |
+
contextLength: context.length,
|
| 211 |
+
success: context.length > 0,
|
| 212 |
+
relevanceScore: this.evaluateContextRelevance(context, book)
|
| 213 |
+
};
|
| 214 |
+
|
| 215 |
+
results.tests.push(test);
|
| 216 |
+
|
| 217 |
+
if (test.success) {
|
| 218 |
+
successCount++;
|
| 219 |
+
relevanceSum += test.relevanceScore;
|
| 220 |
+
}
|
| 221 |
+
|
| 222 |
+
} catch (error) {
|
| 223 |
+
results.tests.push({
|
| 224 |
+
bookTitle: book.title,
|
| 225 |
+
author: book.author,
|
| 226 |
+
responseTime: Date.now() - testStart,
|
| 227 |
+
error: error.message,
|
| 228 |
+
success: false
|
| 229 |
+
});
|
| 230 |
+
}
|
| 231 |
+
|
| 232 |
+
await new Promise(resolve => setTimeout(resolve, 1000));
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
results.averageTime = totalTime / testBooks.length;
|
| 236 |
+
results.successRate = successCount / testBooks.length;
|
| 237 |
+
results.relevanceScore = successCount > 0 ? relevanceSum / successCount : 0;
|
| 238 |
+
|
| 239 |
+
return results;
|
| 240 |
+
}
|
| 241 |
+
|
| 242 |
+
async testChatHints(model) {
|
| 243 |
+
const results = {
|
| 244 |
+
tests: [],
|
| 245 |
+
averageTime: 0,
|
| 246 |
+
successRate: 0,
|
| 247 |
+
helpfulnessScore: 0,
|
| 248 |
+
questionTypePerformance: {}
|
| 249 |
+
};
|
| 250 |
+
|
| 251 |
+
const testWords = [
|
| 252 |
+
{ word: 'magnificent', sentence: 'The cathedral was truly magnificent.', difficulty: 5 },
|
| 253 |
+
{ word: 'whispered', sentence: 'She whispered the secret to her friend.', difficulty: 3 },
|
| 254 |
+
{ word: 'extraordinary', sentence: 'His performance was extraordinary.', difficulty: 7 }
|
| 255 |
+
];
|
| 256 |
+
|
| 257 |
+
let totalTime = 0;
|
| 258 |
+
let successCount = 0;
|
| 259 |
+
let helpfulnessSum = 0;
|
| 260 |
+
|
| 261 |
+
// Initialize question type tracking
|
| 262 |
+
this.chatQuestions.forEach(q => {
|
| 263 |
+
results.questionTypePerformance[q.type] = {
|
| 264 |
+
tests: 0,
|
| 265 |
+
successes: 0,
|
| 266 |
+
averageScore: 0
|
| 267 |
+
};
|
| 268 |
+
});
|
| 269 |
+
|
| 270 |
+
for (const testWord of testWords) {
|
| 271 |
+
for (const question of this.chatQuestions) {
|
| 272 |
+
const testStart = Date.now();
|
| 273 |
+
|
| 274 |
+
try {
|
| 275 |
+
const hint = await this.performChatHint(model, testWord, question);
|
| 276 |
+
const testTime = Date.now() - testStart;
|
| 277 |
+
totalTime += testTime;
|
| 278 |
+
|
| 279 |
+
const helpfulnessScore = this.evaluateHintHelpfulness(hint, testWord, question);
|
| 280 |
+
|
| 281 |
+
const test = {
|
| 282 |
+
word: testWord.word,
|
| 283 |
+
questionType: question.type,
|
| 284 |
+
difficulty: testWord.difficulty,
|
| 285 |
+
responseTime: testTime,
|
| 286 |
+
hintLength: hint.length,
|
| 287 |
+
success: hint.length > 10, // Minimum meaningful response
|
| 288 |
+
helpfulnessScore: helpfulnessScore
|
| 289 |
+
};
|
| 290 |
+
|
| 291 |
+
results.tests.push(test);
|
| 292 |
+
|
| 293 |
+
// Update question type performance
|
| 294 |
+
const qtPerf = results.questionTypePerformance[question.type];
|
| 295 |
+
qtPerf.tests++;
|
| 296 |
+
|
| 297 |
+
if (test.success) {
|
| 298 |
+
successCount++;
|
| 299 |
+
helpfulnessSum += helpfulnessScore;
|
| 300 |
+
qtPerf.successes++;
|
| 301 |
+
qtPerf.averageScore += helpfulnessScore;
|
| 302 |
+
}
|
| 303 |
+
|
| 304 |
+
} catch (error) {
|
| 305 |
+
results.tests.push({
|
| 306 |
+
word: testWord.word,
|
| 307 |
+
questionType: question.type,
|
| 308 |
+
difficulty: testWord.difficulty,
|
| 309 |
+
responseTime: Date.now() - testStart,
|
| 310 |
+
error: error.message,
|
| 311 |
+
success: false
|
| 312 |
+
});
|
| 313 |
+
|
| 314 |
+
results.questionTypePerformance[question.type].tests++;
|
| 315 |
+
}
|
| 316 |
+
|
| 317 |
+
await new Promise(resolve => setTimeout(resolve, 500));
|
| 318 |
+
}
|
| 319 |
+
}
|
| 320 |
+
|
| 321 |
+
// Calculate averages for question types
|
| 322 |
+
Object.keys(results.questionTypePerformance).forEach(type => {
|
| 323 |
+
const perf = results.questionTypePerformance[type];
|
| 324 |
+
perf.successRate = perf.tests > 0 ? perf.successes / perf.tests : 0;
|
| 325 |
+
perf.averageScore = perf.successes > 0 ? perf.averageScore / perf.successes : 0;
|
| 326 |
+
});
|
| 327 |
+
|
| 328 |
+
const totalTests = testWords.length * this.chatQuestions.length;
|
| 329 |
+
results.averageTime = totalTime / totalTests;
|
| 330 |
+
results.successRate = successCount / totalTests;
|
| 331 |
+
results.helpfulnessScore = successCount > 0 ? helpfulnessSum / successCount : 0;
|
| 332 |
+
|
| 333 |
+
return results;
|
| 334 |
+
}
|
| 335 |
+
|
| 336 |
+
async performWordSelection(model, passage) {
|
| 337 |
+
// Create a temporary AI service instance for this model
|
| 338 |
+
const aiService = await this.createModelAIService(model);
|
| 339 |
+
|
| 340 |
+
const prompt = `Select ${Math.min(3, Math.floor(passage.difficulty / 2) + 1)} appropriate words to remove from this passage for a cloze exercise at difficulty level ${passage.difficulty}:
|
| 341 |
+
|
| 342 |
+
"${passage.text}"
|
| 343 |
+
|
| 344 |
+
Return only a JSON array of words, like: ["word1", "word2", "word3"]`;
|
| 345 |
+
|
| 346 |
+
const response = await aiService.makeAIRequest(prompt);
|
| 347 |
+
|
| 348 |
+
try {
|
| 349 |
+
return JSON.parse(response);
|
| 350 |
+
} catch {
|
| 351 |
+
// Try to extract words from non-JSON response
|
| 352 |
+
const matches = response.match(/\[.*?\]/);
|
| 353 |
+
if (matches) {
|
| 354 |
+
return JSON.parse(matches[0]);
|
| 355 |
+
}
|
| 356 |
+
return [];
|
| 357 |
+
}
|
| 358 |
+
}
|
| 359 |
+
|
| 360 |
+
async performContextualization(model, book) {
|
| 361 |
+
const aiService = await this.createModelAIService(model);
|
| 362 |
+
|
| 363 |
+
const prompt = `Provide a brief historical and literary context for "${book.title}" by ${book.author}. Keep it concise and educational, suitable for language learners.`;
|
| 364 |
+
|
| 365 |
+
return await aiService.makeAIRequest(prompt);
|
| 366 |
+
}
|
| 367 |
+
|
| 368 |
+
async performChatHint(model, testWord, question) {
|
| 369 |
+
const aiService = await this.createModelAIService(model);
|
| 370 |
+
|
| 371 |
+
const prompt = `You are helping a student understand a word in context. The word is "${testWord.word}" in the sentence: "${testWord.sentence}"
|
| 372 |
+
|
| 373 |
+
${question.prompt}
|
| 374 |
+
|
| 375 |
+
Provide a helpful hint without revealing the word directly. Keep your response concise and educational.`;
|
| 376 |
+
|
| 377 |
+
return await aiService.makeAIRequest(prompt);
|
| 378 |
+
}
|
| 379 |
+
|
| 380 |
+
async createModelAIService(model) {
|
| 381 |
+
// Use the testing AI service for better performance tracking
|
| 382 |
+
const { TestAIService } = await import('./testAIService.js');
|
| 383 |
+
|
| 384 |
+
const config = {
|
| 385 |
+
modelId: model.id,
|
| 386 |
+
provider: model.provider,
|
| 387 |
+
isLocal: model.provider === 'local'
|
| 388 |
+
};
|
| 389 |
+
|
| 390 |
+
return new TestAIService(config);
|
| 391 |
+
}
|
| 392 |
+
|
| 393 |
+
async detectLocalModels() {
|
| 394 |
+
// Attempt to detect available local models from LM Studio
|
| 395 |
+
try {
|
| 396 |
+
const response = await fetch('http://localhost:1234/v1/models');
|
| 397 |
+
if (response.ok) {
|
| 398 |
+
const data = await response.json();
|
| 399 |
+
const detectedModels = data.data.map(model => ({
|
| 400 |
+
id: model.id,
|
| 401 |
+
name: `${model.id} (Local)`,
|
| 402 |
+
provider: 'local'
|
| 403 |
+
}));
|
| 404 |
+
|
| 405 |
+
// Update the local models list
|
| 406 |
+
this.models = this.models.filter(m => m.provider !== 'local');
|
| 407 |
+
this.models.push(...detectedModels);
|
| 408 |
+
|
| 409 |
+
return detectedModels;
|
| 410 |
+
}
|
| 411 |
+
} catch (error) {
|
| 412 |
+
console.log('No local LM Studio server detected on port 1234');
|
| 413 |
+
}
|
| 414 |
+
|
| 415 |
+
// Return default local models if detection fails
|
| 416 |
+
return this.models.filter(m => m.provider === 'local');
|
| 417 |
+
}
|
| 418 |
+
|
| 419 |
+
async testLocalServerConnection() {
|
| 420 |
+
try {
|
| 421 |
+
const response = await fetch('http://localhost:1234/v1/models', {
|
| 422 |
+
method: 'GET',
|
| 423 |
+
headers: {
|
| 424 |
+
'Content-Type': 'application/json'
|
| 425 |
+
}
|
| 426 |
+
});
|
| 427 |
+
|
| 428 |
+
if (response.ok) {
|
| 429 |
+
const data = await response.json();
|
| 430 |
+
return {
|
| 431 |
+
connected: true,
|
| 432 |
+
models: data.data || [],
|
| 433 |
+
serverInfo: data
|
| 434 |
+
};
|
| 435 |
+
} else {
|
| 436 |
+
return {
|
| 437 |
+
connected: false,
|
| 438 |
+
error: `HTTP ${response.status}: ${response.statusText}`
|
| 439 |
+
};
|
| 440 |
+
}
|
| 441 |
+
} catch (error) {
|
| 442 |
+
return {
|
| 443 |
+
connected: false,
|
| 444 |
+
error: error.message
|
| 445 |
+
};
|
| 446 |
+
}
|
| 447 |
+
}
|
| 448 |
+
|
| 449 |
+
evaluateWordQuality(words, passage) {
|
| 450 |
+
if (!words || words.length === 0) return 0;
|
| 451 |
+
|
| 452 |
+
let score = 0;
|
| 453 |
+
const text = passage.text.toLowerCase();
|
| 454 |
+
|
| 455 |
+
for (const word of words) {
|
| 456 |
+
const wordLower = word.toLowerCase();
|
| 457 |
+
|
| 458 |
+
// Check if word exists in passage
|
| 459 |
+
if (text.includes(wordLower)) score += 20;
|
| 460 |
+
|
| 461 |
+
// Check word length appropriateness
|
| 462 |
+
const expectedMinLength = Math.max(4, passage.difficulty);
|
| 463 |
+
const expectedMaxLength = Math.min(12, passage.difficulty + 6);
|
| 464 |
+
|
| 465 |
+
if (word.length >= expectedMinLength && word.length <= expectedMaxLength) {
|
| 466 |
+
score += 15;
|
| 467 |
+
}
|
| 468 |
+
|
| 469 |
+
// Avoid overly common words for higher difficulties
|
| 470 |
+
const commonWords = ['the', 'and', 'but', 'for', 'are', 'was', 'his', 'her'];
|
| 471 |
+
if (passage.difficulty > 5 && !commonWords.includes(wordLower)) {
|
| 472 |
+
score += 10;
|
| 473 |
+
}
|
| 474 |
+
}
|
| 475 |
+
|
| 476 |
+
return Math.min(100, score / words.length);
|
| 477 |
+
}
|
| 478 |
+
|
| 479 |
+
evaluateDifficultyMatch(words, targetDifficulty) {
|
| 480 |
+
if (!words || words.length === 0) return 0;
|
| 481 |
+
|
| 482 |
+
let score = 0;
|
| 483 |
+
|
| 484 |
+
for (const word of words) {
|
| 485 |
+
const wordLength = word.length;
|
| 486 |
+
const expectedMin = Math.max(4, targetDifficulty);
|
| 487 |
+
const expectedMax = Math.min(14, targetDifficulty + 6);
|
| 488 |
+
|
| 489 |
+
if (wordLength >= expectedMin && wordLength <= expectedMax) {
|
| 490 |
+
score += 100;
|
| 491 |
+
} else {
|
| 492 |
+
// Partial credit for close matches
|
| 493 |
+
const distance = Math.min(
|
| 494 |
+
Math.abs(wordLength - expectedMin),
|
| 495 |
+
Math.abs(wordLength - expectedMax)
|
| 496 |
+
);
|
| 497 |
+
score += Math.max(0, 100 - (distance * 20));
|
| 498 |
+
}
|
| 499 |
+
}
|
| 500 |
+
|
| 501 |
+
return score / words.length;
|
| 502 |
+
}
|
| 503 |
+
|
| 504 |
+
evaluateContextRelevance(context, book) {
|
| 505 |
+
if (!context || context.length < 20) return 0;
|
| 506 |
+
|
| 507 |
+
let score = 0;
|
| 508 |
+
const contextLower = context.toLowerCase();
|
| 509 |
+
|
| 510 |
+
// Check for book title mention
|
| 511 |
+
if (contextLower.includes(book.title.toLowerCase())) score += 25;
|
| 512 |
+
|
| 513 |
+
// Check for author mention
|
| 514 |
+
if (contextLower.includes(book.author.toLowerCase().split(' ').pop())) score += 25;
|
| 515 |
+
|
| 516 |
+
// Check for literary/historical terms
|
| 517 |
+
const literaryTerms = ['novel', 'literature', 'author', 'published', 'century', 'period', 'style', 'theme'];
|
| 518 |
+
const foundTerms = literaryTerms.filter(term => contextLower.includes(term));
|
| 519 |
+
score += Math.min(30, foundTerms.length * 5);
|
| 520 |
+
|
| 521 |
+
// Length appropriateness (100-500 chars is good)
|
| 522 |
+
if (context.length >= 100 && context.length <= 500) score += 20;
|
| 523 |
+
|
| 524 |
+
return Math.min(100, score);
|
| 525 |
+
}
|
| 526 |
+
|
| 527 |
+
evaluateHintHelpfulness(hint, testWord, question) {
|
| 528 |
+
if (!hint || hint.length < 10) return 0;
|
| 529 |
+
|
| 530 |
+
let score = 0;
|
| 531 |
+
const hintLower = hint.toLowerCase();
|
| 532 |
+
const wordLower = testWord.word.toLowerCase();
|
| 533 |
+
|
| 534 |
+
// Penalize if the word is revealed directly
|
| 535 |
+
if (hintLower.includes(wordLower)) {
|
| 536 |
+
score -= 50;
|
| 537 |
+
}
|
| 538 |
+
|
| 539 |
+
// Check for question-appropriate responses
|
| 540 |
+
switch (question.type) {
|
| 541 |
+
case 'part_of_speech':
|
| 542 |
+
const posTerms = ['noun', 'verb', 'adjective', 'adverb', 'pronoun'];
|
| 543 |
+
if (posTerms.some(term => hintLower.includes(term))) score += 40;
|
| 544 |
+
break;
|
| 545 |
+
|
| 546 |
+
case 'sentence_role':
|
| 547 |
+
const roleTerms = ['subject', 'object', 'predicate', 'modifier', 'describes'];
|
| 548 |
+
if (roleTerms.some(term => hintLower.includes(term))) score += 40;
|
| 549 |
+
break;
|
| 550 |
+
|
| 551 |
+
case 'word_category':
|
| 552 |
+
const categoryTerms = ['type', 'kind', 'category', 'group', 'family'];
|
| 553 |
+
if (categoryTerms.some(term => hintLower.includes(term))) score += 40;
|
| 554 |
+
break;
|
| 555 |
+
|
| 556 |
+
case 'synonym':
|
| 557 |
+
const synonymTerms = ['similar', 'means', 'like', 'same as', 'equivalent'];
|
| 558 |
+
if (synonymTerms.some(term => hintLower.includes(term))) score += 40;
|
| 559 |
+
break;
|
| 560 |
+
}
|
| 561 |
+
|
| 562 |
+
// Length appropriateness
|
| 563 |
+
if (hint.length >= 20 && hint.length <= 200) score += 30;
|
| 564 |
+
|
| 565 |
+
// Educational tone
|
| 566 |
+
const educationalTerms = ['this word', 'in this context', 'here', 'sentence'];
|
| 567 |
+
if (educationalTerms.some(term => hintLower.includes(term))) score += 20;
|
| 568 |
+
|
| 569 |
+
return Math.max(0, Math.min(100, score));
|
| 570 |
+
}
|
| 571 |
+
|
| 572 |
+
calculateOverallScore(results) {
|
| 573 |
+
const weights = {
|
| 574 |
+
wordSelection: 0.4,
|
| 575 |
+
contextualization: 0.3,
|
| 576 |
+
chatHints: 0.3
|
| 577 |
+
};
|
| 578 |
+
|
| 579 |
+
let totalScore = 0;
|
| 580 |
+
|
| 581 |
+
if (results.wordSelection.successRate !== undefined) {
|
| 582 |
+
totalScore += results.wordSelection.successRate * 100 * weights.wordSelection;
|
| 583 |
+
}
|
| 584 |
+
|
| 585 |
+
if (results.contextualization.successRate !== undefined) {
|
| 586 |
+
totalScore += results.contextualization.successRate * 100 * weights.contextualization;
|
| 587 |
+
}
|
| 588 |
+
|
| 589 |
+
if (results.chatHints.successRate !== undefined) {
|
| 590 |
+
totalScore += results.chatHints.successRate * 100 * weights.chatHints;
|
| 591 |
+
}
|
| 592 |
+
|
| 593 |
+
// Bonus for consistent performance across all areas
|
| 594 |
+
const allAreas = [results.wordSelection, results.contextualization, results.chatHints];
|
| 595 |
+
const minSuccess = Math.min(...allAreas.map(area => area.successRate || 0));
|
| 596 |
+
if (minSuccess > 0.8) totalScore += 10;
|
| 597 |
+
|
| 598 |
+
return Math.min(100, totalScore);
|
| 599 |
+
}
|
| 600 |
+
|
| 601 |
+
async saveResults() {
|
| 602 |
+
const csvContent = this.generateCSV();
|
| 603 |
+
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
|
| 604 |
+
const filename = `model_test_results_${timestamp}.csv`;
|
| 605 |
+
|
| 606 |
+
// Browser environment - download file
|
| 607 |
+
this.downloadCSV(csvContent, filename);
|
| 608 |
+
|
| 609 |
+
console.log(`Results saved as ${filename}`);
|
| 610 |
+
return filename;
|
| 611 |
+
}
|
| 612 |
+
|
| 613 |
+
downloadCSV(content, filename) {
|
| 614 |
+
const blob = new Blob([content], { type: 'text/csv' });
|
| 615 |
+
const url = URL.createObjectURL(blob);
|
| 616 |
+
|
| 617 |
+
const a = document.createElement('a');
|
| 618 |
+
a.href = url;
|
| 619 |
+
a.download = filename;
|
| 620 |
+
document.body.appendChild(a);
|
| 621 |
+
a.click();
|
| 622 |
+
document.body.removeChild(a);
|
| 623 |
+
URL.revokeObjectURL(url);
|
| 624 |
+
}
|
| 625 |
+
|
| 626 |
+
generateCSV() {
|
| 627 |
+
const headers = [
|
| 628 |
+
'Model Name',
|
| 629 |
+
'Model ID',
|
| 630 |
+
'Provider',
|
| 631 |
+
'Timestamp',
|
| 632 |
+
'Total Time (ms)',
|
| 633 |
+
'Overall Score',
|
| 634 |
+
'Word Selection Success Rate',
|
| 635 |
+
'Word Selection Avg Time (ms)',
|
| 636 |
+
'Word Selection Quality Score',
|
| 637 |
+
'Word Selection Difficulty Accuracy',
|
| 638 |
+
'Contextualization Success Rate',
|
| 639 |
+
'Contextualization Avg Time (ms)',
|
| 640 |
+
'Contextualization Relevance Score',
|
| 641 |
+
'Chat Hints Success Rate',
|
| 642 |
+
'Chat Hints Avg Time (ms)',
|
| 643 |
+
'Chat Hints Helpfulness Score',
|
| 644 |
+
'Part of Speech Success Rate',
|
| 645 |
+
'Sentence Role Success Rate',
|
| 646 |
+
'Word Category Success Rate',
|
| 647 |
+
'Synonym Success Rate',
|
| 648 |
+
'User Satisfaction Score',
|
| 649 |
+
'Word Selection User Rating',
|
| 650 |
+
'Passage Quality User Rating',
|
| 651 |
+
'Hint Helpfulness User Rating',
|
| 652 |
+
'Overall Experience User Rating',
|
| 653 |
+
'User Comments Count',
|
| 654 |
+
'Error Message'
|
| 655 |
+
];
|
| 656 |
+
|
| 657 |
+
const rows = [headers.join(',')];
|
| 658 |
+
|
| 659 |
+
for (const test of this.testResults.tests) {
|
| 660 |
+
// Get user ranking data if available
|
| 661 |
+
const userRankings = test.userRankings || {};
|
| 662 |
+
const userSatisfaction = userRankings.overallUserSatisfaction || 0;
|
| 663 |
+
const avgRatings = userRankings.averageRatings || {};
|
| 664 |
+
const commentsCount = userRankings.comments?.length || 0;
|
| 665 |
+
|
| 666 |
+
const row = [
|
| 667 |
+
`"${test.modelName}"`,
|
| 668 |
+
`"${test.modelId}"`,
|
| 669 |
+
`"${test.provider}"`,
|
| 670 |
+
`"${test.timestamp}"`,
|
| 671 |
+
test.totalTime || 0,
|
| 672 |
+
test.overallScore || 0,
|
| 673 |
+
test.wordSelection?.successRate || 0,
|
| 674 |
+
test.wordSelection?.averageTime || 0,
|
| 675 |
+
test.wordSelection?.qualityScore || 0,
|
| 676 |
+
test.wordSelection?.difficultyAccuracy || 0,
|
| 677 |
+
test.contextualization?.successRate || 0,
|
| 678 |
+
test.contextualization?.averageTime || 0,
|
| 679 |
+
test.contextualization?.relevanceScore || 0,
|
| 680 |
+
test.chatHints?.successRate || 0,
|
| 681 |
+
test.chatHints?.averageTime || 0,
|
| 682 |
+
test.chatHints?.helpfulnessScore || 0,
|
| 683 |
+
test.chatHints?.questionTypePerformance?.part_of_speech?.successRate || 0,
|
| 684 |
+
test.chatHints?.questionTypePerformance?.sentence_role?.successRate || 0,
|
| 685 |
+
test.chatHints?.questionTypePerformance?.word_category?.successRate || 0,
|
| 686 |
+
test.chatHints?.questionTypePerformance?.synonym?.successRate || 0,
|
| 687 |
+
userSatisfaction.toFixed(2),
|
| 688 |
+
avgRatings.word_selection?.toFixed(2) || 0,
|
| 689 |
+
avgRatings.passage_quality?.toFixed(2) || 0,
|
| 690 |
+
avgRatings.hint_helpfulness?.toFixed(2) || 0,
|
| 691 |
+
avgRatings.overall_experience?.toFixed(2) || 0,
|
| 692 |
+
commentsCount,
|
| 693 |
+
`"${String(test.error || '').replace(/"/g, '""')}"` // double embedded quotes so error text cannot break the row
|
| 694 |
+
];
|
| 695 |
+
|
| 696 |
+
rows.push(row.join(','));
|
| 697 |
+
}
|
| 698 |
+
|
| 699 |
+
return rows.join('\n');
|
| 700 |
+
}
|
| 701 |
+
}
|
| 702 |
+
|
| 703 |
+
export { ModelTestingFramework };
|
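`generateCSV` above wraps string fields in double quotes by hand. The following is a minimal sketch of a quoting helper, assuming RFC 4180-style escaping (embedded quotes are doubled). `csvField` and `csvRow` are hypothetical names that do not appear in this commit.

```javascript
// Hypothetical helper, not part of the commit: format one CSV field.
function csvField(value) {
  // Treat null/undefined as an empty field; stringify everything else.
  const text = value === null || value === undefined ? '' : String(value);
  // Quote only when the field contains a quote, comma, or line break.
  if (/[",\r\n]/.test(text)) {
    return `"${text.replace(/"/g, '""')}"`;
  }
  return text;
}

// Hypothetical helper: join already-ordered values into one CSV line.
function csvRow(values) {
  return values.map(csvField).join(',');
}

console.log(csvRow(['gemma-3-27b', 0.95, 'HTTP 500: "Internal Server Error"']));
```

Routing every field through one function keeps the escaping rule in a single place, so model names or error messages containing commas or quotes would not shift columns.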
src/testAIService.js
ADDED
|
@@ -0,0 +1,154 @@
|
| 1 |
+
/**
|
| 2 |
+
* Testing-specific AI Service wrapper
|
| 3 |
+
* Standalone stand-in for the main AI service that adds request, latency, and error tracking
|
| 4 |
+
*/
|
| 5 |
+
|
| 6 |
+
class TestAIService {
|
| 7 |
+
constructor(config) {
|
| 8 |
+
this.modelId = config.modelId;
|
| 9 |
+
this.provider = config.provider;
|
| 10 |
+
this.isLocal = config.isLocal || config.provider === 'local';
|
| 11 |
+
this.baseUrl = this.isLocal ? 'http://localhost:1234/v1' : 'https://openrouter.ai/api/v1'; // local path matches the /v1/models endpoint used for detection
|
| 12 |
+
this.apiKey = this.isLocal ? 'test-key' : this.getApiKey();
|
| 13 |
+
|
| 14 |
+
// Performance tracking
|
| 15 |
+
this.requestCount = 0;
|
| 16 |
+
this.totalResponseTime = 0;
|
| 17 |
+
this.errorCount = 0;
|
| 18 |
+
this.lastError = null;
|
| 19 |
+
}
|
| 20 |
+
|
| 21 |
+
getApiKey() {
|
| 22 |
+
// Try to get API key from meta tag (injected by server)
|
| 23 |
+
const metaTag = typeof document !== 'undefined' ? document.querySelector('meta[name="openrouter-api-key"]') : null; // guard so the Node.js fallback below is reachable
|
| 24 |
+
if (metaTag) {
|
| 25 |
+
return metaTag.content;
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
// Fallback to environment variable (for Node.js testing)
|
| 29 |
+
if (typeof process !== 'undefined' && process.env) {
|
| 30 |
+
return process.env.OPENROUTER_API_KEY;
|
| 31 |
+
}
|
| 32 |
+
|
| 33 |
+
return null;
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
async makeAIRequest(prompt, options = {}) {
|
| 37 |
+
const startTime = Date.now();
|
| 38 |
+
this.requestCount++;
|
| 39 |
+
|
| 40 |
+
try {
|
| 41 |
+
const response = await this.performRequest(prompt, options);
|
| 42 |
+
this.totalResponseTime += Date.now() - startTime;
|
| 43 |
+
return response;
|
| 44 |
+
} catch (error) {
|
| 45 |
+
this.errorCount++;
|
| 46 |
+
this.lastError = error;
|
| 47 |
+
this.totalResponseTime += Date.now() - startTime;
|
| 48 |
+
throw error;
|
| 49 |
+
}
|
| 50 |
+
}
|
| 51 |
+
|
| 52 |
+
async performRequest(prompt, options = {}) {
|
| 53 |
+
const requestBody = {
|
| 54 |
+
model: this.modelId,
|
| 55 |
+
messages: [
|
| 56 |
+
{
|
| 57 |
+
role: "user",
|
| 58 |
+
content: prompt
|
| 59 |
+
}
|
| 60 |
+
],
|
| 61 |
+
max_tokens: options.maxTokens || 500,
|
| 62 |
+
temperature: options.temperature ?? 0.7, // ?? keeps an explicit temperature of 0
|
| 63 |
+
top_p: options.topP || 0.9
|
| 64 |
+
};
|
| 65 |
+
|
| 66 |
+
const headers = {
|
| 67 |
+
'Content-Type': 'application/json',
|
| 68 |
+
'Authorization': `Bearer ${this.apiKey}`
|
| 69 |
+
};
|
| 70 |
+
|
| 71 |
+
if (!this.isLocal && typeof window !== 'undefined') {
|
| 72 |
+
headers['HTTP-Referer'] = window.location.origin;
|
| 73 |
+
}
|
| 74 |
+
|
| 75 |
+
const controller = new AbortController();
|
| 76 |
+
const timeoutId = setTimeout(() => controller.abort(), 30000); // 30 second timeout
|
| 77 |
+
|
| 78 |
+
try {
|
| 79 |
+
const response = await fetch(`${this.baseUrl}/chat/completions`, {
|
| 80 |
+
method: 'POST',
|
| 81 |
+
headers: headers,
|
| 82 |
+
body: JSON.stringify(requestBody),
|
| 83 |
+
signal: controller.signal
|
| 84 |
+
});
|
| 85 |
+
|
| 86 |
+
clearTimeout(timeoutId);
|
| 87 |
+
|
| 88 |
+
if (!response.ok) {
|
| 89 |
+
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
const data = await response.json();
|
| 93 |
+
|
| 94 |
+
if (!data.choices || data.choices.length === 0) {
|
| 95 |
+
throw new Error('No response from AI service');
|
| 96 |
+
}
|
| 97 |
+
|
| 98 |
+
let content = data.choices[0].message.content;
|
| 99 |
+
|
| 100 |
+
// Clean up local LLM response artifacts
|
| 101 |
+
if (this.isLocal) {
|
| 102 |
+
content = this.cleanLocalLLMResponse(content);
|
| 103 |
+
}
|
| 104 |
+
|
| 105 |
+
return content;
|
| 106 |
+
} catch (error) {
|
| 107 |
+
clearTimeout(timeoutId);
|
| 108 |
+
if (error.name === 'AbortError') {
|
| 109 |
+
throw new Error('Request timeout');
|
| 110 |
+
}
|
| 111 |
+
throw error;
|
| 112 |
+
}
|
| 113 |
+
}
|
| 114 |
+
|
| 115 |
+
cleanLocalLLMResponse(content) {
|
| 116 |
+
// Remove common local LLM artifacts
|
| 117 |
+
content = content.replace(/^\[[^\]]*\]\s*(?=\S)/, ''); // Remove a leading [tag] only when text follows, so a bare JSON array survives
|
| 118 |
+
content = content.replace(/(?<=\S)\s*\[[^\]]*\]$/, ''); // Remove a trailing [tag] only when text precedes it
|
| 119 |
+
content = content.replace(/^"(.*)"$/, '$1'); // Remove surrounding quotes
|
| 120 |
+
content = content.replace(/\\n/g, '\n'); // Fix escaped newlines
|
| 121 |
+
content = content.replace(/\\"/g, '"'); // Fix escaped quotes
|
| 122 |
+
|
| 123 |
+
return content.trim();
|
| 124 |
+
}
|
| 125 |
+
|
| 126 |
+
// Performance metrics
|
| 127 |
+
getAverageResponseTime() {
|
| 128 |
+
return this.requestCount > 0 ? this.totalResponseTime / this.requestCount : 0;
|
| 129 |
+
}
|
| 130 |
+
|
| 131 |
+
getErrorRate() {
|
| 132 |
+
return this.requestCount > 0 ? this.errorCount / this.requestCount : 0;
|
| 133 |
+
}
|
| 134 |
+
|
| 135 |
+
getPerformanceStats() {
|
| 136 |
+
return {
|
| 137 |
+
requestCount: this.requestCount,
|
| 138 |
+
totalResponseTime: this.totalResponseTime,
|
| 139 |
+
averageResponseTime: this.getAverageResponseTime(),
|
| 140 |
+
errorCount: this.errorCount,
|
| 141 |
+
errorRate: this.getErrorRate(),
|
| 142 |
+
lastError: this.lastError?.message || null
|
| 143 |
+
};
|
| 144 |
+
}
|
| 145 |
+
|
| 146 |
+
reset() {
|
| 147 |
+
this.requestCount = 0;
|
| 148 |
+
this.totalResponseTime = 0;
|
| 149 |
+
this.errorCount = 0;
|
| 150 |
+
this.lastError = null;
|
| 151 |
+
}
|
| 152 |
+
}
|
| 153 |
+
|
| 154 |
+
export { TestAIService };
|
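The counters kept by `TestAIService` (request count, error count, total response time) can be exercised without any network call. The following is a minimal sketch of the same bookkeeping, assuming, as in `makeAIRequest` above, that failed requests still contribute their elapsed time. `RequestStats` and `record` are hypothetical names used only for illustration.

```javascript
// Hypothetical class, not part of the commit: mirrors TestAIService's counters.
class RequestStats {
  constructor() {
    this.reset();
  }

  // Clear all counters, like TestAIService.reset().
  reset() {
    this.requestCount = 0;
    this.errorCount = 0;
    this.totalResponseTime = 0;
  }

  // Record one finished request; failures still add their duration.
  record(durationMs, ok) {
    this.requestCount++;
    this.totalResponseTime += durationMs;
    if (!ok) this.errorCount++;
  }

  // Mean duration over all requests, or 0 when nothing was recorded.
  get averageResponseTime() {
    return this.requestCount > 0 ? this.totalResponseTime / this.requestCount : 0;
  }

  // Fraction of requests that failed, or 0 when nothing was recorded.
  get errorRate() {
    return this.requestCount > 0 ? this.errorCount / this.requestCount : 0;
  }
}

const stats = new RequestStats();
stats.record(120, true);
stats.record(80, true);
stats.record(400, false);
console.log(stats.averageResponseTime, stats.errorRate);
```

Because timeouts are counted in the average, a model that fails slowly will look slower than one that fails fast, which may or may not be the comparison a test run wants.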
src/testGameRunner.js
ADDED
|
@@ -0,0 +1,473 @@
|
| 1 |
+
/**
|
| 2 |
+
* Test Game Runner - Monitors and logs performance during game testing
|
| 3 |
+
*/
|
| 4 |
+
|
| 5 |
+
class TestGameRunner {
|
| 6 |
+
constructor(modelConfig) {
|
| 7 |
+
this.modelConfig = modelConfig;
|
| 8 |
+
this.sessionData = {
|
| 9 |
+
modelId: modelConfig.modelId,
|
| 10 |
+
modelName: modelConfig.modelName,
|
| 11 |
+
provider: modelConfig.provider,
|
| 12 |
+
startTime: Date.now(),
|
| 13 |
+
rounds: [],
|
| 14 |
+
interactions: [],
|
| 15 |
+
userRankings: [],
|
| 16 |
+
performance: {
|
| 17 |
+
wordSelectionRequests: 0,
|
| 18 |
+
wordSelectionSuccess: 0,
|
| 19 |
+
wordSelectionTime: 0,
|
| 20 |
+
contextualizationRequests: 0,
|
| 21 |
+
contextualizationSuccess: 0,
|
| 22 |
+
contextualizationTime: 0,
|
| 23 |
+
chatHintRequests: 0,
|
| 24 |
+
chatHintSuccess: 0,
|
| 25 |
+
chatHintTime: 0,
|
| 26 |
+
errors: []
|
| 27 |
+
}
|
| 28 |
+
};
|
| 29 |
+
|
| 30 |
+
this.originalAIService = null;
|
| 31 |
+
this.setupInterception();
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
setupInterception() {
|
| 35 |
+
// Intercept AI service calls to track performance
|
| 36 |
+
if (window.aiService) {
|
| 37 |
+
this.originalAIService = window.aiService;
|
| 38 |
+
this.wrapAIService();
|
| 39 |
+
}
|
| 40 |
+
|
| 41 |
+
// Monitor for game events
|
| 42 |
+
this.setupGameEventListeners();
|
| 43 |
+
}
|
| 44 |
+
|
| 45 |
+
wrapAIService() {
|
| 46 |
+
const testRunner = this;
|
| 47 |
+
|
| 48 |
+
// Wrap the makeAIRequest method
|
| 49 |
+
const originalMakeAIRequest = this.originalAIService.makeAIRequest.bind(this.originalAIService);
|
| 50 |
+
|
| 51 |
+
window.aiService.makeAIRequest = async function(prompt, options = {}) {
|
| 52 |
+
const startTime = Date.now();
|
| 53 |
+
const requestType = testRunner.classifyRequest(prompt);
|
| 54 |
+
|
| 55 |
+
testRunner.logInteraction({
|
| 56 |
+
type: 'ai_request_start',
|
| 57 |
+
requestType: requestType,
|
| 58 |
+
prompt: prompt.substring(0, 200) + '...',
|
| 59 |
+
timestamp: Date.now()
|
| 60 |
+
});
|
| 61 |
+
|
| 62 |
+
try {
|
| 63 |
+
const result = await originalMakeAIRequest(prompt, options);
|
| 64 |
+
const responseTime = Date.now() - startTime;
|
| 65 |
+
|
| 66 |
+
testRunner.updatePerformanceMetrics(requestType, true, responseTime);
|
| 67 |
+
testRunner.logInteraction({
|
| 68 |
+
type: 'ai_request_success',
|
| 69 |
+
requestType: requestType,
|
| 70 |
+
responseTime: responseTime,
|
| 71 |
+
responseLength: result?.length ?? 0,
|
| 72 |
+
timestamp: Date.now()
|
| 73 |
+
});
|
| 74 |
+
|
| 75 |
+
return result;
|
| 76 |
+
} catch (error) {
|
| 77 |
+
const responseTime = Date.now() - startTime;
|
| 78 |
+
|
| 79 |
+
testRunner.updatePerformanceMetrics(requestType, false, responseTime);
|
| 80 |
+
testRunner.logInteraction({
|
| 81 |
+
type: 'ai_request_error',
|
| 82 |
+
requestType: requestType,
|
| 83 |
+
error: error.message,
|
| 84 |
+
responseTime: responseTime,
|
| 85 |
+
timestamp: Date.now()
|
| 86 |
+
});
|
| 87 |
+
|
| 88 |
+
testRunner.sessionData.performance.errors.push({
|
| 89 |
+
type: requestType,
|
| 90 |
+
error: error.message,
|
| 91 |
+
timestamp: Date.now()
|
| 92 |
+
});
|
| 93 |
+
|
| 94 |
+
throw error;
|
| 95 |
+
}
|
| 96 |
+
};
|
| 97 |
+
}
|
| 98 |
+
|
| 99 |
+
classifyRequest(prompt) {
|
| 100 |
+
const promptLower = prompt.toLowerCase();
|
| 101 |
+
|
| 102 |
+
if (promptLower.includes('select') && promptLower.includes('word')) {
|
| 103 |
+
return 'word_selection';
|
| 104 |
+
} else if (promptLower.includes('context') || promptLower.includes('background')) {
|
| 105 |
+
return 'contextualization';
|
| 106 |
+
} else if (promptLower.includes('hint') || promptLower.includes('help') || promptLower.includes('clue')) {
|
| 107 |
+
return 'chat_hint';
|
| 108 |
+
} else {
|
| 109 |
+
return 'other';
|
| 110 |
+
}
|
| 111 |
+
}
|
| 112 |
+
|
| 113 |
+
updatePerformanceMetrics(requestType, success, responseTime) {
|
| 114 |
+
const perf = this.sessionData.performance;
|
| 115 |
+
|
| 116 |
+
switch (requestType) {
|
| 117 |
+
case 'word_selection':
|
| 118 |
+
perf.wordSelectionRequests++;
|
| 119 |
+
if (success) {
|
| 120 |
+
perf.wordSelectionSuccess++;
|
| 121 |
+
perf.wordSelectionTime += responseTime;
|
| 122 |
+
}
|
| 123 |
+
break;
|
| 124 |
+
|
| 125 |
+
case 'contextualization':
|
| 126 |
+
perf.contextualizationRequests++;
|
| 127 |
+
if (success) {
|
| 128 |
+
perf.contextualizationSuccess++;
|
| 129 |
+
perf.contextualizationTime += responseTime;
|
| 130 |
+
}
|
| 131 |
+
break;
|
| 132 |
+
|
| 133 |
+
case 'chat_hint':
|
| 134 |
+
perf.chatHintRequests++;
|
| 135 |
+
if (success) {
|
| 136 |
+
perf.chatHintSuccess++;
|
| 137 |
+
perf.chatHintTime += responseTime;
|
| 138 |
+
}
|
| 139 |
+
break;
|
| 140 |
+
}
|
| 141 |
+
}
|
| 142 |
+
|
| 143 |
+
setupGameEventListeners() {
|
| 144 |
+
// Listen for game-specific events
|
| 145 |
+
document.addEventListener('gameRoundStart', (event) => {
|
| 146 |
+
this.logInteraction({
|
| 147 |
+
type: 'round_start',
|
| 148 |
+
level: event.detail.level,
|
| 149 |
+
round: event.detail.round,
|
| 150 |
+
timestamp: Date.now()
|
| 151 |
+
});
|
| 152 |
+
});
|
| 153 |
+
|
| 154 |
+
document.addEventListener('gameRoundComplete', (event) => {
|
| 155 |
+
const roundData = {
|
| 156 |
+
level: event.detail.level,
|
| 157 |
+
round: event.detail.round,
|
| 158 |
+
score: event.detail.score,
|
| 159 |
+
correctAnswers: event.detail.correctAnswers,
|
| 160 |
+
totalBlanks: event.detail.totalBlanks,
|
| 161 |
+
timeSpent: event.detail.timeSpent,
|
| 162 |
+
timestamp: Date.now()
|
| 163 |
+
};
|
| 164 |
+
|
| 165 |
+
this.sessionData.rounds.push(roundData);
|
| 166 |
+
|
| 167 |
+
// Store the current round index for user ranking association
|
| 168 |
+
this.currentRoundIndex = this.sessionData.rounds.length - 1;
|
| 169 |
+
|
| 170 |
+
this.logInteraction({
|
| 171 |
+
type: 'round_complete',
|
| 172 |
+
level: event.detail.level,
|
| 173 |
+
round: event.detail.round,
|
| 174 |
+
score: event.detail.score,
|
| 175 |
+
timestamp: Date.now()
|
| 176 |
+
});
|
| 177 |
+
});
|
| 178 |
+
|
| 179 |
+
document.addEventListener('userAnswer', (event) => {
|
| 180 |
+
this.logInteraction({
|
| 181 |
+
type: 'user_answer',
|
| 182 |
+
word: event.detail.targetWord,
|
| 183 |
+
userAnswer: event.detail.userAnswer,
|
| 184 |
+
correct: event.detail.correct,
|
| 185 |
+
timestamp: Date.now()
|
| 186 |
+
});
|
| 187 |
+
});
|
| 188 |
+
|
| 189 |
+
document.addEventListener('chatInteraction', (event) => {
|
| 190 |
+
this.logInteraction({
|
| 191 |
+
type: 'chat_interaction',
|
| 192 |
+
questionType: event.detail.questionType,
|
| 193 |
+
word: event.detail.word,
|
| 194 |
+
timestamp: Date.now()
|
| 195 |
+
});
|
| 196 |
+
});
|
| 197 |
+
|
| 198 |
+
// Listen for user ranking events
|
| 199 |
+
document.addEventListener('userRanking', (event) => {
|
| 200 |
+
const rankingData = {
|
| 201 |
+
...event.detail,
|
| 202 |
+
roundIndex: this.currentRoundIndex,
|
| 203 |
+
roundDetails: this.sessionData.rounds[this.currentRoundIndex]
|
| 204 |
+
};
|
| 205 |
+
|
| 206 |
+
this.sessionData.userRankings.push(rankingData);
|
| 207 |
+
|
| 208 |
+
this.logInteraction({
|
| 209 |
+
type: 'user_ranking',
|
| 210 |
+
averageRating: event.detail.averageRating,
|
| 211 |
+
ratings: event.detail.ratings,
|
| 212 |
+
timestamp: Date.now()
|
| 213 |
+
});
|
| 214 |
+
});
|
| 215 |
+
}
|
| 216 |
+
|
| 217 |
+
logInteraction(interaction) {
|
| 218 |
+
this.sessionData.interactions.push(interaction);
|
| 219 |
+
|
| 220 |
+
// Log to console for real-time monitoring
|
| 221 |
+
console.log(`[TestRunner] ${interaction.type}:`, interaction);
|
| 222 |
+
}
|
| 223 |
+
|
| 224 |
+
generateReport() {
|
| 225 |
+
const endTime = Date.now();
|
| 226 |
+
const totalTime = endTime - this.sessionData.startTime;
|
| 227 |
+
const perf = this.sessionData.performance;
|
| 228 |
+
|
| 229 |
+
// Calculate user ranking summary
|
| 230 |
+
const userRankingSummary = this.calculateUserRankingSummary();
|
| 231 |
+
|
| 232 |
+
const report = {
|
| 233 |
+
...this.sessionData,
|
| 234 |
+
endTime: endTime,
|
| 235 |
+
totalSessionTime: totalTime,
|
| 236 |
+
summary: {
|
| 237 |
+
totalRounds: this.sessionData.rounds.length,
|
| 238 |
+
averageScore: this.sessionData.rounds.length > 0
|
| 239 |
+
? this.sessionData.rounds.reduce((sum, round) => sum + round.score, 0) / this.sessionData.rounds.length
|
| 240 |
+
: 0,
|
| 241 |
+
wordSelectionSuccessRate: perf.wordSelectionRequests > 0
|
| 242 |
+
? perf.wordSelectionSuccess / perf.wordSelectionRequests
|
| 243 |
+
: 0,
|
| 244 |
+
wordSelectionAvgTime: perf.wordSelectionSuccess > 0
|
| 245 |
+
? perf.wordSelectionTime / perf.wordSelectionSuccess
|
| 246 |
+
: 0,
|
| 247 |
+
contextualizationSuccessRate: perf.contextualizationRequests > 0
|
| 248 |
+
? perf.contextualizationSuccess / perf.contextualizationRequests
|
| 249 |
+
: 0,
|
| 250 |
+
contextualizationAvgTime: perf.contextualizationSuccess > 0
|
| 251 |
+
? perf.contextualizationTime / perf.contextualizationSuccess
|
| 252 |
+
: 0,
|
| 253 |
+
chatHintSuccessRate: perf.chatHintRequests > 0
|
| 254 |
+
? perf.chatHintSuccess / perf.chatHintRequests
|
| 255 |
+
: 0,
|
| 256 |
+
chatHintAvgTime: perf.chatHintSuccess > 0
|
| 257 |
+
? perf.chatHintTime / perf.chatHintSuccess
|
| 258 |
+
: 0,
|
| 259 |
+
totalErrors: perf.errors.length,
|
| 260 |
+
userRankingSummary: userRankingSummary
|
| 261 |
+
}
|
| 262 |
+
};
|
| 263 |
+
|
| 264 |
+
return report;
|
| 265 |
+
}
|
| 266 |
+
|
| 267 |
+
calculateUserRankingSummary() {
|
| 268 |
+
if (this.sessionData.userRankings.length === 0) {
|
| 269 |
+
return null;
|
| 270 |
+
}
|
| 271 |
+
|
| 272 |
+
const categories = ['word_selection', 'passage_quality', 'hint_helpfulness', 'overall_experience'];
|
| 273 |
+
const summary = {
|
| 274 |
+
totalRankings: this.sessionData.userRankings.length,
|
| 275 |
+
averageRatings: {},
|
| 276 |
+
categoryBreakdown: {},
|
| 277 |
+
comments: [],
|
| 278 |
+
overallUserSatisfaction: 0
|
| 279 |
+
};
|
| 280 |
+
|
| 281 |
+
// Calculate average ratings per category
|
| 282 |
+
categories.forEach(category => {
|
| 283 |
+
const ratings = this.sessionData.userRankings
|
| 284 |
+
.map(r => r.ratings?.[category])
|
| 285 |
+
.filter(r => r !== undefined);
|
| 286 |
+
|
| 287 |
+
if (ratings.length > 0) {
|
| 288 |
+
summary.averageRatings[category] =
|
| 289 |
+
ratings.reduce((a, b) => a + b, 0) / ratings.length;
|
| 290 |
+
|
| 291 |
+
// Distribution of ratings
|
| 292 |
+
summary.categoryBreakdown[category] = {
|
| 293 |
+
1: ratings.filter(r => r === 1).length,
|
| 294 |
+
2: ratings.filter(r => r === 2).length,
|
| 295 |
+
3: ratings.filter(r => r === 3).length,
|
| 296 |
+
4: ratings.filter(r => r === 4).length,
|
| 297 |
+
5: ratings.filter(r => r === 5).length
|
| 298 |
+
};
|
| 299 |
+
}
|
| 300 |
+
});
|
| 301 |
+
|
| 302 |
+
// Calculate overall satisfaction
|
| 303 |
+
const allRatings = this.sessionData.userRankings
|
| 304 |
+
.map(r => r.averageRating)
|
| 305 |
+
.filter(r => r !== undefined);
|
| 306 |
+
|
| 307 |
+
if (allRatings.length > 0) {
|
| 308 |
+
summary.overallUserSatisfaction =
|
| 309 |
+
allRatings.reduce((a, b) => a + b, 0) / allRatings.length;
|
| 310 |
+
}
|
| 311 |
+
|
| 312 |
+
// Collect comments with context
|
| 313 |
+
summary.comments = this.sessionData.userRankings
|
| 314 |
+
.filter(r => r.comments)
|
| 315 |
+
.map(r => ({
|
| 316 |
+
timestamp: r.timestamp,
|
| 317 |
+
comment: r.comments,
|
| 318 |
+
averageRating: r.averageRating,
|
| 319 |
+
roundLevel: r.roundDetails?.level,
|
| 320 |
+
roundScore: r.roundDetails?.score
|
| 321 |
+
}));
|
| 322 |
+
|
| 323 |
+
return summary;
|
| 324 |
+
}
|
| 325 |
+
|
| 326 |
+
async saveReport() {
|
| 327 |
+
const report = this.generateReport();
|
| 328 |
+
const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
|
| 329 |
+
const filename = `game_test_${this.modelConfig.modelId.replace(/[\/\\:]/g, '_')}_${timestamp}.json`;
|
| 330 |
+
|
| 331 |
+
try {
|
| 332 |
+
// Try to save via browser download
|
| 333 |
+
this.downloadReport(report, filename);
|
| 334 |
+
|
| 335 |
+
// Also try to save to output folder if possible (server-side)
|
| 336 |
+
await this.saveToServer(report, filename);
|
| 337 |
+
|
| 338 |
+
console.log(`Test report saved: ${filename}`);
|
| 339 |
+
return filename;
|
| 340 |
+
} catch (error) {
|
| 341 |
+
console.error('Error saving test report:', error);
|
| 342 |
+
return null;
|
| 343 |
+
}
|
| 344 |
+
}
|
| 345 |
+
|
| 346 |
+
downloadReport(report, filename) {
|
| 347 |
+
const jsonString = JSON.stringify(report, null, 2);
|
| 348 |
+
const blob = new Blob([jsonString], { type: 'application/json' });
|
| 349 |
+
const url = URL.createObjectURL(blob);
|
| 350 |
+
|
| 351 |
+
const a = document.createElement('a');
|
| 352 |
+
a.href = url;
|
| 353 |
+
a.download = filename;
|
| 354 |
+
document.body.appendChild(a);
|
| 355 |
+
a.click();
|
| 356 |
+
document.body.removeChild(a);
|
| 357 |
+
URL.revokeObjectURL(url);
|
| 358 |
+
}
|
| 359 |
+
|
| 360 |
+
async saveToServer(report, filename) {
|
| 361 |
+
try {
|
| 362 |
+
const response = await fetch('/api/save-test-report', {
|
| 363 |
+
method: 'POST',
|
| 364 |
+
headers: {
|
| 365 |
+
'Content-Type': 'application/json'
|
| 366 |
+
},
|
| 367 |
+
body: JSON.stringify({
|
| 368 |
+
filename: filename,
|
| 369 |
+
data: report
|
| 370 |
+
})
|
| 371 |
+
});
|
| 372 |
+
|
| 373 |
+
if (!response.ok) {
|
| 374 |
+
throw new Error(`Server save failed: ${response.status}`);
|
| 375 |
+
}
|
| 376 |
+
} catch (error) {
|
| 377 |
+
console.log('Server save not available, using browser download only');
|
| 378 |
+
}
|
| 379 |
+
}
|
| 380 |
+
|
| 381 |
+
// Utility methods for analysis
|
| 382 |
+
getWordSelectionAnalytics() {
|
| 383 |
+
const wordSelectionInteractions = this.sessionData.interactions.filter(
|
| 384 |
+
i => i.type === 'ai_request_success' && i.requestType === 'word_selection'
|
| 385 |
+
);
|
| 386 |
+
|
| 387 |
+
return {
|
| 388 |
+
count: wordSelectionInteractions.length,
|
| 389 |
+
averageResponseTime: wordSelectionInteractions.length > 0
|
| 390 |
+
? wordSelectionInteractions.reduce((sum, i) => sum + i.responseTime, 0) / wordSelectionInteractions.length
|
| 391 |
+
: 0,
|
| 392 |
+
averageResponseLength: wordSelectionInteractions.length > 0
|
| 393 |
+
? wordSelectionInteractions.reduce((sum, i) => sum + i.responseLength, 0) / wordSelectionInteractions.length
|
| 394 |
+
: 0
|
| 395 |
+
};
|
| 396 |
+
}
|
| 397 |
+
|
| 398 |
+
getChatHintAnalytics() {
|
| 399 |
+
const chatHintInteractions = this.sessionData.interactions.filter(
|
| 400 |
+
i => i.type === 'chat_interaction'
|
| 401 |
+
);
|
| 402 |
+
|
| 403 |
+
const questionTypes = {};
|
| 404 |
+
chatHintInteractions.forEach(interaction => {
|
| 405 |
+
const type = interaction.questionType || 'unknown';
|
| 406 |
+
questionTypes[type] = (questionTypes[type] || 0) + 1;
|
| 407 |
+
});
|
| 408 |
+
|
| 409 |
+
return {
|
| 410 |
+
totalHints: chatHintInteractions.length,
|
| 411 |
+
questionTypeBreakdown: questionTypes
|
| 412 |
+
};
|
| 413 |
+
}
|
| 414 |
+
|
| 415 |
+
getUserPerformanceAnalytics() {
|
| 416 |
+
const answerInteractions = this.sessionData.interactions.filter(
|
| 417 |
+
i => i.type === 'user_answer'
|
| 418 |
+
);
|
| 419 |
+
|
| 420 |
+
const correctAnswers = answerInteractions.filter(i => i.correct).length;
|
| 421 |
+
|
| 422 |
+
return {
|
| 423 |
+
totalAnswers: answerInteractions.length,
|
| 424 |
+
correctAnswers: correctAnswers,
|
| 425 |
+
accuracy: answerInteractions.length > 0 ? correctAnswers / answerInteractions.length : 0
|
| 426 |
+
};
|
| 427 |
+
}
|
| 428 |
+
}
|
| 429 |
+
|
| 430 |
+
// Initialize test runner if in test mode
|
| 431 |
+
window.addEventListener('DOMContentLoaded', () => {
|
| 432 |
+
const urlParams = new URLSearchParams(window.location.search);
|
| 433 |
+
if (urlParams.get('testMode') === 'true') {
|
| 434 |
+
const modelId = urlParams.get('testModel');
|
| 435 |
+
const isLocal = urlParams.get('local') === 'true';
|
| 436 |
+
|
| 437 |
+
if (modelId) {
|
| 438 |
+
window.testGameRunner = new TestGameRunner({
|
| 439 |
+
modelId: modelId,
|
| 440 |
+
modelName: modelId,
|
| 441 |
+
provider: isLocal ? 'local' : 'openrouter'
|
| 442 |
+
});
|
| 443 |
+
|
| 444 |
+
console.log('Test Game Runner initialized for model:', modelId);
|
| 445 |
+
|
| 446 |
+
// Add end session button
|
| 447 |
+
const endButton = document.createElement('button');
|
| 448 |
+
endButton.textContent = 'End Test Session';
|
| 449 |
+
endButton.style.cssText = `
|
| 450 |
+
position: fixed;
|
| 451 |
+
top: 10px;
|
| 452 |
+
right: 10px;
|
| 453 |
+
z-index: 1000;
|
| 454 |
+
padding: 10px 15px;
|
| 455 |
+
background: #dc3545;
|
| 456 |
+
color: white;
|
| 457 |
+
border: none;
|
| 458 |
+
border-radius: 5px;
|
| 459 |
+
cursor: pointer;
|
| 460 |
+
`;
|
| 461 |
+
|
| 462 |
+
endButton.addEventListener('click', async () => {
|
| 463 |
+
const filename = await window.testGameRunner.saveReport();
|
| 464 |
+
alert(filename ? `Test session ended. Report saved as: ${filename}` : 'Test session ended, but the report could not be saved.');
|
| 465 |
+
window.close();
|
| 466 |
+
});
|
| 467 |
+
|
| 468 |
+
document.body.appendChild(endButton);
|
| 469 |
+
}
|
| 470 |
+
}
|
| 471 |
+
});
|
| 472 |
+
|
| 473 |
+
export { TestGameRunner };
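The summary fields in `generateReport()` (success rates and average times) all repeat the same "divide only if the denominator is positive" guard. A minimal sketch of that pattern as a shared helper follows. The name `safeRatio` and the sample `perf` values are assumptions for illustration and are not part of this commit.

```javascript
// Hypothetical helper (not in the commit): divide only when the
// denominator is positive, otherwise return a fallback value.
function safeRatio(numerator, denominator, fallback = 0) {
  return denominator > 0 ? numerator / denominator : fallback;
}

// Sample counters shaped like the `perf` object read by generateReport().
// The numbers are made up for the example.
const perf = {
  chatHintRequests: 4,
  chatHintSuccess: 3,
  chatHintTime: 4500, // total milliseconds
};

// Same results as the inline ternaries in the diff, with less repetition.
const chatHintSuccessRate = safeRatio(perf.chatHintSuccess, perf.chatHintRequests);
const chatHintAvgTime = safeRatio(perf.chatHintTime, perf.chatHintSuccess);

console.log({ chatHintSuccessRate, chatHintAvgTime });
```

With a helper like this, a session that recorded no requests would still report `0` for each field instead of `NaN`, which matches the behavior of the original ternaries.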
|
src/testReportGenerator.js
ADDED
|
@@ -0,0 +1,453 @@
|
| 1 |
+
/**
|
| 2 |
+
* Comprehensive Test Report Generator
|
| 3 |
+
* Analyzes test results and generates detailed reports
|
| 4 |
+
*/
|
| 5 |
+
|
| 6 |
+
class TestReportGenerator {
|
| 7 |
+
constructor() {
|
| 8 |
+
this.reportTemplates = {
|
| 9 |
+
summary: this.generateSummaryReport.bind(this),
|
| 10 |
+
detailed: this.generateDetailedReport.bind(this),
|
| 11 |
+
comparison: this.generateComparisonReport.bind(this),
|
| 12 |
+
performance: this.generatePerformanceReport.bind(this),
|
| 13 |
+
markdown: this.generateMarkdownReport.bind(this)
|
| 14 |
+
};
|
| 15 |
+
}
|
| 16 |
+
|
| 17 |
+
async generateAllReports(testResults, outputFormat = 'all') {
|
| 18 |
+
const reports = {};
|
| 19 |
+
|
| 20 |
+
if (outputFormat === 'all' || outputFormat === 'summary') {
|
| 21 |
+
reports.summary = this.generateSummaryReport(testResults);
|
| 22 |
+
}
|
| 23 |
+
|
| 24 |
+
if (outputFormat === 'all' || outputFormat === 'detailed') {
|
| 25 |
+
reports.detailed = this.generateDetailedReport(testResults);
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
if (outputFormat === 'all' || outputFormat === 'comparison') {
|
| 29 |
+
reports.comparison = this.generateComparisonReport(testResults);
|
| 30 |
+
}
|
| 31 |
+
|
| 32 |
+
if (outputFormat === 'all' || outputFormat === 'performance') {
|
| 33 |
+
reports.performance = this.generatePerformanceReport(testResults);
|
| 34 |
+
}
|
| 35 |
+
|
| 36 |
+
if (outputFormat === 'all' || outputFormat === 'markdown') {
|
| 37 |
+
reports.markdown = this.generateMarkdownReport(testResults);
|
| 38 |
+
}
|
| 39 |
+
|
| 40 |
+
return reports;
|
| 41 |
+
}
|
| 42 |
+
|
| 43 |
+
generateSummaryReport(testResults) {
|
| 44 |
+
const summary = {
|
| 45 |
+
testOverview: {
|
| 46 |
+
timestamp: testResults.timestamp,
|
| 47 |
+
totalModels: testResults.tests.length,
|
| 48 |
+
testDuration: this.calculateTotalTestDuration(testResults.tests),
|
| 49 |
+
successfulTests: testResults.tests.filter(t => !t.error).length
|
| 50 |
+
},
|
| 51 |
+
topPerformers: this.getTopPerformers(testResults.tests),
|
| 52 |
+
categoryAverages: this.calculateCategoryAverages(testResults.tests),
|
| 53 |
+
recommendations: this.generateRecommendations(testResults.tests)
|
| 54 |
+
};
|
| 55 |
+
|
| 56 |
+
return summary;
|
| 57 |
+
}
|
| 58 |
+
|
| 59 |
+
generateDetailedReport(testResults) {
|
| 60 |
+
const detailed = {
|
| 61 |
+
testMetadata: {
|
| 62 |
+
timestamp: testResults.timestamp,
|
| 63 |
+
totalModels: testResults.tests.length,
|
| 64 |
+
testFrameworkVersion: '1.0.0'
|
| 65 |
+
},
|
| 66 |
+
modelResults: testResults.tests.map(test => ({
|
| 67 |
+
modelInfo: {
|
| 68 |
+
id: test.modelId,
|
| 69 |
+
name: test.modelName,
|
| 70 |
+
provider: test.provider
|
| 71 |
+
},
|
| 72 |
+
overallPerformance: {
|
| 73 |
+
score: test.overallScore,
|
| 74 |
+
totalTime: test.totalTime,
|
| 75 |
+
rank: this.calculateRank(test, testResults.tests)
|
| 76 |
+
},
|
| 77 |
+
wordSelection: this.analyzeWordSelection(test.wordSelection),
|
| 78 |
+
contextualization: this.analyzeContextualization(test.contextualization),
|
| 79 |
+
chatHints: this.analyzeChatHints(test.chatHints),
|
| 80 |
+
errorAnalysis: this.analyzeErrors(test)
|
| 81 |
+
}))
|
| 82 |
+
};
|
| 83 |
+
|
| 84 |
+
return detailed;
|
| 85 |
+
}
|
| 86 |
+
|
| 87 |
+
generateComparisonReport(testResults) {
|
| 88 |
+
const validTests = testResults.tests.filter(t => !t.error);
|
| 89 |
+
|
| 90 |
+
const comparison = {
|
| 91 |
+
modelComparison: this.createModelComparisonMatrix(validTests),
|
| 92 |
+
providerAnalysis: this.analyzeByProvider(validTests),
|
| 93 |
+
performanceMetrics: {
|
| 94 |
+
wordSelection: this.compareWordSelectionMetrics(validTests),
|
| 95 |
+
contextualization: this.compareContextualizationMetrics(validTests),
|
| 96 |
+
chatHints: this.compareChatHintMetrics(validTests),
|
| 97 |
+
responseTime: this.compareResponseTimes(validTests)
|
| 98 |
+
},
|
| 99 |
+
recommendations: {
|
| 100 |
+
bestOverall: this.getBestOverallModel(validTests),
|
| 101 |
+
bestForWordSelection: this.getBestForTask(validTests, 'wordSelection'),
|
| 102 |
+
bestForContextualization: this.getBestForTask(validTests, 'contextualization'),
|
| 103 |
+
bestForChatHints: this.getBestForTask(validTests, 'chatHints'),
|
| 104 |
+
fastestResponse: this.getFastestModel(validTests),
|
| 105 |
+
mostReliable: this.getMostReliableModel(validTests)
|
| 106 |
+
}
|
| 107 |
+
};
|
| 108 |
+
|
| 109 |
+
return comparison;
|
| 110 |
+
}
|
| 111 |
+
|
| 112 |
+
generatePerformanceReport(testResults) {
|
| 113 |
+
const performance = {
|
| 114 |
+
responseTimeAnalysis: this.analyzeResponseTimes(testResults.tests),
|
| 115 |
+
successRateAnalysis: this.analyzeSuccessRates(testResults.tests),
|
| 116 |
+
qualityMetrics: this.analyzeQualityMetrics(testResults.tests),
|
| 117 |
+
scalabilityInsights: this.analyzeScalability(testResults.tests),
|
| 118 |
+
reliabilityMetrics: this.analyzeReliability(testResults.tests)
|
| 119 |
+
};
|
| 120 |
+
|
| 121 |
+
return performance;
|
| 122 |
+
}
|
| 123 |
+
|
| 124 |
+
generateMarkdownReport(testResults) {
|
| 125 |
+
const summary = this.generateSummaryReport(testResults);
|
| 126 |
+
const comparison = this.generateComparisonReport(testResults);
|
| 127 |
+
|
| 128 |
+
let markdown = `# Cloze Reader Model Testing Report\n\n`;
|
| 129 |
+
markdown += `**Generated:** ${new Date().toLocaleString()}\n`;
|
| 130 |
+
markdown += `**Test Timestamp:** ${testResults.timestamp}\n`;
|
| 131 |
+
markdown += `**Models Tested:** ${testResults.tests.length}\n\n`;
|
| 132 |
+
|
| 133 |
+
// Executive Summary
|
| 134 |
+
markdown += `## Executive Summary\n\n`;
|
| 135 |
+
markdown += `- **Successful Tests:** ${summary.testOverview.successfulTests}/${summary.testOverview.totalModels}\n`;
|
| 136 |
+
markdown += `- **Best Overall Model:** ${comparison.recommendations.bestOverall.name} (${comparison.recommendations.bestOverall.score.toFixed(1)}/100)\n`;
|
| 137 |
+
markdown += `- **Average Response Time:** ${this.formatTime(this.calculateAverageResponseTime(testResults.tests))}\n\n`;
|
| 138 |
+
|
| 139 |
+
// Top Performers
|
| 140 |
+
markdown += `## Top Performers\n\n`;
|
| 141 |
+
markdown += `| Rank | Model | Score | Provider |\n`;
|
| 142 |
+
markdown += `|------|-------|-------|----------|\n`;
|
| 143 |
+
summary.topPerformers.forEach((model, index) => {
|
| 144 |
+
markdown += `| ${index + 1} | ${model.name} | ${model.score.toFixed(1)} | ${model.provider} |\n`;
|
| 145 |
+
});
|
| 146 |
+
markdown += `\n`;
|
| 147 |
+
|
| 148 |
+
// Performance by Category
|
| 149 |
+
markdown += `## Performance by Category\n\n`;
|
| 150 |
+
markdown += `### Word Selection\n`;
|
| 151 |
+
markdown += `- **Best:** ${comparison.recommendations.bestForWordSelection.name} (${(comparison.recommendations.bestForWordSelection.successRate * 100).toFixed(1)}% success rate)\n`;
|
| 152 |
+
markdown += `- **Average Success Rate:** ${(summary.categoryAverages.wordSelection.successRate * 100).toFixed(1)}%\n`;
|
| 153 |
+
markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.wordSelection.averageTime)}\n\n`;
|
| 154 |
+
|
| 155 |
+
markdown += `### Contextualization\n`;
|
| 156 |
+
markdown += `- **Best:** ${comparison.recommendations.bestForContextualization.name} (${(comparison.recommendations.bestForContextualization.successRate * 100).toFixed(1)}% success rate)\n`;
|
| 157 |
+
markdown += `- **Average Success Rate:** ${(summary.categoryAverages.contextualization.successRate * 100).toFixed(1)}%\n`;
|
| 158 |
+
markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.contextualization.averageTime)}\n\n`;
|
| 159 |
+
|
| 160 |
+
markdown += `### Chat Hints\n`;
|
| 161 |
+
markdown += `- **Best:** ${comparison.recommendations.bestForChatHints.name} (${(comparison.recommendations.bestForChatHints.successRate * 100).toFixed(1)}% success rate)\n`;
|
| 162 |
+
markdown += `- **Average Success Rate:** ${(summary.categoryAverages.chatHints.successRate * 100).toFixed(1)}%\n`;
|
| 163 |
+
markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.chatHints.averageTime)}\n\n`;
|
| 164 |
+
|
| 165 |
+
// Add user rankings section if available
|
| 166 |
+
const hasUserRankings = testResults.tests.some(t => t.userRankings?.totalRankings > 0);
|
| 167 |
+
if (hasUserRankings) {
|
| 168 |
+
markdown += `## User Satisfaction Ratings\n\n`;
|
| 169 |
+
markdown += `| Model | Overall Satisfaction | Word Selection | Passage Quality | Hint Helpfulness | Overall Experience |\n`;
|
| 170 |
+
markdown += `|-------|---------------------|----------------|-----------------|------------------|--------------------|\n`;
|
| 171 |
+
|
| 172 |
+
testResults.tests.forEach(test => {
|
| 173 |
+
if (test.userRankings?.totalRankings > 0) {
|
| 174 |
+
const ur = test.userRankings;
|
| 175 |
+
const avg = ur.averageRatings || {};
|
| 176 |
+
markdown += `| ${test.modelName} | ${ur.overallUserSatisfaction.toFixed(1)}/5 | ${(avg.word_selection || 0).toFixed(1)} | ${(avg.passage_quality || 0).toFixed(1)} | ${(avg.hint_helpfulness || 0).toFixed(1)} | ${(avg.overall_experience || 0).toFixed(1)} |\n`;
|
| 177 |
+
}
|
| 178 |
+
});
|
| 179 |
+
markdown += `\n`;
|
| 180 |
+
|
| 181 |
+
// Add user comments if any
|
| 182 |
+
const allComments = testResults.tests
|
| 183 |
+
.filter(t => t.userRankings?.comments?.length > 0)
|
| 184 |
+
.flatMap(t => t.userRankings.comments.map(c => ({ ...c, model: t.modelName })));
|
| 185 |
+
|
| 186 |
+
if (allComments.length > 0) {
|
| 187 |
+
markdown += `### User Comments\n\n`;
|
| 188 |
+
allComments.forEach(comment => {
|
| 189 |
+
markdown += `- **${comment.model}** (Rating: ${(comment.averageRating ?? 0).toFixed(1)}): "${comment.comment}"\n`;
|
| 190 |
+
});
|
| 191 |
+
markdown += `\n`;
|
| 192 |
+
}
|
| 193 |
+
}
|
| 194 |
+
|
| 195 |
+
// Detailed Results
|
| 196 |
+
markdown += `## Detailed Results\n\n`;
|
| 197 |
+
testResults.tests.forEach(test => {
|
| 198 |
+
if (!test.error) {
|
| 199 |
+
markdown += `### ${test.modelName}\n`;
|
| 200 |
+
markdown += `- **Provider:** ${test.provider}\n`;
|
| 201 |
+
markdown += `- **Overall Score:** ${test.overallScore.toFixed(1)}/100\n`;
|
| 202 |
+
markdown += `- **Total Time:** ${this.formatTime(test.totalTime)}\n`;
|
| 203 |
+
markdown += `- **Word Selection:** ${(test.wordSelection?.successRate * 100 || 0).toFixed(1)}% success\n`;
|
| 204 |
+
markdown += `- **Contextualization:** ${(test.contextualization?.successRate * 100 || 0).toFixed(1)}% success\n`;
|
| 205 |
+
markdown += `- **Chat Hints:** ${(test.chatHints?.successRate * 100 || 0).toFixed(1)}% success\n\n`;
|
| 206 |
+
}
|
| 207 |
+
});
|
| 208 |
+
|
| 209 |
+
// Recommendations
|
| 210 |
+
markdown += `## Recommendations\n\n`;
|
| 211 |
+
summary.recommendations.forEach(rec => {
|
| 212 |
+
markdown += `- ${rec}\n`;
|
| 213 |
+
});
|
| 214 |
+
|
| 215 |
+
return markdown;
|
| 216 |
+
}
|
| 217 |
+
|
| 218 |
+
// Helper methods for analysis
|
| 219 |
+
calculateTotalTestDuration(tests) {
|
| 220 |
+
return tests.reduce((total, test) => total + (test.totalTime || 0), 0);
|
| 221 |
+
}
|
| 222 |
+
|
| 223 |
+
getTopPerformers(tests, limit = 5) {
|
| 224 |
+
return tests
|
| 225 |
+
.filter(t => !t.error && t.overallScore)
|
| 226 |
+
.sort((a, b) => b.overallScore - a.overallScore)
|
| 227 |
+
.slice(0, limit)
|
| 228 |
+
.map(test => ({
|
| 229 |
+
name: test.modelName,
|
| 230 |
+
score: test.overallScore,
|
| 231 |
+
provider: test.provider
|
| 232 |
+
}));
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
calculateCategoryAverages(tests) {
|
| 236 |
+
const validTests = tests.filter(t => !t.error);
|
| 237 |
+
|
| 238 |
+
return {
|
| 239 |
+
wordSelection: this.calculateCategoryAverage(validTests, 'wordSelection'),
|
| 240 |
+
contextualization: this.calculateCategoryAverage(validTests, 'contextualization'),
|
| 241 |
+
chatHints: this.calculateCategoryAverage(validTests, 'chatHints')
|
| 242 |
+
};
|
| 243 |
+
}
|
| 244 |
+
|
| 245 |
+
calculateCategoryAverage(tests, category) {
|
| 246 |
+
const validCategoryTests = tests.filter(t => t[category]);
|
| 247 |
+
|
| 248 |
+
if (validCategoryTests.length === 0) {
|
| 249 |
+
return { successRate: 0, averageTime: 0, qualityScore: 0 };
|
| 250 |
+
}
|
| 251 |
+
|
| 252 |
+
return {
|
| 253 |
+
successRate: validCategoryTests.reduce((sum, t) => sum + (t[category].successRate || 0), 0) / validCategoryTests.length,
|
| 254 |
+
averageTime: validCategoryTests.reduce((sum, t) => sum + (t[category].averageTime || 0), 0) / validCategoryTests.length,
|
| 255 |
+
qualityScore: validCategoryTests.reduce((sum, t) => sum + (t[category].qualityScore || t[category].relevanceScore || t[category].helpfulnessScore || 0), 0) / validCategoryTests.length
|
| 256 |
+
};
|
| 257 |
+
}
|
| 258 |
+
|
| 259 |
+
generateRecommendations(tests) {
|
| 260 |
+
const recommendations = [];
|
| 261 |
+
const validTests = tests.filter(t => !t.error);
|
| 262 |
+
|
| 263 |
+
if (validTests.length === 0) {
|
| 264 |
+
return ['No successful tests to generate recommendations.'];
|
| 265 |
+
}
|
| 266 |
+
|
| 267 |
+
const bestOverall = validTests.reduce((best, test) =>
|
| 268 |
+
test.overallScore > best.overallScore ? test : best
|
| 269 |
+
);
|
| 270 |
+
|
| 271 |
+
recommendations.push(`For overall best performance, use ${bestOverall.modelName} (${bestOverall.provider})`);
|
| 272 |
+
|
| 273 |
+
// Provider-specific recommendations
|
| 274 |
+
const providerPerformance = this.analyzeByProvider(validTests);
|
| 275 |
+
const bestProvider = Object.keys(providerPerformance)
|
| 276 |
+
.reduce((best, provider) =>
|
| 277 |
+
providerPerformance[provider].averageScore > providerPerformance[best]?.averageScore ? provider : best
|
| 278 |
+
);
|
| 279 |
+
|
| 280 |
+
recommendations.push(`${bestProvider} models show the best average performance`);
|
| 281 |
+
|
| 282 |
+
// Speed vs quality trade-offs
|
| 283 |
+
const fastestGoodModel = validTests
|
| 284 |
+
.filter(t => t.overallScore > 70)
|
| 285 |
+
.sort((a, b) => a.totalTime - b.totalTime)[0];
|
| 286 |
+
|
| 287 |
+
if (fastestGoodModel) {
|
| 288 |
+
recommendations.push(`For fastest good performance, consider ${fastestGoodModel.modelName}`);
|
| 289 |
+
}
|
| 290 |
+
|
| 291 |
+
return recommendations;
|
| 292 |
+
}
|
| 293 |
+
|
| 294 |
+
analyzeByProvider(tests) {
|
| 295 |
+
const providerGroups = {};
|
| 296 |
+
|
| 297 |
+
tests.forEach(test => {
|
| 298 |
+
if (!providerGroups[test.provider]) {
|
| 299 |
+
providerGroups[test.provider] = [];
|
| 300 |
+
}
|
| 301 |
+
providerGroups[test.provider].push(test);
|
| 302 |
+
});
|
| 303 |
+
|
| 304 |
+
const analysis = {};
|
| 305 |
+
Object.keys(providerGroups).forEach(provider => {
|
| 306 |
+
const providerTests = providerGroups[provider];
|
| 307 |
+
analysis[provider] = {
|
| 308 |
+
count: providerTests.length,
|
| 309 |
+
averageScore: providerTests.reduce((sum, t) => sum + t.overallScore, 0) / providerTests.length,
|
| 310 |
+
averageTime: providerTests.reduce((sum, t) => sum + t.totalTime, 0) / providerTests.length,
|
| 311 |
+
successRate: providerTests.filter(t => !t.error).length / providerTests.length
|
| 312 |
+
};
|
| 313 |
+
});
|
| 314 |
+
|
| 315 |
+
return analysis;
|
| 316 |
+
}
|
| 317 |
+
|
| 318 |
+
getBestOverallModel(tests) {
|
| 319 |
+
return tests.reduce((best, test) =>
|
| 320 |
+
test.overallScore > best.overallScore ? {
|
| 321 |
+
name: test.modelName,
|
| 322 |
+
score: test.overallScore,
|
| 323 |
+
provider: test.provider
|
| 324 |
+
} : best
|
| 325 |
+
, { name: '', score: 0, provider: '' });
|
| 326 |
+
}
|
| 327 |
+
|
| 328 |
+
getBestForTask(tests, taskName) {
|
| 329 |
+
const validTests = tests.filter(t => t[taskName] && t[taskName].successRate !== undefined);
|
| 330 |
+
|
| 331 |
+
if (validTests.length === 0) {
|
| 332 |
+
return { name: 'N/A', successRate: 0, provider: '' };
|
| 333 |
+
}
|
| 334 |
+
|
| 335 |
+
return validTests.reduce((best, test) =>
|
| 336 |
+
test[taskName].successRate > best.successRate ? {
|
| 337 |
+
name: test.modelName,
|
| 338 |
+
successRate: test[taskName].successRate,
|
| 339 |
+
provider: test.provider
|
| 340 |
+
} : best
|
| 341 |
+
, { name: 'N/A', successRate: 0, provider: '' });
|
| 342 |
+
}
|
| 343 |
+
|
| 344 |
+
getFastestModel(tests) {
|
| 345 |
+
return tests.reduce((fastest, test) =>
|
| 346 |
+
test.totalTime < fastest.time ? {
|
| 347 |
+
name: test.modelName,
|
| 348 |
+
time: test.totalTime,
|
| 349 |
+
provider: test.provider
|
| 350 |
+
} : fastest
|
| 351 |
+
, { name: '', time: Infinity, provider: '' });
|
| 352 |
+
}
|
| 353 |
+
|
| 354 |
+
getMostReliableModel(tests) {
|
| 355 |
+
// Model with fewest errors and highest success rates across all tasks
|
| 356 |
+
const reliability = tests.map(test => {
|
| 357 |
+
const wordSelectionReliability = test.wordSelection?.successRate || 0;
|
| 358 |
+
const contextualizationReliability = test.contextualization?.successRate || 0;
|
| 359 |
+
const chatHintReliability = test.chatHints?.successRate || 0;
|
| 360 |
+
|
| 361 |
+
const overallReliability = (wordSelectionReliability + contextualizationReliability + chatHintReliability) / 3;
|
| 362 |
+
|
| 363 |
+
return {
|
| 364 |
+
name: test.modelName,
|
| 365 |
+
reliability: overallReliability,
|
| 366 |
+
provider: test.provider
|
| 367 |
+
};
|
| 368 |
+
});
|
| 369 |
+
|
| 370 |
+
return reliability.reduce((most, test) =>
|
| 371 |
+
test.reliability > most.reliability ? test : most
|
| 372 |
+
, { name: '', reliability: 0, provider: '' });
|
| 373 |
+
}
|
| 374 |
+
|
| 375 |
+
calculateAverageResponseTime(tests) {
|
| 376 |
+
const validTests = tests.filter(t => t.totalTime);
|
| 377 |
+
return validTests.length > 0 ? validTests.reduce((sum, t) => sum + t.totalTime, 0) / validTests.length : 0;
|
| 378 |
+
}
|
| 379 |
+
|
| 380 |
+
formatTime(milliseconds) {
|
| 381 |
+
if (milliseconds < 1000) {
|
| 382 |
+
return `${milliseconds.toFixed(0)}ms`;
|
| 383 |
+
} else if (milliseconds < 60000) {
|
| 384 |
+
return `${(milliseconds / 1000).toFixed(1)}s`;
|
| 385 |
+
} else {
|
| 386 |
+
return `${(milliseconds / 60000).toFixed(1)}m`;
|
| 387 |
+
}
|
| 388 |
+
}
|
| 389 |
+
|
| 390 |
+
async saveReports(reports, baseFilename) {
|
| 391 |
+
const savedFiles = [];
|
| 392 |
+
|
| 393 |
+
for (const [type, content] of Object.entries(reports)) {
|
| 394 |
+
const filename = `${baseFilename}_${type}`;
|
| 395 |
+
let fileContent, extension;
|
| 396 |
+
|
| 397 |
+
if (type === 'markdown') {
|
| 398 |
+
fileContent = content;
|
| 399 |
+
extension = '.md';
|
| 400 |
+
} else {
|
| 401 |
+
fileContent = JSON.stringify(content, null, 2);
|
| 402 |
+
extension = '.json';
|
| 403 |
+
}
|
| 404 |
+
|
| 405 |
+
try {
|
| 406 |
+
await this.saveFile(`${filename}${extension}`, fileContent);
|
| 407 |
+
savedFiles.push(`${filename}${extension}`);
|
| 408 |
+
} catch (error) {
|
| 409 |
+
console.error(`Error saving ${filename}:`, error);
|
| 410 |
+
}
|
| 411 |
+
}
|
| 412 |
+
|
| 413 |
+
return savedFiles;
|
| 414 |
+
}
|
| 415 |
+
|
| 416 |
+
async saveFile(filename, content) {
|
| 417 |
+
// Try to save via browser download
|
| 418 |
+
const blob = new Blob([content], {
|
| 419 |
+
type: filename.endsWith('.md') ? 'text/markdown' : 'application/json'
|
| 420 |
+
});
|
| 421 |
+
const url = URL.createObjectURL(blob);
|
| 422 |
+
|
| 423 |
+
const a = document.createElement('a');
|
| 424 |
+
a.href = url;
|
| 425 |
+
a.download = filename;
|
| 426 |
+
document.body.appendChild(a);
|
| 427 |
+
a.click();
|
| 428 |
+
document.body.removeChild(a);
|
| 429 |
+
URL.revokeObjectURL(url);
|
| 430 |
+
}
|
| 431 |
+
|
| 432 |
+
// Stub methods for detailed analysis (implement as needed)
|
| 433 |
+
analyzeWordSelection(data) { return data; }
|
| 434 |
+
analyzeContextualization(data) { return data; }
|
| 435 |
+
analyzeChatHints(data) { return data; }
|
| 436 |
+
analyzeErrors(test) { return test.error ? [test.error] : []; }
|
| 437 |
+
calculateRank(test, allTests) {
|
| 438 |
+
const sorted = allTests.filter(t => !t.error).sort((a, b) => b.overallScore - a.overallScore);
|
| 439 |
+
return sorted.findIndex(t => t.modelId === test.modelId) + 1;
|
| 440 |
+
}
|
| 441 |
+
createModelComparisonMatrix(tests) { return {}; }
|
| 442 |
+
compareWordSelectionMetrics(tests) { return {}; }
|
| 443 |
+
compareContextualizationMetrics(tests) { return {}; }
|
| 444 |
+
compareChatHintMetrics(tests) { return {}; }
|
| 445 |
+
compareResponseTimes(tests) { return {}; }
|
| 446 |
+
analyzeResponseTimes(tests) { return {}; }
|
| 447 |
+
analyzeSuccessRates(tests) { return {}; }
|
| 448 |
+
analyzeQualityMetrics(tests) { return {}; }
|
| 449 |
+
analyzeScalability(tests) { return {}; }
|
| 450 |
+
analyzeReliability(tests) { return {}; }
|
| 451 |
+
}
|
| 452 |
+
|
| 453 |
+
export { TestReportGenerator };
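Several comparison methods in `TestReportGenerator` are stubs that return `{}`. As a rough sketch only, one possible body for `compareResponseTimes` is shown below as a standalone function. The return shape (`fastest`, `slowest`, `averageTime`, `ranking`) is an assumption and is not defined anywhere in this commit. It relies only on the `modelName` and `totalTime` fields that the rest of the class already reads.

```javascript
// Illustrative only: a possible implementation for the compareResponseTimes()
// stub. Assumes each test object carries `modelName` and a numeric
// `totalTime` in milliseconds; tests without a numeric time are skipped.
function compareResponseTimes(tests) {
  const timed = tests.filter(t => typeof t.totalTime === 'number');

  if (timed.length === 0) {
    return { fastest: null, slowest: null, averageTime: 0, ranking: [] };
  }

  // Copy before sorting so the caller's array is left untouched.
  const ranking = [...timed]
    .sort((a, b) => a.totalTime - b.totalTime)
    .map(t => ({ name: t.modelName, totalTime: t.totalTime }));

  const averageTime =
    ranking.reduce((sum, r) => sum + r.totalTime, 0) / ranking.length;

  return {
    fastest: ranking[0],
    slowest: ranking[ranking.length - 1],
    averageTime,
    ranking,
  };
}

// Example with made-up data; the third entry has no timing and is ignored.
const sample = compareResponseTimes([
  { modelName: 'model-a', totalTime: 3000 },
  { modelName: 'model-b', totalTime: 1000 },
  { modelName: 'model-c' },
]);
console.log(sample.fastest.name, sample.averageTime);
```

Returning an explicit empty result for the no-data case keeps callers from dividing by zero, the same concern that applies to `calculateAverageResponseTime`.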
|
src/userRankingInterface.js
ADDED
|
@@ -0,0 +1,650 @@
|
| 1 |
+
/**
|
| 2 |
+
* User Ranking Interface for Model Testing
|
| 3 |
+
* Allows users to rate model performance on each task during gameplay
|
| 4 |
+
*/
|
| 5 |
+
|
| 6 |
+
class UserRankingInterface {
|
| 7 |
+
constructor() {
|
| 8 |
+
this.rankings = {
|
| 9 |
+
rounds: [],
|
| 10 |
+
currentRound: null
|
| 11 |
+
};
|
| 12 |
+
|
| 13 |
+
this.rankingCategories = [
|
| 14 |
+
{
|
| 15 |
+
id: 'word_selection',
|
| 16 |
+
name: 'Word Selection Quality',
|
| 17 |
+
description: 'How appropriate were the selected words for this difficulty level?',
|
| 18 |
+
criteria: [
|
| 19 |
+
'Words match the difficulty level',
|
| 20 |
+
'Vocabulary is challenging but fair',
|
| 21 |
+
'Selected words are meaningful in context'
|
| 22 |
+
]
|
| 23 |
+
},
|
| 24 |
+
{
|
| 25 |
+
id: 'passage_quality',
|
| 26 |
+
name: 'Passage Selection',
|
| 27 |
+
description: 'How suitable was this passage for language learning?',
|
| 28 |
+
criteria: [
|
| 29 |
+
'Text is engaging and appropriate',
|
| 30 |
+
'Content is educational',
|
| 31 |
+
'Difficulty matches the level'
|
| 32 |
+
]
|
| 33 |
+
},
|
| 34 |
+
{
|
| 35 |
+
id: 'hint_helpfulness',
|
| 36 |
+
name: 'Hint Quality',
|
| 37 |
+
description: 'How helpful were the AI-generated hints?',
|
| 38 |
+
criteria: [
|
| 39 |
+
'Hints guide without revealing answers',
|
| 40 |
+
'Explanations are clear and educational',
|
| 41 |
+
'Responses are contextually appropriate'
|
| 42 |
+
]
|
| 43 |
+
},
|
| 44 |
+
{
|
| 45 |
+
id: 'overall_experience',
|
| 46 |
+
name: 'Overall Round Experience',
|
| 47 |
+
description: 'How was the overall quality of this round?',
|
| 48 |
+
criteria: [
|
| 49 |
+
'Smooth gameplay experience',
|
| 50 |
+
'AI responses were timely',
|
| 51 |
+
'Educational value was high'
|
| 52 |
+
]
|
| 53 |
+
}
|
| 54 |
+
];
|
| 55 |
+
|
| 56 |
+
this.createRankingUI();
|
| 57 |
+
this.setupEventListeners();
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
createRankingUI() {
|
| 61 |
+
// Create ranking modal
|
| 62 |
+
const modal = document.createElement('div');
|
| 63 |
+
modal.id = 'ranking-modal';
|
| 64 |
+
modal.className = 'ranking-modal';
|
| 65 |
+
modal.innerHTML = `
|
| 66 |
+
<div class="ranking-modal-content">
|
| 67 |
+
<h2>Rate This Round</h2>
|
| 68 |
+
<p class="ranking-subtitle">Help us improve by rating the AI's performance</p>
|
| 69 |
+
|
| 70 |
+
<div id="ranking-categories" class="ranking-categories">
|
| 71 |
+
<!-- Categories will be populated dynamically -->
|
| 72 |
+
</div>
|
| 73 |
+
|
| 74 |
+
<div class="ranking-comments">
|
| 75 |
+
<label for="ranking-comments-input">Additional Comments (Optional):</label>
|
| 76 |
+
<textarea id="ranking-comments-input" rows="3" placeholder="Any specific feedback about this round..."></textarea>
|
| 77 |
+
</div>
|
| 78 |
+
|
| 79 |
+
<div class="ranking-actions">
|
| 80 |
+
<button id="skip-ranking-btn" class="btn-secondary">Skip</button>
|
| 81 |
+
<button id="submit-ranking-btn" class="btn-primary" disabled>Submit Rating</button>
|
| 82 |
+
</div>
|
| 83 |
+
</div>
|
| 84 |
+
`;
|
| 85 |
+
|
| 86 |
+
// Create ranking trigger button
|
| 87 |
+
const triggerButton = document.createElement('button');
|
| 88 |
+
triggerButton.id = 'ranking-trigger-btn';
|
| 89 |
+
triggerButton.className = 'ranking-trigger-btn';
|
| 90 |
+
triggerButton.innerHTML = '⭐ Rate Round';
|
| 91 |
+
triggerButton.style.cssText = `
|
| 92 |
+
position: fixed;
|
| 93 |
+
bottom: 20px;
|
| 94 |
+
left: 20px;
|
| 95 |
+
z-index: 999;
|
| 96 |
+
padding: 10px 20px;
|
| 97 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 98 |
+
color: white;
|
| 99 |
+
border: none;
|
| 100 |
+
border-radius: 25px;
|
| 101 |
+
cursor: pointer;
|
| 102 |
+
font-size: 14px;
|
| 103 |
+
font-weight: bold;
|
| 104 |
+
box-shadow: 0 4px 15px rgba(102, 126, 234, 0.4);
|
| 105 |
+
transition: all 0.3s ease;
|
| 106 |
+
display: none;
|
| 107 |
+
`;
|
| 108 |
+
|
| 109 |
+
// Add styles
|
| 110 |
+
const styles = document.createElement('style');
|
| 111 |
+
styles.textContent = `
|
| 112 |
+
.ranking-modal {
|
| 113 |
+
display: none;
|
| 114 |
+
position: fixed;
|
| 115 |
+
top: 0;
|
| 116 |
+
left: 0;
|
| 117 |
+
width: 100%;
|
| 118 |
+
height: 100%;
|
| 119 |
+
background: rgba(0, 0, 0, 0.5);
|
| 120 |
+
z-index: 1000;
|
| 121 |
+
backdrop-filter: blur(5px);
|
| 122 |
+
}
|
| 123 |
+
|
| 124 |
+
.ranking-modal.active {
|
| 125 |
+
display: flex;
|
| 126 |
+
align-items: center;
|
| 127 |
+
justify-content: center;
|
| 128 |
+
}
|
| 129 |
+
|
| 130 |
+
.ranking-modal-content {
|
| 131 |
+
background: white;
|
| 132 |
+
border-radius: 15px;
|
| 133 |
+
padding: 30px;
|
| 134 |
+
max-width: 600px;
|
| 135 |
+
width: 90%;
|
| 136 |
+
max-height: 80vh;
|
| 137 |
+
overflow-y: auto;
|
| 138 |
+
box-shadow: 0 10px 40px rgba(0, 0, 0, 0.3);
|
| 139 |
+
}
|
| 140 |
+
|
| 141 |
+
.ranking-modal-content h2 {
|
| 142 |
+
color: #2c3e50;
|
| 143 |
+
margin-bottom: 10px;
|
| 144 |
+
text-align: center;
|
| 145 |
+
}
|
| 146 |
+
|
| 147 |
+
.ranking-subtitle {
|
| 148 |
+
color: #7f8c8d;
|
| 149 |
+
text-align: center;
|
| 150 |
+
margin-bottom: 30px;
|
| 151 |
+
}
|
| 152 |
+
|
| 153 |
+
.ranking-category {
|
| 154 |
+
margin-bottom: 25px;
|
| 155 |
+
padding: 20px;
|
| 156 |
+
background: #f8f9fa;
|
| 157 |
+
border-radius: 10px;
|
| 158 |
+
border: 2px solid #e9ecef;
|
| 159 |
+
}
|
| 160 |
+
|
| 161 |
+
.ranking-category h3 {
|
| 162 |
+
color: #2c3e50;
|
| 163 |
+
margin-bottom: 8px;
|
| 164 |
+
font-size: 1.1rem;
|
| 165 |
+
}
|
| 166 |
+
|
| 167 |
+
.ranking-category-description {
|
| 168 |
+
color: #6c757d;
|
| 169 |
+
font-size: 0.9rem;
|
| 170 |
+
margin-bottom: 15px;
|
| 171 |
+
}
|
| 172 |
+
|
| 173 |
+
.ranking-criteria {
|
| 174 |
+
font-size: 0.85rem;
|
| 175 |
+
color: #6c757d;
|
| 176 |
+
margin-bottom: 15px;
|
| 177 |
+
padding-left: 20px;
|
| 178 |
+
}
|
| 179 |
+
|
| 180 |
+
.ranking-criteria li {
|
| 181 |
+
margin-bottom: 5px;
|
| 182 |
+
}
|
| 183 |
+
|
| 184 |
+
.ranking-stars {
|
| 185 |
+
display: flex;
|
| 186 |
+
gap: 10px;
|
| 187 |
+
justify-content: center;
|
| 188 |
+
margin-top: 10px;
|
| 189 |
+
}
|
| 190 |
+
|
| 191 |
+
.ranking-star {
|
| 192 |
+
font-size: 30px;
|
| 193 |
+
color: #ddd;
|
| 194 |
+
cursor: pointer;
|
| 195 |
+
transition: all 0.2s ease;
|
| 196 |
+
}
|
| 197 |
+
|
| 198 |
+
.ranking-star:hover,
|
| 199 |
+
.ranking-star.hover {
|
| 200 |
+
color: #ffd700;
|
| 201 |
+
transform: scale(1.1);
|
| 202 |
+
}
|
| 203 |
+
|
| 204 |
+
.ranking-star.selected {
|
| 205 |
+
color: #ffd700;
|
| 206 |
+
}
|
| 207 |
+
|
| 208 |
+
.ranking-comments {
|
| 209 |
+
margin: 20px 0;
|
| 210 |
+
}
|
| 211 |
+
|
| 212 |
+
.ranking-comments label {
|
| 213 |
+
display: block;
|
| 214 |
+
color: #2c3e50;
|
| 215 |
+
margin-bottom: 8px;
|
| 216 |
+
font-weight: 500;
|
| 217 |
+
}
|
| 218 |
+
|
| 219 |
+
.ranking-comments textarea {
|
| 220 |
+
width: 100%;
|
| 221 |
+
padding: 10px;
|
| 222 |
+
border: 2px solid #e9ecef;
|
| 223 |
+
border-radius: 8px;
|
| 224 |
+
font-family: inherit;
|
| 225 |
+
resize: vertical;
|
| 226 |
+
}
|
| 227 |
+
|
| 228 |
+
.ranking-actions {
|
| 229 |
+
display: flex;
|
| 230 |
+
gap: 15px;
|
| 231 |
+
justify-content: flex-end;
|
| 232 |
+
margin-top: 20px;
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
.btn-primary, .btn-secondary {
|
| 236 |
+
padding: 10px 24px;
|
| 237 |
+
border: none;
|
| 238 |
+
border-radius: 8px;
|
| 239 |
+
font-size: 1rem;
|
| 240 |
+
cursor: pointer;
|
| 241 |
+
transition: all 0.3s ease;
|
| 242 |
+
font-weight: 500;
|
| 243 |
+
}
|
| 244 |
+
|
| 245 |
+
.btn-primary {
|
| 246 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 247 |
+
color: white;
|
| 248 |
+
}
|
| 249 |
+
|
| 250 |
+
.btn-primary:hover:not(:disabled) {
|
| 251 |
+
transform: translateY(-2px);
|
| 252 |
+
box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
|
| 253 |
+
}
|
| 254 |
+
|
| 255 |
+
.btn-primary:disabled {
|
| 256 |
+
background: #6c757d;
|
| 257 |
+
cursor: not-allowed;
|
| 258 |
+
}
|
| 259 |
+
|
| 260 |
+
.btn-secondary {
|
| 261 |
+
background: #e9ecef;
|
| 262 |
+
color: #495057;
|
| 263 |
+
}
|
| 264 |
+
|
| 265 |
+
.btn-secondary:hover {
|
| 266 |
+
background: #dee2e6;
|
| 267 |
+
}
|
| 268 |
+
|
| 269 |
+
.ranking-trigger-btn:hover {
|
| 270 |
+
transform: translateY(-2px) scale(1.05);
|
| 271 |
+
box-shadow: 0 6px 20px rgba(102, 126, 234, 0.6);
|
| 272 |
+
}
|
| 273 |
+
|
| 274 |
+
@media (max-width: 600px) {
|
| 275 |
+
.ranking-modal-content {
|
| 276 |
+
padding: 20px;
|
| 277 |
+
}
|
| 278 |
+
|
| 279 |
+
.ranking-star {
|
| 280 |
+
font-size: 24px;
|
| 281 |
+
}
|
| 282 |
+
|
| 283 |
+
.ranking-trigger-btn {
|
| 284 |
+
bottom: 70px;
|
| 285 |
+
padding: 8px 16px;
|
| 286 |
+
font-size: 12px;
|
| 287 |
+
}
|
| 288 |
+
}
|
| 289 |
+
`;
|
| 290 |
+
|
| 291 |
+
document.head.appendChild(styles);
|
| 292 |
+
document.body.appendChild(modal);
|
| 293 |
+
document.body.appendChild(triggerButton);
|
| 294 |
+
|
| 295 |
+
this.populateCategories();
|
| 296 |
+
}
|
| 297 |
+
|
| 298 |
+
populateCategories() {
|
| 299 |
+
const container = document.getElementById('ranking-categories');
|
| 300 |
+
container.innerHTML = '';
|
| 301 |
+
|
| 302 |
+
this.rankingCategories.forEach(category => {
|
| 303 |
+
const categoryDiv = document.createElement('div');
|
| 304 |
+
categoryDiv.className = 'ranking-category';
|
| 305 |
+
categoryDiv.dataset.categoryId = category.id;
|
| 306 |
+
|
| 307 |
+
const criteriaHtml = category.criteria.map(c => `<li>${c}</li>`).join('');
|
| 308 |
+
|
| 309 |
+
categoryDiv.innerHTML = `
|
| 310 |
+
<h3>${category.name}</h3>
|
| 311 |
+
<p class="ranking-category-description">${category.description}</p>
|
| 312 |
+
<ul class="ranking-criteria">${criteriaHtml}</ul>
|
| 313 |
+
<div class="ranking-stars" data-category="${category.id}">
|
| 314 |
+
${[1, 2, 3, 4, 5].map(i =>
|
| 315 |
+
`<span class="ranking-star" data-rating="${i}">★</span>`
|
| 316 |
+
).join('')}
|
| 317 |
+
</div>
|
| 318 |
+
`;
|
| 319 |
+
|
| 320 |
+
container.appendChild(categoryDiv);
|
| 321 |
+
});
|
| 322 |
+
|
| 323 |
+
// Setup star interactions
|
| 324 |
+
this.setupStarInteractions();
|
| 325 |
+
}
|
| 326 |
+
|
| 327 |
+
setupStarInteractions() {
|
| 328 |
+
const starContainers = document.querySelectorAll('.ranking-stars');
|
| 329 |
+
|
| 330 |
+
starContainers.forEach(container => {
|
| 331 |
+
const stars = container.querySelectorAll('.ranking-star');
|
| 332 |
+
const categoryId = container.dataset.category;
|
| 333 |
+
|
| 334 |
+
stars.forEach((star, index) => {
|
| 335 |
+
star.addEventListener('mouseenter', () => {
|
| 336 |
+
this.highlightStars(stars, index + 1);
|
| 337 |
+
});
|
| 338 |
+
|
| 339 |
+
star.addEventListener('click', () => {
|
| 340 |
+
this.selectRating(categoryId, index + 1);
|
| 341 |
+
this.markStarsAsSelected(stars, index + 1);
|
| 342 |
+
this.updateSubmitButton();
|
| 343 |
+
});
|
| 344 |
+
});
|
| 345 |
+
|
| 346 |
+
container.addEventListener('mouseleave', () => {
|
| 347 |
+
const currentRating = this.getCurrentRating(categoryId);
|
| 348 |
+
if (currentRating > 0) {
|
| 349 |
+
this.markStarsAsSelected(stars, currentRating);
|
| 350 |
+
} else {
|
| 351 |
+
this.highlightStars(stars, 0);
|
| 352 |
+
}
|
| 353 |
+
});
|
| 354 |
+
});
|
| 355 |
+
}
|
| 356 |
+
|
| 357 |
+
highlightStars(stars, count) {
|
| 358 |
+
stars.forEach((star, index) => {
|
| 359 |
+
if (index < count) {
|
| 360 |
+
star.classList.add('hover');
|
| 361 |
+
} else {
|
| 362 |
+
star.classList.remove('hover');
|
| 363 |
+
}
|
| 364 |
+
});
|
| 365 |
+
}
|
| 366 |
+
|
| 367 |
+
markStarsAsSelected(stars, count) {
|
| 368 |
+
stars.forEach((star, index) => {
|
| 369 |
+
if (index < count) {
|
| 370 |
+
star.classList.add('selected');
|
| 371 |
+
star.classList.remove('hover');
|
| 372 |
+
} else {
|
| 373 |
+
star.classList.remove('selected');
|
| 374 |
+
star.classList.remove('hover');
|
| 375 |
+
}
|
| 376 |
+
});
|
| 377 |
+
}
|
| 378 |
+
|
| 379 |
+
selectRating(categoryId, rating) {
|
| 380 |
+
if (!this.currentRound) {
|
| 381 |
+
this.currentRound = {
|
| 382 |
+
timestamp: Date.now(),
|
| 383 |
+
ratings: {},
|
| 384 |
+
comments: ''
|
| 385 |
+
};
|
| 386 |
+
}
|
| 387 |
+
|
| 388 |
+
this.currentRound.ratings[categoryId] = rating;
|
| 389 |
+
}
|
| 390 |
+
|
| 391 |
+
getCurrentRating(categoryId) {
|
| 392 |
+
return this.currentRound?.ratings[categoryId] || 0;
|
| 393 |
+
}
|
| 394 |
+
|
| 395 |
+
setupEventListeners() {
|
| 396 |
+
const modal = document.getElementById('ranking-modal');
|
| 397 |
+
const triggerBtn = document.getElementById('ranking-trigger-btn');
|
| 398 |
+
const skipBtn = document.getElementById('skip-ranking-btn');
|
| 399 |
+
const submitBtn = document.getElementById('submit-ranking-btn');
|
| 400 |
+
const commentsInput = document.getElementById('ranking-comments-input');
|
| 401 |
+
|
| 402 |
+
// Show modal
|
| 403 |
+
triggerBtn.addEventListener('click', () => {
|
| 404 |
+
this.showRankingModal();
|
| 405 |
+
});
|
| 406 |
+
|
| 407 |
+
// Skip ranking
|
| 408 |
+
skipBtn.addEventListener('click', () => {
|
| 409 |
+
this.hideRankingModal();
|
| 410 |
+
this.currentRound = null;
|
| 411 |
+
});
|
| 412 |
+
|
| 413 |
+
// Submit ranking
|
| 414 |
+
submitBtn.addEventListener('click', () => {
|
| 415 |
+
this.submitRanking();
|
| 416 |
+
});
|
| 417 |
+
|
| 418 |
+
// Update comments
|
| 419 |
+
commentsInput.addEventListener('input', (e) => {
|
| 420 |
+
if (this.currentRound) {
|
| 421 |
+
this.currentRound.comments = e.target.value;
|
| 422 |
+
}
|
| 423 |
+
});
|
| 424 |
+
|
| 425 |
+
// Close modal on background click
|
| 426 |
+
modal.addEventListener('click', (e) => {
|
| 427 |
+
if (e.target === modal) {
|
| 428 |
+
this.hideRankingModal();
|
| 429 |
+
}
|
| 430 |
+
});
|
| 431 |
+
|
| 432 |
+
// Listen for round completion events
|
| 433 |
+
document.addEventListener('gameRoundComplete', (event) => {
|
| 434 |
+
this.onRoundComplete(event.detail);
|
| 435 |
+
});
|
| 436 |
+
}
|
| 437 |
+
|
| 438 |
+
updateSubmitButton() {
|
| 439 |
+
const submitBtn = document.getElementById('submit-ranking-btn');
|
| 440 |
+
const allRated = this.rankingCategories.every(category =>
|
| 441 |
+
this.getCurrentRating(category.id) > 0
|
| 442 |
+
);
|
| 443 |
+
|
| 444 |
+
submitBtn.disabled = !allRated;
|
| 445 |
+
}
|
| 446 |
+
|
| 447 |
+
showRankingModal() {
|
| 448 |
+
const modal = document.getElementById('ranking-modal');
|
| 449 |
+
modal.classList.add('active');
|
| 450 |
+
|
| 451 |
+
// Reset current round if needed
|
| 452 |
+
if (!this.currentRound) {
|
| 453 |
+
this.currentRound = {
|
| 454 |
+
timestamp: Date.now(),
|
| 455 |
+
ratings: {},
|
| 456 |
+
comments: ''
|
| 457 |
+
};
|
| 458 |
+
}
|
| 459 |
+
|
| 460 |
+
// Clear previous selections
|
| 461 |
+
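this.resetUI(); this.currentRound.ratings = {}; this.currentRound.comments = ''; // also drop stale ratings so stored state matches the cleared stars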
|
| 462 |
+
}
|
| 463 |
+
|
| 464 |
+
hideRankingModal() {
|
| 465 |
+
const modal = document.getElementById('ranking-modal');
|
| 466 |
+
modal.classList.remove('active');
|
| 467 |
+
}
|
| 468 |
+
|
| 469 |
+
resetUI() {
|
| 470 |
+
// Clear all star selections
|
| 471 |
+
document.querySelectorAll('.ranking-star').forEach(star => {
|
| 472 |
+
star.classList.remove('selected', 'hover');
|
| 473 |
+
});
|
| 474 |
+
|
| 475 |
+
// Clear comments
|
| 476 |
+
document.getElementById('ranking-comments-input').value = '';
|
| 477 |
+
|
| 478 |
+
// Disable submit button
|
| 479 |
+
document.getElementById('submit-ranking-btn').disabled = true;
|
| 480 |
+
}
|
| 481 |
+
|
| 482 |
+
submitRanking() {
|
| 483 |
+
if (!this.currentRound) return;
|
| 484 |
+
|
| 485 |
+
// Add metadata
|
| 486 |
+
this.currentRound.submittedAt = Date.now();
|
| 487 |
+
this.currentRound.modelId = window.testGameRunner?.modelConfig?.modelId || 'unknown';
|
| 488 |
+
|
| 489 |
+
// Calculate average rating
|
| 490 |
+
const ratings = Object.values(this.currentRound.ratings);
|
| 491 |
+
this.currentRound.averageRating = ratings.reduce((a, b) => a + b, 0) / ratings.length;
|
| 492 |
+
|
| 493 |
+
// Save ranking
|
| 494 |
+
this.rankings.rounds.push(this.currentRound);
|
| 495 |
+
|
| 496 |
+
// Dispatch event for test runner
|
| 497 |
+
document.dispatchEvent(new CustomEvent('userRanking', {
|
| 498 |
+
detail: this.currentRound
|
| 499 |
+
}));
|
| 500 |
+
|
| 501 |
+
// Show confirmation
|
| 502 |
+
this.showConfirmation();
|
| 503 |
+
|
| 504 |
+
// Reset
|
| 505 |
+
this.hideRankingModal();
|
| 506 |
+
this.currentRound = null;
|
| 507 |
+
|
| 508 |
+
console.log('Ranking submitted:', this.rankings);
|
| 509 |
+
}
|
| 510 |
+
|
| 511 |
+
showConfirmation() {
|
| 512 |
+
const confirmation = document.createElement('div');
|
| 513 |
+
confirmation.style.cssText = `
|
| 514 |
+
position: fixed;
|
| 515 |
+
bottom: 100px;
|
| 516 |
+
left: 50%;
|
| 517 |
+
transform: translateX(-50%);
|
| 518 |
+
background: #28a745;
|
| 519 |
+
color: white;
|
| 520 |
+
padding: 15px 30px;
|
| 521 |
+
border-radius: 8px;
|
| 522 |
+
box-shadow: 0 4px 15px rgba(40, 167, 69, 0.4);
|
| 523 |
+
z-index: 1001;
|
| 524 |
+
animation: slideInUp 0.3s ease;
|
| 525 |
+
`;
|
| 526 |
+
confirmation.textContent = '✓ Thank you for your feedback!';
|
| 527 |
+
|
| 528 |
+
document.body.appendChild(confirmation);
|
| 529 |
+
|
| 530 |
+
setTimeout(() => {
|
| 531 |
+
confirmation.style.animation = 'slideOutDown 0.3s ease';
|
| 532 |
+
setTimeout(() => confirmation.remove(), 300);
|
| 533 |
+
}, 2000);
|
| 534 |
+
}
|
| 535 |
+
|
| 536 |
+
onRoundComplete(roundDetails) {
|
| 537 |
+
// Store round details for context
|
| 538 |
+
if (!this.currentRound) {
|
| 539 |
+
this.currentRound = {
|
| 540 |
+
timestamp: Date.now(),
|
| 541 |
+
ratings: {},
|
| 542 |
+
comments: '',
|
| 543 |
+
roundDetails: roundDetails
|
| 544 |
+
};
|
| 545 |
+
} else {
|
| 546 |
+
this.currentRound.roundDetails = roundDetails;
|
| 547 |
+
}
|
| 548 |
+
|
| 549 |
+
// Show ranking trigger button
|
| 550 |
+
const triggerBtn = document.getElementById('ranking-trigger-btn');
|
| 551 |
+
triggerBtn.style.display = 'block';
|
| 552 |
+
|
| 553 |
+
// Auto-show modal after a short delay (optional)
|
| 554 |
+
if (window.testGameRunner?.modelConfig?.autoShowRanking) {
|
| 555 |
+
setTimeout(() => this.showRankingModal(), 1500);
|
| 556 |
+
}
|
| 557 |
+
}
|
| 558 |
+
|
| 559 |
+
exportRankings() {
|
| 560 |
+
const exportData = {
|
| 561 |
+
...this.rankings,
|
| 562 |
+
exportedAt: new Date().toISOString(),
|
| 563 |
+
modelId: window.testGameRunner?.modelConfig?.modelId || 'unknown'
|
| 564 |
+
};
|
| 565 |
+
|
| 566 |
+
return exportData;
|
| 567 |
+
}
|
| 568 |
+
|
| 569 |
+
getRankingSummary() {
|
| 570 |
+
if (this.rankings.rounds.length === 0) {
|
| 571 |
+
return null;
|
| 572 |
+
}
|
| 573 |
+
|
| 574 |
+
const summary = {
|
| 575 |
+
totalRounds: this.rankings.rounds.length,
|
| 576 |
+
averageRatings: {},
|
| 577 |
+
categoryBreakdown: {},
|
| 578 |
+
comments: []
|
| 579 |
+
};
|
| 580 |
+
|
| 581 |
+
// Calculate average ratings per category
|
| 582 |
+
this.rankingCategories.forEach(category => {
|
| 583 |
+
const ratings = this.rankings.rounds
|
| 584 |
+
.map(r => r.ratings[category.id])
|
| 585 |
+
.filter(r => r !== undefined);
|
| 586 |
+
|
| 587 |
+
if (ratings.length > 0) {
|
| 588 |
+
summary.averageRatings[category.id] =
|
| 589 |
+
ratings.reduce((a, b) => a + b, 0) / ratings.length;
|
| 590 |
+
|
| 591 |
+
// Distribution of ratings
|
| 592 |
+
summary.categoryBreakdown[category.id] = {
|
| 593 |
+
1: ratings.filter(r => r === 1).length,
|
| 594 |
+
2: ratings.filter(r => r === 2).length,
|
| 595 |
+
3: ratings.filter(r => r === 3).length,
|
| 596 |
+
4: ratings.filter(r => r === 4).length,
|
| 597 |
+
5: ratings.filter(r => r === 5).length
|
| 598 |
+
};
|
| 599 |
+
}
|
| 600 |
+
});
|
| 601 |
+
|
| 602 |
+
// Collect all comments
|
| 603 |
+
summary.comments = this.rankings.rounds
|
| 604 |
+
.filter(r => r.comments)
|
| 605 |
+
.map(r => ({
|
| 606 |
+
timestamp: r.timestamp,
|
| 607 |
+
comment: r.comments,
|
| 608 |
+
averageRating: r.averageRating
|
| 609 |
+
}));
|
| 610 |
+
|
| 611 |
+
return summary;
|
| 612 |
+
}
|
| 613 |
+
}
|
| 614 |
+
|
| 615 |
+
// Initialize when in test mode
|
| 616 |
+
window.addEventListener('DOMContentLoaded', () => {
|
| 617 |
+
const urlParams = new URLSearchParams(window.location.search);
|
| 618 |
+
if (urlParams.get('testMode') === 'true') {
|
| 619 |
+
window.userRankingInterface = new UserRankingInterface();
|
| 620 |
+
|
| 621 |
+
// Add CSS animation keyframes
|
| 622 |
+
const animationStyles = document.createElement('style');
|
| 623 |
+
animationStyles.textContent = `
|
| 624 |
+
@keyframes slideInUp {
|
| 625 |
+
from {
|
| 626 |
+
transform: translate(-50%, 100%);
|
| 627 |
+
opacity: 0;
|
| 628 |
+
}
|
| 629 |
+
to {
|
| 630 |
+
transform: translate(-50%, 0);
|
| 631 |
+
opacity: 1;
|
| 632 |
+
}
|
| 633 |
+
}
|
| 634 |
+
|
| 635 |
+
@keyframes slideOutDown {
|
| 636 |
+
from {
|
| 637 |
+
transform: translate(-50%, 0);
|
| 638 |
+
opacity: 1;
|
| 639 |
+
}
|
| 640 |
+
to {
|
| 641 |
+
transform: translate(-50%, 100%);
|
| 642 |
+
opacity: 0;
|
| 643 |
+
}
|
| 644 |
+
}
|
| 645 |
+
`;
|
| 646 |
+
document.head.appendChild(animationStyles);
|
| 647 |
+
}
|
| 648 |
+
});
|
| 649 |
+
|
| 650 |
+
export { UserRankingInterface };
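The aggregation in `getRankingSummary()` can be exercised without a browser. The following is a minimal sketch, assuming round objects shaped like the ones pushed to `rankings.rounds`; the helper name `summarizeRatings` and the sample values are hypothetical and not part of this commit.

```javascript
// Hypothetical DOM-free helper mirroring the per-category aggregation
// performed in getRankingSummary().
function summarizeRatings(rounds, categoryIds) {
  const summary = { totalRounds: rounds.length, averageRatings: {}, categoryBreakdown: {} };

  for (const id of categoryIds) {
    // Keep only the rounds that actually rated this category.
    const ratings = rounds.map(r => r.ratings[id]).filter(r => r !== undefined);
    if (ratings.length === 0) continue;

    summary.averageRatings[id] = ratings.reduce((a, b) => a + b, 0) / ratings.length;

    // Count how many times each star value (1-5) was given.
    const breakdown = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
    for (const r of ratings) breakdown[r] += 1;
    summary.categoryBreakdown[id] = breakdown;
  }
  return summary;
}

// Made-up sample data for illustration only.
const rounds = [
  { ratings: { word_selection: 4, hint_helpfulness: 5 } },
  { ratings: { word_selection: 2 } }
];
const summary = summarizeRatings(rounds, ['word_selection', 'hint_helpfulness', 'passage_quality']);
console.log(summary.averageRatings);
```

Categories that no round has rated are simply absent from the result, which matches the `if (ratings.length > 0)` guard in the class.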
|
src/welcomeOverlay.js
CHANGED
|
@@ -43,7 +43,7 @@ class WelcomeOverlay {
|
|
| 43 |
|
| 44 |
<div class="welcome-content">
|
| 45 |
<p>
|
| 46 |
-
<strong>How to play:</strong>
|
| 47 |
</p>
|
| 48 |
|
| 49 |
<p>
|
|
@@ -51,7 +51,7 @@ class WelcomeOverlay {
|
|
| 51 |
</p>
|
| 52 |
|
| 53 |
<p style="margin-bottom: 0;">
|
| 54 |
-
<strong>AI assistance:</strong> Powered by Google's Gemma
|
| 55 |
</p>
|
| 56 |
</div>
|
| 57 |
|
|
|
|
| 43 |
|
| 44 |
<div class="welcome-content">
|
| 45 |
<p>
|
| 46 |
+
<strong>How to play:</strong> Fill in the blanks in each passage. Complete 2 passages per round. Pass 2 rounds to advance to the next level.
|
| 47 |
</p>
|
| 48 |
|
| 49 |
<p>
|
|
|
|
| 51 |
</p>
|
| 52 |
|
| 53 |
<p style="margin-bottom: 0;">
|
| 54 |
+
<strong>AI assistance:</strong> Powered by Google's Gemma models via OpenRouter - Gemma-3-27b for contextual hints and Gemma-3-12b for word selection and processing.
|
| 55 |
</p>
|
| 56 |
</div>
|
| 57 |
|
test-prompts-lm-studio.md
DELETED
|
@@ -1,262 +0,0 @@
|
|
| 1 |
-
# Gemma-3-27b Model Integration Guide for Cloze Reader
|
| 2 |
-
|
| 3 |
-
## Part 1: Step-by-Step API Request Processing
|
| 4 |
-
|
| 5 |
-
### 1. Initial Request Reception
|
| 6 |
-
When the Cloze Reader application makes an API request through OpenRouter:
|
| 7 |
-
|
| 8 |
-
1. **Authentication**: Verify Bearer token from `Authorization` header
|
| 9 |
-
2. **Request Type Detection**: Identify the operation type based on prompt content
|
| 10 |
-
3. **Parameter Extraction**: Parse temperature, max_tokens, and message content
|
| 11 |
-
4. **Rate Limiting Check**: Ensure request complies with free tier limits
|
| 12 |
-
|
| 13 |
-
### 2. Word Selection Request Processing
|
| 14 |
-
|
| 15 |
-
**When Temperature = 0.3 and prompt contains "CLOZE DELETION PRINCIPLES":**
|
| 16 |
-
|
| 17 |
-
1. **Parse Passage**: Extract the text passage from the system message
|
| 18 |
-
2. **Identify Difficulty Level**:
|
| 19 |
-
- Level 1-2: Target 4-7 letter words (easy vocabulary)
|
| 20 |
-
- Level 3-4: Target 4-10 letter words (medium difficulty)
|
| 21 |
-
- Level 5+: Target 5-14 letter words (challenging vocabulary)
|
| 22 |
-
3. **Select Words**:
|
| 23 |
-
- Identify significant vocabulary words (nouns, verbs, adjectives, adverbs)
|
| 24 |
-
- Avoid proper nouns, numbers, articles, and function words
|
| 25 |
-
- Ensure words are contextually important for comprehension
|
| 26 |
-
4. **Format Response**: Return JSON array of selected words
|
| 27 |
-
5. **Validate**: Ensure all words exist in the original passage
|
| 28 |
-
|
| 29 |
-
### 3. Batch Processing Request
|
| 30 |
-
|
| 31 |
-
**When Temperature = 0.5 and prompt contains two passages:**
|
| 32 |
-
|
| 33 |
-
1. **Parse Both Passages**: Extract passage1 and passage2 from the prompt
|
| 34 |
-
2. **Process Each Passage**:
|
| 35 |
-
- Apply word selection logic for each based on difficulty level
|
| 36 |
-
- Generate one-sentence contextualization for each book
|
| 37 |
-
3. **Format Response**: Return structured JSON with both passages' data
|
| 38 |
-
4. **Ensure Consistency**: Words must match exactly as they appear in passages
|
| 39 |
-
|
| 40 |
-
### 4. Contextualization Request
|
| 41 |
-
|
| 42 |
-
**When Temperature = 0.2 and prompt asks for book context:**
|
| 43 |
-
|
| 44 |
-
1. **Extract Book Information**: Parse title and author from prompt
|
| 45 |
-
2. **Generate Context**: Create one factual sentence about:
|
| 46 |
-
- Type of work (novel, short story, essay)
|
| 47 |
-
- Historical period when written
|
| 48 |
-
- Literary significance or genre
|
| 49 |
-
3. **Keep Concise**: Limit to 80 tokens maximum
|
| 50 |
-
4. **Avoid Speculation**: Only include verifiable information
|
| 51 |
-
|
| 52 |
-
### 5. Chat Hint Request
|
| 53 |
-
|
| 54 |
-
**When Temperature = 0.6 and prompt includes "word puzzles":**
|
| 55 |
-
|
| 56 |
-
1. **Identify Question Type**:
|
| 57 |
-
- `part_of_speech`: Grammar category identification
|
| 58 |
-
- `sentence_role`: Function in the sentence
|
| 59 |
-
- `word_category`: Abstract/concrete classification
|
| 60 |
-
- `synonym`: Alternative word suggestion
|
| 61 |
-
2. **Parse Target Word**: Extract the hidden word (NEVER reveal it)
|
| 62 |
-
3. **Generate Appropriate Hint**:
|
| 63 |
-
- Follow exact format requested
|
| 64 |
-
- Stay within 50 token limit
|
| 65 |
-
- Use plain text only, no formatting
|
| 66 |
-
4. **Validate**: Ensure hint doesn't contain or spell out the target word
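
The final validation step can be approximated with a string check. The sketch below is an assumption about how such a check might look, not the application's actual code; `hintRevealsWord` is a hypothetical name, and it only catches direct mentions and letter-by-letter spellings.

```javascript
// Hypothetical leak check for a generated hint.
function hintRevealsWord(hint, targetWord) {
  const target = targetWord.toLowerCase();
  const lowered = hint.toLowerCase();

  // Direct mention, including inside a longer word such as "cottages".
  if (lowered.includes(target)) return true;

  // Spelled-out forms such as "c-o-t-t-a-g-e" or "c o t t a g e":
  // find runs of single letters joined by separators, then compare the letters alone.
  const spelledRuns = lowered.match(/\b(?:[a-z][\s.\-,]+){2,}[a-z]\b/g) || [];
  return spelledRuns.some(run => run.replace(/[^a-z]/g, '').includes(target));
}

console.log(hintRevealsWord('It is spelled c-o-t-t-a-g-e', 'cottage'));
console.log(hintRevealsWord('A small house in the countryside', 'cottage'));
```

Restricting the second check to runs of single letters avoids false positives from ordinary phrases whose adjacent words happen to contain the target's letters in sequence.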
|
| 67 |
-
|
| 68 |
-
### 6. Response Formatting Rules
|
| 69 |
-
|
| 70 |
-
1. **JSON Responses**:
|
| 71 |
-
- Word selection: Clean array format `["word1", "word2"]`
|
| 72 |
-
- Batch processing: Nested object structure
|
| 73 |
-
- No markdown code blocks unless specifically requested
|
| 74 |
-
|
| 75 |
-
2. **Text Responses**:
|
| 76 |
-
- Contextualization: Single sentence, no formatting
|
| 77 |
-
- Chat hints: Plain text, follows exact format requested
|
| 78 |
-
|
| 79 |
-
3. **Error Handling**:
|
| 80 |
-
- Invalid requests: Return graceful error messages
|
| 81 |
-
- Missing parameters: Use sensible defaults
|
| 82 |
-
- Malformed input: Attempt to parse intent
|
| 83 |
-
|
| 84 |
-
## Part 2: LM Studio Testing Configuration
|
| 85 |
-
|
| 86 |
-
### System Prompt
|
| 87 |
-
```
|
| 88 |
-
You are a specialized AI assistant for the Cloze Reader educational application. You help create vocabulary exercises by selecting appropriate words from text passages and providing contextual hints without revealing answers. Always respond in the exact format requested, using plain JSON or text as specified. Never use markdown formatting unless explicitly requested.
|
| 89 |
-
```
|
| 90 |
-
|
| 91 |
-
### Temperature Settings
|
| 92 |
-
- **Word Selection**: 0.3
|
| 93 |
-
- **Batch Processing**: 0.5
|
| 94 |
-
- **Contextualization**: 0.2
|
| 95 |
-
- **Chat Hints**: 0.6
|
| 96 |
-
|
| 97 |
-
### Response Length Limits
|
| 98 |
-
- **Word Selection**: 100 tokens
|
| 99 |
-
- **Batch Processing**: 800 tokens
|
| 100 |
-
- **Contextualization**: 80 tokens
|
| 101 |
-
- **Chat Hints**: 50 tokens
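
The temperature and token limits above can be kept in one lookup table so each request type is built consistently. This is a minimal sketch: the key names and `buildRequestBody` are hypothetical, and the body shape simply follows the test prompts in this guide.

```javascript
// Values copied from the two lists above; the key names are hypothetical.
const REQUEST_SETTINGS = {
  wordSelection:     { temperature: 0.3, max_tokens: 100 },
  batchProcessing:   { temperature: 0.5, max_tokens: 800 },
  contextualization: { temperature: 0.2, max_tokens: 80 },
  chatHint:          { temperature: 0.6, max_tokens: 50 }
};

// Combine the messages with the settings for the given request type.
function buildRequestBody(type, messages) {
  if (!Object.prototype.hasOwnProperty.call(REQUEST_SETTINGS, type)) {
    throw new Error(`Unknown request type: ${type}`);
  }
  return { messages, ...REQUEST_SETTINGS[type] };
}

const hintBody = buildRequestBody('chatHint', [{ role: 'user', content: 'Give a clue.' }]);
console.log(JSON.stringify(hintBody));
```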
|
| 102 |
-
|
| 103 |
-
### Test Prompts
|
| 104 |
-
|
| 105 |
-
#### 1. Word Selection Test (Level 1-2 Easy)
|
| 106 |
-
```json
|
| 107 |
-
{
|
| 108 |
-
"messages": [
|
| 109 |
-
{
|
| 110 |
-
"role": "system",
|
| 111 |
-
"content": "CLOZE DELETION PRINCIPLES:\n- Select words that require understanding context and vocabulary to identify\n- Choose words essential for comprehension that test language ability\n- Target words where deletion creates meaningful cognitive gaps\n\nFrom the following passage, select exactly 1 word that is important for reading comprehension.\n\nDifficulty level 1-2: Focus on easier vocabulary (4-7 letters) like common nouns, simple verbs, and basic adjectives.\n\nRETURN ONLY A JSON ARRAY OF YOUR SELECTED WORDS. Select words that appear EXACTLY as written in the passage.\n\nPassage:\nThe old woman lived in a small cottage by the forest. Every morning, she would walk to the village market to buy fresh bread."
|
| 112 |
-
}
|
| 113 |
-
],
|
| 114 |
-
"temperature": 0.3,
|
| 115 |
-
"max_tokens": 100
|
| 116 |
-
}
|
| 117 |
-
```
|
| 118 |
-
|
| 119 |
-
**Expected Output Schema:**
|
| 120 |
-
```json
|
| 121 |
-
{
|
| 122 |
-
"type": "array",
|
| 123 |
-
"items": {
|
| 124 |
-
"type": "string",
|
| 125 |
-
"minLength": 4,
|
| 126 |
-
"maxLength": 7
|
| 127 |
-
},
|
| 128 |
-
"minItems": 1,
|
| 129 |
-
"maxItems": 1
|
| 130 |
-
}
|
| 131 |
-
```
|
| 132 |
-
|
| 133 |
-
#### 2. Batch Processing Test (Level 3-4 Medium)
|
| 134 |
-
```json
|
| 135 |
-
{
|
| 136 |
-
"messages": [
|
| 137 |
-
{
|
| 138 |
-
"role": "system",
|
| 139 |
-
"content": "Process these two passages for a cloze reading exercise:\n\nPASSAGE 1 (Pride and Prejudice by Jane Austen):\nIt is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.\n\nPASSAGE 2 (A Tale of Two Cities by Charles Dickens):\nIt was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness.\n\nFor each passage:\n1. Select 1 word for difficulty level 3-4 (medium vocabulary, 4-10 letters)\n2. Write ONE sentence about the book/author\n\nReturn a JSON object with this structure:\n{\n \"passage1\": {\n \"words\": [selected words],\n \"context\": \"One sentence about the book\"\n },\n \"passage2\": {\n \"words\": [selected words],\n \"context\": \"One sentence about the book\"\n }\n}"
|
| 140 |
-
}
|
| 141 |
-
],
|
| 142 |
-
"temperature": 0.5,
|
| 143 |
-
"max_tokens": 800
|
| 144 |
-
}
|
| 145 |
-
```
|
| 146 |
-
|
| 147 |
-
**Expected Output Schema:**
|
| 148 |
-
```json
|
| 149 |
-
{
|
| 150 |
-
"type": "object",
|
| 151 |
-
"properties": {
|
| 152 |
-
"passage1": {
|
| 153 |
-
"type": "object",
|
| 154 |
-
"properties": {
|
| 155 |
-
"words": {
|
| 156 |
-
"type": "array",
|
| 157 |
-
"items": { "type": "string" },
|
| 158 |
-
"minItems": 1,
|
| 159 |
-
"maxItems": 1
|
| 160 |
-
},
|
| 161 |
-
"context": {
|
| 162 |
-
"type": "string",
|
| 163 |
-
"maxLength": 150
|
| 164 |
-
}
|
| 165 |
-
},
|
| 166 |
-
"required": ["words", "context"]
|
| 167 |
-
},
|
| 168 |
-
"passage2": {
|
| 169 |
-
"type": "object",
|
| 170 |
-
"properties": {
|
| 171 |
-
"words": {
|
| 172 |
-
"type": "array",
|
| 173 |
-
"items": { "type": "string" },
|
| 174 |
-
"minItems": 1,
|
| 175 |
-
"maxItems": 1
|
| 176 |
-
},
|
| 177 |
-
"context": {
|
| 178 |
-
"type": "string",
|
| 179 |
-
"maxLength": 150
|
| 180 |
-
}
|
| 181 |
-
},
|
| 182 |
-
"required": ["words", "context"]
|
| 183 |
-
}
|
| 184 |
-
},
|
| 185 |
-
"required": ["passage1", "passage2"]
|
| 186 |
-
}
|
| 187 |
-
```
|
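The batch schema above can also be checked by hand when no JSON Schema library is available. The following is a minimal sketch, not the project's own code: `validateBatchResponse` is a hypothetical helper name, and it assumes the model reply has already been parsed into an object.

```javascript
// Hypothetical helper (not part of the repository): checks a parsed batch
// response against the expected schema and returns a list of problems.
function validateBatchResponse(obj) {
  const errors = [];
  for (const key of ["passage1", "passage2"]) {
    const entry = obj ? obj[key] : undefined;
    if (!entry || typeof entry !== "object") {
      errors.push(`${key} is missing`);
      continue;
    }
    const { words, context } = entry;
    // Schema: "words" is an array holding exactly one string.
    if (!Array.isArray(words) || words.length !== 1 || typeof words[0] !== "string") {
      errors.push(`${key}.words must hold exactly one string`);
    }
    // Schema: "context" is a string with maxLength 150
    // (approximated here by the JavaScript string length).
    if (typeof context !== "string" || context.length > 150) {
      errors.push(`${key}.context must be a string of at most 150 characters`);
    }
  }
  return errors;
}
```

An empty array means the response has the expected shape; any other result lists what to fix in the prompt or the model settings.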
| 188 |
-
|
| 189 |
-
#### 3. Contextualization Test
|
| 190 |
-
```json
|
| 191 |
-
{
|
| 192 |
-
"messages": [
|
| 193 |
-
{
|
| 194 |
-
"role": "user",
|
| 195 |
-
"content": "Write one factual sentence about 'The Adventures of Sherlock Holmes' by Arthur Conan Doyle. Focus on what type of work it is, when it was written, or its historical significance. Keep it under 20 words and conversational."
|
| 196 |
-
}
|
| 197 |
-
],
|
| 198 |
-
"temperature": 0.2,
|
| 199 |
-
"max_tokens": 80
|
| 200 |
-
}
|
| 201 |
-
```
|
| 202 |
-
|
| 203 |
-
**Expected Output:** Plain text string, no JSON structure required.
|
| 204 |
-
|
| 205 |
-
#### 4. Chat Hint Test (Part of Speech)
|
| 206 |
-
```json
|
| 207 |
-
{
|
| 208 |
-
"messages": [
|
| 209 |
-
{
|
| 210 |
-
"role": "system",
|
| 211 |
-
"content": "You provide clues for word puzzles. You will be told the target word that players need to guess, but you must NEVER mention, spell, or reveal that word in your response. Follow the EXACT format requested. Be concise and direct about the target word without revealing it. Use plain text only - no bold, italics, asterisks, or markdown formatting. Stick to word limits."
|
| 212 |
-
},
|
| 213 |
-
{
|
| 214 |
-
"role": "user",
|
| 215 |
-
"content": "The target word is 'walk'. The sentence is: 'Every morning, she would _____ to the village market to buy fresh bread.'\n\nQuestion type: part_of_speech\n\nIdentify what part of speech fits in this blank. Answer in 2-5 words. Format: 'It's a/an [part of speech]'"
|
| 216 |
-
}
|
| 217 |
-
],
|
| 218 |
-
"temperature": 0.6,
|
| 219 |
-
"max_tokens": 50
|
| 220 |
-
}
|
| 221 |
-
```
|
| 222 |
-
|
| 223 |
-
**Expected Output:** Plain text following format "It's a/an [part of speech]"
|
| 224 |
-
|
| 225 |
-
#### 5. Chat Hint Test (Synonym)
|
| 226 |
-
```json
|
| 227 |
-
{
|
| 228 |
-
"messages": [
|
| 229 |
-
{
|
| 230 |
-
"role": "system",
|
| 231 |
-
"content": "You provide clues for word puzzles. You will be told the target word that players need to guess, but you must NEVER mention, spell, or reveal that word in your response. Follow the EXACT format requested. Be concise and direct about the target word without revealing it. Use plain text only - no bold, italics, asterisks, or markdown formatting. Stick to word limits."
|
| 232 |
-
},
|
| 233 |
-
{
|
| 234 |
-
"role": "user",
|
| 235 |
-
"content": "The target word is 'cottage'. The sentence is: 'The old woman lived in a small _____ by the forest.'\n\nQuestion type: synonym\n\nSuggest a different word that could replace the blank. Answer in 1-3 words only."
|
| 236 |
-
}
|
| 237 |
-
],
|
| 238 |
-
"temperature": 0.6,
|
| 239 |
-
"max_tokens": 50
|
| 240 |
-
}
|
| 241 |
-
```
|
| 242 |
-
|
| 243 |
-
**Expected Output:** Plain text with 1-3 word synonym
|
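Both hint tests share two requirements: the reply must not reveal the target word, and it must stay within the stated word limit. A minimal sketch of automated checks follows. The function names `hintRevealsTarget` and `withinWordLimit` are hypothetical, and the leak check only covers whole-word matches and letter-by-letter spelling, so it is a first filter rather than a guarantee.

```javascript
// Hypothetical helpers (not part of the repository) for checking hint replies.

// Returns true if the hint contains the target as a whole word, or spells it
// out one letter at a time (for example "c-o-t-t-a-g-e").
function hintRevealsTarget(hint, target) {
  const t = target.toLowerCase();
  const words = hint.toLowerCase().split(/[^a-z]+/).filter(Boolean);
  if (words.includes(t)) return true;
  const spelled = words.filter((w) => w.length === 1).join("");
  return spelled.includes(t);
}

// Returns true if the text has between min and max whitespace-separated words.
function withinWordLimit(text, min, max) {
  const count = text.trim().split(/\s+/).filter(Boolean).length;
  return count >= min && count <= max;
}
```

For the part-of-speech test, a reply would pass with `!hintRevealsTarget(reply, target) && withinWordLimit(reply, 2, 5)`; for the synonym test the limit would be 1 to 3 words.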
| 244 |
-
|
| 245 |
-
### LM Studio Configuration
|
| 246 |
-
|
| 247 |
-
1. **Model Selection**: Load gemma-3-27b or equivalent model
|
| 248 |
-
2. **Context Length**: Set to at least 4096 tokens
|
| 249 |
-
3. **GPU Layers**: Maximize based on available VRAM
|
| 250 |
-
4. **Batch Size**: 512 for optimal performance
|
| 251 |
-
5. **Prompt Format**: Use ChatML or model's native format
|
| 252 |
-
|
| 253 |
-
### Testing Checklist
|
| 254 |
-
|
| 255 |
-
- [ ] Verify JSON responses are clean (no markdown blocks)
|
| 256 |
-
- [ ] Check word selections match passage exactly
|
| 257 |
-
- [ ] Ensure hints never reveal target words
|
| 258 |
-
- [ ] Validate response stays within token limits
|
| 259 |
-
- [ ] Test difficulty level word length constraints
|
| 260 |
-
- [ ] Confirm batch processing handles both passages
|
| 261 |
-
- [ ] Verify contextualization produces factual content
|
| 262 |
-
- [ ] Test all four hint question types
|
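Several checklist items for the word-selection test can be automated. The sketch below is one possible approach under stated assumptions, not the project's implementation: `validateWordSelection` is a hypothetical name, `raw` is the model's reply as text, and `limits` carries the word count and length bounds for the difficulty level being tested.

```javascript
// Hypothetical validator (not part of the repository) for a word-selection reply.
// Returns a list of problems; an empty list means every check passed.
function validateWordSelection(raw, passage, limits) {
  const errors = [];
  // Checklist: JSON responses must be clean, with no markdown block around them.
  const fence = "`".repeat(3);
  if (raw.includes(fence)) errors.push("response is wrapped in a markdown block");

  let words;
  try {
    words = JSON.parse(raw);
  } catch (err) {
    return [...errors, "response is not valid JSON"];
  }
  if (!Array.isArray(words) || !words.every((w) => typeof w === "string")) {
    return [...errors, "response is not a JSON array of strings"];
  }
  if (words.length !== limits.count) {
    errors.push(`expected ${limits.count} word(s), got ${words.length}`);
  }

  // Checklist: selections must match the passage exactly (case-sensitive).
  const passageWords = new Set(passage.split(/[^A-Za-z']+/).filter(Boolean));
  for (const w of words) {
    if (!passageWords.has(w)) errors.push(`"${w}" does not appear exactly in the passage`);
    // Checklist: difficulty-level word length constraints.
    if (w.length < limits.minLen || w.length > limits.maxLen) {
      errors.push(`"${w}" is outside ${limits.minLen}-${limits.maxLen} letters`);
    }
  }
  return errors;
}
```

For the level 1-2 test, the limits would be `{ count: 1, minLen: 4, maxLen: 7 }`, matching the expected output schema for that test.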