milwright committed fed2199 (1 parent: 56287e6)

simplify ui language and integrate dual gemma models

- use gemma-3-27b for hints, gemma-3-12b for word selection
- simplify score display and progression messages
- add passage tracking (1/2, 2/2) in header
- clarify "2 passages per round, 2 rounds per level" system

LEADERBOARD_ROADMAP.md DELETED
@@ -1,171 +0,0 @@
- # Cloze Reader Leaderboard Implementation Roadmap
-
- ## Overview
- This document outlines the implementation plan for adding a competitive leaderboard system to the Cloze Reader game, where players can submit their scores using 3-letter acronyms.
-
- ## Phase 1: Core Infrastructure (Week 1-2)
-
- ### 1.1 Database Schema
- - Create leaderboard table structure:
- ```sql
- CREATE TABLE leaderboard (
-     id UUID PRIMARY KEY,
-     acronym VARCHAR(3),
-     score INTEGER,
-     level_reached INTEGER,
-     total_time INTEGER,     -- seconds
-     created_at TIMESTAMP,
-     ip_hash VARCHAR(64)     -- for rate limiting
- );
- ```
-
- ### 1.2 API Endpoints
- - `POST /api/leaderboard/submit` - Submit new score
- - `GET /api/leaderboard/top/{period}` - Get top scores (daily/weekly/all-time)
- - `GET /api/leaderboard/check-acronym/{acronym}` - Validate acronym availability
-
- ### 1.3 Score Calculation
- - Base score = (correct_answers * 100) * level_multiplier
- - Time bonus = max(0, 1000 - seconds_per_round)
- - Streak bonus = consecutive_correct * 50
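The roadmap lists three score components but does not say how they combine. The following is a minimal sketch. It assumes that the total is their plain sum and that the caller supplies `levelMultiplier`. `calculateScore` is a hypothetical name for logic that a planned `scoreCalculator.js` might hold.

```javascript
// Hypothetical scoring helper built from the roadmap's three formulas.
// Assumption: the final score is the sum of the three components.
function calculateScore({ correctAnswers, levelMultiplier, secondsPerRound, consecutiveCorrect }) {
  const baseScore = correctAnswers * 100 * levelMultiplier;
  // Faster rounds earn more, but the bonus never goes negative.
  const timeBonus = Math.max(0, 1000 - secondsPerRound);
  const streakBonus = consecutiveCorrect * 50;
  return { baseScore, timeBonus, streakBonus, total: baseScore + timeBonus + streakBonus };
}

const example = calculateScore({ correctAnswers: 8, levelMultiplier: 2, secondsPerRound: 120, consecutiveCorrect: 3 });
console.log(example.total); // 1600 + 880 + 150 = 2630
```

Clamping the time bonus at zero means a slow round earns no bonus but never costs points, which matches the `max(0, ...)` in the formula.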
-
- ## Phase 2: Frontend Integration (Week 2-3)
-
- ### 2.1 UI Components
- - **Leaderboard Modal** (`leaderboardModal.js`)
-   - Top 10 display with rank, acronym, score, level
-   - Period toggle (Today/Week/All-Time)
-   - Personal best highlight
-
- ### 2.2 Score Submission Flow
- - End-of-game prompt for acronym entry
- - 3-letter validation (A-Z only)
- - Profanity filter implementation
- - Success/error feedback
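The submission checks can be combined into a single validator. This sketch is an illustration only and does not replace the planned `validation.py`:

- It is written in JavaScript for the client side.
- It accepts lowercase input and normalizes it to uppercase, a choice the roadmap leaves open.
- `BLOCKED_ACRONYMS` is a hypothetical placeholder for a real profanity list.

```javascript
// Hypothetical placeholder; a real deployment would load a maintained list
// or call an external profanity service.
const BLOCKED_ACRONYMS = new Set(["XXX"]);

function validateAcronym(input) {
  const trimmed = String(input ?? "").trim();
  // Exactly three ASCII letters, checked before case conversion so that
  // digits, punctuation and non-ASCII letters are all rejected.
  if (!/^[A-Za-z]{3}$/.test(trimmed)) {
    return { ok: false, error: "Acronym must be exactly 3 letters (A-Z)." };
  }
  // Assumption: lowercase input is accepted and stored in uppercase.
  const acronym = trimmed.toUpperCase();
  if (BLOCKED_ACRONYMS.has(acronym)) {
    return { ok: false, error: "Please choose a different acronym." };
  }
  return { ok: true, acronym };
}

console.log(validateAcronym("abc").acronym); // ABC
console.log(validateAcronym("A1C").ok); // false
```

The ASCII-only pattern answers the roadmap's open question about Unicode acronyms with "no" by default. Widening the pattern would also widen what the profanity filter has to cover.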
-
- ### 2.3 Visual Elements
- - Trophy icons for top 3 positions
- - Animated score counter
- - Level badges display
-
- ## Phase 3: Security & Performance (Week 3-4)
-
- ### 3.1 Anti-Cheat Measures
- - Server-side score validation
- - Rate limiting (1 submission per 5 minutes per IP)
- - Score feasibility checks (max possible score per level)
- - Request signing with session tokens
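The rate limit fits naturally with the schema's `ip_hash` column. Below is a minimal in-memory sketch of "1 submission per 5 minutes per IP". It assumes that the caller passes an already-hashed IP as the key. `SubmissionRateLimiter` is a hypothetical name, and the injectable clock exists only to make the logic testable. If the service runs on several server instances, the timestamps would need shared storage, such as the planned Redis cache, instead of a local `Map`.

```javascript
// Hypothetical limiter: one submission per key per window (default 5 minutes).
// Keys are assumed to be hashed IPs (the schema's ip_hash). Entries are never
// evicted here; a shared store such as Redis would handle expiry in production.
class SubmissionRateLimiter {
  constructor({ windowMs = 5 * 60 * 1000, now = () => Date.now() } = {}) {
    this.windowMs = windowMs;
    this.now = now; // injectable clock for testing
    this.lastSubmission = new Map(); // ipHash -> timestamp in ms
  }

  // If this key may submit now, records the submission time and returns true;
  // otherwise returns false and leaves the stored time unchanged.
  tryAcquire(ipHash) {
    const t = this.now();
    const last = this.lastSubmission.get(ipHash);
    if (last !== undefined && t - last < this.windowMs) return false;
    this.lastSubmission.set(ipHash, t);
    return true;
  }

  // Milliseconds until this key may submit again (0 if it may submit now).
  retryAfterMs(ipHash) {
    const last = this.lastSubmission.get(ipHash);
    if (last === undefined) return 0;
    return Math.max(0, this.windowMs - (this.now() - last));
  }
}

let clock = 0;
const limiter = new SubmissionRateLimiter({ now: () => clock });
console.log(limiter.tryAcquire("hash-a")); // true
console.log(limiter.tryAcquire("hash-a")); // false: same key inside the window
clock += 5 * 60 * 1000;
console.log(limiter.tryAcquire("hash-a")); // true: the window has elapsed
```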
-
- ### 3.2 Caching Strategy
- - Redis cache for top 100 scores
- - 5-minute TTL for leaderboard queries
- - Real-time updates for top 10 changes
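A small in-process cache can illustrate the 5-minute TTL. It stands in for the Redis layer and is not a replacement for it: in Redis, the same effect comes from setting the key with an expiry (`SET key value EX 300`). `TtlCache` and its `getOrLoad` helper are hypothetical names.

```javascript
// In-process stand-in for the Redis cache: entries expire ttlMs after being set.
class TtlCache {
  constructor({ ttlMs = 5 * 60 * 1000, now = () => Date.now() } = {}) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock for testing
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict stale entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns the cached value, or runs loader() (e.g. a database query) and caches it.
  async getOrLoad(key, loader) {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await loader();
    this.set(key, value);
    return value;
  }
}

// Usage: cache the "top scores" query per period.
const cache = new TtlCache();
cache.getOrLoad("top:weekly", async () => [{ acronym: "ABC", score: 2630 }])
  .then((rows) => console.log(rows.length));
```

Real-time updates for top-10 changes would not wait for the TTL. A new top-10 score would overwrite the affected key with `set` or remove it with `entries.delete(key)`.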
-
- ### 3.3 Data Persistence
- - PostgreSQL for primary storage
- - Daily backups of leaderboard data
- - Archived monthly snapshots
-
- ## Phase 4: Advanced Features (Week 4-5)
-
- ### 4.1 Achievement System
- - "First Timer" - First leaderboard entry
- - "Vocabulary Master" - 10+ correct in a row
- - "Speed Reader" - Complete round < 30 seconds
- - "Persistent Scholar" - Play 7 days straight
-
- ### 4.2 Social Features
- - Share score to social media
- - Challenge link generation
- - Friend acronym tracking
-
- ### 4.3 Analytics Dashboard
- - Player retention metrics
- - Popular acronym analysis
- - Score distribution graphs
-
- ## Technical Implementation Details
-
- ### Backend Changes Required
-
- 1. **FastAPI Endpoints** (`app.py`):
-    ```python
-    @app.post("/api/leaderboard/submit")
-    async def submit_score(score_data: ScoreSubmission): ...
-
-    @app.get("/api/leaderboard/top/{period}")
-    async def get_leaderboard(period: str, limit: int = 10): ...
-    ```
-
- 2. **Database Models** (`models.py` - new file):
-    ```python
-    class LeaderboardEntry(Base):
-        __tablename__ = "leaderboard"
-        # Schema implementation
-    ```
-
- 3. **Validation Service** (`validation.py` - new file):
-    - Acronym format validation
-    - Profanity checking
-    - Score feasibility verification
-
- ### Frontend Changes Required
-
- 1. **Game Engine Integration** (`clozeGameEngine.js`):
-    - Track game metrics for scoring
-    - Call submission API on game end
-    - Store session data for validation
-
- 2. **UI Updates** (`app.js`):
-    - Add leaderboard button to main menu
-    - Integrate submission modal
-    - Handle API responses
-
- 3. **New Modules**:
-    - `leaderboardService.js` - API communication
-    - `scoreCalculator.js` - Client-side scoring logic
-    - `leaderboardUI.js` - UI component management
-
- ## Deployment Considerations
-
- ### Infrastructure Requirements
- - Database: PostgreSQL 14+
- - Cache: Redis 6+
- - API rate limiting: nginx or API Gateway
- - SSL certificate for secure submissions
-
- ### Environment Variables
- ```
- DATABASE_URL=postgresql://...
- REDIS_URL=redis://...
- LEADERBOARD_SECRET=... # For request signing
- PROFANITY_API_KEY=... # Optional external service
- ```
-
- ### Migration Strategy
- 1. Deploy database schema
- 2. Enable API endpoints (feature flagged)
- 3. Gradual UI rollout (A/B testing)
- 4. Full launch with announcement
-
- ## Success Metrics
-
- - **Engagement**: 30% of players submit scores
- - **Retention**: 15% return to beat their score
- - **Performance**: <100ms leaderboard load time
- - **Security**: Zero validated cheating incidents
-
- ## Timeline Summary
-
- - **Week 1-2**: Backend infrastructure
- - **Week 2-3**: Frontend integration
- - **Week 3-4**: Security hardening
- - **Week 4-5**: Advanced features
- - **Week 6**: Testing & deployment
-
- ## Open Questions
-
- 1. Should we allow Unicode characters in acronyms?
- 2. Reset frequency for periodic leaderboards?
- 3. Maximum entries per player per day?
- 4. Prize/reward system for top performers?
README-testing-framework.md ADDED
@@ -0,0 +1,217 @@
+ # Cloze Reader Model Testing Framework
+
+ A testing system for evaluating AI models on every task in the Cloze Reader application. It covers both OpenRouter-hosted models and local models served by LM Studio.
+
+ ## Features
+
+ ### 🎯 Comprehensive Testing
+ - **Word Selection Testing**: Evaluates vocabulary selection accuracy, difficulty matching, and response quality
+ - **Contextualization Testing**: Tests historical and literary context generation for books and authors
+ - **Chat Hints Testing**: Assesses all 4 question types (part of speech, sentence role, word category, synonym)
+ - **Performance Monitoring**: Tracks response times, success rates, and error patterns
+ - **User Satisfaction Ratings**: Collects user feedback on model performance after each round
+
+ ### 🏠 Local LLM Support
+ - **LM Studio Integration**: Auto-detects models served by LM Studio on port 1234
+ - **Real-time Status**: Shows connection status and available models
+ - **Response Cleaning**: Handles local LLM output artifacts automatically
+ - **Fallback Testing**: Graceful handling when local server is unavailable
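Auto-detection can be approximated by querying LM Studio's OpenAI-compatible model list at `http://localhost:1234/v1/models`. The sketch below is hypothetical: `probeLmStudio` is not the framework's function. It only mirrors the `{ connected, models, error }` shape that `model-testing.html` reads from `testLocalServerConnection()`. It also accepts an injectable `fetchImpl`, so it can be exercised without a running server.

```javascript
// Hypothetical probe of LM Studio's OpenAI-compatible endpoint.
// Returns { connected, models, error } and never throws for network failures.
async function probeLmStudio({ baseUrl = "http://localhost:1234/v1", fetchImpl = fetch, timeoutMs = 3000 } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetchImpl(`${baseUrl}/models`, { signal: controller.signal });
    if (!response.ok) {
      return { connected: false, models: [], error: `HTTP ${response.status}` };
    }
    const body = await response.json();
    // OpenAI-style list responses put the models under `data`, each with an `id`.
    const models = (body.data ?? []).map((m) => ({ id: m.id, name: m.id, provider: "local" }));
    return { connected: true, models, error: null };
  } catch (err) {
    // Connection refused, DNS failure, or the timeout abort all end up here.
    return { connected: false, models: [], error: err.message };
  } finally {
    clearTimeout(timer);
  }
}

// In the browser: probeLmStudio().then((status) => console.log(status));
```

In the browser, this request succeeds only if CORS is enabled in LM Studio. The troubleshooting section below asks for that setting for the same reason.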
+
+ ### 📊 Advanced Analytics
+ - **Multi-format Reports**: JSON, CSV, and Markdown outputs
+ - **Performance Comparisons**: Side-by-side model analysis
+ - **Quality Scoring**: Detailed evaluation metrics for each task
+ - **Interactive Game Testing**: Real-time performance monitoring during gameplay
+ - **User Ranking Integration**: 5-star ratings for word selection, passage quality, hint helpfulness, and overall experience
+
+ ## Quick Start
+
+ ### 1. Start the Testing Interface
+ ```bash
+ # Start development server
+ make dev
+ # or
+ python local-server.py 8000
+
+ # Open testing interface (macOS `open`; on Linux use `xdg-open`)
+ open http://localhost:8000/model-testing.html
+ ```
+
+ ### 2. Set Up a Local LLM (Optional)
+ ```bash
+ # Start LM Studio server on port 1234
+ # Load your preferred model (e.g., Gemma-3-12b, Llama-3.1-8b)
+ # The framework will auto-detect available models
+ ```
+
+ ### 3. Run Tests
+ 1. Select models to test (OpenRouter and/or local models)
+ 2. Click "Start Comprehensive Test" for a full evaluation
+ 3. Alternatively, select exactly one model and click "Test Selected Model in Game" for interactive testing
+ 4. Results are automatically saved to the `output/` folder
+
+ ## Test Results
+
+ ### CSV Output Format
+ Results are saved as timestamped CSV files with columns for:
+ - Model performance metrics (overall score, success rates)
+ - Response time analytics (average, min, max)
+ - Task-specific scores (word selection, contextualization, chat hints)
+ - Error rates and reliability metrics
+ - User satisfaction ratings (1-5 stars per category)
+ - User comments and feedback count
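The following sketch covers the two CSV steps most likely to go wrong:

- summarizing response times into average, minimum and maximum;
- quoting fields so that model names or user comments containing commas or quotes do not break the file.

The column names in the example are assumptions, not the framework's actual headers. Rows are joined with `\n` for simplicity, although RFC 4180 itself specifies CRLF.

```javascript
// Average/min/max of per-request response times (milliseconds).
function summarizeTimes(timesMs) {
  if (timesMs.length === 0) return { avg: null, min: null, max: null };
  const total = timesMs.reduce((sum, t) => sum + t, 0);
  return { avg: total / timesMs.length, min: Math.min(...timesMs), max: Math.max(...timesMs) };
}

// RFC 4180-style quoting: fields containing commas, quotes, or line breaks are
// wrapped in double quotes, and embedded quotes are doubled.
function csvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// rows: array of plain objects; columns: the keys to emit, in order.
function toCsv(rows, columns) {
  const header = columns.map((c) => csvField(c)).join(",");
  const lines = rows.map((row) => columns.map((c) => csvField(row[c])).join(","));
  return [header, ...lines].join("\n");
}

const times = summarizeTimes([820, 1430, 990]);
console.log(toCsv(
  [{ model: "Gemma 3 12B", avgMs: times.avg, minMs: times.min, maxMs: times.max, comment: 'Hints were "too easy", otherwise fine' }],
  ["model", "avgMs", "minMs", "maxMs", "comment"]
));
```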
+
+ ### Game Testing Output
+ Interactive game sessions generate JSON reports with:
+ - Real-time AI interaction logs
+ - User performance analytics
+ - Response time breakdowns
+ - Error tracking and categorization
+ - User satisfaction ratings per round
+ - Qualitative feedback and comments
+
+ ## Model Categories
+
+ ### OpenRouter Models
+ - GPT-4o, GPT-4o Mini
+ - Claude 3.5 Sonnet, Claude 3 Haiku
+ - Gemini Pro 1.5
+ - Llama 3.1 (8B, 70B)
+ - Mistral 7B, Phi-3 Medium, Qwen 2 7B
+
+ ### Local LLM Models (LM Studio)
+ - Auto-detected from the running LM Studio server
+ - Works with any model LM Studio serves through its OpenAI-compatible API
+ - Common options: Gemma-3-12b, Llama-3.1-8b, Mistral-7b
+
+ ## Testing Methodology
+
+ ### Word Selection Evaluation
+ - **Accuracy**: Selected words actually appear in the source passage
+ - **Difficulty Matching**: Length and complexity appropriate for level
+ - **Quality Scoring**: Avoids overly common words at higher difficulties
+ - **Performance**: Response time and success rate tracking
+ - **User Rating**: 5-star scale for vocabulary appropriateness
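A small scorer makes the word-selection checks concrete. Every specific value in it is an illustrative assumption, not a threshold used by the framework's `evaluate*` methods:

- the minimum word length per level;
- the short `COMMON_WORDS` list;
- applying the common-word filter only from level 3 upward.

```javascript
// Illustrative thresholds only; not the framework's real values.
const COMMON_WORDS = new Set(["the", "and", "that", "have", "with", "this", "from", "they", "were", "said"]);
const minLengthForLevel = (level) => (level <= 2 ? 4 : level <= 4 ? 5 : 6);

function evaluateWordSelection(passage, words, level) {
  // Tokenize the passage into lowercase words (ASCII letters and apostrophes).
  const passageWords = new Set(passage.toLowerCase().match(/[a-z']+/g) ?? []);
  const checks = words.map((word) => {
    const w = word.toLowerCase();
    return {
      word,
      inPassage: passageWords.has(w),                 // accuracy
      lengthOk: w.length >= minLengthForLevel(level), // difficulty matching
      notCommon: level < 3 || !COMMON_WORDS.has(w),   // quality at higher levels
    };
  });
  const passed = checks.filter((c) => c.inPassage && c.lengthOk && c.notCommon).length;
  return { checks, passRate: words.length ? passed / words.length : 0 };
}

const passage = "The lighthouse keeper watched the stormy harbor.";
console.log(evaluateWordSelection(passage, ["lighthouse", "harbor", "ocean"], 1).passRate);
```

Here "ocean" fails the accuracy check because it does not appear in the passage. This is exactly the kind of hallucinated selection the check is meant to catch.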
+
+ ### Contextualization Assessment
+ - **Relevance**: Mentions book title, author, historical context
+ - **Educational Value**: Appropriate for language learners
+ - **Completeness**: Balanced length (100-500 characters)
+ - **Literary Terms**: Uses appropriate academic vocabulary
+ - **User Rating**: Passage quality and educational value scoring
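This is a hedged sketch of the three mechanical contextualization checks: title mention, author mention and a length of 100-500 characters. Matching the author by surname and weighting the checks equally are assumptions. Educational value and literary vocabulary are judgment calls, and the sketch does not attempt them.

```javascript
// Illustrative checks for the contextualization criteria; the equal weighting
// and surname-only author match are assumptions.
function evaluateContextualization(text, { title, author }) {
  const lower = text.toLowerCase();
  const mentionsTitle = lower.includes(title.toLowerCase());
  // Match on the surname so "Austen" counts as a mention of "Jane Austen".
  const surname = author.trim().split(/\s+/).pop().toLowerCase();
  const mentionsAuthor = lower.includes(surname);
  // Completeness: 100-500 characters, inclusive.
  const lengthOk = text.length >= 100 && text.length <= 500;
  const score = [mentionsTitle, mentionsAuthor, lengthOk].filter(Boolean).length / 3;
  return { mentionsTitle, mentionsAuthor, lengthOk, score };
}

const context = "Published in 1813, Pride and Prejudice is Jane Austen's comedy of manners about marriage, class, and reputation in Regency England.";
console.log(evaluateContextualization(context, { title: "Pride and Prejudice", author: "Jane Austen" }).score);
```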
+
+ ### Chat Hints Analysis
+ - **Question Type Coverage**: All 4 hint categories tested
+ - **Educational Appropriateness**: Helps without revealing answers
+ - **Response Quality**: Clear, concise, and helpful explanations
+ - **Consistency**: Performance across different question types
+ - **User Rating**: Helpfulness and clarity of AI hints
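One mechanical part of "helps without revealing answers" is checking that a hint never contains the hidden word. The following is a minimal sketch. It assumes that hints arrive as `{ type, text }` objects, which is a hypothetical shape. It matches whole words case-insensitively and, as the comment notes, misses inflected forms.

```javascript
// Escape characters that have special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if the hint contains the answer as a whole word, ignoring case.
// Inflected forms ("ran" for "run") and synonyms are not detected.
function hintRevealsAnswer(hint, answer) {
  const pattern = new RegExp(`\\b${escapeRegExp(answer)}\\b`, "i");
  return pattern.test(hint);
}

// hints: array of { type, text }, e.g. one per question type
// (part of speech, sentence role, word category, synonym).
function evaluateHints(hints, answer) {
  const results = hints.map((h) => ({
    type: h.type,
    nonEmpty: h.text.trim().length > 0,
    leaks: hintRevealsAnswer(h.text, answer),
  }));
  const safe = results.filter((r) => r.nonEmpty && !r.leaks).length;
  return { results, safeRate: hints.length ? safe / hints.length : 0 };
}

console.log(hintRevealsAnswer("The Lamp was lit at dusk", "lamp")); // true
console.log(hintRevealsAnswer("Look near the lamppost", "lamp")); // false
```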
+
+ ### User Experience Rating
+ After each round, users can rate:
+ - **Word Selection Quality** (1-5 stars)
+ - **Passage Selection** (1-5 stars)
+ - **Hint Helpfulness** (1-5 stars)
+ - **Overall Experience** (1-5 stars)
+ - **Optional Comments** for detailed feedback
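Aggregating the per-round ratings is straightforward once each rating is validated. In this sketch, the category keys and the `comment` field are assumed names for the four rated aspects and the optional feedback. Ratings that are out of range or not integers are dropped rather than clamped.

```javascript
// Assumed key names for the four rated aspects.
const RATING_CATEGORIES = ["wordSelection", "passageSelection", "hintHelpfulness", "overall"];

// A rating counts only if every category is an integer from 1 to 5.
function isValidRating(rating) {
  return RATING_CATEGORIES.every((c) => Number.isInteger(rating[c]) && rating[c] >= 1 && rating[c] <= 5);
}

// Per-category averages over valid ratings (null when there are none),
// plus how many valid ratings carried a non-empty comment.
function averageRatings(ratings) {
  const valid = ratings.filter(isValidRating);
  const averages = {};
  for (const c of RATING_CATEGORIES) {
    averages[c] = valid.length ? valid.reduce((sum, r) => sum + r[c], 0) / valid.length : null;
  }
  const commentCount = valid.filter((r) => typeof r.comment === "string" && r.comment.trim() !== "").length;
  return { count: valid.length, averages, commentCount };
}
```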
+
+ ## Architecture
+
+ ### Core Components
+ - **ModelTestingFramework**: Main testing orchestrator
+ - **TestAIService**: Performance-tracking AI service wrapper
+ - **TestGameRunner**: Real-time game session monitoring
+ - **TestReportGenerator**: Multi-format report generation
+
+ ### File Structure
+ ```
+ src/
+ ├── modelTestingFramework.js   # Main testing logic
+ ├── testAIService.js           # AI service wrapper
+ ├── testGameRunner.js          # Game monitoring
+ └── testReportGenerator.js     # Report generation
+
+ model-testing.html             # Testing interface UI
+ output/                        # Test results folder
+ ```
+
+ ## Usage Examples
+
+ ### Automated Testing
+ ```javascript
+ import { ModelTestingFramework } from './src/modelTestingFramework.js';
+
+ const framework = new ModelTestingFramework();
+ const results = await framework.runComprehensiveTest();
+ console.log('Results saved to output folder');
+ ```
+
+ ### Custom Model Testing
+ ```javascript
+ // Reuses the `framework` instance from the previous example
+ const customModel = {
+   id: 'my-local-model',
+   name: 'Custom Local Model',
+   provider: 'local'
+ };
+
+ const result = await framework.testModel(customModel);
+ ```
+
+ ### Report Generation
+ ```javascript
+ import { TestReportGenerator } from './src/testReportGenerator.js';
+
+ const generator = new TestReportGenerator();
+ // framework.testResults holds the results collected by earlier test runs
+ const reports = await generator.generateAllReports(framework.testResults);
+ // Generates JSON, CSV, and Markdown reports
+ ```
+
+ ## Integration with Existing Codebase
+
+ The testing framework mirrors or reuses parts of the existing Cloze Reader code:
+
+ - **aiService.js**: Uses the same AI service patterns
+ - **conversationManager.js**: Chat hint testing leverages existing conversation logic
+ - **clozeGameEngine.js**: Game testing monitors actual game interactions
+ - **bookDataService.js**: Uses the same book data and quality filtering
+
+ ## Troubleshooting
+
+ ### Local LLM Issues
+ - Ensure LM Studio is running on port 1234
+ - Check that a model is loaded and ready
+ - Verify CORS is enabled in LM Studio settings
+
+ ### API Key Issues
+ - The OpenRouter API key must be set via an environment variable or a meta tag
+ - Local models don't require API keys
+
+ ### Performance Issues
+ - A comprehensive run across many models can take 10-30 minutes
+ - Consider testing fewer models or specific categories
+ - Monitor network connectivity for OpenRouter models
+
+ ## Contributing
+
+ The testing framework is designed to be extensible:
+
+ 1. Add new model providers in `ModelTestingFramework.constructor()`
+ 2. Extend evaluation metrics in the respective `evaluate*` methods
+ 3. Add new report formats in `TestReportGenerator`
+ 4. Enhance UI components in `model-testing.html`
+
+ ## Results Interpretation
+
+ ### Overall Scores
+ - **90-100**: Excellent performance across all tasks
+ - **80-89**: Very good with minor weaknesses
+ - **70-79**: Good performance with some limitations
+ - **60-69**: Adequate but needs improvement
+ - **Below 60**: Poor performance, not recommended
+
+ ### Success Rate Thresholds
+ - **Word Selection**: >80% for production use
+ - **Contextualization**: >90% for educational content
+ - **Chat Hints**: >85% for effective tutoring
+
+ Use these benchmarks to select the best model for your specific needs and performance requirements.
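The score bands and success-rate thresholds translate directly into code. This sketch makes several assumptions:

- Fractional scores between bands, for example 89.5, fall into the lower band.
- The thresholds are strict (`>`) comparisons, as written.
- Success rates are fractions in [0, 1], matching how `model-testing.html` multiplies `successRate` by 100 for display.
- The task keys are assumed names.

```javascript
// Bands from "Overall Scores"; fractional scores fall into the lower band.
function interpretOverallScore(score) {
  if (score >= 90) return "Excellent";
  if (score >= 80) return "Very good";
  if (score >= 70) return "Good";
  if (score >= 60) return "Adequate";
  return "Poor";
}

// Minimum success rates from "Success Rate Thresholds", as fractions.
const SUCCESS_THRESHOLDS = { wordSelection: 0.8, contextualization: 0.9, chatHints: 0.85 };

// successRates: fractions in [0, 1]; a missing task counts as a failure.
function meetsThresholds(successRates) {
  const failures = Object.entries(SUCCESS_THRESHOLDS)
    .filter(([task, min]) => !(successRates[task] > min))
    .map(([task]) => task);
  return { ok: failures.length === 0, failures };
}

console.log(interpretOverallScore(85)); // Very good
console.log(meetsThresholds({ wordSelection: 0.82, contextualization: 0.9, chatHints: 0.9 }).failures); // [ 'contextualization' ]
```

Because the thresholds are strict, a contextualization rate of exactly 90% does not qualify.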
index.html CHANGED
@@ -62,5 +62,13 @@
  </div>
 
  <script src="./src/app.js" type="module"></script>
+ <script type="module">
+   // Load test runner and ranking interface only in test mode
+   const urlParams = new URLSearchParams(window.location.search);
+   if (urlParams.get('testMode') === 'true') {
+     import('./src/testGameRunner.js');
+     import('./src/userRankingInterface.js');
+   }
+ </script>
  </body>
  </html>
model-testing.html ADDED
@@ -0,0 +1,629 @@
+ <!DOCTYPE html>
+ <html lang="en">
+ <head>
+   <meta charset="UTF-8">
+   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+   <title>Cloze Reader - Model Testing Framework</title>
+   <style>
+     body {
+       font-family: 'Georgia', serif;
+       background: linear-gradient(135deg, #f5f3f0 0%, #e8e4df 100%);
+       margin: 0;
+       padding: 20px;
+       min-height: 100vh;
+     }
+
+     .container {
+       max-width: 1200px;
+       margin: 0 auto;
+       background: rgba(255, 255, 255, 0.95);
+       border-radius: 15px;
+       box-shadow: 0 10px 30px rgba(0, 0, 0, 0.1);
+       padding: 40px;
+     }
+
+     h1 {
+       text-align: center;
+       color: #2c3e50;
+       font-size: 2.5rem;
+       margin-bottom: 10px;
+       text-shadow: 2px 2px 4px rgba(0, 0, 0, 0.1);
+     }
+
+     .subtitle {
+       text-align: center;
+       color: #7f8c8d;
+       font-size: 1.2rem;
+       margin-bottom: 40px;
+     }
+
+     .model-selection {
+       background: #f8f9fa;
+       border-radius: 10px;
+       padding: 30px;
+       margin-bottom: 30px;
+       border: 2px solid #e9ecef;
+     }
+
+     .model-selection h2 {
+       color: #2c3e50;
+       margin-bottom: 20px;
+       font-size: 1.5rem;
+     }
+
+     .model-grid {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
+       gap: 15px;
+       margin-bottom: 20px;
+     }
+
+     .model-option {
+       background: white;
+       border: 2px solid #dee2e6;
+       border-radius: 8px;
+       padding: 15px;
+       cursor: pointer;
+       transition: all 0.3s ease;
+       position: relative;
+     }
+
+     .model-option:hover {
+       border-color: #007bff;
+       box-shadow: 0 4px 8px rgba(0, 123, 255, 0.2);
+     }
+
+     .model-option.selected {
+       border-color: #28a745;
+       background: #f8fff9;
+     }
+
+     .model-option input[type="checkbox"] {
+       position: absolute;
+       top: 10px;
+       right: 10px;
+       transform: scale(1.2);
+     }
+
+     .model-name {
+       font-weight: bold;
+       color: #2c3e50;
+       margin-bottom: 5px;
+     }
+
+     .model-provider {
+       color: #6c757d;
+       font-size: 0.9rem;
+       margin-bottom: 5px;
+     }
+
+     .model-id {
+       color: #495057;
+       font-size: 0.8rem;
+       font-family: monospace;
+       background: #f1f3f4;
+       padding: 2px 6px;
+       border-radius: 4px;
+     }
+
+     .controls {
+       display: flex;
+       gap: 15px;
+       align-items: center;
+       flex-wrap: wrap;
+     }
+
+     .btn {
+       background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+       color: white;
+       border: none;
+       padding: 12px 24px;
+       border-radius: 8px;
+       font-size: 1rem;
+       cursor: pointer;
+       transition: all 0.3s ease;
+       font-weight: 500;
+     }
+
+     .btn:hover {
+       transform: translateY(-2px);
+       box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
+     }
+
+     .btn:disabled {
+       background: #6c757d;
+       cursor: not-allowed;
+       transform: none;
+       box-shadow: none;
+     }
+
+     .btn-secondary {
+       background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%);
+     }
+
+     .btn-success {
+       background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%);
+     }
+
+     .progress-section {
+       margin-top: 30px;
+       padding: 20px;
+       background: #f8f9fa;
+       border-radius: 10px;
+       display: none;
+     }
+
+     .progress-section.active {
+       display: block;
+     }
+
+     .progress-bar {
+       width: 100%;
+       height: 8px;
+       background: #e9ecef;
+       border-radius: 4px;
+       overflow: hidden;
+       margin-bottom: 10px;
+     }
+
+     .progress-fill {
+       height: 100%;
+       background: linear-gradient(90deg, #667eea, #764ba2);
+       width: 0%;
+       transition: width 0.3s ease;
+     }
+
+     .status-message {
+       color: #495057;
+       font-size: 1rem;
+       margin-bottom: 10px;
+     }
+
+     .test-log {
+       background: #2d3748;
+       color: #e2e8f0;
+       padding: 15px;
+       border-radius: 8px;
+       font-family: 'Courier New', monospace;
+       font-size: 0.9rem;
+       max-height: 300px;
+       overflow-y: auto;
+       white-space: pre-wrap;
+     }
+
+     .results-section {
+       margin-top: 30px;
+       padding: 20px;
+       background: #f8f9fa;
+       border-radius: 10px;
+       display: none;
+     }
+
+     .results-section.active {
+       display: block;
+     }
+
+     .results-grid {
+       display: grid;
+       grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
+       gap: 20px;
+       margin-top: 20px;
+     }
+
+     .result-card {
+       background: white;
+       border-radius: 8px;
+       padding: 20px;
+       box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
+     }
+
+     .result-card h3 {
+       color: #2c3e50;
+       margin-bottom: 15px;
+       font-size: 1.2rem;
+     }
+
+     .metric {
+       display: flex;
+       justify-content: space-between;
+       margin-bottom: 10px;
+       padding-bottom: 8px;
+       border-bottom: 1px solid #e9ecef;
+     }
+
+     .metric:last-child {
+       border-bottom: none;
+       margin-bottom: 0;
+     }
+
+     .metric-label {
+       color: #6c757d;
+       font-weight: 500;
+     }
+
+     .metric-value {
+       color: #2c3e50;
+       font-weight: bold;
+     }
+
+     .score-high { color: #28a745; }
+     .score-medium { color: #ffc107; }
+     .score-low { color: #dc3545; }
+
+     .game-section {
+       margin-top: 30px;
+       padding: 20px;
+       background: #f8f9fa;
+       border-radius: 10px;
+       display: none;
+     }
+
+     .game-section.active {
+       display: block;
+     }
+
+     .game-frame {
+       width: 100%;
+       height: 600px;
+       border: none;
+       border-radius: 8px;
+       background: white;
+     }
+
+     @media (max-width: 768px) {
+       .container {
+         padding: 20px;
+       }
+
+       .model-grid {
+         grid-template-columns: 1fr;
+       }
+
+       .controls {
+         flex-direction: column;
+         align-items: stretch;
+       }
+     }
+   </style>
+ </head>
+ <body>
+   <div class="container">
+     <h1>Model Testing Framework</h1>
+     <p class="subtitle">Comprehensive evaluation of AI models for the Cloze Reader application</p>
+
+     <div class="model-selection">
+       <h2>Select Models to Test</h2>
+       <div id="modelGrid" class="model-grid">
+         <!-- Models will be populated by JavaScript -->
+       </div>
+
+       <div class="controls">
+         <button id="selectAllBtn" class="btn btn-secondary">Select All</button>
+         <button id="clearAllBtn" class="btn btn-secondary">Clear All</button>
+         <button id="startTestBtn" class="btn">Start Comprehensive Test</button>
+         <button id="testGameBtn" class="btn btn-success">Test Selected Model in Game</button>
+       </div>
+     </div>
+
+     <div id="progressSection" class="progress-section">
+       <h2>Testing Progress</h2>
+       <div class="progress-bar">
+         <div id="progressFill" class="progress-fill"></div>
+       </div>
+       <div id="statusMessage" class="status-message">Initializing tests...</div>
+       <div id="testLog" class="test-log"></div>
+     </div>
+
+     <div id="resultsSection" class="results-section">
+       <h2>Test Results</h2>
+       <p>Results have been saved to the output folder as CSV files.</p>
+       <div id="resultsGrid" class="results-grid">
+         <!-- Results will be populated by JavaScript -->
+       </div>
+     </div>
+
+     <div id="gameSection" class="game-section">
+       <h2>Interactive Game Testing</h2>
+       <p>Test the selected model by playing the game. Performance will be logged for analysis.</p>
+       <iframe id="gameFrame" class="game-frame" src="about:blank"></iframe>
+     </div>
+   </div>
+
332
+ <script type="module">
333
+ import { ModelTestingFramework } from './src/modelTestingFramework.js';
334
+
335
+ class ModelTestingUI {
336
+ constructor() {
337
+ this.framework = new ModelTestingFramework();
338
+ this.selectedModels = new Set();
339
+ this.isTestingInProgress = false;
340
+ this.localServerStatus = null;
341
+
342
+ this.initializeUI();
343
+ this.setupEventListeners();
344
+ }
345
+
346
+ async initializeUI() {
347
+ await this.checkLocalServer();
348
+ await this.populateModelGrid();
349
+ }
350
+
351
+ async checkLocalServer() {
352
+ this.localServerStatus = await this.framework.testLocalServerConnection();
353
+ if (this.localServerStatus.connected) {
354
+ console.log('Local LM Studio server detected:', this.localServerStatus.models.length, 'models available');
355
+ await this.framework.detectLocalModels();
356
+ } else {
357
+ console.log('Local LM Studio server not available:', this.localServerStatus.error);
358
+ }
359
+ }
360
+
361
+ populateModelGrid() {
362
+ const grid = document.getElementById('modelGrid');
363
+ grid.innerHTML = '';
364
+
365
+ // Add local server status indicator
366
+ if (this.localServerStatus) {
367
+ const statusDiv = document.createElement('div');
368
+ statusDiv.className = 'server-status';
369
+ statusDiv.style.cssText = `
370
+ grid-column: 1 / -1;
371
+ padding: 15px;
372
+ margin-bottom: 15px;
373
+ border-radius: 8px;
374
+ font-weight: bold;
375
+ text-align: center;
376
+ ${this.localServerStatus.connected
377
+ ? 'background: #d4edda; color: #155724; border: 1px solid #c3e6cb;'
378
+ : 'background: #f8d7da; color: #721c24; border: 1px solid #f5c6cb;'
379
+ }
380
+ `;
381
+
382
+ if (this.localServerStatus.connected) {
383
+ statusDiv.innerHTML = `
384
+ ✓ Local LM Studio Server Connected (Port 1234)<br>
385
+ <small>${this.localServerStatus.models.length} model(s) available</small>
386
+ `;
387
+ } else {
388
+ statusDiv.innerHTML = `
389
+ ✗ Local LM Studio Server Not Available<br>
390
+ <small>Start LM Studio on port 1234 to test local models</small>
391
+ `;
392
+ }
393
+
394
+ grid.appendChild(statusDiv);
395
+ }
396
+
397
+ this.framework.models.forEach(model => {
398
+ const modelDiv = document.createElement('div');
399
+ modelDiv.className = 'model-option';
400
+ modelDiv.dataset.modelId = model.id;
401
+
402
+ // Disable local models if server is not connected
403
+ const isDisabled = model.provider === 'local' && !this.localServerStatus?.connected;
404
+ if (isDisabled) {
405
+ modelDiv.classList.add('disabled');
406
+ modelDiv.style.opacity = '0.5';
407
+ modelDiv.style.cursor = 'not-allowed';
408
+ }
409
+
410
+ const providerLabel = model.provider === 'local'
411
+ ? `LOCAL ${this.localServerStatus?.connected ? '(✓)' : '(✗)'}`
412
+ : model.provider.toUpperCase();
413
+
414
+ modelDiv.innerHTML = `
415
+ <input type="checkbox" id="model-${model.id}" ${isDisabled ? 'disabled' : ''} />
416
+ <div class="model-name">${model.name}</div>
417
+ <div class="model-provider">${providerLabel}</div>
418
+ <div class="model-id">${model.id}</div>
419
+ `;
420
+
421
+ const checkbox = modelDiv.querySelector('input');
422
+ checkbox.addEventListener('change', (e) => {
423
+ if (e.target.checked) {
424
+ this.selectedModels.add(model);
425
+ modelDiv.classList.add('selected');
426
+ } else {
427
+ this.selectedModels.delete(model);
428
+ modelDiv.classList.remove('selected');
429
+ }
430
+ this.updateControlsState();
431
+ });
432
+
433
+ if (!isDisabled) {
434
+ modelDiv.addEventListener('click', (e) => {
435
+ if (e.target !== checkbox) {
436
+ checkbox.click();
437
+ }
438
+ });
439
+ }
440
+
441
+ grid.appendChild(modelDiv);
442
+ });
443
+ }
444
+
445
+ setupEventListeners() {
446
+ document.getElementById('selectAllBtn').addEventListener('click', () => {
447
+ this.selectAllModels();
448
+ });
449
+
450
+ document.getElementById('clearAllBtn').addEventListener('click', () => {
451
+ this.clearAllModels();
452
+ });
453
+
454
+ document.getElementById('startTestBtn').addEventListener('click', () => {
455
+ this.startComprehensiveTest();
456
+ });
457
+
458
+ document.getElementById('testGameBtn').addEventListener('click', () => {
459
+ this.startGameTest();
460
+ });
461
+ }
462
+
463
+ selectAllModels() {
464
+ this.framework.models.forEach(model => {
465
+ this.selectedModels.add(model);
466
+ const modelDiv = document.querySelector(`[data-model-id="${model.id}"]`);
467
+ const checkbox = modelDiv.querySelector('input');
468
+ checkbox.checked = true;
469
+ modelDiv.classList.add('selected');
470
+ });
471
+ this.updateControlsState();
472
+ }
473
+
474
+ clearAllModels() {
475
+ this.selectedModels.clear();
476
+ document.querySelectorAll('.model-option').forEach(div => {
477
+ div.classList.remove('selected');
478
+ div.querySelector('input').checked = false;
479
+ });
480
+ this.updateControlsState();
481
+ }
482
+
483
+ updateControlsState() {
484
+ const hasSelection = this.selectedModels.size > 0;
485
+ document.getElementById('startTestBtn').disabled = !hasSelection || this.isTestingInProgress;
486
+ document.getElementById('testGameBtn').disabled = this.selectedModels.size !== 1 || this.isTestingInProgress;
487
+ }
488
+
489
+ async startComprehensiveTest() {
490
+ if (this.selectedModels.size === 0) {
491
+ alert('Please select at least one model to test.');
492
+ return;
493
+ }
494
+
495
+ this.isTestingInProgress = true;
496
+ this.updateControlsState();
497
+
498
+ const progressSection = document.getElementById('progressSection');
499
+ const progressFill = document.getElementById('progressFill');
500
+ const statusMessage = document.getElementById('statusMessage');
501
+ const testLog = document.getElementById('testLog');
502
+
503
+ progressSection.classList.add('active');
504
+ testLog.textContent = '';
505
+
506
+ const modelsArray = Array.from(this.selectedModels);
507
+ let completedTests = 0;
508
+
509
+ try {
510
+ for (let i = 0; i < modelsArray.length; i++) {
511
+ const model = modelsArray[i];
512
+ const progress = (i / modelsArray.length) * 100;
513
+
514
+ progressFill.style.width = `${progress}%`;
515
+ statusMessage.textContent = `Testing ${model.name} (${i + 1}/${modelsArray.length})...`;
516
+
517
+ this.log(`Starting test for ${model.name}...`);
518
+
519
+ try {
520
+ const result = await this.framework.testModel(model);
521
+ this.log(`✓ ${model.name} completed - Score: ${result.overallScore.toFixed(1)}`);
522
+ completedTests++;
523
+ } catch (error) {
524
+ this.log(`✗ ${model.name} failed: ${error.message}`);
525
+ }
526
+
527
+ progressFill.style.width = `${((i + 1) / modelsArray.length) * 100}%`;
528
+ }
529
+
530
+ statusMessage.textContent = `Testing completed! ${completedTests}/${modelsArray.length} models tested successfully.`;
531
+ this.log(`\\nTesting completed! Results saved to output folder.`);
532
+
533
+ // Show results
534
+ this.displayResults();
535
+
536
+ } catch (error) {
537
+ this.log(`\\nTesting failed: ${error.message}`);
538
+ statusMessage.textContent = 'Testing failed. Check the log for details.';
539
+ } finally {
540
+ this.isTestingInProgress = false;
541
+ this.updateControlsState();
542
+ }
543
+ }
544
+
545
+ startGameTest() {
546
+ if (this.selectedModels.size !== 1) {
547
+ alert('Please select exactly one model for game testing.');
548
+ return;
549
+ }
550
+
551
+ const selectedModel = Array.from(this.selectedModels)[0];
552
+ const gameSection = document.getElementById('gameSection');
553
+ const gameFrame = document.getElementById('gameFrame');
554
+
555
+ // Construct URL with model parameter
556
+ let gameUrl = `index.html?testModel=${encodeURIComponent(selectedModel.id)}&testMode=true`;
557
+ if (selectedModel.provider === 'local') {
558
+ gameUrl += '&local=true';
559
+ }
560
+
561
+ gameFrame.src = gameUrl;
562
+ gameSection.classList.add('active');
563
+
564
+ this.log(`Starting game test with ${selectedModel.name}...`);
565
+ }
566
+
567
+ displayResults() {
568
+ const resultsSection = document.getElementById('resultsSection');
569
+ const resultsGrid = document.getElementById('resultsGrid');
570
+
571
+ resultsGrid.innerHTML = '';
572
+
573
+ this.framework.testResults.tests.forEach(result => {
574
+ const card = document.createElement('div');
575
+ card.className = 'result-card';
576
+
577
+ const overallScoreClass = this.getScoreClass(result.overallScore);
578
+
579
+ card.innerHTML = `
580
+ <h3>${result.modelName}</h3>
581
+ <div class="metric">
582
+ <span class="metric-label">Overall Score</span>
583
+ <span class="metric-value ${overallScoreClass}">${result.overallScore?.toFixed(1) || 'N/A'}</span>
584
+ </div>
585
+ <div class="metric">
586
+ <span class="metric-label">Word Selection Success</span>
587
+ <span class="metric-value">${typeof result.wordSelection?.successRate === 'number' ? (result.wordSelection.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
588
+ </div>
589
+ <div class="metric">
590
+ <span class="metric-label">Contextualization Success</span>
591
+ <span class="metric-value">${typeof result.contextualization?.successRate === 'number' ? (result.contextualization.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
592
+ </div>
593
+ <div class="metric">
594
+ <span class="metric-label">Chat Hints Success</span>
595
+ <span class="metric-value">${typeof result.chatHints?.successRate === 'number' ? (result.chatHints.successRate * 100).toFixed(1) + '%' : 'N/A'}</span>
596
+ </div>
597
+ <div class="metric">
598
+ <span class="metric-label">Average Response Time</span>
599
+ <span class="metric-value">${typeof result.wordSelection?.averageTime === 'number' ? result.wordSelection.averageTime.toFixed(0) + 'ms' : 'N/A'}</span>
600
+ </div>
601
+ `;
602
+
603
+ resultsGrid.appendChild(card);
604
+ });
605
+
606
+ resultsSection.classList.add('active');
607
+ }
608
+
609
+ getScoreClass(score) {
610
+ if (score >= 80) return 'score-high';
611
+ if (score >= 60) return 'score-medium';
612
+ return 'score-low';
613
+ }
614
+
615
+ log(message) {
616
+ const testLog = document.getElementById('testLog');
617
+ const timestamp = new Date().toLocaleTimeString();
618
+ testLog.textContent += `[${timestamp}] ${message}\n`;
619
+ testLog.scrollTop = testLog.scrollHeight;
620
+ }
621
+ }
622
+
623
+ // Initialize the testing UI when the page loads
624
+ window.addEventListener('DOMContentLoaded', () => {
625
+ new ModelTestingUI();
626
+ });
627
+ </script>
628
+ </body>
629
+ </html>
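The testing page above formats metrics and builds the game-test URL inline. A minimal sketch of the same logic as standalone functions, assuming success rates are fractions in 0..1 (as the `* 100` in `displayResults` suggests). `formatRate`, `progressPercent`, and `buildGameTestUrl` are hypothetical names, not part of the commit. A dedicated type check matters because in an expression like `(undefined * 100)?.toFixed(1) || 'N/A'` the product is `NaN`, `NaN.toFixed(1)` returns the truthy string `"NaN"`, and the `'N/A'` fallback never runs. `URLSearchParams` is used here as one way to build the query string; for model ids like these it encodes `/` as `%2F`, matching `encodeURIComponent`.

```javascript
// Hypothetical helper: format a 0..1 success rate as a percentage,
// falling back to 'N/A' when the value is missing or not finite.
function formatRate(rate) {
  if (typeof rate !== 'number' || !Number.isFinite(rate)) {
    return 'N/A';
  }
  return `${(rate * 100).toFixed(1)}%`;
}

// Hypothetical helper: progress-bar width (0..100) after `done` of `total` models.
function progressPercent(done, total) {
  if (total <= 0) return 0;
  return Math.min(100, Math.max(0, (done / total) * 100));
}

// Hypothetical helper: game-test URL; local models get an extra `local=true` flag.
function buildGameTestUrl(model) {
  const params = new URLSearchParams({ testModel: model.id, testMode: 'true' });
  if (model.provider === 'local') {
    params.set('local', 'true');
  }
  return `index.html?${params.toString()}`;
}

console.log(formatRate(0.875));     // 87.5%
console.log(formatRate(undefined)); // N/A
console.log(progressPercent(1, 4)); // 25
console.log(buildGameTestUrl({ id: 'openai/gpt-4o', provider: 'openrouter' }));
// index.html?testModel=openai%2Fgpt-4o&testMode=true
```

For `{ id: 'gemma-3-12b', provider: 'local' }` the URL builder yields `index.html?testModel=gemma-3-12b&testMode=true&local=true`. Because `params` is reassigned only through `set`, there is no `const` reassignment problem of the kind `gameUrl += ...` runs into.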
src/aiService.js CHANGED
@@ -4,12 +4,17 @@ class OpenRouterService {
4
  this.isLocalMode = this.checkLocalMode();
5
  this.apiUrl = this.isLocalMode ? 'http://localhost:1234/v1/chat/completions' : 'https://openrouter.ai/api/v1/chat/completions';
6
  this.apiKey = this.getApiKey();
7
- this.model = this.isLocalMode ? 'gemma-3-12b' : 'google/gemma-3-27b-it:free';
8
 
9
  console.log('AI Service initialized:', {
10
  mode: this.isLocalMode ? 'Local LLM' : 'OpenRouter',
11
  url: this.apiUrl,
12
- model: this.model
 
13
  });
14
  }
15
 
@@ -86,15 +91,18 @@ class OpenRouterService {
86
  method: 'POST',
87
  headers,
88
  body: JSON.stringify({
89
- model: this.model,
90
  messages: [{
91
  role: 'user',
92
- content: `You provide clues for word puzzles. You will be told the target word that players need to guess, but you must NEVER mention, spell, or reveal that word in your response. Follow the EXACT format requested. Be concise and direct about the target word without revealing it. Use plain text only - no bold, italics, asterisks, or markdown formatting. Stick to word limits.
93
-
94
- ${prompt}`
95
  }],
96
- max_tokens: 50,
97
- temperature: 0.6
98
  })
99
  });
100
 
@@ -104,19 +112,73 @@ ${prompt}`
104
 
105
  const data = await response.json();
106
 
107
  // Check if data and choices exist before accessing
108
  if (!data || !data.choices || data.choices.length === 0) {
109
  console.error('Invalid API response structure:', data);
110
  return 'Unable to generate hint at this time';
111
  }
112
 
113
- // Check if message content exists
114
- if (!data.choices[0].message || !data.choices[0].message.content) {
115
- console.error('No content in API response');
116
  return 'Unable to generate hint at this time';
117
  }
118
 
119
- let content = data.choices[0].message.content.trim();
120
 
121
  // Clean up AI response artifacts
122
  content = content
@@ -176,35 +238,20 @@ ${prompt}`
176
  'X-Title': 'Cloze Reader'
177
  },
178
  body: JSON.stringify({
179
- model: this.model,
180
  messages: [{
181
  role: 'user',
182
- content: `You are a cluemaster vocabulary selector for educational cloze exercises. Select exactly ${count} words from this passage for a cloze exercise.
183
-
184
- DIFFICULTY LEVEL ${level}:
185
- ${difficultyGuidance}
186
-
187
- CLOZE DELETION PRINCIPLES:
188
- - Select words that require understanding context and vocabulary to identify
189
- - Choose words essential for comprehension that test language ability
190
- - Target words where deletion creates meaningful cognitive gaps
191
-
192
- REQUIREMENTS:
193
- - Choose clear, properly-spelled words (no OCR errors like "andsatires")
194
- - Select meaningful nouns, verbs, or adjectives (${wordLengthConstraint})
195
- - Words must appear EXACTLY as written in the passage
196
- - Avoid: capitalized words, ALL-CAPS words, function words, archaic terms, proper nouns, technical jargon
197
- - Skip any words that look malformed or concatenated
198
- - Avoid dated or potentially offensive terms
199
- - PREFER words from the middle portions of the passage when possible
200
- - If struggling to find ${count} perfect words, prioritize returning SOMETHING over returning nothing
201
-
202
- Return ONLY a JSON array of the selected words.
203
 
204
  Passage: "${passage}"`
205
  }],
206
- max_tokens: 100,
207
- temperature: 0.3
208
  })
209
  });
210
 
@@ -220,13 +267,35 @@ Passage: "${passage}"`
220
  throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
221
  }
222
 
223
  // Check if response has expected structure
224
- if (!data.choices || !data.choices[0] || !data.choices[0].message || !data.choices[0].message.content) {
225
  console.error('Invalid word selection API response structure:', data);
226
- throw new Error('API response missing expected content');
 
227
  }
228
 
229
- let content = data.choices[0].message.content.trim();
230
 
231
  // Clean up local LLM artifacts
232
  if (this.isLocalMode) {
@@ -237,33 +306,55 @@ Passage: "${passage}"`
237
  try {
238
  let words;
239
 
240
- // For local LLM, try different parsing strategies
241
- if (this.isLocalMode) {
242
- // Try JSON parse first
243
- try {
244
  words = JSON.parse(content);
245
- } catch {
246
- // If not JSON, try comma-separated
247
  if (content.includes(',')) {
248
  words = content.split(',').map(w => w.trim());
249
  } else {
250
  // Single word
251
  words = [content.trim()];
252
  }
253
  }
254
- } else {
255
- words = JSON.parse(content);
256
  }
257
 
258
  if (Array.isArray(words)) {
259
- // Filter problematic words and validate word lengths based on level
260
- const problematicWords = ['negro', 'retard', 'retarded', 'nigger', 'chinaman', 'jap', 'gypsy', 'savage', 'primitive', 'heathen'];
261
  const validWords = words.filter(word => {
262
  const cleanWord = word.replace(/[^a-zA-Z]/g, '');
263
- const lowerWord = cleanWord.toLowerCase();
264
-
265
- // Skip problematic words
266
- if (problematicWords.includes(lowerWord)) return false;
267
 
268
  // Check length constraints
269
  if (level <= 2) {
@@ -288,14 +379,9 @@ Passage: "${passage}"`
288
  const matches = content.match(/"([^"]+)"/g);
289
  if (matches) {
290
  const words = matches.map(m => m.replace(/"/g, ''));
291
- // Filter problematic words and validate word lengths
292
- const problematicWords = ['negro', 'retard', 'retarded', 'nigger', 'chinaman', 'jap', 'gypsy', 'savage', 'primitive', 'heathen'];
293
  const validWords = words.filter(word => {
294
  const cleanWord = word.replace(/[^a-zA-Z]/g, '');
295
- const lowerWord = cleanWord.toLowerCase();
296
-
297
- // Skip problematic words
298
- if (problematicWords.includes(lowerWord)) return false;
299
 
300
  // Check length constraints
301
  if (level <= 2) {
@@ -368,45 +454,25 @@ Passage: "${passage}"`
368
  headers,
369
  signal: controller.signal,
370
  body: JSON.stringify({
371
- model: this.model,
372
  messages: [{
373
  role: 'user',
374
- content: `You process passages for cloze reading exercises. For each passage: 1) Select words for blanks, 2) Generate a contextual introduction. Return a JSON object with both passages' data.
375
 
376
- DIFFICULTY LEVEL ${level}:
377
- ${difficultyGuidance}
378
 
379
- Process these two passages for cloze exercises:
 
380
 
381
- PASSAGE 1:
382
- Title: "${book1.title}" by ${book1.author}
383
- Text: "${passage1}"
384
- Select ${blanksPerPassage} words for blanks.
385
-
386
- PASSAGE 2:
387
- Title: "${book2.title}" by ${book2.author}
388
- Text: "${passage2}"
389
- Select ${blanksPerPassage} words for blanks.
390
-
391
- SELECTION RULES:
392
- - Select EXACTLY ${blanksPerPassage} word${blanksPerPassage > 1 ? 's' : ''} per passage, no more, no less
393
- - Choose meaningful nouns, verbs, or adjectives (${wordLengthConstraint})
394
- - Avoid capitalized words, ALL-CAPS words, and table of contents entries
395
- - Avoid dated or potentially offensive terms
396
- - NEVER select words from the first or last sentence/clause of each passage
397
- - Choose words from the middle portions for better context dependency
398
- - Words must appear EXACTLY as written in the passage
399
-
400
- For each passage return:
401
- - "words": array of EXACTLY ${blanksPerPassage} selected word${blanksPerPassage > 1 ? 's' : ''} (exactly as they appear in the text)
402
- - "context": one-sentence intro about the book/author
403
-
404
- CRITICAL: The "words" array must contain exactly ${blanksPerPassage} element${blanksPerPassage > 1 ? 's' : ''} for each passage.
405
-
406
- Return as JSON: {"passage1": {...}, "passage2": {...}}`
407
  }],
408
  max_tokens: 800,
409
- temperature: 0.5
 
410
  })
411
  });
412
 
@@ -425,13 +491,34 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
425
  throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
426
  }
427
 
428
  // Check if response has expected structure
429
- if (!data.choices || !data.choices[0] || !data.choices[0].message || !data.choices[0].message.content) {
430
  console.error('Invalid batch API response structure:', data);
431
- throw new Error('API response missing expected content');
432
  }
433
 
434
- const content = data.choices[0].message.content.trim();
435
 
436
  try {
437
  // Try to extract JSON from the response
@@ -491,15 +578,10 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
491
  parsed.passage1.words = parsed.passage1.words.filter(word => word && word.trim() !== '');
492
  parsed.passage2.words = parsed.passage2.words.filter(word => word && word.trim() !== '');
493
 
494
- // Filter problematic words and validate word lengths based on level
495
  const validateWords = (words, passageText) => {
496
- const problematicWords = ['negro', 'retard', 'retarded', 'nigger', 'chinaman', 'jap', 'gypsy', 'savage', 'primitive', 'heathen'];
497
  return words.filter(word => {
498
  const cleanWord = word.replace(/[^a-zA-Z]/g, '');
499
- const lowerWord = cleanWord.toLowerCase();
500
-
501
- // Skip problematic words
502
- if (problematicWords.includes(lowerWord)) return false;
503
 
504
  // Check if word appears in all caps in the passage (like "VOLUME")
505
  if (passageText.includes(word.toUpperCase()) && word === word.toUpperCase()) {
@@ -609,13 +691,17 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
609
  'X-Title': 'Cloze Reader'
610
  },
611
  body: JSON.stringify({
612
- model: this.model,
613
  messages: [{
614
  role: 'user',
615
- content: `You are a historical and literary expert of public domain entries in Project Gutenberg. Write one factual sentence about "${title}" by ${author}. Focus on what type of work it is, when it was written, or its historical significance. Be accurate and concise.`
616
  }],
617
- max_tokens: 80,
618
- temperature: 0.2
 
619
  })
620
  });
621
 
@@ -633,13 +719,34 @@ Return as JSON: {"passage1": {...}, "passage2": {...}}`
633
  throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
634
  }
635
 
636
  // Check if response has expected structure
637
- if (!data.choices || !data.choices[0] || !data.choices[0].message || !data.choices[0].message.content) {
638
  console.error('Invalid contextualization API response structure:', data);
639
- throw new Error('API response missing expected content');
640
  }
641
 
642
- let content = data.choices[0].message.content.trim();
643
 
644
  // Clean up AI response artifacts
645
  content = content
 
4
  this.isLocalMode = this.checkLocalMode();
5
  this.apiUrl = this.isLocalMode ? 'http://localhost:1234/v1/chat/completions' : 'https://openrouter.ai/api/v1/chat/completions';
6
  this.apiKey = this.getApiKey();
7
+
8
+ // Dual model configuration: Gemma-3-27b for hints/query-answering, Gemma-3-12b for everything else
9
+ this.hintModel = this.isLocalMode ? 'gemma-3-12b' : 'google/gemma-3-27b-it';
10
+ this.primaryModel = this.isLocalMode ? 'gemma-3-12b' : 'google/gemma-3-12b-it';
11
+ this.model = this.primaryModel; // Default model for backward compatibility
12
 
13
  console.log('AI Service initialized:', {
14
  mode: this.isLocalMode ? 'Local LLM' : 'OpenRouter',
15
  url: this.apiUrl,
16
+ primaryModel: this.primaryModel,
17
+ hintModel: this.hintModel
18
  });
19
  }
20
 
 
91
  method: 'POST',
92
  headers,
93
  body: JSON.stringify({
94
+ model: this.hintModel, // Use Gemma-3-27b for hints
95
  messages: [{
96
+ role: 'system',
97
+ content: 'You are a helpful assistant that provides hints for word puzzles. Never reveal the answer word directly.'
98
+ }, {
99
  role: 'user',
100
+ content: prompt
101
  }],
102
+ max_tokens: 150,
103
+ temperature: 0.7,
104
+ // Try to disable reasoning mode for hints
105
+ response_format: { type: "text" }
106
  })
107
  });
108
 
 
112
 
113
  const data = await response.json();
114
 
115
+ console.log('Hint API response:', JSON.stringify(data, null, 2));
116
+
117
  // Check if data and choices exist before accessing
118
  if (!data || !data.choices || data.choices.length === 0) {
119
  console.error('Invalid API response structure:', data);
120
  return 'Unable to generate hint at this time';
121
  }
122
 
123
+ // Check if message exists
124
+ if (!data.choices[0].message) {
125
+ console.error('No message in API response');
126
  return 'Unable to generate hint at this time';
127
  }
128
 
129
+ // OSS-20B model returns content in 'reasoning' field when using reasoning mode
130
+ let content = data.choices[0].message.content || '';
131
+
132
+ // If content is empty, check for reasoning field
133
+ if (!content && data.choices[0].message.reasoning) {
134
+ content = data.choices[0].message.reasoning;
135
+ }
136
+
137
+ // Still no content? Check reasoning_details
138
+ if (!content && data.choices[0].message.reasoning_details?.length > 0) {
139
+ content = data.choices[0].message.reasoning_details[0].text;
140
+ }
141
+
142
+ if (!content) {
143
+ console.error('No content found in hint response');
144
+ // Provide a generic hint based on the prompt type
145
+ if (prompt.toLowerCase().includes('synonym')) {
146
+ return 'Think of a word that means something similar';
147
+ } else if (prompt.toLowerCase().includes('definition')) {
148
+ return 'Consider what this word means in context';
149
+ } else if (prompt.toLowerCase().includes('category')) {
150
+ return 'Think about what type or category this word belongs to';
151
+ } else {
152
+ return 'Consider the context around the blank';
153
+ }
154
+ }
155
+
156
+ content = content.trim();
157
+
158
+ // For OSS-20B, extract hint from reasoning text if needed
159
+ if (content.includes('The user') || content.includes('We need to')) {
160
+ // This looks like reasoning text, try to extract the actual hint
161
+ // Look for text about synonyms, definitions, or clues
162
+ const hintPatterns = [
163
+ /synonym[s]?.*?(?:is|are|include[s]?|would be)\s+([^.]+)/i,
164
+ /means?\s+([^.]+)/i,
165
+ /refers? to\s+([^.]+)/i,
166
+ /describes?\s+([^.]+)/i,
167
+ ];
168
+
169
+ for (const pattern of hintPatterns) {
170
+ const match = content.match(pattern);
171
+ if (match) {
172
+ content = match[1];
173
+ break;
174
+ }
175
+ }
176
+
177
+ // If still has reasoning markers, just return a fallback
178
+ if (content.includes('The user') || content.includes('We need to')) {
179
+ return 'Think about words that mean something similar';
180
+ }
181
+ }
182
 
183
  // Clean up AI response artifacts
184
  content = content
 
238
  'X-Title': 'Cloze Reader'
239
  },
240
  body: JSON.stringify({
241
+ model: this.primaryModel, // Use Gemma-3-12b for word selection
242
  messages: [{
243
+ role: 'system',
244
+ content: 'Select words for a cloze exercise. Return ONLY a JSON array of words, nothing else.'
245
+ }, {
246
  role: 'user',
247
+ content: `Select ${count} ${level <= 2 ? 'easy' : level <= 4 ? 'medium' : 'challenging'} words (${wordLengthConstraint}) from this passage. Choose meaningful nouns, verbs, or adjectives. Avoid capitalized words and proper nouns.
248
 
249
  Passage: "${passage}"`
250
  }],
251
+ max_tokens: 200,
252
+ temperature: 0.5,
253
+ // Try to disable reasoning mode for word selection
254
+ response_format: { type: "text" }
255
  })
256
  });
257
 
 
267
  throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
268
  }
269
 
270
+ // Log the full response to debug structure
271
+ console.log('Full API response:', JSON.stringify(data, null, 2));
272
+
273
  // Check if response has expected structure
274
+ if (!data.choices || !data.choices[0] || !data.choices[0].message) {
275
  console.error('Invalid word selection API response structure:', data);
276
+ console.error('Choices[0]:', data.choices?.[0]);
277
+ throw new Error('API response missing expected structure');
278
  }
279
 
280
+ // OSS-20B model returns content in 'reasoning' field when using reasoning mode
281
+ let content = data.choices[0].message.content || '';
282
+
283
+ // If content is empty, check for reasoning field
284
+ if (!content && data.choices[0].message.reasoning) {
285
+ content = data.choices[0].message.reasoning;
286
+ }
287
+
288
+ // Still no content? Check reasoning_details
289
+ if (!content && data.choices[0].message.reasoning_details?.length > 0) {
290
+ content = data.choices[0].message.reasoning_details[0].text;
291
+ }
292
+
293
+ if (!content) {
294
+ console.error('No content found in API response');
295
+ throw new Error('API response missing content');
296
+ }
297
+
298
+ content = content.trim();
299
 
300
  // Clean up local LLM artifacts
301
  if (this.isLocalMode) {
 
306
  try {
307
  let words;
308
 
309
+ // Try to parse JSON first
310
+ try {
311
+ // Check if content contains JSON array anywhere in it
312
+ const jsonMatch = content.match(/\[[\s\S]*?\]/);
313
+ if (jsonMatch) {
314
+ words = JSON.parse(jsonMatch[0]);
315
+ } else {
316
  words = JSON.parse(content);
317
+ }
318
+ } catch {
319
+ // If not JSON, check if this is reasoning text from OSS-20B
320
+ if (content.includes('pick') || content.includes('Let\'s')) {
321
+ // Extract words from reasoning text
322
+ // Look for quoted words or words after "pick"
323
+ const quotedWords = content.match(/"([^"]+)"/g);
324
+ if (quotedWords) {
325
+ words = quotedWords.map(w => w.replace(/"/g, ''));
326
+ } else {
327
+ // Look for pattern like "Let's pick 'word'" or "pick word"
328
+ const pickMatch = content.match(/pick\s+['"]?(\w+)['"]?/i);
329
+ if (pickMatch) {
330
+ words = [pickMatch[1]];
331
+ } else {
332
+ // For local LLM, try comma-separated
333
+ if (this.isLocalMode && content.includes(',')) {
334
+ words = content.split(',').map(w => w.trim());
335
+ } else {
336
+ // Single word
337
+ words = [content.trim()];
338
+ }
339
+ }
340
+ }
341
+ } else if (this.isLocalMode) {
342
+ // For local LLM, try comma-separated
343
  if (content.includes(',')) {
344
  words = content.split(',').map(w => w.trim());
345
  } else {
346
  // Single word
347
  words = [content.trim()];
348
  }
349
+ } else {
350
+ throw new Error('Could not parse words from response');
351
  }
352
  }
353
 
354
  if (Array.isArray(words)) {
355
+ // Validate word lengths based on level
 
356
  const validWords = words.filter(word => {
357
  const cleanWord = word.replace(/[^a-zA-Z]/g, '');
358
 
359
  // Check length constraints
360
  if (level <= 2) {
 
379
  const matches = content.match(/"([^"]+)"/g);
380
  if (matches) {
381
  const words = matches.map(m => m.replace(/"/g, ''));
382
+ // Validate word lengths
 
383
  const validWords = words.filter(word => {
384
  const cleanWord = word.replace(/[^a-zA-Z]/g, '');
385
 
386
  // Check length constraints
387
  if (level <= 2) {
 
454
  headers,
455
  signal: controller.signal,
456
  body: JSON.stringify({
457
+ model: this.primaryModel, // Use Gemma-3-12b for batch processing
458
  messages: [{
459
+ role: 'system',
460
+ content: 'Process passages for cloze exercises. Return ONLY a JSON object.'
461
+ }, {
462
  role: 'user',
463
+ content: `Select ${blanksPerPassage} ${level <= 2 ? 'easy' : level <= 4 ? 'medium' : 'challenging'} words (${wordLengthConstraint}) from each passage.
464
 
465
+ Passage 1 ("${book1.title}" by ${book1.author}):
466
+ ${passage1}
467
 
468
+ Passage 2 ("${book2.title}" by ${book2.author}):
469
+ ${passage2}
470
 
471
+ Return JSON: {"passage1": {"words": [${blanksPerPassage} words], "context": "one sentence about book"}, "passage2": {"words": [${blanksPerPassage} words], "context": "one sentence about book"}}`
472
  }],
473
  max_tokens: 800,
474
+ temperature: 0.5,
475
+ response_format: { type: "text" }
476
  })
477
  });
478
 
 
491
  throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
492
  }
493
 
494
+ console.log('Batch API response:', JSON.stringify(data, null, 2));
495
+
496
  // Check if response has expected structure
497
+ if (!data.choices || !data.choices[0] || !data.choices[0].message) {
498
  console.error('Invalid batch API response structure:', data);
499
+ console.error('Choices[0]:', data.choices?.[0]);
500
+ throw new Error('API response missing expected structure');
501
+ }
502
+
503
+ // OSS-20B model returns content in 'reasoning' field when using reasoning mode
504
+ let content = data.choices[0].message.content || '';
505
+
506
+ // If content is empty, check for reasoning field
507
+ if (!content && data.choices[0].message.reasoning) {
508
+ content = data.choices[0].message.reasoning;
509
+ }
510
+
511
+ // Still no content? Check reasoning_details
512
+ if (!content && data.choices[0].message.reasoning_details?.length > 0) {
513
+ content = data.choices[0].message.reasoning_details[0].text;
514
  }
515
 
516
+ if (!content) {
517
+ console.error('No content found in batch API response');
518
+ throw new Error('API response missing content');
519
+ }
520
+
521
+ content = content.trim();
522
 
523
  try {
524
  // Try to extract JSON from the response
 
578
  parsed.passage1.words = parsed.passage1.words.filter(word => word && word.trim() !== '');
579
  parsed.passage2.words = parsed.passage2.words.filter(word => word && word.trim() !== '');
580
 
581
+ // Validate word lengths based on level
582
  const validateWords = (words, passageText) => {
 
583
  return words.filter(word => {
584
  const cleanWord = word.replace(/[^a-zA-Z]/g, '');
585
 
586
  // Check if word appears in all caps in the passage (like "VOLUME")
587
  if (passageText.includes(word.toUpperCase()) && word === word.toUpperCase()) {
 
691
  'X-Title': 'Cloze Reader'
692
  },
693
  body: JSON.stringify({
694
+ model: this.primaryModel, // Use Gemma-3-12b for contextualization
695
  messages: [{
696
+ role: 'system',
697
+ content: 'Write one factual sentence about the given literary work.'
698
+ }, {
699
  role: 'user',
700
+ content: `"${title}" by ${author}`
701
  }],
702
+ max_tokens: 150,
703
+ temperature: 0.5,
704
+ response_format: { type: "text" }
705
  })
706
  });
707
 
 
719
  throw new Error(`OpenRouter API error: ${data.error.message || JSON.stringify(data.error)}`);
720
  }
721
 
722
+ console.log('Context API response:', JSON.stringify(data, null, 2));
723
+
724
  // Check if response has expected structure
725
+ if (!data.choices || !data.choices[0] || !data.choices[0].message) {
726
  console.error('Invalid contextualization API response structure:', data);
727
+ console.error('Choices[0]:', data.choices?.[0]);
728
+ throw new Error('API response missing expected structure');
729
+ }
730
+
731
+ // OSS-20B model returns content in 'reasoning' field when using reasoning mode
732
+ let content = data.choices[0].message.content || '';
733
+
734
+ // If content is empty, check for reasoning field
735
+ if (!content && data.choices[0].message.reasoning) {
736
+ content = data.choices[0].message.reasoning;
737
+ }
738
+
739
+ // Still no content? Check reasoning_details
740
+ if (!content && data.choices[0].message.reasoning_details?.length > 0) {
741
+ content = data.choices[0].message.reasoning_details[0].text;
742
+ }
743
+
744
+ if (!content) {
745
+ console.error('No content found in context API response');
746
+ throw new Error('API response missing content');
747
  }
748
 
749
+ content = content.trim();
750
 
751
  // Clean up AI response artifacts
752
  content = content
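The `aiService.js` changes route hints to the 27B model and all other calls to the 12B model, and repeat the same "content, then `reasoning`, then `reasoning_details`" fallback in four places before parsing words. A minimal sketch of that logic factored into pure functions, assuming OpenAI-style `choices[0].message` objects as the diff does. `pickModel`, `extractMessageText`, and `parseWordList` are hypothetical names; the parsing order (first JSON array, then quoted words, then a comma-separated list) mirrors the diff's strategies but is simplified, and it drops the diff's `pick`/`Let's` reasoning-text heuristics.

```javascript
// Model routing as configured in the diff: 27B for hints, 12B otherwise;
// local mode uses a single model for everything.
function pickModel(task, isLocalMode) {
  if (isLocalMode) return 'gemma-3-12b';
  return task === 'hint' ? 'google/gemma-3-27b-it' : 'google/gemma-3-12b-it';
}

// Pull text from a chat-completion message, trying `content` first and then
// the `reasoning` / `reasoning_details` fields the diff also checks.
function extractMessageText(message) {
  if (!message) return '';
  if (message.content) return message.content.trim();
  if (message.reasoning) return message.reasoning.trim();
  const details = message.reasoning_details;
  if (Array.isArray(details) && details.length > 0 && details[0].text) {
    return details[0].text.trim();
  }
  return '';
}

// Turn a model reply into a list of words.
function parseWordList(content) {
  const arrayMatch = content.match(/\[[\s\S]*?\]/);
  if (arrayMatch) {
    try {
      const parsed = JSON.parse(arrayMatch[0]);
      if (Array.isArray(parsed)) {
        return parsed.filter((w) => typeof w === 'string' && w.trim() !== '');
      }
    } catch {
      // Not valid JSON; fall through to the looser strategies below.
    }
  }
  const quoted = content.match(/"([^"]+)"/g);
  if (quoted) return quoted.map((q) => q.replace(/"/g, ''));
  return content
    .split(',')
    .map((w) => w.trim())
    .filter((w) => w !== '');
}

console.log(extractMessageText({ content: '', reasoning: ' hint ' })); // hint
console.log(parseWordList('Sure: ["harbor", "lantern"]')); // [ 'harbor', 'lantern' ]
console.log(parseWordList('harbor, lantern'));             // [ 'harbor', 'lantern' ]
```

Keeping extraction separate from parsing means each of the four call sites only has to decide what to do with an empty string: return a generic hint, or throw.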
src/app.js CHANGED
@@ -72,7 +72,8 @@ class App {
72
 
73
  // Show level information
74
  const blanksCount = roundData.blanks.length;
75
- const levelInfo = `Level ${this.game.currentLevel} ${blanksCount} blank${blanksCount > 1 ? 's' : ''}`;
 
76
 
77
  this.elements.roundInfo.innerHTML = levelInfo;
78
 
@@ -155,22 +156,19 @@ class App {
155
  }
156
 
157
  displayResults(results) {
158
- let message = `Score: ${results.correct}/${results.total} (${results.percentage}%)`;
159
-
160
- // Show "Required" information at all levels for consistency
161
- message += ` - Required: ${results.requiredCorrect}/${results.total}`;
162
 
163
  if (results.passed) {
164
  // Check if this completes the requirements for level advancement
165
  const roundsCompleted = this.game.roundsPassedAtCurrentLevel + 1; // +1 for this round
166
  if (roundsCompleted >= 2) {
167
- message += ` - Excellent! Advancing to Level ${this.game.currentLevel + 1}! 🎉`;
168
  } else {
169
- message += ` - Great job! ${roundsCompleted}/2 rounds completed for Level ${this.game.currentLevel}`;
170
  }
171
  this.elements.result.className = 'mt-4 text-center font-semibold text-green-600';
172
  } else {
173
- message += ` - Need ${results.requiredCorrect} correct to advance. Keep practicing! 💪`;
174
  this.elements.result.className = 'mt-4 text-center font-semibold text-red-600';
175
  }
176
 
 
72
 
73
  // Show level information
74
  const blanksCount = roundData.blanks.length;
75
+ const passageNumber = this.game.currentPassageIndex + 1;
76
+ const levelInfo = `Level ${this.game.currentLevel} • Passage ${passageNumber}/2 • ${blanksCount} blank${blanksCount > 1 ? 's' : ''}`;
77
 
78
  this.elements.roundInfo.innerHTML = levelInfo;
79
 
 
156
  }
157
 
158
  displayResults(results) {
159
+ let message = `Score: ${results.correct}/${results.total}`;
160
 
161
  if (results.passed) {
162
  // Check if this completes the requirements for level advancement
163
  const roundsCompleted = this.game.roundsPassedAtCurrentLevel + 1; // +1 for this round
164
  if (roundsCompleted >= 2) {
165
+ message += ` Level ${this.game.currentLevel + 1} unlocked!`;
166
  } else {
167
+ message += ` Passed (1 more round needed for next level)`;
168
  }
169
  this.elements.result.className = 'mt-4 text-center font-semibold text-green-600';
170
  } else {
171
+ message += ` - Try again (need ${results.requiredCorrect}/${results.total})`;
172
  this.elements.result.className = 'mt-4 text-center font-semibold text-red-600';
173
  }
174
 
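The new `app.js` text adds passage tracking to the header and shortens the result line. A minimal sketch of both strings as pure functions so they can be checked without the DOM. `buildLevelInfo` and `buildResultMessage` are hypothetical names; the `results` fields follow the diff, and `roundsPassedBefore` stands in for `this.game.roundsPassedAtCurrentLevel`, assumed (per the diff's "+1 for this round" comment) to be read before the current round is counted.

```javascript
// Header text, e.g. "Level 2 • Passage 1/2 • 3 blanks".
function buildLevelInfo(currentLevel, currentPassageIndex, blanksCount) {
  const passageNumber = currentPassageIndex + 1;
  return `Level ${currentLevel} • Passage ${passageNumber}/2 • ${blanksCount} blank${blanksCount > 1 ? 's' : ''}`;
}

// Result line shown after a round.
function buildResultMessage(results, roundsPassedBefore, currentLevel) {
  let message = `Score: ${results.correct}/${results.total}`;
  if (results.passed) {
    const roundsCompleted = roundsPassedBefore + 1; // +1 for this round
    if (roundsCompleted >= 2) {
      message += ` Level ${currentLevel + 1} unlocked!`;
    } else {
      message += ` Passed (1 more round needed for next level)`;
    }
  } else {
    message += ` - Try again (need ${results.requiredCorrect}/${results.total})`;
  }
  return message;
}

console.log(buildLevelInfo(2, 0, 3)); // Level 2 • Passage 1/2 • 3 blanks
console.log(buildResultMessage({ correct: 3, total: 4, passed: true, requiredCorrect: 3 }, 1, 2));
// Score: 3/4 Level 3 unlocked!
```

Written this way, one inconsistency in the new copy becomes visible: the failure branch uses a ` - ` separator after the score, while both success branches do not.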
src/clozeGameEngine.js CHANGED
@@ -885,17 +885,17 @@ class ClozeGame {
885
  // Track successful rounds and advance level after 2 successful rounds
886
  if (roundPassed) {
887
  this.roundsPassedAtCurrentLevel++;
888
- console.log(`Round passed! Total rounds passed at level ${this.currentLevel}: ${this.roundsPassedAtCurrentLevel}`);
889
 
890
  // Advance level after 2 successful rounds
891
  if (this.roundsPassedAtCurrentLevel >= 2) {
892
  this.currentLevel++;
893
  this.roundsPassedAtCurrentLevel = 0; // Reset counter for new level
894
- console.log(`Advancing to level ${this.currentLevel} after 2 successful rounds`);
895
  }
896
  } else {
897
  // Failed round - do not reset the counter, user must accumulate 2 passes
898
- console.log(`Round failed. Still need ${2 - this.roundsPassedAtCurrentLevel} more passed round(s) to advance from level ${this.currentLevel}`);
899
  }
900
 
901
  // Clear chat conversations for new round
 
885
  // Track successful rounds and advance level after 2 successful rounds
886
  if (roundPassed) {
887
  this.roundsPassedAtCurrentLevel++;
888
+ console.log(`Round passed at level ${this.currentLevel}`);
889
 
890
  // Advance level after 2 successful rounds
891
  if (this.roundsPassedAtCurrentLevel >= 2) {
892
  this.currentLevel++;
893
  this.roundsPassedAtCurrentLevel = 0; // Reset counter for new level
894
+ console.log(`Advanced to level ${this.currentLevel}`);
895
  }
896
  } else {
897
  // Failed round - do not reset the counter, user must accumulate 2 passes
898
+ console.log(`Round not passed. Need ${2 - this.roundsPassedAtCurrentLevel} more round(s) to advance`);
899
  }
900
 
901
  // Clear chat conversations for new round
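The engine's progression rule is that a level advances after two passed rounds, and a failed round leaves the pass counter unchanged. A minimal sketch of that rule as a pure state transition, assuming a state object shaped like the engine's `currentLevel` / `roundsPassedAtCurrentLevel` fields. `applyRoundResult` is a hypothetical name used only for illustration.

```javascript
// One round's effect on level progress.
function applyRoundResult(state, roundPassed) {
  let { currentLevel, roundsPassedAtCurrentLevel } = state;
  if (roundPassed) {
    roundsPassedAtCurrentLevel++;
    if (roundsPassedAtCurrentLevel >= 2) {
      currentLevel++;
      roundsPassedAtCurrentLevel = 0; // reset the counter for the new level
    }
  }
  // A failed round does not reset progress; the player still needs
  // (2 - roundsPassedAtCurrentLevel) more passed rounds to advance.
  return { currentLevel, roundsPassedAtCurrentLevel };
}

let state = { currentLevel: 1, roundsPassedAtCurrentLevel: 0 };
for (const passed of [true, false, true]) {
  state = applyRoundResult(state, passed);
}
console.log(state); // { currentLevel: 2, roundsPassedAtCurrentLevel: 0 }
```

Because passes accumulate across failures, the sequence pass, fail, pass still reaches the next level.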
src/modelTestingFramework.js ADDED
@@ -0,0 +1,703 @@
+ /**
+  * Comprehensive Model Testing Framework for Cloze Reader
+  * Tests all AI-powered features across different models
+  */
+
+ class ModelTestingFramework {
+   constructor() {
+     this.models = [
+       // OpenRouter Models
+       { id: 'openai/gpt-4o', name: 'GPT-4o', provider: 'openrouter' },
+       { id: 'openai/gpt-4o-mini', name: 'GPT-4o Mini', provider: 'openrouter' },
+       { id: 'anthropic/claude-3.5-sonnet', name: 'Claude 3.5 Sonnet', provider: 'openrouter' },
+       { id: 'anthropic/claude-3-haiku', name: 'Claude 3 Haiku', provider: 'openrouter' },
+       { id: 'google/gemini-pro-1.5', name: 'Gemini Pro 1.5', provider: 'openrouter' },
+       { id: 'meta-llama/llama-3.1-8b-instruct', name: 'Llama 3.1 8B', provider: 'openrouter' },
+       { id: 'meta-llama/llama-3.1-70b-instruct', name: 'Llama 3.1 70B', provider: 'openrouter' },
+       { id: 'mistralai/mistral-7b-instruct', name: 'Mistral 7B', provider: 'openrouter' },
+       { id: 'microsoft/phi-3-medium-4k-instruct', name: 'Phi-3 Medium', provider: 'openrouter' },
+       { id: 'qwen/qwen-2-7b-instruct', name: 'Qwen 2 7B', provider: 'openrouter' },
+
+       // Local LLM Models (LM Studio compatible)
+       { id: 'local-llm', name: 'Local LLM (Auto-detect)', provider: 'local' },
+       { id: 'gemma-3-12b', name: 'Gemma 3 12B (Local)', provider: 'local' },
+       { id: 'llama-3.1-8b', name: 'Llama 3.1 8B (Local)', provider: 'local' },
+       { id: 'mistral-7b', name: 'Mistral 7B (Local)', provider: 'local' },
+       { id: 'qwen-2-7b', name: 'Qwen 2 7B (Local)', provider: 'local' },
+       { id: 'phi-3-medium', name: 'Phi-3 Medium (Local)', provider: 'local' },
+       { id: 'custom-local', name: 'Custom Local Model', provider: 'local' }
+     ];
+
+     this.testResults = {
+       timestamp: new Date().toISOString(),
+       tests: []
+     };
+
+     this.testPassages = [
+       {
+         text: "The old man sat by the fireplace, reading his favorite book. The flames danced in the hearth, casting shadows on the walls. He turned each page carefully, savoring every word of the ancient tale.",
+         difficulty: 3,
+         expectedWords: ['favorite', 'flames', 'shadows', 'carefully', 'ancient']
+       },
+       {
+         text: "In the garden, colorful flowers bloomed under the warm sunshine. Bees buzzed from blossom to blossom, collecting nectar for their hive. The gardener watched with satisfaction as his hard work flourished.",
+         difficulty: 2,
+         expectedWords: ['colorful', 'warm', 'buzzed', 'collecting', 'satisfaction']
+       },
+       {
+         text: "The protagonist's journey through the labyrinthine corridors revealed the edifice's architectural complexity. Each ornate chamber contained mysterious artifacts that suggested an ancient civilization's sophisticated understanding of mathematics and astronomy.",
+         difficulty: 8,
+         expectedWords: ['labyrinthine', 'edifice', 'architectural', 'ornate', 'artifacts', 'civilization', 'sophisticated']
+       }
+     ];
+
+     this.chatQuestions = [
+       { type: 'part_of_speech', prompt: 'What part of speech is this word?' },
+       { type: 'sentence_role', prompt: 'What role does this word play in the sentence?' },
+       { type: 'word_category', prompt: 'What category or type of word is this?' },
+       { type: 'synonym', prompt: 'Can you suggest a synonym for this word?' }
+     ];
+   }
+
+   async runComprehensiveTest(selectedModels = null) {
+     const modelsToTest = selectedModels || this.models;
+     console.log(`Starting comprehensive test of ${modelsToTest.length} models...`);
+
+     for (const model of modelsToTest) {
+       console.log(`\nTesting model: ${model.name}`);
+       const modelResults = await this.testModel(model);
+       this.testResults.tests.push(modelResults);
+
+       // Save intermediate results
+       await this.saveResults();
+     }
+
+     console.log('\nAll tests completed!');
+     return this.testResults;
+   }
+
+   async testModel(model) {
+     const startTime = Date.now();
+     const results = {
+       modelId: model.id,
+       modelName: model.name,
+       provider: model.provider,
+       timestamp: new Date().toISOString(),
+       totalTime: 0,
+       wordSelection: {},
+       contextualization: {},
+       chatHints: {},
+       errorRates: {},
+       overallScore: 0
+     };
+
+     try {
+       // Test word selection across different difficulty levels
+       results.wordSelection = await this.testWordSelection(model);
+
+       // Test contextualization
+       results.contextualization = await this.testContextualization(model);
+
+       // Test chat hint generation
+       results.chatHints = await this.testChatHints(model);
+
+       // Calculate overall metrics
+       results.totalTime = Date.now() - startTime;
+       results.overallScore = this.calculateOverallScore(results);
+
+     } catch (error) {
+       console.error(`Error testing model ${model.name}:`, error);
+       results.error = error.message;
+       results.overallScore = 0;
+     }
+
+     return results;
+   }
+
+   async testWordSelection(model) {
+     const results = {
+       tests: [],
+       averageTime: 0,
+       successRate: 0,
+       qualityScore: 0,
+       difficultyAccuracy: 0
+     };
+
+     let totalTime = 0;
+     let successCount = 0;
+     let qualitySum = 0;
+     let difficultySum = 0;
+
+     for (const passage of this.testPassages) {
+       const testStart = Date.now();
+
+       try {
+         const words = await this.performWordSelection(model, passage);
+         const testTime = Date.now() - testStart;
+         totalTime += testTime;
+
+         const test = {
+           passageLength: passage.text.length,
+           targetDifficulty: passage.difficulty,
+           responseTime: testTime,
+           selectedWords: words,
+           wordCount: words.length,
+           success: words.length > 0,
+           qualityScore: this.evaluateWordQuality(words, passage),
+           difficultyScore: this.evaluateDifficultyMatch(words, passage.difficulty)
+         };
+
+         results.tests.push(test);
+
+         if (test.success) {
+           successCount++;
+           qualitySum += test.qualityScore;
+           difficultySum += test.difficultyScore;
+         }
+
+       } catch (error) {
+         results.tests.push({
+           passageLength: passage.text.length,
+           targetDifficulty: passage.difficulty,
+           responseTime: Date.now() - testStart,
+           error: error.message,
+           success: false
+         });
+       }
+
+       // Brief pause between tests
+       await new Promise(resolve => setTimeout(resolve, 1000));
+     }
+
+     results.averageTime = totalTime / this.testPassages.length;
+     results.successRate = successCount / this.testPassages.length;
+     results.qualityScore = successCount > 0 ? qualitySum / successCount : 0;
+     results.difficultyAccuracy = successCount > 0 ? difficultySum / successCount : 0;
+
+     return results;
+   }
+
+   async testContextualization(model) {
+     const results = {
+       tests: [],
+       averageTime: 0,
+       successRate: 0,
+       relevanceScore: 0
+     };
+
+     const testBooks = [
+       { title: 'Pride and Prejudice', author: 'Jane Austen' },
+       { title: 'The Adventures of Tom Sawyer', author: 'Mark Twain' },
+       { title: 'Moby Dick', author: 'Herman Melville' }
+     ];
+
+     let totalTime = 0;
+     let successCount = 0;
+     let relevanceSum = 0;
+
+     for (const book of testBooks) {
+       const testStart = Date.now();
+
+       try {
+         const context = await this.performContextualization(model, book);
+         const testTime = Date.now() - testStart;
+         totalTime += testTime;
+
+         const test = {
+           bookTitle: book.title,
+           author: book.author,
+           responseTime: testTime,
+           contextLength: context.length,
+           success: context.length > 0,
+           relevanceScore: this.evaluateContextRelevance(context, book)
+         };
+
+         results.tests.push(test);
+
+         if (test.success) {
+           successCount++;
+           relevanceSum += test.relevanceScore;
+         }
+
+       } catch (error) {
+         results.tests.push({
+           bookTitle: book.title,
+           author: book.author,
+           responseTime: Date.now() - testStart,
+           error: error.message,
+           success: false
+         });
+       }
+
+       await new Promise(resolve => setTimeout(resolve, 1000));
+     }
+
+     results.averageTime = totalTime / testBooks.length;
+     results.successRate = successCount / testBooks.length;
+     results.relevanceScore = successCount > 0 ? relevanceSum / successCount : 0;
+
+     return results;
+   }
+
+   async testChatHints(model) {
+     const results = {
+       tests: [],
+       averageTime: 0,
+       successRate: 0,
+       helpfulnessScore: 0,
+       questionTypePerformance: {}
+     };
+
+     const testWords = [
+       { word: 'magnificent', sentence: 'The cathedral was truly magnificent.', difficulty: 5 },
+       { word: 'whispered', sentence: 'She whispered the secret to her friend.', difficulty: 3 },
+       { word: 'extraordinary', sentence: 'His performance was extraordinary.', difficulty: 7 }
+     ];
+
+     let totalTime = 0;
+     let successCount = 0;
+     let helpfulnessSum = 0;
+
+     // Initialize question type tracking
+     this.chatQuestions.forEach(q => {
+       results.questionTypePerformance[q.type] = {
+         tests: 0,
+         successes: 0,
+         averageScore: 0
+       };
+     });
+
+     for (const testWord of testWords) {
+       for (const question of this.chatQuestions) {
+         const testStart = Date.now();
+
+         try {
+           const hint = await this.performChatHint(model, testWord, question);
+           const testTime = Date.now() - testStart;
+           totalTime += testTime;
+
+           const helpfulnessScore = this.evaluateHintHelpfulness(hint, testWord, question);
+
+           const test = {
+             word: testWord.word,
+             questionType: question.type,
+             difficulty: testWord.difficulty,
+             responseTime: testTime,
+             hintLength: hint.length,
+             success: hint.length > 10, // Minimum meaningful response
+             helpfulnessScore: helpfulnessScore
+           };
+
+           results.tests.push(test);
+
+           // Update question type performance
+           const qtPerf = results.questionTypePerformance[question.type];
+           qtPerf.tests++;
+
+           if (test.success) {
+             successCount++;
+             helpfulnessSum += helpfulnessScore;
+             qtPerf.successes++;
+             qtPerf.averageScore += helpfulnessScore;
+           }
+
+         } catch (error) {
+           results.tests.push({
+             word: testWord.word,
+             questionType: question.type,
+             difficulty: testWord.difficulty,
+             responseTime: Date.now() - testStart,
+             error: error.message,
+             success: false
+           });
+
+           results.questionTypePerformance[question.type].tests++;
+         }
+
+         await new Promise(resolve => setTimeout(resolve, 500));
+       }
+     }
+
+     // Calculate averages for question types
+     Object.keys(results.questionTypePerformance).forEach(type => {
+       const perf = results.questionTypePerformance[type];
+       perf.successRate = perf.tests > 0 ? perf.successes / perf.tests : 0;
+       perf.averageScore = perf.successes > 0 ? perf.averageScore / perf.successes : 0;
+     });
+
+     const totalTests = testWords.length * this.chatQuestions.length;
+     results.averageTime = totalTime / totalTests;
+     results.successRate = successCount / totalTests;
+     results.helpfulnessScore = successCount > 0 ? helpfulnessSum / successCount : 0;
+
+     return results;
+   }
+
+   async performWordSelection(model, passage) {
+     // Create a temporary AI service instance for this model
+     const aiService = await this.createModelAIService(model);
+
+     const prompt = `Select ${Math.min(3, Math.floor(passage.difficulty / 2) + 1)} appropriate words to remove from this passage for a cloze exercise at difficulty level ${passage.difficulty}:
+
+ "${passage.text}"
+
+ Return only a JSON array of words, like: ["word1", "word2", "word3"]`;
+
+     const response = await aiService.makeAIRequest(prompt);
+
+     try {
+       const parsed = JSON.parse(response);
+       if (Array.isArray(parsed)) return parsed;
+     } catch {
+       // Not bare JSON; fall through to extraction
+     }
+
+     // Try to extract an array from a non-JSON response; [\s\S] also matches
+     // newlines, so arrays spread across several lines are found too
+     const matches = response.match(/\[[\s\S]*?\]/);
+     if (matches) {
+       try {
+         const parsed = JSON.parse(matches[0]);
+         if (Array.isArray(parsed)) return parsed;
+       } catch {
+         // The bracketed text was not valid JSON either
+       }
+     }
+     return [];
+   }
+
+   async performContextualization(model, book) {
+     const aiService = await this.createModelAIService(model);
+
+     const prompt = `Provide a brief historical and literary context for "${book.title}" by ${book.author}. Keep it concise and educational, suitable for language learners.`;
+
+     return await aiService.makeAIRequest(prompt);
+   }
+
+   async performChatHint(model, testWord, question) {
+     const aiService = await this.createModelAIService(model);
+
+     const prompt = `You are helping a student understand a word in context. The word is "${testWord.word}" in the sentence: "${testWord.sentence}"
+
+ ${question.prompt}
+
+ Provide a helpful hint without revealing the word directly. Keep your response concise and educational.`;
+
+     return await aiService.makeAIRequest(prompt);
+   }
+
+   async createModelAIService(model) {
+     // Use the testing AI service for better performance tracking
+     const { TestAIService } = await import('./testAIService.js');
+
+     const config = {
+       modelId: model.id,
+       provider: model.provider,
+       isLocal: model.provider === 'local'
+     };
+
+     return new TestAIService(config);
+   }
+
+   async detectLocalModels() {
+     // Attempt to detect available local models from LM Studio
+     try {
+       const response = await fetch('http://localhost:1234/v1/models');
+       if (response.ok) {
+         const data = await response.json();
+         const detectedModels = data.data.map(model => ({
+           id: model.id,
+           name: `${model.id} (Local)`,
+           provider: 'local'
+         }));
+
+         // Update the local models list
+         this.models = this.models.filter(m => m.provider !== 'local');
+         this.models.push(...detectedModels);
+
+         return detectedModels;
+       }
+     } catch (error) {
+       console.log('No local LM Studio server detected on port 1234');
+     }
+
+     // Return default local models if detection fails
+     return this.models.filter(m => m.provider === 'local');
+   }
+
+   async testLocalServerConnection() {
+     try {
+       const response = await fetch('http://localhost:1234/v1/models', {
+         method: 'GET',
+         headers: {
+           'Content-Type': 'application/json'
+         }
+       });
+
+       if (response.ok) {
+         const data = await response.json();
+         return {
+           connected: true,
+           models: data.data || [],
+           serverInfo: data
+         };
+       } else {
+         return {
+           connected: false,
+           error: `HTTP ${response.status}: ${response.statusText}`
+         };
+       }
+     } catch (error) {
+       return {
+         connected: false,
+         error: error.message
+       };
+     }
+   }
+
+   evaluateWordQuality(words, passage) {
+     if (!words || words.length === 0) return 0;
+
+     let score = 0;
+     const text = passage.text.toLowerCase();
+
+     for (const word of words) {
+       const wordLower = word.toLowerCase();
+
+       // Check if word exists in passage
+       if (text.includes(wordLower)) score += 20;
+
+       // Check word length appropriateness
+       const expectedMinLength = Math.max(4, passage.difficulty);
+       const expectedMaxLength = Math.min(12, passage.difficulty + 6);
+
+       if (word.length >= expectedMinLength && word.length <= expectedMaxLength) {
+         score += 15;
+       }
+
+       // Avoid overly common words for higher difficulties
+       const commonWords = ['the', 'and', 'but', 'for', 'are', 'was', 'his', 'her'];
+       if (passage.difficulty > 5 && !commonWords.includes(wordLower)) {
+         score += 10;
+       }
+     }
+
+     return Math.min(100, score / words.length);
+   }
+
+   evaluateDifficultyMatch(words, targetDifficulty) {
+     if (!words || words.length === 0) return 0;
+
+     let score = 0;
+
+     for (const word of words) {
+       const wordLength = word.length;
+       const expectedMin = Math.max(4, targetDifficulty);
+       const expectedMax = Math.min(14, targetDifficulty + 6);
+
+       if (wordLength >= expectedMin && wordLength <= expectedMax) {
+         score += 100;
+       } else {
+         // Partial credit for close matches
+         const distance = Math.min(
+           Math.abs(wordLength - expectedMin),
+           Math.abs(wordLength - expectedMax)
+         );
+         score += Math.max(0, 100 - (distance * 20));
+       }
+     }
+
+     return score / words.length;
+   }
+
+   evaluateContextRelevance(context, book) {
+     if (!context || context.length < 20) return 0;
+
+     let score = 0;
+     const contextLower = context.toLowerCase();
+
+     // Check for book title mention
+     if (contextLower.includes(book.title.toLowerCase())) score += 25;
+
+     // Check for author mention
+     if (contextLower.includes(book.author.toLowerCase().split(' ').pop())) score += 25;
+
+     // Check for literary/historical terms
+     const literaryTerms = ['novel', 'literature', 'author', 'published', 'century', 'period', 'style', 'theme'];
+     const foundTerms = literaryTerms.filter(term => contextLower.includes(term));
+     score += Math.min(30, foundTerms.length * 5);
+
+     // Length appropriateness (100-500 chars is good)
+     if (context.length >= 100 && context.length <= 500) score += 20;
+
+     return Math.min(100, score);
+   }
+
+   evaluateHintHelpfulness(hint, testWord, question) {
+     if (!hint || hint.length < 10) return 0;
+
+     let score = 0;
+     const hintLower = hint.toLowerCase();
+     const wordLower = testWord.word.toLowerCase();
+
+     // Penalize if the word is revealed directly
+     if (hintLower.includes(wordLower)) {
+       score -= 50;
+     }
+
+     // Check for question-appropriate responses
+     // (each case has its own block so its const declaration is scoped to it)
+     switch (question.type) {
+       case 'part_of_speech': {
+         const posTerms = ['noun', 'verb', 'adjective', 'adverb', 'pronoun'];
+         if (posTerms.some(term => hintLower.includes(term))) score += 40;
+         break;
+       }
+
+       case 'sentence_role': {
+         const roleTerms = ['subject', 'object', 'predicate', 'modifier', 'describes'];
+         if (roleTerms.some(term => hintLower.includes(term))) score += 40;
+         break;
+       }
+
+       case 'word_category': {
+         const categoryTerms = ['type', 'kind', 'category', 'group', 'family'];
+         if (categoryTerms.some(term => hintLower.includes(term))) score += 40;
+         break;
+       }
+
+       case 'synonym': {
+         const synonymTerms = ['similar', 'means', 'like', 'same as', 'equivalent'];
+         if (synonymTerms.some(term => hintLower.includes(term))) score += 40;
+         break;
+       }
+     }
+
+     // Length appropriateness
+     if (hint.length >= 20 && hint.length <= 200) score += 30;
+
+     // Educational tone
+     const educationalTerms = ['this word', 'in this context', 'here', 'sentence'];
+     if (educationalTerms.some(term => hintLower.includes(term))) score += 20;
+
+     return Math.max(0, Math.min(100, score));
+   }
+
+   calculateOverallScore(results) {
+     const weights = {
+       wordSelection: 0.4,
+       contextualization: 0.3,
+       chatHints: 0.3
+     };
+
+     let totalScore = 0;
+
+     if (results.wordSelection.successRate !== undefined) {
+       totalScore += results.wordSelection.successRate * 40 * weights.wordSelection;
+     }
+
+     if (results.contextualization.successRate !== undefined) {
+       totalScore += results.contextualization.successRate * 50 * weights.contextualization;
+     }
+
+     if (results.chatHints.successRate !== undefined) {
+       totalScore += results.chatHints.successRate * 60 * weights.chatHints;
+     }
+
+     // Bonus for consistent performance across all areas
+     const allAreas = [results.wordSelection, results.contextualization, results.chatHints];
+     const minSuccess = Math.min(...allAreas.map(area => area.successRate || 0));
+     if (minSuccess > 0.8) totalScore += 10;
+
+     return Math.min(100, totalScore);
+   }
+
+   async saveResults() {
+     const csvContent = this.generateCSV();
+     const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
+     const filename = `model_test_results_${timestamp}.csv`;
+
+     // Browser environment - download file
+     this.downloadCSV(csvContent, filename);
+
+     console.log(`Results saved as ${filename}`);
+     return filename;
+   }
+
+   downloadCSV(content, filename) {
+     const blob = new Blob([content], { type: 'text/csv' });
+     const url = URL.createObjectURL(blob);
+
+     const a = document.createElement('a');
+     a.href = url;
+     a.download = filename;
+     document.body.appendChild(a);
+     a.click();
+     document.body.removeChild(a);
+     URL.revokeObjectURL(url);
+   }
+
+   generateCSV() {
+     const headers = [
+       'Model Name',
+       'Model ID',
+       'Provider',
+       'Timestamp',
+       'Total Time (ms)',
+       'Overall Score',
+       'Word Selection Success Rate',
+       'Word Selection Avg Time (ms)',
+       'Word Selection Quality Score',
+       'Word Selection Difficulty Accuracy',
+       'Contextualization Success Rate',
+       'Contextualization Avg Time (ms)',
+       'Contextualization Relevance Score',
+       'Chat Hints Success Rate',
+       'Chat Hints Avg Time (ms)',
+       'Chat Hints Helpfulness Score',
+       'Part of Speech Success Rate',
+       'Sentence Role Success Rate',
+       'Word Category Success Rate',
+       'Synonym Success Rate',
+       'User Satisfaction Score',
+       'Word Selection User Rating',
+       'Passage Quality User Rating',
+       'Hint Helpfulness User Rating',
+       'Overall Experience User Rating',
+       'User Comments Count',
+       'Error Message'
+     ];
+
+     // Quote a text field and double any embedded quotes (RFC 4180), so an
+     // error message containing '"' or ',' does not break the row
+     const quote = value => `"${String(value ?? '').replace(/"/g, '""')}"`;
+
+     const rows = [headers.join(',')];
+
+     for (const test of this.testResults.tests) {
+       // Get user ranking data if available
+       const userRankings = test.userRankings || {};
+       const userSatisfaction = userRankings.overallUserSatisfaction || 0;
+       const avgRatings = userRankings.averageRatings || {};
+       const commentsCount = userRankings.comments?.length || 0;
+
+       const row = [
+         quote(test.modelName),
+         quote(test.modelId),
+         quote(test.provider),
+         quote(test.timestamp),
+         test.totalTime || 0,
+         test.overallScore || 0,
+         test.wordSelection?.successRate || 0,
+         test.wordSelection?.averageTime || 0,
+         test.wordSelection?.qualityScore || 0,
+         test.wordSelection?.difficultyAccuracy || 0,
+         test.contextualization?.successRate || 0,
+         test.contextualization?.averageTime || 0,
+         test.contextualization?.relevanceScore || 0,
+         test.chatHints?.successRate || 0,
+         test.chatHints?.averageTime || 0,
+         test.chatHints?.helpfulnessScore || 0,
+         test.chatHints?.questionTypePerformance?.part_of_speech?.successRate || 0,
+         test.chatHints?.questionTypePerformance?.sentence_role?.successRate || 0,
+         test.chatHints?.questionTypePerformance?.word_category?.successRate || 0,
+         test.chatHints?.questionTypePerformance?.synonym?.successRate || 0,
+         userSatisfaction.toFixed(2),
+         avgRatings.word_selection?.toFixed(2) || 0,
+         avgRatings.passage_quality?.toFixed(2) || 0,
+         avgRatings.hint_helpfulness?.toFixed(2) || 0,
+         avgRatings.overall_experience?.toFixed(2) || 0,
+         commentsCount,
+         quote(test.error || '')
+       ];
+
+       rows.push(row.join(','));
+     }
+
+     return rows.join('\n');
+   }
+ }
+
+ export { ModelTestingFramework };
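`calculateOverallScore` multiplies each area's success rate by a fixed factor (40, 50, 60) and then by that area's weight (0.4, 0.3, 0.3). It adds a 10-point bonus when every area is above 80% success. Working the arithmetic through shows the ceiling. A model that succeeds on every test scores 16 + 15 + 18 + 10 = 59, so the `Math.min(100, …)` clamp never engages. The sketch below copies that formula into a standalone function so the range can be checked directly. The name `overallScore` and the flat `{ wordSelection, contextualization, chatHints }` input shape are hypothetical simplifications of the `results` object.

```javascript
// Standalone copy of the weighting used in calculateOverallScore above.
// Each input is a success rate in [0, 1].
function overallScore({ wordSelection, contextualization, chatHints }) {
  let total =
    wordSelection * 40 * 0.4 +     // contributes at most 16
    contextualization * 50 * 0.3 + // contributes at most 15
    chatHints * 60 * 0.3;          // contributes at most 18
  // Consistency bonus when the weakest area is above 80%
  if (Math.min(wordSelection, contextualization, chatHints) > 0.8) total += 10;
  return Math.min(100, total);
}

console.log(overallScore({ wordSelection: 1, contextualization: 1, chatHints: 1 })); // 59
```

Scores in the CSV are therefore on a 0 to 59 scale rather than 0 to 100. They are still comparable between models, but they are not percentages.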
src/testAIService.js ADDED
@@ -0,0 +1,154 @@
+ /**
+  * Testing-specific AI Service wrapper
+  * Mirrors the main AI service's makeAIRequest() interface and adds performance tracking
+  */
+
+ class TestAIService {
+   constructor(config) {
+     this.modelId = config.modelId;
+     this.provider = config.provider;
+     this.isLocal = config.isLocal || config.provider === 'local';
+     // LM Studio serves its OpenAI-compatible API under /v1 (the framework probes /v1/models)
+     this.baseUrl = this.isLocal ? 'http://localhost:1234/v1' : 'https://openrouter.ai/api/v1';
+     this.apiKey = this.isLocal ? 'test-key' : this.getApiKey();
+
+     // Performance tracking
+     this.requestCount = 0;
+     this.totalResponseTime = 0;
+     this.errorCount = 0;
+     this.lastError = null;
+   }
+
+   getApiKey() {
+     // Try to get API key from meta tag (injected by server); guard for non-browser runs
+     const metaTag = typeof document !== 'undefined'
+       ? document.querySelector('meta[name="openrouter-api-key"]')
+       : null;
+     if (metaTag) {
+       return metaTag.content;
+     }
+
+     // Fallback to environment variable (for Node.js testing)
+     if (typeof process !== 'undefined' && process.env) {
+       return process.env.OPENROUTER_API_KEY;
+     }
+
+     return null;
+   }
+
+   async makeAIRequest(prompt, options = {}) {
+     const startTime = Date.now();
+     this.requestCount++;
+
+     try {
+       const response = await this.performRequest(prompt, options);
+       this.totalResponseTime += Date.now() - startTime;
+       return response;
+     } catch (error) {
+       this.errorCount++;
+       this.lastError = error;
+       this.totalResponseTime += Date.now() - startTime;
+       throw error;
+     }
+   }
+
+   async performRequest(prompt, options = {}) {
+     const requestBody = {
+       model: this.modelId,
+       messages: [
+         {
+           role: "user",
+           content: prompt
+         }
+       ],
+       // `??` keeps explicit zeros such as temperature: 0, which `||` would replace
+       max_tokens: options.maxTokens ?? 500,
+       temperature: options.temperature ?? 0.7,
+       top_p: options.topP ?? 0.9
+     };
+
+     const headers = {
+       'Content-Type': 'application/json',
+       'Authorization': `Bearer ${this.apiKey}`
+     };
+
+     if (!this.isLocal && typeof window !== 'undefined') {
+       headers['HTTP-Referer'] = window.location.origin;
+     }
+
+     const controller = new AbortController();
+     const timeoutId = setTimeout(() => controller.abort(), 30000); // 30 second timeout
+
+     try {
+       const response = await fetch(`${this.baseUrl}/chat/completions`, {
+         method: 'POST',
+         headers: headers,
+         body: JSON.stringify(requestBody),
+         signal: controller.signal
+       });
+
+       if (!response.ok) {
+         throw new Error(`HTTP ${response.status}: ${response.statusText}`);
+       }
+
+       const data = await response.json();
+
+       if (!data.choices || data.choices.length === 0) {
+         throw new Error('No response from AI service');
+       }
+
+       let content = data.choices[0].message?.content ?? '';
+
+       // Clean up local LLM response artifacts
+       if (this.isLocal) {
+         content = this.cleanLocalLLMResponse(content);
+       }
+
+       return content;
+     } catch (error) {
+       if (error.name === 'AbortError') {
+         throw new Error('Request timeout');
+       }
+       throw error;
+     } finally {
+       // Cleared only after the body is read, so a stalled body also times out
+       clearTimeout(timeoutId);
+     }
+   }
+
+   cleanLocalLLMResponse(content) {
+     // Remove common local LLM artifacts. The bracket patterns exclude '"' and ']'
+     // so a JSON array such as ["word1", "word2"] (needed by word selection)
+     // is not mistaken for a leading or trailing bracketed tag and stripped
+     content = content.replace(/^\[[^\]"]*\]\s*/, ''); // Remove leading bracketed tag
+     content = content.replace(/\s*\[[^\]"]*\]$/, ''); // Remove trailing bracketed tag
+     content = content.replace(/^"(.*)"$/, '$1'); // Remove surrounding quotes
+     content = content.replace(/\\n/g, '\n'); // Fix escaped newlines
+     content = content.replace(/\\"/g, '"'); // Fix escaped quotes
+
+     return content.trim();
+   }
+
+   // Performance metrics
+   getAverageResponseTime() {
+     return this.requestCount > 0 ? this.totalResponseTime / this.requestCount : 0;
+   }
+
+   getErrorRate() {
+     return this.requestCount > 0 ? this.errorCount / this.requestCount : 0;
+   }
+
+   getPerformanceStats() {
+     return {
+       requestCount: this.requestCount,
+       totalResponseTime: this.totalResponseTime,
+       averageResponseTime: this.getAverageResponseTime(),
+       errorCount: this.errorCount,
+       errorRate: this.getErrorRate(),
+       lastError: this.lastError?.message || null
+     };
+   }
+
+   reset() {
+     this.requestCount = 0;
+     this.totalResponseTime = 0;
+     this.errorCount = 0;
+     this.lastError = null;
+   }
+ }
+
+ export { TestAIService };
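`performRequest` fills in defaults for `max_tokens`, `temperature`, and `top_p`. For model comparisons the default operator matters. `temperature: 0` is the usual way to request near-deterministic sampling, and `||` treats 0 as "not set". The minimal illustration below compares the two operators side by side. `withOr`, `withNullish`, and `DEFAULT_TEMPERATURE` are hypothetical names used only for this comparison.

```javascript
// Compare `||` and `??` as default operators for a sampling option.
// `||` falls back on any falsy value (0, '', NaN, false);
// `??` falls back only on null or undefined.
const DEFAULT_TEMPERATURE = 0.7;

function withOr(options) {
  return options.temperature || DEFAULT_TEMPERATURE;
}

function withNullish(options) {
  return options.temperature ?? DEFAULT_TEMPERATURE;
}

for (const options of [{}, { temperature: 0 }, { temperature: 0.2 }]) {
  console.log(JSON.stringify(options), withOr(options), withNullish(options));
}
// {} 0.7 0.7
// {"temperature":0} 0.7 0
// {"temperature":0.2} 0.2 0.2
```

The same reasoning applies to `maxTokens` and `topP`. Any option whose legitimate range includes 0 should use `??`.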
src/testGameRunner.js ADDED
@@ -0,0 +1,473 @@
+ /**
+  * Test Game Runner - Monitors and logs performance during game testing
+  */
+
+ class TestGameRunner {
+   constructor(modelConfig) {
+     this.modelConfig = modelConfig;
+     this.sessionData = {
+       modelId: modelConfig.modelId,
+       modelName: modelConfig.modelName,
+       provider: modelConfig.provider,
+       startTime: Date.now(),
+       rounds: [],
+       interactions: [],
+       userRankings: [],
+       performance: {
+         wordSelectionRequests: 0,
+         wordSelectionSuccess: 0,
+         wordSelectionTime: 0,
+         contextualizationRequests: 0,
+         contextualizationSuccess: 0,
+         contextualizationTime: 0,
+         chatHintRequests: 0,
+         chatHintSuccess: 0,
+         chatHintTime: 0,
+         errors: []
+       }
+     };
+
+     this.originalAIService = null;
+     this.setupInterception();
+   }
+
+   setupInterception() {
+     // Intercept AI service calls to track performance
+     if (window.aiService) {
+       this.originalAIService = window.aiService;
+       this.wrapAIService();
+     }
+
+     // Monitor for game events
+     this.setupGameEventListeners();
+   }
+
+   wrapAIService() {
+     const testRunner = this;
+
+     // Wrap the makeAIRequest method
+     const originalMakeAIRequest = this.originalAIService.makeAIRequest.bind(this.originalAIService);
+
+     window.aiService.makeAIRequest = async function(prompt, options = {}) {
+       const startTime = Date.now();
+       const requestType = testRunner.classifyRequest(prompt);
+
+       testRunner.logInteraction({
+         type: 'ai_request_start',
+         requestType: requestType,
+         prompt: prompt.substring(0, 200) + '...',
+         timestamp: Date.now()
+       });
+
+       try {
+         const result = await originalMakeAIRequest(prompt, options);
+         const responseTime = Date.now() - startTime;
+
+         testRunner.updatePerformanceMetrics(requestType, true, responseTime);
+         testRunner.logInteraction({
+           type: 'ai_request_success',
+           requestType: requestType,
+           responseTime: responseTime,
+           // Guard so a null or non-string result is not re-thrown as a failed request
+           responseLength: result?.length ?? 0,
+           timestamp: Date.now()
+         });
+
+         return result;
+       } catch (error) {
+         const responseTime = Date.now() - startTime;
+
+         testRunner.updatePerformanceMetrics(requestType, false, responseTime);
+         testRunner.logInteraction({
+           type: 'ai_request_error',
+           requestType: requestType,
+           error: error.message,
+           responseTime: responseTime,
+           timestamp: Date.now()
+         });
+
+         testRunner.sessionData.performance.errors.push({
+           type: requestType,
+           error: error.message,
+           timestamp: Date.now()
+         });
+
+         throw error;
+       }
+     };
+   }
+
99
+ classifyRequest(prompt) {
100
+ const promptLower = prompt.toLowerCase();
101
+
102
+ if (promptLower.includes('select') && promptLower.includes('word')) {
103
+ return 'word_selection';
104
+ } else if (promptLower.includes('context') || promptLower.includes('background')) {
105
+ return 'contextualization';
106
+ } else if (promptLower.includes('hint') || promptLower.includes('help') || promptLower.includes('clue')) {
107
+ return 'chat_hint';
108
+ } else {
109
+ return 'other';
110
+ }
111
+ }
112
+
113
+ updatePerformanceMetrics(requestType, success, responseTime) {
114
+ const perf = this.sessionData.performance;
115
+
116
+ switch (requestType) {
117
+ case 'word_selection':
118
+ perf.wordSelectionRequests++;
119
+ if (success) {
120
+ perf.wordSelectionSuccess++;
121
+ perf.wordSelectionTime += responseTime;
122
+ }
123
+ break;
124
+
125
+ case 'contextualization':
126
+ perf.contextualizationRequests++;
127
+ if (success) {
128
+ perf.contextualizationSuccess++;
129
+ perf.contextualizationTime += responseTime;
130
+ }
131
+ break;
132
+
133
+ case 'chat_hint':
134
+ perf.chatHintRequests++;
135
+ if (success) {
136
+ perf.chatHintSuccess++;
137
+ perf.chatHintTime += responseTime;
138
+ }
139
+ break;
140
+ }
141
+ }
142
+
143
+ setupGameEventListeners() {
144
+ // Listen for game-specific events
145
+ document.addEventListener('gameRoundStart', (event) => {
146
+ this.logInteraction({
147
+ type: 'round_start',
148
+ level: event.detail.level,
149
+ round: event.detail.round,
150
+ timestamp: Date.now()
151
+ });
152
+ });
153
+
154
+ document.addEventListener('gameRoundComplete', (event) => {
155
+ const roundData = {
156
+ level: event.detail.level,
157
+ round: event.detail.round,
158
+ score: event.detail.score,
159
+ correctAnswers: event.detail.correctAnswers,
160
+ totalBlanks: event.detail.totalBlanks,
161
+ timeSpent: event.detail.timeSpent,
162
+ timestamp: Date.now()
163
+ };
164
+
165
+ this.sessionData.rounds.push(roundData);
166
+
167
+ // Store the current round index for user ranking association
168
+ this.currentRoundIndex = this.sessionData.rounds.length - 1;
169
+
170
+ this.logInteraction({
171
+ type: 'round_complete',
172
+ level: event.detail.level,
173
+ round: event.detail.round,
174
+ score: event.detail.score,
175
+ timestamp: Date.now()
176
+ });
177
+ });
178
+
179
+ document.addEventListener('userAnswer', (event) => {
180
+ this.logInteraction({
181
+ type: 'user_answer',
182
+ word: event.detail.targetWord,
183
+ userAnswer: event.detail.userAnswer,
184
+ correct: event.detail.correct,
185
+ timestamp: Date.now()
186
+ });
187
+ });
188
+
189
+ document.addEventListener('chatInteraction', (event) => {
190
+ this.logInteraction({
191
+ type: 'chat_interaction',
192
+ questionType: event.detail.questionType,
193
+ word: event.detail.word,
194
+ timestamp: Date.now()
195
+ });
196
+ });
197
+
198
+ // Listen for user ranking events
199
+ document.addEventListener('userRanking', (event) => {
200
+ const rankingData = {
201
+ ...event.detail,
202
+ roundIndex: this.currentRoundIndex,
203
+ roundDetails: this.sessionData.rounds[this.currentRoundIndex]
204
+ };
205
+
206
+ this.sessionData.userRankings.push(rankingData);
207
+
208
+ this.logInteraction({
209
+ type: 'user_ranking',
210
+ averageRating: event.detail.averageRating,
211
+ ratings: event.detail.ratings,
212
+ timestamp: Date.now()
213
+ });
214
+ });
215
+ }
216
+
217
+ logInteraction(interaction) {
218
+ this.sessionData.interactions.push(interaction);
219
+
220
+ // Log to console for real-time monitoring
221
+ console.log(`[TestRunner] ${interaction.type}:`, interaction);
222
+ }
223
+
224
+ generateReport() {
225
+ const endTime = Date.now();
226
+ const totalTime = endTime - this.sessionData.startTime;
227
+ const perf = this.sessionData.performance;
228
+
229
+ // Calculate user ranking summary
230
+ const userRankingSummary = this.calculateUserRankingSummary();
231
+
232
+ const report = {
233
+ ...this.sessionData,
234
+ endTime: endTime,
235
+ totalSessionTime: totalTime,
236
+ summary: {
237
+ totalRounds: this.sessionData.rounds.length,
238
+ averageScore: this.sessionData.rounds.length > 0
239
+ ? this.sessionData.rounds.reduce((sum, round) => sum + round.score, 0) / this.sessionData.rounds.length
240
+ : 0,
241
+ wordSelectionSuccessRate: perf.wordSelectionRequests > 0
242
+ ? perf.wordSelectionSuccess / perf.wordSelectionRequests
243
+ : 0,
244
+ wordSelectionAvgTime: perf.wordSelectionSuccess > 0
245
+ ? perf.wordSelectionTime / perf.wordSelectionSuccess
246
+ : 0,
247
+ contextualizationSuccessRate: perf.contextualizationRequests > 0
248
+ ? perf.contextualizationSuccess / perf.contextualizationRequests
249
+ : 0,
250
+ contextualizationAvgTime: perf.contextualizationSuccess > 0
251
+ ? perf.contextualizationTime / perf.contextualizationSuccess
252
+ : 0,
253
+ chatHintSuccessRate: perf.chatHintRequests > 0
254
+ ? perf.chatHintSuccess / perf.chatHintRequests
255
+ : 0,
256
+ chatHintAvgTime: perf.chatHintSuccess > 0
257
+ ? perf.chatHintTime / perf.chatHintSuccess
258
+ : 0,
259
+ totalErrors: perf.errors.length,
260
+ userRankingSummary: userRankingSummary
261
+ }
262
+ };
263
+
264
+ return report;
265
+ }
266
+
267
+ calculateUserRankingSummary() {
268
+ if (this.sessionData.userRankings.length === 0) {
269
+ return null;
270
+ }
271
+
272
+ const categories = ['word_selection', 'passage_quality', 'hint_helpfulness', 'overall_experience'];
273
+ const summary = {
274
+ totalRankings: this.sessionData.userRankings.length,
275
+ averageRatings: {},
276
+ categoryBreakdown: {},
277
+ comments: [],
278
+ overallUserSatisfaction: 0
279
+ };
280
+
281
+ // Calculate average ratings per category
282
+ categories.forEach(category => {
283
+ const ratings = this.sessionData.userRankings
284
+ .map(r => r.ratings[category])
285
+ .filter(r => r !== undefined);
286
+
287
+ if (ratings.length > 0) {
288
+ summary.averageRatings[category] =
289
+ ratings.reduce((a, b) => a + b, 0) / ratings.length;
290
+
291
+ // Distribution of ratings
292
+ summary.categoryBreakdown[category] = {
293
+ 1: ratings.filter(r => r === 1).length,
294
+ 2: ratings.filter(r => r === 2).length,
295
+ 3: ratings.filter(r => r === 3).length,
296
+ 4: ratings.filter(r => r === 4).length,
297
+ 5: ratings.filter(r => r === 5).length
298
+ };
299
+ }
300
+ });
301
+
302
+ // Calculate overall satisfaction
303
+ const allRatings = this.sessionData.userRankings
304
+ .map(r => r.averageRating)
305
+ .filter(r => r !== undefined);
306
+
307
+ if (allRatings.length > 0) {
308
+ summary.overallUserSatisfaction =
309
+ allRatings.reduce((a, b) => a + b, 0) / allRatings.length;
310
+ }
311
+
312
+ // Collect comments with context
313
+ summary.comments = this.sessionData.userRankings
314
+ .filter(r => r.comments)
315
+ .map(r => ({
316
+ timestamp: r.timestamp,
317
+ comment: r.comments,
318
+ averageRating: r.averageRating,
319
+ roundLevel: r.roundDetails?.level,
320
+ roundScore: r.roundDetails?.score
321
+ }));
322
+
323
+ return summary;
324
+ }
325
+
326
+ async saveReport() {
327
+ const report = this.generateReport();
328
+ const timestamp = new Date().toISOString().replace(/[:.]/g, '-');
329
+ const filename = `game_test_${this.modelConfig.modelId.replace(/[\/\\:]/g, '_')}_${timestamp}.json`;
330
+
331
+ try {
332
+ // Try to save via browser download
333
+ this.downloadReport(report, filename);
334
+
335
+ // Also try to save to output folder if possible (server-side)
336
+ await this.saveToServer(report, filename);
337
+
338
+ console.log(`Test report saved: ${filename}`);
339
+ return filename;
340
+ } catch (error) {
341
+ console.error('Error saving test report:', error);
342
+ return null;
343
+ }
344
+ }
345
+
346
+ downloadReport(report, filename) {
347
+ const jsonString = JSON.stringify(report, null, 2);
348
+ const blob = new Blob([jsonString], { type: 'application/json' });
349
+ const url = URL.createObjectURL(blob);
350
+
351
+ const a = document.createElement('a');
352
+ a.href = url;
353
+ a.download = filename;
354
+ document.body.appendChild(a);
355
+ a.click();
356
+ document.body.removeChild(a);
357
+ URL.revokeObjectURL(url);
358
+ }
359
+
360
+ async saveToServer(report, filename) {
361
+ try {
362
+ const response = await fetch('/api/save-test-report', {
363
+ method: 'POST',
364
+ headers: {
365
+ 'Content-Type': 'application/json'
366
+ },
367
+ body: JSON.stringify({
368
+ filename: filename,
369
+ data: report
370
+ })
371
+ });
372
+
373
+ if (!response.ok) {
374
+ throw new Error(`Server save failed: ${response.status}`);
375
+ }
376
+ } catch (error) {
377
+ console.log('Server save not available, using browser download only');
378
+ }
379
+ }
380
+
381
+ // Utility methods for analysis
382
+ getWordSelectionAnalytics() {
383
+ const wordSelectionInteractions = this.sessionData.interactions.filter(
384
+ i => i.type === 'ai_request_success' && i.requestType === 'word_selection'
385
+ );
386
+
387
+ return {
388
+ count: wordSelectionInteractions.length,
389
+ averageResponseTime: wordSelectionInteractions.length > 0
390
+ ? wordSelectionInteractions.reduce((sum, i) => sum + i.responseTime, 0) / wordSelectionInteractions.length
391
+ : 0,
392
+ averageResponseLength: wordSelectionInteractions.length > 0
393
+ ? wordSelectionInteractions.reduce((sum, i) => sum + i.responseLength, 0) / wordSelectionInteractions.length
394
+ : 0
395
+ };
396
+ }
397
+
398
+ getChatHintAnalytics() {
399
+ const chatHintInteractions = this.sessionData.interactions.filter(
400
+ i => i.type === 'chat_interaction'
401
+ );
402
+
403
+ const questionTypes = {};
404
+ chatHintInteractions.forEach(interaction => {
405
+ const type = interaction.questionType || 'unknown';
406
+ questionTypes[type] = (questionTypes[type] || 0) + 1;
407
+ });
408
+
409
+ return {
410
+ totalHints: chatHintInteractions.length,
411
+ questionTypeBreakdown: questionTypes
412
+ };
413
+ }
414
+
415
+ getUserPerformanceAnalytics() {
416
+ const answerInteractions = this.sessionData.interactions.filter(
417
+ i => i.type === 'user_answer'
418
+ );
419
+
420
+ const correctAnswers = answerInteractions.filter(i => i.correct).length;
421
+
422
+ return {
423
+ totalAnswers: answerInteractions.length,
424
+ correctAnswers: correctAnswers,
425
+ accuracy: answerInteractions.length > 0 ? correctAnswers / answerInteractions.length : 0
426
+ };
427
+ }
428
+ }
429
+
430
+ // Initialize test runner if in test mode
431
+ window.addEventListener('DOMContentLoaded', () => {
432
+ const urlParams = new URLSearchParams(window.location.search);
433
+ if (urlParams.get('testMode') === 'true') {
434
+ const modelId = urlParams.get('testModel');
435
+ const isLocal = urlParams.get('local') === 'true';
436
+
437
+ if (modelId) {
438
+ window.testGameRunner = new TestGameRunner({
439
+ modelId: modelId,
440
+ modelName: modelId,
441
+ provider: isLocal ? 'local' : 'openrouter'
442
+ });
443
+
444
+ console.log('Test Game Runner initialized for model:', modelId);
445
+
446
+ // Add end session button
447
+ const endButton = document.createElement('button');
448
+ endButton.textContent = 'End Test Session';
449
+ endButton.style.cssText = `
450
+ position: fixed;
451
+ top: 10px;
452
+ right: 10px;
453
+ z-index: 1000;
454
+ padding: 10px 15px;
455
+ background: #dc3545;
456
+ color: white;
457
+ border: none;
458
+ border-radius: 5px;
459
+ cursor: pointer;
460
+ `;
461
+
462
+ endButton.addEventListener('click', async () => {
463
+ const filename = await window.testGameRunner.saveReport();
464
+ alert(filename ? `Test session ended. Report saved as: ${filename}` : 'Test session ended, but the report could not be saved.');
465
+ window.close();
466
+ });
467
+
468
+ document.body.appendChild(endButton);
469
+ }
470
+ }
471
+ });
472
+
473
+ export { TestGameRunner };
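The bookkeeping in `TestGameRunner` needs `window` and `document`, so it is hard to exercise outside a browser. The following is a DOM-free sketch, assuming the same keyword rules as `classifyRequest` and the same counting rules as `updatePerformanceMetrics` and `generateReport`. The helpers `createMetrics`, `record` and `summarize` are hypothetical names for illustration and are not part of this commit.

```javascript
// Hypothetical, DOM-free sketch of TestGameRunner's bookkeeping:
// keyword-based prompt classification plus per-category counters.
function classifyRequest(prompt) {
  const p = prompt.toLowerCase();
  // Order matters: the first matching rule wins.
  if (p.includes('select') && p.includes('word')) return 'word_selection';
  if (p.includes('context') || p.includes('background')) return 'contextualization';
  if (p.includes('hint') || p.includes('help') || p.includes('clue')) return 'chat_hint';
  return 'other';
}

function createMetrics() {
  return {
    word_selection: { requests: 0, success: 0, time: 0 },
    contextualization: { requests: 0, success: 0, time: 0 },
    chat_hint: { requests: 0, success: 0, time: 0 },
  };
}

// Every request is counted, but only successful ones add response time.
function record(metrics, prompt, success, responseTime) {
  const bucket = metrics[classifyRequest(prompt)];
  if (!bucket) return; // 'other' requests are not tracked
  bucket.requests++;
  if (success) {
    bucket.success++;
    bucket.time += responseTime;
  }
}

// Success rate is over all requests; average time is over successes only.
function summarize(metrics) {
  const out = {};
  for (const [name, m] of Object.entries(metrics)) {
    out[name] = {
      successRate: m.requests > 0 ? m.success / m.requests : 0,
      avgTime: m.success > 0 ? m.time / m.success : 0,
    };
  }
  return out;
}

const metrics = createMetrics();
record(metrics, 'Select the best word to blank out', true, 400);
record(metrics, 'Select a word for level 3', false, 900);
record(metrics, 'Give a hint about the missing word', true, 250);
record(metrics, 'Summarize this', true, 100);
const summary = summarize(metrics);
console.log(summary.word_selection); // { successRate: 0.5, avgTime: 400 }
```

Because the rules are checked in order, a hint prompt such as "Help the user select the right word" is counted as `word_selection`. This is worth keeping in mind when reading the per-category numbers in a saved report.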
src/testReportGenerator.js ADDED
@@ -0,0 +1,453 @@
1
+ /**
2
+ * Comprehensive Test Report Generator
3
+ * Analyzes test results and generates detailed reports
4
+ */
5
+
6
+ class TestReportGenerator {
7
+ constructor() {
8
+ this.reportTemplates = {
9
+ summary: this.generateSummaryReport.bind(this),
10
+ detailed: this.generateDetailedReport.bind(this),
11
+ comparison: this.generateComparisonReport.bind(this),
12
+ performance: this.generatePerformanceReport.bind(this),
13
+ markdown: this.generateMarkdownReport.bind(this)
14
+ };
15
+ }
16
+
17
+ async generateAllReports(testResults, outputFormat = 'all') {
18
+ const reports = {};
19
+
20
+ if (outputFormat === 'all' || outputFormat === 'summary') {
21
+ reports.summary = this.generateSummaryReport(testResults);
22
+ }
23
+
24
+ if (outputFormat === 'all' || outputFormat === 'detailed') {
25
+ reports.detailed = this.generateDetailedReport(testResults);
26
+ }
27
+
28
+ if (outputFormat === 'all' || outputFormat === 'comparison') {
29
+ reports.comparison = this.generateComparisonReport(testResults);
30
+ }
31
+
32
+ if (outputFormat === 'all' || outputFormat === 'performance') {
33
+ reports.performance = this.generatePerformanceReport(testResults);
34
+ }
35
+
36
+ if (outputFormat === 'all' || outputFormat === 'markdown') {
37
+ reports.markdown = this.generateMarkdownReport(testResults);
38
+ }
39
+
40
+ return reports;
41
+ }
42
+
43
+ generateSummaryReport(testResults) {
44
+ const summary = {
45
+ testOverview: {
46
+ timestamp: testResults.timestamp,
47
+ totalModels: testResults.tests.length,
48
+ testDuration: this.calculateTotalTestDuration(testResults.tests),
49
+ successfulTests: testResults.tests.filter(t => !t.error).length
50
+ },
51
+ topPerformers: this.getTopPerformers(testResults.tests),
52
+ categoryAverages: this.calculateCategoryAverages(testResults.tests),
53
+ recommendations: this.generateRecommendations(testResults.tests)
54
+ };
55
+
56
+ return summary;
57
+ }
58
+
59
+ generateDetailedReport(testResults) {
60
+ const detailed = {
61
+ testMetadata: {
62
+ timestamp: testResults.timestamp,
63
+ totalModels: testResults.tests.length,
64
+ testFrameworkVersion: '1.0.0'
65
+ },
66
+ modelResults: testResults.tests.map(test => ({
67
+ modelInfo: {
68
+ id: test.modelId,
69
+ name: test.modelName,
70
+ provider: test.provider
71
+ },
72
+ overallPerformance: {
73
+ score: test.overallScore,
74
+ totalTime: test.totalTime,
75
+ rank: this.calculateRank(test, testResults.tests)
76
+ },
77
+ wordSelection: this.analyzeWordSelection(test.wordSelection),
78
+ contextualization: this.analyzeContextualization(test.contextualization),
79
+ chatHints: this.analyzeChatHints(test.chatHints),
80
+ errorAnalysis: this.analyzeErrors(test)
81
+ }))
82
+ };
83
+
84
+ return detailed;
85
+ }
86
+
87
+ generateComparisonReport(testResults) {
88
+ const validTests = testResults.tests.filter(t => !t.error);
89
+
90
+ const comparison = {
91
+ modelComparison: this.createModelComparisonMatrix(validTests),
92
+ providerAnalysis: this.analyzeByProvider(validTests),
93
+ performanceMetrics: {
94
+ wordSelection: this.compareWordSelectionMetrics(validTests),
95
+ contextualization: this.compareContextualizationMetrics(validTests),
96
+ chatHints: this.compareChatHintMetrics(validTests),
97
+ responseTime: this.compareResponseTimes(validTests)
98
+ },
99
+ recommendations: {
100
+ bestOverall: this.getBestOverallModel(validTests),
101
+ bestForWordSelection: this.getBestForTask(validTests, 'wordSelection'),
102
+ bestForContextualization: this.getBestForTask(validTests, 'contextualization'),
103
+ bestForChatHints: this.getBestForTask(validTests, 'chatHints'),
104
+ fastestResponse: this.getFastestModel(validTests),
105
+ mostReliable: this.getMostReliableModel(validTests)
106
+ }
107
+ };
108
+
109
+ return comparison;
110
+ }
111
+
112
+ generatePerformanceReport(testResults) {
113
+ const performance = {
114
+ responseTimeAnalysis: this.analyzeResponseTimes(testResults.tests),
115
+ successRateAnalysis: this.analyzeSuccessRates(testResults.tests),
116
+ qualityMetrics: this.analyzeQualityMetrics(testResults.tests),
117
+ scalabilityInsights: this.analyzeScalability(testResults.tests),
118
+ reliabilityMetrics: this.analyzeReliability(testResults.tests)
119
+ };
120
+
121
+ return performance;
122
+ }
123
+
124
+ generateMarkdownReport(testResults) {
125
+ const summary = this.generateSummaryReport(testResults);
126
+ const comparison = this.generateComparisonReport(testResults);
127
+
128
+ let markdown = `# Cloze Reader Model Testing Report\n\n`;
129
+ markdown += `**Generated:** ${new Date().toLocaleString()}\n`;
130
+ markdown += `**Test Timestamp:** ${testResults.timestamp}\n`;
131
+ markdown += `**Models Tested:** ${testResults.tests.length}\n\n`;
132
+
133
+ // Executive Summary
134
+ markdown += `## Executive Summary\n\n`;
135
+ markdown += `- **Successful Tests:** ${summary.testOverview.successfulTests}/${summary.testOverview.totalModels}\n`;
136
+ markdown += `- **Best Overall Model:** ${comparison.recommendations.bestOverall.name} (${comparison.recommendations.bestOverall.score.toFixed(1)}/100)\n`;
137
+ markdown += `- **Average Response Time:** ${this.formatTime(this.calculateAverageResponseTime(testResults.tests))}\n\n`;
138
+
139
+ // Top Performers
140
+ markdown += `## Top Performers\n\n`;
141
+ markdown += `| Rank | Model | Score | Provider |\n`;
142
+ markdown += `|------|-------|-------|----------|\n`;
143
+ summary.topPerformers.forEach((model, index) => {
144
+ markdown += `| ${index + 1} | ${model.name} | ${model.score.toFixed(1)} | ${model.provider} |\n`;
145
+ });
146
+ markdown += `\n`;
147
+
148
+ // Performance by Category
149
+ markdown += `## Performance by Category\n\n`;
150
+ markdown += `### Word Selection\n`;
151
+ markdown += `- **Best:** ${comparison.recommendations.bestForWordSelection.name} (${(comparison.recommendations.bestForWordSelection.successRate * 100).toFixed(1)}% success rate)\n`;
152
+ markdown += `- **Average Success Rate:** ${(summary.categoryAverages.wordSelection.successRate * 100).toFixed(1)}%\n`;
153
+ markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.wordSelection.averageTime)}\n\n`;
154
+
155
+ markdown += `### Contextualization\n`;
156
+ markdown += `- **Best:** ${comparison.recommendations.bestForContextualization.name} (${(comparison.recommendations.bestForContextualization.successRate * 100).toFixed(1)}% success rate)\n`;
157
+ markdown += `- **Average Success Rate:** ${(summary.categoryAverages.contextualization.successRate * 100).toFixed(1)}%\n`;
158
+ markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.contextualization.averageTime)}\n\n`;
159
+
160
+ markdown += `### Chat Hints\n`;
161
+ markdown += `- **Best:** ${comparison.recommendations.bestForChatHints.name} (${(comparison.recommendations.bestForChatHints.successRate * 100).toFixed(1)}% success rate)\n`;
162
+ markdown += `- **Average Success Rate:** ${(summary.categoryAverages.chatHints.successRate * 100).toFixed(1)}%\n`;
163
+ markdown += `- **Average Response Time:** ${this.formatTime(summary.categoryAverages.chatHints.averageTime)}\n\n`;
164
+
165
+ // Add user rankings section if available
166
+ const hasUserRankings = testResults.tests.some(t => t.userRankings?.totalRankings > 0);
167
+ if (hasUserRankings) {
168
+ markdown += `## User Satisfaction Ratings\n\n`;
169
+ markdown += `| Model | Overall Satisfaction | Word Selection | Passage Quality | Hint Helpfulness | Overall Experience |\n`;
170
+ markdown += `|-------|---------------------|----------------|-----------------|------------------|--------------------|\n`;
171
+
172
+ testResults.tests.forEach(test => {
173
+ if (test.userRankings?.totalRankings > 0) {
174
+ const ur = test.userRankings;
175
+ const avg = ur.averageRatings || {};
176
+ markdown += `| ${test.modelName} | ${ur.overallUserSatisfaction.toFixed(1)}/5 | ${(avg.word_selection || 0).toFixed(1)} | ${(avg.passage_quality || 0).toFixed(1)} | ${(avg.hint_helpfulness || 0).toFixed(1)} | ${(avg.overall_experience || 0).toFixed(1)} |\n`;
177
+ }
178
+ });
179
+ markdown += `\n`;
180
+
181
+ // Add user comments if any
182
+ const allComments = testResults.tests
183
+ .filter(t => t.userRankings?.comments?.length > 0)
184
+ .flatMap(t => t.userRankings.comments.map(c => ({ ...c, model: t.modelName })));
185
+
186
+ if (allComments.length > 0) {
187
+ markdown += `### User Comments\n\n`;
188
+ allComments.forEach(comment => {
189
+ markdown += `- **${comment.model}** (Rating: ${comment.averageRating != null ? comment.averageRating.toFixed(1) : 'n/a'}): "${comment.comment}"\n`;
190
+ });
191
+ markdown += `\n`;
192
+ }
193
+ }
194
+
195
+ // Detailed Results
196
+ markdown += `## Detailed Results\n\n`;
197
+ testResults.tests.forEach(test => {
198
+ if (!test.error) {
199
+ markdown += `### ${test.modelName}\n`;
200
+ markdown += `- **Provider:** ${test.provider}\n`;
201
+ markdown += `- **Overall Score:** ${test.overallScore.toFixed(1)}/100\n`;
202
+ markdown += `- **Total Time:** ${this.formatTime(test.totalTime)}\n`;
203
+ markdown += `- **Word Selection:** ${(test.wordSelection?.successRate * 100 || 0).toFixed(1)}% success\n`;
204
+ markdown += `- **Contextualization:** ${(test.contextualization?.successRate * 100 || 0).toFixed(1)}% success\n`;
205
+ markdown += `- **Chat Hints:** ${(test.chatHints?.successRate * 100 || 0).toFixed(1)}% success\n\n`;
206
+ }
207
+ });
208
+
209
+ // Recommendations
210
+ markdown += `## Recommendations\n\n`;
211
+ summary.recommendations.forEach(rec => {
212
+ markdown += `- ${rec}\n`;
213
+ });
214
+
215
+ return markdown;
216
+ }
217
+
218
+ // Helper methods for analysis
219
+ calculateTotalTestDuration(tests) {
220
+ return tests.reduce((total, test) => total + (test.totalTime || 0), 0);
221
+ }
222
+
223
+ getTopPerformers(tests, limit = 5) {
224
+ return tests
225
+ .filter(t => !t.error && t.overallScore)
226
+ .sort((a, b) => b.overallScore - a.overallScore)
227
+ .slice(0, limit)
228
+ .map(test => ({
229
+ name: test.modelName,
230
+ score: test.overallScore,
231
+ provider: test.provider
232
+ }));
233
+ }
234
+
235
+ calculateCategoryAverages(tests) {
236
+ const validTests = tests.filter(t => !t.error);
237
+
238
+ return {
239
+ wordSelection: this.calculateCategoryAverage(validTests, 'wordSelection'),
240
+ contextualization: this.calculateCategoryAverage(validTests, 'contextualization'),
241
+ chatHints: this.calculateCategoryAverage(validTests, 'chatHints')
242
+ };
243
+ }
244
+
245
+ calculateCategoryAverage(tests, category) {
246
+ const validCategoryTests = tests.filter(t => t[category]);
247
+
248
+ if (validCategoryTests.length === 0) {
249
+ return { successRate: 0, averageTime: 0, qualityScore: 0 };
250
+ }
251
+
252
+ return {
253
+ successRate: validCategoryTests.reduce((sum, t) => sum + (t[category].successRate || 0), 0) / validCategoryTests.length,
254
+ averageTime: validCategoryTests.reduce((sum, t) => sum + (t[category].averageTime || 0), 0) / validCategoryTests.length,
255
+ qualityScore: validCategoryTests.reduce((sum, t) => sum + (t[category].qualityScore || t[category].relevanceScore || t[category].helpfulnessScore || 0), 0) / validCategoryTests.length
256
+ };
257
+ }
258
+
259
+ generateRecommendations(tests) {
260
+ const recommendations = [];
261
+ const validTests = tests.filter(t => !t.error);
262
+
263
+ if (validTests.length === 0) {
264
+ return ['No successful tests to generate recommendations.'];
265
+ }
266
+
267
+ const bestOverall = validTests.reduce((best, test) =>
268
+ test.overallScore > best.overallScore ? test : best
269
+ );
270
+
271
+ recommendations.push(`For overall best performance, use ${bestOverall.modelName} (${bestOverall.provider})`);
272
+
273
+ // Provider-specific recommendations
274
+ const providerPerformance = this.analyzeByProvider(validTests);
275
+ const bestProvider = Object.keys(providerPerformance)
276
+ .reduce((best, provider) =>
277
+ providerPerformance[provider].averageScore > providerPerformance[best]?.averageScore ? provider : best
278
+ );
279
+
280
+ recommendations.push(`${bestProvider} models show the best average performance`);
281
+
282
+ // Speed vs quality trade-offs
283
+ const fastestGoodModel = validTests
284
+ .filter(t => t.overallScore > 70)
285
+ .sort((a, b) => a.totalTime - b.totalTime)[0];
286
+
287
+ if (fastestGoodModel) {
288
+ recommendations.push(`For fastest good performance, consider ${fastestGoodModel.modelName}`);
289
+ }
290
+
291
+ return recommendations;
292
+ }
293
+
294
+ analyzeByProvider(tests) {
295
+ const providerGroups = {};
296
+
297
+ tests.forEach(test => {
298
+ if (!providerGroups[test.provider]) {
299
+ providerGroups[test.provider] = [];
300
+ }
301
+ providerGroups[test.provider].push(test);
302
+ });
303
+
304
+ const analysis = {};
305
+ Object.keys(providerGroups).forEach(provider => {
306
+ const providerTests = providerGroups[provider];
307
+ analysis[provider] = {
308
+ count: providerTests.length,
309
+ averageScore: providerTests.reduce((sum, t) => sum + t.overallScore, 0) / providerTests.length,
310
+ averageTime: providerTests.reduce((sum, t) => sum + t.totalTime, 0) / providerTests.length,
311
+ successRate: providerTests.filter(t => !t.error).length / providerTests.length
312
+ };
313
+ });
314
+
315
+ return analysis;
316
+ }
317
+
318
+ getBestOverallModel(tests) {
319
+ return tests.reduce((best, test) =>
320
+ test.overallScore > best.overallScore ? {
321
+ name: test.modelName,
322
+ score: test.overallScore,
323
+ provider: test.provider
324
+ } : best
325
+ , { name: '', score: 0, provider: '' });
326
+ }
327
+
328
+ getBestForTask(tests, taskName) {
329
+ const validTests = tests.filter(t => t[taskName] && t[taskName].successRate !== undefined);
330
+
331
+ if (validTests.length === 0) {
332
+ return { name: 'N/A', successRate: 0, provider: '' };
333
+ }
334
+
335
+ return validTests.reduce((best, test) =>
336
+ test[taskName].successRate > best.successRate ? {
337
+ name: test.modelName,
338
+ successRate: test[taskName].successRate,
339
+ provider: test.provider
340
+ } : best
341
+ , { name: '', successRate: 0, provider: '' });
342
+ }
343
+
344
+ getFastestModel(tests) {
345
+ return tests.reduce((fastest, test) =>
346
+ test.totalTime < fastest.time ? {
347
+ name: test.modelName,
348
+ time: test.totalTime,
349
+ provider: test.provider
350
+ } : fastest
351
+ , { name: '', time: Infinity, provider: '' });
352
+ }
353
+
354
+ getMostReliableModel(tests) {
355
+ // Model with fewest errors and highest success rates across all tasks
356
+ const reliability = tests.map(test => {
357
+ const wordSelectionReliability = test.wordSelection?.successRate || 0;
358
+ const contextualizationReliability = test.contextualization?.successRate || 0;
359
+ const chatHintReliability = test.chatHints?.successRate || 0;
360
+
361
+ const overallReliability = (wordSelectionReliability + contextualizationReliability + chatHintReliability) / 3;
362
+
363
+ return {
364
+ name: test.modelName,
365
+ reliability: overallReliability,
366
+ provider: test.provider
367
+ };
368
+ });
369
+
370
+ return reliability.reduce((most, test) =>
371
+ test.reliability > most.reliability ? test : most
372
+ , { name: '', reliability: 0, provider: '' });
373
+ }
374
+
375
+ calculateAverageResponseTime(tests) {
376
+ const validTests = tests.filter(t => t.totalTime);
377
+ return validTests.length > 0 ? validTests.reduce((sum, t) => sum + t.totalTime, 0) / validTests.length : 0;
378
+ }
379
+
380
+ formatTime(milliseconds) {
381
+ if (milliseconds < 1000) {
382
+ return `${milliseconds.toFixed(0)}ms`;
383
+ } else if (milliseconds < 60000) {
384
+ return `${(milliseconds / 1000).toFixed(1)}s`;
385
+ } else {
386
+ return `${(milliseconds / 60000).toFixed(1)}m`;
387
+ }
388
+ }
389
+
390
+ async saveReports(reports, baseFilename) {
391
+ const savedFiles = [];
392
+
393
+ for (const [type, content] of Object.entries(reports)) {
394
+ const filename = `${baseFilename}_${type}`;
395
+ let fileContent, extension;
396
+
397
+ if (type === 'markdown') {
398
+ fileContent = content;
399
+ extension = '.md';
400
+ } else {
401
+ fileContent = JSON.stringify(content, null, 2);
402
+ extension = '.json';
403
+ }
404
+
405
+ try {
406
+ await this.saveFile(`${filename}${extension}`, fileContent);
407
+ savedFiles.push(`${filename}${extension}`);
408
+ } catch (error) {
409
+ console.error(`Error saving ${filename}:`, error);
410
+ }
411
+ }
412
+
413
+ return savedFiles;
414
+ }
415
+
416
+ async saveFile(filename, content) {
417
+ // Try to save via browser download
418
+ const blob = new Blob([content], {
419
+ type: filename.endsWith('.md') ? 'text/markdown' : 'application/json'
420
+ });
421
+ const url = URL.createObjectURL(blob);
422
+
423
+ const a = document.createElement('a');
424
+ a.href = url;
425
+ a.download = filename;
426
+ document.body.appendChild(a);
427
+ a.click();
428
+ document.body.removeChild(a);
429
+ URL.revokeObjectURL(url);
430
+ }
431
+
432
+ // Stub methods for detailed analysis (implement as needed)
433
+ analyzeWordSelection(data) { return data; }
434
+ analyzeContextualization(data) { return data; }
435
+ analyzeChatHints(data) { return data; }
436
+ analyzeErrors(test) { return test.error ? [test.error] : []; }
437
+ calculateRank(test, allTests) {
438
+ const sorted = allTests.filter(t => !t.error).sort((a, b) => b.overallScore - a.overallScore);
439
+ return sorted.findIndex(t => t.modelId === test.modelId) + 1;
440
+ }
441
+ createModelComparisonMatrix(tests) { return {}; }
442
+ compareWordSelectionMetrics(tests) { return {}; }
443
+ compareContextualizationMetrics(tests) { return {}; }
444
+ compareChatHintMetrics(tests) { return {}; }
445
+ compareResponseTimes(tests) { return {}; }
446
+ analyzeResponseTimes(tests) { return {}; }
447
+ analyzeSuccessRates(tests) { return {}; }
448
+ analyzeQualityMetrics(tests) { return {}; }
449
+ analyzeScalability(tests) { return {}; }
450
+ analyzeReliability(tests) { return {}; }
451
+ }
452
+
453
+ export { TestReportGenerator };
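Two of the `TestReportGenerator` helpers are pure functions that carry most of the report's numbers: `formatTime` and the best-model reduction in `getBestForTask`. The following is a standalone sketch of both, plus a guarded average. It assumes that test results have the same shape the generator reads (`modelName`, `totalTime`, per-task `successRate`). The names `averageTotalTime` and `bestForTask` are hypothetical. The `Number.isFinite` guard is an addition in this sketch and is not in the committed code.

```javascript
// Hypothetical standalone sketch of TestReportGenerator helpers.
function formatTime(ms) {
  if (!Number.isFinite(ms)) return 'n/a'; // guard added in this sketch
  if (ms < 1000) return `${ms.toFixed(0)}ms`;
  if (ms < 60000) return `${(ms / 1000).toFixed(1)}s`;
  return `${(ms / 60000).toFixed(1)}m`;
}

// Average over tests that recorded a total time; 0 (not NaN) when none did.
function averageTotalTime(tests) {
  const timed = tests.filter(t => t.totalTime);
  return timed.length > 0
    ? timed.reduce((sum, t) => sum + t.totalTime, 0) / timed.length
    : 0;
}

// Keep the model with the highest success rate for one task.
function bestForTask(tests, task) {
  return tests
    .filter(t => t[task] && t[task].successRate !== undefined)
    .reduce(
      (best, t) => t[task].successRate > best.successRate
        ? { name: t.modelName, successRate: t[task].successRate }
        : best,
      { name: 'N/A', successRate: 0 }
    );
}

const tests = [
  { modelName: 'model-a', totalTime: 45000, chatHints: { successRate: 0.8 } },
  { modelName: 'model-b', totalTime: 75000, chatHints: { successRate: 0.9 } },
  { modelName: 'model-c', error: 'timeout' },
];
console.log(formatTime(averageTotalTime(tests))); // prints 1.0m
console.log(bestForTask(tests, 'chatHints').name); // prints model-b
```

Returning 0 from the average when no test has a time keeps `formatTime` from rendering `NaNm` in the markdown summary when every model failed.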
src/userRankingInterface.js ADDED
@@ -0,0 +1,650 @@
1
+ /**
2
+ * User Ranking Interface for Model Testing
3
+ * Allows users to rate model performance on each task during gameplay
4
+ */
5
+
6
+ class UserRankingInterface {
7
+ constructor() {
8
+ this.rankings = {
9
+ rounds: [],
10
+ currentRound: null
11
+ };
12
+
13
+ this.rankingCategories = [
14
+ {
15
+ id: 'word_selection',
16
+ name: 'Word Selection Quality',
17
+ description: 'How appropriate were the selected words for this difficulty level?',
18
+ criteria: [
19
+ 'Words match the difficulty level',
20
+ 'Vocabulary is challenging but fair',
21
+ 'Selected words are meaningful in context'
22
+ ]
23
+ },
24
+ {
25
+ id: 'passage_quality',
26
+ name: 'Passage Selection',
27
+ description: 'How suitable was this passage for language learning?',
28
+ criteria: [
29
+ 'Text is engaging and appropriate',
30
+ 'Content is educational',
31
+ 'Difficulty matches the level'
32
+ ]
33
+ },
34
+ {
35
+ id: 'hint_helpfulness',
36
+ name: 'Hint Quality',
37
+ description: 'How helpful were the AI-generated hints?',
38
+ criteria: [
39
+ 'Hints guide without revealing answers',
40
+ 'Explanations are clear and educational',
41
+ 'Responses are contextually appropriate'
42
+ ]
43
+ },
44
+ {
45
+ id: 'overall_experience',
46
+ name: 'Overall Round Experience',
47
+ description: 'How was the overall quality of this round?',
48
+ criteria: [
49
+ 'Smooth gameplay experience',
50
+ 'AI responses were timely',
51
+ 'Educational value was high'
52
+ ]
53
+ }
54
+ ];
55
+
56
+ this.createRankingUI();
57
+ this.setupEventListeners();
58
+ }
59
+
60
+ createRankingUI() {
61
+ // Create ranking modal
62
+ const modal = document.createElement('div');
63
+ modal.id = 'ranking-modal';
64
+ modal.className = 'ranking-modal';
65
+ modal.innerHTML = `
66
+ <div class="ranking-modal-content">
67
+ <h2>Rate This Round</h2>
68
+ <p class="ranking-subtitle">Help us improve by rating the AI's performance</p>
69
+
70
+ <div id="ranking-categories" class="ranking-categories">
71
+ <!-- Categories will be populated dynamically -->
72
+ </div>
73
+
74
+ <div class="ranking-comments">
75
+ <label for="ranking-comments-input">Additional Comments (Optional):</label>
76
+ <textarea id="ranking-comments-input" rows="3" placeholder="Any specific feedback about this round..."></textarea>
77
+ </div>
78
+
79
+ <div class="ranking-actions">
80
+ <button id="skip-ranking-btn" class="btn-secondary">Skip</button>
81
+ <button id="submit-ranking-btn" class="btn-primary" disabled>Submit Rating</button>
82
+ </div>
83
+ </div>
84
+ `;
85
+
86
+ // Create ranking trigger button
87
+ const triggerButton = document.createElement('button');
88
+ triggerButton.id = 'ranking-trigger-btn';
89
+ triggerButton.className = 'ranking-trigger-btn';
90
+ triggerButton.innerHTML = '⭐ Rate Round';
91
+ triggerButton.style.cssText = `
92
+ position: fixed;
93
+ bottom: 20px;
94
+ left: 20px;
95
+ z-index: 999;
96
+ padding: 10px 20px;
97
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
98
+ color: white;
99
+ border: none;
100
+ border-radius: 25px;
101
+ cursor: pointer;
102
+ font-size: 14px;
103
+ font-weight: bold;
104
+ box-shadow: 0 4px 15px rgba(102, 126, 234, 0.4);
105
+ transition: all 0.3s ease;
106
+ display: none;
107
+ `;
108
+
109
+ // Add styles
110
+ const styles = document.createElement('style');
111
+ styles.textContent = `
112
+ .ranking-modal {
113
+ display: none;
114
+ position: fixed;
115
+ top: 0;
116
+ left: 0;
117
+ width: 100%;
118
+ height: 100%;
119
+ background: rgba(0, 0, 0, 0.5);
120
+ z-index: 1000;
121
+ backdrop-filter: blur(5px);
122
+ }
123
+
124
+ .ranking-modal.active {
125
+ display: flex;
126
+ align-items: center;
127
+ justify-content: center;
128
+ }
129
+
130
+ .ranking-modal-content {
131
+ background: white;
132
+ border-radius: 15px;
133
+ padding: 30px;
134
+ max-width: 600px;
135
+ width: 90%;
136
+ max-height: 80vh;
137
+ overflow-y: auto;
138
+ box-shadow: 0 10px 40px rgba(0, 0, 0, 0.3);
139
+ }
140
+
141
+ .ranking-modal-content h2 {
142
+ color: #2c3e50;
143
+ margin-bottom: 10px;
144
+ text-align: center;
145
+ }
146
+
147
+ .ranking-subtitle {
148
+ color: #7f8c8d;
149
+ text-align: center;
150
+ margin-bottom: 30px;
151
+ }
152
+
153
+ .ranking-category {
154
+ margin-bottom: 25px;
155
+ padding: 20px;
156
+ background: #f8f9fa;
157
+ border-radius: 10px;
158
+ border: 2px solid #e9ecef;
159
+ }
160
+
161
+ .ranking-category h3 {
162
+ color: #2c3e50;
163
+ margin-bottom: 8px;
164
+ font-size: 1.1rem;
165
+ }
166
+
167
+ .ranking-category-description {
168
+ color: #6c757d;
169
+ font-size: 0.9rem;
170
+ margin-bottom: 15px;
171
+ }
172
+
173
+ .ranking-criteria {
174
+ font-size: 0.85rem;
175
+ color: #6c757d;
176
+ margin-bottom: 15px;
177
+ padding-left: 20px;
178
+ }
179
+
180
+ .ranking-criteria li {
181
+ margin-bottom: 5px;
182
+ }
183
+
184
+ .ranking-stars {
185
+ display: flex;
186
+ gap: 10px;
187
+ justify-content: center;
188
+ margin-top: 10px;
189
+ }
190
+
191
+ .ranking-star {
192
+ font-size: 30px;
193
+ color: #ddd;
194
+ cursor: pointer;
195
+ transition: all 0.2s ease;
196
+ }
197
+
198
+ .ranking-star:hover,
199
+ .ranking-star.hover {
200
+ color: #ffd700;
201
+ transform: scale(1.1);
202
+ }
203
+
204
+ .ranking-star.selected {
205
+ color: #ffd700;
206
+ }
207
+
208
+ .ranking-comments {
209
+ margin: 20px 0;
210
+ }
211
+
212
+ .ranking-comments label {
213
+ display: block;
214
+ color: #2c3e50;
215
+ margin-bottom: 8px;
216
+ font-weight: 500;
217
+ }
218
+
219
+ .ranking-comments textarea {
220
+ width: 100%;
221
+ padding: 10px;
222
+ border: 2px solid #e9ecef;
223
+ border-radius: 8px;
224
+ font-family: inherit;
225
+ resize: vertical;
226
+ }
227
+
228
+ .ranking-actions {
229
+ display: flex;
230
+ gap: 15px;
231
+ justify-content: flex-end;
232
+ margin-top: 20px;
233
+ }
234
+
235
+ .btn-primary, .btn-secondary {
236
+ padding: 10px 24px;
237
+ border: none;
238
+ border-radius: 8px;
239
+ font-size: 1rem;
240
+ cursor: pointer;
241
+ transition: all 0.3s ease;
242
+ font-weight: 500;
243
+ }
244
+
245
+ .btn-primary {
246
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
247
+ color: white;
248
+ }
249
+
250
+ .btn-primary:hover:not(:disabled) {
251
+ transform: translateY(-2px);
252
+ box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4);
253
+ }
254
+
255
+ .btn-primary:disabled {
256
+ background: #6c757d;
257
+ cursor: not-allowed;
258
+ }
259
+
260
+ .btn-secondary {
261
+ background: #e9ecef;
262
+ color: #495057;
263
+ }
264
+
265
+ .btn-secondary:hover {
266
+ background: #dee2e6;
267
+ }
268
+
269
+ .ranking-trigger-btn:hover {
270
+ transform: translateY(-2px) scale(1.05);
271
+ box-shadow: 0 6px 20px rgba(102, 126, 234, 0.6) !important;
272
+ }
273
+
274
+ @media (max-width: 600px) {
275
+ .ranking-modal-content {
276
+ padding: 20px;
277
+ }
278
+
279
+ .ranking-star {
280
+ font-size: 24px;
281
+ }
282
+
283
+ .ranking-trigger-btn {
284
+ bottom: 70px !important; /* inline styles set in createRankingUI() would otherwise win */
285
+ padding: 8px 16px !important;
286
+ font-size: 12px !important;
287
+ }
288
+ }
289
+ `;
290
+
291
+ document.head.appendChild(styles);
292
+ document.body.appendChild(modal);
293
+ document.body.appendChild(triggerButton);
294
+
295
+ this.populateCategories();
296
+ }
297
+
298
+ populateCategories() {
299
+ const container = document.getElementById('ranking-categories');
300
+ container.innerHTML = '';
301
+
302
+ this.rankingCategories.forEach(category => {
303
+ const categoryDiv = document.createElement('div');
304
+ categoryDiv.className = 'ranking-category';
305
+ categoryDiv.dataset.categoryId = category.id;
306
+
307
+ const criteriaHtml = category.criteria.map(c => `<li>${c}</li>`).join('');
308
+
309
+ categoryDiv.innerHTML = `
310
+ <h3>${category.name}</h3>
311
+ <p class="ranking-category-description">${category.description}</p>
312
+ <ul class="ranking-criteria">${criteriaHtml}</ul>
313
+ <div class="ranking-stars" data-category="${category.id}">
314
+ ${[1, 2, 3, 4, 5].map(i =>
315
+ `<span class="ranking-star" data-rating="${i}">★</span>`
316
+ ).join('')}
317
+ </div>
318
+ `;
319
+
320
+ container.appendChild(categoryDiv);
321
+ });
322
+
323
+ // Setup star interactions
324
+ this.setupStarInteractions();
325
+ }
326
+
327
+ setupStarInteractions() {
328
+ const starContainers = document.querySelectorAll('.ranking-stars');
329
+
330
+ starContainers.forEach(container => {
331
+ const stars = container.querySelectorAll('.ranking-star');
332
+ const categoryId = container.dataset.category;
333
+
334
+ stars.forEach((star, index) => {
335
+ star.addEventListener('mouseenter', () => {
336
+ this.highlightStars(stars, index + 1);
337
+ });
338
+
339
+ star.addEventListener('click', () => {
340
+ this.selectRating(categoryId, index + 1);
341
+ this.markStarsAsSelected(stars, index + 1);
342
+ this.updateSubmitButton();
343
+ });
344
+ });
345
+
346
+ container.addEventListener('mouseleave', () => {
347
+ const currentRating = this.getCurrentRating(categoryId);
348
+ if (currentRating > 0) {
349
+ this.markStarsAsSelected(stars, currentRating);
350
+ } else {
351
+ this.highlightStars(stars, 0);
352
+ }
353
+ });
354
+ });
355
+ }
356
+
357
+ highlightStars(stars, count) {
358
+ stars.forEach((star, index) => {
359
+ if (index < count) {
360
+ star.classList.add('hover');
361
+ } else {
362
+ star.classList.remove('hover');
363
+ }
364
+ });
365
+ }
366
+
367
+ markStarsAsSelected(stars, count) {
368
+ stars.forEach((star, index) => {
369
+ if (index < count) {
370
+ star.classList.add('selected');
371
+ star.classList.remove('hover');
372
+ } else {
373
+ star.classList.remove('selected');
374
+ star.classList.remove('hover');
375
+ }
376
+ });
377
+ }
378
+
379
+ selectRating(categoryId, rating) {
380
+ if (!this.currentRound) {
381
+ this.currentRound = {
382
+ timestamp: Date.now(),
383
+ ratings: {},
384
+ comments: ''
385
+ };
386
+ }
387
+
388
+ this.currentRound.ratings[categoryId] = rating;
389
+ }
390
+
391
+ getCurrentRating(categoryId) {
392
+ return this.currentRound?.ratings[categoryId] || 0;
393
+ }
394
+
395
+ setupEventListeners() {
396
+ const modal = document.getElementById('ranking-modal');
397
+ const triggerBtn = document.getElementById('ranking-trigger-btn');
398
+ const skipBtn = document.getElementById('skip-ranking-btn');
399
+ const submitBtn = document.getElementById('submit-ranking-btn');
400
+ const commentsInput = document.getElementById('ranking-comments-input');
401
+
402
+ // Show modal
403
+ triggerBtn.addEventListener('click', () => {
404
+ this.showRankingModal();
405
+ });
406
+
407
+ // Skip ranking
408
+ skipBtn.addEventListener('click', () => {
409
+ this.hideRankingModal();
410
+ this.currentRound = null;
411
+ });
412
+
413
+ // Submit ranking
414
+ submitBtn.addEventListener('click', () => {
415
+ this.submitRanking();
416
+ });
417
+
418
+ // Update comments
419
+ commentsInput.addEventListener('input', (e) => {
420
+ if (this.currentRound) {
421
+ this.currentRound.comments = e.target.value;
422
+ }
423
+ });
424
+
425
+ // Close modal on background click
426
+ modal.addEventListener('click', (e) => {
427
+ if (e.target === modal) {
428
+ this.hideRankingModal();
429
+ }
430
+ });
431
+
432
+ // Listen for round completion events
433
+ document.addEventListener('gameRoundComplete', (event) => {
434
+ this.onRoundComplete(event.detail);
435
+ });
436
+ }
437
+
438
+ updateSubmitButton() {
439
+ const submitBtn = document.getElementById('submit-ranking-btn');
440
+ const allRated = this.rankingCategories.every(category =>
441
+ this.getCurrentRating(category.id) > 0
442
+ );
443
+
444
+ submitBtn.disabled = !allRated;
445
+ }
446
+
447
+ showRankingModal() {
448
+ const modal = document.getElementById('ranking-modal');
449
+ modal.classList.add('active');
450
+
451
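+ // Start a fresh rating so stored state matches the cleared UI below,
452
+ // keeping the timestamp and round details captured by onRoundComplete()
453
+ this.currentRound = {
454
+ timestamp: this.currentRound?.timestamp ?? Date.now(),
455
+ ratings: {},
456
+ comments: '',
457
+ roundDetails: this.currentRound?.roundDetails
458
+ };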
459
+
460
+ // Clear previous selections
461
+ this.resetUI();
462
+ }
463
+
464
+ hideRankingModal() {
465
+ const modal = document.getElementById('ranking-modal');
466
+ modal.classList.remove('active');
467
+ }
468
+
469
+ resetUI() {
470
+ // Clear all star selections
471
+ document.querySelectorAll('.ranking-star').forEach(star => {
472
+ star.classList.remove('selected', 'hover');
473
+ });
474
+
475
+ // Clear comments
476
+ document.getElementById('ranking-comments-input').value = '';
477
+
478
+ // Disable submit button
479
+ document.getElementById('submit-ranking-btn').disabled = true;
480
+ }
481
+
482
+ submitRanking() {
483
+ if (!this.currentRound) return;
484
+
485
+ // Add metadata
486
+ this.currentRound.submittedAt = Date.now();
487
+ this.currentRound.modelId = window.testGameRunner?.modelConfig?.modelId || 'unknown';
488
+
489
+ // Calculate average rating
490
+ const ratings = Object.values(this.currentRound.ratings);
491
+ this.currentRound.averageRating = ratings.reduce((a, b) => a + b, 0) / ratings.length;
492
+
493
+ // Save ranking
494
+ this.rankings.rounds.push(this.currentRound);
495
+
496
+ // Dispatch event for test runner
497
+ document.dispatchEvent(new CustomEvent('userRanking', {
498
+ detail: this.currentRound
499
+ }));
500
+
501
+ // Show confirmation
502
+ this.showConfirmation();
503
+
504
+ // Reset
505
+ this.hideRankingModal();
506
+ this.currentRound = null;
507
+
508
+ console.log('Ranking submitted:', this.rankings);
509
+ }
510
+
511
+ showConfirmation() {
512
+ const confirmation = document.createElement('div');
513
+ confirmation.style.cssText = `
514
+ position: fixed;
515
+ bottom: 100px;
516
+ left: 50%;
517
+ transform: translateX(-50%);
518
+ background: #28a745;
519
+ color: white;
520
+ padding: 15px 30px;
521
+ border-radius: 8px;
522
+ box-shadow: 0 4px 15px rgba(40, 167, 69, 0.4);
523
+ z-index: 1001;
524
+ animation: slideInUp 0.3s ease;
525
+ `;
526
+ confirmation.textContent = '✓ Thank you for your feedback!';
527
+
528
+ document.body.appendChild(confirmation);
529
+
530
+ setTimeout(() => {
531
+ confirmation.style.animation = 'slideOutDown 0.3s ease';
532
+ setTimeout(() => confirmation.remove(), 300);
533
+ }, 2000);
534
+ }
535
+
536
+ onRoundComplete(roundDetails) {
537
+ // Store round details for context
538
+ if (!this.currentRound) {
539
+ this.currentRound = {
540
+ timestamp: Date.now(),
541
+ ratings: {},
542
+ comments: '',
543
+ roundDetails: roundDetails
544
+ };
545
+ } else {
546
+ this.currentRound.roundDetails = roundDetails;
547
+ }
548
+
549
+ // Show ranking trigger button
550
+ const triggerBtn = document.getElementById('ranking-trigger-btn');
551
+ triggerBtn.style.display = 'block';
552
+
553
+ // Auto-show modal after a short delay (optional)
554
+ if (window.testGameRunner?.modelConfig?.autoShowRanking) {
555
+ setTimeout(() => this.showRankingModal(), 1500);
556
+ }
557
+ }
558
+
559
+ exportRankings() {
560
+ const exportData = {
561
+ ...this.rankings,
562
+ exportedAt: new Date().toISOString(),
563
+ modelId: window.testGameRunner?.modelConfig?.modelId || 'unknown'
564
+ };
565
+
566
+ return exportData;
567
+ }
568
+
569
+ getRankingSummary() {
570
+ if (this.rankings.rounds.length === 0) {
571
+ return null;
572
+ }
573
+
574
+ const summary = {
575
+ totalRounds: this.rankings.rounds.length,
576
+ averageRatings: {},
577
+ categoryBreakdown: {},
578
+ comments: []
579
+ };
580
+
581
+ // Calculate average ratings per category
582
+ this.rankingCategories.forEach(category => {
583
+ const ratings = this.rankings.rounds
584
+ .map(r => r.ratings[category.id])
585
+ .filter(r => r !== undefined);
586
+
587
+ if (ratings.length > 0) {
588
+ summary.averageRatings[category.id] =
589
+ ratings.reduce((a, b) => a + b, 0) / ratings.length;
590
+
591
+ // Distribution of ratings
592
+ summary.categoryBreakdown[category.id] = {
593
+ 1: ratings.filter(r => r === 1).length,
594
+ 2: ratings.filter(r => r === 2).length,
595
+ 3: ratings.filter(r => r === 3).length,
596
+ 4: ratings.filter(r => r === 4).length,
597
+ 5: ratings.filter(r => r === 5).length
598
+ };
599
+ }
600
+ });
601
+
602
+ // Collect all comments
603
+ summary.comments = this.rankings.rounds
604
+ .filter(r => r.comments)
605
+ .map(r => ({
606
+ timestamp: r.timestamp,
607
+ comment: r.comments,
608
+ averageRating: r.averageRating
609
+ }));
610
+
611
+ return summary;
612
+ }
613
+ }
614
+
615
+ // Initialize when in test mode
616
+ window.addEventListener('DOMContentLoaded', () => {
617
+ const urlParams = new URLSearchParams(window.location.search);
618
+ if (urlParams.get('testMode') === 'true') {
619
+ window.userRankingInterface = new UserRankingInterface();
620
+
621
+ // Add CSS animation keyframes
622
+ const animationStyles = document.createElement('style');
623
+ animationStyles.textContent = `
624
+ @keyframes slideInUp {
625
+ from {
626
+ transform: translate(-50%, 100%);
627
+ opacity: 0;
628
+ }
629
+ to {
630
+ transform: translate(-50%, 0);
631
+ opacity: 1;
632
+ }
633
+ }
634
+
635
+ @keyframes slideOutDown {
636
+ from {
637
+ transform: translate(-50%, 0);
638
+ opacity: 1;
639
+ }
640
+ to {
641
+ transform: translate(-50%, 100%);
642
+ opacity: 0;
643
+ }
644
+ }
645
+ `;
646
+ document.head.appendChild(animationStyles);
647
+ }
648
+ });
649
+
650
+ export { UserRankingInterface };
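`getRankingSummary` averages each category's 1 to 5 star ratings across submitted rounds and counts how often each star value was given. A DOM-free sketch of that aggregation follows, so the arithmetic can be checked outside the browser; `summarizeRatings` is a hypothetical name, the category IDs are the four from `rankingCategories`, and the comment-collection step is left out.

```javascript
// DOM-free sketch of the per-category aggregation in getRankingSummary().
const CATEGORY_IDS = ['word_selection', 'passage_quality', 'hint_helpfulness', 'overall_experience'];

function summarizeRatings(rounds) {
  if (rounds.length === 0) return null;
  const summary = { totalRounds: rounds.length, averageRatings: {}, categoryBreakdown: {} };

  for (const id of CATEGORY_IDS) {
    // Skip rounds where this category was never rated.
    const ratings = rounds.map(r => r.ratings[id]).filter(r => r !== undefined);
    if (ratings.length === 0) continue;

    summary.averageRatings[id] = ratings.reduce((a, b) => a + b, 0) / ratings.length;

    // Count how many times each star value (1-5) was given.
    const breakdown = { 1: 0, 2: 0, 3: 0, 4: 0, 5: 0 };
    for (const r of ratings) breakdown[r] += 1;
    summary.categoryBreakdown[id] = breakdown;
  }
  return summary;
}

// Two hypothetical submitted rounds.
const rounds = [
  { ratings: { word_selection: 4, passage_quality: 5, hint_helpfulness: 3, overall_experience: 4 } },
  { ratings: { word_selection: 2, passage_quality: 5, hint_helpfulness: 5, overall_experience: 4 } },
];
const summary = summarizeRatings(rounds);
console.log(summary.averageRatings.word_selection);      // 3
console.log(summary.categoryBreakdown.passage_quality[5]); // 2
```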
src/welcomeOverlay.js CHANGED
@@ -43,7 +43,7 @@ class WelcomeOverlay {
43
 
44
  <div class="welcome-content">
45
  <p>
46
- <strong>How to play:</strong> Read the passage, fill in the blanks, and use hints or chat help (💬) if needed. Progress through levels as you improve.
47
  </p>
48
 
49
  <p>
@@ -51,7 +51,7 @@ class WelcomeOverlay {
51
  </p>
52
 
53
  <p style="margin-bottom: 0;">
54
- <strong>AI assistance:</strong> Powered by Google's Gemma 3 model via OpenRouter for intelligent word selection and contextual hints.
55
  </p>
56
  </div>
57
 
 
43
 
44
  <div class="welcome-content">
45
  <p>
46
+ <strong>How to play:</strong> Fill in the blanks in each passage. Complete 2 passages per round. Pass 2 rounds to advance to the next level.
47
  </p>
48
 
49
  <p>
 
51
  </p>
52
 
53
  <p style="margin-bottom: 0;">
54
+ <strong>AI assistance:</strong> Powered by Google's Gemma models via OpenRouter: Gemma-3-27b for contextual hints and Gemma-3-12b for word selection and processing.
55
  </p>
56
  </div>
57
 
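The new welcome text describes the progression rule: 2 passages per round and 2 passed rounds per level, with the header showing 1/2 and 2/2. A minimal sketch of that bookkeeping is below. `ProgressTracker` and its methods are hypothetical, and the sketch assumes the pass/fail decision for a round is made elsewhere and that a failed round neither counts toward nor resets the level's progress.

```javascript
// Hypothetical progression tracker for the rule stated in the welcome text:
// 2 passages per round, 2 passed rounds per level.
const PASSAGES_PER_ROUND = 2;
const ROUNDS_PER_LEVEL = 2;

class ProgressTracker {
  constructor() {
    this.level = 1;
    this.passageIndex = 0; // passages completed in the current round
    this.roundsPassed = 0; // passed rounds at the current level
  }

  // Header label such as "1/2" for the passage currently being played.
  passageLabel() {
    return `${this.passageIndex + 1}/${PASSAGES_PER_ROUND}`;
  }

  // Call when a passage is finished. `roundPassed` is only read on the last
  // passage of a round (how a round is judged "passed" is an assumption).
  completePassage(roundPassed) {
    this.passageIndex += 1;
    if (this.passageIndex < PASSAGES_PER_ROUND) return;

    this.passageIndex = 0;
    if (roundPassed) this.roundsPassed += 1;
    if (this.roundsPassed >= ROUNDS_PER_LEVEL) {
      this.level += 1;
      this.roundsPassed = 0;
    }
  }
}

const p = new ProgressTracker();
console.log(p.passageLabel()); // "1/2"
p.completePassage();
console.log(p.passageLabel()); // "2/2"
p.completePassage(true);       // round 1 passed
p.completePassage();
p.completePassage(true);       // round 2 passed, advance
console.log(p.level);          // 2
```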
test-prompts-lm-studio.md DELETED
@@ -1,262 +0,0 @@
1
- # Gemma-3-27b Model Integration Guide for Cloze Reader
2
-
3
- ## Part 1: Step-by-Step API Request Processing
4
-
5
- ### 1. Initial Request Reception
6
- When the Cloze Reader application makes an API request through OpenRouter:
7
-
8
- 1. **Authentication**: Verify Bearer token from `Authorization` header
9
- 2. **Request Type Detection**: Identify the operation type based on prompt content
10
- 3. **Parameter Extraction**: Parse temperature, max_tokens, and message content
11
- 4. **Rate Limiting Check**: Ensure request complies with free tier limits
12
-
13
- ### 2. Word Selection Request Processing
14
-
15
- **When Temperature = 0.3 and prompt contains "CLOZE DELETION PRINCIPLES":**
16
-
17
- 1. **Parse Passage**: Extract the text passage from the system message
18
- 2. **Identify Difficulty Level**:
19
- - Level 1-2: Target 4-7 letter words (easy vocabulary)
20
- - Level 3-4: Target 4-10 letter words (medium difficulty)
21
- - Level 5+: Target 5-14 letter words (challenging vocabulary)
22
- 3. **Select Words**:
23
- - Identify significant vocabulary words (nouns, verbs, adjectives, adverbs)
24
- - Avoid proper nouns, numbers, articles, and function words
25
- - Ensure words are contextually important for comprehension
26
- 4. **Format Response**: Return JSON array of selected words
27
- 5. **Validate**: Ensure all words exist in the original passage
28
-
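Step 5 and the level ranges above can be combined into one filter over the model's reply. The sketch below assumes the reply has already been parsed into an array of strings; `lengthRangeForLevel` and `validateSelectedWords` are hypothetical names, and matching is exact and case-sensitive to follow "appear EXACTLY as written".

```javascript
// Sketch of the word-selection "Validate" step. Length ranges are taken
// from the difficulty levels above; function names are hypothetical.
function lengthRangeForLevel(level) {
  if (level <= 2) return [4, 7];  // levels 1-2: easy vocabulary
  if (level <= 4) return [4, 10]; // levels 3-4: medium
  return [5, 14];                 // level 5+: challenging
}

function validateSelectedWords(words, passage, level) {
  const [min, max] = lengthRangeForLevel(level);
  // Split the passage into words so "market" matches but "mark" does not.
  const passageWords = new Set(passage.match(/[A-Za-z']+/g) || []);
  return words.filter(w =>
    typeof w === 'string' &&
    passageWords.has(w) &&
    w.length >= min &&
    w.length <= max
  );
}

const passage =
  'The old woman lived in a small cottage by the forest. ' +
  'Every morning, she would walk to the village market to buy fresh bread.';

console.log(validateSelectedWords(['cottage', 'castle', 'by', 'morning'], passage, 1));
// [ 'cottage', 'morning' ]
```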
29
- ### 3. Batch Processing Request
30
-
31
- **When Temperature = 0.5 and prompt contains two passages:**
32
-
33
- 1. **Parse Both Passages**: Extract passage1 and passage2 from the prompt
34
- 2. **Process Each Passage**:
35
- - Apply word selection logic for each based on difficulty level
36
- - Generate one-sentence contextualization for each book
37
- 3. **Format Response**: Return structured JSON with both passages' data
38
- 4. **Ensure Consistency**: Words must match exactly as they appear in passages
39
-
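A similar check can be applied to the batch reply: confirm that both `passage1` and `passage2` carry a `words` array and a `context` string, and that every word occurs verbatim in its passage. This is a sketch with a hypothetical `validateBatchResponse` helper, using the two sample passages from the batch test prompt; it throws on the first problem rather than trying to repair the reply.

```javascript
// Sketch of a check for the batch-processing reply described above.
// validateBatchResponse is a hypothetical helper, not part of the app.
function tokenSet(text) {
  // Split the passage into words so matches are exact, not substrings.
  return new Set(text.match(/[A-Za-z']+/g) || []);
}

function validateBatchResponse(raw, passages) {
  const data = JSON.parse(raw);
  ['passage1', 'passage2'].forEach((key, i) => {
    const entry = data[key];
    if (!entry || !Array.isArray(entry.words) || typeof entry.context !== 'string') {
      throw new Error(`${key}: expected { words: [...], context: "..." }`);
    }
    const words = tokenSet(passages[i]);
    for (const word of entry.words) {
      if (!words.has(word)) {
        throw new Error(`${key}: "${word}" does not appear in the passage`);
      }
    }
  });
  return data;
}

const passages = [
  'It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.',
  'It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness.',
];

const reply = JSON.stringify({
  passage1: { words: ['fortune'], context: 'One sentence about the book.' },
  passage2: { words: ['wisdom'], context: 'One sentence about the book.' },
});

console.log(validateBatchResponse(reply, passages).passage2.words); // [ 'wisdom' ]
```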
40
- ### 4. Contextualization Request
41
-
42
- **When Temperature = 0.2 and prompt asks for book context:**
43
-
44
- 1. **Extract Book Information**: Parse title and author from prompt
45
- 2. **Generate Context**: Create one factual sentence about:
46
- - Type of work (novel, short story, essay)
47
- - Historical period when written
48
- - Literary significance or genre
49
- 3. **Keep Concise**: Limit to 80 tokens maximum
50
- 4. **Avoid Speculation**: Only include verifiable information
51
-
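Only the format of a contextualization reply can be checked mechanically; whether it is factual still needs a human. The sketch below flags replies that run to 20 words or more, contain more than one sentence, or include markdown characters, matching the "one factual sentence, under 20 words" instruction in the contextualization test prompt. `checkContext` is a hypothetical name, and the sentence count is a rough heuristic that abbreviations such as "Mr." will throw off.

```javascript
// Format check for a contextualization reply; factual accuracy is not checked.
// checkContext is a hypothetical helper.
function checkContext(text) {
  const trimmed = text.trim();
  const wordCount = trimmed.split(/\s+/).filter(Boolean).length;
  // Rough sentence count: terminal punctuation followed by a space or the end.
  const sentences = (trimmed.match(/[.!?](?=\s|$)/g) || []).length;
  const hasMarkdown = /[*_`#]/.test(trimmed);
  return { ok: wordCount < 20 && sentences <= 1 && !hasMarkdown, wordCount, sentences };
}

console.log(checkContext('This is a collection of detective stories by Arthur Conan Doyle.').ok); // true
```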
52
- ### 5. Chat Hint Request
53
-
54
- **When Temperature = 0.6 and prompt includes "word puzzles":**
55
-
56
- 1. **Identify Question Type**:
57
- - `part_of_speech`: Grammar category identification
58
- - `sentence_role`: Function in the sentence
59
- - `word_category`: Abstract/concrete classification
60
- - `synonym`: Alternative word suggestion
61
- 2. **Parse Target Word**: Extract the hidden word (NEVER reveal it)
62
- 3. **Generate Appropriate Hint**:
63
- - Follow exact format requested
64
- - Stay within the 50-token limit
65
- - Use plain text only, no formatting
66
- 4. **Validate**: Ensure hint doesn't contain or spell out the target word
67
-
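The validation in step 4 can be approximated with two regular expressions: one for the target as a whole word, one for the target spelled out with separators between its letters. This is a sketch with hypothetical helper names; it does not catch inflected forms (a hint saying "walking" for the target "walk"), which would need stemming or a looser substring match at the cost of false positives.

```javascript
// Sketch of the hint "Validate" step. Helper names are hypothetical.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function hintLeaksTarget(hint, target) {
  // 1. The target itself as a whole word, ignoring case ("Walk", "WALK").
  if (new RegExp(`\\b${escapeRegExp(target)}\\b`, 'i').test(hint)) return true;

  // 2. The target spelled out with separators between letters ("w-a-l-k", "W A L K").
  const spelled = target.split('').map(c => escapeRegExp(c)).join('[\\s.,-]+');
  return new RegExp(spelled, 'i').test(hint);
}

console.log(hintLeaksTarget("It's a verb", 'walk')); // false
console.log(hintLeaksTarget('Try W-A-L-K', 'walk')); // true
```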
68
- ### 6. Response Formatting Rules
69
-
70
- 1. **JSON Responses**:
71
- - Word selection: Clean array format `["word1", "word2"]`
72
- - Batch processing: Nested object structure
73
- - No markdown code blocks unless specifically requested
74
-
75
- 2. **Text Responses**:
76
- - Contextualization: Single sentence, no formatting
77
- - Chat hints: Plain text, follows exact format requested
78
-
79
- 3. **Error Handling**:
80
- - Invalid requests: Return graceful error messages
81
- - Missing parameters: Use sensible defaults
82
- - Malformed input: Attempt to parse intent
83
-
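The "no markdown code blocks" rule is not always followed, so a client can defensively unwrap a fenced reply and fall back to a bracketed span inside surrounding prose, in the spirit of "attempt to parse intent". The `parseModelJson` helper below is a hypothetical sketch, not code from the app.

```javascript
// Sketch: parse a JSON reply that may arrive wrapped in a markdown code
// fence or surrounded by prose. parseModelJson is a hypothetical helper.
function parseModelJson(reply) {
  let text = reply.trim();

  // Unwrap a fence of three backticks with an optional "json" tag.
  const fenced = text.match(/^`{3}(?:json)?\s*([\s\S]*?)\s*`{3}$/i);
  if (fenced) text = fenced[1];

  try {
    return JSON.parse(text);
  } catch {
    // Fall back to the widest [...] or {...} span starting at the first bracket.
    const embedded = text.match(/\[[\s\S]*\]|\{[\s\S]*\}/);
    if (embedded) return JSON.parse(embedded[0]);
    throw new Error('No JSON found in model reply');
  }
}

const fence = '`'.repeat(3);
console.log(parseModelJson(`${fence}json\n["cottage", "market"]\n${fence}`)); // [ 'cottage', 'market' ]
```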
84
- ## Part 2: LM Studio Testing Configuration
85
-
86
- ### System Prompt
87
- ```
88
- You are a specialized AI assistant for the Cloze Reader educational application. You help create vocabulary exercises by selecting appropriate words from text passages and providing contextual hints without revealing answers. Always respond in the exact format requested, using plain JSON or text as specified. Never use markdown formatting unless explicitly requested.
89
- ```
90
-
91
- ### Temperature Settings
92
- - **Word Selection**: 0.3
93
- - **Batch Processing**: 0.5
94
- - **Contextualization**: 0.2
95
- - **Chat Hints**: 0.6
96
-
97
- ### Response Length Limits
98
- - **Word Selection**: 100 tokens
99
- - **Batch Processing**: 800 tokens
100
- - **Contextualization**: 80 tokens
101
- - **Chat Hints**: 50 tokens
102
-
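The temperature and length tables map naturally onto a per-task settings object merged into an OpenAI-style chat completion body, which is the request shape OpenRouter accepts. In this sketch the model assignments follow the commit message (Gemma-3-27b for hints, Gemma-3-12b for word selection and processing); the exact OpenRouter model IDs, the choice of 12b for contextualization, and the `buildRequestBody` name are assumptions.

```javascript
// Per-task settings from the tables above. Model IDs are assumed, not confirmed.
const TASKS = {
  wordSelection:     { model: 'google/gemma-3-12b-it', temperature: 0.3, max_tokens: 100 },
  batchProcessing:   { model: 'google/gemma-3-12b-it', temperature: 0.5, max_tokens: 800 },
  contextualization: { model: 'google/gemma-3-12b-it', temperature: 0.2, max_tokens: 80 },
  chatHints:         { model: 'google/gemma-3-27b-it', temperature: 0.6, max_tokens: 50 },
};

// Build a chat completion request body; does not send anything.
function buildRequestBody(task, messages) {
  const settings = TASKS[task];
  if (!settings) throw new Error(`Unknown task: ${task}`);
  return { ...settings, messages };
}

console.log(JSON.stringify(buildRequestBody('chatHints', [
  { role: 'user', content: "The target word is 'walk'. ..." },
])));
```

Keeping the numbers in one table makes it harder for the tested values and the values the app actually sends to drift apart.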
103
- ### Test Prompts
104
-
105
- #### 1. Word Selection Test (Level 1-2 Easy)
106
- ```json
107
- {
108
- "messages": [
109
- {
110
- "role": "system",
111
- "content": "CLOZE DELETION PRINCIPLES:\n- Select words that require understanding context and vocabulary to identify\n- Choose words essential for comprehension that test language ability\n- Target words where deletion creates meaningful cognitive gaps\n\nFrom the following passage, select exactly 1 word that is important for reading comprehension.\n\nDifficulty level 1-2: Focus on easier vocabulary (4-7 letters) like common nouns, simple verbs, and basic adjectives.\n\nRETURN ONLY A JSON ARRAY OF YOUR SELECTED WORDS. Select words that appear EXACTLY as written in the passage.\n\nPassage:\nThe old woman lived in a small cottage by the forest. Every morning, she would walk to the village market to buy fresh bread."
112
- }
113
- ],
114
- "temperature": 0.3,
115
- "max_tokens": 100
116
- }
117
- ```
118
-
119
- **Expected Output Schema:**
120
- ```json
121
- {
122
- "type": "array",
123
- "items": {
124
- "type": "string",
125
- "minLength": 4,
126
- "maxLength": 7
127
- },
128
- "minItems": 1,
129
- "maxItems": 1
130
- }
131
- ```
132
-
133
- #### 2. Batch Processing Test (Level 3-4 Medium)
134
- ```json
135
- {
136
- "messages": [
137
- {
138
- "role": "system",
139
- "content": "Process these two passages for a cloze reading exercise:\n\nPASSAGE 1 (Pride and Prejudice by Jane Austen):\nIt is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.\n\nPASSAGE 2 (A Tale of Two Cities by Charles Dickens):\nIt was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness.\n\nFor each passage:\n1. Select 1 word for difficulty level 3-4 (medium vocabulary, 4-10 letters)\n2. Write ONE sentence about the book/author\n\nReturn a JSON object with this structure:\n{\n \"passage1\": {\n \"words\": [selected words],\n \"context\": \"One sentence about the book\"\n },\n \"passage2\": {\n \"words\": [selected words],\n \"context\": \"One sentence about the book\"\n }\n}"
140
- }
141
- ],
142
- "temperature": 0.5,
143
- "max_tokens": 800
144
- }
145
- ```
146
-
147
- **Expected Output Schema:**
148
- ```json
149
- {
150
- "type": "object",
151
- "properties": {
152
- "passage1": {
153
- "type": "object",
154
- "properties": {
155
- "words": {
156
- "type": "array",
157
- "items": { "type": "string" },
158
- "minItems": 1,
159
- "maxItems": 1
160
- },
161
- "context": {
162
- "type": "string",
163
- "maxLength": 150
164
- }
165
- },
166
- "required": ["words", "context"]
167
- },
168
- "passage2": {
169
- "type": "object",
170
- "properties": {
171
- "words": {
172
- "type": "array",
173
- "items": { "type": "string" },
174
- "minItems": 1,
175
- "maxItems": 1
176
- },
177
- "context": {
178
- "type": "string",
179
- "maxLength": 150
180
- }
181
- },
182
- "required": ["words", "context"]
183
- }
184
- },
185
- "required": ["passage1", "passage2"]
186
- }
187
- ```
188
-
189
- #### 3. Contextualization Test
190
- ```json
191
- {
192
- "messages": [
193
- {
194
- "role": "user",
195
- "content": "Write one factual sentence about 'The Adventures of Sherlock Holmes' by Arthur Conan Doyle. Focus on what type of work it is, when it was written, or its historical significance. Keep it under 20 words and conversational."
196
- }
197
- ],
198
- "temperature": 0.2,
199
- "max_tokens": 80
200
- }
201
- ```
202
-
203
- **Expected Output:** Plain text string, no JSON structure required.
204
-
205
- #### 4. Chat Hint Test (Part of Speech)
206
- ```json
207
- {
208
- "messages": [
209
- {
210
- "role": "system",
211
- "content": "You provide clues for word puzzles. You will be told the target word that players need to guess, but you must NEVER mention, spell, or reveal that word in your response. Follow the EXACT format requested. Be concise and direct about the target word without revealing it. Use plain text only - no bold, italics, asterisks, or markdown formatting. Stick to word limits."
212
- },
213
- {
214
- "role": "user",
215
- "content": "The target word is 'walk'. The sentence is: 'Every morning, she would _____ to the village market to buy fresh bread.'\n\nQuestion type: part_of_speech\n\nIdentify what part of speech fits in this blank. Answer in 2-5 words. Format: 'It's a/an [part of speech]'"
216
- }
217
- ],
218
- "temperature": 0.6,
219
- "max_tokens": 50
220
- }
221
- ```
222
-
223
- **Expected Output:** Plain text following format "It's a/an [part of speech]"
224
-
225
- #### 5. Chat Hint Test (Synonym)
226
- ```json
227
- {
228
- "messages": [
229
- {
230
- "role": "system",
231
- "content": "You provide clues for word puzzles. You will be told the target word that players need to guess, but you must NEVER mention, spell, or reveal that word in your response. Follow the EXACT format requested. Be concise and direct about the target word without revealing it. Use plain text only - no bold, italics, asterisks, or markdown formatting. Stick to word limits."
232
- },
233
- {
234
- "role": "user",
235
- "content": "The target word is 'cottage'. The sentence is: 'The old woman lived in a small _____ by the forest.'\n\nQuestion type: synonym\n\nSuggest a different word that could replace the blank. Answer in 1-3 words only."
236
- }
237
- ],
238
- "temperature": 0.6,
239
- "max_tokens": 50
240
- }
241
- ```
242
-
243
- **Expected Output:** Plain text with 1-3 word synonym
244
-
245
- ### LM Studio Configuration
246
-
247
- 1. **Model Selection**: Load gemma-3-27b or equivalent model
248
- 2. **Context Length**: Set to at least 4096 tokens
249
- 3. **GPU Layers**: Maximize based on available VRAM
250
- 4. **Batch Size**: 512 for optimal performance
251
- 5. **Prompt Format**: Use the model's native chat template (Gemma uses `<start_of_turn>`/`<end_of_turn>` turns, not ChatML)
252
-
253
- ### Testing Checklist
254
-
255
- - [ ] Verify JSON responses are clean (no markdown blocks)
256
- - [ ] Check word selections match passage exactly
257
- - [ ] Ensure hints never reveal target words
258
- - [ ] Validate response stays within token limits
259
- - [ ] Test difficulty level word length constraints
260
- - [ ] Confirm batch processing handles both passages
261
- - [ ] Verify contextualization produces factual content
262
- - [ ] Test all four hint question types