daniel_whisper_finetune_tiny_v2

This model is a fine-tuned version of openai/whisper-tiny, trained on a small personal, single-speaker dataset (described under "Training and evaluation data"). It achieves the following results on the evaluation set:

  • Loss: 0.3584

Model description

This is a personal fine-tune of the Whisper tiny model, trained on approximately 1 hour of audio featuring Daniel Rosehill's voice. The training data includes domain-specific vocabulary focused on:

  • Technology and software development terminology
  • A few Hebrew words and phrases

This model was created as a proof of concept for fine-tuning Whisper models for personal use and improved transcription accuracy on domain-specific content.

Training Infrastructure

Fine-tuning was performed on Modal's cloud GPU infrastructure.

Converted Formats

In addition to the standard SafeTensors format, this repository includes converted model formats in the converted/ directory:

  • GGML format (converted/ggml/): For use with whisper.cpp

    • Cross-platform inference (desktop, mobile, edge devices)
    • Optimized for CPU and CUDA (NVIDIA GPU) acceleration
    • Compatible with iOS, Android, Raspberry Pi, and other platforms
  • CTranslate2 format (converted/ctranslate2/): For use with faster-whisper

    • Highly optimized inference engine (up to 4x faster than OpenAI's reference Whisper implementation, according to the faster-whisper project)
    • Excellent CPU and GPU (CUDA) support
    • Lower memory usage with 8-bit and 16-bit quantization
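As a rough illustration of how the CTranslate2 conversion might be used, the sketch below downloads only the converted/ctranslate2/ directory and transcribes one file with faster-whisper on CPU. The repository ID is taken from this model's page; the exact file layout inside that directory, the int8 compute type, and the helper names are assumptions, not something this card documents.

```python
from pathlib import Path

REPO_ID = "danielrosehill/daniel_whisper_finetune_tiny_v2"
# Directory named in this card; the files inside it are assumed to form
# a complete CTranslate2 model that faster-whisper can load directly.
CT2_SUBDIR = "converted/ctranslate2"


def format_segment(start: float, end: float, text: str) -> str:
    """Render one transcribed segment as '[0.00s -> 2.50s] text'."""
    return f"[{start:.2f}s -> {end:.2f}s] {text.strip()}"


def transcribe_with_ct2(audio_path: str, compute_type: str = "int8") -> list[str]:
    """Hypothetical helper: fetch the CTranslate2 files and transcribe on CPU."""
    # Imported lazily so format_segment works without these packages installed.
    from huggingface_hub import snapshot_download
    from faster_whisper import WhisperModel

    # Download only the CTranslate2 conversion, not the whole repository.
    local_repo = snapshot_download(
        repo_id=REPO_ID, allow_patterns=[f"{CT2_SUBDIR}/*"]
    )
    model = WhisperModel(
        str(Path(local_repo) / CT2_SUBDIR), device="cpu", compute_type=compute_type
    )

    # transcribe() returns a lazy generator of segments plus run metadata;
    # decoding happens as the generator is consumed.
    segments, _info = model.transcribe(audio_path, beam_size=5)
    return [format_segment(s.start, s.end, s.text) for s in segments]
```

Calling transcribe_with_ct2("recording.wav") (a placeholder file name) would return one formatted line per segment. Passing compute_type="float16" together with device="cuda" inside the helper is the usual variation for NVIDIA GPUs.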

Intended uses & limitations

This model is optimized for:

  • Transcribing Daniel Rosehill's voice
  • Technical and software development content
  • Mixed English with occasional Hebrew terms

Limitations:

  • Performance may degrade on voices significantly different from the training data
  • Limited to the vocabulary and accent patterns in the training set
  • Best suited for personal use rather than general-purpose transcription

Training and evaluation data

Training dataset consisted of approximately 1 hour of recorded audio featuring:

  • Technical discussions and software development content
  • Mixed English with occasional Hebrew vocabulary
  • Single speaker (Daniel Rosehill)

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 50
  • training_steps: 250
  • mixed_precision_training: Native AMP
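To make the schedule concrete, the following sketch re-derives the learning rate at a few steps from the values listed above: linear warmup over 50 steps to 1e-05, then linear decay to zero at step 250. It is a stand-alone re-implementation of the usual linear warmup/decay shape, not the Trainer's own code, and the function name is hypothetical. It also shows that the total train batch size of 16 equals 8 x 2, which would be consistent with training on a single device.

```python
def linear_schedule_lr(step, base_lr=1e-5, warmup_steps=50, total_steps=250):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay: ramp linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


per_device_batch = 8
gradient_accumulation_steps = 2
# Matches total_train_batch_size only if a single device was used (assumption).
effective_batch = per_device_batch * gradient_accumulation_steps

for step in (0, 25, 50, 150, 250):
    print(f"step {step:3d}: lr = {linear_schedule_lr(step):.2e}")
print("effective batch size:", effective_batch)
```

Under these values the peak rate is reached exactly at step 50, so the first evaluation checkpoint in the results coincides with the end of warmup.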

Training results

Training Loss   Epoch    Step   Validation Loss
1.1754          1.3158   50     0.7287
0.3072          2.6316   100    0.4515
0.1772          3.9474   150    0.3848
0.1181          5.2632   200    0.3649
0.1002          6.5789   250    0.3584
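A few quantities can be read off this table with simple arithmetic. The sketch below recovers the number of optimizer steps per epoch from the Epoch and Step columns, then computes the relative drop in validation loss and the improvement between checkpoints. The variable names are illustrative. The implied dataset size (steps per epoch times the total batch size of 16) is only an approximation, since the last batch of an epoch may be partial.

```python
# (training loss, epoch, step, validation loss), copied from the table above
rows = [
    (1.1754, 1.3158, 50, 0.7287),
    (0.3072, 2.6316, 100, 0.4515),
    (0.1772, 3.9474, 150, 0.3848),
    (0.1181, 5.2632, 200, 0.3649),
    (0.1002, 6.5789, 250, 0.3584),
]

# Step / epoch gives optimizer steps per epoch; every row should agree.
steps_per_epoch = {round(step / epoch) for _, epoch, step, _ in rows}

# Approximate training examples per epoch, assuming total batch size 16.
approx_examples = next(iter(steps_per_epoch)) * 16

val_losses = [val for *_, val in rows]
relative_drop = (val_losses[0] - val_losses[-1]) / val_losses[0]

# Improvement in validation loss between consecutive checkpoints.
improvements = [a - b for a, b in zip(val_losses, val_losses[1:])]

# Gap between final validation and training loss.
final_gap = rows[-1][3] - rows[-1][0]

print("steps per epoch:", steps_per_epoch)
print("approx. examples per epoch:", approx_examples)
print(f"validation loss drop: {relative_drop:.1%}")
print("per-checkpoint improvements:", [round(d, 4) for d in improvements])
print(f"final validation/training gap: {final_gap:.4f}")
```

The shrinking improvements suggest validation loss was close to a plateau by step 250, while training loss kept falling, which is the pattern one would expect from a small single-speaker dataset.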

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1

Model size: 37.8M params (Safetensors, tensor type F32)