Paper: VANPY: Voice Analysis Framework (arXiv:2502.17579)
This model combines the SpeechBrain ECAPA-TDNN speaker embedding model with an SVM classifier to predict speaker gender from audio input. The model was trained and evaluated on the VoxCeleb2, Mozilla Common Voice v10.0, and TIMIT datasets.
The model was trained on the VoxCeleb2 dataset.
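The embedding-plus-SVM design described above can be illustrated with a minimal sketch. It is not the released training code. It assumes scikit-learn and NumPy are installed, and it uses random 192-dimensional vectors as stand-ins for ECAPA-TDNN speaker embeddings so that it runs without audio or a model download. The `make_fake_embeddings` helper, the feature scaling step, and the RBF kernel are illustrative assumptions, not settings confirmed by this model card.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# SpeechBrain's ECAPA-TDNN speaker model produces 192-dimensional embeddings.
EMBEDDING_DIM = 192


def make_fake_embeddings(n, center, rng):
    """Hypothetical stand-in for real embeddings: n Gaussian vectors around `center`."""
    return rng.normal(loc=center, scale=1.0, size=(n, EMBEDDING_DIM))


rng = np.random.default_rng(0)

# Two well-separated synthetic clusters play the role of the two classes.
X_train = np.vstack([
    make_fake_embeddings(50, 1.0, rng),
    make_fake_embeddings(50, -1.0, rng),
])
y_train = ["female"] * 50 + ["male"] * 50

# Assumed classifier setup: standardize features, then fit an RBF-kernel SVM.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
svm.fit(X_train, y_train)

# Classify one new synthetic vector drawn from each cluster.
X_new = np.vstack([
    make_fake_embeddings(1, 1.0, rng),
    make_fake_embeddings(1, -1.0, rng),
])
predicted = [str(label) for label in svm.predict(X_new)]
print(predicted)
```

In a real pipeline the synthetic vectors would be replaced by embeddings extracted from audio, and the SVM would map each fixed-length embedding to a gender label.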
You can install the package directly from GitHub:
pip install git+https://github.com/griko/voice-gender-classification.git
from voice_gender_classification import GenderClassificationPipeline
# Load the pipeline
classifier = GenderClassificationPipeline.from_pretrained(
"griko/gender_cls_svm_ecapa_voxceleb"
)
# Single file prediction
result = classifier("path/to/audio.wav")
print(result) # ["female"] or ["male"]
# Batch prediction
results = classifier(["audio1.wav", "audio2.wav"])
print(results) # ["female", "male"]
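For larger batches it can help to pair each prediction with its file and tally the labels. The sketch below uses only the standard library. `summarize_predictions` is a hypothetical helper, not part of the package, and the hard-coded `labels` list stands in for the output of `classifier(paths)`, assuming the batch call returns one label per input file in order.

```python
from collections import Counter


def summarize_predictions(paths, labels):
    """Pair each audio path with its predicted label and count labels overall."""
    if len(paths) != len(labels):
        raise ValueError("expected exactly one label per audio file")
    per_file = dict(zip(paths, labels))
    counts = Counter(labels)
    return per_file, counts


paths = ["audio1.wav", "audio2.wav"]
# Stand-in for: labels = classifier(paths)
labels = ["female", "male"]

per_file, counts = summarize_predictions(paths, labels)
for path, label in per_file.items():
    print(f"{path}: {label}")
print(dict(counts))
```

The length check guards against silently misaligned results if some files fail to load or are skipped upstream.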
If you use this model in your research, please cite:
@misc{koushnir2025vanpyvoiceanalysisframework,
title={VANPY: Voice Analysis Framework},
author={Gregory Koushnir and Michael Fire and Galit Fuhrmann Alpert and Dima Kagan},
year={2025},
eprint={2502.17579},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2502.17579},
}