Spaces:

natasa365
/

whisper.cpp

Sleeping

App Files Files Community

whisper.cpp / examples /cli /README.md

arpitjain96

docs : Update cli documentation (#3102)

8566207 unverified 8 months ago

preview code

raw

history blame contribute delete

4.53 kB

	# whisper.cpp/examples/cli

	This is the main example demonstrating most of the functionality of the Whisper model.
	It can be used as a reference for using the `whisper.cpp` library in other projects.

	```
	./build/bin/whisper-cli -h

	usage: ./build/bin/whisper-cli [options] file0 file1 ...
	supported audio formats: flac, mp3, ogg, wav

	options:
	-h, --help [default] show this help message and exit
	-t N, --threads N [4 ] number of threads to use during computation
	-p N, --processors N [1 ] number of processors to use during computation
	-ot N, --offset-t N [0 ] time offset in milliseconds
	-on N, --offset-n N [0 ] segment index offset
	-d N, --duration N [0 ] duration of audio to process in milliseconds
	-mc N, --max-context N [-1 ] maximum number of text context tokens to store
	-ml N, --max-len N [0 ] maximum segment length in characters
	-sow, --split-on-word [false ] split on word rather than on token
	-bo N, --best-of N [5 ] number of best candidates to keep
	-bs N, --beam-size N [5 ] beam size for beam search
	-ac N, --audio-ctx N [0 ] audio context size (0 - all)
	-wt N, --word-thold N [0.01 ] word timestamp probability threshold
	-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
	-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
	-nth N, --no-speech-thold N [0.60 ] no speech threshold
	-tp, --temperature N [0.00 ] The sampling temperature, between 0 and 1
	-tpi, --temperature-inc N [0.20 ] The increment of temperature, between 0 and 1
	-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
	-tr, --translate [false ] translate from source language to english
	-di, --diarize [false ] stereo audio diarization
	-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
	-nf, --no-fallback [false ] do not use temperature fallback while decoding
	-otxt, --output-txt [false ] output result in a text file
	-ovtt, --output-vtt [false ] output result in a vtt file
	-osrt, --output-srt [false ] output result in a srt file
	-olrc, --output-lrc [false ] output result in a lrc file
	-owts, --output-words [false ] output script for generating karaoke video
	-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
	-ocsv, --output-csv [false ] output result in a CSV file
	-oj, --output-json [false ] output result in a JSON file
	-ojf, --output-json-full [false ] include more information in the JSON file
	-of FNAME, --output-file FNAME [ ] output file path (without file extension)
	-np, --no-prints [false ] do not print anything other than the results
	-ps, --print-special [false ] print special tokens
	-pc, --print-colors [false ] print colors
	-pp, --print-progress [false ] print progress
	-nt, --no-timestamps [false ] do not print timestamps
	-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
	-dl, --detect-language [false ] exit after automatically detecting language
	--prompt PROMPT [ ] initial prompt (max n_text_ctx/2 tokens)
	-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
	-f FNAME, --file FNAME [ ] input audio file path
	-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
	-dtw MODEL --dtw MODEL [ ] compute token-level timestamps
	-ls, --log-score [false ] log best decoder scores of tokens
	-ng, --no-gpu [false ] disable GPU
	-fa, --flash-attn [false ] flash attention
	-sns, --suppress-nst [false ] suppress non-speech tokens
	--suppress-regex REGEX [ ] regular expression matching tokens to suppress
	--grammar GRAMMAR [ ] GBNF grammar to guide decoding
	--grammar-rule RULE [ ] top-level GBNF grammar rule name
	--grammar-penalty N [100.0 ] scales down logits of nongrammar tokens
	```