31 points | by noahkay13 3 hours ago
4 comments
How does this compare?
What it does:
- Runs 7 model families: offline transcription (CTC, RNNT, TDT, TDT-CTC), streaming (EOU, Nemotron), and speaker diarization (Sortformer)
- Word-level timestamps
- Streaming transcription from microphone input
- Speaker diarization detecting up to 4 speakers
With models like these, you often want to glue things together and manage multithreaded queues. And gluing in C++ is no fun.
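For anyone wondering what that kind of glue tends to look like, here is a minimal Python sketch of a producer/consumer queue feeding audio chunks to a transcriber. Everything in it is an assumption made for illustration: `transcribe_stub`, `producer`, `consumer`, and `run_pipeline` are hypothetical names, and the stub stands in for a real model call. It is not the API of any project mentioned in this thread.

```python
import queue
import threading


def transcribe_stub(chunk):
    # Hypothetical stand-in for a real speech model call.
    return f"<text for {len(chunk)} samples>"


def producer(chunks, q):
    # Push audio chunks onto the queue, then a sentinel to signal the end.
    for chunk in chunks:
        q.put(chunk)
    q.put(None)


def consumer(q, results):
    # Pull chunks until the sentinel arrives, transcribing each one in order.
    while True:
        chunk = q.get()
        if chunk is None:
            break
        results.append(transcribe_stub(chunk))


def run_pipeline(chunks):
    # A bounded queue gives backpressure if transcription falls behind capture.
    q = queue.Queue(maxsize=8)
    results = []
    t_prod = threading.Thread(target=producer, args=(chunks, q))
    t_cons = threading.Thread(target=consumer, args=(q, results))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([[0.0] * 160, [0.0] * 320]))
```

With a single consumer the results keep the input order. Adding a diarization stage would mean a second queue and another worker thread, which is where this sort of glue starts to grow.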
This assumes that your offering will perform better than onnxruntime on at least some metric such as memory.
https://github.com/rishikanthc/Scriberr