1 comment

  • brunochavesj 10 hours ago

    Hey HN folks,

    I built a browser-based audio/video transcription tool that runs Whisper locally using WebGPU (with a CPU fallback via whisper.cpp).

    There’s no backend processing — everything runs on the user’s machine. Files are never uploaded to a server.

    Some implementation details:

    - Uses ONNX + WebGPU when available

    - Falls back to whisper.cpp (WASM) on the CPU

    - An adaptive model-selection engine picks tiny/base/small/medium/etc. based on:

      - CPU cores/threads

      - RAM (user input)

      - WebGPU availability

    - Models are downloaded once and cached in IndexedDB (a caching sketch follows this list)

    - Batch folder support (you can select a folder with hundreds or thousands of files)
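
    For the curious, the download-once flow can be as simple as the sketch below. It's just the shape of the pattern in TypeScript; the database name, store name, and loadModel helper are all hypothetical, not the tool's actual code:

        const DB_NAME = "model-cache"; // hypothetical names
        const STORE = "models";

        function openDb(): Promise<IDBDatabase> {
          return new Promise((resolve, reject) => {
            const req = indexedDB.open(DB_NAME, 1);
            req.onupgradeneeded = () => req.result.createObjectStore(STORE);
            req.onsuccess = () => resolve(req.result);
            req.onerror = () => reject(req.error);
          });
        }

        function getCached(db: IDBDatabase, key: string): Promise<ArrayBuffer | undefined> {
          return new Promise((resolve, reject) => {
            const req = db.transaction(STORE).objectStore(STORE).get(key);
            req.onsuccess = () => resolve(req.result);
            req.onerror = () => reject(req.error);
          });
        }

        function putCached(db: IDBDatabase, key: string, buf: ArrayBuffer): Promise<void> {
          return new Promise((resolve, reject) => {
            const tx = db.transaction(STORE, "readwrite");
            tx.objectStore(STORE).put(buf, key);
            tx.oncomplete = () => resolve();
            tx.onerror = () => reject(tx.error);
          });
        }

        // Fetch model weights on first use only; serve from the cache after.
        async function loadModel(url: string): Promise<ArrayBuffer> {
          const db = await openDb();
          const cached = await getCached(db, url);
          if (cached) return cached;
          const buf = await (await fetch(url)).arrayBuffer();
          await putCached(db, url, buf);
          return buf;
        }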

    The idea is that scaling happens on the client: a better GPU unlocks larger models, while weaker machines automatically fall back to smaller ones.
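
    As a concrete illustration, a selection heuristic over those three signals might look like the sketch below. The thresholds are invented for the example; the real engine's cutoffs differ:

        type ModelSize = "tiny" | "base" | "small" | "medium";

        // Illustrative only: these cutoffs are made up, not the tool's.
        function pickModel(ramGb: number): ModelSize {
          // Logical cores as reported by the browser.
          const cores = navigator.hardwareConcurrency;
          // Cheap availability check; a stricter probe is sketched further down.
          const hasWebGpu = "gpu" in navigator;

          if (hasWebGpu && ramGb >= 16) return "medium";
          if (hasWebGpu && ramGb >= 8) return "small";
          if (cores >= 8 && ramGb >= 8) return "small";
          if (cores >= 4 && ramGb >= 4) return "base";
          return "tiny";
        }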

    I’m currently stress-testing it by transcribing 1000 audio files in a single batch to see how stable the memory behavior is over time.
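
    One memory-friendly pattern for runs like that is strictly sequential processing with short-lived decode buffers, roughly like this sketch (transcribe is a stand-in for the actual engine call):

        // Each decoded buffer is scoped to a single loop iteration.
        async function runBatch(
          files: File[],
          transcribe: (pcm: Float32Array) => Promise<string>,
        ): Promise<string[]> {
          // Whisper expects 16 kHz mono; decodeAudioData resamples to the
          // context's sample rate.
          const ctx = new AudioContext({ sampleRate: 16000 });
          const results: string[] = [];
          for (const file of files) {
            const audio = await ctx.decodeAudioData(await file.arrayBuffer());
            const pcm = audio.getChannelData(0); // first channel; real code would downmix
            results.push(await transcribe(pcm));
            // Nothing retains `audio` past this iteration, so the decoded
            // PCM is collectable before the next file is loaded.
          }
          await ctx.close();
          return results;
        }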

    Outputs:

    - .txt

    - .srt

    - .vtt

    - .lrc
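
    These formats mostly differ in framing and timestamp syntax. As a rough sketch (with an assumed segment shape), the SRT serializer boils down to:

        // Assumed segment shape; times in seconds.
        interface Segment { start: number; end: number; text: string }

        // SRT and VTT share HH:MM:SS timestamps but differ in the
        // millisecond separator: comma for SRT, period for VTT.
        function toTimestamp(sec: number, msSep: string): string {
          const ms = Math.round(sec * 1000);
          const h = String(Math.floor(ms / 3600000)).padStart(2, "0");
          const m = String(Math.floor(ms / 60000) % 60).padStart(2, "0");
          const s = String(Math.floor(ms / 1000) % 60).padStart(2, "0");
          return `${h}:${m}:${s}${msSep}${String(ms % 1000).padStart(3, "0")}`;
        }

        function toSrt(segments: Segment[]): string {
          return segments
            .map((seg, i) =>
              `${i + 1}\n` +
              `${toTimestamp(seg.start, ",")} --> ${toTimestamp(seg.end, ",")}\n` +
              `${seg.text}\n`)
            .join("\n");
        }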

    Considering adding a final ZIP export so users don’t have to download files individually.
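
    One way to do that client-side would be a library like JSZip, roughly:

        import JSZip from "jszip";

        // Bundle every transcript into a single archive. Map keys are
        // output filenames (e.g. "lecture01.srt"), values are contents.
        async function exportZip(outputs: Map<string, string>): Promise<Blob> {
          const zip = new JSZip();
          for (const [name, content] of outputs) zip.file(name, content);
          return zip.generateAsync({ type: "blob" });
        }

        // Hand the archive to the browser as a normal download.
        async function downloadZip(outputs: Map<string, string>): Promise<void> {
          const url = URL.createObjectURL(await exportZip(outputs));
          const a = document.createElement("a");
          a.href = url;
          a.download = "transcripts.zip";
          a.click();
          URL.revokeObjectURL(url);
        }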

    Would appreciate feedback, especially on:

    - WebGPU reliability across browsers (a detection probe is sketched after this list)

    - Memory management strategies for long batch runs

    - UX for hardware-based model selection
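
    On the first point: navigator.gpu can exist while requestAdapter() still resolves to null (e.g. on unsupported hardware), so a usability probe arguably needs both checks. A sketch:

        // Cast to any to avoid requiring @webgpu/types in the example.
        async function webGpuUsable(): Promise<boolean> {
          const gpu = (navigator as any).gpu;
          if (!gpu) return false;
          try {
            // The API can be exposed yet hand back no adapter.
            const adapter = await gpu.requestAdapter();
            return adapter !== null;
          } catch {
            return false;
          }
        }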

    From what I've tested, it seems to work best in Chrome; I haven't tried it in Chromium yet.

    Curious what you folks think.