1 comment

  • brunochavesj 10 hours ago

    Hey HN folks,

    I built a browser-based audio/video transcription tool that runs Whisper locally using WebGPU (with a CPU fallback via whisper.cpp).

    There’s no backend processing — everything runs on the user’s machine. Files are never uploaded to a server.

    Some implementation details:

    - Uses ONNX + WebGPU when available

    - Falls back to whisper.cpp (WASM) on the CPU

    - An adaptive model-selection engine picks tiny/base/small/medium/etc. based on:

      - CPU cores/threads

      - RAM (user input)

      - WebGPU availability

    - Models are downloaded once and cached in IndexedDB (a caching sketch follows this list)

    - Batch folder support (you can select a folder with hundreds or thousands of files)
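
    For the curious, the download-once flow can be as simple as the sketch below. It's just the shape of the pattern in TypeScript; the database name, store name, and loadModel helper are all hypothetical, not the tool's actual code:

        const DB_NAME = "model-cache"; // hypothetical names
        const STORE = "models";

        function openDb(): Promise<IDBDatabase> {
          return new Promise((resolve, reject) => {
            const req = indexedDB.open(DB_NAME, 1);
            req.onupgradeneeded = () => req.result.createObjectStore(STORE);
            req.onsuccess = () => resolve(req.result);
            req.onerror = () => reject(req.error);
          });
        }

        function getCached(db: IDBDatabase, key: string): Promise<ArrayBuffer | undefined> {
          return new Promise((resolve, reject) => {
            const req = db.transaction(STORE).objectStore(STORE).get(key);
            req.onsuccess = () => resolve(req.result);
            req.onerror = () => reject(req.error);
          });
        }

        function putCached(db: IDBDatabase, key: string, buf: ArrayBuffer): Promise<void> {
          return new Promise((resolve, reject) => {
            const tx = db.transaction(STORE, "readwrite");
            tx.objectStore(STORE).put(buf, key);
            tx.oncomplete = () => resolve();
            tx.onerror = () => reject(tx.error);
          });
        }

        // Fetch model weights on first use only; serve from the cache after.
        async function loadModel(url: string): Promise<ArrayBuffer> {
          const db = await openDb();
          const cached = await getCached(db, url);
          if (cached) return cached;
          const buf = await (await fetch(url)).arrayBuffer();
          await putCached(db, url, buf);
          return buf;
        }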

    The idea is that scaling happens on the client: a better GPU unlocks larger models, while weaker machines automatically fall back to smaller ones.
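
    As a concrete illustration, a selection heuristic over those three signals might look like the sketch below. The thresholds are invented for the example; the real engine's cutoffs differ:

        type ModelSize = "tiny" | "base" | "small" | "medium";

        // Illustrative only: these cutoffs are made up, not the tool's.
        function pickModel(ramGb: number): ModelSize {
          // Logical cores as reported by the browser.
          const cores = navigator.hardwareConcurrency;
          // Cheap availability check; a stricter probe is sketched further down.
          const hasWebGpu = "gpu" in navigator;

          if (hasWebGpu && ramGb >= 16) return "medium";
          if (hasWebGpu && ramGb >= 8) return "small";
          if (cores >= 8 && ramGb >= 8) return "small";
          if (cores >= 4 && ramGb >= 4) return "base";
          return "tiny";
        }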

    I’m currently stress-testing it by transcribing 1000 audio files in a single batch to see how stable the memory behavior is over time.
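
    One memory-friendly pattern for runs like that is strictly sequential processing with short-lived decode buffers, roughly like this sketch (transcribe is a stand-in for the actual engine call):

        // Each decoded buffer is scoped to a single loop iteration.
        async function runBatch(
          files: File[],
          transcribe: (pcm: Float32Array) => Promise<string>,
        ): Promise<string[]> {
          // Whisper expects 16 kHz mono; decodeAudioData resamples to the
          // context's sample rate.
          const ctx = new AudioContext({ sampleRate: 16000 });
          const results: string[] = [];
          for (const file of files) {
            const audio = await ctx.decodeAudioData(await file.arrayBuffer());
            const pcm = audio.getChannelData(0); // first channel; real code would downmix
            results.push(await transcribe(pcm));
            // Nothing retains `audio` past this iteration, so the decoded
            // PCM is collectable before the next file is loaded.
          }
          await ctx.close();
          return results;
        }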

    Outputs:

    - .txt

    - .srt

    - .vtt

    - .lrc
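
    These formats mostly differ in framing and timestamp syntax. As a rough sketch (with an assumed segment shape), the SRT serializer boils down to:

        // Assumed segment shape; times in seconds.
        interface Segment { start: number; end: number; text: string }

        // SRT and VTT share HH:MM:SS timestamps but differ in the
        // millisecond separator: comma for SRT, period for VTT.
        function toTimestamp(sec: number, msSep: string): string {
          const ms = Math.round(sec * 1000);
          const h = String(Math.floor(ms / 3600000)).padStart(2, "0");
          const m = String(Math.floor(ms / 60000) % 60).padStart(2, "0");
          const s = String(Math.floor(ms / 1000) % 60).padStart(2, "0");
          return `${h}:${m}:${s}${msSep}${String(ms % 1000).padStart(3, "0")}`;
        }

        function toSrt(segments: Segment[]): string {
          return segments
            .map((seg, i) =>
              `${i + 1}\n` +
              `${toTimestamp(seg.start, ",")} --> ${toTimestamp(seg.end, ",")}\n` +
              `${seg.text}\n`)
            .join("\n");
        }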

    Considering adding a final ZIP export so users don’t have to download files individually.
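
    One way to do that client-side would be a library like JSZip, roughly:

        import JSZip from "jszip";

        // Bundle every transcript into a single archive. Map keys are
        // output filenames (e.g. "lecture01.srt"), values are contents.
        async function exportZip(outputs: Map<string, string>): Promise<Blob> {
          const zip = new JSZip();
          for (const [name, content] of outputs) zip.file(name, content);
          return zip.generateAsync({ type: "blob" });
        }

        // Hand the archive to the browser as a normal download.
        async function downloadZip(outputs: Map<string, string>): Promise<void> {
          const url = URL.createObjectURL(await exportZip(outputs));
          const a = document.createElement("a");
          a.href = url;
          a.download = "transcripts.zip";
          a.click();
          URL.revokeObjectURL(url);
        }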

    Would appreciate feedback, especially on:

    - WebGPU reliability across browsers (a detection probe is sketched after this list)

    - Memory management strategies for long batch runs

    - UX for hardware-based model selection
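
    On the first point: navigator.gpu can exist while requestAdapter() still resolves to null (e.g. on unsupported hardware), so a usability probe arguably needs both checks. A sketch:

        // Cast to any to avoid requiring @webgpu/types in the example.
        async function webGpuUsable(): Promise<boolean> {
          const gpu = (navigator as any).gpu;
          if (!gpu) return false;
          try {
            // The API can be exposed yet hand back no adapter.
            const adapter = await gpu.requestAdapter();
            return adapter !== null;
          } catch {
            return false;
          }
        }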

    From what I've tested, it seems to work best in Chrome; I haven't tried it in Chromium yet.

    Curious what you folks think.