11 comments

  • scott_s 5 days ago

    For disclosure, I've worked on TorchCodec. I'm happy to answer any questions!

    • weitendorf 8 hours ago

      > TorchCodec now has a dedicated WavDecoder for decoding WAV files. It bypasses FFmpeg entirely and reads WAV data directly, resulting in significantly faster decoding.

      I'm working in this area recently and very keen to use this given the claimed performance benefits, but I tried all your links and didn't see any actual performance numbers. Do you have any to share?

      IMO a fair performance benchmark for those not tied to the full PyTorch stack would have ffmpeg and the WAV already loaded into memory before execution. Given that torchcodec relies on the user-supplied ffmpeg installation, I suspect ffmpeg isn't already in memory in that setup, at least not by default.

      I understand why Meta wouldn't want to do this (then you are inevitably distributing exploitable security vulnerabilities in PyTorch, because ffmpeg will probably always have them), but I've been statically linking ffmpeg and keeping the binary in memory while still using separate processes for different batches of audio, with I/O through UDS between the parent and ffmpeg; then the parent does VAD on the PCM on CPU before any further inference. My implementation for static linking is similar to the pattern in https://github.com/amenzhinsky/go-memexec#static-binary - it would be interesting to see if this is possible in the PyTorch/Python ecosystem, or maybe it's already been done.

      • NicolasHug 4 hours ago

        We tend to be conservative with the benchmark results that we make public, because all benchmarks are wrong and unfair - they depend too much on the machine's capabilities, on software versions, and on the actual decoding patterns that are relevant for the user - none of which can be controlled or fairly captured in a benchmark. That being said, we've got some benchmarks here, with a script that users can run on their own: https://github.com/meta-pytorch/torchcodec/pull/1474.

        Note that TorchCodec relies on FFmpeg libraries, not the FFmpeg binary itself. The new WavDecoder is faster because it bypasses the FFmpeg libraries code, not because it bypasses loading the FFmpeg binary in memory.
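
        To make "reads WAV data directly" concrete, here is a minimal sketch of decoding a 16-bit PCM WAV with only Python's standard library, with no FFmpeg involved. It illustrates the idea and is not TorchCodec's WavDecoder; the function name `decode_pcm16_wav` is made up for this example, and it assumes 16-bit PCM input only.

```python
import array
import io
import sys
import wave


def decode_pcm16_wav(data: bytes):
    """Decode an in-memory 16-bit PCM WAV file.

    Returns (sample_rate, num_channels, samples), where samples is a flat,
    channel-interleaved list of floats in the range [-1.0, 1.0).
    Hypothetical helper for illustration only.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        num_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())

    # Reinterpret the raw bytes as signed 16-bit integers.
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian

    # Scale to floats, the usual convention for audio tensors.
    samples = [s / 32768.0 for s in pcm]
    return sample_rate, num_channels, samples
```

        A real decoder would also cover other sample widths and the float and extensible WAV formats, and would hand back a tensor rather than a Python list.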

        Regarding static linking: we stick to dynamic linking to honor the LGPL license of the FFmpeg libraries. TorchCodec is BSD-licensed, and statically linking against the LGPL FFmpeg libs would not be compliant. Some libraries dynamically link against FFmpeg while still bundling the FFmpeg libraries as .so files in the Python wheel - whether that's still compliant is honestly unclear to me, so we prefer leaving it up to the user to supply their own FFmpeg via pure dynamic linking.

    • antixk 12 hours ago

      Hi, in the past I have used NVVideoCodec and VPI for GPU-accelerated decoding and processing. What would be torchcodec's appeal here? VPI already provides a zero-copy interface with PyTorch.

      Thanks!

      • scott_s 4 hours ago

        1. A higher-level API that better integrates into the PyTorch ecosystem.

        2. Ease of going back and forth between CPU and GPU; in our experience, there are still a lot of scenarios where CPU decoding makes sense.

        3. Audio decoding support.

        Please take a look at our tutorials to get a feel for what TorchCodec can do: https://meta-pytorch.org/torchcodec/stable/generated_example...

  • hmaarrfk 18 hours ago

    What version of ffmpeg does this use? Last I tried, torch tools used a really outdated version of ffmpeg at the time of their release.

    • scott_s 5 hours ago

      The one you have installed. :) We don't distribute FFmpeg and instead find your installed version at runtime. We support versions 4 through 8.

  • alphatozeta 14 hours ago

    It's really fast and the performance is great, but it's really unfortunate that it requires torch>=2.11. Too many NVIDIA libraries are still using 2.10 or an alpha version of 2.11 that doesn't have the C++ methods used by torchcodec's underlying C++ code, like use_blob and a few others. I had to fall back to ffmpeg-python, unfortunately.
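
    A rough sketch of how that fallback decision could be automated, assuming the torch>=2.11 requirement quoted above and treating pre-release builds of the minimum version as too old. The helper names (`torch_is_new_enough`, `pick_decoder_backend`) are hypothetical, and the version parsing is deliberately simple rather than a full PEP 440 implementation.

```python
import re
from importlib import metadata

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(.*)$")
_PRERELEASE_RE = re.compile(r"^(a|b|rc|\.?dev)")


def torch_is_new_enough(version: str, minimum=(2, 11)) -> bool:
    """Return True if `version` is a final (non pre-release) build >= minimum."""
    match = _VERSION_RE.match(version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    suffix = match.group(4)
    # Local build tags such as "+cu128" are fine; alpha/beta/rc/dev are not.
    is_prerelease = bool(_PRERELEASE_RE.match(suffix))
    if (major, minor) < minimum:
        return False
    if (major, minor) == minimum and is_prerelease:
        return False
    return True


def pick_decoder_backend() -> str:
    """Choose "torchcodec" when the installed torch qualifies, else "ffmpeg-python"."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return "ffmpeg-python"
    return "torchcodec" if torch_is_new_enough(installed) else "ffmpeg-python"
```

    The check only looks at the reported version string, so a build that reports a final version but lacks the needed symbols would still need a try/except around the actual import.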

  • Reubend 19 hours ago

    The WAV file decoding perf improvement is also very welcome!