Show HN: Vidformer – Drop-In Acceleration for Cv2 Video Annotation Scripts
13 comments
March 4, 2025
simlevesque
I'm working on a pull request to add it to Yolo right now.
I got this error: `AttributeError: module 'vidformer.cv2' has no attribute 'setNumThreads'`
Maybe you could add some no-ops to make swapping easier.
Edit: I saw the other functions that aren't implemented yet (https://ixlab.github.io/vidformer/opencv-filters.html)
I had to comment out: fillPoly, polylines and drawContours.
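Until such no-ops land upstream, a caller can patch them in. Here's a minimal sketch of the idea; it uses `types.SimpleNamespace` as a stand-in for `vidformer.cv2` (an assumption, since the real module isn't imported here), stubbing out OpenCV control functions that only tune cv2 internals:

```python
import types

# Stand-in namespace playing the role of vidformer.cv2, which (per the
# comment above) currently lacks setNumThreads.
cv2 = types.SimpleNamespace()

# Patch in no-op stubs for control functions that only tune OpenCV's own
# internals, so scripts that call them don't crash after the import swap.
for name in ("setNumThreads", "setUseOptimized"):
    if not hasattr(cv2, name):
        setattr(cv2, name, lambda *args, **kwargs: None)

print(cv2.setNumThreads(4))  # None: a harmless no-op instead of an AttributeError
```

The same pattern would work against the real module (`import vidformer.cv2 as cv2` followed by the `hasattr` loop), since `setattr` on a module is legal Python.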
dominikwin
Happy to help, what do you mean add it to Yolo?
simlevesque
Use it in this project: https://github.com/ultralytics/ultralytics
They do some cv2 monkey patching so it won't be simple.
dominikwin
Okay, do you mean use it for speeding up media I/O there? I don't know if that would work. Using Yolo and drawing bounding boxes should already work fine with the supervision integration. That's how the Colab notebook does it.
vulpescana
Ooh, this seems interesting! I'm working on tracklab (https://github.com/TrackingLaboratory/tracklab); I might try to implement this in the visualization pipeline, which needs some work right now.
What can I potentially get out of Vidformer when there are a lot of annotations to show?
dominikwin
There are a few benefits, depending on your use case; most come down to removing video length as a barrier to visualization.
1) If you are running models, you can use vidformer to see the results as they come in, essentially streaming annotated videos to your web browser as your model runs.
2) If you have existing inference results, you can render those on videos practically instantly, then iterate or remix in seconds.
3) If you're hosting any infrastructure, you can expose VOD streams publicly to show annotated videos to web clients. For example, it's trivial to build a video search engine that returns compilations.
xmichael909
import vidformer.cv2 as cv2
Do you have a list of what is supported? I've played around with cv2 quite a bit in Python for everything from YOLO to loss-of-signal and corrupt-frame detection, to simple things like 'snow blocking camera'.
Ultimately, what is supported in your library? Is it *.cv2?
dominikwin
Yes! There's a list here: https://ixlab.github.io/vidformer/opencv-filters.html
A good chunk of OpenCV imgproc is implemented, but it can go beyond that. Vidformer can be applied to any function that returns or manipulates frames (transformation), but not to functions that return data. So not YOLO, but things like Canny edge detection work just fine. It's impossible to accelerate all processing tasks, so we focus on the "video transformation" subset used to create videos.
Hi HN, this is a project I've been working on as part of my PhD.
Vidformer is a system that makes video annotation and transformation scripts practically instant. Traditional scripts that render full videos can take minutes; Vidformer speeds this up by optimizing execution and rendering on demand, so results appear immediately instead of after the entire video has rendered. It works as a drop-in replacement for OpenCV's cv2: most scripts can adopt it by simply changing "import cv2" to "import vidformer.cv2 as cv2", with no need to rewrite code or sacrifice flexibility. Vidformer is written in Rust and uses the FFmpeg libraries for low-level video access.
Under the hood, Vidformer runs code with symbolic references to frames and tracks frame modifications to build a declarative representation of the task. Then, when rendering, it can transparently distribute the workload across many cores and efficiently use additional memory for caching frames. Further, it can expose a Video on Demand endpoint and only render segments once requested; this lets playback begin instantly.
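To make the symbolic-execution idea concrete, here's a toy Python sketch (my own illustration of the concept, not vidformer's actual internals): drawing calls on a frame are recorded declaratively, and no pixel work happens until a render is actually requested.

```python
class SymbolicFrame:
    """Toy stand-in for a symbolic frame reference (illustrative only)."""

    def __init__(self, source):
        self.source = source
        self.ops = []  # declarative log of pending operations, not pixels

    def rectangle(self, pt1, pt2, color):
        # Record the annotation instead of drawing it immediately.
        self.ops.append(("rectangle", pt1, pt2, color))
        return self

    def putText(self, text, org):
        self.ops.append(("putText", text, org))
        return self

    def render(self):
        # Only here would real pixel work happen, and because the whole task
        # is known up front, it could be parallelized across cores or
        # deferred until a viewer requests this segment of the video.
        return [op[0] for op in self.ops]


frame = SymbolicFrame("frame_0000")
frame.rectangle((10, 10), (80, 60), (0, 255, 0)).putText("car 0.93", (10, 8))
print(frame.render())  # ['rectangle', 'putText']
```

The declarative op log is what enables the on-demand VOD behavior described above: untouched segments of the video simply never execute their ops.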
Repo: https://github.com/ixlab/vidformer
The "Open in Colab" notebook is a great place to start. Would love to hear feedback!