Detecting AV1-encoded videos with Python

breve

> I’ve saved some AV1-encoded videos that I can’t play on my iPhone.

Sure you can. Install VLC on your phone and you'll be able to play the AV1 videos. Even the iPhone 7 released in 2016 can play AV1 video.

Don't agonise over battery life. The dav1d decoder for AV1 is great:

https://www.reddit.com/r/AV1/comments/1cf7eti/av1_dav1d_play...

https://www.reddit.com/r/AV1/comments/1cg2wv4/dav1d_battery_...

https://www.reddit.com/r/AV1/comments/1cgyace/dav1d_battery_...

https://www.reddit.com/r/AV1/comments/1chpz2r/dav1d_battery_...

monster_truck

It's not just great. It's so good that even on Android phones much older than the ones tested in those links, screen brightness has a larger impact on battery life than the decoding does.

This is by design, so that even extremely dated smart TVs and similar devices can also benefit from the bandwidth savings.

Fun fact: I can't say which, but some of the oldest devices (smart TVs, home security products, etc.) work around their dated hardware decoders by buzzsawing 4K video in half, running each piece through the decoder at a resolution it supports, then stitching them back together.

zahlman

    av1_videos = {
        p
        for p in glob.glob("**/*.mp4", recursive=True)
        if is_av1_video(p)
    }

    assert av1_videos == set()

Building a set just to check if it's empty is a bit more complexity than necessary. A more direct way that also bails out early:

    assert not any(is_av1_video(p) for p in glob.glob("**/*.mp4", recursive=True))

Equivalently (de Morgan's law):

    assert all(not is_av1_video(p) for p in glob.glob("**/*.mp4", recursive=True))

KwanEsq

> A more direct way that also bails out early

If it bails out early it is of no use to them.

> This means that if the test fails, I can see all the affected videos at once. If the test failed on the first AV1 video, I’d only know about one video at a time, which would slow me down.
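One way to keep both properties (a plain truthiness check, and every offender reported at once) is to collect the matches into a list and put that list in the assertion message. A minimal sketch: `is_av1_video` here is a hypothetical filename-based stand-in for the article's real check, and `find_av1_videos` is a name made up for illustration.

```python
import glob


def is_av1_video(path):
    # Hypothetical stand-in for the article's real check (ffprobe or
    # MediaInfo); here a file counts as AV1 if its name contains "av1".
    return "av1" in path.lower()


def find_av1_videos(paths):
    # Check every path (no early exit) so all offenders are reported together.
    return sorted(p for p in paths if is_av1_video(p))


def test_no_av1_videos():
    av1_videos = find_av1_videos(glob.glob("**/*.mp4", recursive=True))
    # The message lists every affected file when the assertion fails.
    assert not av1_videos, f"AV1-encoded videos found: {av1_videos}"
```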

Scaevolus

Note that ffprobe can output JSON which is much easier to handle than CSV. I have this snippet in my bashrc:

    ffpj() {
        for f in "$@"; do
            ffprobe -v quiet -print_format json -show_format -show_streams "$f"
        done
    }
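For the Python side, the JSON output can be loaded with `json.loads` and filtered for video streams. This is only a sketch: the function names are made up for illustration, and it assumes `ffprobe` is on the PATH and that each entry in the `streams` list carries `codec_type` and `codec_name` fields.

```python
import json
import subprocess


def video_codecs_from_ffprobe_json(text):
    # ffprobe's JSON output has a top-level "streams" list; each stream
    # is expected to carry "codec_type" and "codec_name" fields.
    data = json.loads(text)
    return [
        s.get("codec_name")
        for s in data.get("streams", [])
        if s.get("codec_type") == "video"
    ]


def probe(path):
    # Same flags as the shell function, minus -show_format.
    return subprocess.check_output(
        ["ffprobe", "-v", "quiet", "-print_format", "json", "-show_streams", path],
        text=True,
    )


def is_av1_video(path):
    # ffprobe reports AV1 streams with codec_name "av1".
    return "av1" in video_codecs_from_ffprobe_json(probe(path))
```

Keeping the parsing separate from the subprocess call means the parsing can be tested against a saved JSON string without any video files.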

avidiax

My first question is, where is this guy getting AV1 videos? Never seen these on the high seas.

Also, given that these videos are going to be reencoded, which is tremendously expensive, I feel that any optimization in this step is basically premature. Naively launching ffprobe 10,000 times is probably still less heavyweight than 1 reencode.

KwanEsq

Sounds like you're just sailing the wrong seas. Some have plenty of AV1. Though those tend to be more obviously advertised as such, I believe, so perhaps this is about downloads from YouTube.

breve

YouTube encodes video to AV1.

Right click on a YouTube video and select "Stats for Nerds" to see which format it's using in your browser. AV1 will be something like "av01.0.09M.08".

You've probably watched a lot of AV1 video without realising it.
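For the curious, that string can be picked apart. The sketch below assumes the short `av01.P.LLT.DD` form (profile, two-digit level index plus a tier letter, bit depth) and ignores the optional trailing fields; `parse_av1_codec_string` is a name made up for illustration.

```python
def parse_av1_codec_string(codec):
    # Expected short form: av01.<profile>.<level index><tier>.<bit depth>
    parts = codec.split(".")
    if len(parts) < 4 or parts[0] != "av01" or len(parts[2]) != 3:
        raise ValueError(f"not an AV1 codec string: {codec!r}")
    profile = int(parts[1])
    level_idx = int(parts[2][:2])
    tier = parts[2][2]
    return {
        "profile": {0: "Main", 1: "High", 2: "Professional"}.get(profile, profile),
        # The level index maps to major.minor as 2 + idx // 4 and idx % 4.
        "level": f"{2 + (level_idx >> 2)}.{level_idx & 3}",
        "tier": {"M": "Main", "H": "High"}.get(tier, tier),
        "bit_depth": int(parts[3]),
    }


print(parse_av1_codec_string("av01.0.09M.08"))
# {'profile': 'Main', 'level': '4.1', 'tier': 'Main', 'bit_depth': 8}
```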

senand

Off-topic, but it’s actually a she

01HNNWZ0MV43FF

Maybe he transcoded them. I know some archivers who download in H.264 but then transcode to H.265 to save disk space. (I guess they don't seed?)

wolttam

Somehow I thought this was going to be about detecting AV1 based on the decoded video frames, which would have been interesting!

avidiax

Yeah, I would think that the simulated grain of AV1 might be characterizable, even though, IIRC, it is pretty sophisticated.

nick238

Is launching an ffmpeg process so heavyweight that there's a reason to avoid it? If anything, it feels like it would trivialize parallelism, which is probably a feature, not a bug, if you have a bunch of videos to go through.

zahlman

TFA claims:

> This is shorter than the ffprobe code, and faster too – testing locally, this is about 3.5× faster than spawning an ffprobe process per file.

And the calls to the MediaInfo wrapper are not really harder to parallelize. `subprocess.check_output` is synchronous, so that code would have to be adapted to spawn in a loop and then collect the results in a queue or something. With the wrapper you basically end up doing the same thing, but with `multiprocessing` instead. And you can then just reuse a few worker processes for the entire job.
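A rough sketch of the "spawn and collect" shape, using `concurrent.futures` rather than a hand-rolled queue. `find_matching` and its parameters are made up for illustration, and `predicate` stands for whichever per-file check is in use. Threads are assumed to be enough when each check mostly blocks on a child process; for an in-process check that is CPU-bound, `ProcessPoolExecutor` would be the closer match to the `multiprocessing` approach, with the caveat that the predicate must then be a picklable top-level function.

```python
from concurrent.futures import ThreadPoolExecutor


def find_matching(paths, predicate, max_workers=8):
    # Threads suffice when each check mostly waits on a child process
    # (e.g. a blocking subprocess.check_output call).
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths.
        flags = list(pool.map(predicate, paths))
    return [p for p, hit in zip(paths, flags) if hit]
```

Usage would look something like `find_matching(glob.glob("**/*.mp4", recursive=True), is_av1_video)`.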

01HNNWZ0MV43FF

Python must have libav bindings somewhere, you could certainly run that check in-process.

Off the top of my head, it's probably in the container metadata, so you'd just need libavformat and not even libavcodec. Pass it a path, open it, scan the list of streams and check the codec magic number?
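For MP4 specifically, the container-metadata idea can even be sketched without any bindings, by walking the box tree down to `stsd` and reading the sample-entry four-character codes (AV1 uses `av01`). This is an assumption-laden sketch, not what libavformat does: it only understands the MP4/ISOBMFF layout, it will not see through encrypted entries (which use a different code), and all function names are made up for illustration. Bindings such as PyAV would cover other containers.

```python
import struct

# Boxes on the path from the file root down to the sample descriptions.
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl"}


def iter_boxes(f, start, end):
    """Yield (type, payload_start, box_end) for each box in f[start:end]."""
    pos = start
    while pos + 8 <= end:
        f.seek(pos)
        header = f.read(8)
        if len(header) < 8:
            return
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # a 64-bit size follows the type field
            ext = f.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header_len or pos + size > end:
            return  # malformed or truncated; stop scanning this level
        yield box_type, pos + header_len, pos + size
        pos += size


def sample_entry_codes(f, start, end):
    """Recursively collect sample-entry fourccs from every stsd box."""
    codes = []
    for box_type, payload, box_end in iter_boxes(f, start, end):
        if box_type in CONTAINERS:
            codes += sample_entry_codes(f, payload, box_end)
        elif box_type == b"stsd":
            # Skip version/flags (4 bytes) and the entry count (4 bytes);
            # the entries that follow are laid out like boxes.
            codes += [t for t, _, _ in iter_boxes(f, payload + 8, box_end)]
    return codes


def is_av1_mp4(path):
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to the end to learn the file size
        return b"av01" in sample_entry_codes(f, 0, f.tell())
```

Because it seeks past each box rather than reading it, the large `mdat` payload is never loaded, so the check stays cheap even for big files.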