
Self-Supervised Learning from Images with JEPA (2023)

byyoung3

It’s not new and only superior in a very narrow set of categories.

heyitsguay

As a computer vision guy I'm sad JEPA didn't end up more effective. Makes perfect sense conceptually, would have easily transferred to video, but other self-supervised methods just seem to beat it!

turnersr

Yeah! JEPA seems awesome. Do you mind sharing what other self-supervised methods work better than JEPA?

blixt

Needs a (2023) tag. But definitely the release of ARC2 and image outputs from 4o got me thinking about the JEPA family too.

I don't know if it's the right approach (and I'm sure JEPA has plenty of performance issues), but it seems good to have a fully latent-space representation, ideally shared across all modalities, so that turning the concept "an apple a day keeps the doctor away" into image/audio/text is a choice of decoder rather than dedicated token ranges being committed to before the model's actual generation process even begins.
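To make the "one latent, many decoders" idea concrete, here's a toy sketch (hypothetical module names and shapes, not any real JEPA code): a single modality-agnostic latent is routed to whichever decoder you pick at generation time, instead of the modality being baked into the token stream up front.

  # Toy sketch: one shared latent, the output modality is a decoder choice.
  # All names (SharedLatentModel, latent_dim, the decoder heads) are hypothetical.
  import torch
  import torch.nn as nn

  latent_dim = 256

  class SharedLatentModel(nn.Module):
      def __init__(self):
          super().__init__()
          # per-modality decoder heads over the same latent space
          self.decoders = nn.ModuleDict({
              "text":  nn.Linear(latent_dim, 32000),        # vocab logits
              "image": nn.Linear(latent_dim, 3 * 16 * 16),  # one RGB patch
              "audio": nn.Linear(latent_dim, 1024),         # codec-frame logits
          })

      def forward(self, z, modality):
          # the modality is chosen at decode time, not before generation begins
          return self.decoders[modality](z)

  model = SharedLatentModel()
  z = torch.randn(1, latent_dim)       # latent for "an apple a day ..."
  text_out = model(z, "text")
  image_out = model(z, "image")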

niemandhier

GPTs are in the “exploit” phase of the “explore-exploit” trade-off.

JEPA is still in the explore phase; it's worth reading the paper and understanding the architecture to get an alternative perspective.

laughingcurve

Not new, not notable right now, not sure why it's getting upvoted (just kidding, it's because people see YLC and upvote based on names)

MoonGhost

Even average papers can have a nice overview of the problem and useful references.

Grimblewald

I don't care about names; I just thought it was an interesting read.

justanotheratom

JEPA is presumably superior to Transformers. Can any expert enlighten us on the implications of this paper?

spmurrayzzz

Transformers are usually part of JEPA architectures. In I-JEPA's case, a ViT is used as the context encoder.
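To make that concrete, here is a minimal sketch of the I-JEPA training objective (hypothetical names, tiny encoders, and a heavily simplified predictor; the real implementation uses full ViTs and a narrow transformer predictor over mask tokens): a context encoder embeds the visible block, an EMA target encoder produces latent targets for the masked blocks, and the predictor regresses those latents, so the loss lives entirely in representation space with no pixel reconstruction.

  # Minimal I-JEPA-style sketch in PyTorch (assumed shapes/names, not the paper's code).
  import copy
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  dim, n_patches = 192, 196  # e.g. 14x14 patches of a 224x224 image

  class TinyEncoder(nn.Module):
      """Stand-in for the ViT context/target encoder: positional embedding + transformer blocks."""
      def __init__(self, dim=192, depth=2, heads=3, patches=196):
          super().__init__()
          self.pos = nn.Parameter(torch.randn(1, patches, dim) * 0.02)
          layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
          self.blocks = nn.TransformerEncoder(layer, depth)

      def forward(self, tokens, idx):
          # tokens: (B, N, dim) patch embeddings; idx: (B, K) indices of patches to keep
          x = tokens + self.pos
          x = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))
          return self.blocks(x)

  context_enc = TinyEncoder(dim, patches=n_patches)
  target_enc = copy.deepcopy(context_enc)  # in practice updated as an EMA of the context encoder
  for p in target_enc.parameters():
      p.requires_grad_(False)
  predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

  def ijepa_loss(tokens, ctx_idx, tgt_idx):
      """Predict the target encoder's latents for masked patches from the visible context."""
      ctx = context_enc(tokens, ctx_idx)                                   # (B, Nc, dim)
      with torch.no_grad():
          all_idx = torch.arange(n_patches).expand(tokens.size(0), -1)
          full = target_enc(tokens, all_idx)                               # (B, N, dim)
          tgt = torch.gather(full, 1, tgt_idx.unsqueeze(-1).expand(-1, -1, dim))
      # crude stand-in: pool the context and regress each target latent
      pred = predictor(ctx.mean(dim=1, keepdim=True)).expand_as(tgt)
      return F.mse_loss(pred, tgt)  # loss in latent space, no pixel reconstruction

  tokens = torch.randn(2, n_patches, dim)          # fake patch embeddings
  ctx_idx = torch.randint(0, n_patches, (2, 120))  # visible context block
  tgt_idx = torch.randint(0, n_patches, (2, 30))   # masked target block
  print(ijepa_loss(tokens, ctx_idx, tgt_idx).item())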
