
SHARP, an approach to photorealistic view synthesis from a single image

superfish

"Unsplash > Gen3C > The fly video" is nightmare fuel. View at your own risk: https://apple.github.io/ml-sharp/video_selections/Unsplash/g...

Traubenfuchs

Early AI "everything turns into dog heads" vibes. Beautiful.

drcongo

I miss those. Anyone know if it's still possible to get the models etc. needed to generate them?

schneehertz

san check, 1d10

ghurtado

Seth Brundle has entered the chat.

Leptonmaniac

Can someone ELI5 what this does? I read the abstract and tried to find differences in the provided examples, but I don't understand (and don't see) what the "photorealistic" part is.

emsign

Imagine history documentaries where they take an old photo, free objects from the background, and move them around, giving the illusion of parallax. This software does that in less than a second, creating a 3D model that can be accurately moved (or the camera, for that matter) in your video editor. It's not new, but this one is fast and "sharp".

Gaussian splatting is pretty awesome.

kurtis_reed

What are free objects?

ferriswil

The "free" in this case is a verb. The objects are freed from the background.

ares623

Takes a 2D image and allows you to simulate moving the angle of the camera with correct-ish parallax effect and proper subject isolation (seems to be able to handle multiple subjects in the same scene as well)

I guess this is what they use for the portrait mode effects.

derleyici

It turns a single photo into a rough 3D scene so you can slightly move the camera and see new, realistic views. "Photorealistic" means it preserves real textures and lighting instead of a flat depth effect. Similar behavior can be seen with Apple's Spatial Scene feature in the Photos app: https://files.catbox.moe/93w7rw.mov

eloisius

From a single picture it infers a hidden 3D representation, from which you can produce photorealistic images from slightly different vantage points (novel views).

avaer

There's nothing "hidden" about the 3D representation. It's a point cloud (in meters) with colors, and a guess at the "camera" that produced it.

(I am oversimplifying).

uh_uh

"Hidden" or "latent" in a context like this just means variables that the algo is trying to infer because it doesn't have direct access to them.

eloisius

Hidden in the sense of neural net layers. I mean intermediary representation.

avaer

It makes your picture 3D. The "photorealistic" part is "it's better than these other ways".

carabiner

Black Mirror episode portraying what this could do: https://youtu.be/XJIq_Dy--VA?t=14. If Apple ran SHARP on this photo and compared it to the show, that would be incredible.

Or if you prefer Blade Runner: https://youtu.be/qHepKd38pr0?t=107

p-e-w

Agreed, this is a terrible presentation. The paper abstract is bordering on word salad, the demo images are meaningless and don’t show any clear difference to the previous SotA, the introduction talks about “nearby” views while the images appear to show zooming in, etc.

Dumbledumb

In Chapter D.7 they describe: "The complex reflection in water is interpreted by the network as a distant mountain, therefore the water surface is broken."

This is really interesting to me because the model would have to encode the reflection as both the depth of the reflecting surface (for texture, scattering etc) as well as the "real depth" of the reflected object. The examples in Figure 11 and 12 already look amazing.

Long tail problems indeed.

derleyici

Apple's Spatial Scene in the Photos app shows similar behavior, turning a single photo into a small 3D scene that you can view by tilting the phone. Demo here: https://files.catbox.moe/93w7rw.mov

Traubenfuchs

It's awful and often creates a blurry mess in the imagined space behind the object.

Photoshop content aware fill could do equally or better many years ago.

delis-thumbs-7e

Interestingly, Apple's own models don't work on MPS. Well, I guess you just have to wait a few years...

diimdeep

No, the model works without CUDA; you then have a .ply that you can drop into a Gaussian splat viewer like https://sparkjs.dev/examples/#editor

CUDA is needed to render the side-scrolling video, but there are many ways to do other things with the result.
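Since the output is a standard .ply file, you can sanity-check it with a few lines of stdlib Python before reaching for a viewer. A minimal sketch (assuming the usual ASCII header that splat exporters write; the function name is my own):

```python
import tempfile

def ply_vertex_count(path):
    """Read a .ply header and return its vertex count,
    without any third-party splat tooling."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError("not a PLY file")
        for raw in f:
            line = raw.strip()
            if line.startswith(b"element vertex"):
                return int(line.split()[-1])
            if line == b"end_header":
                break
    raise ValueError("no vertex element found")

# quick check against a tiny hand-written ASCII .ply
demo = (b"ply\nformat ascii 1.0\nelement vertex 3\n"
        b"property float x\nend_header\n0 0 0\n1 0 0\n2 0 0\n")
with tempfile.NamedTemporaryFile(suffix=".ply", delete=False) as f:
    f.write(demo)
n = ply_vertex_count(f.name)  # 3 vertices in the demo file
```

For a splat file from this model you'd see extra per-vertex properties (scale, rotation, opacity, SH coefficients) listed in the same header.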

matthewmacleod

This is specifically only for video rendering. The model itself works across GPU, CPU, and MPS.

yodon

> photorealistic 3D representation from a single photograph in less than a second

tartoran

Impressive, but something doesn't feel right to me... possibly too much sharpness, possibly a mix of clichés, all amplified at once.

arjie

This is incredibly cool. It's interesting how it fails in the section where you need to in-paint. SVC seems to do that better than all the rest, though not anywhere close to the photorealism of this model.

Is there a similar flow to transform a video/photo/NeRF of a scene into a tighter, minimal-polygon approximation of it? The reason I ask is that it would make some things really cool. To make my baby monitor mount I had to knock out the calipers and measure the pins and this and that, but if I could take a couple of photos and iterate in software, that would be sick.

avaer

Is there a link with some sample gaussian splat files coming from this model? I couldn't find it.

Without that, it's hard to tell how cherry-picked the NVS video samples are.

EDIT: I did it myself, if anyone wants to check out the result (caveat, n=1): https://github.com/avaer/ml-sharp-example

brcmthrowaway

So this is the secret sauce behind Cinematic mode. The fake bokeh insanity has reached its climax!

duskwuff

As well as their "Spatial Scene" mode for lock screen images, which synthesizes a mild parallax effect as you move the phone.

Terretta

It's available for everyday photos, portraits, everything, not just lock screens.

spike021

You can also press the button while viewing a photo in the Photos app to see this.

Geee

This is great for turning a photo into a dynamic-IPD stereo pair, and it allows some head movement in VR.

SequoiaHope

Ah and the dynamic IPD component preserves scale?

mvandermeulen

I thought this was going to be the Super Troopers version