GPT Image 1.5

vunderba

Okay results are in for GenAI Showdown with the new gpt-image 1.5 model for the editing portions of the site!

https://genai-showdown.specr.net/image-editing

Conclusions

- OpenAI has always had some of the strongest prompt understanding alongside the weakest image fidelity. This update goes some way towards addressing this weakness.

- It's leagues better than gpt-image-1 at making localized edits without altering the entire image's aesthetic, doubling the previous score from 4/12 to 8/12, and it's the only model that legitimately passed the Giraffe prompt.

- It's one of the most steerable models, with a 90% compliance rate.

Updates to GenAI Showdown

- Added outtakes sections to each model's detailed report in the Text-to-Image category, showcasing notable failures and unexpected behaviors.

- New models have been added including REVE and Flux.2 Dev (a new locally hostable model).

- Finally got around to implementing a weighted scoring mechanism which considers pass/fail, quality, and compliance for a more holistic model evaluation (click pass/fail icon to toggle between scoring methods).
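
The weighted scheme described above can be sketched roughly like this (the 0.5/0.25/0.25 weights are illustrative assumptions, not the site's actual formula):

```python
def weighted_score(passed: bool, quality: float, compliance: float,
                   w_pass: float = 0.5, w_quality: float = 0.25,
                   w_compliance: float = 0.25) -> float:
    """Combine binary pass/fail with 0-1 quality and compliance ratings.

    The weights are illustrative; the real site may use different ones.
    """
    return (w_pass * (1.0 if passed else 0.0)
            + w_quality * quality
            + w_compliance * compliance)

# A passing edit with strong quality/compliance lands near the top of the scale:
print(weighted_score(True, 0.9, 0.8))  # ~0.925
```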

If you just want to compare gpt-image-1, gpt-image-1.5, and NB Pro at the same time:

https://genai-showdown.specr.net/image-editing?models=o4,nbp...

echelon

I really love everything you're doing!

Personal request: could you also advocate for "image previz rendering", which I feel is an extremely compelling use case for these companies to develop. Basically any 2d/3d compositor that allows you to visually block out a scene, then rely on the model to precisely position the set, set pieces, and character poses.

If we got this task onto benchmarks, the companies would absolutely start training their models to perform well at it.

Here are some examples:

gpt-image-1 absolutely excels at this, though you don't have much control over the style and aesthetic:

https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh

Nano Banana (Pro) fails at this task:

https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd

Flux Kontext, Qwen, etc. have mixed results.

I'm going to re-run these under gpt-image-1.5 and report back.

minimaxir

I have a Nano Banana Pro blog post in the works expanding on my experiments with Nano Banana (https://news.ycombinator.com/item?id=45917875). Running a few of my test cases from that post and the upcoming blog post through this new ChatGPT Image model, this new model is better than Nano Banana but MUCH worse than Nano Banana Pro which now nails the test cases that previously showed issues. The pricing is unclear but gpt-image-1.5 appears to be 20% cheaper than the current gpt-image-1 model, which would put a `high`-quality generation in the same price range as Nano Banana Pro.

One curious case demoed here in the docs is the grid use case. Nano Banana Pro can also generate grids, but for NBP grid adherence to the prompt collapses above 4x4 (there's only a finite number of output tokens to correspond to each subimage), so I'm curious that OpenAI started with a 6x6 case, albeit with a test prompt that is not that nuanced.

vunderba

I'll be running gpt-image-1.5 through my GenAI Showdown later today, but in the meantime if you want to see some legitimately impressive NB Pro outputs, check out:

https://mordenstar.com/blog/edits-with-nanobanana

In particular, NB Pro successfully assembled a jigsaw puzzle it had never seen before, generated semi-accurate 3D topographical extrapolations, and even swapped a window out for a mirror.

jngiam1

The mirror test is cool!

IgorPartola

Subtle detail, but the little table casts a shadow because of the light from the window, and the shadow remains unchanged after the mirror replaces the window.

niklassheth

Nice! Your comparison site is probably the best one out there for image models

qingcharles

I just tested GPT1.5. I would say the image quality is on par with NBP in my tests (which is surprising as the images in their trailer video are bad), but the prompt adherence is way worse, and its "world model" if you want to call it that is worse. For instance, I asked it for two people in a row boat and it had two people, but the boat was more like a coracle and they would barely fit inside it.

Also: SUPER ANNOYING. It seems every time you give it a modification prompt it erases the whole conversation leading up to the new pic? Like.. all the old edits vanish??

I added "shaky amateur badly composed crappy smartphone photo of ____" to the start of my prompts to make them look more natural.

Counterpoint from someone on the Musk site: https://x.com/flowersslop/status/2001007971292332520

abadar

I really enjoyed your experiments. Thank you for sharing your experiences. They've improved my prompting and have tempered my expectations.

echelon

I've been a filmmaker for 10+ years. I really want more visual tools that let you precisely lay out consistent scenes without prompting. This is important for crafting the keyframes in an image-to-video style workflow, and is especially important for long form narrative content.

One thing that gpt-image-1 does exceptionally well that Nano Banana (Pro) can't is previz-to-render. This is actually an incredibly useful capability.

The Nano Banana models take the low-fidelity previz elements/stand-ins and unfortunately keep the elements in place without attempting to "upscale" them. The model tries to preserve every mistake and detail verbatim.

Gpt-image-1, on the other hand, understands the layout and blocking of the scene, the pose of human characters, and will literally repair and upscale everything.

Here's a few examples:

- 3D + Posing + Blocking: https://youtu.be/QYVgNNJP6Vc

- Again, but with more set re-use: https://youtu.be/QMyueowqfhg

- Gaussian splats: https://youtu.be/iD999naQq9A

- Gaussians again: https://youtu.be/IxmjzRm1xHI

We need models that can do what gpt-image-1 does above, but that have higher quality, better stylistic control, faster speed, and that can take style references (eg. glossy Midjourney images).

Nano Banana team: please grow these capabilities.

Adobe is testing and building some really cool capabilities:

- Relighting scenes: https://youtu.be/YqAAFX1XXY8?si=DG6ODYZXInb0Ckvc&t=211

- Image -> 3D editing: https://youtu.be/BLxFn_BFB5c?si=GJg12gU5gFU9ZpVc&t=185 (payoff is at 3:54)

- Image -> Gaussian -> Gaussian editing: https://youtu.be/z3lHAahgpRk?si=XwSouqEJUFhC44TP&t=285

- 3D -> image with semantic tags: https://youtu.be/z275i_6jDPc?si=2HaatjXOEk3lHeW-&t=443

I'm trying to build the exact same things that they are, except as open source / source available local desktop tools that we can own. Gives me an outlet to write Rust, too.

pablonaj

Love the samples of the app you are making, will be testing it!

echelon

Images make this even easier to see (though predictable and precise video is what drives the demand):

gpt-image-1: https://imgur.com/a/previz-to-image-gpt-image-1-Jq5M2Mh

nano banana / pro: https://imgur.com/a/previz-to-image-nano-banana-pro-Q2B8psd

gpt-image-1 excels in these cases, despite being stylistically monotone.

I hope that Google, OpenAI, and the various Chinese teams lean in on this visual editing and blocking use case. It's much better than text prompting for a lot of workflows, especially if you need to move the camera and maintain a consistent scene.

While some image editing will be in the form of "remove the object"-style prompts, a lot will be molding images like clay. Grabbing arms and legs and moving them into new poses. Picking up objects and replacing them. Rotating scenes around.

When this gets fast, it's going to be magical. We're already getting close.

oxag3n

If this were a farm of sweatshop Photoshoppers in 2010 who downloaded all the images from the internet and offered a service of combining them on request, things would have escalated pretty quickly.

Question: with copyright and authorship dead wrt AI, how do I make (at least) new content protected?

Anecdotal: I had a hobby of taking photos in a quite rare style and lived in a place that gets photographed a lot. When I asked GPT to generate a picture of that area in that style, it returned a highly modified but recognizable copy of a photo I had published years ago.

mortenjorck

> how do I make (at least) new content protected?

Air gap. If you don’t want content to be used without your permission, it never leaves your computer. This is the only protection that works.

If you want others to see your content, however, you have to accept some degree of trade off with it being misappropriated. Blatant cases can be addressed the same as they always were, but a model overfitting to your original work poses an interesting question for which I’m not aware of any legal precedents having been set yet.

LudwigNagasena

Using references is a standard industry practice for digital art and VFX. The main difference is that you are unable to accidentally copy a reference too close, while with AI it’s possible.

margorczynski

We are probably entering the post-copyright era. The law will follow sooner or later.

rafram

That seems unlikely to me. One side is made up of lots and lots of entrenched interests with sympathetic figures like authors and artists on their side, and the other is “big tech,” dominated by the rather unsympathetic OpenAI and Google.

ur-whale

> Question: with copyright and authorship dead wrt AI, how do I make (at least) new content protected?

Question: Now that the steamboats have been invented, how do I keep my clipper business afloat ?

Answer: Good riddance to the broken idea of IP, Schumpeter's Gale is around the corner, time for a new business model.

nobody_r_knows

my question to your anecdotal: who cares? not being fecicious, but who cares if someone reproduced your stuff and millions of people see your stuff? is the money that you want? is it the fame? because fame you will get, maybe not money... but couldn't there be another way?

swatcoder

People have values that go beyond wealth and fame. Some people care about things like personal agency, respect and deference, etc.

If someone were on vacation and came home to learn that their neighbor had allowed some friends to stay in the empty house, we would often expect some kind of outrage, regardless of whether there had been specific damage or wear to the home.

Culturally, people have deeply set ideas about what's theirs, and feel like they deserve some say over how their things are used and by whom. Even those that are very generous and want their things to be widely shared usually want to have some voice in making that come to be.

oxag3n

To clarify my question: I do not want anything I create to be fed into their training data. That photo is just an example that I caught, and it became personal. But in general, I no longer want to open-source my code, write articles, or put any effort into improving their training data set.

netule

Suddenly, copyright doesn't matter anymore when it's no longer useful to the narrative.

ragequittah

Copyright has overstepped its initial purpose by leaps and bounds because corporations make the law. If you're not cynical about how Copyright currently works you probably haven't been paying attention. And it doesn't take much to go from cynical to nihilist in this case.

BoorishBears

OpenAI does care about copyright, thankfully China does not: https://imgur.com/a/RKxYIyi

(to clarify, OpenAI stops refining the image if a classifier detects your image as potentially violating certain copyrights. Although the gulf in resolution is not caused by that.)

CamperBob2

(Shrug) This is more important. Sorry.

jibal

facetious

[I won't bother responding to the rest of your appalling comment]

illwrks

The issue is ownership, not promotion or visibility.

Forgeties79

As a professional cinematographer/photographer I am incredibly uncomfortable with people using my art without my permission for unknown ends. Doubly so when it’s venture backed private companies stealing from millions of people like me as they make vague promises about the capabilities of their software trained on my work. It doesn’t take much to understand why that makes me uncomfortable and why I feel I am entitled to saying “no.” Legally I am entitled to that in so many cases, yet for some reason Altman et al get to skip that hurdle. Why?

How do you feel about entities taking your face off of your personal website and plastering it on billboards smiling happily next to their product? What if it’s for a gun? Or condoms? Or a candidate for a party you don’t support? Pick your own example if none of those bother you. I’m sure there are things you do not want to be associated with/don’t want to contribute to.

At the end of the day it’s very gross when we are exploited without our knowledge or permission so rich groups can get richer. I don’t care if my visual work is only partially contributing to some mashed up final image. I don’t want to be a part of it.

CamperBob2

The day after I first heard about the Internet, back in 1990-whatever, it occurred to me that I probably shouldn't upload anything to the Internet that I didn't want to see on the front page of tomorrow's newspaper.

Apart from the 'newspaper' anachronism, that's pretty much still my take.

Sorry, but you'll just have to deal with it and get over it.

agentifysh

I am very impressed. A benchmark I like to run is having it create sprite maps and UV texture maps for an imagined 3D model.

Noticed it captured a Megaman Legends vibe...

https://x.com/AgentifySH/status/2001037332770615302

and here it generated a texture map from a 3d character

https://x.com/AgentifySH/status/2001038516067672390/photo/1

however im not sure if these are true uv maps that is accurate as i dont have the 3d models itself

But I've tried this in Nano Banana when it first came out, and it couldn't do it.

101008

> however im not sure if these are true uv maps that is accurate as i dont have the 3d models itself

also in the tweet

> GPT Image 1.5 is **ing crazy

and

> holy shit lol

What's impressive about it if you don't know whether it's right or not (and as the other comment pointed out, it is not right)?

gs17

> however im not sure if these are true uv maps

I can tell you with 100% certainty they are not. For example, Crash doesn't have a backside for his torso. You could definitely make a model that uses these as textures, but you'd really have to force it and a lot of it would be stretched or look weird. If you want to go this approach, it would make a lot more sense to make a model, unwrap it, and use the wireframe UV map as input.

Here's the original Crash model: https://models.spriters-resource.com/pc_computer/crashbandic... , its actual texture is nothing like the generated one, because the real one was designed for efficiency.

sharkjacobs

Was it ever explained or understood why ChatGPT Images always has (had?) that yellow cast?

minimaxir

My pet theory is that OpenAI screwed up the image normalization calculation and was stuck with the mistake since that's something that can't be worked around.

At the least, it's not present in these new images.

BoorishBears

There's still something off in the grading, and I suspect they worked around it

(although I get what you mean, not easily since you already trained)

I'm guessing when they get a clean slate we'll have Image 2 instead of 1.5. In LMArena it was immediately apparent it was an OpenAI model based on visuals.

KaiserPro

Meta's codec avatars all have a green cast because they spent millions on the rig to capture whole bodies and even more on rolling it out to get loads of real data.

They forgot to calibrate the cameras, so everything had a green tint.

Meanwhile all the other teams had a billion macbeth charts lying around just in case.

jiggawatts

Also, you'd be shocked at how few developers know anything at all about sRGB (or any other gamut/encoding), other than perhaps the name. Even people working in graphics, writing 3D game engines, working on colorist or graphics artist tools and libraries.
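
The sRGB transfer function is exactly the kind of detail that bites here: pixel math done directly on sRGB-encoded values operates on the wrong quantity, because the encoding is nonlinear. A minimal round-trip sketch:

```python
def srgb_to_linear(c: float) -> float:
    """Decode an sRGB channel value in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c: float) -> float:
    """Encode linear light in [0, 1] back to sRGB."""
    return c * 12.92 if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

# 50% sRGB gray is only ~21% linear light, so averaging or blending
# directly in sRGB space silently darkens and shifts colors.
mid = srgb_to_linear(0.5)
print(round(mid, 4))                   # ~0.214
print(round(linear_to_srgb(mid), 4))  # back to 0.5
```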

viraptor

My pet theory is that this is the "Mexico filter" from movies leaking through the training data.

varjag

Not always, it started at a very specific point. Studio Ghibli craze + reinforcement learning on the likes.

ACCount37

Not really, but there's a number of theories. The simplest one is that they "style tuned" the AI on human preference data, and this introduced a subtle bias for yellow.

And I say "subtle" - but because that model would always "regenerate" an image when editing, it would introduce more and more of this yellow tint with each tweak or edit. Which has a way of making a "subtle" bias anything but.
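
That compounding is easy to quantify: a small multiplicative per-edit bias grows geometrically across regenerations. A toy sketch with an assumed (made-up) 3% blue-channel suppression per edit:

```python
def tint_after_edits(blue: float, bias: float, edits: int) -> float:
    """Blue channel level after repeated regenerations with a multiplicative bias."""
    for _ in range(edits):
        blue *= 1.0 - bias
    return blue

# A hypothetical 3% suppression per regeneration cuts blue by roughly a
# quarter after ten edits: a very visible yellow cast.
print(round(tint_after_edits(1.0, 0.03, 10), 3))  # 0.737
```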

amoursy

There was also the theory that it was because they scanned a bunch of actual physical books, and book paper has a slight yellow hue.

danielbln

That seems unlikely, as we didn't see anything like that with Dall-E, unless the auto regressive nature of gpt-image somehow was more influenced by it.

dvngnt_

Maybe their version of SynthID? It at least helps me spot GPT images vs Gemini's.

vunderba

I never heard anything concrete offered. At least it's relatively easy to work around with tone mapping / LUTs.
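
For a fixed color cast, the workaround can be as simple as a per-channel 1D LUT. A dependency-free sketch (the 0.95/0.97/1.08 gains are made-up illustration values, not measured corrections):

```python
def build_gain_lut(gain: float) -> list[int]:
    """Precompute a 256-entry lookup table applying a channel gain."""
    return [min(255, round(i * gain)) for i in range(256)]

def correct_yellow_cast(pixels, r_gain=0.95, g_gain=0.97, b_gain=1.08):
    """Apply per-channel gains via LUTs to a list of (r, g, b) pixels."""
    luts = [build_gain_lut(g) for g in (r_gain, g_gain, b_gain)]
    return [(luts[0][r], luts[1][g], luts[2][b]) for r, g, b in pixels]

# A yellowish pixel gets blue lifted and red/green pulled down:
print(correct_yellow_cast([(220, 210, 160)]))  # [(209, 204, 173)]
```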

wahnfrieden

Soaked in piss.

blurbleblurble

It's really weird to see "make images from memories that aren't real" as a product pitch

kingstnap

It's strange to me too, but they must have done the market research for what people do with image gen.

My own main use cases are entirely textual: Programming, Wiki, and Mathematics.

I almost never use image generation for anything. However, it's objectively extremely popular.

This has strong parallels for me to when Snapchat filters became super popular. I know lots of people loved editing and filtering pictures, but I always left everything in auto mode; in fact, I'd turn off a lot of the default beauty filters. It just never appealed to me.

nurettin

It would creep me out if the model produced origami animals for that prompt.

999900000999

I can actually imagine actors selling the rights to make fake images with them.

In late stage capitalism you pay for fake photos with someone. You have chat gpt write about how you dated for a summer, and have it end with them leaving for grad school to explain why you aren't together.

Eventually we'll all just pay to live in the matrix. When your credit card is declined you'll be logged out, to awaken in a shared studio apartment. To eat your rations.

ares623

I can see them getting paid like residuals from TV re-runs.

But at some point it'll hit saturation. The novelty will wear off since everyone has access to it. Who cares if you have a fake photo with a celebrity when everyone knows it's fake?

mingabunga

Did an experiment to give a software product a dark theme. Gave Both (GPT and Gemini/Nano) a screenshot of the product and an example theme I found on Dribbble.

- Gemini/Nano did a pretty average job, only applying some grey to some of the panels. I tried a few different examples and got similar output.

- GPT did a great job and themed the whole app and made it look great. I think I'd still need a designer to finesse some things though.

anonfunction

The announcement said the API works with the new model, so I updated my Golang SDK grail (https://github.com/montanaflynn/grail) to use it, but it returns a 500 server error when you try. And if you instead request a completely unknown model, the 400 response shows that gpt-image-1.5 isn't among the available models:

  POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
    "message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
    "type": "server_error",
    "param": null,
    "code": "server_error"
  }

  POST "https://api.openai.com/v1/responses": 400 Bad Request {
    "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
    "type": "invalid_request_error",
    "param": "tools[0].model",
    "code": "invalid_value"
  }

alasano

It's still not available in the API despite them announcing the availability.

They even linked to their Image Playground where it's also not available..

I updated my local playground to support it and I'm just handling the 404 on the model gracefully

https://github.com/alasano/gpt-image-1-playground

anonfunction

Yeah I just tried it and got a 500 server error with no details as to why:

  POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
    "message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
    "type": "server_error",
    "param": null,
    "code": "server_error"
  }
Interestingly, if you request an unknown model you get an error showing this:

  POST "https://api.openai.com/v1/responses": 400 Bad Request {
    "message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
    "type": "invalid_request_error",
    "param": "tools[0].model",
    "code": "invalid_value"
  }

minimaxir

It's a staggered rollout but I am not seeing it on the backend either.

joshstrange

> staggered rollout

It's too bad no OpenAI Engineers (or Marketers?) know that term exists. /s

I do not understand why it's so hard for them to just tell the truth. So many announcements say "Available today for Plus/Pro/etc" when they really mean "sometime this week at best, maybe multiple weeks." I'm not asking them to roll out faster, just to communicate better.

abbycurtis33

I still use Midjourney, because all of these major players are so bad at stylistic and creative work. They're singularly focused on photorealism.

ianbicking

I haven't really kept up with what Midjourney has been doing the past year or two. While I liked the stylistic aspects of Midjourney, being able to use image examples to maintain stylistic consistency and character consistency is SO useful for creating any meaningful output. Have they done anything in that respect?

That is, it's nice to make a pretty stand-alone image, but without tools to maintain consistency and place them in context you can't make a project that is more than just one image, or one video, or a scattered and disconnected sequence of pieces.

xnx

This is surprising. Is there a gallery of images that illustrates this?

takoid

Midjourney has a gallery on their website: https://www.midjourney.com/explore

throwthrowuknow

Their explore page is a firehose of examples created by users, and you can see the prompt used, so you can compare the results in other services: https://www.midjourney.com/explore?tab=video_top

FergusArgyll

That's the opinionated-vs-user-choice dynamic. When the opinions are good, they have a leg up.

kingkawn

This is a cultural flaw that predates image generation. Even PG has made statements on HN in the past equating “rendering skill” with the quality of art works. It’s a stand-in for the much more difficult task of understanding the work and value of culture making within the context of the society producing it.

doctorpangloss

Suppose the deck for Midjourney hit Paul Graham's desk, and the CEO was just an average Y Combinator CEO - so no previous success story. He would have never invested in Midjourney at seed stage (meaning before launch / before there were users) even if he were given the opportunity.

Better to read that particular story in the context of, "It would be very difficult to make a seed fund that is an index of all avant garde culture making because [whatever]."

aziis98

I know this is a bit out of scope for these image editing models, but I always try this experiment [1]: draw a "random" triangle and then do some geometric construction. They mess up in very funny ways. These models can't "see" very well. I think [2] is still very relevant.

[1]: https://chatgpt.com/share/6941c96c-c160-8005-bea6-c809e58591...

[2]: https://vlmsareblind.github.io/

KaiserPro

Is there watermarking, or some other way for normal people to tell if it's fake?

mmh0000

I know OpenAI watermarks their stuff. But I wish they wouldn't. It's a "false" trust.

Now it means whoever has access to uncensored/non-watermarking models can pass off their faked images as real and claim, "Look! There's no watermark, of course, it's not fake!"

Whereas, if none of the image models did watermarking, then people (should) inherently know nothing can be trusted by default.

PhilippGille

https://help.openai.com/en/articles/8912793-c2pa-in-chatgpt-...

It doesn't mention the new model, but it's likely the same or similar.

adrian17

I just checked several of the files uploaded to the news post (the "previous" and "new" versions, both the PNG and WebP (&fm=webp in URL) variants), and none had the content metadata. So either the internal version they used to generate them skipped it, or they stripped the metadata when uploading.
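
For anyone wanting to replicate this kind of check without exiftool: C2PA embeds its JUMBF manifest in PNGs in a dedicated chunk (named caBX, per my reading of the C2PA spec; verify before relying on it), so a structural walk of the chunk list is enough to spot it:

```python
import struct

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def png_chunk_types(data: bytes) -> list[bytes]:
    """Walk the PNG chunk list and return each chunk's 4-byte type."""
    assert data[:8] == PNG_SIG, "not a PNG"
    types, pos = [], 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype)
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return types

def has_c2pa(data: bytes) -> bool:
    return b"caBX" in png_chunk_types(data)

# Minimal synthetic PNG with an empty caBX chunk (CRC left zeroed; fine
# for this structural walk, which never validates checksums):
fake = PNG_SIG + struct.pack(">I4s", 0, b"caBX") + b"\x00" * 4
print(has_c2pa(fake))  # True
```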

mnorris

I ran exiftool on an image I just generated:

  $ exiftool chatgpt_image.png
  ...
  Actions Software Agent Name : GPT-4o
  Actions Digital Source Type : http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgori...
  Name : jumbf manifest
  Alg : sha256
  Hash : (Binary data 32 bytes, use -b option to extract)
  Pad : (Binary data 8 bytes, use -b option to extract)
  Claim Generator Info Name : ChatGPT
  ...

KaiserPro

Exif isn't all that robust though.

I suppose I'm going to have to bite the bullet and actually train an AI detector that works roughly in real time.