Pico-Banana-400k

vunderba

From the paper

> The pipeline (bottom) shows how diverse OpenImages inputs are edited using Nano-Banana and quality-filtered by Gemini-2.5-Pro, with failed attempts automatically retried.

Pretty interesting. I run a fairly comprehensive image-comparison site for SOTA generative AI in text-to-image and editing. Managing it manually got pretty tiring, so a while back I put together a small program that does something similar: it takes a starting prompt, a list of GenAI models, and a maximum number of retries.

It generates images, evaluates them using a separate multimodal AI, and then automatically rewrites failed prompts, repeating up to a set limit.
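The program itself isn't shown here, but the loop described above might look roughly like the minimal Python sketch below. It assumes three hypothetical callables, `generate`, `evaluate`, and `rewrite`, standing in for the image model, the multimodal judge, and the prompt rewriter. None of them are real APIs, and `run_showdown` is an illustrative name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical plug-in points (assumptions, not real APIs):
Generate = Callable[[str, str], bytes]            # (model, prompt) -> image bytes
Evaluate = Callable[[str, bytes], Optional[str]]  # (goal, image) -> None if it passes, else a critique
Rewrite = Callable[[str, str], str]               # (prompt, critique) -> revised prompt


@dataclass
class Attempt:
    prompt: str
    passed: bool
    critique: Optional[str]


@dataclass
class ModelResult:
    model: str
    attempts: List[Attempt] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # A model succeeds if its final attempt passed the judge.
        return bool(self.attempts) and self.attempts[-1].passed


def run_showdown(start_prompt: str, models: List[str], max_retries: int,
                 generate: Generate, evaluate: Evaluate,
                 rewrite: Rewrite) -> Dict[str, ModelResult]:
    results: Dict[str, ModelResult] = {}
    for model in models:
        result = ModelResult(model)
        prompt = start_prompt
        # One initial attempt plus up to max_retries rewritten retries.
        for attempt_no in range(max_retries + 1):
            image = generate(model, prompt)
            # Judge against the original prompt so rewrites can't drift the goal.
            critique = evaluate(start_prompt, image)
            result.attempts.append(Attempt(prompt, critique is None, critique))
            if critique is None:
                break
            if attempt_no < max_retries:
                # Feed the judge's critique back into the next prompt.
                prompt = rewrite(prompt, critique)
        results[model] = result
    return results


if __name__ == "__main__":
    # Toy stubs: the "image" is just the prompt bytes, and the judge passes
    # only once the word "nine" appears in it.
    def fake_generate(model: str, prompt: str) -> bytes:
        return f"{model}:{prompt}".encode()

    def fake_evaluate(goal: str, image: bytes) -> Optional[str]:
        return None if b"nine" in image else "wrong number of points"

    def fake_rewrite(prompt: str, critique: str) -> str:
        return prompt + " with exactly nine points"

    out = run_showdown("a star", ["model-a", "model-b"], 2,
                       fake_generate, fake_evaluate, fake_rewrite)
    for r in out.values():
        print(r.model, r.succeeded, len(r.attempts))
```

Judging each attempt against the starting prompt, rather than the latest rewrite, is a design choice made in this sketch, not something stated above. It keeps the rewriter from "passing" by quietly simplifying the task.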

It's not perfect (the nine-pointed star example in particular), but oftentimes the "recognition aspect of a multimodal model" is superior to its generative capabilities, so you can run it in a sort of REPL until you get the desired outcome.

https://genai-showdown.specr.net/image-editing

lukasb

What do you use for evaluation? gemini-2.5-pro is at the top of MMLU and has been best for me but always looking for better.

typpilol

I love your site; it seems I stumble across it once a month.

Or there's another very similar site, but I'm pretty sure it's yours.

daemonologist

I confess that I don't quite get the point here. Is it just that they've paid the inference costs for a dataset that can be used for distillation/other research?

djtriptych

Really cool - looking to Apple to lead the on-device AI space in short order...

vednig

Other Post: https://news.ycombinator.com/item?id=45708493

cjrd

HN