
Unusual Capabilities of Nano Banana (Examples)

mitthrowaway2

I've come to realize that I liked believing that there was something special about the human mental ability to use our mind's eye and visual imagination to picture something, such as how we would look with a different hairstyle. It's uncomfortable seeing that skill reproduced by machinery at the same level as my own imagination, or even better. It makes me feel like my ability to use my imagination is no more remarkable than my ability to hold a coat off the ground like a coat hook would.

FuckButtons

I have aphantasia, I’m glad we’re all on a level playing field now.

yoz-y

I always thought I had a vivid imagination. But then aphantasia was mentioned on Hello Internet once; I looked it up, saw comments like these, and honestly…

I’ve no idea how to even check. According to various tests I believe I have aphantasia, but I haven’t got even the slightest idea of how not having it is supposed to work. I guess this is one of those mysteries where a missing sense can't be described in any manner.

jmcphers

A simple test for aphantasia that I gave my kids when they asked about it is to picture an apple with three blue dots on it. Once you have it, describe where the dots are on the apple.

Without aphantasia, it should be easy to "see" where the dots are since your mind has placed them on the apple somewhere already. Maybe they're in a line, or arranged in a triangle, across the middle or at the top.

Revisional_Sin

Aphantasia gang!

m3kw9

To be fair, the model's ability came from us generating the training data.

quantummagic

To be fair, we're the beneficiaries of nature generating the data we trained on ourselves. Our ability came from being exposed to training in school, and in the world, and from examples from all of human history. I.e., if you locked a child in a dark room for their entire life, and gave them no education or social interaction, they wouldn't have a very impressive imagination or artistic ability either.

We're reliant on training data too.

micromacrofoot

it can only do this because it's been trained on millions of human works

xnx

Amazing model. The only limit is your imagination, and it's only $0.04/image.

Since the page doesn't mention it, this is the Google Gemini Image Generation model: https://ai.google.dev/gemini-api/docs/image-generation

Good collection of examples. Really weird to choose an inappropriate-for-work one as the second example.

smrtinsert

Is it a single model or is it a pipeline of models?

SweetSoftPillow

Single model, Gemini 2.5 Flash with native image output capability.

warkdarrior

More specifically, Nano Banana is tuned for image editing: https://gemini.google/overview/image-generation

minimaxir

[misread]

vunderba

They're referring to Case 1 Illustration to Figure, the anime figurine dressed in a maid outfit in the HN post.

pdpi

I assume OP means the actual post.

The second example under "Case 1: Illustration to Figure" is a panty shot.

vunderba

Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a very high focus on adherence across a wide variety of text-to-image prompts.

I recently finished putting together an Editing Comparison Showdown counterpart where the focus is still adherence but testing the ability to make localized edits of existing images using pure text prompts. It's currently comparing 6 multimodal models including Nano-Banana, Kontext Max, Qwen 20b, etc.

https://genai-showdown.specr.net/image-editing

Gemini Flash 2.5 leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12, which is especially surprising considering you can run its Dev model locally.

darkamaul

This is amazing. Not that long ago, even getting a model to reliably output the same character multiple times was a real challenge. Now we’re seeing this level of composition and consistency. The pace of progress in generative models is wild.

Huge thanks to the author (and the many contributors) as well for gathering so many examples; it’s incredibly useful to see them to better understand the possibilities of the tool.

minimaxir

I recently released a Python package for easily generating images with Nano Banana: https://github.com/minimaxir/gemimg

Through that testing, one prompt engineering trend was consistent but controversial: a) LLM-style prompt engineering with Markdown-formatted lists and b) old-school AI-image-style quality syntactic sugar such as "award-winning" and "DSLR camera" are both extremely effective with Gemini 2.5 Flash Image, due to its text encoder and larger training dataset, which can now more accurately discriminate which specific image traits are present in an award-winning image and which aren't. I've tried generations both with and without those tricks, and the tricks definitely have an impact. Google's developer documentation encourages the latter.

However, taking advantage of the 32k context window (compared to 512 for most other models) can make things interesting. It’s possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...) and providing highly nuanced JSON can allow for consistent generations. (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...)
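
For what it's worth, here's a minimal sketch of the Markdown-list-plus-quality-keywords approach using the google-genai Python SDK directly rather than gemimg; the prompt text and filenames are just placeholders:

    from google import genai

    client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

    # Markdown-formatted detail list plus "quality" keywords, per the tricks above
    prompt = """Award-winning DSLR photograph of a cozy kitchen at sunrise.

    Details:
    - warm morning light streaming through a window
    - a ceramic bowl of lemons on the counter
    - shallow depth of field
    """

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=prompt,
    )

    # Generated images come back as inline-data parts alongside any text parts
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            with open("kitchen.png", "wb") as f:
                f.write(part.inline_data.data)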

plomme

This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview) I get - garbage - results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene. What it then does is to simply cut and paste the character into the scene, even if they are completely different in style, colours, etc.

I get far better results using ChatGPT for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in paint in two minutes.

Am I using the wrong model, somehow??
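
For anyone trying the same thing through the API rather than the AI Studio UI, this is roughly the equivalent call - a rough sketch with the google-genai SDK, with made-up file names and instruction text:

    from google import genai
    from google.genai import types

    client = genai.Client()  # assumes GEMINI_API_KEY is set

    def image_part(path: str) -> types.Part:
        # Wrap a local PNG as an inline image part
        with open(path, "rb") as f:
            return types.Part.from_bytes(data=f.read(), mime_type="image/png")

    response = client.models.generate_content(
        model="gemini-2.5-flash-image-preview",
        contents=[
            image_part("character_reference.png"),
            image_part("scene.png"),
            "Place the character from the first image into the second image, "
            "matching the scene's lighting, colour palette, and art style.",
        ],
    )

    # Save the returned image, if any
    for part in response.candidates[0].content.parts:
        if part.inline_data:
            with open("composited.png", "wb") as f:
                f.write(part.inline_data.data)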

SweetSoftPillow

Play around with your prompt, try asking Gemini 2.5 Pro to improve your prompt before sending it to Gemini 2.5 Flash, and retry to learn what works and what doesn't.

epolanski

+1

I understand the results are non-deterministic, but I get absolute garbage too.

Uploaded pics of my (32-year-old) wife and asked it to give her a fringe/bangs to see how she would look. It either refused "because of safety" or, when it complied, the results were horrible; it was a different person.

After many days and tries we got it to make one, but there was no way to tweak the fringe; the model kept returning the same pic every time (with plenty of "content blocked" in between).

SweetSoftPillow

Are you in the gemini.google.com interface? If so, try Google AI Studio instead; there you can disable safety filters.

istjohn

Personally, I'm underwhelmed by this model. I feel like these examples are cherry-picked. Here are some fails I've had:

- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows

- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed out, tinted colors

- When trying to reproduce the 3x3 grid of hair styles, it repeatedly created a 2x3 grid. Finally, it made a 3x3 grid, but one of the nine models was Black instead of Caucasian.

- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looks like a crude photoshop snip and copy/paste job.

stoobs

I'm pretty sure these are cherry-picked out of many generation attempts. I tried a few basic things and it flat out refused to do many of them, like turning a cartoon illustration into a real-world photographic portrait; it kept wanting to create a Pixar-style image. Then, when I used an AI-generated portrait as an example, it refused with an error saying it wouldn't modify real-world people...

I then tried to generate some multi-angle product shots from a single photo of an object, and it just refused to do the whole left, right, front, back thing, and kept doing things like a left, a front, another left, and a weird half-back/half-side view combination.

Very frustrating.

SweetSoftPillow

Are you in the gemini.google.com interface? If so, try Google AI Studio instead; there you can disable safety filters.

Animats

I have two friends who are excellent professional graphic artists and I hesitate to send them this.

SweetSoftPillow

They'd better learn it today rather than tomorrow, even though it might be painful for those who don't like to learn new tools and explore new horizons.

mitthrowaway2

Maybe they're better off switching careers? At some point, your customers aren't going to pay you very much to do something that they've become able to do themselves.

There used to be a job where people would go around in the morning and wake others up so they could get to work on time; they were called "knockers-up". When the alarm clock was invented, these people didn't lose their jobs to other knockers-up with alarm clocks, they lost their jobs to alarm clocks.

non_aligned

A lot of technological progress is about moving in the other direction: taking things you can do yourself and having others do it instead.

You can paint your own walls or fix your own plumbing, but people pay others instead. You can cook your food, but you order take-out. It's not hard to sew your own clothes, but...

So no, I don't think it's as simple as that. A lot of people will not want the mental burden of learning a new tool and will have no problem paying someone else to do it. The main thing is that the price structure will change. You won't be able to charge $1,000 for a project that takes you a couple of days. Instead, you will need to charge $20 for stuff you can crank out in 20 minutes with gen AI.

AstroBen

I don't know if "learning this tool" is gonna help...

flysonic10

I added some of these examples into my Nanna Banana image generator: https://nannabanana.ai

frfl

While these are incredibly good, it's sad to think about the unfathomable amount of abuse, spam, disinformation, manipulation, and who knows what other negatives these advancements are going to cause. It was one thing when you could spot an AI image, but now and moving forward it will be increasingly futile to even try.

Almost all "human" interaction online will be subject to doubt soon enough.

Hard to be cheerful when technology will be a net negative overall even if it benefits some.

signatoremo

By your logic email is clearly a net negative, given how much junk it generates - spam, phishing, hate mail, etc. Most of my emails at this point are spam.

frfl

If we're talking objectively, yeah by definition if it's a net negative, it's a net negative. But we can both agree in absolute terms the negatives of email are manageable.

Hopefully you understand the sentiment of my original message without getting into the semantics. AI advancements, like email when it arrived, are going to turbocharge the negatives. The difference is the magnitude of the problem; we're dealing with a whole different scale we've never seen before.

Re: "Most of my emails at this point are spam" - 99% of my emails are not spam. Yet AI spam is everywhere else I look online.
