No elephants: Breakthroughs in image generation
372 comments
April 5, 2025
Gasp0de
I love using LLMs to generate pictures. I'd call myself rather creative, but absolutely useless in any artistic craft. Now, I can just describe any image I can imagine and get 90% accurate results, which is good enough for the presentations I give, online pet projects (I created a squirrel-themed online math-learning game for which I previously would have needed a designer to create squirrel high-school-themed imagery) and memes. For many, many websites this is going to be good enough.
nitwit005
> For many, many websites this is going to be good enough.
It was largely a solved problem though. Companies did not seem to have an issue with using stock photos. My current company's website is full of them.
For business use cases, those galleries were already so extensive before AI image generation that what you wanted was almost always there. They seemingly looked at people's search queries and added images to match previously failed queries. Even things you wouldn't think would have a photo, like "man in business suit jump kicking a guy while screaming", have plenty of results.
dylan604
Really? What stock service would have a selection of squirrels in a high school setting doing various math or other subject related things?
To think any/all combined stock services would be the end-all is just unrealistic. Sure, someone might have settled on something just because they got tired of scrolling (much like streaming video services), but that does not mean they are happy with their selection. Just happy to be done.
Now, with generative AI, they can have squirrels doing anything in any setting they can describe. If they don't like it, they can just tweak the description until they are happy. It's an obvious plus for them.
I never drank the kool-aid to be all gung-ho on this boom/fad, but I'm not going to be so obstinate that I refuse to accept some people find it quite useful. May someone make all the squirrel attending highschool generative art they want, but you can't tell me some stock place is good 'nuff for everything.
schwartzworld
AI is mediocre at a lot of things, but it makes a damn fine upgrade from stock photos. This is the art that’s going to get replaced by this tech, shitty low effort stuff. Images where you just need a picture of X because people are expecting a picture.
It’s the same with code. I don’t think software engineers will really be replaced, but small web dev agencies have a good reason to be nervous. Why would you pay someone to make a website for your restaurant when 3-5 prompts will get you there?
globnomulous
> Companies did not seem to have an issue with using stock photos.
And now these image-generating models are giving us the equivalent of stock photos without the pesky issue of attribution or royalties. What a wonderful time to be alive.
candiddevmike
My problem with finding enjoyment in this is the same problem I have when using cheat codes in games: the doing part is the fun part, getting to the end or just permutations of the end gets really boring.
williamcotton
Trying to draw a squirrel when you have no artistic talents or experience is not the fun part.
I've produced my own music recordings in the past and I've hired musicians to play the instruments that I cannot. Having exasperated recording engineers watch my 5,000th take on a drum fill that I absolutely cannot play is not the fun part. Sitting behind the glass and watching my vision come to life from a really good drummer is absolutely the fun part.
munksbeer
>I love using LLMs to generate pictures. I'd call myself rather creative, but absolutely useless in any artistic craft. Now, I can just describe any image I can imagine and get 90% accurate results
May I ask what you use? I'm not yet even a paid subscriber to any of the models, because my company offers a corporate internal subscription chatbot and code integration that works well enough for what I've been doing so far but has no image generation.
I have tried image generation on the free tier but run out of free use before I get anywhere pleasing.
What do you pay for?
wincy
I was generating pictures to use for a little game I made with my six and ten year old kids. They were so excited to see us go from idea to execution so quickly, they were laughing and we had a ton of fun. The only thing that disappointed me was I got throttled. We’d need to pay for API image gen to get it even faster.
I made a logo for an internal product that wouldn’t have had a logo otherwise at our company. I also make a lot of shitpost memes to my friends to trash talk in the long running turn based war game we’ve all been playing, like “make a cartoony image of a dog man and a Greek giant beating up a devil” and the picture it gave was just hilarious and perfect, like an old timey Popeye cartoon.
Two years ago I was spending three hours using local models like Stable Diffusion to get exactly what I wanted. I had to inpaint and generate 100 variations which would have been insanely expensive if I wasn’t powering it with my own hardware.
Now I get something good in minutes, it’s crazy really.
Gasp0de
The new ChatGPT image generation is insane. It's available on the free tier, just strongly rate limited.
__loam
If you use this technology, you're actively harming creative labor.
genewitch
Whatever. I wrote and co-wrote ten albums and my total take was $3.
The market is saturated and the way it works means ten get rich for every million artists. I feel as though this has been pretty constant throughout history.
Of course there's a lot of talent out there, "wasted", but I think that's always been the case. How many William Shakesmans did we lose with all the war, famine, disease?
I actually decided I'd probably never write music again after 1-shot making a song about the South Korea coup attempt several months ago. I had the song done before the news really even hit the US. Why would I destroy my own hearing writing music anymore when I can prompt an AI to do it for me, with the same net result - no one cares.
Here's the 3-shot remix; the triangle cracks me up so much that I had to upload it: https://soundcloud.com/djoutcold/coup-detat-symphony-remix
The "original" "1-shot" is on my soundcloud page as well: https://soundcloud.com/djoutcold/i-aint-even-writing-music-a...
It's in Lojban. That's why you can't understand it. Yes. Lojban. Brings a tear to my eye every time I hear it. fkin AI
[0] More my style - hold music for our PBX: https://soundcloud.com/djoutcold/bew-hold-music Also, all my stuff is CC licensed, mostly CC0 at this point.
ldoughty
Can you elaborate how there's no possible way to use this technology without actively harming artists?
If a classroom of 14 year olds are making a game in their computer science class, and they use AI to make placeholder images... Was a real artist harmed?
The teacher certainly can't afford to pay artists to provide content for all the students' games, and most students can't afford to hire an artist either. They perhaps can't even legally do it if the artist requires a contract, since in most countries they are too young to sign one.
This technology gives the kids a lot more freedom than a pre-packaged asset library, and can encourage more engagement with the course content, leading to more people interested in creative-employing pursuits.
So, I think this technology can create a new generation of creative individuals, and statements about the blanket harm need to be qualified.
Empact
Creative labor is not entitled to the work the parent comment is describing. We employ labor because it is beneficial to us, not merely because it exists as an option. Creative labor's responsibility is to adapt to a changing world and find roles where their labor is not simply produced / exceeded by a computer system.
Practically speaking, the work described would most likely never have been done, rather than been done by an artist if that were the only option - it’s uncommon to employ artists to help with incidental tasks relative to side projects, etc.
Animats
Creative labor is going the way of manual labor.
immibis
Only if there was ever any chance you would have hired some for that task.
lupusreal
I was never going to hire a professional artist to sketch shit up for me. I have replaced MS Paint, not harmed "creative labor".
becquerel
All labor is bad.
Retr0id
I've never used a stock photo site before, so I suppose it's no surprise I have no real use for "generate any image on demand".
esperent
I've used stock photo sites occasionally but I use vector art and icon sites multiple times a week. Even today, I used a few different sites while designing some stuff on Canva.
The reason I don't use AI is because it gives me far less reliable and harder-to-specify results than just searching through the limited lists of human-made art.
Today, for undisclosed reasons, I needed vector art of peanuts. I found imperfect but usable human made art within seconds from a search engine. I then spent around 15 - 25 minutes trying to get something closer to my vision using ChatGPT, and using the imperfect art I'd found as a style guide. I got lots of "huh that's cool what AI can do" but nothing useful. Nothing closer to my vision than what I started with.
By coincidence it's the first time I've tried making art with AI in about a year, but back then I bought a Midjourney account and spent a month making loads of art, then installed SD on my laptop and spent another couple of weeks playing around with that. So it's not like I'm lacking experience. What I've found so far is that AI art generators are great for generating images for articles like this one. And they do make some genuinely cool pictures; it blows my mind that computers can do this now.
It's just when I sit down with a real world task that has specific, concrete requirements... I find them useless.
YurgenJurgensen
Their main application appears to be taking blog posts and internal memos and making them three times longer, using ten times the bandwidth to convey no more information. So exactly the kind of application AI is 'good' at.
wongarsu
If anything, stock image websites are even worse at this than AI. With AI you come up with an image idea, then try to make the AI produce something close to it. With stock images you come up with an image idea, then hope some photographer had a similar idea and uploaded it to a stock website.
Suppafly
>so I suppose it's no surprise I have no real use for "generate any image on demand".
Other than stock photos, porn is the killer app for that, but most of the AI companies don't want to allow that.
avereveard
How about removing blur from your photos, removing blocking items, denoising darks, and fixing whiteouts? Granted, it's not quite there yet for everything, but it's pretty close.
genewitch
I have the Gemini app on my phone, and you can interact with it with voice only, and I was like, oh, this is really cool, I can use it while I'm driving instead of listening to music.
I can never think of anything to talk to an AI about. I run LMs locally, as well.
JFingleton
Have it interview you (as in a job interview) on your specialisation. It works your interview skills.
Ask it to teach you a language.
DnD works really well (the LLM being the game-master).
voidUpdate
DnD does not work really well, I've tried that with LLMs before
loudmax
That is a very interesting point about how little use of AI most of us are making day to day, despite the potential utility that seems to be lurking. I think it just takes time for people and economies to adapt to new technology.
Even if technological progress on AI were to stop today, and the best models that exist in 2030 are the same models we have now, there would still be years of social and economic change as people and companies figure out how to make use of novel technology.
milanove
Unless I'm doing something simple like writing out some basic shell script or python program, it's often easier to just do something myself than take the time to explain what I want to an LLM. There's something to be said about taking the time to formulate your plan in clear steps ahead of time, but for many problems it just doesn't feel like it's worth the time to write it all out.
danielbln
I find that if a problem doesn't require planning, it's probably simple enough that the LLM can handle it with little input. If it does require planning, I might as well dump it into an LLM as another evaluator and then have it drive the implementation.
skybrian
Image generation is still very slow. If it generated many images instantly like Google’s image search, it would be a lot more fun to use, and we would learn to use it more effectively with practice.
neuroelectron
Some of the image generation systems are very fast.
Suppafly
>Image generation is still very slow.
Only because the free ones slow things down.
nyarlathotep_
> They almost all use AI generated images that are full of nonsense the closer you look. Jazz channels will feature coffee shops with garbled text on the menu and furniture blending together.
Noticed that.
Maybe it's my algorithm but YouTube is seemingly filled with these videos now.
UncleEntity
They insist on feeding me AI generated videos about "HOA Karens" for some odd reason.
True, I do enjoy watching the LawTubers and sometimes they talk about HOAs but that is a far stretch from someone taking a reddit post and laundering it through the robots.
rasz
YouTube Studio has built-in AI thumbnail functionality. Google actively encourages the use of AI to clickbait and to generate automatic AI replies to comments, ala onlyfaps, giving your viewers that feeling of interaction without reading their comments.
_DeadFred_
All my music cover images are AI generated. At the same time I refuse to listen to AI music. We're all going to sink alone on this one.
What's frustrating me is if I tell the YouTube algo 'don't recommend' to AI music video channels, it stops giving me any music video channels. That's not what I want, I just don't want the AI. They need to separate the two. But of course they need to not do that with AI cover images because otherwise it would harm me. :)
satvikpendem
It probably is your algorithm, as mine is pretty good at not showing me those low-effort channels. Check out extensions like PocketTube, SponsorBlock, and DeArrow to manage your YouTube feeds better.
card_zero
I was wondering yesterday how AI is coming along for tweening animation frames. I just did a quick search and apparently last year the state of the art was garbage:
https://yosefk.com/blog/the-state-of-ai-for-hand-drawn-anima...
Maybe this multimodal thing can fix that?
GaggiX
That blog post is a year old.
There has been a lot of progress since then: https://doubiiu.github.io/projects/ToonCrafter/
kridsdale3
Very impressive. This is going to result in an explosion of content creation by pro studios, just as CG with cel-shading renderers did. I greatly prefer the hand-drawn + AI tweened look to the current low-budget CG 3D models look.
shostack
A restaurant near me has a framed monitor that displays some animated art with a scene of a cafe on a street corner. I looked closely and realized it was AI. Chairs were melted together, text was gibberish, trees were not branching properly etc.
If a local restaurant is using this stuff we're near an inflection point of adoption.
card_zero
Looking at the example where the coffee table is swapped, I notice every time the image is reprocessed it mutates, based on the previous iteration, and objects become more bizarre each time, like Chinese whispers.
* The weird-ass basket decoration on the table originally has some big chain links (maybe anchor chain, to keep the theme with the beach painting). By the third version, they're leathery and are merging with the basket.
* The candelabra light on the wall, with branch decorations, turns into a sort of skinny minimalist gold stag head, and then just a branch.
* The small table in the background gradually loses one of its three legs, and ends up defying gravity.
* The freaky green lamps in the window become at first more regular, then turn into topiary.
* Making the carpet less faded turns up the saturation on everything else, too, including the wood the table is made from.
og_kalu
It's kind of clear that for every request, it generates a new image entirely. Some people are speculating a diffusion decoder, but I think it's more likely an implementation of VAR - https://arxiv.org/abs/2404.02905.
So rather than predicting each patch at the target resolution right away, it starts with the image (as patches) at a very small resolution and increasingly scales up. I guess that could make it hard for the model to learn to just copy and paste image tokens for editing like it might for text.
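For what it's worth, the next-scale idea from the linked VAR paper boils down to a loop like the toy sketch below. Every name in it is invented for illustration; this is the paper's scheme, not OpenAI's confirmed design.

    # Toy sketch of VAR-style next-scale prediction (arXiv:2404.02905).
    # The `model` interface is invented; nothing here is confirmed
    # to match what 4o actually does.
    def generate_coarse_to_fine(prompt_tokens, model,
                                scales=(1, 2, 4, 8, 16)):
        context = list(prompt_tokens)  # text tokens condition everything
        token_map = None
        for s in scales:
            # Predict the whole s-by-s grid of image tokens at once,
            # conditioned on the prompt plus all coarser grids so far.
            token_map = model.predict_token_map(context, size=(s, s))
            context.extend(token_map)  # this scale joins the context
        # Only the finest grid gets decoded back to pixels.
        return model.decode_to_pixels(token_map)

Under this scheme there is no mechanism for carrying per-patch state over from an input image, which would fit the observation that every request regenerates the whole picture.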
flkiwi
BUT it's doing a stunningly better job replicating previous scenes than it did before. I asked it just now for a selfie of two biker buddies on a Nevada highway, but one is a quokka and one is a hyrax. It did it. Then I asked for the same photo with late afternoon lighting, and it did a pretty amazing job of preserving the context where just a few months ago it would have had no idea what it had done before.
Also, sweet jesus, after more than a year of hilarious frustration, it now knows that a flying squirrel is a real animal and not just a tree squirrel with butterfly wings.
og_kalu
I agree. I'm not saying it's a different model generating the images. 4o is clearly generating the images itself rather than sending a prompt to some other model. I'm speculating about the mechanism for generation in the model itself.
M4v3R
Yeah, this is in my opinion the biggest limitation of the current gen GPT 4o image generation: it is incapable of editing only parts of an image. I assume what it does every time is tokenizing the source image, then transforming it according to the prompt and then giving you the final result. For some use cases that’s fine but if you really just want a small edit while keeping the rest of the image intact you’re out of luck.
atommclain
I thought the selection tool allows you to limit the area of the image that a revision will make changes to, but I tested it and I still see changes outside of the selected area which is good to know.
As an example the tape spindles, among other changes, are different: https://chatgpt.com/share/67f53965-9480-800a-a166-a6c1faa87c...
https://help.openai.com/en/articles/9055440-editing-your-ima...
qingcharles
Yeah, I'm not sure what the selection brush actually does. Is it just a hint to the LLM?
danielbln
It just means that you comp it together manually. That's still much better than having to set up some inpainting pipeline or whatever.
wavemode
Is manually comping actually going to be easier (let alone, give better results) than inpainting? I can imagine it working in simple cases, but for anything involving 3D geometry you'll likely run into issues of things not quite lining up between the first and second image.
echelon
100%. Multimodal images surpass ComfyUI and inpainting (for now). It's a step function improvement in image generation.
I'm hoping we see an open weights or open source model with these capabilities soon, because good tools need open models.
As has happened in the past, once an open implementation of DALL-E or whatever comes out, the open source community pushes the capabilities much further by writing lots of training, extensions, and pipelines. The results look significantly better than closed SaaS models.
iandanforth
Fwiw pixlr is a good pairing with GPT 4o for just this. Generate with 4o then use pixlr AI tools to edit bits. Especially for removals pixlr (and I'm sure others) are much much faster and quite reliable.
bla3
The pictures on the wall change too.
rob74
Actually, almost everything changes slightly - the number, shape and pattern of the chairs, the number and pattern of the pillows, the pattern of the curtains, the scene outside the window, the wooden part of the table, the pattern of the carpet... The blue couch stays largely the same, it just loses some detail...
card_zero
Yes, first a still life and something impressionist, then a blob and a blob, then a smear and a smear. And what about the reflections and transparency of the glass table top? It gets very indistinct. Keep working at the same image and it looks like you'll end up with some Deep Dream weirdness.
I think the fireplace might be turning into some tiny stairs leading down. :)
YurgenJurgensen
Only sailors know how to leave.
empath75
The vast majority of people wouldn't notice any of that in most contexts in which such an image would be used.
nowittyusername
There is circumstantial evidence out there that 4o image manipulation isn't done within the 4o image generator in one shot but is a workflow done by an agentic system. Meaning this: the user inputs the prompt "create an image with no elephants in the room" > the prompt goes to an LLM which preprocesses the human prompt > it outputs a prompt that it knows works well within this image generator > and that LLM-processed prompt is sent to the image generator. The same happens with edits, but it's a lot more complicated, meaning function-calling tools are involved, with many layers of edits being done behind the scenes. Try it yourself: take an image, send it in, and have 4o edit it for you in some way, then ask it to edit again, and again, and so on. You will notice a sepia filter being applied on every edit, and the image ends up more and more sepia-toned with more edits. This is because one of the steps in the workflow is naively applied without consideration of the multi-edit possibility. If this was a one-shot solution where editing is done within the 4o image model by itself, the sepia problem wouldn't be there.
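Restating that hypothesis as code makes the claimed failure mode easy to see. Purely speculative sketch: every interface below is invented, and OpenAI has confirmed none of this.

    # Speculative sketch of the agentic pipeline hypothesized above.
    # All functions are invented for illustration.
    def edit_image(user_prompt, image, llm, image_model):
        # 1. An LLM rewrites the human prompt into phrasing known to
        #    work well with the image generator.
        prompt = llm.rewrite_prompt(user_prompt, context=image)
        # 2. The LLM picks concrete edit steps via function calling.
        for step in llm.plan_edits(prompt, image):
            image = image_model.apply(step, image)
        # 3. A fixed post-processing pass (say, a color grade) applied
        #    naively on every call would compound across repeated
        #    edits, which is the comment's explanation for the sepia.
        return image_model.postprocess(image)

Calling edit_image repeatedly on its own output reapplies step 3 each time, which is exactly the compounding drift the comment describes.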
vunderba
As somebody who actually tried to build a multimodal stable diffusion chat agent about a year back using YOLO to build partial masks for adjustments via inpainting, dynamic controlnets, and a whole host of other things, I highly doubt that it's as simple as an agentic process.
Using the prompt to detect and choose the most appropriate model checkpoint and LoRa(s) along with rewriting a prompt to most appropriately suit the chosen model has been pretty bog standard for a long time now.
echelon
> Using the prompt to detect and choose the most appropriate model checkpoint and LoRa(s) along with rewriting a prompt to most appropriately suit the chosen model has been pretty bog standard for a long time now.
Which players are doing this? I haven't heard of this approach at all.
Most artistic interfaces want you to visually select a style (LoRA, Midjourney sref, etc.) and will load these under the hood. But it's explicit behavior controlled by the user.
nialv7
None of your observations say anything about how these images are generated one way or another.
The only thing we currently have to go off of is OpenAI's own words, which claim the images are generated by a single multimodal model autoregressively, and I don't think they are lying.
pclmulqdq
Generated autoregressively and generated in one shot are not the same. There is a possibility that there is a feedback loop here. Personally, I wouldn't be surprised if there was a small one, but not nearly the complex agentic workflow that OP may be thinking of.
Suppafly
>This is because one of the steps in the workflow is naively applied without consideration of the multi-edit possibility. If this was a one-shot solution where editing is done within the 4o image model by itself, the sepia problem wouldn't be there.
I don't really see that with ChatGPT; what I do see is that it's presumably running the same basic query with just whatever you said different each time, instead of modifying the existing image. Like if you say "generate a photo of a woman" and get a pic and then say "make her hair blonde", the new image is likely to also have different facial features.
renewiltord
The prompt enrichment thing is pretty standard. Everyone does that bit, though some make it user-visible. On Grok it used to populate to the frontend via the download name on the image. The image editing is interesting.
genewitch
All the stable diffusion software I've used names the files after some form of the prompt, probably because SD weights the first tokens higher than the last, likely as a side effect of the way CLIP/BLIP works.
I doubt any of these companies have rolled their own interface to stable diffusion / transformers. It's copy and paste from huggingface all the way down.
I'm still waiting for a confirmed Diffusion Language Model to be released as gguf that works with llama.cpp
danielbln
Auto1111 and co are using the prompt in the filename because it's convenient, not due to some inherent CLIP mechanism.
If you think that companies like OpenAI (for all the criticisms they deserve) don't use their own inference harness and image models I have a bridge to sell to you.
diggan
> There is circumstantial evidence out there that 4o image manipulation isn't done within the 4o image generator in one shot
I thought this was obvious? At least from the first (and only) time I used it, you can clearly see that it's not just creating one image based on the prompt; instead it first creates a canvas for everything to fit into, then it generates piece by piece, with some coordinator deciding the workflow.
Don't think we need evidence either way when it's so obvious from using it and what you can see while it generates the "collage" of images.
andy12_
I mean, it could very well be that it generates image patches autoregressively, but in a pyramidal way (first a very low resolution version, the "canvas", and then each individual patch). This is very similar to VAR [1].
We can't really be sure until OpenAI tells us.
[1] https://arxiv.org/abs/2404.02905
Voloskaya
> This is because one of the steps in the workflow is naively applied without consideration of the multi-edit possibility.
Unconvinced by that, tbh. This could simply be a bias with the encoder/decoder or the model itself; many image generation models showed behaviour like this. Also unsure why a sepia filter would always be applied if it was a workflow - what's the point of this?
Personally, I don't believe this is just an agentic workflow. Agentic workflows can't really do anything a human couldn't do manually, they just make the process much faster. I spent 2 years working with image models, specifically around controllability of the output, and there is just no way of getting these kinds of edits with a regular diffusion model just through smarter prompting or other tricks. So I don't see how an agentic workflow would help.
I think you can only get there via a true multimodal model.
lawlessone
Huh, I was thinking myself, based on how it looked, that it was doing layers too. The blurred backgrounds with sharp cartoon characters in front are what made me think this is how they do it.
probably_wrong
> Is it okay to reproduce the hard-won style of other artists using AI? Who owns the resulting art? Who profits from it? Which artists are in the training data for AI, and what is the legal and ethical status of using copyrighted work for training? These were important questions before multimodal AI, but now developing answers to them is increasingly urgent.
I have to disagree with the conclusion. This was an important discussion to have two to three years ago, then we had it online, and then we more or less agreed that it's unfair for artists to have their works sucked up with no recourse.
What the post should say is "we know that this is unfair to artists, but the tech companies are making too much money from them and we have no way to force them to change".
Taek
I don't think there's consensus around that idea. Lots of people (myself included) feel that copyright is already vastly overreaching, and that AI represents forward progress for the proliferation of art in society (it's crap today, but digital cameras were crap in 2007 and look where they are now).
It's also not clear for example that Studio Ghibli lost by having their art style plastered all over the internet. I went home and watched a Ghibli film that week, as I'm sure many others did as well. Their revenue is probably up quite a bit right now?
"How can we monetize art" remains an open question for society, but I certainly don't think that AI without restrictions is going to lead to fewer people with art jobs.
kelseyfrog
I'd take it farther to say that copyright and intellectual property are a legal fiction that ultimately benefits the wealthy [those who can pay to legally enforce it] over small artists.
Small artists get paid to create the art; corporations benefit from exclusivity.
jayd16
The alternative being small artists don't get paid and corporations benefit from non-exclusivity.
thwarted
> It's also not clear for example that Studio Ghibli lost by having their art style plastered all over the internet. I went home and watched a Ghibli film that week, as I'm sure many others did as well. Their revenue is probably up quite a bit right now?
This sounds like a rewording of "You won't get paid, but this is a great opportunity for you because you'll get exposure".
Taek
Exposure has value! The meme around trying to pay artists with exposure is because some people think their "exposure" has meaningful value when they are offering to expose the artist to 100 people, 99 of whom aren't likely even target customers.
Studio Ghibli on the other hand had exposure to millions of people (maybe hundreds of millions), and probably >5% of those were potential customers.
So yes, being paid in exposure makes sense, if the exposure is actually worth what the art is worth. But most people offering to pay in exposure are overvaluing their exposure by 100x or more.
adamredwoods
When I did freelance graphic design, this was said to me. I didn't eat much that week.
DeathArrow
>It's also not clear for example that Studio Ghibli lost by having their art style plastered all over the internet.
Maybe Studio Ghibli is much more than merely a style. Maybe people aren't looking at their production just for the style.
Most people dislike wearing fake clothes and they dislike wearing fake watches or fake jewelry. Because it isn't just about the style.
pixl97
>Most people dislike wearing fake clothes and they dislike wearing fake watches or fake jewelry
I'd disagree. Most people don't like buying something 'real' then finding out it's fake. Far more people don't mind an actual fake if it's either high quality or is very low priced.
__loam
Nearly every artist I've spoken to or have seen talk about this technology says it's evil, so at least among the victims of this corporate abuse of the creative community, there's wide consensus that it's bad.
> but I certainly don't think that AI without restrictions is going to lead to fewer people with art jobs.
It's great that you think that but in reality a lot of artists are saying they're getting less work these days. Maybe that's the result of a shitty economy but I find it very difficult to believe this technology isn't actively stealing work from people.
Ray20
>in reality a lot of artists are saying they're getting less work these days
Good. That means we as a society get more art cheaper. I've long since grown tired of sponsoring greed of artists.
adamredwoods
The Ghibli style took humans decades to refine and create. All that respect and adoration for the craft and artists and the time it took is now gone in an instant, making it a shallow, trivial thing. Worse is to have another company exploit it with no regard for the ones who helped make it a reality.
The threat of AI-produced art will forever trivialise human artistic capabilities. The reality is: why bother when it can be done faster and cheaper? The next generation will leverage it, and those skills will be very rare. It is the nature of technology to do this.
mrdependable
Studio Ghibli might not have been affected yet, but only because the technology is not there yet. What's going to happen when someone can make a competing movie in their style with just a prompt? Should we all just be okay with it because it's been decided that Studio Ghibli has made enough money?
If the effort required to create that can just be ingested by a machine and replicated without consequence, how would it be viable for someone to justify that kind of investment? Where would the next evolution of the art form come from? Even if some company put in the time to create something amazing using AI that does require an investment, the precedent is that it can just be ingested and copied without consequence.
I think aside from what is legal, we need to think about what kind of world we want to live in. We can already plainly see what social media has done to the world. What do you honestly think the world will look like once this plays out?
drdaeman
> What's going to happen when someone can make a competing movie in their style with just a prompt?
Nothing? Just like how if some studio today invests millions of man-hours and does a competing movie in Studio Ghibli's aesthetic (but not including any of Studio Ghibli's characters, branding, etc. - basically, not the copyrightable or trademarkable stuff), nothing out of the ordinary is going to happen.
I mean, artistic style is not copyrightable, right?
wavemode
Companies like Studio Ghibli are not being harmed by AI, small freelance artists are.
masswerk
I think Studio Ghibli will be affected as well, since their "trademark style" (as we used to say), formerly a welcome sight and indicative of a certain type of storytelling, will be devalued into an indicator for slop. (Much like there are certain traits of an image which we associate with soap operas and assume to be indicative of a low-value production.)
mycall
> "How can we monetize art" remains an open question for society
Yet much of the best art, imho, is out in the wild, exposed to the elements, while being at home in some random place. Or perhaps in someone's collection, forgotten and displaced. Art's worth will always be an open question.
eadmund
> it's unfair for artists to have their works sucked up
I never thought it was unfair to artists for others to look at their work and imitate it. That seems to me to be what artists have been doing since the second caveman looked at a hand painting on a cave wall and thought, ‘huh, that’s pretty neat! I’d like to try my hand at that!’
SirMaster
You don't see a massive difference in the sheer number of images that the AI can look at and the speed at which it can imitate it as a fundamental difference between AI and a human copying works or styles?
For a human it took a lot of practice and a lot of time and effort. But now it takes practically no time or effort at all.
Workaccount2
Well yeah, but copyright infringement isn't a function of how quickly you can view and create works.
Copyright is meant to secure distribution of works you create. It's not a tool to stop people from creating art because it looks like your art. That has been a thing for centuries; we even categorize art by its style. Imagine if anime had to adhere to a copyright interpretation of "it's my style!".
eadmund
It took a truly colossal amount of human time and effort to build AI systems. It takes significant amount of energy to run those AI systems.
I don’t see any meaningful difference at all between the system of a human, a computer and a corpus of images producing new images, and the system of a human, a paintbrush, an easel, a canvas and a corpus of images producing new images. Emphasis on the new — copying is still copying, and still controlled by copyrights.
Suppafly
>You don't see a massive difference in the sheer number of images that the AI can look at and the speed at which it can imitate it as a fundamental difference between AI and a human copying works or styles?
I don't.
>For a human it took a lot of practice and a lot of time and effort. But now it takes practically no time or effort at all.
So effort is what makes it ok?
Ray20
>For a human it took a lot of practice and a lot of time and effort. But now it takes practically no time or effort at all.
And why is this not a good thing?
BriggyDwiggs42
Right, the difference is that it's a large company looking at it, then copying it and reselling it without credit, which basically everyone would understand as bad without the indirection of a model.
Edit: the key words here are “company” and “reselling”
eadmund
But it’s not copying and reselling — it’s imitation.
Copying is controlled by copyrights. And imitation isn’t controlled by anything.
As for a company: a company is just a group of people acting together.
ZoomZoomZoom
False, copyism as a career has always been looked down on in the arts community. Learning and reinterpreting is a qualitatively different process.
shkkmo
> This was an important discussion to have two to three years ago, then we had it online, and then we more or less agreed that it's unfair for artists to have their works sucked up with no recourse.
Speak for yourself, there was no consensus online. There are plenty of us that think that dramatically expanding the power of copyright would be a huge mistake that would primarily benefit larger companies and do little to protect or fund small artists.
OtherShrezzing
>There are plenty of us that think that dramatically expanding the power of copyright would be a huge mistake that would primarily benefit larger companies and do little to protect or fund small artists.
The status quo also primarily benefits larger companies, and does little (exactly nothing, if we're being earnest) to protect or fund small artists.
It's reasonable to hold both opinions that: 1) artists aren't being compensated, even though their work is being used by these tools, and 2) massive expansion of copyright isn't the appropriate response to 1).
Suppafly
> and then we more or less agreed that it's unfair for artists to have their works sucked up with no recourse.
No we didn't agree with that.
wat10000
“Fair” doesn’t matter. The only consensus that matters is what is legal and profitable. The former seems to be pretty much decided in favor of AI, with some open question about whether large media companies enjoy protections that smaller artists don’t. (The legal battle when some AI company finally decides to let their model imitate Disney stuff is going to be epic.) Profitable remains to be seen, but doesn’t matter much while investors’ money is so plentiful.
__loam
> The former seems to be pretty much decided in favor of AI
None of the cases against AI companies have been decided afaik. There's a ton of ongoing litigation.
> but doesn’t matter much while investors’ money is so plentiful.
More and more people are realizing how wasteful this stuff is every day.
hnbad
> What the post should say is "we know that this is unfair to artists, but the tech companies are making too much money from them and we have no way to force them to change".
It seemed a fact of life that companies will just abuse your personal data to their liking and can do what they want with information they collect about you because "if it's free, you're the product" (and even if you paid for it, "you should know better" etc). Then GDPR and its international derivatives came along and changed that.
It seemed a fact of life that companies that technically don't have an actual market monopoly can do whatever they want within their vertically integrated walled gardens, because competitors can just create their own vertically integrated walled gardens to compete with them, and the rules for markets don't apply to walled gardens. Then the DSA and DMA came along and changed that.
I don't see why legislation can't change this, too. Of course, just as with the GDPR, DSA and DMA, we'll hear from libertarians, megacorps and astroturf movements how unfair it all is to mom & pop startups and how it's going to ruin the economy, but given the angle grinder the US is currently taking to its own economy (and by extension the global economy, because we're all connected), I think that's no longer a valid argument in politics.
DeathArrow
>> it's unfair for artists to have their works sucked up
What framework can we use to decide if something is fair or not?
Style is not something that should be copyrighted. I can paint in the style of X painter, I can write in the style of Y writer, I can compose music in the style of Z composer.
Everything has a style. Dressing yourself has a style. Speaking has a style. Even writing mathematical proofs can have a style.
Copying another person's style might reflect poor judgement, bad taste and lack of originality but it shouldn't be illegal.
And anyone in the business of art should have much more than a style. He should have original ideas, a vision, a way to tell stories, a way to make people ask themselves questions.
A style is merely a tool. If all someone has is a style, then good luck!
yencabulator
It's already gone quite a bit further than "style". https://www.404media.co/listen-to-the-ai-generated-ripoff-so...
gosub100
In music, someone can sing the same style as another, but if they imitate it to the point that there is brand confusion, where the consumer believes the product came from X when it actually came from Y, that's clearly crossing the line.
ChadNauseam
Is that actually crossing a line? I'm sure some consumers have thought that Rocket League was associated with FIFA, or that Studio Ghibli movies were made by Disney. But these aren't widespread issues because we have a robust system of trademarks that draws a clear line: you can't use trademarked names or iconography in a way that causes confusion. But if some people hearing Olivia Rodrigo's "good 4 you" think they're listening to Paramore because they have a similar style, that has never been illegal.
shubhamjain
The Ghibli trend completely missed the real breakthrough — and it’s this. The ability to closely follow text, understand the input image, and maintain context of what’s already there is a massive leap in image generation. While Midjourney delivered visually stunning results, I constantly struggled to get anything specific out of it, making it pretty much useless for actual workflows.
4o is the first image generation model that feels genuinely useful not just for pretty things. It can produce comics, app designs, UI mockups, storyboards, marketing assets, and so on. I saw someone make a multi-panel comic with it with consistent characters. Obviously, it's not perfect. But just getting there 90% is a game changer.
empath75
I had ChatGPT generate a flow chart with Mermaid JS for something at work and then write a Scott McCloud-style comic book explaining it in detail, and it looked so convincing, even though it got some of the details a bit wrong. It's _so close_ to making completely usable graphics out of the box.
gcanyon
It's interesting to hear people side with the artists when in previous discussions on this forum I've gotten significant approval/agreement arguing that copyright is far too long.
As I've argued in the past, I think copyright should last maybe five years: in this modern era, monetizing your work doesn't (usually) have to take more than a short time. I'd happily concede to some sort of renewal process to extend that period, especially if some monetization method is in process. Or some sort of mechanical rights process to replace the "public domain" phase early on. Or something -- I haven't thought about it that deeply.
So thinking about that in this process: everyone is "ghiblifying" things. Studio Ghibli has been around for very nearly 40 years, and their "style" was well established over 35 years ago. To me, that (should) make(s) it fair game.
The underlying assumption, I think, is that all the "starving" artists are being ripped off, but are they? Let's consider the numbers -- there are a handful of large-scale artists whose work is obviously replicable: Ghibli, the Simpsons, Pixar, etc. None of them is going hungry because a machine model can render a prom pic in their style. Then you get the other 99.999% of artists, all of whose work went into the model. They will be hurt, but not specifically because their style has been ingested and people want to replicate their style.
Rather, they will be hurt because no one knows their style, nor cares about it; people just want to be able to say e.g. "Make a charcoal illustration of me in this photo, but make me sitting on a horse in the mountains."
It's very much like the arguments about piracy in the past: 99.99% of people were never going to pay an artist to create that charcoal sketch. The 0.01% who might are arguably causing harm to the artist(s) by not using them to create that thing, but the rest were never going to pay for it in the first place.
All to say it's complicated, and obviously things are changing dramatically, but it's difficult to make the argument that "artists need to be compensated for their work being used to train the model" without both a reasonable plan for how that might be done, and a better-supported argument for why.
ben_w
Mm.
The arguments about wanting copyright to be life+70 have always felt entitled, to me. Making claims about things for their kids to inherit, when the median person doesn't have the option to build up much of an inheritance anyway, and 70 years isn't just the next generation but the next 2.5 generations.
I don't know the exact duration of copyright that makes sense, the world changes too much and different media behave differently. Feels like nobody should have the right to block remakes of C64 games on copyright grounds, but I wouldn't necessarily say that about books published in the same year.
From what I've seen about the distribution of specifically book sales, where even the top-100 best sellers often don't make enough to justify the time involved, I think that one of the biggest problems with the economics of the arts is a mixture of (1) the low cost of reproduction, and (2) all the other artists.
For the former: There were political campaigns a century ago, warning about the loss of culture when cinemas shifted from live bands to recorded music[0]; Today, if I were so inclined, I can for a pittance listen to any of (I'm told) 100 million musical performances, watch any of 1.24 million movies or TV shows. Even before GenAI, there was a seemingly endless quantity of graphical art.
For the latter: For every new book by a current living author such as Charlie Stross (who is on here sometimes), my limited time is also spread between that and the huge back-catalogue of old classics like the complete works of Conan Doyle, Larry Niven, or Terry Pratchett.
[0] https://www.smithsonianmag.com/history/musicians-wage-war-ag...
cannonpr
Being someone who has paid a lot of attention to Ghibli, I wouldn't say their style was well established 35-40 years ago… There is considerable evolution and refinement to their style from Nausicaä to later works, both in the artistic style and the philosophical content it presents.
I think allowing it to be fair game would have destroyed something quite beautiful that I’ve watched evolve across 40 years and which I was hoping to see the natural conclusion of without him being bothered by the AI-fication of his work.
gcanyon
Yeah, of course their style isn't static, but I was taking Kiki's Delivery Service (1989) as a point where much of their visual style was pretty well-established.
cannonpr
I agree that some of his main elements were relatively set by then, but others continue to evolve. Personally I feel that his work is entirely fair game after his death, by which I mean after the end of the journey that he has poured his entirety into. In terms of value to humanity, and respect to him for that value, I think it's reasonable to respect his wishes during his lifespan.
another-dave
I'd agree with limiting copyrights but would do it based on money earned rather than time, so something like when you make $X million, the work becomes public domain.
As a specific example — _A Game of Thrones_ was released in 1996. It picked up awards early on but only became a NYT best seller in 2011, just before the TV show aired.
It would feel harsh for an author to lose all their copyright because their work was a "slow burn" and 5 years have elapsed but they've made little to no money on it.
pixl97
>so something like when you make $X million, the work becomes public domain.
https://en.wikipedia.org/wiki/Hollywood_accounting
No, no metrics that can be gamed.
gcanyon
It’s a super-interesting idea, but GoT seems highly cherry picked: the vast majority of all works would never leave copyright if the requirement was that they clear even $1000.
Avshalom
>It's interesting to hear people side with the artists when in previous discussions on this forum I've gotten significant approval arguing that copyright is far too long.
Well broadly that's because most arguments about copyright(length/scope) are made against corporations attacking individual artists and arguments about copyright(AI/scope) are made against corporations attacking individual artists.
Taek
I find it unlikely that someone who was willing to pay an artist for a charcoal sketch would be satisfied with an AI alternative.
You don't just buy art for the aesthetic; you buy it for a lot of reasons, and AI doesn't give any of the same satisfaction.
zwnow
I'm all for paying artists for their work. Unfortunately, same as tattoo artists, some just heavily overcharge for mediocre results (been tattooing myself AND I know a few things about art). Like, sorry, but if you want to earn money doing art, please be good at it...
amazingamazing
> some just heavily overcharge for mediocre results
if people are paying, then they aren't "overcharging"
RhysU
> Like, sorry, but if you want to earn money doing art, please be good at it...
By definition, almost half of all $ARTISTS are worse than the median. Should that half not get paid for their time?
AlecSchueler
It's one thing to argue that copyright terms should be shortened, and another to accept that a handful of corporations should be able to forcefully shorten it for certain actors entirely on their own terms.
amazingamazing
> I think copyright should last maybe five years: in this modern era, monetizing your work doesn't (usually) have to take more than a short time. I
funny how people who say this kind of stuff are never content creators (in the monetization sense).
bko
There are a lot of programmers on this platform (myself included), and I love that my work has an impact on others.
I have a number of public repos and I have benefitted greatly from other public repos. I hope LLMs made some use of my code.
I wrote blogs for years without any monetization. I hope my ideas influenced someone and would be happy if they made some impact on the reasoning of LLMs.
I'm aware of patent trolls and know people with personal experience with them.
So I generate a lot more content than the typical person, and I am still in favor of much looser IP rights, as I think they have gone overboard, and the net benefit for me, a content creator, is much greater having access to the work of others and being able to use tools like LLMs trained on their work.
mitthrowaway2
Programming, at least, is much easier to shield in a loose IP regime than art. You can ship only binaries, or even keep your code running on a server and disclose only the API. And likely, the company that pays your salary would opt to do just that.
I can't imagine a similar way for an artist to distribute their work while protecting their interests.
amazingamazing
posting stuff for free is different than selling stuff.
rikroots
My personal preference is for (say) 15-20 years.
And, as a content creator, I practice what I preach - at least when it comes to my poetry: https://rikverse2020.rikweb.org.uk/blog/copyrights
gcanyon
Not that it should impact the validity of my argument, but I have sold commercial software in the past, and it is absurd that that software will be copyrighted through most of the 21st century.
6510
If you make a blog with nice original long form articles it may take much longer to gain traction. Reproducing the content in "your own" wording quickly gets fuzzy.
I like the practical angle. Any formula that requires monitoring what everyone is doing is unworthy of consideration. Appeal to tradition should not apply.
bongodongobob
The entitlement of the modern artist/musician is unprecedented. Never have I seen so many people expect to be handed a living because they've posted some "content". Musicians and artists now have global distribution with a plethora of platforms. You have to harness that and then work and grind it out. You have to travel and play gigs and set up a booth at art shows.
There's this new expectation that you should just be able to post some music on Spotify or set up an Etsy shop and get significant passive income. It has never ever worked that way and I feel this new expectation comes from the hustle/influencer types selling it.
Most art is crap and most music isn't worth listening to. In the modern age, it's easy for anyone to be a band or artist and the ability to do this has led to a ton of choice, the market is absolutely flooded. If anyone can do a thing (for very loose values of "do") it's inherently worth less. Only the very best make it and it will always be that way.
Source: made a living as a musician for 20 years. The ones who make it are relentlessly marketing themselves in person. You have to leave the house, be a part of a scene, and be constantly looking for opportunities. No one comes knocking on your door, you must drive your product and make yourself stand out in some way. You make money on merch and gigs, and it's always been that way.
This is all to say that copyright law only affects the top 0.1%. The avg struggling artist will never have to worry about any of this. It's like Bob the mechanic worrying about inheritance taxes. Pipe dream at best.
pixl97
I mean, this is about as useful as saying anti-slavery people should become slave owners so they understand the hardships of making money.
My example is extreme to the absurd, so how about we go with
>It's difficult to get a man to understand something when his salary depends on not understanding it.
Workaccount2
As you grow older and run through more cycles of general opinions, you realize that pretty much everyone is in it for themselves, doing what serves them best, and supporting whatever narrative aligns with that.
2007: Copyright is garbage and must be abolished (so I can get music/movies free)
2025: Copyright needs to be strengthened (so my artistic abilities retain value)
Der_Einzige
Correct. This is why Stirner is the best Philosopher. https://en.wikipedia.org/wiki/The_Ego_and_Its_Own
There is nothing other than Egoism.
otabdeveloper4
Either abolish it, or make it stronger so it applies to everyone. (OpenAI included.)
"Copyright for thee but not for me" is the worst of all worlds.
UncleEntity
You forgot:
2024: What do you mean I can't copyright AI generated artwork?
haswell
> The question isn't whether these tools will change visual media, but whether we'll be thoughtful enough to shape that change intentionally.
Unfortunately I think the answer to this question is a resounding “no”.
The time for thoughtful shaping was a few years ago. It feels like we’re hurtling toward a future where instead we’ll be left picking up the pieces and assessing the damage.
These tools are impressive and will undoubtedly unlock new possibilities for existing artists and for people who are otherwise unable to create art.
But I think it’s going to be a rough ride, and whatever new equilibrium we reach will be the result of much turmoil.
Employment for artists won’t disappear, but certain segments of the market will just use AI because it’s faster, cheaper, and doesn’t require time consuming iterations and communication of vision. The results will be “good enough” for many.
I say this as someone who has found these tools incredibly helpful for thinking. I have aphantasia, and my ability to visualize via AI is pretty remarkable. But I can’t bring myself to actually publish these visualizations. A growing number of blogs and YouTube channels don’t share these qualms and every time I encounter them in the wild I feel an “ick”. It’ll be interesting to see if more people develop this feeling.
pixl97
>But I think it’s going to be a rough ride, and whatever new equilibrium we reach will be the result of much turmoil.
Honestly visual media just seems to be the start. In the past two years we've seen about as much robotics progress as the last 20. If this momentum keeps up then we're not just talking about artists that are going to have issues.
TheGrognardling
Honestly, I'm pretty encouraged by all of the projects and efforts within legislation and organizations to draw clear lines - e.g., watermarking to clearly label whether something is AI-generated - as well as by industries' efforts to protect livelihoods, specifically in the creative space, where human intentionality and feeling are still essential. We've seen, are seeing, and will see cultural and societal acceptance of and backlash against one thing or another, but I'm confident that we will adapt. Pushback is already pretty monumental among artists and even other AI researchers, thanks to the Web itself; regulations for the early internet were far slower to materialize, largely because no comparable medium existed to organize them. I remain optimistic that we will find the niches where AI is needed, where it isn't, and where it is detrimental.
haswell
While I know there have been plenty of scathing essays, backlash among various communities, etc. do you have some concrete examples of the clear lines being drawn and legislation that gives you this optimism?
Maybe the progress you’re describing has escaped me because of the sheer speed at which this is all unfolding, but it feels like all I’ve heard is lots of noise, while AI companies continue to hammer hosted resources across the Internet to build their next model, the US government continues to claim they’ll use AI to solve problems of waste and fraud, companies like Shopify claim they won’t hire anyone unless it can be proven that AI cannot do the job, and an increasing % of the content I encounter is AI slop.
Maybe this is all necessary for a proper backlash to form, and I definitely want to become more aware of the positives anywhere I can find them. I’m not an AI doomer, but haven’t yet found the optimism you describe.
justinator
But the annotations are still wrong:
https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_...
(nice URL btw)
The room, the door, and the ceiling are all at a scale that would fit many sizes of elephant.
sfn42
The lines don't really make sense either, like the one above the sofa should probably go along the corner between the floor and wall.
justinator
Just like with text results from AI, as images like this get better, the increasingly subtle yet absolute wrongness is going to be a nightmare to deal with.
Imagine I ask AI to show me a sewer cap that's less than a foot wide (or whatever, I dunno, I'm watching TMNT right now). And it does, just by showing a photorealistic-looking sewer cap next to a ruler whose markings, from one end to the other, only go up to 8 inches. That doesn't mean sewer caps come in that size; it just means you can produce a rendered image to fit what you asked for.
m4thfr34k
I am very impressed with the current image generators out there, 4o / Leonardo / etc., but I cannot wait until they include some step to actually "check their work". Ask one to produce a watch showing the time 6:37. It fails every time, because almost all watch photos out there are set to a specific time, and this seems like something an initial "did I do this right" check could catch. The time example is trivial, but a general "does this output actually make sense considering what the user asked" check would be tremendously valuable.
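As a hedged sketch of what such a check could look like - generate_image() and vision_check() here are hypothetical placeholders, not any vendor's actual API:

    # Sketch only: generate_image() and vision_check() are hypothetical
    # stand-ins for an image generator and a vision-language "judge".
    def generate_with_verification(prompt, max_attempts=3):
        image = None
        for _ in range(max_attempts):
            image = generate_image(prompt)
            ok, critique = vision_check(image, prompt)  # "does this match the request?"
            if ok:
                return image
            # fold the judge's critique back into the prompt and retry
            prompt = f"{prompt}\nPlease fix: {critique}"
        return image  # best effort after max_attempts

Even this naive retry loop would presumably catch things like the wrong watch time, at the cost of a few extra generations per request.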
Retr0id
I had a reasonable intuition for how the "old" method works, but I still don't grok this new approach.
"in multimodal image generation, images are created in the same way that LLMs create text, a token at a time"
Is there some way to visualise these "image tokens", in the same way I can view tokenized text?
fxtentacle
Imagine you cut the image into 32x32 pixel blocks. And then for each block, you can choose 1 out of 128,000 variations. And then a post-processing step smoothes out the borders between blocks and adjusts small details. That's basically how a transformer image generation model works.
As such, the process is remarkably similar to old fixed-font ASCII art. It's just that modern AIs have a larger alphabet and, thus, more character shapes to choose from.
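In toy code, that tokenization step looks something like the sketch below - a sketch only: real tokenizers quantize learned latent features rather than raw pixels, and the codebook is learned, not random:

    import numpy as np

    def tokenize_image(image, codebook, patch=32):
        """Map each patch-sized block to the id of its nearest codebook vector."""
        h, w, _ = image.shape
        rows, cols = h // patch, w // patch
        tokens = np.empty((rows, cols), dtype=np.int64)
        for i in range(rows):
            for j in range(cols):
                block = image[i*patch:(i+1)*patch, j*patch:(j+1)*patch].reshape(-1)
                # nearest codebook entry = the "character" chosen for this block
                dists = np.sum((codebook - block) ** 2, axis=1)
                tokens[i, j] = np.argmin(dists)
        return tokens

    rng = np.random.default_rng(0)
    image = rng.random((256, 256, 3)).astype(np.float32)           # toy RGB image
    codebook = rng.random((1024, 32 * 32 * 3)).astype(np.float32)  # toy 1k-entry codebook
    print(tokenize_image(image, codebook))                         # an 8x8 grid of token ids

Printing that integer grid is about as close as you can get to "viewing" image tokens the way you view text tokens: each id only means something relative to the learned codebook.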
rwmj
I don't get how this would produce consistent images. In the article, the text could be on a grid, but the window and doorway and sofa don't seem to be grid-aligned. (Or maybe the text is overlaid?)
danielbln
The model looks ahead, just like LLMs look ahead. An LLM outputs token by token but can still output a fully coherent and consistent story for example. This new crop of auto-regressive image models does the same.
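The loop really is the same as for text; here is a minimal sketch, where model.next_token_probs() is a hypothetical stand-in for a trained transformer's output distribution:

    import random

    # Sketch only: model.next_token_probs() is hypothetical; it returns one
    # probability per entry in the token vocabulary.
    def generate(model, prompt_tokens, n_tokens):
        tokens = list(prompt_tokens)
        for _ in range(n_tokens):
            probs = model.next_token_probs(tokens)  # conditioned on everything so far
            tokens.append(random.choices(range(len(probs)), weights=probs)[0])
        return tokens  # for images: reshape into a grid and decode back to pixels

Global coherence comes from the conditioning: every token is sampled given all the tokens before it.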
WhiteNoiz3
I haven't seen any details on how OpenAI's model works, but the tokens it generates aren't directly translated into pixels - those tokens are probably fed into a diffusion process which generates the actual image. The tokens are the latent space or conditioning for the actual image generation process.
bonoboTP
> I haven't seen any details on how OpenAI's model works
Exactly. People just confidently make things up. There are many possible ways to do it, and without details, "native generation" is just a marketing buzzword without a clear definition. It's a proprietary system: there is no code release and no publication. We simply don't know how exactly it's done.
og_kalu
OpenAI has said both that it's native image generation and that it's autoregressive. It has the signs of it, too.
It's probably an implementation of VAR (https://arxiv.org/abs/2404.02905) - autoregressive image generation with a small twist: rather than predicting every token at the target resolution directly, it starts by predicting tokens at a small resolution, then cranks the resolution higher and higher until it reaches the target.
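If it is VAR-style, the sampling loop would look roughly like this sketch, where predict_scale() is a hypothetical method standing in for the paper's next-scale prediction step:

    # Sketch of VAR-style next-scale prediction; predict_scale() is hypothetical.
    # Each step predicts an s x s token map conditioned on all coarser maps.
    def generate_var(model, scales=(1, 2, 4, 8, 16)):
        token_maps = []
        for s in scales:
            token_maps.append(model.predict_scale(s, context=token_maps))
        return token_maps[-1]  # finest token map; a decoder turns it into pixels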
NitpickLawyer
> The results are not as good as a professional designer could create but are an impressive first prototype.
I like to look at how far we've come since the early days of Stable Diffusion. It was fascinating to play with it back then, but it quickly became apparent that it was "generic" and not suited for "real work" because it lacked consistency, text capabilities, fingers! and so on... Looking at these results now, I'm amazed at the quality, consistency and ease of use. Gone are the days of doing alchemy on words and adding a bunch of "in the style of Rutkovsky, golden hour, hd, 4k, pretty please ..." at the end of prompts.
smusamashah
I am waiting for the day I can give these a scene snippet from "The Hitchhiker's Guide to the Galaxy" (or any book) and have it drawn for me. The gold planets, the waking up on the beach, the Total Perspective Vortex, etc.
I like the book, but there are quite a few scenes which are quite hard to visualize and make sense of. An image generator that can follow that language and detail will be amazing. Even more awesome would be if it remained consistent across follow-ups.
iandanforth
Have you tried? It works well for other books, e.g. here's a scene from "A Connecticut Yankee in King Arthur's Court" which is conveniently in the public domain.
https://chatgpt.com/share/67f5d652-f7f4-8013-b2f2-3c997ea513...
fivestones
While it’s not 100% perfect (no horn on the forehead of his helmet), I’d say this is far beyond 90%. I can imagine reading books from Project Gutenberg on a future reader app that automatically generates pictures of each scene, consistent with each other and faithful to the text, on the fly as you read.
ARandumGuy
I've seen stuff that echoes this sentiment before, and I have to say I don't understand this desire at all. Why would I need a computer to show me what something in a book looks like? I already have an imagination for that!
Books are fundamentally a collaborative artform between the author and the reader. The author provides the blueprint, but it's up to the reader to construct the scene in their own head. And every reader is going to have slightly different interpretations based on how they imagine the events of a book. This act of imagination and re-interpretation is one of the things I love about reading books.
Having a computer do the visualization for you completely destroys what makes books engaging and interesting. If you don't want to visualize the book yourself, I have to wonder why the hell you're reading a book in the first place.
If you need that visual component, just watch a movie or read a comic book or something. This isn't a slight against movies or comics! They're fantastic mediums and are able to utilize their visual element to communicate ideas in ways that books can struggle with. And these visuals will form a much more cohesive artistic vision than whatever an AI outputs, since they're an integrated and intentional part of the work.
smusamashah
You make it sound like an unfair wish, which I would say is unfair itself. I like the book, I like visualising it in my head, and I fantasise what a scene would look like. AI won't generate a true visual. It's all fantasy anyway and AI can actually do it the way I have it in my head. It will solidify that thought.
For this book in particular, I read the comic version and I didn't like the visuals very much. I have a different idea of the babel fish. The Vogons look different. I would love to see the visuals that are in my head on paper.
wrboyce
I love the idea, but I feel like I have to say that I’ve got a pretty solid idea of what the total perspective vortex would look like for someone being subjected to it. When I first read the books I immediately had a visual and that has never changed when I’ve read them again (and again…).
I’m not sure what that says about either of us, but I would say that your definitive “quite hard to visualise” statement is very much subjective.
smusamashah
The Vortex, maybe not so much, but there are other hard-to-visualize things. I am on the third book, and I have no idea what Beeblebrox's two heads look like. The second head is often mentioned in passing. Sometimes it's mentioned as if it's always there; other times it feels like it just pops out of somewhere; otherwise, it's like it doesn't exist.
There is the scene where they see themselves on the beach, when the ship first rescues them. That was hard to grasp. Or the insides of the ship itself - the bridge, the panels, etc. Also that black ship they stole.
But maybe it's just me having a hard time with these concepts.
It's not just about scenes being difficult to visualize; even if I can see them in my head, I want to see them on paper too, because those things excite me.
thwarted
We have alt text for images; you want alt images for text.
You can see other people's interpretation of Zaphod's two heads by watching the BBC HHGTTG show (Mark Wing-Davey) or the movie (Sam Rockwell), among other renditions, which offer completely different interpretations, none of them canonical (not the least of which is because there was no canonical version of HHGTTG according to DA). I'm sure there are multitudes of fan art for HHGTTG on deviantart. Having AI generate an image doesn't offer any more "official" visualization.
Zaphod's second head is mentioned just as much as is warranted. If a character has a limp or a crazy haircut, it is not mentioned every time, because it has nothing to do with what is going on. And the book mentions that one head is often distracted/asleep, so it sounds like you do have a good visual of what his two heads are like.
While I understand that people think differently and some people are more visual thinkers, a good portion of the concepts expressed through writing are meant to be mindfucks that are difficult to express visually. A picture may be worth a thousand words, but the meat of writing is usually not the visual representation of its concepts. That's a great thing about writing: you can fill in the visuals yourself and it's fodder for fans to discuss.
(BTW, Hotblack Desiato's ship would just be black. Your eyes couldn't focus on it. Even the controls were black labels on a black background. There is nothing here to visualize other than, well, blackness).
wrboyce
Again, I have to disagree - which I suppose reinforces the whole subjectivity angle. I was positive that Zaphod’s two heads were side by side, to the extent that it pissed me off a fair bit in the most recent movie adaptation (among, let’s face it, plenty of other candidates).
I don’t know if the “layout” of the heads is mentioned or not in the books - I’d have to go back and check - but it’s often quite jarring when a book becomes a movie and doesn’t match my inner vision (and how incredibly unthoughtful of them, to boot).
This is a before/after moment for image generation. A simple example is the background images on a ton of (mediocre) music YouTube channels. They almost all use AI-generated images that are full of nonsense the closer you look. Jazz channels will feature coffee shops with garbled text on the menus and furniture blending together. I bet all of that disappears over the next few months.
On another note, and perhaps others are feeling similarly, but I am finding myself surprised at how little use I have for this stuff, LLMs included. If, ten years ago, you told me I would have access to tools like this, I'm sure I would have responded with a never ending stream of ideas and excitement. But now that they're here, I just sort of poke at it for a minute and carry on with my day.
Maybe it's the unreliability on all fronts, I don't know. I ask a lot of programming questions and appreciate some of the autocomplete in vscode, but I know I'm not anywhere close to taking full advantage of what these systems can do.