AI and pictures

jkop October 06, 2024 at 10:25 2125 views 15 comments

I've used OpenAI's image generator DALL-E a few times, and I'm not impressed. Does anyone else here have experience with AI and pictures?

For example, I asked DALL-E 3 to generate a picture of a three-storey building situated in a park. The picture that it generated does indeed show a building situated in a park, but the building has more than 10 storeys. It's an absurd-looking high-rise.

So I asked it to reduce the number of storeys, and specified that 'three storeys' might look like three horizontal rows of windows stacked on top of each other. It generated a new picture, but it shows yet another high-rise building, now with 15 or more storeys. The software claims that it has now reduced the number of storeys according to my description. It's a lie. Obviously, the software does not know what it's doing.

Text-generating AI assistants seem to be better at acting as if they know what they're doing, and for coding and text analysis they might even be useful. But there are some fundamental differences between texts and pictures.

For example, pictures such as paintings or photographs are syntactically or semantically dense, i.e. between two identifiable meanings there is possibly a third meaning. AI-powered picture generators produce pictures according to verbal descriptions, but verbal descriptions are syntactically disjoint, not dense.

Does this difference explain my lack of success when I ask DALL-E 3 to produce a picture of a three-storey building? It (ChatGPT4o) talks as if it knows the meaning of 'three storeys', but it shows that it has no clue.

User image

Comments (15)

Baden October 06, 2024 at 13:59 #937110 0 likes

Reply to jkop

I once asked it (innocently) for a picture of a candle "dripping wax". I won't repeat what it produced for me. But that was no candle. :shade:

praxis October 06, 2024 at 17:15 #937166 0 likes

Using the same prompt, the AI that I use also produced a building with too many floors. I then revised the prompt to use the number 3 instead of the word three and it worked.

Nils Loc October 06, 2024 at 18:40 #937186 0 likes

Looks like AI generators have the same skill issue that Adolf Hitler had. Either the perspective is wrong or it's just an aberration of architectural features.

Now that the story level problem is solved, how do you solve the windows and doors problem?

jkop October 06, 2024 at 22:13 #937239 0 likes

Reply to Baden The results I got are mostly absurd, à la Monty Python.

Quoting praxis

I then revised the prompt to use the number 3 instead of the word three and it worked.

That's interesting. When I typed '3' the number of storeys increased to 8 :lol: Perhaps I should ask it to erase its memory of my previous attempts? I'll try again tomorrow.

Quoting Nils Loc

Either the perspective is wrong or it's just an aberration of architectural features.

I suppose many errors arise because the image sampling technology is blind. The AI never sees the pictures that it samples, nor the result that it generates. Instead it reads our verbal commands, and matches them to the tags or content lists that describe millions of ready-made pictures.

punos October 07, 2024 at 03:22 #937309 0 likes

Reply to jkop
You can try asking a text-based AI to optimize your image prompt. Explain the problem you're experiencing with the image results and request that it optimize your prompt to mitigate the issue.

I copied and pasted your original prompt into Google Gemini and i got this:
https://g.co/gemini/share/dedbccddd2a3

javi2541997 October 07, 2024 at 05:52 #937331 0 likes

Reply to jkop Reply to Nils Loc Reply to praxis Reply to punos

Folks, I would not care to live in those buildings generated by artificial intelligence. They look weird and out of perspective, like Hitler's paintings, but at least they have a roof to shelter under, just in case. I try to use prompts too, and the result is, let's say, unique. I ask for ten stories, but if my maths are not wrong, I only count six:

User image

jkop October 07, 2024 at 07:42 #937345 0 likes

Quoting punos

request that it optimize your prompt to mitigate the issue

Ok! Let's see:

User image

So it did change the bottom floor, but also the rest of the building. It doesn't modify the picture according to my request but picks a different picture from its database. One step forward in one respect, two steps back in other respects. :cool:

Quoting javi2541997

I ask for ten stories, but if my maths are not wrong, I only count six

It seems to me that AI could be useful for intentional work with pictures if it had optical object or pattern recognition abilities. In some special areas it is evidently useful. But this blind image sampling that OpenAI and others offer online seems to be as useful as scrolling through a database of generic pictures.

Furthermore, we tend to react negatively because the assumptions under which we use their tools are false. AI is not intelligent, and it doesn't generate and modify pictures in the sense that one generates and modifies what there is to see.

punos October 07, 2024 at 08:50 #937363 0 likes

Reply to jkop
Yeah, these things are not "perfect" yet; remember, they're still babies. But they grow up so quickly! You should say, "This is amazing! I can see you have a great imagination. I love how you used those colors! They really stand out!" Then, promptly hang it on your refrigerator. :joke:

But really, i've heard that even professionals who use AI image generators have to go through many iterations until the AI gets it just right, or right enough. Most of these models have a parameter or method of introducing randomness into the process to enhance creativity, but at the cost of accuracy. LLMs have a "temperature" parameter that serves this purpose.
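To make the "temperature" remark concrete, here is a minimal sketch of how temperature is commonly applied when a language model picks its next token: the raw scores are divided by the temperature before being turned into probabilities. The scores and function names below are made up for illustration, and nothing here is taken from DALL-E or any specific product; image generators may introduce randomness in quite different ways.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature, rng=random):
    """Draw one candidate index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens (illustrative only).
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature nearly all the probability goes to the top-scoring candidate (more predictable output); at a high temperature the candidates become closer to equally likely (more varied, less accurate output).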

Also, companies that develop these models tend to lobotomize them in the name of content moderation and safety, which some might characterize as censorship. This incurs knock-on effects on unrelated material; in other words, it makes them dumber than they would be otherwise.

praxis October 07, 2024 at 17:07 #937470 0 likes

Reply to javi2541997

I added "cozy modern" to the prompt.

I wouldn't mind working there. With any luck the interior isn't decorated with Hitler paintings.

javi2541997 October 07, 2024 at 17:51 #937481 0 likes

Reply to praxis Cool! I wouldn't mind working or living there either. Is it just me, or is it similar to the house in Murakami's 'Killing Commendatore'? I can see myself drinking tea and writing a haiku in your AI-generated building or duplex.

No Hitler's paintings but Hokusai's!

frank October 07, 2024 at 18:33 #937495 0 likes

Reply to jkop
The coolest results I get from using AI (I use Wonder) come from giving it an image to start with. If I wanted a three-story building, I'd give it an image of a three-story building and then see what it does with it. I go through a lot of iterations and sometimes feed its own images back into it.

javi2541997 October 07, 2024 at 18:51 #937505 0 likes

@praxis

I typed the following prompts: "cozy," "autumn," "rainy," and "ideal for writing poems."

The AI generated houses with candles and lights inside, which I didn't like. I asked it to remove them and generate a darker, cloudier ambience. It was impossible for the AI. The machine kept generating houses with the lights on inside. What a waste of money and energy!

By the way, this is the generated house. Looks good, but it is not what I had in mind...

praxis October 07, 2024 at 19:14 #937524 0 likes

Quoting javi2541997

No Hitler's paintings but Hokusai's!

I don't know, kinda weird and dark, lol.

Loved that book, btw.

jkop October 07, 2024 at 21:49 #937606 0 likes

Quoting punos

..many iterations until the AI gets it just right, or right enough.

When each iteration presents a new picture, and parts or features of the previous picture that one would like to keep are lost, no number of iterations could make it right. That's very different from modifying a picture by changing or adding parts while keeping other parts.

Quoting frank

The coolest results I get from using AI (I use Wonder) come from giving it an image to start with.

Sounds cool, I'll check it out. :up:

punos October 07, 2024 at 22:26 #937623 0 likes

Reply to jkop
Sometimes when i encounter issues like this, i "reboot" the session. I start a new thread in order to clear any data it has in its context window (chat session history). Every prompt you give it skews the token probabilities for all subsequent prompts. Sometimes a piece of data in the context window can persistently muck up your results.

Usually, when i notice this happening early on, i just delete the last prompt/response up to where it started having the issue, just to clear those pieces from the context window. Then i continue prompting from there and repeat the process if it happens again.
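The two habits described above (starting a fresh thread, or deleting turns back to where things went wrong) can be pictured as operations on a list of messages. This is only a toy model under stated assumptions: the `Message` class, the `drop_from` and `reboot` helpers, and the sample turns are all hypothetical, and it says nothing about how any particular chat product stores or forgets its history.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


def drop_from(history, bad_index):
    """Return a copy of the history cut off just before the turn at bad_index."""
    return history[:bad_index]


def reboot():
    """Start over with an empty context, like opening a new thread."""
    return []


# Hypothetical session, loosely modelled on the exchange in this thread.
history = [
    Message("user", "Draw a three-storey building in a park."),
    Message("assistant", "<image: high-rise with 10+ storeys>"),
    Message("user", "Reduce it to three storeys."),
    Message("assistant", "<image: high-rise with 15+ storeys>"),
]

# Delete the last prompt/response pair, then continue prompting from there.
trimmed = drop_from(history, 2)
print(len(trimmed), "turns kept")

# Or clear everything and begin again.
fresh = reboot()
print(len(fresh), "turns kept")
```

In this picture, anything left in the list keeps influencing later replies, which is why removing the turns where things went wrong, or clearing the list entirely, can change the results.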

Full disclosure: I don't usually use AI image generators much, except in rare and specific cases. I rarely get the results i was hoping for.