Can AI Decode This Simple Image? Exploring the Limits of Artificial Intelligence

  • #1
jack action
I found this image on social media today:

[Image: sun-on-the-beach.jpg]

This image is so simple, yet out of the ordinary. You really need to stop and think about what you are looking at. Some might never get it.

The question that quickly popped into my mind was: will AI ever be able to explain what this image represents, without being fed anything else? What amazes me is that I can do it at all! I fail to see how a bunch of statistical analysis can do the trick. And if it is possible, the database has to be really huge and diverse.

Have you seen other examples of seemingly simple tasks for humans that seem impossible for AI? Something that doesn't obey any known rules per se, but that humans can figure out rather easily nonetheless.
 
  • #2
That doesn't seem that hard to me for an advanced AI engine. It just has to recognize the names of things in photos, learn that word order matters (even vertically), and learn that you can substitute symbols for words (like when people use emojis now). It then has to sort through the choices to find something that makes sense to humans; 'you sun of a wave' isn't as meaningful as the other choices. There are already simple examples of each of these pieces. This all seems very trainable if people cared to do it. Maybe they're not there yet, but they will be. Maybe the hardest part is learning that people might want it done at all (i.e. self-learning), or maybe the creativity to make the first example of this sort of thing.
 
  • #3
Andrej Karpathy gave an example of that in 2012, and earlier this year he reported that ChatGPT-4 was able to explain why that picture is funny (I believe I read that in Ars Technica). However, I now fail to find any posting regarding this claim and am only able to find this discussion on reddit. But I guess anyone with ChatGPT-4 access could give it a try.
 
  • #4
  • First, this image has two layers: one image of a sunset and another of text;
  • Then you have to understand the text is incomplete and it must be a joke;
  • Then you have to understand that the text location matters;
  • Then you must understand that the background image will complete the text;
  • Then you must understand that the part of the image that can replace a word sounds like the word it replaces (not even a true homophone in one case);
  • You most likely had to have heard the sentence before.
The last word (beach/b-i-tch) is really hard to get. I got it because I knew the sentence and I was looking for the word, and I found it by looking at the left of the image where the sandy beach is more prominent.

I'm not talking about asking the AI "What is the joke?" or "Find the hidden text in this image"; just asking "What does that image represent?" And doing all of that without it simply answering "A sunset on the beach with the words 'YOU OF A'".
 
  • #5
Filip Larsen said:
However, I now fail to find any posting regarding this claim and am only able to find this discussion on reddit.
There is one obvious explanation in the comments of that discussion:
Karpathy said there is a risk that the image (or a derivative thereof) was part of the training data, which would to some extent invalidate the test.
 
  • #6
Yes, but it seems strange (to me, at least) that one training sample can be retrieved "verbatim" when given enough context. But the general point is valid: you can't verify a network using its training data.
 
  • #7
jack action said:
There is one obvious explanation in the comments of that discussion:
That brings a question to my mind: if a neural network is asked to evaluate an example that was used as a training input, is it guaranteed to remember it? Could it get watered down by the other training inputs and maybe even get treated like an outlier?
 
  • #8
FactChecker said:
Could it get watered down by the other training inputs and maybe even get treated like an outlier?
Yes. In its simplest form, an AI model looks like this:
[Image: neural-net-classifier.png]

The decision is not strictly A, B, or C with 100% certainty; the choices are always statistical. When training first starts, all of the weights in the hidden layer have randomly assigned values and the output is statistical nonsense. If an 'outlier' is the first and only example the network is trained on, the backpropagation algorithm will adjust the hidden layer's weights so that the network produces the expected output for that input with near 100% certainty.
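To make the "always statistical" point concrete, here is a minimal sketch of that kind of classifier in plain Python/NumPy. The layer sizes and activation are arbitrary choices of mine, not taken from the diagram; the only point is that the output is a probability distribution over the classes, never a hard 100% decision. Backpropagation would be the (not shown) step that nudges W1 and W2 so those probabilities line up with the training labels.

Python:
# Minimal sketch of a one-hidden-layer classifier (made-up sizes, untrained weights).
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 4, 8, 3   # arbitrary sizes for illustration

# Before any training, the weights are just random values.
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_classes)), np.zeros(n_classes)

def forward(x):
    """One forward pass: input -> hidden layer -> probabilities for classes A, B, C."""
    h = np.tanh(x @ W1 + b1)              # hidden-layer activations
    logits = h @ W2 + b2                  # raw scores for the three classes
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                # softmax: probabilities that sum to 1

x = rng.normal(size=n_inputs)             # some arbitrary input
print(forward(x))                         # e.g. something like [0.2 0.5 0.3], never a hard 1.0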

As training progresses with additional inputs, the hidden layer's weights are continuously adjusted toward a best fit across everything the network has been trained on, in an attempt to get the correct output for every input. This naturally causes the outputs for the earliest training items to drift away from 100%. If an item is a big enough outlier among the other items of its output class, the model could eventually classify it as something else.
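A hypothetical demonstration of that drift (my own construction, not something from this thread) can be done with an incrementally trained linear classifier from scikit-learn: train first on a single mislabelled "outlier", then keep training on lots of ordinary data and watch the outlier's predicted probability fall.

Python:
# Hypothetical sketch: a single "outlier" gets watered down as more data arrives.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

# Bulk data: two well-separated blobs, classes 0 and 1.
X_bulk, y_bulk = make_blobs(n_samples=2000, centers=[[-2, 0], [2, 0]],
                            cluster_std=0.7, random_state=0)

# One outlier: labelled class 0 but sitting deep inside class 1's territory.
x_out, y_out = np.array([[2.5, 0.0]]), np.array([0])

clf = SGDClassifier(loss="log_loss", random_state=0)

# Phase 1: train only on the outlier -- the model becomes (nearly) certain it is class 0.
for _ in range(200):
    clf.partial_fit(x_out, y_out, classes=[0, 1])
print("P(class 0 | outlier) after outlier-only training:", clf.predict_proba(x_out)[0, 0])

# Phase 2: keep training on the bulk data -- the weights are pulled toward the best
# overall fit, and the outlier's probability of being class 0 should drop sharply.
for start in range(0, len(X_bulk), 100):
    clf.partial_fit(X_bulk[start:start + 100], y_bulk[start:start + 100])
print("P(class 0 | outlier) after bulk training:", clf.predict_proba(x_out)[0, 0])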

Note, however, that with good test data you can eliminate most of these kinds of misclassification. For example, on the standard MNIST digit dataset it's pretty easy to get a model to 99.5% accuracy at identifying hand-written digits. And if you look at the ones it gets wrong, you would often have a hard time telling what the digit was yourself.
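As a rough, hedged version of that experiment, the sketch below uses scikit-learn's small 8x8 digits dataset as a stand-in for full MNIST and a one-hidden-layer MLP (both are my assumptions, not what the post used); the accuracy will land below the 99.5% quoted above, but the "look at what it gets wrong" step is the same.

Python:
# Sketch: train a small MLP on scikit-learn's 8x8 digits and inspect its mistakes.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# The digits the model gets wrong are often hard for a human to read as well.
pred = model.predict(X_test)
for i in (pred != y_test).nonzero()[0][:5]:
    print(f"true={y_test[i]}  predicted={pred[i]}")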
 