Can AI Decode This Simple Image? Exploring the Limits of Artificial Intelligence

  • #1
jack action
I found this image on social media today:

[Image: sun-on-the-beach.jpg]

This image is so simple, yet out of the ordinary. You really need to stop and think about what you are looking at. Some might never get it.

The question that quickly popped into my mind was: will AI ever be able to explain what this image represents, without being fed anything else? What amazes me is that I can do it at all! I fail to see how a bunch of statistical analysis can do the trick. And if it is possible, the database has to be really huge and diverse.

Have you seen other examples of seemingly simple tasks for humans that seem impossible for AI? Something that doesn't obey any known rules per se, but that humans can figure out rather easily nonetheless.
 
  • #2
That doesn't seem that hard to me for an advanced AI engine. It just has to recognize the names of things in photos, learn that word order matters (even vertically), and learn that you can substitute symbols for words (like when people use emojis now). It then has to sort through the choices to find something that makes sense to humans; 'you sun of a wave' isn't as meaningful as the other choices. There are already simple examples of each of these pieces. This all seems very trainable if people cared to do it. Maybe they're not there yet, but they will be. Maybe the hardest part is learning that people might want it done at all (i.e. self-learning), or maybe the creativity to make the first example of this sort of thing.
 
  • #3
Andrej Karpathy gave an example of that in 2012, and earlier this year he reported that ChatGPT-4 was able to explain why that picture is funny (I believe I read that in Ars Technica). However, I now fail to find any posting regarding this claim and am only able to find this discussion on reddit. But I guess anyone with ChatGPT-4 access could give it a try.
 
  • #4
  • First, this image has two layers: one image of a sunset and another of text;
  • Then you have to understand the text is incomplete and it must be a joke;
  • Then you have to understand that the text location matters;
  • Then you must understand that the background image will complete the text;
  • Then you must understand that the part of the image that can replace a word sounds like the word it replaces (not even a true homophone in one case);
  • You most likely had to have heard the sentence before.
The last word (beach/b-i-tch) is really hard to get. I got it because I knew the sentence and I was looking for the word, and I found it by looking at the left of the image where the sandy beach is more prominent.

I'm not talking about asking the AI "What is the joke?" or "Find the hidden text in this image"; just asking "What does that image represent?" And doing all of that without it simply answering "A sunset on the beach with the words 'YOU OF A'".
 
  • #5
Filip Larsen said:
However, I now fail to find any posting regarding this claim and am only able to find this discussion on reddit.
There is one obvious explanation in the comments of that discussion:
Karpathy said there is a risk that the image (or a derivative thereof) was part of the training data, which would to some extent invalidate the test.
 
  • #6
Yes, but it seems strange (to me, at least) that one training sample can be retrieved "verbatim" when given enough context. But the general point is valid: you can't verify a network using its training data.
 
  • #7
jack action said:
There is one obvious explanation in the comments of that discussion:
That brings a question to my mind: if a neural network is asked to evaluate an example that was used as a training input, is it guaranteed to remember it? Could it get watered down by the other training inputs and maybe even get treated like an outlier?
 
  • #8
FactChecker said:
Could it get watered down by the other training inputs and maybe even get treated like an outlier?
Yes. In its simplest form, an AI model looks like this:
[Image: neural-net-classifier.png]

The decision is not strictly A, B, or C with 100% certainty; the choices are always statistical. When training first starts, all of the weights in the hidden layer have randomly assigned values and the output is statistical nonsense. If an 'outlier' is the first and only example the network is trained on, the backpropagation algorithm will adjust the hidden layer's weights so that the network produces the expected output for that input with near 100% certainty.
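To make the "always statistical" point concrete, here is a minimal sketch of that kind of classifier in plain Python/NumPy. The layer sizes and activation are arbitrary choices of mine, not taken from the diagram; the only point is that the output is a probability distribution over the classes, never a hard 100% decision. Backpropagation would be the (not shown) step that nudges W1 and W2 so those probabilities line up with the training labels.

Python:
# Minimal sketch of a one-hidden-layer classifier (made-up sizes, untrained weights).
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden, n_classes = 4, 8, 3   # arbitrary sizes for illustration

# Before any training, the weights are just random values.
W1, b1 = rng.normal(size=(n_inputs, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_classes)), np.zeros(n_classes)

def forward(x):
    """One forward pass: input -> hidden layer -> probabilities for classes A, B, C."""
    h = np.tanh(x @ W1 + b1)              # hidden-layer activations
    logits = h @ W2 + b2                  # raw scores for the three classes
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                # softmax: probabilities that sum to 1

x = rng.normal(size=n_inputs)             # some arbitrary input
print(forward(x))                         # e.g. something like [0.2 0.5 0.3], never a hard 1.0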

As training progresses with additional inputs, the hidden layer's weights are continuously adjusted toward a best fit across everything the network has been trained on, in an attempt to get the correct output for every input. This naturally causes the outputs for the earliest training items to drift away from 100%. If an item is a big enough outlier among the other items of its output class, the model could eventually classify it as something else.
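A hypothetical demonstration of that drift (my own construction, not something from this thread) can be done with an incrementally trained linear classifier from scikit-learn: train first on a single mislabelled "outlier", then keep training on lots of ordinary data and watch the outlier's predicted probability fall.

Python:
# Hypothetical sketch: a single "outlier" gets watered down as more data arrives.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

# Bulk data: two well-separated blobs, classes 0 and 1.
X_bulk, y_bulk = make_blobs(n_samples=2000, centers=[[-2, 0], [2, 0]],
                            cluster_std=0.7, random_state=0)

# One outlier: labelled class 0 but sitting deep inside class 1's territory.
x_out, y_out = np.array([[2.5, 0.0]]), np.array([0])

clf = SGDClassifier(loss="log_loss", random_state=0)

# Phase 1: train only on the outlier -- the model becomes (nearly) certain it is class 0.
for _ in range(200):
    clf.partial_fit(x_out, y_out, classes=[0, 1])
print("P(class 0 | outlier) after outlier-only training:", clf.predict_proba(x_out)[0, 0])

# Phase 2: keep training on the bulk data -- the weights are pulled toward the best
# overall fit, and the outlier's probability of being class 0 should drop sharply.
for start in range(0, len(X_bulk), 100):
    clf.partial_fit(X_bulk[start:start + 100], y_bulk[start:start + 100])
print("P(class 0 | outlier) after bulk training:", clf.predict_proba(x_out)[0, 0])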

Note, however, that with good test data you can eliminate most of these kinds of misclassification. For example, on the standard MNIST digit dataset it's pretty easy to get a model to 99.5% accuracy at identifying hand-written digits. And if you look at the ones it gets wrong, you would often have a hard time telling what the digit was yourself.
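As a rough, hedged version of that experiment, the sketch below uses scikit-learn's small 8x8 digits dataset as a stand-in for full MNIST and a one-hidden-layer MLP (both are my assumptions, not what the post used); the accuracy will land below the 99.5% quoted above, but the "look at what it gets wrong" step is the same.

Python:
# Sketch: train a small MLP on scikit-learn's 8x8 digits and inspect its mistakes.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# The digits the model gets wrong are often hard for a human to read as well.
pred = model.predict(X_test)
for i in (pred != y_test).nonzero()[0][:5]:
    print(f"true={y_test[i]}  predicted={pred[i]}")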
 