The most recent developments in AI make me feel like I’m living in a condensed timeline where progress flies by at 10x speed. It wasn’t that long ago that people were shocked by the image generation capabilities of DALL-E 2, a GPT-based multi-modal model. Suddenly, we have tools that can build functional games and websites and demonstrate complex reasoning, with surprising flexibility across images, text, and even video. Really, once you grasp the power of the building blocks of GPT models, there’s virtually nothing you can’t do by training and combining them in ever more creative ways. Here are some examples that blew my mind:
- Creating 3D games
- Turning a picture of an open fridge into recipes
- Coming up with a business plan with $100
- Acting as the eyes for people who are blind
These use cases are technically achievable with purpose-built tools (e.g., “an app that gives you recipes based on pictures of food”). The remarkable thing is that GPT-4 can do these highly complex tasks out of the box, without any specialized training.
It’s as if…a human were behind it.
Well, maybe small, tiny parts of humans.
A system of simpler parts
Our brain links together concepts with unfathomably complex neural networks to form a “complete world model.” Any human behavior, no matter how complex, can be broken down into smaller and smaller chunks of independent patterns. For example, if we were told to “draw a baby daikon in a tutu walking a dog,” we know how to do this properly because we know:
- What a daikon and a dog look like
- That the “baby” qualifier implies an anthropomorphized, infant-like version of the object, with features we consider “infantile,” such as smallness and roundedness
- What a tutu looks like and that it is usually worn around the waist
- That “walking an animal” implies some sort of leash connecting the two
This is how I understand DALL-E 2 to be able to draw entirely new and imaginary scenarios. It consumes enormous amounts of images and text descriptions and learns which shapes, curves, and colors are associated with which words. The same principle applies to GPT text models: when you ask ChatGPT to explain something, it follows the same patterns humans use to explain a topic and construct an answer.
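To make “learning associations” a little more concrete, here’s a toy sketch of the idea. This is not how DALL-E 2 or GPT actually works (real models learn continuous representations with neural networks from billions of examples); it just counts, over a made-up dataset, which caption words co-occur with which colors and then “guesses” colors for a new prompt from those counts.

```python
from collections import Counter, defaultdict

# A toy, made-up dataset of (caption, dominant colors) pairs. Real models
# like DALL-E 2 learn continuous embeddings from millions of image-text
# pairs; this sketch only counts co-occurrences to show the core idea of
# "which words go with which visual features."
dataset = [
    ("a ripe banana on a table", ["yellow", "brown"]),
    ("a baby daikon radish", ["white", "green"]),
    ("a dog walking in the park", ["brown", "green"]),
    ("a ballerina in a pink tutu", ["pink", "white"]),
]

# "Training": tally how often each caption word appears alongside each color.
word_to_colors = defaultdict(Counter)
for caption, colors in dataset:
    for word in caption.split():
        for color in colors:
            word_to_colors[word][color] += 1

def likely_colors(prompt: str) -> list[str]:
    """Guess colors for a prompt purely from learned co-occurrence counts."""
    votes = Counter()
    for word in prompt.split():
        votes.update(word_to_colors.get(word, {}))
    return [color for color, _ in votes.most_common(2)]

print(likely_colors("a baby daikon in a tutu walking a dog"))
# Prints something like ['white', 'brown']: a pattern-matched guess made
# with no notion of what a daikon, a tutu, or a dog actually is.
```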
But the moment you start using a GPT tool, you realize that no matter how technically polished the results are (and they aren’t perfect yet), the models behind them don’t actually understand what they are doing. Like a parrot scrambling words into a faux-sentence, they don’t know what those words mean beyond sound patterns.
The fundamental concept underlying any kind of AI today is pattern recognition and reconstruction. The models don’t know the meaning behind topics and objects; they only recognize patterns, store them as mathematical constructs, and then regurgitate answers that conform to those patterns. Like a math student who can apply a formula without understanding how it’s derived, today’s AIs don’t know how the world actually works, and it’s pretty easy to tell.
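The same point can be made with text. As a toy illustration (a simple bigram chain, nothing like the transformer architecture behind GPT, but the same “store patterns, then regurgitate” loop in spirit), the sketch below memorizes which word tends to follow which and then strings together plausible-sounding fragments with no grasp of their meaning.

```python
import random
from collections import defaultdict

# A toy corpus; a real model trains on trillions of tokens.
corpus = (
    "the dog walks in the park . "
    "the baby daikon wears a tutu . "
    "the dog wears a leash ."
).split()

# "Training": record which word follows which (a bigram table).
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start: str = "the", length: int = 8, seed: int = 0) -> str:
    """Regurgitate words that conform to the stored patterns."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)

print(generate())
# Produces grammatical-looking fragments like "the dog wears a tutu . the ..."
# with no idea of what a dog or a tutu is.
```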
I think of today’s AIs as small, standalone slices of our brain. A model can draw things that resemble the curves and colors of hands, yet be unaware that hands have only five fingers and that there’s a skeletal structure underneath governing their range of motion. It can explain a complex topic like string theory to a 5-year-old, but it doesn’t understand why we care about it or that advancing the theory would transform our understanding of the universe.
In other words, AIs don’t know what’s true, what’s real, or what’s human. They just know what “sounds true,” “looks real,” and “feels human.” For now.
I don’t think it’s always going to stay that way.
Consciousness
If AIs are able to accomplish what parts of our brains can do, aren’t we just a combination of those independent parts? Is that what consciousness is?
I think of consciousness as a spectrum. On the left, you have mechanical automatons that perform precise tasks but clearly aren’t reasoning or learning anything new. Intelligent animals like dogs and dolphins sit somewhere in the middle. Somewhere on the right is us, Homo sapiens, with our survival drives, complex motivations, self-awareness, and the ability to construct a world model and understand how to affect it. Each of us starts with a primitive set of genetic programming and spends decades learning new behaviors and constructing a worldview. Learning makes us human.
The best and most complex AIs we have today seem to sit somewhere left of the middle on that spectrum. But I believe it’s only a matter of time before they start moving closer and closer toward the right. With the right motivations and learning mechanisms, can we really say that an AI cannot achieve the same level of world awareness that we have? Will it, then, achieve consciousness and self-awareness?
Eventually, I do believe that as we climb the ladder of complexity, we’ll inch closer to some version of consciousness. AIs should soon be able to derive facts, such as the fact that dogs have four legs, and apply them to drawings without explicit human instruction, as a child naturally would. AIs, if we choose to program them so, should also be able to seek existential meaning in their lives, as we’re wired to do.
The interesting question to me is: when will we come to understand our own consciousness? Will we reach an alternative model of intelligence through AI development, or stumble upon a mirror that reveals the inner workings of our own mind?
What a time to be alive.