Ghosts in a Shell

Have you ever wondered what reality is like from a non-human perspective? As in, what is it like to see the sunset as a cat? Is my red the same as your red? And so on.
By Alyssa Adams, Oneris Rico, Nicholas Guttenberg, Olaf Witkowski
October 2, 2023
A geometric origami type paper shell with a purple light inside of it
Stable Diffusion generated image by Alyssa Adams

Watch here for the video version of this essay.

These questions have been the center of philosophical discussion for millennia.

Since you are not me, and I am not a cat, and none of us are bats, and we can certainly never be anything but human, we can never know what it would be like to experience the world from any other perspective except our own.1

Bees can see UV light, so flowers appear differently to them than they do for humans. (Credit – Mimulus nectar guide UV VIS, Wikipedia Commons)

And yet… And yet! We (humans) are experts at changing our day-to-day experiences with one special trick:

Our ability to summon technology that changes our experiences is perhaps what most distinguishes us from every other creature on Earth. We are too impatient for natural evolution to change us over hundreds of thousands of years. Instead, we pour a vast amount of energy into pounding dirt into metal, forging the tools we so desperately need to uphold the delicate social structures that we’ve crafted to ensure our global dominance over the planet.

Out of the Earth, we construct arrows, plows, houses, cups, lenses, engines, computers, skyscrapers, airplanes, satellites, new elements, and even accelerators to smash fundamental particles together at (almost) the speed of light.

With each new technological invention, we redefine what a human's day-to-day life could be like.

Take that, bees! Our special human-made machines called “UV cameras” let us detect UV light too… but not with our own eyes, of course. (Credit – UV light photography, Wikimedia Commons)

Millions of years ago, ants had pretty much the same experience of life as they do today. Hatch from an egg, collect food, maintain or build tunnels, tend to the queen, etc. But just think about how much the lives of humans have changed, even in the last 100 years alone. The development of engines, machines, vaccines, the Internet, scalable agricultural output, etc. has completely transformed the lives of those who benefit from them.

Machines change us. And we use them to continuously push our daily experiences this way and that.

While inventions are magical in the sense that they change how we experience the world, here’s a second way they shroud themselves in mystery: machines are shapes.

A tree-shaped machine. (Credit – Cell phone tower resembles trees in Shrewsbury NJ.JPG – Wikimedia Commons)

Technology, inventions, machines, etc. are just shells – seashells that you hold up to your ear to hear the ocean.

The shape of a shell determines what sound you will hear. Of course, you don’t actually hear a miniature ocean crammed inside a calcium carbonate spiral. What you hear is the usual ambient background noise resonating inside the shell’s cavity, filtered into something different enough from the ordinary hiss of air that your brain stops tuning it out.

The inner shape of a seashell. (Credit – 49690616428 – Flickr – Kate^2112 Wikimedia Commons)

Technologies – especially physical devices – are shells too. The shape of the shell, coupled with how we interact with it, determines how it will change our experience.

So, while we can never understand what it is like to be a bat, we may be able to know what it is like to be a human wearing technology, a cyborg, or, perhaps someday in the future, even a fully non-biological humanoid machine.2

The reason we can peer into these experiences as opposed to ones crafted solely by natural evolution is that we’ve been designing technology to be compatible with us from the beginning. So, what kinds of mechanical experiences should we peer into?

In other words, what is it like to be a machine?

Obligatory picture of “Blue Guy”, who is shown in some form or another whenever AI is mentioned. (Credits – Artificial-intelligence-3382507 1280 Wikimedia Commons)

I hate to admit it, but I'm not really interested in experiencing what my toaster experiences while it heats up bread. Or what a car experiences when you turn the engine over.

What is interesting, of course, is our latest technology: AI algorithms.

Modern machine learning models tend to be much more black-box than their expert-system predecessors, meaning we don’t really understand how they work. At present, there is little understanding of the relationship between a model’s architecture (the number of layers, components, etc.) and its ability to approximate some function. While such a model runs on devices that we understand with circuit diagrams and friendly, colorful interfaces, we still don't understand how the structure of its many, many “hidden” layers performs these functions. Since we don’t understand them in this sense, how can we translate what a model experiences into something a human can experience?

When you talk to ChatGPT, what is it experiencing? Can we experience it, too?

While we may not know how (or even if) something not-us experiences things, we can say something about what might inform those experiences, should they exist. This is especially true in the case of artificial systems because the information bottlenecks leading to their decisions can be directly investigated. So, while we cannot say definitively whether it's meaningful to talk about the “experiences” of machine learning models, we can force ourselves to see the world through the same bottlenecks that the internal workings of those models enforce.

And that’s exactly what we tried to do. At ALIFE23, we presented an interactive art installation that allows people to experience some aspects of two machine learning models simultaneously: GPT-2 and CLIP.

Ghosts in a Shell at ALIFE23

More specifically, we visualized the attention layers of each model in a sensory mode a human can directly perceive. By wearing a VR headset, a person can see the world as they usually do, but distorted by the machine learning models.

CLIP is trained to match pictures with text descriptions, which lets it label the objects in a picture. To do so, it passes information about the picture through a series of layers and eventually outputs some text as a label. Some of these layers are specially designed to mimic aspects of human attention. We took the dynamics of these attention layers and used them to distort the image a person sees as they look through the VR headset. If CLIP isn’t paying attention to a particular part of the image, that part is blocked out. If it is paying attention, the image is as clear as usual.

Since there are multiple attention layers, we cycle through them, giving it a kind of “breathing” feeling.
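The masking-and-cycling step can be sketched independently of CLIP itself. Here is a minimal numpy sketch under stated assumptions: we pretend we already have one grid of attention weights over image patches per layer (the toy 4×4 grids below stand in for the real maps, which in the installation are read out of CLIP’s vision transformer). Each frame, the current layer’s grid is upsampled to pixel resolution, thresholded, and used to black out the low-attention regions; advancing through the layers over time produces the “breathing”:

```python
import numpy as np

def attention_mask_frame(image, layer_maps, t, threshold=0.5):
    """Black out the parts of the frame the model is not attending to.

    image:      (H, W, 3) uint8 camera frame
    layer_maps: one (P, P) grid of attention weights over image patches
                per attention layer (toy stand-ins for CLIP's maps here)
    t:          frame index; cycling layers over time gives "breathing"
    """
    H, W, _ = image.shape
    attn = layer_maps[t % len(layer_maps)]              # current layer's map
    attn = (attn - attn.min()) / (np.ptp(attn) + 1e-8)  # normalize to [0, 1]
    # nearest-neighbour upsample the patch grid to pixel resolution
    scale_h, scale_w = H // attn.shape[0], W // attn.shape[1]
    mask = np.kron(attn, np.ones((scale_h, scale_w)))[:H, :W] >= threshold
    return image * mask[..., None].astype(image.dtype)

# toy demo: a white 224x224 frame and two 4x4 attention grids
frame = np.full((224, 224, 3), 255, dtype=np.uint8)
maps = [np.eye(4), np.fliplr(np.eye(4))]  # diagonal / anti-diagonal attention
masked = attention_mask_frame(frame, maps, t=0)
```

The hard on/off threshold here is a simplification; a soft version (multiplying pixel brightness by the attention weight) gives a smoother fade between attended and ignored regions.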

GPT-2 is a smaller, earlier version of the headline-smashing GPT-3 and GPT-4. The headset is configured to listen to spoken words. It translates those words into text via a basic speech-to-text algorithm and passes that to GPT-2. Then, GPT-2 tries to predict the next word in the sentence as each word is spoken.

So for the sentence “Wow this room is so…” it tries to guess the next word as the sentence is being spoken. As it does so, it also uses attention layers to place importance on certain words for this particular task.

We used the dynamics of these attention layers to determine how the text is displayed to the human wearing the headset. It also displays the word or symbol it guesses will come next, just for funzies.
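Under the hood, those per-word importances come from scaled dot-product attention with a causal mask. Here is a minimal numpy sketch of a single toy head with random projections (not GPT-2’s learned weights; in the installation the real weights are read out of the pretrained model), showing how each spoken word’s attention over the words before it is computed:

```python
import numpy as np

def causal_attention_weights(x, Wq, Wk):
    """One toy attention head's weights over the words heard so far.

    x:      (T, d) embeddings of the T words spoken so far
    Wq, Wk: (d, d) query/key projections (random here; learned in GPT-2)
    Returns a (T, T) matrix: row t holds how much word t attends to
    each of words 0..t; words not yet spoken get exactly zero weight.
    """
    d = x.shape[1]
    q, k = x @ Wq, x @ Wk
    scores = q @ k.T / np.sqrt(d)                  # scaled dot-product
    # causal mask: a word cannot attend to words that come after it
    scores[np.triu_indices_from(scores, k=1)] = -np.inf
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)        # softmax over each row

rng = np.random.default_rng(0)
T, d = 5, 8                                        # e.g. "Wow this room is so"
x = rng.normal(size=(T, d))
A = causal_attention_weights(x, rng.normal(size=(d, d)),
                             rng.normal(size=(d, d)))
```

Row `t` of `A` is what the headset visualizes for the `t`-th word: brighter text for words the model weighs heavily when guessing what comes next.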

As a result, this is what it looks like when you wear the headset:

(Also, you can access the GitHub for the code and technical details here)

So, can we ever truly know what it is like to be a machine learning model?

Of course not, not really. But, since we are designing them, we can choose how they interact with us and become shells that change our day-to-day experience. We can design them to more easily translate their inner dynamics into something we too can experience, not just the inputs and outputs.

But why would we want to do this? What is the point?

As we find ourselves more engrossed in technology, through smartphones, computers, and the things we wear on our bodies, we may find it more and more difficult to draw a boundary between our minds and the external world. Extended Cognition Theory posits that our technology, environment, and even each other are more than just objects and beings — they are literally units of computation that perform work to think alongside our brains.4

If we want to build machine learning models that mimic aspects of human behavior, intelligence, etc., we ought to consider what these models’ experiences might currently be like, so that we can design them around experiences we can understand too.

Towards that end, we hope this art installation lets you experience what it is like to see the world through a bottleneck, the way a machine learning model does. We hope you enjoy the video, download and use the code yourself, or simply enjoy thinking about what it would be like to be one of the world’s machines.

1. For a fascinating deep-dive exploration of what it is like to experience the world from the perspective of non-human creatures, read “An Immense World” by Ed Yong.

2. “What is it like to be a bat?” by Thomas Nagel, 1974.

3. Specifically, without human technology.

4. For a great overview of Andy Clark’s work on extended minds, see this article by The New Yorker.