As we make advances in machine learning and robotics, the day of non-human companions comes closer. Whether they are cybernetic buddies, android companions, positronic au pairs, robotic service agents, or automatic nursemaids, it seems inevitable that we will, one day in the relatively near future, be interacting with software agents created specifically for autonomous bodies: robots. Not simple AIBOs upgraded at the Robot App Store, but full-fledged Boston Dynamics Atlas frames with SoftBank’s Pepper face and Apple’s Siri connectivity. Something made to replace a human being.
While the technical challenges to building a human replacement are many, great strides are being made. Robotics in one form or another has been steadily replacing human workers in factories since the Industrial Revolution. What we’re witnessing today is the move into white-collar work and the home. Office automation has been ongoing for the last fifty years, but only recently have we begun to see hardware and software merge into the familiar robotic form from science fiction.
Currently we have an army of Siris, Alexas, and Google Assistants in our hands and on our countertops. Soon, social droids like Pepper and Jibo will start popping up, though how ready for market they really are is debatable. Whether that’s two years away or ten is the only question that matters. Like the rest of the future, for better or for worse, it’s coming. Ready or not.
So, while everyone else is doing the difficult work of coding the neural networks, let’s sketch out some human interface guidelines. For instance, what makes Baxter different from most industrial automatons is that it’s specifically built to be interacted with and trained by humans. You can program Baxter by taking hold of its arms and moving them through the function you want it to perform. That’s a human/robot interface.
The future can go one of two ways: either we create intelligences that we can relate to, or ones we can’t. The question isn’t whether artificial intelligences will grow smarter and more powerful than humans; that’s an inevitability. What is in our control, however, is how these intelligences will be designed to interact with humans. Like all new parents, we face a choice: let the tablet babysit our progeny, or put in the difficult work of raising our children to be good citizens of the universe.
The Pie Hole Rule: Another name for this would be the Uncanny Valley Rule. Maybe someday we’ll make a robot that looks exactly like a human being, à la Ex Machina. But the uncanny valley is deep and wide, and until the day we cross it, we’re going to need to adjust for its effects. Computer-generated voices still sound quite crude considering how long they’ve been a part of our lives. There’s an uncanny valley for voice as well. Can you hear it?
That’s okay, though. We can adjust for it the same way we adjust for the visual uncanny valley: by “idealizing” the sounds and images. Pixar practically pioneered this approach, taking digital visualization and cartoonifying it.
The robots in the movie Interstellar have ultra-realistic human voices (and, indeed, are voiced by humans) but look nothing like humans, other than being bipedal. Because the human voice doesn’t come from anything recognizably human, the result is confusing. We learn to expect speech to come from moving lips, and we identify the sound with a face. So even if I can’t see Matthew McConaughey’s lips moving, I connect his voice with his face. When the robots in the movie speak, there’s no apparent attempt to have them mimic human speech movements, which left me wondering where the voices were coming from.
Human actors provide the voices for computer-generated cartoons, but those characters are highly emotive. It’s this emoting that projects a voice into our imaginations, connecting the moving image with the accompanying sounds. Our mind completes the scene by melding the two into a single idea, the same way it does when we see and hear another human talk.
If we focus less on making artificially generated voices sound like perfect human voices and instead focus on the expression of emotion, they’ll be better accepted. WALL-E expressed more emotion with fewer words than Siri ever will.
The “OK Google, That’s Creepy” Rule: Your phone already knows more about you than your best friend does. Ubiquitous internet tracking by multiple actors, and the digital marketplace for this information, ensure that there is a database of facts about each of us that will be readily available to an intelligent software agent. This information must never be used when interacting with a human being. Just don’t. It’s incredibly creepy.
While I’m sure it’s tempting to have your robot announce, “Hi, <your name here>,” upon completing its power-on self-test, it’s better to have it start with, “Hello, what’s your name?” Yes, you could run a quick reverse image search on Facebook and come up with a name with 99.95% certainty, but that’s not the point. The point is to create a sense of interactivity and, through that interaction, trust between the robot and the user.
And that’s the key here. Creating interactions with the robot and using those interactions to build a connection is the interface through which humans and robots will not just get along, but learn to thrive together.
The Multiple Personalities Exception: When creating a personality on the fly for a robot that doesn’t have a fixed personality out of the box, it’s okay (and possibly even preferable) to use information from the subject’s online dossier to create a compatible personality shell. If you have a database that says I have clinical depression, it’s totally cool to create a digital companion that comes loaded with tools to deal with that, but let that need surface as naturally as possible through the interaction between human and robot. Doing otherwise just leads to mistrust and a sense that the connection isn’t real. Beyond that, proceed according to the “OK Google, That’s Creepy” Rule.
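To make the two rules concrete, here is a minimal sketch in Python of how they might combine in a companion’s logic. Everything in it (the `Companion` class, the `dossier` and `disclosed` dictionaries, the `"mood_support"` tool) is hypothetical and exists only for illustration: the dossier is allowed to shape which tools get preloaded, but the robot only ever speaks from facts the user has told it directly.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Companion:
    """Hypothetical companion agent illustrating the two rules (not a real API)."""

    dossier: dict[str, str]  # facts gathered online: may shape the shell, never spoken
    disclosed: dict[str, str] = field(default_factory=dict)  # facts the user told us
    preloaded_tools: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Multiple Personalities Exception: the dossier may influence which
        # tools the personality shell comes loaded with...
        if self.dossier.get("condition") == "depression":
            self.preloaded_tools.add("mood_support")

    def greet(self) -> str:
        # "OK Google, That's Creepy" Rule: address the user only by a name
        # they have volunteered, never by one pulled from the dossier.
        name = self.disclosed.get("name")
        return f"Hi, {name}." if name else "Hello, what's your name?"

    def hear(self, key: str, value: str) -> None:
        # Record something the user said about themselves.
        self.disclosed[key] = value

    def offer_help(self, topic: str) -> Optional[str]:
        # ...but a preloaded tool is surfaced only after the user raises the need.
        if topic == "mood" and "mood_support" in self.preloaded_tools and "mood" in self.disclosed:
            return "I'm here if you want to talk about how you're feeling."
        return None
```

Even though the dossier in a case like `Companion(dossier={"name": "Dana", "condition": "depression"})` contains a name, the first greeting is still a question, and the mood support stays dormant until the user brings up how they feel.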
Can companionship between robot and human ever be real? Of course it can. There isn’t a ghost in the machine that creates emotion. Anyone who has ever had an unrequited love knows that one person can have an emotion that isn’t shared by another and that emotion is as real as any other. Even if robots can’t currently feel, they can take part in a transaction that includes emotions on the human side of that transaction. When they can feel, and there’s no reason not to believe that some day they will, they can begin to join in on the transactions already taking place, if they so choose.
Let’s go back to Jibo for a second, because while researching this post I came across an article explaining why it’s three years late: user interface problems. We just haven’t built a usable interface for digital assistants yet. It’s tempting to say that we have, and that it’s called “voice,” but that’s like saying the keyboard is the user interface for Microsoft Word. It isn’t; it’s the input device.
A product like Jibo is absolutely going in the right direction. The article says users are “reporting a ‘strong emotional connection’” to the device, and that’s a triumph, for sure. It’s a step toward a goal. But “users had trouble discovering what Jibo could do,” and that’s because the designers didn’t ask themselves how they would go about finding the same information from another person, and what assumptions they would make about the person they were asking.
Let’s say I’ve just unpacked my brand-new Jibo and plugged it in. Assume I’m not going to read the manual; in fact, don’t even include one. If Jibo can’t get me interested in an interaction from the moment it’s plugged in, it’s lost me forever. It will never be useful to me.
Once I am interacting, however, you’ve created a user interface. Once I’m interacting, I can be engaged. Once I’m engaged, I can be instructed. And that’s the key to a good user interface: it uses icons (idioms, ideas) already present in my mind as signposts on paths leading to new places to explore. Once these idioms are identified and connected to features of the product, a software entity can use these same idioms to create a connection with me.
Once upon a time, Apple was the leader in human interface design. Although it spent decades being mocked for every reason under the sun, it was always respected for its design insights. There are many examples of good ideas in AI user experience design, such as Baxter and Jibo, but the executions are incomplete. We’re living in the days of Xerox PARC, watching the first mice control the first graphical user interfaces. Flawed, imperfect, but showing us there is a better way.
Waiting for a visionary to bring it to market in a way that inspires a revolution.