
In Part 1 of this article I explained how magic can be used as a metaphor for designing NUI applications. In this article I'll hang a little meat on those bones by providing examples of how in-the-air gesture systems and speech can be combined to create even richer interactions that seem nothing short of spell casting as described in modern fantasy fiction.
Spells in popular fantasy fiction involve the use of magic words and gestures. A mage, wizard or witch waves her hands around and mutters words in Latin and things happen. How different is this from the use of speech or gesture recognition to control appliances, computer programs, and mobile phones?
Gestural interfaces are now coming into vogue, and the extension of these interfaces into our cars, workplaces, and homes will eventually become commonplace. As an example, just look at this G-Speak video by the same person who designed the gestural interface for Minority Report. The video is not fiction; it's real. Another example is the recently famous 6th-Sense demonstration at TED 2009. Don't the gestures used by postdoctoral student Pranav Mistry look like incantations from some fantasy fiction story?
These types of in-the-air gesture interfaces are likely to be in millions of homes within the next 5 years as video game makers, impressed with the success of the Nintendo Wii, adopt gesture recognition technology. Just this week Microsoft announced a new research project for the Xbox 360 called Natal (see this video). And in-the-air gestures will not be limited to controlled environments such as your home or office. As a matter of fact, we've been using gestures to dispense paper towels and turn on the water in public restrooms for over a decade. Some gestures will be specific to personal space and others will be generalized and available to anyone in public settings.
Now let's take a look at speech recognition, something that's been under research for decades and is just now starting to see some practical applications. A mobile phone I had a few years back could dial any number in my address book without my having to train it to my voice. All I had to do was say "Dial, Mom" and it would call my mother. It was extremely accurate. Today voice-activated dialing is in many mobile phones. Speech recognition will also be applied to home automation. Check out this video of the One Voice home entertainment system. You can even purchase voice control software for your Windows PC or use the built-in voice control in your Mac to launch programs and navigate file folders.
Admittedly, voice control is pretty crude, as are in-the-air gestures. With voice you usually give a verb followed by a noun (e.g. "Start, Microsoft Word"). Gestures are also crude, with people having to extend their arms out in front of them - the problem of "Gorilla Arms" is often mentioned as a drawback of in-the-air gesture systems. In both cases you also have to be careful of what I call "accidental activation", where the gesture or speech recognition systems pick up commands when you don't want them to.
If, however, we can combine in-the-air gesture systems with voice, we bring a much richer experience to the user. For example, imagine being able to point at a lamp or your TV and simply say "On". In this case the gesture replaces the noun by identifying the object to which the command (verb) applies. That's a simple example, but the possible combinations of speech with gestures are pretty impressive.
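To make the "gesture supplies the noun, speech supplies the verb" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `PointingGesture` and `SpokenWord` types, the `VERBS` vocabulary, and the 1.5 second pairing window are hypothetical and do not come from any real gesture or speech toolkit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PointingGesture:
    target: str       # device the user is pointing at, e.g. "lamp" (hypothetical)
    timestamp: float  # seconds


@dataclass
class SpokenWord:
    text: str
    timestamp: float  # seconds


# Hypothetical verb vocabulary: spoken word -> device action.
VERBS = {"on": "power_on", "off": "power_off"}


def fuse(gesture: PointingGesture, word: SpokenWord,
         max_gap: float = 1.5) -> Optional[Tuple[str, str]]:
    """Pair a spoken verb with the pointed-at device when both occur close in time."""
    action = VERBS.get(word.text.lower())
    if action is None:
        return None  # not a known verb, ignore it
    if abs(gesture.timestamp - word.timestamp) > max_gap:
        return None  # gesture and word are too far apart to belong together
    return (gesture.target, action)


print(fuse(PointingGesture("lamp", 10.0), SpokenWord("On", 10.4)))
```

Requiring the word and the gesture to arrive close together is also one simple guard against accidental activation: saying "on" in conversation does nothing unless you are pointing at something at the same moment.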
You can also avoid "accidental activation" by using made-up words or words in a foreign language (e.g. Russian), something that dog trainers have been doing for years. Now imagine using the Latin word Incedo ("to awake") in combination with pointing at your lamp or TV. Now it's really starting to look like magic, isn't it? Want all the lights on? Spread your arms out and say Lumen. It's like you've been educated at Hogwarts.
What's interesting is when you start to assign speech and gesture commands to small individual programs and then link those programs together in macros. Have you ever seen a really good Unix guy hook together three or four Unix utilities using a pipeline? A pipe, in Unix, is where the output of one utility becomes the input to another. It's pretty amazing what can be accomplished using the Unix pipeline. Unix shell commands and utilities are even more powerful with a scripting language of some sort, which adds branching logic and looping.
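For readers who haven't used pipes, the idea can be sketched in a few lines of Python. This is only an analogy, not how Unix implements pipes: the stage names `grep`, `sort_lines` and `head` are hypothetical stand-ins that mimic the familiar utilities, and `pipeline` is a helper invented here to chain them.

```python
from functools import reduce


def grep(pattern):
    """Stage that keeps only the lines containing `pattern`."""
    def stage(lines):
        return (line for line in lines if pattern in line)
    return stage


def sort_lines(lines):
    """Stage that sorts all incoming lines."""
    return iter(sorted(lines))


def head(n):
    """Stage that passes through only the first `n` lines."""
    def stage(lines):
        for i, line in enumerate(lines):
            if i >= n:
                break
            yield line
    return stage


def pipeline(*stages):
    """Chain stages so the output of each becomes the input of the next."""
    def run(lines):
        return reduce(lambda data, stage: stage(data), stages, lines)
    return run


log = ["error: disk", "ok", "error: cpu", "error: fan"]
first_errors = pipeline(grep("error"), sort_lines, head(2))
print(list(first_errors(log)))  # ['error: cpu', 'error: disk']
```

Each stage knows nothing about the others; it only consumes lines and produces lines. That is the property that lets small commands be combined freely.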
Now imagine having a speech and gesture scripting language and thousands of small commands to choose from, which you can stitch together into macros or programs. You can think of these macros or programs as spells. For example, let's say you want to turn on the TV and watch LOST while recording Amazing Race. You do this every Thursday night. You might point at the TV while rotating your finger in a circle (a gesture for record) and speak command words in Latin.
Incedo fenestra; utor LOST; {Rotate finger} Amazing Race.
{Turn on the TV, find LOST, record Amazing Race}
Looks like a spell to me. Even if you don't use Latin, which I use just to make it a bit more fun, it's like a spell, isn't it?
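As a rough sketch of what a speech and gesture scripting language might look like to a program, the snippet below parses the spell above into steps. The syntax is an assumption read off the example: clauses are separated by semicolons and a gesture is written in curly braces. The `Step` type and `parse_spell` function are hypothetical names, not part of any existing NUI toolkit.

```python
import re
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    gesture: Optional[str]  # gesture performed with the words, if any
    words: str              # spoken command words


# A gesture is written in curly braces, e.g. {Rotate finger} (assumed syntax).
GESTURE = re.compile(r"\{([^}]*)\}")


def parse_spell(text: str) -> List[Step]:
    """Split a spell into semicolon-separated steps of (gesture, words)."""
    steps = []
    for clause in text.split(";"):
        clause = clause.strip().rstrip(".").strip()
        if not clause:
            continue
        match = GESTURE.search(clause)
        gesture = match.group(1).strip() if match else None
        words = GESTURE.sub("", clause).strip()
        steps.append(Step(gesture, words))
    return steps


spell = "Incedo fenestra; utor LOST; {Rotate finger} Amazing Race."
for step in parse_spell(spell):
    print(step)
```

Once a spell is a plain list of steps like this, each step can be handed to whatever small command it names, in order.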
To take this a step further, imagine if you could record long spells and assign them power words. For example, let's say you want to check your stove, lock your windows and doors, turn down the heat, and switch on the bedroom TV every night at bedtime. This could be accomplished with a long spell:
cuspis occludo; domus obfirmo; cubitum ire fenestra Incedo
{turn off the stove, lock up the house, and turn on the bedroom TV}
Or a single power word:
recedo {retire}
In this case recedo becomes a power word that activates a chain of commands. You could make up spells and power words on the fly or, even better, get them off the Internet or from a book. Imagine a book titled "Spells and Power Words for the Home" or something like that. Now you not only have spells, you also have books of Magic: instruction manuals that contain lots of NUI spells that you can learn to recite or record with associated power words.
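A power word is essentially a named macro. The sketch below shows one way that could work, assuming a hypothetical `Spellbook` class that records a chain of commands under a power word and expands it on demand. The power word "bedtime" and the command "lumen off" are invented for the example; "recedo" and its three commands come from the spell above.

```python
class Spellbook:
    """Hypothetical registry mapping power words to chains of commands."""

    def __init__(self):
        self._spells = {}

    def record(self, power_word, commands):
        """Store a chain of commands under a power word (case-insensitive)."""
        self._spells[power_word.lower()] = list(commands)

    def expand(self, word, _seen=()):
        """Return the primitive commands a word stands for.

        A command that is itself a power word is expanded recursively,
        so spells can be built out of other spells.
        """
        key = word.lower()
        if key not in self._spells:
            return [word]  # not a power word: treat it as a primitive command
        if key in _seen:
            raise ValueError(f"spell {word!r} refers to itself")
        result = []
        for command in self._spells[key]:
            result.extend(self.expand(command, _seen + (key,)))
        return result


book = Spellbook()
book.record("recedo", ["cuspis occludo", "domus obfirmo",
                       "cubitum ire fenestra incedo"])
book.record("bedtime", ["recedo", "lumen off"])  # a spell built on another spell

print(book.expand("Recedo"))
print(book.expand("bedtime"))
```

Because a spellbook is just data, it could be shared, downloaded, or printed in a book, which is exactly what makes the "book of Magic" idea plausible.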
As I said, you don't have to use Latin any more than a dog trainer has to use Russian, but it does avoid "accidental activation" and it makes interacting with systems a lot more fun. That said, the language used is immaterial; it's the combination of speech and in-the-air gestures, along with the possibility of creating more complex macros (i.e. spells), that makes NUI so much like Magic, and that is really quite exciting.
So do you have to do it this way? Of course not, but one of the fundamental tenets of NUI is that the interaction should be enjoyable. Designing NUI to closely resemble Magic is one way to accomplish that. Magic as a design metaphor also provides us with a conceptual framework around which we can develop a full ecosystem of NUI-controlled devices.
In the next installment of this series I'll talk about how display technology and multitouch can be modeled around magical artifacts.
Series Links
- Magic as a Metaphor for NUI Design: Part 1
- Magic as a Metaphor for NUI Design: Part 2, Casting Spells
- Magic as a Metaphor for NUI Design: Part 3, Enchanted Artifacts
Disclaimer
1. This article is subject to editing and will probably change over time.
2. My education in Latin is based on Googling up words while writing this story.