Philip Haine's articles on Product Vision, Innovation and Design

Using gestures and voice for access to key tasks on a mobile device

How might the iPhone afford direct access to key apps and tasks without defiling its exterior with another dastardly button?

My rant from a few months ago about the inefficiency of the iPhone calendar application continues to strike a chord.

One of the things I criticized was how many steps it takes just to navigate to the calendar in order to check, tweak, or add an appointment.  On the iPhone it ranges from 3 to 7 steps, with some of those being heavyweight steps that pull your eyes and attention off other things.  On the ancient PalmPilot and its newer descendants, it is one button press.  An extremely frequent task was rightfully given top-tier treatment, with a physical button on the device.

But Apple isn’t really into buttons. (Nor are they into acknowledging that the iPhone is really more of a PDA than it is a phone.)

Can we have our cake and eat it too?  Can we have direct access to key tasks while also accommodating Apple’s pathological aversion to real buttons?  Buttons that you can actually find without looking at the device, which are always available, regardless of the mode you are in, and which have the gratifying haptic feedback of… clicking?

Here’s one way: from the iPhone’s “slide to unlock” screen (or even from standby mode) let the user jump directly to an app by drawing a gesture.  C for calendar, M for mail, F for Facebook.  It would be configurable.

Gestures could go deeper than just launching apps and get you to your most-used tasks.  Draw an A to create a new appointment.  Draw a T to go to today.  Each app could publish its candidates for direct-access tasks, and the user could assign them to gestures.
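
To make the publish-and-assign idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the names Task and GestureRegistry, the "app/task" keys, and the placeholder actions are mine for illustration, not any real iPhone API. It also assumes the gesture has already been recognized as a single letter.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Task:
    """A direct-access task that an app offers up (hypothetical model)."""
    app: str
    name: str
    action: Callable[[], str]


@dataclass
class GestureRegistry:
    """Holds the tasks apps have published and the user's gesture bindings."""
    published: Dict[str, Task] = field(default_factory=dict)  # "app/name" -> Task
    bindings: Dict[str, str] = field(default_factory=dict)    # gesture letter -> task key

    def publish(self, task: Task) -> str:
        # An app announces a candidate task; nothing is bound to it yet.
        key = f"{task.app}/{task.name}"
        self.published[key] = task
        return key

    def assign(self, gesture: str, key: str) -> None:
        # The user picks which published task a drawn letter should trigger.
        if key not in self.published:
            raise KeyError(f"no published task {key!r}")
        self.bindings[gesture.upper()] = key

    def handle(self, gesture: str) -> str:
        # Called once the drawn shape has been recognized as a letter.
        key = self.bindings.get(gesture.upper())
        if key is None:
            return "unrecognized gesture"
        return self.published[key].action()


registry = GestureRegistry()
open_calendar = registry.publish(Task("calendar", "open", lambda: "calendar opened"))
new_appointment = registry.publish(Task("calendar", "new appointment", lambda: "new appointment form"))
go_to_today = registry.publish(Task("calendar", "today", lambda: "showing today"))

registry.assign("C", open_calendar)
registry.assign("A", new_appointment)
registry.assign("T", go_to_today)
```

With those bindings in place, registry.handle("A") runs the new-appointment action directly, and a letter the user never assigned simply does nothing useful. The point of keeping "published" and "bindings" separate is that apps only propose tasks; the user decides which ones deserve a gesture.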

Here’s an even better way to give immediate access to key tasks without buttons: make voice recognition the main way to get to the most frequent tasks.  Press a physical “listen to me” button and say, “Go to today” or “new appointment for next Thursday at 5:30 pm” or “Call Leslie” or “new contact” or “Address book find Edwin” or “Facebook” or “Yelp nearby sushi” or “Montreal weather” or “Apple stock price”.  These were scenarios I painted several years ago.  Now they are starting to take shape at Google and with iPhone add-ons like Say Who (which actually works well) and Say Where (which doesn’t work as well yet).
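
As a rough illustration of how small the command set could be, here is a sketch in Python that maps a handful of the phrases above to tasks. It assumes a speech recognizer has already turned the audio into text; the task names such as "phone.call" and the function parse_command are hypothetical, and the pattern table is only a sample, not a complete grammar.

```python
import re
from typing import Dict, Optional, Tuple

# A deliberately tiny command table: (hypothetical task name, phrase pattern).
COMMANDS = [
    ("calendar.today", re.compile(r"^go to today$")),
    ("calendar.new_appointment", re.compile(r"^new appointment for (?P<when>.+)$")),
    ("phone.call", re.compile(r"^call (?P<name>.+)$")),
    ("contacts.new", re.compile(r"^new contact$")),
    ("contacts.find", re.compile(r"^address book find (?P<name>.+)$")),
    ("weather.lookup", re.compile(r"^(?P<city>.+) weather$")),
    ("stocks.lookup", re.compile(r"^(?P<company>.+) stock price$")),
]


def parse_command(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (task, arguments) for a recognized phrase, or None if nothing matches."""
    # Normalize case and whitespace so "Call  Leslie" and "call leslie" are the same.
    text = " ".join(utterance.lower().split())
    for task, pattern in COMMANDS:
        match = pattern.match(text)
        if match:
            return task, match.groupdict()
    return None
```

Here parse_command("Call Leslie") yields the "phone.call" task with the name "leslie", and a phrase outside the table yields None so the device can ask again rather than guess. Interpreting an argument like "next thursday at 5:30 pm" as an actual date is a separate problem that this sketch leaves untouched.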

A good implementation of voice command would suddenly make all that iPhone goodness a heck of a lot more efficient.  It could be a key part of an iPhone-neutralizing device.

Posted by Philip Haine on Friday, November 14th, 2008 at 1:50 pm.
See similar articles in: Commentary, Designs to Steal.

4 Responses to “Using gestures and voice for access to key tasks on a mobile device”

  1. Anecdotes of a Dog Part 4 | Anecdotes of a Dog wrote on November 15th, 2008 at 1:37 pm :

    [...] Steal This Idea » Using gestures and voice for access to key tasks … [...]

  2. Dave Cortright wrote on September 29th, 2009 at 9:20 pm :

    the iPhone has Voice Control now in version 3. It’s not 100% accurate, but it does OK.

  3. Ryan wrote on March 26th, 2010 at 12:04 pm :

    iPhones are for consumers, Blackberries are for content creators…
    Well, at least as far as scheduling and email go, in the real-world, this tends to be one of the defining factors that determine what smart phone one gets to improve their mobile productivity.

    So far voice control for things pretty much stinks. Try the new voice-operated car controls, or speak to type software… the same problem always occurs: the computer got it wrong and now it takes 5 steps to retroactively fix the problem that would’ve been easier to type myself anyway.

    And what about privacy?
    I ride the bus to work, not only do I not want to announce my latest app launch, but I definitely don’t want to hear everyone else speaking their commands. Plus, in crowded situations like this, there’s all sorts of input problems as far as who the device listens to.

    For private scenarios (during a phone call for example) voice operations like transcribing to text are fairly easy and reliable, and actually useful, yet even those are laden with errors.

    Lastly, using voice tends to require a similar response from the device (talking spawns conversation, so this is natural) but take any of our lovable characters from TV and the movies, I don’t think anyone really wants to have a true conversation with their car. Plus we’re all going to look silly doing it.

    Voice just isn’t there yet, and may never be. – my two cents.

  4. Philip Haine wrote on March 26th, 2010 at 12:32 pm :

    Ryan, thank you for your comments; I appreciate your perspective.

    I think you are projecting the limitations of current incarnations of speech into the indefinite future and missing the potential.

    Does speech work?
    - I use MacSpeech Dictate for all my serious writing, and it works wonderfully. It’s really come of age in the last couple of years.
    - I use the Google app on my iPhone to do search. It surprises me how well it works, given that I didn’t have to train it. Sure beats typing, and even when it’s wrong, I have less work to do to correct it than to type it in.
    - I also use the Reqall app to email myself ideas all the time when the inspiration strikes. Without it I would simply not capture the idea and would forget many of them.
    - Finally, messages left by people to my Google Voice phone number are transcribed pretty well, considering the low quality of the mic and that callers aren’t even attempting to enunciate for a machine.

    So voice works quite well today.

    And pulling off what I describe is an easier job than what Dictate/Dragon and Google do. The job of recognizing speech is even easier the smaller the vocabulary it has to recognize. With what I am talking about there are probably fewer than 100 patterns that the device would have to recognize.

    As for privacy, and annoying others and looking foolish, that’s up to the user. I personally loathe hearing the half conversation of people speaking to thin air into their Bluetooth earbuds like crazy people, but they do it. Using the device discreetly is always an option.

    Talking to a device does not necessitate it talking back. As you point out, that has obnoxious downsides. There is nothing wrong with talking to a device and having it “talk back” visually.

    Errors will always occur; the question is what is the overall effectiveness of using the device? I can tell you that on an iPhone it’s much more difficult to create a “new appointment with Carol next Wednesday at 2:30 for 90 minutes” than it would be if those words were spoken.

    Personally I think this future is an inevitability and more of a revolution than the multitouch and gestures that are getting the hoopla today. And I think my claim, which may sound contentious now, will seem obvious in retrospect. It just takes one really good incarnation to prove the idea.

    When you can do your most common tasks faster, and without having to make sure you are in the right mode, and without having to devote your gaze to the machine, I think you will agree.

    Philip
