Wither the pen: I think the keyboard is far mightier


A friend (John Welch) and I were talking about pen-centric computing, and we were in fierce agreement that it is not the revolution some people claim. Coincidentally, or because of the renewed interest, I saw an excellent talk on the state of the art in pen computing: what Microsoft and other researchers around the country are doing with the technology, and where they expect it to go.

What amazed me most was not the progress we’ve made, but the lack thereof. It was good stuff, showing how we break down pen computing into areas like:

1) digital ink (pure unrecognized blobs)
2) drawing / illustration
3) recognized text (cursive and printing, trained and untrained)
4) gestures (macros)
5) contextual recognition

But I saw a similar demo by ATG (Apple’s Advanced Technology Group) in the early ’90s. I’ve seen elements of this done with mice or other input devices for a decade or more before that; some demos go back to the ’60s.

1) Digital ink has worked forever. You just record mouse movements and can redraw them. If you want to get fancy, you record pressure as well, and use it to vary darkness or stroke width. Basically this is data entry at its simplest; so simple that it has little value. What do you do with it? You’ve captured information, but in a form the computer can’t act on. You can print it. But if you want to do anything more, you need to recognize it; that means either having the computer translate it, or giving it to a human to do the data entry for you.
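To underline how simple this is, here’s a minimal sketch of ink capture and replay. The names and the pressure-to-width mapping are invented for illustration; this is not any real ink API:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Sample = Tuple[float, float, float]  # (x, y, pressure in 0..1)

@dataclass
class Stroke:
    samples: List[Sample] = field(default_factory=list)

    def add_sample(self, x: float, y: float, pressure: float) -> None:
        # Capture is just appending raw samples; nothing is recognized.
        self.samples.append((x, y, pressure))

    def redraw(self, draw_segment: Callable) -> None:
        # Replay: connect consecutive samples, mapping pressure to width.
        for (x0, y0, p0), (x1, y1, p1) in zip(self.samples, self.samples[1:]):
            width = 1.0 + 4.0 * ((p0 + p1) / 2.0)  # say, 1 to 5 pixels
            draw_segment((x0, y0), (x1, y1), width)

# Usage with a stand-in renderer:
stroke = Stroke()
stroke.add_sample(0.0, 0.0, 0.2)
stroke.add_sample(5.0, 3.0, 0.8)
stroke.redraw(lambda a, b, w: print(f"line {a} -> {b}, width {w:.1f}"))
```

That really is the whole trick: the computer stores pictures of marks, not meaning.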

This isn’t completely useless. Sometimes it is easier to use a tablet (pen) for entry. Basically, if you are standing/walking and need to jot something down, it is very fast/easy to write. Blat. Done. Information captured in the least useful of ways. However, if you have a modicum of training, the ability to sit, and some physical space on the device, then a keyboard is about 10 times more efficient at entering that data (speed, accuracy, etc.). So if you’re the UPS man, or a stockboy, standing around in a keyboard-hostile environment, then pen computing is great. If you’re running down the hall, lugging your computer with you, and someone gives you a nugget of info, then being able to jot it down is great too. If you have a small device where a keyboard would increase the size or be annoyingly hard to use at microscopic scale (say an iPod, Palm Pilot, or phone), and you’re entering very small amounts of data (a name and phone number), then pen computing is kinda cool. Other than that, I’m looking for the big gain, and not finding it.

So what’s the big payoff? You make it a lot more useful if you give me the function on a device like an iPod, where the alternative interface (due to size constraints) is far worse, like click-wheeling through the alphabet to enter something. For quick browsing, or pointing at objects while standing around, it isn’t bad. For data entry, though, digital ink isn’t really very useful to the computer. Digital ink isn’t going to replace a keyboard for email, writing, or most day-to-day input of real data. So you can make a computer 3% more useful to me for those times when I want to quickly scribble a mini-note or sketch and don’t have space to do it right. Other than that, I’m using more traditional means to enter my data.

2) Drawing is fun. There are certainly cases where drawing on a color tablet/display, and seeing the actual effects in real time, is great. It feels like painting, sketching, or other media. The metaphor is easy. But is it really better? Not as much as you might think.

Use a paintbrush or pencil in real life, and the brush often obstructs your detail. You can learn to work around this through experience (or by lifting), but it isn’t a great concept to have the object doing the marking obstruct what you’re marking. (Annoying physics problems.) On the other hand, with a modicum of training, you can use a mouse/tablet on the desk while the effect is displayed on the screen, and there is no brush or arm obstructing what you are doing. This is part of why computer artists can have more control (and undo) than artists in physical media: the limits of physical objects don’t have to apply.

In the West, we write left to right and top to bottom precisely so that our arm/hand doesn’t obstruct what we just wrote, assuming you’re right-handed. Many lefties do that weird curl thing when they write for the same reason. But when you allow some space between the input surface and the display (as with mouse-screen, keyboard-screen, tablet-screen, etc.), it defeats the obstruction problem and allows your eyes and hands to work on the same problem at the same time, without getting in each other’s way. Yes, it takes a little more training because we’re used to the other way, but once over that curve your efficiency goes up.

Ergonomically, this disassociation is better as well. If you’ve ever tried to operate a touch screen, or designed interfaces for one, you know the limitations. Besides getting the screen dirty with use, if the screen is in the right position for viewing, then it sucks for touching: your arm gets fatigued holding itself up to press menus or write on your display. Or you can integrate the display and touch screen and use it like a clipboard. Have you ever used a clipboard a lot? It is great for writing and spares your arm, but sucks in that you’re always looking down at it, which strains your neck. Whether you work with an easel layout and let your arm get tired, or lay it on a desk/drawing surface and let your neck get tired, both are less efficient (for fatigue) than having a low surface for your arms, like a keyboard, mouse, or drawing tablet, and a high surface for your eyes, like a display.

Once you start thinking about good interface design and pen computing (a bit of an oxymoron), you have to start making modifications to the interface. We Westerners think top-left to bottom-right. With a mouse, this works fine: menus at the top, and they drop down with nothing but a small cursor to obstruct us. If you’re designing a touch interface, you want to work the other way: menus on the bottom (easier to reach), and they pop up, so that the finger/arm that activated them isn’t obstructing the choices. By changing the input, the current WIMP interface/metaphor breaks. Or, for once, Microsoft putting the Start menu at the bottom-left actually makes some sense. (Actually, bottom-right would be better, but I wouldn’t want to confuse Microsofties by blowing their minds with well-reasoned HCI / Human-Computer Interface discussions.)

So pen-style computing has a certain reward for some types of work. If you’re lazy (not in a bad way, we’re all lazy), or not used to computers for input, or only working for very short periods of time (small bits of entry), then pen/touch computing has some value. If you’re doing the opposite, it just isn’t as efficient as the metaphors and solutions we’ve already created, which is one of the reasons it hasn’t had much adoption.

3) Recognizing text and shapes. There are multiple forms of writing: handwriting, printing, multiple systems, shorthand, abbreviations, and so on. Human brains are amazing in the way they can recognize and adapt to context and partial information. Computers, not so much. If I write (714) 853-1212, you think “the phone number for getting time in the O.C.” Pen recognition has to figure out: was that 714 with part of a circle around it, and should I complete the circle to help him? That 1 looks like an L, and the 8 looks like a B, and are those 2’s or Z’s? Then it has to put it all together and recognize the pattern as a phone-number sequence, and so on. It is ugly, requires a lot of computing power, and doesn’t work well. Throw in the difference between my writing and my doctor’s, and it is an even harder problem. Computers need context, because there’s just too much to know (too many variables, and they aren’t good at making the leaps).

This has resulted in the dilemmas we have with pen computing. Draw something: is that the number 0, the letter O, or just a circle? If you tell the computer first, by entering it in a number area or giving it a hint, the computer’s like “Ohhhh, I know what he wants”. If you don’t, it scratches its anthropomorphic head, and grinds its microscopic gears to figure it out. And guesses wrong quite often.
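As a toy illustration of how much a hint buys you (the shape names, confusion table, and field types below are invented for the example, not taken from any real recognizer):

```python
# A recognizer returns ambiguous candidates; a context hint picks one.
CANDIDATES = {
    "round_loop":  ["0", "O"],  # zero, letter O, or just a circle?
    "double_loop": ["8", "B"],
    "tall_stroke": ["1", "l"],
    "zigzag":      ["2", "Z"],
}

def disambiguate(shape, field_hint):
    candidates = CANDIDATES[shape]
    for c in candidates:
        if field_hint == "number" and c.isdigit():
            return c  # "Ohhhh, I know what he wants"
        if field_hint == "text" and c.isalpha():
            return c
    return candidates[0]  # no useful hint: guess, and often guess wrong

print(disambiguate("double_loop", "number"))  # -> 8
print(disambiguate("double_loop", "text"))    # -> B
```

Same strokes, different answers; without the field hint the computer is reduced to its top guess.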

We’ve come a long way in pen computing. If we tell a computer what we’re doing, it is getting mediocre at recognizing printing or cursive. If we use a special shorthand, or spend a while training the computer on how we write (as individuals), it leaps from mediocre to acceptable or even good. And if we throw in context, writing certain things in certain places and giving the computer tips on what we’re asking, the accuracy leaps from good to very good. (It can do word analysis, and so on.) That applies whether using pens or not.

The other side is that the pen hasn’t done anything but give me more states (modes) and make things more confusing for the computer. If I type with the keyboard and draw with the mouse, the computer knows when it sees the mouse that it is a drawing. (Discrete devices doing different things are a free hint as to context.) The more I overload a single device (the pen) and use it for entering numbers, letters, drawings, gestures, etc., the harder the problem becomes, and the more errors the computer makes.

So for pen computing, the idea of having a stylus to draw can be nice; it’s easier to manipulate than a mouse. If you’re abstracted (the mouse/stylus and display are disassociated), it makes it easier for you to do things like scale: I draw at twice the size it shows up on the screen, or half, and so on. There’s power in that disassociation, once trained. Drawing on the display means a 1:1 association that is easier in some ways, and far more limiting in others. In the end, there are a lot of tradeoffs in overloading the stylus to do too many things. Eventually computing will be able to handle all this at once, but we aren’t there yet.
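That scaling trick is worth making concrete. Here is a minimal sketch (the function name and numbers are mine, for illustration): any disassociated input allows an arbitrary mapping like this between hand space and screen space, while drawing directly on the display locks you to 1:1.

```python
def tablet_to_screen(x, y, scale=1.0, origin=(0.0, 0.0)):
    # scale < 1: broad, comfortable arm movement becomes fine detail.
    # scale > 1: a small wrist motion sweeps a large area.
    ox, oy = origin
    return (ox + x * scale, oy + y * scale)

print(tablet_to_screen(100, 100, scale=0.5))  # strokes land at half size
print(tablet_to_screen(100, 100, scale=2.0))  # strokes land at double size
```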

4) Gestures and macros are cool. We use them a lot already. Double-clicking versus single-clicking has a different effect; that’s a gesture. Click-and-drag to pull down a menu is a gesture. Pen computing has adopted many more gestures. Part of this is the bias of the interface: you’re probably on a smaller device, so pulling down a menu to reach a “delete” function isn’t as easy as just scribbling over the object to make it go away, and you don’t have a keyboard to give a command like cut, copy, or paste, so you make gestures that mean the same thing. But is this an improvement? It requires more training, and that usually increases mistakes.

Apple did some research into gestures in the ’80s and early ’90s, just using a mouse, and some things popped out at them. You want easy gestures to do frequent things. So selecting an object and throwing it (flicking your wrist and letting go) would launch the object toward the target. You could move a file to a drive from across the desktop with a flick of the wrist. But then again, if your aim was off, you could hit the other drive and issue a copy, or worse, hit the trashcan instead. Hmmm… destructive behaviors shouldn’t be too easy. Mouse movements are sometimes shaky, and gestures are easy to misinterpret, or can mean many things. And the training is very contextual to the culture and individual. (A circle made with your thumb and index finger may mean OK in America, but in some countries it is the symbol for asshole.)
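A toy flick detector shows both the appeal and the problem; the threshold and event format here are invented, not from Apple’s work:

```python
import math

FLICK_SPEED = 500.0  # pixels/second at release; below this it's just a drop

def on_release(samples):
    # samples: [(time_sec, x, y), ...] for the last few motion events
    (t0, x0, y0), (t1, x1, y1) = samples[-2], samples[-1]
    dt = (t1 - t0) or 1e-6
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
    if math.hypot(vx, vy) >= FLICK_SPEED:
        # Launch toward whatever lies along the release vector,
        # wrong drive and trashcan included. Shaky hands need not apply.
        return ("throw", vx, vy)
    return ("drop", 0.0, 0.0)

print(on_release([(0.00, 100, 100), (0.01, 140, 100)]))  # fast: throw
print(on_release([(0.00, 100, 100), (0.01, 101, 100)]))  # slow: drop
```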

Beyond a few dozen gestures, they start becoming hard to remember and too similar to differentiate. So they are good for frequently used shortcuts, but they need to vary by context/application, and are quickly overwhelmed. Apple gave up (mostly). Pen computing needs to rely on them for too much. I see the solution as more of a hybrid that can use a pen for some things, but it is unlikely to be the only interface that most people use. (An augmentation to computer interfaces, not a replacement.)

5) Contextual recognition. Well, I already went over most of this. Context is hard. An ‘e’ in handwriting is almost the same loop you make to mean “delete” in document editing. A circle can be a circle, a number, a letter, or part of something else (like an 8, a b, etc.).

The idea of the Mac was making a computer that was less mode-centric (modal) and more modeless. You could copy things between modes (applications) by drag and drop; the same menus were common across applications (at least the first few were); a window in one app looked a lot like all the others; and so on. It broke down the barriers of each application being its own universe (mode).

Pen computing is almost the opposite. It is so strained by overloading the same gesture to mean many things depending on context that it has to force us back into modes to help the computer decide what we’re trying to do. Pull up an equation editor, and all the gestures are used for that. Pull up a 3D drawing tool, and the same gestures become different behaviors for that, and so on. Normal interfaces do this too, but if the pen is the only form of input, it increases the modality and/or the error rates. In a few more generations (or more) of computers, we might have the computing power to throw at the problem. But so far, this is a very complex problem to crack. We’re getting better, but the error rates are still incredibly high, as are the training costs.
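In sketch form, the modes creep back in as a dispatch table; the apps and bindings here are invented for illustration:

```python
# The same physical gesture maps to different actions per mode, because
# the stroke alone doesn't carry enough information to pick an action.
GESTURE_BINDINGS = {
    "equation_editor": {"loop": "insert_e",     "scratch_out": "delete_term"},
    "3d_drawing":      {"loop": "orbit_camera", "scratch_out": "erase_face"},
}

def dispatch(mode, gesture):
    return GESTURE_BINDINGS[mode].get(gesture, "beep")

print(dispatch("equation_editor", "loop"))  # -> insert_e
print(dispatch("3d_drawing", "loop"))       # -> orbit_camera
```

Every new mode is another universe the user has to keep in their head, which is exactly what the Mac tried to get away from.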

Conclusion

Pen computing types often talk about things like speech helping to augment pen interfaces. I agree. But then again, speech augments other interfaces, like the mouse, as well. Pen interfaces NEED augmentation more than other interfaces do, because of their limitations. So sure, I’d love to have continuous speech recognition and a pen interface… almost as much as I’d like continuous speech recognition and a traditional WIMP interface. The speech helps the pen, but it helps other things too. And speech is too limited to work without something to augment it (like a mouse or stylus), so it isn’t a panacea on its own.

So I do hope pen computing keeps progressing. I’d love to be able to take notes on my iPod, or dictate them to it with speech recognition. I would use those behaviors very infrequently on my laptop or desktop, but they would add some value. So pen computing is what it is: a nice enhancement to the man-machine interface, especially on discrete devices like an iPod, Palm, or UPS/inventory-tracking tablet. But for now (and for quite a while), I don’t see it as a huge revolution that will change the way we interact with computers. Pen computing is more a driver of combining multiple interface elements to make up for its limitations. That’s good, not just for pen computing, but for all man-machine interfaces.

http://graphics.cs.brown.edu/research/pcc/research.html

 
