
Sign – Gesture Recognition

July 23, 2010

One of the difficulties in developing a speed dial application using gestures is that you introduce a huge amount of user input into the equation. That is always difficult because of the extremely wide variety of individuals who will be using the phone, the differences in how they interact with the phone, and also the method by which they input gesture-based information into the phone.

Google has included its Gesture API in Android since Version 1.6. Based on that Gesture API, Google introduced Gesture Search to the Android Market earlier this year. Gesture Search allows the user to search their phone using gestures. Simply draw a letter or number and the phone begins to search for that letter or number in the phone’s contact lists, applications, and data. It’s an extremely intuitive search methodology, and it works well for finding information on your phone.

To create this app, Google likely had to create an entire library of the letters and numbers that people might input. This would include 52 letters (a-z, A-Z) and 10 numbers (0-9). That is only 62 potential inputs; however, it is likely that Google included a large variety of samples for each number and letter in its library to account for the many ways in which people might draw them. Although Google has done a good job of this, there are still times when it recognizes the wrong letter or number. In a search feature, this is not a big deal. You simply swipe your finger backwards across the screen (another pre-defined gesture) to erase the last input and try again. All recognizable gestures are pre-defined.

Although Sign incorporates elements of Google’s Gesture API, the functionality is somewhat different. In Sign, the user is responsible for creating the gestures that will later be replicated in order to make a call or send a text. Allowing the user to define the gestures to be identified has benefits and problems. On the benefit side, it allows a level of customization that makes the entire process more interactive for the user. Instead of your sweetheart being a “1” or “A,” Sign allows the user to make a custom “Sign,” for instance a heart, as shown in our YouTube video.

While this is cool and allows the user to customize the experience, it adds some uncertainty to the process. Unlike Google, we can’t use a library of pre-defined gestures stocked with many variations of each one, because a Sign doesn’t exist until the user creates it. So, Sign is left to compare any future user input against that single point of reference. Beyond that, we found that Google’s Gesture Search, while mostly accurate, still pulled up incorrect letters. That is something we can’t afford to happen on a regular, or even irregular, basis, because Sign is a communication tool and any mis-identification can lead to inadvertent calls. Avoiding mis-dials is one of our primary concerns and is what actually led to Sign being developed in the first place.

Direct dial icons, while extremely fast, are very prone to mis-dials because the icon can be hit inadvertently when the phone isn’t locked. To avoid that, Simply Applied created Sign, which requires a quick tap to activate the recognition engine so that the user can input their custom Sign for whichever contact they wish to call or text. However, gestures are not going to be repeated exactly as they were originally input. It’s just a fact that the user is going to make the Sign a little bit differently each time. This creates a challenge because, again, we don’t have a library of variations to compare the Sign to. We only have the one gesture previously created by the user. So, we developed our own recognition system designed to reduce the chance that a gesture is recognized incorrectly.

Sign’s Improved Gesture Recognition Engine
As discussed above, anytime you allow significant user input, you introduce a large amount of uncertainty into the process. To combat this, we developed a system that uses a variety of factors to ensure that the gesture that is input by the user to initiate a call or text is correctly identified against the library of Signs the user has created as speed dial contacts.

The first element that is considered is the number of strokes. Sign has been developed to allow multi-stroke gestures with up to 8 strokes. Theoretically this could be unlimited, but as a speed dial option, it is prohibitive to have more than a few strokes. However, many people want to be able to input a person’s initials as that contact’s “Sign.” Given the variety of ways that people could potentially draw letters, 4 strokes is pretty much the maximum for a single letter such as an E, W or M. With two initials per person, we felt that 8 strokes was sufficient for almost every user. So one of the important factors in the recognition is the number of strokes the user inputs. The following illustration shows how this can be used to help improve the recognition.

Number of Strokes
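As a rough illustration of how a stroke count can narrow the search before any shape comparison happens, here is a minimal Python sketch. It is not Sign's actual code, which this post does not show. The names `Gesture`, `MAX_STROKES`, and `candidates_by_stroke_count` are made up for the example, and a gesture is assumed to be a list of strokes, each stroke a list of (x, y) points.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]      # one continuous finger movement
Gesture = List[Stroke]    # a Sign is made of one or more strokes

MAX_STROKES = 8  # the limit described in this post


def candidates_by_stroke_count(drawn: Gesture, library: Dict[str, Gesture]) -> List[str]:
    """Return the contacts whose stored Sign has as many strokes as the input."""
    if not 1 <= len(drawn) <= MAX_STROKES:
        return []  # nothing drawn, or more strokes than Sign allows
    return [name for name, sign in library.items() if len(sign) == len(drawn)]


# Hypothetical library: a one-stroke heart and a two-stroke "T".
library = {
    "Sweetheart": [[(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]],
    "Tom": [[(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (1.0, 3.0)]],
}
two_stroke_input = [[(0.1, 0.0), (2.1, 0.1)], [(1.0, 0.1), (1.1, 2.9)]]
remaining = candidates_by_stroke_count(two_stroke_input, library)
```

With this input, only the two-stroke Sign stays in the running, so the later and more expensive comparisons have fewer Signs to consider.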

Secondly, Sign evaluates the order in which multi-stroke gestures are made. For instance, if you draw a T, using 2 strokes, Sign would evaluate the order in which you make the strokes, e.g., whether you make the vertical line first or the horizontal line first would be evaluated and considered in the recognition. The following illustrates how this can potentially impact the recognition and make it more precise.

Order of Strokes
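The T example can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each stroke is a list of (x, y) points, and a stroke is labeled by whichever axis it travels further along. The helper names `orientation` and `stroke_order` are hypothetical and not part of Sign or the Gesture API.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def orientation(stroke: Stroke) -> str:
    """Label a stroke 'horizontal' or 'vertical' from its first and last points."""
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    return "horizontal" if abs(x1 - x0) >= abs(y1 - y0) else "vertical"


def stroke_order(gesture: List[Stroke]) -> List[str]:
    """Return the orientation of each stroke in the order it was drawn."""
    return [orientation(stroke) for stroke in gesture]


bar = [(0.0, 0.0), (10.0, 0.0)]    # the top of the T
stem = [(5.0, 0.0), (5.0, 12.0)]   # the upright of the T

t_bar_first = [bar, stem]
t_stem_first = [stem, bar]

# The two drawings contain the same strokes but in a different sequence,
# so an order-sensitive comparison treats them as different Signs.
same_order = stroke_order(t_bar_first) == stroke_order(t_stem_first)
```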

Third, Sign evaluates the direction in which the strokes are made. For instance, a right to left swipe is different than a left to right swipe. Top to bottom is different than bottom to top. This is illustrated below.

Direction of Strokes
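One simple way to tell these directions apart is to look at where a stroke starts and where it ends. The Python sketch below assumes screen coordinates in which y grows downward, and the function name `direction` is invented for the example. Sign's real engine may well use something more detailed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def direction(stroke: List[Point]) -> str:
    """Classify a stroke by its overall travel from first point to last point.

    Assumes screen coordinates: x grows to the right, y grows downward.
    """
    (x0, y0), (x1, y1) = stroke[0], stroke[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return "left-to-right" if dx >= 0 else "right-to-left"
    return "top-to-bottom" if dy >= 0 else "bottom-to-top"


swipe_right = [(0.0, 5.0), (4.0, 5.2), (9.0, 5.1)]
swipe_left = list(reversed(swipe_right))  # same path, opposite direction
```

Here `swipe_right` and `swipe_left` trace the same line on the screen, yet they classify differently, which is what lets the same shape serve as two separate Signs.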

Each of these elements is evaluated within each stroke and within each entire Sign. By evaluating each of these elements independently, while also evaluating the gesture as a whole, Sign is much more accurate and significantly reduces the potential that a Sign is mis-recognized. Evaluating this many factors also makes Sign more flexible: the user can make a relatively simple gesture in a large variety of ways, each of which will be evaluated as a completely separate gesture. This is indicated below.

V - 10 Ways
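The illustration is not reproduced here, so the following is only one plausible way to arrive at ten, not a description of the figure. It assumes a V is drawn either as one continuous stroke (in either direction) or as two separate arms, where each arm can be drawn downward or upward and either arm can come first. The names in the sketch are made up for the example.

```python
from itertools import permutations, product

ARMS = ("left arm", "right arm")   # the two halves of a V
ARM_DIRECTIONS = ("down", "up")    # each arm drawn toward or away from the point


def ways_to_draw_v():
    """Enumerate distinct (stroke, direction) sequences that produce a V."""
    ways = []
    # One stroke: start at the top left or at the top right.
    ways.append((("whole V", "left-to-right"),))
    ways.append((("whole V", "right-to-left"),))
    # Two strokes: either arm first, each arm in either direction.
    for order in permutations(ARMS):
        for dirs in product(ARM_DIRECTIONS, repeat=2):
            ways.append(tuple(zip(order, dirs)))
    return ways


v_ways = ways_to_draw_v()
one_stroke = [w for w in v_ways if len(w) == 1]
two_stroke = [w for w in v_ways if len(w) == 2]
```

Under these assumptions there are 2 one-stroke variants and 2 orders × 4 direction pairs = 8 two-stroke variants, for 10 in total.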

However, evaluating this many factors also increases the chance that Sign will return a “No Match” when the user draws the Sign in a different manner than when it was assigned, or when a stroke falls outside the threshold for recognition. We have attempted to optimize this as much as possible so that the majority of users are successful a very high percentage of the times they use Sign. Obviously, with any gesture-based system, there is a significant margin for error. However, we have attempted to find a good balance with the recognition: if the user’s input differs somewhat from the Sign assigned to a specific contact, Sign is still able to recognize the proper contact, and if it cannot find a match, it is more likely to say “No Matches” than it is to dial the wrong contact. It is much better for Sign to require the user to re-input the Sign than it is to dial the wrong person.
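The trade-off between tolerance and “No Match” can be pictured with a simple distance threshold. The Python sketch below is a stand-in technique, not Sign's engine. It assumes every gesture has already been reduced to the same number of normalized points, scores a candidate by the average distance between corresponding points, and refuses to answer when even the best score is too large. `NO_MATCH_THRESHOLD` and the function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

NO_MATCH_THRESHOLD = 0.25  # hypothetical value, in normalized screen units


def mean_distance(a: List[Point], b: List[Point]) -> float:
    """Average distance between corresponding points of two point lists."""
    if len(a) != len(b):
        raise ValueError("gestures must be resampled to the same length")
    return sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b)) / len(a)


def recognize(drawn: List[Point], library: Dict[str, List[Point]],
              threshold: float = NO_MATCH_THRESHOLD) -> Optional[str]:
    """Return the closest contact, or None ("No Match") if nothing is close enough."""
    best_name, best_score = None, math.inf
    for name, stored in library.items():
        score = mean_distance(drawn, stored)
        if score < best_score:
            best_name, best_score = name, score
    return best_name if best_score <= threshold else None


library = {"Mom": [(0.0, 0.0), (1.0, 1.0)], "Bob": [(0.0, 1.0), (1.0, 0.0)]}
close_to_mom = recognize([(0.05, 0.0), (1.0, 0.95)], library)
ambiguous_blob = recognize([(0.5, 0.5), (0.5, 0.5)], library)
```

A lower threshold produces more “No Match” results but fewer wrong contacts, and a higher threshold does the opposite. The balance described above amounts to choosing where that line sits.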

As an added safeguard against potential mis-dials, we implemented two additional features. First, the user can select a short delay prior to Sign opening the phone’s dialer. This gives the user a chance to view the person being called and the option to either continue with the call or cancel it. The user is not required to make any selection, as Sign will proceed with initiating the call as soon as the user-defined delay ends.
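The delay behaves like a cancellable timer: the call goes ahead on its own unless the user stops it first. A minimal Python sketch of that idea follows, using the standard library's `threading.Timer`. The `schedule_call` function and the `dial` callback are placeholders for illustration and say nothing about how Sign itself is written.

```python
import threading
from typing import Callable


def schedule_call(contact: str, delay_seconds: float,
                  dial: Callable[[str], None]) -> threading.Timer:
    """Start a countdown that dials `contact` unless it is cancelled first."""
    timer = threading.Timer(delay_seconds, dial, args=(contact,))
    timer.start()
    return timer


dialed = []  # stands in for the phone's dialer in this sketch

# The user does nothing: the call proceeds when the delay ends.
first = schedule_call("Mom", 0.05, dialed.append)
first.join()

# The user notices the wrong contact and cancels before the delay ends.
second = schedule_call("Bob", 5.0, dialed.append)
second.cancel()
second.join()
```

After both timers finish, only the first contact has been passed to the dialer.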

The second safeguard against potential mis-dials is that if Sign identifies more than one potential match to the user-inputted Sign, it will ask the user to clarify which contact they meant to reach. This does require specific user input to continue with the call; however, it is preferable to Sign making an incorrect selection and dialing the wrong individual.
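Put together, the outcome of a recognition attempt falls into one of three cases: no contact is close enough, exactly one is, or several are. The Python sketch below shows that decision under the assumption that each contact already has a distance score where lower means a closer match. The `decide` function, the action labels, and the threshold value are invented for the example.

```python
from typing import Dict, List, Tuple


def decide(scores: Dict[str, float], threshold: float) -> Tuple[str, List[str]]:
    """Pick an action from per-contact distance scores (lower is closer)."""
    matches = sorted(
        (name for name, score in scores.items() if score <= threshold),
        key=lambda name: scores[name],
    )
    if not matches:
        return ("no_match", [])
    if len(matches) == 1:
        return ("dial", matches)
    return ("ask_user", matches)  # closest candidate listed first


clear_case = decide({"Mom": 0.10, "Bob": 0.90}, threshold=0.25)
close_call = decide({"Mom": 0.20, "Bob": 0.10}, threshold=0.25)
nothing_close = decide({"Mom": 0.90}, threshold=0.25)
```

Only the clear case leads straight to a call. The close call hands both names back so the user can choose, which costs one extra tap but avoids guessing.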

Simply Applied’s entire gesture recognition system uses a variety of factors to evaluate each stroke and each Sign. The purpose is to create a consistent, reliable system for gesture recognition so that the user can use Sign with confidence that it will properly identify the correct contact and take the appropriate action. We are interested in hearing from the users whether they are having success with the system correctly identifying the contacts they intend to call or text.

