
Starner's work with American Sign Language and Hidden Markov Models

A very recent approach (February 1995) was suggested by Starner in his Master's thesis [Sta95] and, together with Pentland, in a technical report [SP95].

First, they extended the HMM to handle not just simple output symbols but distributions over multiple variables. This allowed them to use features extracted directly from the data, instead of having to pre-process the data into a sequence of output symbols.
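To make the idea of an HMM with multivariate output distributions concrete, the following is a minimal sketch of the forward algorithm in which each state emits a feature vector from a Gaussian. It is not taken from [Sta95] or [SP95]: the diagonal covariance, the function names and the parameter layout are all assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")  # log of a zero probability


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))


def forward_log_likelihood(obs, log_start, log_trans, means, variances):
    """Log P(obs | model) for an HMM whose states emit feature vectors.

    obs        -- list of feature vectors, one per frame
    log_start  -- log initial state probabilities
    log_trans  -- log_trans[p][s] is the log probability of moving p -> s
    means, variances -- per-state Gaussian parameters (assumed diagonal)
    """
    n = len(log_start)
    alpha = [
        log_start[s] + log_gaussian(obs[0], means[s], variances[s])
        for s in range(n)
    ]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_gaussian(x, means[s], variances[s])
            for s in range(n)
        ]
    return logsumexp(alpha)
```

In a recogniser built this way, one such model would be trained per sign, and an observed feature sequence would be assigned to the model that gives it the highest likelihood. No symbol quantisation step is needed, because the emission term is evaluated directly on the feature vector.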

In terms of hardware, a colour camera was used, and users wore a yellow glove on their right hand and an orange glove on their left. Five images were taken per second and fed into an SGI Indigo 2. A small selection of features was extracted from these images, such as the x and y positions of each hand, the axis of the bounding ellipse and its eccentricity.
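The text does not say how the ellipse was computed. One common way to obtain features of this kind, shown here purely as an illustrative sketch, is to take the second-order moments of the pixels classified as belonging to one glove. The function name and the input format (a list of (x, y) pixel coordinates) are assumptions.

```python
import math


def ellipse_features(pixels):
    """Position, axis angle and eccentricity of a blob of (x, y) pixels.

    Uses the moments of the pixel coordinates; this is an assumed method,
    not necessarily the one used in the original system.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n

    # Second-order central moments (the 2x2 covariance of the coordinates).
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n

    # Eigenvalues of the covariance are proportional to the squared
    # semi-axes of the best-fitting ellipse.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = half_trace + root
    minor = max(0.0, half_trace - root)  # guard against rounding below zero

    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis angle, radians
    eccentricity = math.sqrt(1 - minor / major) if major > 0 else 0.0
    return {"x": cx, "y": cy, "angle": angle, "eccentricity": eccentricity}
```

Computing these values for each hand in each frame yields a short feature vector per frame, which is the kind of input a continuous-output HMM can consume directly.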

In this case, a raw recognition rate of 91.3 per cent was achieved. By imposing a strict grammar, it was shown that accuracy rates in excess of 99 per cent were possible, with real-time performance. A selection of 40 signs was used, and the simplifying assumption was made that each sign has one grammatical class. The signs were selected to allow a large set of coherent sentences to be constructed. Furthermore, the grammar was strictly ``pronoun, verb, noun, adjective, pronoun'', with pronouns and adjectives possibly empty. It is suggested that the well-known bigram and trigram techniques could be used, whereby the preceding sign(s) are used to estimate the probability of the current sign.
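The two language constraints mentioned above can be sketched as follows. The first function checks a sequence of grammatical classes against the fixed ``pronoun, verb, noun, adjective, pronoun'' pattern with optional pronouns and adjectives; the second estimates bigram probabilities from example sentences. The function names, the start marker `<s>` and the toy corpus in the usage note are hypothetical and do not come from the original work.

```python
import re
from collections import Counter

# Optional pronoun, verb, noun, optional adjective, optional pronoun.
GRAMMAR = re.compile(r"(pronoun )?verb noun( adjective)?( pronoun)?")


def fits_grammar(classes):
    """True if a list of grammatical class names matches the strict grammar."""
    return GRAMMAR.fullmatch(" ".join(classes)) is not None


def bigram_model(sentences):
    """Return prob(prev, cur), the relative-frequency estimate of P(cur | prev)."""
    pair_counts = Counter()
    prev_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + list(sentence)  # "<s>" marks the sentence start
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[(prev, cur)] += 1
            prev_counts[prev] += 1

    def prob(prev, cur):
        if prev_counts[prev] == 0:
            return 0.0  # unseen context; a real system would smooth instead
        return pair_counts[(prev, cur)] / prev_counts[prev]

    return prob
```

For example, with the made-up corpus `[["i", "want", "box"], ["i", "want", "car"], ["you", "want", "box"]]`, the estimate of P(box | want) is 2/3. In a recogniser, such probabilities would be combined with the HMM likelihoods so that unlikely sign sequences are penalised rather than forbidden outright, which is a softer constraint than the strict grammar.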

The system makes no attempt to consider the movements of the fingers, however. This is a limitation in a number of ways, since finger position becomes increasingly important for large-lexicon systems. Furthermore, the system has no way of handling finger-spelling.

Still, the application of Hidden Markov Models was shown to be very promising, and it may be effective to use them even in glove-based sign recognition systems, since HMMs are not tied to the use of video at all; they operate on attributes extracted from the motion. This means that the HMM technique would be particularly well-suited to migrating to large-lexicon Auslan recognition. In fact, with very little modification, the features explored in this thesis can be used as the basis for the features used in the HMM.






waleed@cse.unsw.edu.au