1 Introduction
Recognition of signs has been the focus of several research areas such as human-computer interaction, virtual reality, tele-manipulation and image processing. Another area of application is sign language interpretation [1]. Among the types of gestures, sign language is one of the most structured; usually each gesture is associated with a predefined meaning. However, the application of strong context rules and grammar makes sign language more difficult to recognize [22]. Depending on the sensing technology used to capture gestures, there are two main approaches to sign recognition: one based on vision techniques [16], in which hand movement is tracked and the corresponding sign is interpreted [23, 18], and another based on gloves [15] with sensors that capture the movement and rotation of the hands and fingers [9]. Other methods include Leap Motion [11] or Kinect sensors [17].
Regarding the vision-based approach, in [20] a method to convert Indian Sign Language (ISL) hand gestures into an appropriate text message is presented. The hand gestures are captured through a webcam and the corresponding frames are segmented considering features such as the number of fingers and the angle between them. Trigueiros et al. [25] used a vision-based technique for the recognition of Portuguese Sign Language; in their implementation, hand gestures were captured in real time.
An SVM algorithm is used for classification. In that system, vowels are recognized with 99.4% accuracy and consonants with 99.6% accuracy. In [3], a real-time method for hand gesture recognition is presented. The hand region is extracted from the background, then the palm and fingers are segmented to detect and recognize the fingers. A rule classifier is applied to predict the labels of hand gestures. Computer-vision-based techniques have the potential to provide more natural, contact-free solutions, and are based on the way human beings perceive information about their surroundings [21].
Their main drawback is in the acquisition process, due to environmental constraints such as camera placement, background conditions and lighting sensitivity [14]; in addition, accuracy and processing speed remain challenging.
The Leap Motion controller is a small USB device that, using monochromatic IR cameras and infrared LEDs, observes a roughly hemispherical area up to a distance of about 1 meter. The LEDs generate pattern-less IR light and the cameras capture almost 200 frames per second [26]. P. Karthick et al. [10] used a model that transforms Indian Sign Language into text using a Leap controller. The Leap device detects data such as point, wave, reach and grab gestures. A combination of the DTW and IS algorithms is used to convert hand gestures into text, and a neural network was used for training the data.
In [6] a Leap Motion controller is used for the recognition of Australian Sign Language. The controller senses hand movement and converts it into computer commands; an artificial neural network is used for training the symbols. The disadvantage of that system was its low accuracy and fidelity. With the emergence of RGB-D capture devices (synchronized color images and depth maps), mainly the Microsoft Kinect sensor, the gesture recognition field received a great push forward [12]. In [5] a Microsoft Kinect was used to recognize the American Sign Language (ASL) alphabet, detected through the sensor's depth camera. A distance-adaptive scheme was used for feature extraction, support vector machine and random forest (RF) classifiers were used for classification, and the data was trained using a neural network.
The accuracy of that system was 90%. In [2] a 3D trajectory description of a sign language word is matched against a gallery of trajectories. Another work, presented in [7], used an RGB-D image from the Microsoft Kinect sensor to recognize the letters of the manual alphabet, known as fingerspelling. These works used data from a point cloud and required further processing for hand detection before actually detecting gestures. The Leap Motion skips this step because it already handles hand detection by itself.
Recognition based on sensors such as accelerometers and gyroscopes offers the following advantages: a) because movement sensors are not affected by the surroundings, recognition is more reliable than vision-based recognition in complex environments; b) they are attached to the user, which allows greater coverage; and c) the signs can be acquired wirelessly [13]. Gloves have been successfully used for the recognition of signs in previous works [4, 24]. In [1] a system for the recognition of the 23 letters of the Vietnamese alphabet is presented; this system uses a glove with MEMS accelerometers, whose data are transformed into relative angles between the fingers and the palm. For the recognition of the letters, it uses a classification system based on fuzzy logic.
In [27] a glove based on accelerometers and myoelectric sensors is reported; its elements allow it to automatically detect the initial and final points of two significant segments of the symbols from the intensity of the myoelectric signals. To obtain the final result, it uses decision trees and hidden Markov models. The functionality of the system is shown by the classification of the 72 symbols of Chinese Sign Language. [8] presents a framework for sign language gesture recognition using an accelerometer glove. The evaluation presents the results of gesture recognition on a selected set of sign language gestures with a method based on Hidden Markov Models (HMM) and parallel HMM approaches, achieving a 99.75% recognition accuracy.
In this work, the implementation of a training system for the sign language of the Spanish alphabet for deaf-mute people is presented. It consists of a glove-like device with an accelerometer attached to each finger. The outputs of the sensors go through an acquisition board that sends the data wirelessly to a computer hosting a LabVIEW interface. The collected data are kept in a sign database in which, differently from [9], the information is classified using a statistical method.
Once the signs are discriminated without ambiguity, the system can be used for the training of deaf-mute people, who can make each of the Spanish alphabet letters from another interface in LabVIEW and confirm whether they are doing it right. The rest of this document is organized as follows. In Section 2, a description of the system is presented, with emphasis on the implementation of the glove and the functioning of the sensors. In Section 3, the data classification mechanism is presented. In Section 4, the tests carried out on the system and some of the results obtained are presented. Finally, in Section 5, conclusions and future work are presented.
2 System Description
The system consists of three elements: a glove instrumented with analog accelerometers that can send information wirelessly, and two LabVIEW programs, the first for sample capture and the second for training people in sign making. Both programs have intuitive graphical interfaces that allow any user to interact with the system.
2.1 Glove Construction
The glove design is based on ADXL335 accelerometers, chosen for their low cost and low power consumption. These accelerometers measure finger position along three axes in a serial format (x, y, z). The glove accelerometers provide raw data, which is sent to the acquisition board in vector format and forwarded to the central computer through an XBee device (Figure 1).
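As a minimal, non-authoritative sketch of this data path (the actual acquisition runs in LabVIEW), the following Python snippet parses one raw frame from the XBee link into per-finger (x, y, z) readings. The comma-separated frame layout and the sample values are assumptions for illustration only.

```python
# Hypothetical raw frame as received over the XBee serial link:
# 5 accelerometers x 3 axes = 15 comma-separated ADC values (assumed layout).
raw_frame = "512,498,610, 505,470,633, 520,489,601, 515,500,615, 498,512,620"

def parse_frame(frame: str) -> list[tuple]:
    """Split a raw frame into one (x, y, z) reading per finger."""
    values = [int(v) for v in frame.replace(" ", "").split(",")]
    assert len(values) == 15, "expected 5 accelerometers x 3 axes"
    return [tuple(values[i:i + 3]) for i in range(0, 15, 3)]

for finger, (x, y, z) in enumerate(parse_frame(raw_frame)):
    print(f"finger {finger}: x={x} y={y} z={z}")
```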
2.2 Samples Capture
The computer runs a LabVIEW program used to capture the data corresponding to each letter of the Spanish alphabet and to store them in a database. To this end, a group of 25 deaf-mute people made each letter 50 times. The user interface made for the sample capture is shown in Figure 2.
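The paper does not detail the storage format; as a hedged sketch under the assumption of one labeled 15-value row per sample, the capture step amounts to something like:

```python
import csv

def log_sample(letter: str, frame: str, path: str = "samples.csv") -> None:
    """Append one labeled glove reading (15 raw values) to the sample database."""
    values = [int(v) for v in frame.replace(" ", "").split(",")]
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([letter] + values)

# Hypothetical capture of one sample of the letter 'a':
log_sample("a", "512,498,610,505,470,633,520,489,601,515,500,615,498,512,620")
```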
These data are used offline in a classification process in which each subclass corresponds to a specific letter. For the online operation, the user to be trained accesses another user interface, which notifies him whether he is making each letter adequately. The user executes a letter, the glove data is read, and the information is compared with that obtained by the training system. Once each X, Y, Z accelerometer reading is recognized, the corresponding letter is shown on the screen; this way, the user can corroborate whether he made it adequately and repeat the process with a new sign. If the user wants to, he can proceed to form a word.
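To make the online comparison concrete, here is an illustrative nearest-mean sketch: a new reading is matched to the letter whose stored mean vector is closest. This is a stand-in for illustration, not the paper's actual classifiers (described in Section 3), and the per-letter means shown are hypothetical.

```python
import numpy as np

def nearest_letter(reading, class_means):
    """Return the letter whose mean 15-D vector is closest to the reading."""
    reading = np.asarray(reading, dtype=float)
    return min(class_means,
               key=lambda letter: np.linalg.norm(reading - class_means[letter]))

# Hypothetical per-letter mean vectors computed from the captured samples:
class_means = {
    "a": np.array([512, 498, 610] * 5, dtype=float),
    "b": np.array([400, 560, 580] * 5, dtype=float),
}
print(nearest_letter([510, 500, 608] * 5, class_means))  # -> 'a'
```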
3 Classification
Once the data is captured, we can use it to build a classification model that can later be used to identify a sign and automatically associate it with a given letter.
The X, Y, Z readings obtained from each of the five accelerometers are used as features for the construction of the classification model. In particular, we have experimented with the following three classifiers (a rough sketch of equivalent off-the-shelf implementations follows the list):
(a) J48: a decision-tree classifier. J48 is an implementation of the C4.5 algorithm, one of the most widely used data mining algorithms.
(b) SMO: it stands for "Sequential Minimal Optimization", an algorithm used to solve the quadratic programming problem that arises during the training of Support Vector Machines. It was invented in 1998 by John Platt [19] and is broadly used nowadays.
(c) Multilayer perceptron: an artificial neural network formed by multiple layers, which allows it to solve problems that are not linearly separable, the main limitation of the (simple) perceptron.
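The names J48 and SMO match the implementations popularized by the Weka toolkit; the paper does not state which toolkit was used, so the following scikit-learn setup is only a rough, assumed analogue of the three classifiers:

```python
from sklearn.tree import DecisionTreeClassifier  # C4.5-style tree (~J48)
from sklearn.svm import SVC                      # SVM trained via an SMO-type solver
from sklearn.neural_network import MLPClassifier # multilayer perceptron

classifiers = {
    "J48-like tree": DecisionTreeClassifier(),
    "SMO-like SVM": SVC(kernel="linear"),
    "Multilayer perceptron": MLPClassifier(max_iter=1000),
}
# Each classifier is trained on the 15-dimensional feature vectors
# (5 accelerometers x 3 axes) with their letter labels:
#   clf.fit(X_train, y_train); predictions = clf.predict(X_test)
```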
The results obtained in the experiments are shown in the following section.
4 Tests and Results
In this section, the characteristics of the training corpus, the evaluation methodology and the results obtained are described.
4.1 Data Set
Table 1 shows the number of samples taken for each of the signs considered in the training corpus. The minimum number of samples was 47, for letter 'm', and the maximum was 96, for letter 'f'. The mean number of samples per letter was 55.12. In total, 1,378 samples were taken.
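As a consistency check (assuming the 25 sign classes that these totals imply), the mean follows directly from the corpus size:

$$\bar{n} = \frac{1378}{25} = 55.12$$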
4.2 Evaluation Methodology
The evaluation process uses the training corpus to validate the accuracy of letter identification with the three automatic classification models described above.
The set of samples of each letter is divided into 10 partitions, and ten iterations are executed using 90% of the data for training and the remaining 10% for testing, in a process known as 10-fold cross-validation (each partition is left out exactly once). The results obtained with the three classifiers, as well as the discussion of those results, are presented in the following section.
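The following sketch shows this 10-fold protocol in code, under stated assumptions: the feature matrix here is synthetic stand-in data, whereas the real corpus is the 1,378-sample set of Section 4.1.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 15))    # stand-in for the 15 accelerometer features
y = rng.integers(0, 5, size=200)  # stand-in for the letter labels

# 10 folds: each iteration trains on 90% of the data and tests on 10%.
scores = cross_val_score(MLPClassifier(max_iter=1000), X, y, cv=10)
print(f"mean accuracy over 10 folds: {scores.mean():.3f}")
```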
4.3 Obtained Results
Table 2 shows the results obtained for each of the classifiers. As can be observed, the classifier based on the multilayer perceptron obtains the best results, with an accuracy above 97%. Out of the 1,378 classified samples, only 36 instances are classified incorrectly, resulting in an error of 2.61%.
| Classified instances | J48 (count) | J48 (%) | SMO (count) | SMO (%) | Multilayer perceptron (count) | Multilayer perceptron (%) |
|---|---|---|---|---|---|---|
| Correctly | 1,227 | 89.04% | 1,276 | 92.60% | 1,342 | 97.39% |
| Incorrectly | 151 | 10.96% | 102 | 7.40% | 36 | 2.61% |
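These percentages follow directly from the instance counts; for the multilayer perceptron, for example:

$$\frac{1342}{1378} \approx 97.39\%, \qquad \frac{36}{1378} \approx 2.61\%$$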
In fact, it surpasses the SMO classifier by almost 5 percentage points and the J48 classifier by more than 8 percentage points. These results show that the degree of accuracy is high and sufficient for the classification of sign language letters.
It is necessary, however, to analyze the execution time each algorithm requires to construct the classification model, in order to verify its suitability for real-time systems. Table 3 shows the results.
As can be observed, higher accuracy comes at the cost of longer model construction times. Even so, the nearly 18 seconds required by the classifier based on the multilayer perceptron is not prohibitive for the construction of a classification model. Moreover, the evaluation time of the test instances is on the order of thousandths of a second for any of the three classifiers tested.
5 Conclusions and Future Work
This work presented an accelerometer-based glove that allows training deaf-mute people in making the letters of the Spanish alphabet. The data have been handled statistically, which gives precision to the classification process, makes the system independent of the user, and allows sign detection even when signs are not made exactly. Experiments carried out with three automatic classification methods show that the precision obtained in sign identification is higher than 89%. In particular, the algorithm based on neural networks, the multilayer perceptron, obtained the best result, with an accuracy above 97%.