Sign language is an essential communication tool for people who are hearing impaired, allowing them to communicate with each other. For those who cannot use sign language, handwriting is one way to communicate with hearing-impaired people, who may otherwise read lips to comprehend a conversation. Lip-reading, however, is not as easy a skill as sign language: in sign language one gesture clearly represents one meaning, whereas in lip-reading similar lip shapes may be interpreted as different characters, which makes lip-reading an interesting research problem. Today most people carry mobile phones. A phone that can film a talking person, read the person's lips, and display the content of the conversation would be a very convenient tool both for people who are hearing impaired and for people who do not know sign language. Although there has been much research on lip-reading for Chinese and English, accuracy on non-specific vocabulary is still not high enough, especially for Chinese lip-reading.

Using real-life vocabulary, this study employs coordinate differences of lip feature points and different combinations of neural networks to build a practical lip-reading application for mobile phones. Vocabularies drawn from sentences frequently used in daily life are collected with an ordinary mobile phone camera: the faces of people reading the sentences are filmed as 30-frames-per-second videos, which are split into clips according to the vocabularies in the sentences. To reflect the varying lighting conditions of real scenes, brightness adjustments are applied to the clips, which also enlarges the training set. For every frame of a clip, the facial feature points are detected and the coordinates of those in the lip area are recorded; for each vocabulary, the coordinate differences of the lip feature points between consecutive frames of a clip are computed to form sequence vectors.

The training set comprises all such sequence vectors, the vocabularies, and the original clips. The training model combines CNN and ResNet for extracting lip features with LSTM and GRU for extracting the temporal features of clips: ResNet and GRU are applied to the original clips, while LSTM is applied to the sequences of coordinate differences. The final stage of the model is a fully connected layer. With different combinations of these models, the lip-reading system established in this study reaches accuracies of up to 76% on vocabularies and 62% on whole sentences, confirming its feasibility in practical applications.
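The core feature described above, turning per-frame lip landmark coordinates into a sequence of frame-to-frame coordinate differences, can be sketched as follows. This is a minimal illustration, not the thesis code: it assumes the lip landmarks have already been extracted per frame (for example with dlib's 68-point face predictor, where indices 48 to 67 mark the lips, giving 20 points), and the function name `lip_difference_sequence` is invented for the sketch.

```python
import numpy as np

LIP_POINTS = 20  # e.g. dlib's 68-point model marks the lips with points 48-67

def lip_difference_sequence(frames_landmarks):
    """Convert per-frame lip landmark coordinates into a sequence of
    frame-to-frame coordinate differences, suitable as LSTM input.

    frames_landmarks: array of shape (n_frames, LIP_POINTS, 2) holding the
    (x, y) coordinates of the lip feature points in each video frame.
    Returns an array of shape (n_frames - 1, LIP_POINTS * 2).
    """
    frames_landmarks = np.asarray(frames_landmarks, dtype=np.float32)
    # Difference between each pair of consecutive frames
    diffs = np.diff(frames_landmarks, axis=0)
    # Flatten each frame's (x, y) differences into a single feature vector
    return diffs.reshape(diffs.shape[0], -1)

# Example: a one-second clip at 30 fps yields a 29-step sequence of 40-d vectors
clip = np.random.rand(30, LIP_POINTS, 2)
seq = lip_difference_sequence(clip)
print(seq.shape)  # (29, 40)
```

Using differences rather than raw coordinates makes the feature largely invariant to where the face sits in the frame, which matters for handheld phone footage.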
| Date of Award | 2020 |
|---|---|
| Original language | English |
| Supervisor | Tzone-I Wang (Supervisor) |
Chinese Lipreading System Based on Coordinate Differences of Lip Feature Points
士龍, 游. (Author). 2020
Student thesis: Doctoral Thesis