Existing video conferencing applications such as Zoom lack accessibility features for sign language users, who are currently restricted largely to typing in order to communicate. Our project aims to develop models that detect sign language in real time and translate it into speech or captions quickly and accurately, allowing sign language users to communicate by signing live on video. These models were built using a combination of Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and the open-source MediaPipe Holistic framework, and were assessed on speed and accuracy using a dataset of 700 videos. We justify our final selection of an LSTM model that utilises the MediaPipe Holistic framework and discuss several avenues for further improving its prediction accuracy.
- Module: CS3244 Machine Learning
- Report: Real-Time Sign Language Detection and Translation
- Models: Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM)
- Libraries: MediaPipe Holistic, TensorFlow, Scikit-Learn
- Language: Python
- GitHub: CS3244 Sign Language Detection
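
For context, the sketch below shows the general shape of such a pipeline: flattening MediaPipe Holistic keypoints for each frame and feeding fixed-length keypoint sequences to a stacked LSTM classifier. The sequence length, feature count, number of classes, and layer sizes are illustrative assumptions, not the exact values used in the project.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

SEQUENCE_LENGTH = 30   # frames per clip (assumed)
NUM_FEATURES = 1662    # pose (33*4) + face (468*3) + two hands (21*3 each)
NUM_CLASSES = 10       # number of signs to classify (assumed)

def extract_keypoints(results):
    """Flatten one frame's MediaPipe Holistic output (from
    mp.solutions.holistic.Holistic(...).process(frame)) into a single
    feature vector, zero-filling any landmark set that was not detected."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    lh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z]
                    for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, lh, rh])

# Stacked-LSTM classifier over keypoint sequences (layer sizes are illustrative).
model = Sequential([
    LSTM(64, return_sequences=True, activation='relu',
         input_shape=(SEQUENCE_LENGTH, NUM_FEATURES)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
```

At inference time, keypoint vectors from the most recent frames would be stacked into a `(SEQUENCE_LENGTH, NUM_FEATURES)` window and passed to `model.predict`, with the argmax of the softmax output giving the predicted sign.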