This project implements real-time sign language recognition using MediaPipe Hands for hand landmark detection and an MLP (Multi-Layer Perceptron) model for character classification. Additionally, we experimented with MobileNetV2 for sign recognition but found that MediaPipe-based MLP performs better in real-time scenarios. The system captures hand gestures from a webcam, extracts landmark features, and predicts sign language letters, dynamically forming words and sentences.
```
Sign Language Recognition System/
└── SIGN_TO_SENTENCE_PROJECT/
    ├── Asl_Sign_Data/                          # Raw ASL dataset
    ├── asl_mediapipe_keypoints_dataset.csv     # Preprocessed dataset for MLP model
    ├── asl_mediapipe_mlp_model.h5              # Trained MLP model
    ├── sign_language_model_MobileNetV2.h5      # Trained MobileNetV2 model
    ├── Combined_Architecture.ipynb             # Hybrid model experiments
    ├── LLM.ipynb                               # Language model integration
    ├── Mediapipe_Training.ipynb                # Training script for MLP model
    ├── MobileNetV2_Training.ipynb              # Training script for MobileNetV2
    ├── concluion.txt                           # Summary of results
    └── requirements.txt                        # Required dependencies
```

- The dataset used for training was obtained from the Kaggle ASL Sign Language Dataset.
- It contains hand gesture images labeled with ASL characters.
- For MobileNetV2, we used raw images.
- For MLP (MediaPipe), we extracted landmark keypoints from each image and stored them in CSV format.
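The exact preprocessing code lives in the training notebooks; the sketch below only illustrates one plausible way the keypoint CSV could be produced, assuming the raw dataset is laid out as `Asl_Sign_Data/<label>/<image>.jpg` and that the CSV ends with a `label` column (both are assumptions, not confirmed by the repository).

```python
import csv
import os

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

DATA_DIR = "Asl_Sign_Data"                          # assumed: one subfolder per class label
OUT_CSV = "asl_mediapipe_keypoints_dataset.csv"

with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                    min_detection_confidence=0.5) as hands, \
        open(OUT_CSV, "w", newline="") as f:
    writer = csv.writer(f)
    # 21 landmarks x (x, y, z) = 63 feature columns, plus the class label
    writer.writerow([f"{axis}{i}" for i in range(21) for axis in ("x", "y", "z")] + ["label"])

    for label in sorted(os.listdir(DATA_DIR)):
        label_dir = os.path.join(DATA_DIR, label)
        if not os.path.isdir(label_dir):
            continue
        for image_name in os.listdir(label_dir):
            image = cv2.imread(os.path.join(label_dir, image_name))
            if image is None:
                continue
            # MediaPipe expects RGB input
            results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if not results.multi_hand_landmarks:
                continue  # skip images where no hand was detected
            landmarks = results.multi_hand_landmarks[0].landmark
            row = [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]
            writer.writerow(row + [label])
```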
- We trained MobileNetV2 on raw images for sign classification.
- However, it struggled with real-time sign recognition, leading us to explore MediaPipe-based MLP.
```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

num_classes = 29  # 26 letters + the SPACE, DELETE and NOTHING signs

# Load pre-trained MobileNetV2 model
base_model = MobileNetV2(input_shape=(224, 224, 3), include_top=False, weights='imagenet')

# Add custom classification layers
model = Sequential([
    base_model,
    Flatten(),
    Dense(256, activation='relu'),
    Dense(num_classes, activation='softmax')
])

# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```

- Extracted hand landmark coordinates using MediaPipe Hands.
- Trained an MLP model on the extracted landmark features.
- This method proved to be faster and more reliable for real-time recognition.
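The actual MLP architecture is defined in `Mediapipe_Training.ipynb`. Purely as an illustration, a small Keras MLP over the 63 landmark features (21 landmarks × x, y, z) could look like the sketch below; the layer sizes, dropout, and `label` column name are assumptions rather than the notebook's exact configuration.

```python
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Load the preprocessed landmark dataset (63 feature columns + an assumed "label" column)
df = pd.read_csv("asl_mediapipe_keypoints_dataset.csv")
X = df.drop(columns=["label"]).values
y = LabelEncoder().fit_transform(df["label"])

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Illustrative MLP: two hidden layers over the flattened landmark vector
mlp_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(len(set(y)), activation="softmax"),
])
mlp_model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
mlp_model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50, batch_size=32)
mlp_model.save("asl_mediapipe_mlp_model.h5")
```

At inference time, the saved model is loaded and applied to landmarks extracted from live frames, as the project's own snippet below shows.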
```python
import mediapipe as mp
import numpy as np
import tensorflow as tf

# Load the trained MLP model
mlp_model = tf.keras.models.load_model("asl_mediapipe_mlp_model.h5")

# Initialize MediaPipe Hands
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(min_detection_confidence=0.7, min_tracking_confidence=0.7)

# Predict a sign from MediaPipe landmarks
def predict_sign(landmarks):
    input_data = np.array(landmarks).flatten().reshape(1, -1)
    prediction = mlp_model.predict(input_data)
    return np.argmax(prediction)
```

Install the required dependencies:

```bash
pip install -r requirements.txt
```

To test the output of individual models, run the last cell in:
- `Mediapipe_Training.ipynb` for MLP model evaluation.
- `MobileNetV2_Training.ipynb` for MobileNetV2 evaluation.
To see MobileNetV2 and MediaPipe working together, run:
```bash
jupyter notebook Combined_Architecture.ipynb
```

- Normal Signs → Letters are appended to the sentence.
- SPACE Sign → Adds a space.
- DELETE Sign → Removes the last character.
- NOTHING → No input detected.
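The full real-time pipeline is implemented in `Combined_Architecture.ipynb`; the sketch below only illustrates how the pieces above could be wired together with OpenCV to build a sentence from webcam frames. The `LABELS` index-to-character mapping is an assumption about how the class indices are ordered.

```python
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

# Assumed index-to-class mapping: A-Z followed by the three control signs
LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["SPACE", "DELETE", "NOTHING"]

mlp_model = tf.keras.models.load_model("asl_mediapipe_mlp_model.h5")
mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)
sentence = ""

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7,
                    min_tracking_confidence=0.7) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            landmarks = results.multi_hand_landmarks[0].landmark
            features = np.array([[lm.x, lm.y, lm.z] for lm in landmarks]).reshape(1, -1)
            label = LABELS[int(np.argmax(mlp_model.predict(features, verbose=0)))]
            # NOTE: a real loop would debounce so a held sign is not appended every frame
            if label == "SPACE":
                sentence += " "
            elif label == "DELETE":
                sentence = sentence[:-1]
            elif label != "NOTHING":
                sentence += label
        cv2.putText(frame, sentence, (10, 40),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Sign to Sentence", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```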
- MobileNetV2 did not perform well on real-time webcam input, so we moved to the MediaPipe-based MLP.
- Next Phase → Building a FastAPI backend (`SignConnect-Backend`) for better integration and mobile app support.
As the next phase of development, we aim to implement Text-to-Sign Language Actions, allowing users to input text that gets translated into sign language animations. Possible technologies we will explore:
- AI-generated 3D avatars to perform sign language gestures.
- Computer Vision & Reinforcement Learning to map text to sign movements.
- Deep Learning models to generate smooth sign transitions.
We welcome contributions from the community for this phase! If you're interested in helping develop Text-to-Sign Language Generation, feel free to open an issue or submit a pull request on our GitHub repository.
- Uses MediaPipe Hands for landmark detection.
- Model trained using TensorFlow & Scikit-Learn.
- Inspired by existing research on gesture recognition & sign language AI.
This project is licensed under the MIT License - see the LICENSE file for details.