What if you could speak with your eyes?
It's a thought that feels like it's straight out of science fiction. But when I first saw an AI mapping 478 points across my face in real-time, directly in my browser, that science fiction suddenly felt tangible. The white mesh, clinging to every contour, knew the exact position of my lips, my cheeks, the tip of my nose. It knew when I blinked.
I decided I was going to use it to achieve a lifelong dream: to actually speak with my eyes. The plan was to translate my blinks into Morse code.
It turned out to be way harder and far more interesting than I ever imagined!
This project was also deeply inspired by the incredible story of Jeremiah Denton, a captured US pilot who blinked the word "TORTURE" in Morse code during a propaganda video. I wanted to see if I could build a digital, open-source version of that act of ingenuity.
Here's a breakdown of the most interesting technical challenges and how I solved them.
The Tech Stack is surprisingly simple:
- JavaScript (ES6 Modules): To handle all the logic.
- MediaPipe Face Landmarker: For the real-time facial landmark and blendshape detection. This is the core of the project.
- HTML5 & CSS3: For the UI. No frameworks needed!
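Before diving into the challenges, here's roughly what getting the Face Landmarker running in the browser looks like. This is a minimal setup sketch based on the MediaPipe Tasks documentation, not the project's actual initialization code, and the CDN and model URLs shown are the standard public ones:

```javascript
import { FaceLandmarker, FilesetResolver } from "@mediapipe/tasks-vision";

// Load the WASM assets for the vision tasks
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);

// Create the landmarker with blendshape output enabled
const faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/face_landmarker/face_landmarker/float16/1/face_landmarker.task",
  },
  outputFaceBlendshapes: true, // needed for eyeBlinkLeft / eyeBlinkRight
  runningMode: "VIDEO",
  numFaces: 1,
});
```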
Challenge #1: How to Tell a Dot from a Dash?
The first problem was detecting blinks. MediaPipe provides real-time scores for 52 different facial expressions, called "blendshapes," that track everything from a smile to a raised eyebrow. Luckily for me, two of those blendshapes were exactly what I needed: eyeBlinkLeft and eyeBlinkRight.
The way they work is simple: a score near 0 means the eye is open, and a score closer to 1 means it's closed.
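To give a sense of how those scores are read each frame, here's a rough sketch. The exact result handling in the repo may differ; `faceLandmarker` and `video` (the webcam `<video>` element) are assumed to come from the setup above:

```javascript
// Run detection on the current video frame
const results = faceLandmarker.detectForVideo(video, performance.now());

if (results.faceBlendshapes?.length) {
  // Each category has a categoryName and a 0..1 score
  const categories = results.faceBlendshapes[0].categories;

  const leftEyeBlink =
    categories.find((c) => c.categoryName === "eyeBlinkLeft")?.score ?? 0;
  const rightEyeBlink =
    categories.find((c) => c.categoryName === "eyeBlinkRight")?.score ?? 0;

  console.log(leftEyeBlink, rightEyeBlink); // ~0 when open, ~1 when closed
}
```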
My first thought was, "Easy, I'll just use setTimeout to see how long the eye is closed." This was a terrible idea. It was messy and out of sync with the video's requestAnimationFrame loop.
The solution was much cleaner: count the frames.
Since the AI analyzes every frame, I could just increment a counter for every consecutive frame the eyes were closed.
💡 The Logic:
- If the eye-blink score is above a threshold (say, `0.5`), increment `blinkFrameCounter`.
- If the score drops below the threshold, check the counter's value.
- If `blinkFrameCounter` is between 2 and 14 frames, it's a short blink (dot).
- If it's 15 or more frames, it's a long blink (dash).
Here's the code snippet for that logic:
```javascript
// From js/detection.js

// Check if the average blink score is above our threshold
if ((rightEyeBlink + leftEyeBlink) / 2.0 > CONST.BLINK_THRESHOLD) {
  appState.blinkFrameCounter++;
  return null; // Still blinking, don't do anything yet
}

// If we've stopped blinking, check how long we blinked for
if (appState.blinkFrameCounter >= CONST.BLINK_CONSECUTIVE_FRAMES) {
  const isLong = appState.blinkFrameCounter >= CONST.LONG_BLINK_FRAMES;
  appState.blinkFrameCounter = 0; // Reset for next time
  return isLong ? 'long' : 'short'; // Success!
}

// If we didn't blink long enough, just reset
appState.blinkFrameCounter = 0;
return null;
```
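For reference, the constants that logic relies on might look something like this. The names mirror the snippet above, and the values are simply the thresholds described earlier (a 0.5 score, 2 frames minimum, 15 frames for a dash) rather than the repo's exact numbers:

```javascript
// Plausible values based on the thresholds described above -- not the repo's exact constants
export const BLINK_THRESHOLD = 0.5;        // average blendshape score that counts as "eyes closed"
export const BLINK_CONSECUTIVE_FRAMES = 2; // minimum closed frames to register a blink at all
export const LONG_BLINK_FRAMES = 15;       // closed frames at or above this count as a dash
```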
Challenge #2: Detecting Nods Without the Jitters
For separating letters and words, I wanted to use head nods. My first attempt was to track the `y` position of the nose tip (landmark #4) and compare it frame-by-frame. This failed spectacularly. The system was so sensitive it would trigger from me just breathing!
The key was to realize a nod is a journey, not a snapshot. The solution: a "sliding window" algorithm.
💡 The Logic:
- Keep a running history of the nose's `y` position in an array (`noseYHistory`) for the last 15 frames.
- In every frame, find the `min` and `max` Y-position within that history.
- If `(max - min)` is greater than our movement threshold, we have a clear, intentional nod.
This smoothed out all the noise and only registered real nods.
```javascript
// From js/detection.js
export function processNod(landmarks) {
  const noseY = landmarks[CONST.NOSE_TIP_INDEX].y * DOM.video.videoHeight;
  appState.noseYHistory.push(noseY);

  // Keep the history at a fixed length
  if (appState.noseYHistory.length > CONST.NOD_HISTORY_LENGTH) {
    appState.noseYHistory.shift();
  }

  // Don't detect another nod immediately
  if (appState.nodCooldownCounter > 0) {
    appState.nodCooldownCounter--;
    return false;
  }

  if (appState.noseYHistory.length === CONST.NOD_HISTORY_LENGTH) {
    const minY = Math.min(...appState.noseYHistory);
    const maxY = Math.max(...appState.noseYHistory);

    if ((maxY - minY) > CONST.NOD_MOVEMENT_THRESHOLD) {
      appState.nodCooldownCounter = CONST.NOD_COOLDOWN_FRAMES; // Set cooldown
      appState.noseYHistory = []; // Clear history after detection
      return true; // We have a nod!
    }
  }

  return false;
}
```
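Again for reference, here's roughly what the nod-related constants could look like. The nose-tip index and the 15-frame window come straight from the description above; the movement threshold and cooldown values are hypothetical placeholders, not the repo's actual numbers:

```javascript
// Only NOSE_TIP_INDEX and NOD_HISTORY_LENGTH are stated in the article; the rest are hypothetical
export const NOSE_TIP_INDEX = 4;          // MediaPipe landmark index for the nose tip
export const NOD_HISTORY_LENGTH = 15;     // sliding window size, in frames
export const NOD_MOVEMENT_THRESHOLD = 20; // minimum vertical travel (in pixels) to count as a nod
export const NOD_COOLDOWN_FRAMES = 30;    // frames to wait before detecting the next nod
```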
Challenge #3: The Cheek-to-Nose Ratio for Head Turns
Detecting head turns for backspace and spaces had the same "jitter" problem. The solution came from noticing how my face's proportions change when I turn.
When looking straight, the distance from my nose to my left cheek is about the same as to my right cheek. When I turn left, my right cheek gets "wider" from the nose's perspective, and my left cheek gets "narrower."
💡 The Logic:
- Get the `x` coordinates of the left cheek (#454), right cheek (#234), and nose (#4).
- Calculate the total width: `leftCheek.x - rightCheek.x`.
- Calculate the ratio: `(nose.x - rightCheek.x) / totalCheekWidth`.
- When looking straight, this ratio is `~0.5`. When I turn left, it goes up (`> 0.65`). When I turn right, it goes down (`< 0.35`).
This ratio-based approach worked beautifully, with none of the earlier jitter!
```javascript
// From js/detection.js
export function processTurn(landmarks) {
  const nose = landmarks[CONST.NOSE_TIP_INDEX];
  const leftCheek = landmarks[CONST.LEFT_CHEEK_INDEX];
  const rightCheek = landmarks[CONST.RIGHT_CHEEK_INDEX];

  const totalCheekWidth = leftCheek.x - rightCheek.x;
  const noseToRightCheekDist = nose.x - rightCheek.x;

  if (totalCheekWidth > 0.1) { // Avoid division by zero
    const turnRatio = noseToRightCheekDist / totalCheekWidth;

    if (turnRatio > CONST.TURN_RIGHT_RATIO_THRESHOLD) {
      return 'left';
    } else if (turnRatio < CONST.TURN_LEFT_RATIO_THRESHOLD) {
      return 'right';
    }
  }

  return null; // Head is centered
}
```
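To tie the pieces together: a `requestAnimationFrame` loop feeds each video frame to the landmarker and routes the landmarks and blendshapes through the detectors above. Here's a simplified sketch of how that per-frame flow might be wired up. It is not the project's actual main loop, and `processBlink`, `addSymbol`, `endCurrentLetter`, and `handleTurn` are hypothetical helpers standing in for the repo's real functions:

```javascript
// Simplified sketch of the per-frame detection loop -- not the repo's actual code
function loop() {
  const results = faceLandmarker.detectForVideo(DOM.video, performance.now());

  if (results.faceLandmarks?.length) {
    const landmarks = results.faceLandmarks[0];
    const blendshapes = results.faceBlendshapes?.[0]?.categories ?? [];

    const blink = processBlink(blendshapes); // 'short', 'long', or null
    const nodded = processNod(landmarks);    // true when a deliberate nod is seen
    const turn = processTurn(landmarks);     // 'left', 'right', or null

    if (blink === 'short') addSymbol('.');
    if (blink === 'long') addSymbol('-');
    if (nodded) endCurrentLetter();          // nods separate letters and words
    if (turn) handleTurn(turn);              // turns handle backspace and spaces
  }

  requestAnimationFrame(loop);
}

requestAnimationFrame(loop);
```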
Final Thoughts
This was one of the most fun projects I've ever built. It's a great example of how accessible and powerful browser-based AI has become. What once required specialized hardware can now be built with a bit of creative problem-solving and open-source tools.
If you have any questions, drop them in the comments below. I'd love to hear your thoughts! What would you build with this tech?
Links:
- Live Demo: Click Here!
- Full Video: Watch on YouTube
- GitHub Repo: Click Here!

Happy coding! ✨