Matt Hamilton

Using a Convolutional Neural Network (CNN) to Detect Smiling Faces

What is a Convolutional Neural Network (CNN)? How can one be used to detect features in images? This is the video of a live coding session in which I show how to build a CNN in Python using Keras and extend the "smile detector" I built last week to use it.

A 1080p version of this video can be found on Cinnamon

A Convolutional Neural Network is a particular type of neural network that is well suited to analysing images. It works by passing a 'kernel' across the input image (convolution) to produce an output. These convolutional layers are stacked to produce a deep learning network that is able to learn quite complex features in images.
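To make "passing a kernel across the image" concrete, here is a minimal pure-Python sketch of a single convolution with no padding and a stride of 1. The `convolve2d` helper, the tiny image and the kernel values are all made up for illustration; in a real CNN the kernel values are learned during training, and Keras does this work for you.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Multiply the kernel with the patch underneath it and sum the result
            total = 0
            for ky in range(kh):
                for kx in range(kw):
                    total += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(total)
        output.append(row)
    return output


# A tiny 4x4 "image" with a vertical edge between columns 1 and 2
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# A hand-picked 2x2 kernel that responds to dark-to-bright steps from left to right
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
print(feature_map)
```

The output is only non-zero where the kernel sits on top of the edge, which is the sense in which a kernel "detects" a feature.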

A typical Convolutional Neural Network

In this session I coded a simple 3-layer CNN and trained it with manually classified images of faces.

Much of the code was based on the previous iteration of this project. After the live coding session, I refactored the code to use Python generators to simplify the processing pipeline.

Frame Generator

This method opens the video file and iterates through the frames yielding each frame.

    import cv2
    import numpy as np  # used by the later methods

    def frame_generator(self, video_fn):
        cap = cv2.VideoCapture(video_fn)
        try:
            while True:
                # Read each frame of the video
                ret, frame = cap.read()
                # End of file, so break loop
                if not ret:
                    break
                yield frame
        finally:
            # Release the capture even if the consumer stops early
            cap.release()

Calculating the Threshold

As in the previous session, we iterate through the frames to calculate the difference between each frame and the previous one. The method then returns the threshold needed to keep just the top 5% most-changed frames:

    def calc_threshold(self, frames, q=0.95):
        prev_frame = next(frames)
        counts = []
        for frame in frames:
            # Calculate the pixel difference between the current
            # frame and the previous one
            diff = cv2.absdiff(frame, prev_frame)
            non_zero_count = np.count_nonzero(diff)

            # Append the count to our list of counts
            counts.append(non_zero_count)
            prev_frame = frame

        return int(np.quantile(counts, q))
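The "top 5%" idea can be checked on made-up numbers. This is a small sketch, not the package's code: the `quantile` helper is a hypothetical pure-Python stand-in that mirrors the linear interpolation `np.quantile` uses by default, and the counts are invented.

```python
import math


def quantile(values, q):
    """Linear-interpolation quantile of `values` for 0 <= q <= 1."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Invented change counts for 100 frame pairs: 1, 2, ..., 100
counts = list(range(1, 101))

# Same shape as calc_threshold: take the 0.95 quantile and truncate to an int
threshold = int(quantile(counts, 0.95))

# Only counts strictly above the threshold survive the later filtering step
kept = [c for c in counts if c > threshold]
print(threshold, len(kept))
```

With 100 evenly spread counts, five of them end up above the threshold, i.e. the top 5%.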

Filtering the Image Stream

This is another generator. It takes an iterator of frames and a threshold, and yields each frame whose difference from the previous frame is above the supplied threshold.

    def filter_frames(self, frames, threshold):
        # Use a default so an empty input ends the generator cleanly
        # instead of raising from inside it
        prev_frame = next(frames, None)
        if prev_frame is None:
            return
        for frame in frames:
            # Calculate the pixel difference between the current
            # frame and the previous one
            diff = cv2.absdiff(frame, prev_frame)
            non_zero_count = np.count_nonzero(diff)
            if non_zero_count > threshold:
                yield frame
            prev_frame = frame
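To see the filter in action without a video file, here is a toy version in which each "frame" is just a short list of pixel values. The `count_changed` and `filter_changed` names are hypothetical stand-ins for the `cv2.absdiff` / `np.count_nonzero` pair and the method above.

```python
def count_changed(frame, prev_frame):
    """Number of pixel values that differ between two equally sized frames."""
    return sum(1 for a, b in zip(frame, prev_frame) if a != b)


def filter_changed(frames, threshold):
    """Yield frames whose change count versus the previous frame exceeds `threshold`."""
    frames = iter(frames)
    prev_frame = next(frames, None)
    if prev_frame is None:
        return
    for frame in frames:
        if count_changed(frame, prev_frame) > threshold:
            yield frame
        prev_frame = frame


frames = [
    [0, 0, 0, 0],  # first frame, only used as a reference
    [0, 0, 0, 1],  # 1 value changed
    [5, 5, 5, 1],  # 3 values changed
    [5, 5, 5, 1],  # nothing changed
    [7, 7, 7, 7],  # 4 values changed
]
busy_frames = list(filter_changed(frames, threshold=2))
print(busy_frames)
```

Only the two frames that differ from their predecessor in more than two values come out the other end.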

Finding the Smiliest Image

By factoring out the methods above we can chain the generators together and pass them in to this method to actually look for the smiliest image. This means that (unlike the previous version) this method doesn't need to concern itself with deciding which frames to analyse.

We use the trained neural network (as a TensorFlow Lite model) to predict whether a face is smiling. Much of the structure is similar to last session: we first scan the image to find faces. We then align each of those faces using a facial aligner, which transforms the face so that the eyes are in the same location in each image. We pass each face into the neural network, which gives us a score from 0 to 1.0 of how likely it is to be smiling. We sum all those values to get an overall 'smiliness' score for the frame.

    def find_smiliest_frame(self, frames, callback=None):
        # Allocate the tensors for TensorFlow Lite
        self.interpreter.allocate_tensors()
        input_details = self.interpreter.get_input_details()
        output_details = self.interpreter.get_output_details()

        def detect(gray, frame):
            # Detect faces within the greyscale version of the frame
            faces = self.detector(gray, 2)
            smile_score = 0

            # For each face we find...
            for rect in faces:
                (x, y, w, h) = rect_to_bb(rect)
                # Crop of the original face (not used further in this method)
                face_orig = imutils.resize(frame[y:y + h, x:x + w], width=256)

                # Align the face
                face_aligned = self.face_aligner.align(frame, gray, rect)

                # Reshape to the (1, 256, 256, 3) batch our neural network
                # expects (the aligned face must already be 256x256)
                face_aligned = face_aligned.reshape(1, 256, 256, 3)

                # Scale pixel values to 0..1
                face_aligned = face_aligned.astype(np.float32) / 255.0

                # Pass the face into the input tensor for the network
                self.interpreter.set_tensor(input_details[0]['index'], face_aligned)

                # Actually run the neural network
                self.interpreter.invoke()

                # Extract the prediction from the output tensor
                pred = self.interpreter.get_tensor(
                    output_details[0]['index'])[0][0]

                # Keep a sum of all 'smiliness' scores
                smile_score += pred

            return smile_score, frame

        best_smile_score = 0
        # The first frame is the fallback result (it is not itself scored)
        best_frame = next(frames)

        for frame in frames:
            # Convert the frame to grayscale
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            # Call the detector function
            smile_score, frame = detect(gray, frame)

            # Check if we have more smiles in this frame
            # than our "best" frame
            if smile_score > best_smile_score:
                best_smile_score = smile_score
                best_frame = frame
                if callback is not None:
                    callback(best_frame, best_smile_score)

        return best_smile_score, best_frame
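The effect of summing the per-face scores, rather than averaging them, is easy to see with a few invented numbers. This sketch is only an illustration: the frame names and scores are made up, and nothing here calls the real model.

```python
# Invented per-face smile scores (0..1) for three hypothetical frames
face_scores = {
    "frame_a": [0.9, 0.8],       # two faces, both smiling broadly
    "frame_b": [0.6, 0.6, 0.6],  # three faces, all smiling a little
    "frame_c": [],               # no faces detected
}


def total_score(scores):
    """Sum of the scores, as in the method above."""
    return sum(scores)


def mean_score(scores):
    """Average score, shown only for comparison."""
    return sum(scores) / len(scores) if scores else 0.0


best_by_sum = max(face_scores, key=lambda name: total_score(face_scores[name]))
best_by_mean = max(face_scores, key=lambda name: mean_score(face_scores[name]))
print(best_by_sum, best_by_mean)
```

Summing favours frames in which more faces are detected and smiling, which seems a reasonable fit for picking a group snapshot; an average would instead favour the frame with the broadest individual smiles.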

We can then chain the functions together:

    smiler = Smiler(landmarks_path, model_path)

    # First pass over the video: work out the threshold
    fg = smiler.frame_generator(args.video_fn)
    threshold = smiler.calc_threshold(fg, args.quantile)

    # Second pass: a fresh generator, filtered by that threshold
    fg = smiler.frame_generator(args.video_fn)
    ffg = smiler.filter_frames(fg, threshold)
    smile_score, image = smiler.find_smiliest_frame(ffg)
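Note that frame_generator is called twice in the chain above. A generator can only be consumed once, so after calc_threshold has read every frame there is nothing left for the next stage. The sketch below shows that behaviour with a hypothetical `numbers` generator standing in for the frame generator.

```python
def numbers():
    """Stand-in for frame_generator: yields a few values and then finishes."""
    for n in [3, 1, 4, 1, 5]:
        yield n


gen = numbers()
first_pass = max(gen)   # consumes every item, much as calc_threshold does
leftover = list(gen)    # the same generator object is now exhausted

gen = numbers()         # so a fresh generator is needed for the second pass
second_pass = [n for n in gen if n >= first_pass]
print(first_pass, leftover, second_pass)
```

The trade-off is that the video is decoded twice, in exchange for never holding all of the frames in memory at once.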

Output

Testing it out, it all works pretty well and finds a nice snapshot of smiling faces from the video.

A frame of smiling people

The full code to this is now wrapped up as a complete Python package:

Choirless / smiler

Extract the most smiling image from a video clip

Smiler

This is a library and CLI tool to extract the "smiliest" frame from a video of people.

It was developed for Choirless as part of IBM Call for Code.

Installation

% pip install choirless_smiler 

Usage

Simple usage:

% smiler video.mp4 snapshot.jpg 

Output image of people singing

It will do a pre-scan to determine the 5% of frames that changed most from their previous frame, and consider only those. If you already know the threshold of change you want to use, you can pass it directly, e.g.

% smiler video.mp4 snapshot.jpg --threshold 480000

The first time smiler runs it will download facial landmark data and store it in ~/.smiler. The location of this data and the cache directory can be specified as arguments.

Help

% smiler -h
usage: smiler [-h] [--verbose] [--threshold THRESHOLD]
              [--landmarks-url LANDMARKS_URL] [--cache-dir CACHE_DIR]
              [--quantile QUANTILE]
              video_fn image_fn

Save thumbnail of smiliest frame in video

positional arguments:
  video_fn    filename for video to

I hope you enjoyed the video. If you want to catch these sessions live, I stream each week at 2pm UK time on the IBM Developer Twitch channel:

https://developer.ibm.com/livestream

Top comments (1)

Lizard

Very nice project :)