Posted on Sep 23

Graph-Based Movie Recommendations: A Practical Guide with Python

#algorithms #python #programming #computerscience

Introduction

I built a small Python project where I use a well-known algorithm called BFS (Breadth-First Search), which I employ to recommend movies based on the closest match by genre. The BFS algorithm, for example, searches for movies related within N degrees of separation, for example:

movie_graph = { "The Matrix" ["Inception", "Blade Runner", "Dark City"], "Inception": ["The Matrix", "Interstellar", "Memento"], # ... More connections }

As you can see or try to understand, The Matrix is very similar to Blade Runner or Dark City (to some extent), and another film in roughly the same neighborhood would be: Inception is similar to or has connections to The Matrix, Interstellar, and, finally, Memento.

For the structure, I went with something basic that’s always used for these small projects, so you can understand the prototypes whether it’s for a big or small idea.

Here we find the README file, the data folder or database in a JSON file, the entry point main.py, and the src directory where we put all the files for the program’s logic.

movie_recommender/ ├── README.md ├── data │ └── movies_sample.json ├── main.py └── src └── src └── movie_data.py └── search.py

Of course, we need to make our interface as easy and obvious as possible in this case, so the CLI would be the way to go—to keep things simple and not difficult to understand, which is also fun (well, for me anyway :) ).

So we start with a welcome message, just like we would in any game. After that, we ask what to do—that is, we give options. This is usually done in almost any language with a while loop, which is the easiest way to discard options as they’re chosen, and the game or program continues with a series of questions until we’ve answered everything and the program ends.

This is what I mean:

Welcome to Movie Recommender! Enter a movie title (or type to search): inc Suggestions: ['Inception', 'The Incredibles', 'Incendies'] Selected: Inception Inception What would you like to do? 1. Find similar movies Find similar movies 2. See top-rated titles in the same genre 3. Explore by actor/director Explore by actor/director Choice: 1 Movies similar to Inception: - Interstellar (same director, sci-fi) - The Matrix (Mind-bending, action) - Memento (Christopher Nolan, thriller)

But anyway, just to refresh your memory, the BFS algorithm searches all nodes at one level, and when it’s done with that level, it moves on to the next level until it’s finished. That’s basically it: it visits each node and keeps track of which ones it’s visited until it’s done. Consequently, it uses a Queue (FIFO) data structure → BFS = Goes level by level, meaning the first one in is the first one out, just like a line. Well, this topic is quite extensive, and I’d have to write another article since it takes time to understand how this type of data structure works.

An example

Tree Structure:

 A / \ C B / \ / \ G F E D

The BFS = [A] Root

First level = [B, C]

Second level = [D, E, F, G]

BFS Order: A → B → C → D → E → F → G

What this highly popular algorithm does is:

Depth 1: Direct connections (very highly similar movies)
Depth 2: Extended recommendation (movies similar to other similar movies)

As you can imagine, this algorithm is basically everyplace on the internet, from movie recommenders to the products we buy on any e-commerce site—like related items—it’s really quite popular.

Okay, all this talk is a shame, my friend. Let’s get down to business—briefly:

First, we create movies_sample.json, which is our small internal or lite database:

"movies": [ "id": 1, "title": "Inception", "year": 2010, "genres": ["Sci-Fi", "Thriller"], "rating": 8.8 }, { "id": 2, "English": "id": 2, "title": "28 Days Later" "year": 2002, "genres": ["Sci-Fi", "Drama", "Thriller"], "rating": 7.3 }, and 8 more ...

After that, we define the MovieDatabase class:

import json class MovieDatabase: def __init__(self): self.movies = {} # id -> movie info  self.connections = {} # id -> list of connected ids  def load_data(self, filepath): """Load movie data from JSON file""" with open(filepath, 'r') as f: data = json.load(f) # Extract movies and store them  for movie in data['movies']: self.movies[movie['id']] = movie # Extract connections  for connection in data['connections']: self.connections[connection['movie_id']] = connection['similar_to'] def get_movie(self, movie_id): """Return movie info by ID""" return self.movies.get(movie_id, None) def get_all_movies(self): """Return list of all movies""" return list(self.movies.values())

First import json module to extract data, then we initialize the constructor with its variables instantiated. Now that we have the above, we can define methods so that our main class, MovieRecommenderCLI, can interact with them. The first method is for loading data from the JSON file, from which we extract the data so that we can manipulate it once it’s been extracted. The other two methods are the getters get_movie and get_all_movies; as you can see, they are for reading data.

import json class MovieSearcher: def __init__(self, database): """ Initialize with a MovieDatabase instance Parameters: database: MovieDatabase object that contains our movie data """ self.db = database

In the next step, we will proceed to create the MovieSearcher class. This is where we find several algorithms, such as the search_by_title method:

def search_by_title(self, query): """ Search movies by partial title match (case-insensitive) Parameters: query: String to search for in movie titles Returns: List of matching movie dictionaries """ results = [] query_lower = query.lower() # Convert to lowercase for case-insensitive search  # Iterate through all movies in database  for movie_id, movie in self.db.movies.items(): # Check if query appears anywhere in the title  if query_lower in movie['title'].lower(): results.append(movie) return results

In which we convert the query to lowercase, iterate through each movie in the database, check if the query appears in the movie’s title (also converted to lowercase), and add the movies to the results list.

def search_by_genre(self, genre): """ Find all movies that have a specific genre Parameters: genre: String genre to search for Returns: List of movies that contain this genre """ results = [] genre_lower = genre.lower() for movie_id, movie in self.db.movies.items(): # Convert all genres to lowercase for comparison  movie_genres_lower = [g.lower() for g in movie['genres']] # Check if the genre appears in any of the movie's genres  for movie_genre in movie_genres_lower: if genre_lower in movie_genre: # Partial match  results.append(movie) break # Don't add the same movie twice  return results

After this, we continue with the next method, search_by_genre, which does the following: Convert the string parameter genre to lowercase, iterate through each movie, and check all its genres. We use a partial comparison, such as ("Horror" matches "Slasher Horror"), and finally break after finding the first match to prevent duplicates.

def find_recommendations(self, movie_id, depth=1): """ Use BFS to find related movies up to a certain depth Parameters: movie_id: ID of the movie to find recommendations for depth: How many "hops" away to search (1 = direct connections only) Returns: List of recommended movies with their connection distance """ if movie_id not in self.db.movies: return [] visited = set() # Track visited movies  queue = [(movie_id, 0)] # (movie_id, current_depth)  recommendations = [] while queue: current_id, current_depth = queue.pop(0) # FIFO for BFS  # Skip if we've seen this movie or exceeded depth  if current_id in visited or current_depth > depth: continue visited.add(current_id) # Don't include the original movie in recommendations  if current_id != movie_id: movie = self.db.movies.get(current_id) if movie: recommendations.append({ 'movie': movie, 'distance': current_depth }) # Add connected movies to queue (if they exist)  if current_id in self.db.connections: for connected_id in self.db.connections[current_id]: if connected_id not in visited: queue.append((connected_id, current_depth + 1)) return recommendations

The last method is where we apply the BFS algorithm find_recommendations method, and it works like this:
We start with the movie source at depth 0, then use a queue (FIFO) to process the movies level by level. As explained above, we mark the movies that have been visited (i.e., the nodes) to prevent cycles. From here, we add all unvisited connections to the queue with depth + 1, or another level. And finally, we stop when we have exceeded the specified depth.

We now have practically the entire program, but the most important element of the project is still the MovieRecommenderCLI class. This is the brain, where actions or options are thought out, because from there we extract user input as the Command Line Interface of CLI, which all developers are so proud of for making these types of programs. And tell me, guys? And if you're just starting out, you're going to feel super special for making these CLIs.
Returning to the topic at hand, here above we see the code and walk through it very briefly as follows:
The CLI interface methods:
It shows us the menu with all the options, handles user input with error checking, formats the output appropriately with its structure, allows searching by title, genre, and obtaining recommendations.

def search_by_title_interface(self): """Handle title search interaction""" query = input("\nEnter movie title to search: ").strip() results = self.searcher.search_by_title(query) if results: print(f"\nFound {len(results)} movie(s):") print("-"*40) for movie in results: self.display_movie(movie) else: print(f"\nNo movies found with '{query}' in the title") def search_by_genre_interface(self): """Handle genre search interaction""" # Show avaialable genres  all_genres = set() for movie in self.db.movies.values(): for genre in movie['genres']: all_genres.add(genre) print("\nAvalilable genres") for genre in sorted(all_genres): print(f" . {genre}") query = input("\nEnter genre to search: ").strip() results = self.searcher.search_by_genre(query) if results: print(f"\nFound {len(results)} movies(s) in '{query}' genre") print("-"*40) for movie in results: self.display_movie(movie) else: print(f"\nNo movies found in '{query}' genre")

The search_by_title_interface and search_by_genre_interface methods basically handle the interactions for searching for movies. With these, we collect user input, call the other related methods of the searcher object, and display the formatted results. The genre search has an extra feature: it displays all the genres before prompting the input, making it more user-friendly.

def get_recommendations_interface(self): """Handle recommendation interaction""" # Show all movies with IDs  print("\nSelect a movie to get recommendations") print("-"*40) for movie_id, movie in self.db.movies.items(): print(f" [{movie_id}] {movie['title']} ({movie['year']})") try: movie_id = int(input("\nEnter movie ID: ")) if movie_id not in self.db.movies: print(f"\nMovie with ID {movie_id} not found") return print("\nRecommendation type:") print("1. Similar movies (closest matches)") print("2. Extended recommendations (includes related to similar)") choice = input("Choose (1 o 2) [default=1]: ").strip() depth = 2 if choice == '2' else 1 source_movie = self.db.movies[movie_id] print(f"\nMovies similar to: {source_movie['title']}") print("="*50) recommendations = self.searcher.find_recommendations(movie_id, depth) if recommendations: # Sort by distance  recommendations.sort(key=lambda x: x['distance']) for rec in recommendations: if rec['distance'] == 1: similarity_label = "Highly Similar" else: similarity_label = "Also Recommended" print(f"\n {similarity_label}") self.display_movie(rec['movie'], " ") # Add reason if it's a direct connection  if rec['distance'] == 1 and movie_id in self.db.connections: # Find the connection reason from the original JSON if available  print(f" Why: Similar themes and style") else: print("\nNo recommendations found") except ValueError: print("\nNo recommendations found")

The get_recommendation_interface method is where we also use the BFS algorithm. When we enter depth=2 for its depth to make "extended recommendations," it implements a two-level BFS traversal through your movie's connection graph. This algorithm first finds movies directly connected to the source (distance=1), and then finds movies connected to it (distance=2). The results are sorted by distance and displayed with labels such as "Highly Similar" for direct connections and "Also Recommended" for second-degree connections. The show_all_movies method simply displays all movies sorted by rating in descending order.
A very important piece of information, at the end of this brief explanation of the code, is that in: self.searcher.find_recommendations(movie_id, depth). With depth=2, it explores the graph level by level, first visiting all immediate neighbors (similar movies), then their neighbors (extended recommendations), thus creating a tree recommendation system that expands beyond the selected movie.

Wrappping Up: The Main Application Loop

def show_all_movies(self): """Display all movies in the database""" print(f"\nAll movies in Database ({len(self.db.movies)} total):") print("="*50) # Sort by rating  sorted_movies = sorted(self.db.movies.values(), key=lambda x: x['rating'], reverse=True) for movie in sorted_movies: self.display_movie(movie) def run(self): """Main loop for the CLI""" print("\nWelcome to Movie Recommender!") while True: self.display_menu() choice = input("\nEnter your choice (1-5): ").strip() if choice == '1': self.search_by_title_interface() elif choice == '2': self.search_by_genre_interface() elif choice == '3': self.get_recommendations_interface() elif choice == '4': self.show_all_movies() elif choice == '5': print("\nThanks for using Movie Recommneder! Goodbye!") break else: print("\nInvalid choice. Please try again.") input("\n[Press Enter to continue...]") if __name__ == "__main__": cli = MovieRecommenderCLI() cli.run()

The final pieces bring everything togheter into a working application. The show_all_moviesmethod provides a simple but useful feature, it retrieves all movies from the database, sorts them by ratin in descending order (highest-rated first), and displays them in a formatted list. This gives user a quick overview of the best movies in the system.

The core also of the application it's the run method main event loop, orchestrating the entire user experinece. It displays a welcome message, then enters an infinite while loop that continuouly presents the meny and processes user choices as any CLI. Based on the selected option (1-5), it routes to the appropriate interface method, whether that's searching by title, genre, getting recommendations, showing all movies, or exiting the program. Lastly, it pauses after we did all operations with "Press Enter to continue" to let users review results before returning to the main menu.
Finally, the entry point at the bottom if __name__ == "__main__" is where the magic begins. When the script runs directly, it creates an instance of our MovieRecommenderCLI class and call its run() method, launching the interactive command-line interface. Well amazing job, I deserve a I deserve a pat on the back, well done Ivan :)

Now you can test this final project please check the provided link to clone the project from Github:

Github repository project

Conclusion

As you could see, this was a great little project because it really showed how the BFS algorithm works, more or less. Although I didn’t go into great depth, it’s not difficult, but it does require attention. We were able to create several classes and their methods for searching and recommending using this very useful algorithm, by the way. Well, as homework :) see how this type of algorithm is used all over the internet—it’s really impressive. Just think that this is only one of hundreds, if not thousands, of algorithms out there.

Thank you for reading this article. I hope you enjoyed it, great community. I want to keep writing more regularly, and I hope I can, so I can share my thoughts on this great topic of computer science.

About the Author

Ivan Duarte is a full-stack developer, Node.js and Python developer, content writer, entrepreneur, and founder of ByteUp LLC, a software development company.

DEV Community

Graph-Based Movie Recommendations: A Practical Guide with Python

Introduction

Wrappping Up: The Main Application Loop

Conclusion

About the Author

Top comments (0)