
Project: Movie Recommender system with TensorFlow Recommenders and Netflix dataset

A recommendation system helps users find content in a large corpus, with a reported 60% of watch time on YouTube coming from recommendations. With the amount of content produced online every day, users face an overwhelming variety of movies, articles, and other items. They may have a hard time sorting through it all, and a recommendation engine can also suggest items they might not have thought of. In this project, I explore recommendation systems using the Netflix dataset and the TensorFlow Recommenders (TFRS) framework to predict users' preferences and recommend movies they would enjoy based on their past habits and ratings.

The dataset

For this project, we will be using the Netflix Movie Dataset, available here

The dataset consists of two CSV files:

  • Movies: Movie_ID, Year, Name

  • Rating: User_ID, Rating, Movie_ID
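To make the relationship between the two files concrete, here is a minimal sketch using a few made-up rows (the values and the `*_toy` names are illustrative assumptions, not taken from the real dataset). It joins ratings to movie names on Movie_ID, which is the same kind of merge used in the data processing step.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; all rows are invented for illustration.
movies_toy = pd.DataFrame({
    "Movie_ID": [1, 2],
    "Year": [2003, 2004],
    "Name": ["Movie A", "Movie B"],
})
ratings_toy = pd.DataFrame({
    "User_ID": [10, 10, 11],
    "Rating": [4, 5, 3],
    "Movie_ID": [1, 2, 1],
})

# An inner join on the shared key attaches a movie name to every rating row.
merged_toy = ratings_toy.merge(movies_toy, on="Movie_ID")
print(merged_toy[["User_ID", "Rating", "Name"]])
```

Each rating row keeps its user and score and gains the Year and Name columns of the matching movie.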

In this project, we will build a joint model that handles both retrieval and ranking tasks. This may produce better results than task-specific models, especially when data for one task is much more abundant than for the other. In these scenarios, a joint model may use representations learned from the abundant task to improve predictions on the sparse task through transfer learning.

This multi-task recommender will use both implicit signals (movie watches) and explicit signals (ratings). Exploratory data analysis is available in the full Jupyter notebook here


import required libraries

!pip install -q tensorflow-recommenders
import random
from typing import Dict, Text

import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_recommenders as tfrs

Data Processing

In this section, we will read, merge and process our data to feed into the model

# read our data

ratings_df = pd.read_csv('../netflix-project/data/Netflix_Dataset_Rating.csv')
movies_df = pd.read_csv('../netflix-project/data/Netflix_Dataset_Movie.csv')

# create a temporary dataframe and merge with ratings_df to get the movie titles

temp_movies_df = pd.read_csv('../netflix-project/data/Netflix_Dataset_Movie.csv')
ratings_df = ratings_df.merge(temp_movies_df, on='Movie_ID')

# look at the data information to check if we need to make changes before processing into the model

# convert 'User ID' to prepare for user embedding layer in the model

ratings_df['User_ID'] = ratings_df['User_ID'].astype('str')

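# convert the dataframes into tf.data datasets
ratings = tf.data.Dataset.from_tensor_slices(dict(ratings_df[['User_ID', 'Rating', 'Name']]))
movies = tf.data.Dataset.from_tensor_slices(dict(movies_df[['Name']]))

# only keep necessary data columns for further processing
ratings = ratings.map(lambda x: {
    "Name": x["Name"],
    "User_ID": x["User_ID"],
    "Rating": x["Rating"]
})

movies = movies.map(lambda x: x["Name"])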

print("Total Data: {}".format(len(ratings)))

A fixed seed is set for the shuffle so that the train/test split, and therefore the results, can be reproduced across runs.

# prep for building vocabularies and splitting data into a train and test set

shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

movie_titles = movies.batch(1_000)
user_ids = ratings.batch(1_000_000).map(lambda x: x["User_ID"])

unique_movie_titles = np.unique(np.concatenate(list(movie_titles)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))
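The vocabularies built above are what the model uses to turn strings into integer indices before looking up embeddings. As a rough sketch of that idea in plain NumPy (the `lookup` helper and the toy IDs are hypothetical, and this only mimics a string lookup with a single out-of-vocabulary slot), index 0 is reserved for unknown values, which is why an embedding table needs one more row than the vocabulary has entries:

```python
import numpy as np

# Toy user IDs with duplicates, standing in for the real 'User_ID' column.
toy_user_ids = np.array(["7", "42", "7", "13", "42"])

# np.unique returns the sorted, de-duplicated vocabulary.
vocab = np.unique(toy_user_ids)

# Reserve index 0 for out-of-vocabulary values; known values start at 1.
index_of = {str(token): i + 1 for i, token in enumerate(vocab)}

def lookup(token: str) -> int:
    """Hypothetical helper: map a string to its vocabulary index (0 if unseen)."""
    return index_of.get(token, 0)

print(vocab)           # sorted unique IDs (sorted as strings, not numbers)
print(lookup("42"))    # index of a known user
print(lookup("999"))   # unknown user falls back to index 0
print(len(vocab) + 1)  # rows needed in the embedding table
```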

Model Implementation

This model focuses on two critical parts:

  • optimize for two objectives (retrieval and ranking), thus, having two losses

  • share variables between tasks, allowing for transfer learning

The diagram shows the architecture of the two-tower model used for this project, which performs retrieval and ranking on the movie ratings given by users. It is a neural network with two sub-models that learn representations for queries ('User_ID') and candidates ('Name') separately. Because both tasks share these representations, the model may use what it learns from the abundant task to improve its predictions on the sparse task via transfer learning. The two-tower model includes the following:

  • User tower: turns 'User_ID's into user embeddings (dense vector representations)

  • Movie-tower: turns movie titles ('Name') into movie-embeddings

This model will have 2 tasks:

  • Rating (Ranking): uses mean squared error (MSE) as the loss for predicting ratings and root mean squared error (RMSE) as the metric

  • Retrieval: the retrieval task object is a wrapper that bundles together the loss function and metric computation. A top-K metric will be used

Top-K metrics: given a user and a movie that user is known to have watched, how highly does the model rank that movie among all possible movies? The score of a given query-candidate pair, also shown in the model architecture, is the dot product of the outputs of the two towers.
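The scoring and ranking idea described above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions (random 4-dimensional embeddings, six candidate movies, and an arbitrarily chosen "watched" movie), not the TFRS implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy setup: one user and six candidate movies with 4-dimensional embeddings.
user_embedding = rng.normal(size=4)
movie_embeddings = rng.normal(size=(6, 4))

# The score of each query-candidate pair is the dot product of the two tower outputs.
scores = movie_embeddings @ user_embedding

# Rank of the movie the user actually watched (0 = best); movie 2 is an arbitrary choice.
true_movie = 2
rank = int(np.sum(scores > scores[true_movie]))

def in_top_k(rank: int, k: int) -> bool:
    """True when the watched movie is among the k highest-scoring candidates."""
    return rank < k

print("scores:", np.round(scores, 3))
print("rank of watched movie:", rank)
print("hit in top 3:", in_top_k(rank, 3))
```

A top-K metric then simply counts how often the watched movie lands within the first K positions across many such examples.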

Embedding dimension: we will use an embedding size of 32. Larger embedding dimensions may yield more accurate results but may also be more prone to overfitting.

Call: defines how our model computes its predictions. It looks up the user and movie embeddings and passes them to the rating model. (It is not meant to be called directly.)

Compute_loss: describes how our model will be trained. Since this is a multi-task model, the losses of both tasks are combined into a weighted sum, and the weights can be adjusted to emphasize one task over the other. (For this project, we'll assign both weights a value of 1.0.)
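As a small worked example of the weighting described above, the sketch below combines two made-up loss values with different weights. The `combined_loss` helper and the numbers are hypothetical and only illustrate the arithmetic:

```python
def combined_loss(rating_loss: float, retrieval_loss: float,
                  rating_weight: float = 1.0,
                  retrieval_weight: float = 1.0) -> float:
    """Weighted sum of the two task losses."""
    return rating_weight * rating_loss + retrieval_weight * retrieval_loss

# Made-up loss values for illustration only.
rating_loss, retrieval_loss = 1.2, 7.5

print(combined_loss(rating_loss, retrieval_loss))            # both tasks weighted equally
print(combined_loss(rating_loss, retrieval_loss, 1.0, 0.0))  # behaves like a ranking-only model
print(combined_loss(rating_loss, retrieval_loss, 0.0, 1.0))  # behaves like a retrieval-only model
```

Setting one weight to zero removes that task's gradient signal entirely, so intermediate values are what make the model genuinely multi-task.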

class MovieModel(tfrs.models.Model):
    def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
        # we take the loss weights in the constructor: this allows us to instantiate
        # several model objects with different loss weights
        super().__init__()

        embedding_dimension = 32

        # user and movie models
        self.movie_model: tf.keras.layers.Layer = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_movie_titles, mask_token=None),
            # we add an additional embedding to account for unknown tokens
            tf.keras.layers.Embedding(len(unique_movie_titles) + 1, embedding_dimension)
        ])
        self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
        ])

        # A small model to take in user and movie embeddings and predict ratings
        # we can make this as complicated as we want as long as we output a scalar
        # as our prediction
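        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(128, activation="relu"),
            # final layer outputs the scalar rating prediction
            tf.keras.layers.Dense(1),
        ])

        # the tasks
        self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()],
        )
        self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
            # the candidate batch size of 128 is an assumption; the original value is not shown
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=movies.batch(128).map(self.movie_model)
            )
        )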

        # The loss weights
        self.rating_weight = rating_weight
        self.retrieval_weight = retrieval_weight

    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        # we pick out the user features and pass them into the user model
        user_embeddings = self.user_model(features["User_ID"])
        # and pick out the movie features and pass them into the movie model
        movie_embeddings = self.movie_model(features["Name"])

        return (
            user_embeddings,
            movie_embeddings,
            # we apply the multi-layered rating model to a concatenation of
            # user and movie embeddings
            self.rating_model(
                tf.concat([user_embeddings, movie_embeddings], axis=1)
            ),
        )

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        ratings = features.pop("Rating")
        user_embeddings, movie_embeddings, rating_predictions = self(features)

        # we compute the loss for each task
        rating_loss = self.rating_task(
            labels=ratings,
            predictions=rating_predictions,
        )
        retrieval_loss = self.retrieval_task(user_embeddings, movie_embeddings)

        # and combine them using the loss weights
        return (self.rating_weight * rating_loss
                + self.retrieval_weight * retrieval_loss)

Fitting and Evaluating

After defining our model, we will fit and evaluate the model with the standard Keras routines.

# instantiate the model and use the Adagrad optimizer with a learning rate of 0.1
# assign both weights at 1.0

model = MovieModel(rating_weight=1.0, retrieval_weight=1.0)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

# shuffle, batch, and cache the training and evaluation data

cached_train = train.shuffle(100_000).batch(8_192).cache()
cached_test = test.batch(4_096).cache()

# train the model
model.fit(cached_train, epochs=3)

# evaluate the model on the test set
metrics = model.evaluate(cached_test, return_dict=True)

print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}")
print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}")

Model Metrics Interpretation

The model reports two metrics, one for each task:

  • Retrieval top-100 categorical accuracy: the fraction of test examples for which the movie the user actually watched appears among the model's top 100 retrieved movies. A higher number indicates better retrieval performance.

  • Ranking RMSE: the root mean squared error between predicted and actual ratings. We want this error to be as low as possible.
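To make the two metrics concrete, here is a minimal NumPy sketch that computes both on tiny, made-up inputs. The helper names `rmse` and `top_k_accuracy` are hypothetical and this is not how TFRS computes the metrics internally; it only illustrates the definitions:

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root mean squared error between actual and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def top_k_accuracy(scores, true_idx, k: int) -> float:
    """Fraction of examples whose true candidate is among the k best scores.

    scores: (n_examples, n_candidates) array of query-candidate scores
    true_idx: index of the watched movie for each example
    """
    scores = np.asarray(scores, dtype=float)
    true_idx = np.asarray(true_idx)
    true_scores = scores[np.arange(len(true_idx)), true_idx]
    # rank = number of candidates scored strictly higher than the true one
    ranks = np.sum(scores > true_scores[:, None], axis=1)
    return float(np.mean(ranks < k))

# Made-up ratings and scores for illustration.
print(rmse([5, 3, 4], [4, 3, 2]))

toy_scores = [[0.9, 0.1, 0.3],
              [0.2, 0.8, 0.5],
              [0.4, 0.6, 0.7]]
toy_true = [0, 2, 0]
print(top_k_accuracy(toy_scores, toy_true, k=1))
print(top_k_accuracy(toy_scores, toy_true, k=2))
```

Accuracy can only stay the same or rise as k grows, which is why top-100 accuracy is a fairly lenient measure on a small catalogue.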

The rating and retrieval weights of our model may be tweaked to compare which weight distribution yields better results.

Making Predictions

The 'tfrs.layers.factorized_top_k.BruteForce' layer will be used to make predictions. The BruteForce layer may be slow when serving a model with many possible candidates, in which case another layer, such as the TFRS ScaNN layer ('tfrs.layers.factorized_top_k.ScaNN'), may be used to speed up retrieval.
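Conceptually, brute-force retrieval scores every candidate against the query and keeps the best few. The sketch below shows that idea in plain NumPy with invented 2-dimensional embeddings; `brute_force_top_k` is a hypothetical helper, not the TFRS layer:

```python
import numpy as np

def brute_force_top_k(query, candidates, titles, k=3):
    """Score every candidate against the query and return the k best titles."""
    scores = candidates @ query        # one dot product per candidate
    top = np.argsort(-scores)[:k]      # indices of the k highest scores
    return [(titles[i], float(scores[i])) for i in top]

# Invented embeddings for four movies and one user.
toy_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_candidates = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [0.7, 0.7],
                           [-1.0, 0.0]])
toy_query = np.array([1.0, 0.2])

result = brute_force_top_k(toy_query, toy_candidates, toy_titles, k=2)
print(result)
```

Because every candidate is scored, the cost grows linearly with the size of the catalogue, which is the motivation for approximate indexes on large candidate sets.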

def predict_movie(user, top_n=5):
    # create a model that takes in raw query features (the user ID)
    index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)

    # recommends movies out of the entire movies dataset
    index.index_from_dataset(
        tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
    )

    # get recommendations
    _, titles = index(tf.constant([str(user)]))

    print('Top {} recommendations for user {}:\n'.format(top_n, user))
    unique_titles = set()  # To store unique titles
    for title in titles[0].numpy():
        title_str = title.decode("utf-8")
        if title_str not in unique_titles:
            unique_titles.add(title_str)
            print('{}. {}'.format(len(unique_titles), title_str))
            # stop once we have printed top_n distinct titles
            if len(unique_titles) == top_n:
                break

def predict_rating(user, movie):
    # the model returns (user embeddings, movie embeddings, predicted rating)
    trained_user_embeddings, trained_movie_embeddings, predicted_rating = model({
        "User_ID": np.array([str(user)]),
        "Name": np.array([movie])
    })
    print("Predicted rating for {}: {}".format(movie, predicted_rating))


Let's try our model by making some predictions for a random user from our test dataset. Let's also make sure that our random user does not exist in our training dataset.

# Convert the CacheDataset to an iterator
cached_test_iter = iter(cached_test)

# Get the number of batches in the dataset
num_batches = len(cached_test)

# Choose a random batch index
random_batch_index = random.randint(0, num_batches - 1)

# Create a separate random number generator for this code snippet
user_id_random_generator = random.Random(1)

# Iterate to the random batch
for i in range(random_batch_index + 1):
    element = next(cached_test_iter)

# Choose a random index within the batch using the separate random generator
random_index_in_batch = user_id_random_generator.randint(0, len(element['User_ID']) - 1)

# Get the random 'User_ID'
random_user_id = element['User_ID'][random_index_in_batch].numpy()

print("Randomly selected 'User_ID':", random_user_id.decode("utf-8"))

Running the code above selects a random user from our test dataset, and running it again may select a different user. To confirm that the user selected from the test set is not in the training set, let's check that 'User_ID' 169999 does not exist in the training dataset:

# Convert the CacheDataset to an iterator
cached_train_iter = iter(cached_train)

# Check if 'User_ID' '169999' exists in the dataset
user_id_to_find = b'169999'
user_id_found = False

for _ in range(len(cached_train)):  # Loop through all batches in cached_train
    element = next(cached_train_iter)
    # compare against the raw byte strings in this batch
    batch_user_ids = element['User_ID'].numpy()
    if user_id_to_find in batch_user_ids:
        user_id_found = True
        break

if user_id_found:
    print(f"User with 'User_ID' {user_id_to_find} exists in the cached_train dataset.")
else:
    print(f"User with 'User_ID' {user_id_to_find} does not exist in the cached_train dataset.")
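The same membership check can be expressed for many users at once. The sketch below uses made-up byte-string IDs and NumPy's `np.isin` to list which test users never appear in the training set; the arrays are assumptions for illustration and are not drawn from the real split:

```python
import numpy as np

# Made-up user IDs as byte strings, mirroring how TensorFlow returns string tensors.
train_user_ids = np.array([b"101", b"102", b"103", b"102"])
test_user_ids = np.array([b"103", b"169999", b"104"])

# Boolean mask: True where a test user also appears in the training set.
in_train = np.isin(test_user_ids, train_user_ids)

# Users the model never saw during training.
unseen_users = test_user_ids[~in_train]
print(unseen_users)
```

Users found this way have no trained embedding of their own, so the model falls back to the shared unknown-token embedding when making predictions for them.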

Let's look at which movies we should recommend to our random user '169999'. For this example, I have set rating_weight = 0.9 and retrieval_weight = 1.0:

# retrieve top 10 movies to recommend to user '169999'
predict_movie(169999, 10)

# predict the rating user '169999' will give to the movie 'Pride and Prejudice'

predict_rating(169999, 'Pride and Prejudice')

# let's look at user 169999's rating history
# to see if they would enjoy the top 10 movies we just recommended

ratings_df[ratings_df['User_ID'] == '169999']

Result Interpretation:

Looking at the results with the assigned weights, we can see that some of the recommendations, such as Lilo and Stitch, already appear in the user's watch history. This may be a good thing, as the recommendations include familiar movies alongside movies the user has never viewed before.

The rating prediction is less accurate: the predicted rating of Pride and Prejudice for this user is 2.7, but the user actually rated the movie 5.0. (The user's actual rating for Pride and Prejudice is not included in the data shown here because the output in this post has been truncated; it appears when you run the code in the notebook.)

This is expected, as we gave the rating task a lower weight; increasing rating_weight may produce better rating predictions.

This may not be entirely necessary, as most users are more interested in getting recommendations for movies they would actually want to watch than in how well the recommender predicts the rating they would give to a particular movie.

Final Thoughts

Given these results, here are a few improvements we could implement to improve the test results:

  • Add additional features to the dataset: expand the dataset to include features that could provide valuable information for recommendations. For movies, we could consider adding genres, actors, directors, or movie descriptions. For users, we may include demographic information, preferences, and historical interactions such as watch time and the times they are most active.

  • Regularization and hyperparameter tuning: experiment with different regularization techniques and hyperparameter settings to optimize model performance while considering the bias/variance tradeoff.

  • Explore different models: try ensembles that combine the predictions of multiple models, or combine different techniques, such as matrix factorization and reinforcement learning.

Full Notebook available here:


