In this section, we will start to develop a second type of recommender system, called content-based filtering.
Collaborative vs Content-based filtering:
Collaborative filtering: recommend items to you based on ratings of users who gave similar ratings to yours
Content-based filtering: recommend items to you based on features of the user and the item to find a good match
With content-based filtering, we still have the following data provided:
r(i, j) = 1 if user j has rated item i (0 otherwise)
y(i, j): rating given by user j on item i (if defined)
Some examples of user and item features:
user features (Xu(j)): age, gender, country, movies watched, average rating per genre, etc
movie features (Xm(i)): year, genres, reviews, average rating
The vector size of user features and movie features may be different.
In content-based filtering, we are going to develop an algorithm that learns to match users and movies:
Notations:
V stands for vector
Vu(j): a list of numbers computed from the features of user j
Vm(i): a list of numbers computed from the features of movie i
If we're able to come up with an appropriate choice of these vectors, then the dot product between these 2 vectors will be a good prediction of the rating that user j gives movie i.
The dot product, which multiplies these lists of numbers element-wise and then takes a sum, gives a sense of how much this particular user will like this particular movie.
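As a minimal sketch of that computation (the numbers below are made up for illustration, not from the lectures):
import numpy as np

# hypothetical learned vectors for one user j and one movie i
v_u = np.array([0.9, 0.2, 1.1])
v_m = np.array([3.0, 0.5, 1.0])

# predicted rating: multiply element-wise, then sum
y_hat = np.dot(v_u, v_m)   # 0.9*3.0 + 0.2*0.5 + 1.1*1.0 = 3.9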
The challenge: given the features of a user, Xu(j), how can we compute a vector Vu(j) that succinctly represents the user's preferences? Likewise, given the features of a movie, Xm(i), how can we compute a vector Vm(i) that represents the movie?
Notice that Xu and Xm could be different in size
Vu and Vm have to be the same size or dimension
Deep Learning for content-based filtering
Recall that given a feature vector Xu, we have to compute the vector Vu, and likewise turn Xm into the vector Vm. We can do that with a neural network.
Note that the user network and the movie network can have different numbers of hidden layers and different numbers of units per hidden layer, but the output layers need to have the same size or dimension.
You can also modify the algorithm for binary labels: instead of using the dot product of Vu and Vm directly, apply the sigmoid function, g(Vu . Vm), to predict the probability that y(i, j) = 1.
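Here g is the usual logistic (sigmoid) function, so the binary prediction is:

P(y(i, j) = 1) = g(Vu(j) . Vm(i)), where g(z) = 1 / (1 + e^-z)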
Let's look at a sample diagram:
This model has a lot of parameters: each layer of each network has its usual set of neural network parameters. We're going to construct a single cost function J to train all the parameters of both the user network and the movie network, using a squared-error cost plus a neural network regularization term.
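Written in the notation above, the cost sums the squared prediction error over every pair (i, j) that has a rating:

J = sum over all (i, j) with r(i, j) = 1 of (Vu(j) . Vm(i) - y(i, j))^2 + NN regularization term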
We're going to judge the two networks according to how well Vu and Vm predict y(i,j), and with this cost function, we're going to use gradient descent or other optimization algorithm to tune the parameters of the neural network to cause the cost function J to be as small as possible.
After training this model, we can also use it to find similar items:
Vu(j) is a vector that describes user j with features Xu(j)
Vm(i) is a vector that describes movie i with features Xm(i)
To find movies k similar to movie i, look for movies with a small squared distance ||Vm(k) - Vm(i)||^2 between their vectors.
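A minimal sketch of that lookup, assuming the Vm vectors for all movies have already been computed and stacked into a matrix (the sizes and names here are illustrative):
import numpy as np

# hypothetical matrix of learned movie vectors, one row per movie
Vm = np.random.rand(1000, 32)      # 1000 movies, 32-dimensional vectors

def similar_movies(i, k=10):
    """Return the indices of the k movies closest to movie i."""
    dists = np.sum((Vm - Vm[i]) ** 2, axis=1)   # squared distance ||Vm(k) - Vm(i)||^2
    dists[i] = np.inf                           # exclude the movie itself
    return np.argsort(dists)[:k]                # smallest distances first

print(similar_movies(42))
Since Vm(i) depends only on the movie's features, a table of similar movies like this can be precomputed ahead of time rather than at serving time.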
Recommending from a large catalog
It becomes computationally infeasible to work out which products to recommend with a large catalog, as we would have to run neural network inference thousands or millions of times every time a user shows up on your website.
Many large-scale recommender systems are implemented in two steps, called the retrieval step and the ranking step. During the retrieval step, we generate a large list of plausible item candidates that tries to cover many of the things we might recommend to the user; during the ranking step, we fine-tune that list and pick the best items to recommend.
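A rough sketch of the two steps, with made-up sizes and a stand-in retrieval rule (none of these names or numbers come from the course):
import numpy as np

# hypothetical learned vectors: 100,000 movies and one user, 32 dimensions each
rng = np.random.default_rng(0)
Vm_all = rng.normal(size=(100_000, 32))
v_u = rng.normal(size=32)

# retrieval: cheaply collect a few hundred plausible candidates, e.g. movies
# similar to recently watched ones and top movies in the user's favourite genres;
# here a random subset stands in for that precomputed list
candidates = rng.choice(100_000, size=500, replace=False)

# ranking: run the learned model only on the retrieved candidates
scores = Vm_all[candidates] @ v_u          # predicted rating ~ Vu . Vm
ranked = candidates[np.argsort(-scores)]   # best predictions first
print(ranked[:10])
Retrieving more candidates tends to give better recommendations but slower responses, so the size of the candidate list is a tuning knob.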
Ethical use of recommender system
Some common goals of a recommender system are to recommend:
Movies most likely to be rated 5 stars by user
Products most likely to be purchased
Ads most likely to be clicked on
Products generating the largest profit
Video leading to maximum watch time
Some examples of problematic cases:
Recommenders can amplify conspiracy theories and hate/toxicity. Amelioration: filter out problematic content such as hate speech, fraud, scams, and violence
A ranking system can maximize the company's profit rather than users' welfare, without presenting this to users in a transparent way. Amelioration: be transparent with users about what the system is optimizing for
TensorFlow implementation of content-based filtering
Implement a user network (user_NN) and an item network (item_NN):
user_NN = tf.keras.models.Sequential([
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(32)
])
item_NN = tf.keras.models.Sequential([
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(32)
])
Next, we need to tell TensorFlow Keras how to feed the user features and the item features to the neural networks:
# create the user input and point to the base network
input_user = tf.keras.layers.Input(shape=(num_user_features,))
vu = user_NN(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)
# create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features,))
vm = item_NN(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)
note: the l2-norm normalization scales the vectors vu and vm so that each has length 1
After computing both Vu and Vm, we compute their dot product:
# measure the similarity of the 2 vector outputs
output = tf.keras.layers.Dot(axes=1)([vu, vm])
# specify the inputs and outputs of the model
model = tf.keras.Model([input_user, input_item], output)
# specify the cost function
cost_fn = tf.keras.losses.MeanSquaredError()
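To finish training, an assumed continuation (not shown in these notes) compiles the model with this loss and fits it on pairs of user and item features with their ratings; the optimizer, learning rate, epoch count, and data variable names below are placeholders:
# compile and train; user_train, item_train, y_train are hypothetical training arrays
opt = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=opt, loss=cost_fn)
model.fit([user_train, item_train], y_train, epochs=30)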