40% of app installs on Google Play and 60% of watch time on YouTube come from recommendations.
Terminology
Items (documents): the entities a system recommends, for example apps, movies, videos, or books
Query (context): the information a system uses to make recommendations
Queries can be a combination of the following:
user information, such as the user ID or the items a user has previously interacted with
additional context, such as time of day or the user's device
Embedding: a mapping from a discrete set (in this case, the set of queries or the set of items to recommend) to a vector space called the embedding space.
Many recommendation systems rely on learning an appropriate embedding representation of the queries and items.
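As a minimal sketch of what an embedding looks like in practice, the snippet below maps a few item IDs to 2-D vectors and picks the item closest to a query vector. The item names, the vector values, and the use of the dot product as the similarity measure are all assumptions made for illustration; real systems learn the vectors from data and typically use many more dimensions.

```python
# Hypothetical toy embeddings: each item ID maps to a point in a 2-D embedding space.
ITEM_EMBEDDINGS = {
    "app_chess": [0.9, 0.1],
    "app_puzzle": [0.8, 0.3],
    "app_weather": [0.1, 0.9],
}


def dot(u, v):
    """Dot product of two equal-length vectors (assumed similarity measure)."""
    return sum(a * b for a, b in zip(u, v))


def most_similar(query_vec, embeddings):
    """Return the item ID whose embedding has the largest dot product with the query."""
    return max(embeddings, key=lambda item: dot(query_vec, embeddings[item]))


# Assumed query embedding, e.g. for a user who mostly installs games.
query_embedding = [1.0, 0.0]
best_item = most_similar(query_embedding, ITEM_EMBEDDINGS)
print(best_item)
```

Because queries and items live in the same space, "which items suit this query" becomes a nearest-neighbour question over vectors.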
Overview
A common architecture for recommender systems consists of the following components:
candidate generation: collaborative and/or content-based filtering
scoring
re-ranking
Candidate generation
During this stage, the system starts from a potentially huge corpus and generates a much smaller subset of candidates. For example, the candidate generator of YouTube reduces billions of videos down to hundreds or thousands.
The model needs to evaluate queries quickly given the enormous size of the corpus. A given model may provide multiple candidate generators, each nominating a different subset of candidates
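The idea of several candidate generators, each nominating its own subset, can be sketched as follows. This is an illustration under stated assumptions, not how any production system does it: the two generators (embedding similarity and raw popularity), the data, and every function name are hypothetical, and a real system would use approximate nearest-neighbour search rather than scanning the whole corpus.

```python
import heapq

# Hypothetical corpus: item embeddings and view counts.
ITEM_VECS = {
    "v1": [1.0, 0.0], "v2": [0.9, 0.2], "v3": [0.0, 1.0],
    "v4": [0.2, 0.8], "v5": [0.5, 0.5], "v6": [0.3, 0.3],
}
POPULARITY = {"v1": 10, "v2": 5, "v3": 500, "v4": 50, "v5": 1, "v6": 2}


def top_k_by_dot(query_vec, item_vecs, k):
    """Generator 1: the k items whose embeddings best match the query."""
    scored = (
        (sum(q * x for q, x in zip(query_vec, vec)), item)
        for item, vec in item_vecs.items()
    )
    return [item for _, item in heapq.nlargest(k, scored)]


def top_k_popular(popularity, k):
    """Generator 2: the k most viewed items, independent of the query."""
    return heapq.nlargest(k, popularity, key=popularity.get)


def generate_candidates(query_vec, k):
    """Merge the nominations of both generators, dropping duplicates."""
    candidates, seen = [], set()
    for item in top_k_by_dot(query_vec, ITEM_VECS, k) + top_k_popular(POPULARITY, k):
        if item not in seen:
            seen.add(item)
            candidates.append(item)
    return candidates


candidates = generate_candidates([1.0, 0.0], k=2)
print(candidates)
```

Each generator is cheap and only has to be good at recall; deciding the final order is left to the later stages.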
Scoring
Next, another model scores and ranks the candidates in order to select the set of items (on the order of 10) to display to the user.
Since this model evaluates a relatively small subset of items, the system can use a more precise model relying on additional queries
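A scoring stage might look roughly like the sketch below. The features (`similarity`, `avg_watch_fraction`, `supported_devices`), the hand-picked weights, and the function names are assumptions for illustration only; in practice the scorer is usually a learned model. The point is that, with few candidates, the system can afford to look at more signals per item.

```python
# Hypothetical candidates produced by an earlier stage, with extra per-item features.
CANDIDATES = [
    {"id": "a", "similarity": 0.9, "avg_watch_fraction": 0.2, "supported_devices": {"tv"}},
    {"id": "b", "similarity": 0.7, "avg_watch_fraction": 0.6, "supported_devices": {"mobile", "tv"}},
    {"id": "c", "similarity": 0.2, "avg_watch_fraction": 0.9, "supported_devices": {"mobile"}},
]


def score(candidate, context):
    """Combine several signals into one score (weights are illustrative assumptions)."""
    s = 2.0 * candidate["similarity"] + 1.0 * candidate["avg_watch_fraction"]
    # Use additional query context: favour items that suit the user's device.
    if context["device"] in candidate["supported_devices"]:
        s += 0.5
    return s


def rank(candidates, context, n=10):
    """Return the n highest-scoring candidates, best first."""
    return sorted(candidates, key=lambda c: score(c, context), reverse=True)[:n]


ranked = rank(CANDIDATES, context={"device": "mobile"}, n=2)
print([c["id"] for c in ranked])
```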
Re-ranking
Finally, the system takes into account additional constraints for the final ranking. For example, the system may remove items that the user explicitly dislikes or boost the score of fresher content.
Re-ranking can also help ensure diversity, freshness, and fairness.
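The two re-ranking rules mentioned above, dropping disliked items and boosting fresh ones, can be sketched like this. The input tuple layout, the 7-day freshness window, and the size of the boost are all assumptions chosen for illustration; diversity and fairness constraints are not covered here.

```python
def rerank(scored_items, disliked_ids, max_age_days=7, freshness_boost=0.2):
    """Apply post-scoring constraints.

    scored_items: list of (item_id, score, age_days) tuples (assumed layout).
    disliked_ids: set of item IDs the user has explicitly disliked.
    """
    adjusted = []
    for item_id, item_score, age_days in scored_items:
        if item_id in disliked_ids:
            continue  # never show explicitly disliked items
        if age_days <= max_age_days:
            item_score += freshness_boost  # favour recently published content
        adjusted.append((item_id, item_score))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical output of the scoring stage: (item_id, score, age in days).
scored = [("a", 0.9, 30), ("b", 0.8, 2), ("c", 0.85, 100), ("d", 0.95, 1)]
final = rerank(scored, disliked_ids={"d"})
print([item_id for item_id, _ in final])
```

Keeping these rules outside the scoring model means they can be changed without retraining it.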