Embedding Space
Embedding is a technique in machine learning for converting discrete elements, such as words, objects, or categories, into continuous vectors of real numbers. Each unique element is mapped to a vector of continuous numerical values, translating categorical information into a more fluid and mathematically manageable form so that algorithms can process and analyze such data more effectively.
In the context of neural networks, the embedding space is usually low-dimensional and captures some latent structure of the item or query set. Similar items, such as movies watched by the same user, end up close together in the embedding space, where closeness is defined by a similarity measure.
Both content-based and collaborative filtering map each item and each query (context) to an embedding vector in a common embedding space.
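As a minimal sketch (with made-up matrices rather than real trained embeddings), queries and items live in the same d-dimensional space, so any pair can be compared directly:

```python
import numpy as np

d = 4                                       # embedding dimension (illustrative)
rng = np.random.default_rng(0)

# Hypothetical learned embeddings: one row per query/context and per item.
query_embeddings = rng.normal(size=(3, d))
item_embeddings = rng.normal(size=(5, d))

q = query_embeddings[0]   # embedding of one query
x = item_embeddings[2]    # embedding of one item

# Because both live in the same space, a similarity score such as the
# dot product s(q, x) is well defined between any query and any item.
print(np.dot(q, x))
```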
Similarity Measures
A similarity measure is a function that takes a pair of embeddings and returns a scalar measuring their similarity. The embeddings can be used for candidate generation as follows: given a query embedding q, the system looks for item embeddings x that are close to q, that is, embeddings with high similarity s(q, x).
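A sketch of that retrieval step, using the dot product as s(q, x) and hypothetical item embeddings (production systems typically replace the full scan with approximate nearest-neighbor search):

```python
import numpy as np

def top_k_candidates(q, item_embeddings, k=2):
    # Score every item at once: scores[i] = s(q, item_i) = dot(q, item_i).
    scores = item_embeddings @ q
    # Return the indices of the k highest-scoring items.
    return np.argsort(scores)[::-1][:k]

item_embeddings = np.array([[0.9, 0.1],
                            [0.2, 0.8],
                            [0.7, 0.6]])
q = np.array([1.0, 0.0])
print(top_k_candidates(q, item_embeddings))  # -> [0 2]
```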
Most recommendation systems rely on one or more of the following similarity measures to determine the degree of similarity:
cosine: the cosine of the angle between the two vectors, s(q, x) = cos(q, x) = ⟨q, x⟩ / (‖q‖ ‖x‖)
dot product: the dot product of the two vectors, s(q, x) = ⟨q, x⟩ = ‖q‖ ‖x‖ cos(q, x)
Euclidean distance: the usual distance in Euclidean space, ‖q − x‖; a smaller distance means higher similarity
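All three can be written in a few lines of numpy; a minimal sketch, with function names chosen here only for illustration:

```python
import numpy as np

def cosine(q, x):
    # Cosine of the angle between q and x; ignores the vectors' norms.
    return np.dot(q, x) / (np.linalg.norm(q) * np.linalg.norm(x))

def dot_product(q, x):
    # Inner product; equals ||q|| * ||x|| * cos(angle between q and x).
    return np.dot(q, x)

def euclidean_distance(q, x):
    # Usual Euclidean distance; here smaller means more similar.
    return np.linalg.norm(q - x)
```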
For a given query, cosine similarity ranks highest the items with the smallest angle to the query, the dot product ranks highest the items with large norms (and, among items of equal norm, those with the smallest angle), and Euclidean distance favors the items physically closest to the query.
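A tiny made-up example (values chosen purely for illustration) shows how the measures can disagree about which item to recommend:

```python
import numpy as np

q = np.array([1.0, 0.0])   # query embedding
a = np.array([5.0, 3.0])   # item with a large norm
b = np.array([0.9, 0.1])   # item near q with a small norm

# Dot product rewards the large norm: item a wins.
print(np.dot(q, a), np.dot(q, b))                          # 5.0  0.9

# Cosine ignores norms and rewards the small angle: item b wins.
cos = lambda u, v: np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(cos(q, a), cos(q, b))                                # ~0.86  ~0.99

# Euclidean distance rewards closeness (smaller is better): item b wins.
print(np.linalg.norm(q - a), np.linalg.norm(q - b))        # 5.0  ~0.14
```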
Compared to the cosine, dot-product similarity is sensitive to the norm of the embedding and is more likely to recommend items that are already popular, since popular items that appear frequently in the training set tend to have embeddings with large norms.
To counteract this, you may use variants of the similarity measure that put less emphasis on the norm of the item, such as s(q, x) = ‖q‖^α ‖x‖^α cos(q, x) for some α in (0, 1). A related pitfall works in the other direction: items that appear rarely may not be updated often during training, so if they are initialized with large norms, the system may recommend rare items over more relevant ones. By being more careful about embedding initialization, and using appropriate regularization, we can avoid this problem.
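A sketch of such a norm-damped variant (α here is a hypothetical tuning parameter, typically chosen by validation):

```python
import numpy as np

def damped_similarity(q, x, alpha=0.5):
    # ||q||^alpha * ||x||^alpha * cos(q, x):
    # alpha = 1 recovers the dot product, alpha = 0 recovers the cosine;
    # values in between reduce the influence of large item norms.
    cos = np.dot(q, x) / (np.linalg.norm(q) * np.linalg.norm(x))
    return (np.linalg.norm(q) ** alpha) * (np.linalg.norm(x) ** alpha) * cos
```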