We live in a digital age in which most activities and services are carried out over the internet. Items such as music, movies, and products are consumed on the web by millions of users, and the number of such items is so large that no user can experience everything. This is where recommender systems come into play. They filter and rank items for each user based on individual preferences, helping the user make decisions and overcome information overload. To do so, these systems must understand a user's interests and infer their needs over time. Recommender systems are widely deployed across the web and, in many cases, are a core part of the business. For example, on Quora, a question-answering website, the entire interface relies on the recommender system to decide what content to display to the user, including homepage question ranking, topic recommendation, and answer ranking. The goal of a recommender system is to assist users in selecting items based on their personal interests. By doing so, it also increases the number of transactions, creating a win-win situation for both the end users and the web service.
Recommender systems are a relatively new and exciting field with considerable potential. The field originated from information retrieval and search engines, where the task is: given a query, retrieve the most relevant documents. In the recommender system setting, by contrast, the user should be able to discover items that they would not have been able to search for directly. One main challenge in recommender systems is cold start, defined as the situation in which a new user or item joins the system. We are interested in item cold start, where the recommender system needs to learn about the new item and decide which users to recommend it to.
In this work, we propose a new approach to the cold-start problem in recommender systems using word embeddings. Word embeddings are representations of words as numerical vectors. They are useful because they capture semantic relationships between words in the vocabulary. There are various methods for generating such a mapping, including neural networks, dimensionality reduction on the word co-occurrence matrix, and probabilistic models. The underlying idea behind these approaches is that words that share common contexts in the corpus lie close together in the semantic space. Word2vec, by Mikolov et al., is a technique that has gained tremendous popularity in the natural language processing (NLP) domain. It comes in two versions: the continuous skip-gram model and the continuous bag-of-words (CBOW) model. The authors were able to overcome the problem of sparsity in text and demonstrated the effectiveness of the approach on a wide range of NLP tasks.
Our dataset is based on a popular website called Delicious, which allows users to store, share, and discover bookmarks on the web. For each bookmark, users can add tags that provide meta-information about the page, such as the topics discussed and important entities. For example, a website about research might have tags like science, biology, and experiment. The problem then becomes: given a new bookmark with tags, determine which users this bookmark should be recommended to. For the item cold-start situation, a popular technique is to use content-based approaches to find items similar to the new item; the new item can then be recommended to the users of those similar items. In this paper, we propose a method to compute similar items using the word embeddings of the tags present for each bookmark. Our methodology represents each bookmark as a vector by combining the word embeddings of its tags.
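As a rough illustration of representing a bookmark by its tags, the following minimal sketch combines tag embeddings with a simple element-wise average. The function name `bookmark_vector`, the toy 3-dimensional vectors, and the choice to skip tags that have no embedding are assumptions made for this example only; they are not taken from the authors' implementation.

```python
# Toy 3-dimensional embeddings, invented for illustration only.
# Real pretrained vectors would have many more dimensions.
TOY_EMBEDDINGS = {
    "science": [1.0, 0.0, 0.0],
    "biology": [0.8, 0.2, 0.0],
    "experiment": [0.6, 0.4, 0.0],
}


def bookmark_vector(tags, embeddings):
    """Average the embeddings of the tags that appear in the vocabulary.

    Tags without an embedding are skipped, and None is returned when no tag
    is covered. Both behaviours are assumptions of this sketch.
    """
    vectors = [embeddings[tag] for tag in tags if tag in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    # Element-wise mean over all available tag vectors.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


# A bookmark tagged with three known words and one out-of-vocabulary tag.
research_page = bookmark_vector(
    ["science", "biology", "experiment", "not-in-vocabulary"], TOY_EMBEDDINGS
)
```

With the toy values above, `research_page` is approximately `[0.8, 0.2, 0.0]`, the mean of the three known tag vectors.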
There are various possible aggregation mechanisms; we chose the average in our experiments since it is intuitive and easy to compute. The similarity between two bookmarks is then computed as the cosine similarity between their corresponding embedding vectors. The dataset contains around 70,000 bookmarks and around 54,000 tags. The embeddings are obtained from the GloVe project, where training is performed on Wikipedia data using aggregated global word-word co-occurrence statistics. The vocabulary of these embeddings is fairly large, containing about 400k words, and each word is stored as a 300-dimensional vector.
The results were evaluated manually and look promising: the recommended bookmarks were highly relevant in terms of the topics discussed. Example topics in the bookmarks include social media analysis, movie reviews, vacation planner, and web development. Embeddings perform well because they are able to capture the semantic information of bookmarks through their tags, which is useful in cold-start situations. Future work would involve other aggregation schemes, such as weighting the tags differently based on their importance. A more suitable method of evaluation would be to measure feedback (ratings or engagement) from users in a live recommender system and compare it against other approaches. In this work, we demonstrate the feasibility of using word embeddings to tackle the item cold-start problem in recommender systems, an important problem whose solution can improve the performance of recommender systems.
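The item cold-start procedure described in this abstract (score existing bookmarks by cosine similarity to the new bookmark, then recommend the new bookmark to the users of the most similar ones) might be sketched as follows. The function names, the bookmark and user identifiers, the toy vectors, and the top-`k` cutoff are all hypothetical; the abstract does not state how many similar items are used or how users are selected from them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def recommend_users(new_vec, bookmark_vecs, bookmark_users, k=2):
    """Return the k bookmarks most similar to new_vec and the union of their users."""
    ranked = sorted(
        bookmark_vecs,
        key=lambda name: cosine_similarity(new_vec, bookmark_vecs[name]),
        reverse=True,
    )
    top = ranked[:k]
    users = set()
    for name in top:
        users |= bookmark_users.get(name, set())
    return top, users


# Hypothetical existing bookmarks, already represented as averaged tag vectors.
bookmark_vecs = {
    "lab-notes": [0.9, 0.1, 0.0],
    "film-blog": [0.0, 0.1, 0.9],
    "bio-news": [0.7, 0.3, 0.0],
}
# Hypothetical mapping from each bookmark to the users who saved it.
bookmark_users = {
    "lab-notes": {"alice"},
    "film-blog": {"bob"},
    "bio-news": {"alice", "carol"},
}

new_bookmark = [0.8, 0.2, 0.0]  # vector of a newly added, unseen bookmark
similar, target_users = recommend_users(new_bookmark, bookmark_vecs, bookmark_users, k=2)
```

In this toy setup the two science-like bookmarks rank above the film blog, so the new bookmark would be suggested to their users and not to the film blog's only user.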

