Recommendation

From P2P-Fusion

Recommendation
Depends on: rating, ranking, filtering
Metadata: userids, contentids, (ratings)
Scenarios: all
Communities: all
Importance: high

Recommendation tries to predict user needs and carry relevant content to them.

Description

This tool helps users discover new, relevant content in huge databases by predicting what they may like. The result is usually a personalized list of recommended items for the current context. This is useful for finding content that we cannot write a search query for (for example, because we are not familiar with the keywords), or simply for getting to know unpopular items from the long tail.

A recommendation can also come from a user, who can tell another person or a group of people about a piece of content. In this case the recommendation is like a special type of comment on the other user's or group's profile.

Automatic recommendation systems are often implemented with collaborative filtering algorithms, which define similarity measures between objects and use this knowledge to predict user behavior.
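As an illustration of the idea, the following is a minimal sketch of item-based collaborative filtering. It is not the algorithm used by P2P-Fusion or Tribler; the toy data, the function names and the choice of cosine similarity are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "alice": {"a": 5, "b": 4},
    "bob": {"a": 4, "b": 5, "c": 5},
    "carol": {"b": 2, "c": 1, "d": 5},
}


def item_vectors(ratings):
    """Turn user -> item -> rating into item -> user -> rating."""
    vectors = {}
    for user, items in ratings.items():
        for item, value in items.items():
            vectors.setdefault(item, {})[user] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=3):
    """Rank unseen items by the similarity-weighted average of the user's ratings."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate in vectors:
        if candidate in seen:
            continue
        weighted = total = 0.0
        for item, value in seen.items():
            sim = cosine(vectors[candidate], vectors[item])
            weighted += sim * value
            total += sim
        if total > 0:
            scores[candidate] = weighted / total
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend("alice", ratings)` ranks the items Alice has not rated yet, using only the ratings of users who overlap with her.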

Data

Data for defining similarities can be collected by explicit and implicit methods.

The most popular explicit methods are rating and ranking. Implicit methods are based on the natural actions of the user, such as purchasing, downloading, viewing or listening to items. In P2P-Fusion, input can be based on both kinds of method: rating (explicit) and downloading (implicit).
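One possible way to combine the two kinds of input is to map them onto a single preference score. The sketch below is only an assumption about how this could be done: the function name, the 1-5 rating scale and the weight given to a bare download are hypothetical.

```python
def preference(rating=None, downloaded=False, max_rating=5, download_weight=0.6):
    """Return a preference score in [0, 1] for one user-item pair.

    An explicit rating, when present, overrides the implicit signal.
    download_weight is an assumed confidence for a download without a rating.
    """
    if rating is not None:
        return rating / max_rating
    return download_weight if downloaded else 0.0
```

With this rule, a downloaded item that the user later rates poorly ends up with a low score, because the explicit rating is treated as the more reliable signal.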

Input

Implicit methods don't need any specific user action; the natural behavior of the user is simply monitored. For explicit data collection, the user has to provide a rating or a ranking.

Nowadays, secondary input is also used to give feedback about the recommendations. The user can mark a recommendation to be excluded from the list. This feedback can be used to make better recommendations in the future.

Recommendations created by another user can be given as a special message in any asynchronous communication tool.

Storage

For Fusion, we need to store ratings and download statistics. These can be stored in a decentralized way. Because no peer can access all of this data, we cannot compute optimal recommendations, but we can still compute good ones.
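A minimal sketch of working with partial data: a peer merges the download statistics reported by the peers it can currently reach. The report format and the function name are hypothetical, and the sketch assumes that each peer reports only its own local downloads, so that nothing is counted twice.

```python
from collections import Counter


def merge_download_counts(peer_reports):
    """Sum per-item download counts over the reports of reachable peers."""
    total = Counter()
    for report in peer_reports:
        total.update(report)
    return total


# Hypothetical reports from two reachable peers: item -> local download count.
reachable = [{"clip_a": 3, "clip_b": 1}, {"clip_a": 2, "clip_c": 4}]
merged = merge_download_counts(reachable)
```

The merged counts are only an estimate of global popularity, since unreachable peers contribute nothing.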

Output

The output is usually presented as a list ordered by relevance. This can be a list of recommended items for a specific user, or a list of items similar to a specific piece of content. Some systems also present user similarities.

Dependencies

Tools used

Tools using this tool

Management

It is part of the system; there is no need to set up or give parameters to this tool.

Interface

In the case of personalized music or video stations, the list of recommendations is not presented; instead, the content is streamed directly to the user.

One way to filter the recommendations is to let the user decide how popular the recommended content should be. This is often done by visualizing the "long tail" popularity curve and letting the user place a vertical marker on it. All content that is more popular than the marker position is then filtered out of the recommendations.

More sophisticated methods can build these lists from data collected over a shorter period. For example, as taste changes, a user may be more interested in recommendations based only on his/her actions in the last month.
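The marker on the long tail can be read as a rank in the popularity ordering. The following sketch assumes this interpretation; the function name and the use of download counts as the popularity measure are hypothetical.

```python
def long_tail_filter(candidates, download_counts, marker_rank):
    """Drop the marker_rank most popular items and keep the tail.

    candidates: recommended items, already ordered by relevance.
    download_counts: item -> number of downloads (assumed popularity measure).
    """
    by_popularity = sorted(download_counts, key=download_counts.get, reverse=True)
    head = set(by_popularity[:marker_rank])
    return [item for item in candidates if item not in head]
```

The relevance order of the remaining candidates is preserved, so the filter can be applied after any recommendation algorithm.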
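Restricting the input to a recent period can be sketched as a simple pre-filter on the collected events. The event layout `(user, item, timestamp)` and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta


def recent_events(events, now, days=30):
    """Keep only the events that happened within the last `days` days.

    events: iterable of (user, item, timestamp) tuples (hypothetical layout).
    """
    cutoff = now - timedelta(days=days)
    return [event for event in events if event[2] >= cutoff]
```

The filtered events can then be passed to the recommendation algorithm in place of the full history.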

Prevalence

This tool is widely used in areas where there is a huge amount of content. Recommendations help users to navigate and discover new content. They always offer a next step, so the user never feels that there is no more relevant content.

Both companies and communities use it, especially those dealing with audiovisual content.

"Netflix members select approximately 60 percent of their movies based on movie recommendations tailored to their individual tastes." ([1])

Technical aspects

Most of the problems are social (see the next section). The technical challenge is to make ever better recommendations, that is, to create an algorithmic predictor of user behavior.

The most popular algorithm family is collaborative filtering. There are several variants, and there are some academic papers about using it in a P2P context. Tribler's current algorithm is one of the first implementations in this area.

Other algorithms use content-based methods. They analyze the content itself, and in this way they are able to find similarities between content items.

Hybrid systems are a mixture of the above two, and have attracted growing interest in the past few years.
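A very simple content-based method compares the descriptive features of two items, for example their keywords. The sketch below uses Jaccard similarity over keyword sets; the feature representation and the function names are assumptions, and real systems typically analyze the content much more deeply.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(item, features, top_n=3):
    """Return the items whose features overlap most with those of `item`.

    features: item -> set of keywords (hypothetical representation).
    """
    scored = [
        (jaccard(features[item], other_features), name)
        for name, other_features in features.items()
        if name != item
    ]
    # Highest similarity first; ties are broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]
```

No user data is needed here, which is why this approach has no social component.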
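One common way to build a hybrid is a weighted blend of a collaborative score and a content-based score. The sketch below assumes both scores are already normalized to [0, 1]; the function names and the weight `alpha` are hypothetical.

```python
def hybrid_score(collab_score, content_score, alpha=0.7):
    """Blend two scores in [0, 1]; alpha is the weight of the collaborative part.

    collab_score may be None when no usage data exists for the item yet,
    in which case only the content-based score is used.
    """
    if collab_score is None:
        return content_score
    return alpha * collab_score + (1 - alpha) * content_score


def rank(candidates, alpha=0.7):
    """Order items by hybrid score.

    candidates: item -> (collab_score or None, content_score).
    """
    scored = {
        item: hybrid_score(collab, content, alpha)
        for item, (collab, content) in candidates.items()
    }
    return sorted(scored, key=scored.get, reverse=True)
```

Falling back to the content-based score lets new items appear in the ranking before anyone has rated or downloaded them.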

Social aspects

As content-based recommendations don't have any social component, these aspects only apply to collaborative and hybrid methods.

Community

Norms and rules

Incentives and sanctions

The main incentive of using the system is that it becomes smarter and makes better recommendations as the user trains it with relevant input.

Amazon's approach is to explain why each particular item is recommended, so the user can relate to it. According to Amazon, users buy these items more often than before.

Techniques

Problems

A recommendation engine usually has a cold-start problem: the recommendations are of very low quality while there is only a small amount of data. Implicit data collection can help, because enough data can be collected before the system actually starts making recommendations. Explicit methods, on the other hand, can fail completely, because users experience low quality and leave the service before the critical amount of data is collected.

Another problem is connected with toplists. If, in the beginning, a service had only a (quite homogeneous) group of people devoted to a specific type of content, then that content can stay the most popular content indefinitely. This is because the common starting point on a newly discovered site is the toplist, so the site attracts people devoted to the same content. Recommendations are likely to include this popular content, which makes the imbalance worse.

A further problem comes from the collaborative aspect. If a single user manipulates the collected data, it doesn't make a big difference. But if a group of people does the same, the effect can be significant. This problem is shared by many collaborative tools, such as tagging, annotation and rating. A possible solution is the use of reputation and trust systems.
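To illustrate how trust can dampen group manipulation, the sketch below weights each rating by the trust placed in its author. The trust values, the default weight for unknown users and the function name are all assumptions; how trust is actually established is outside the scope of this example.

```python
def trusted_mean(ratings, trust, default_trust=0.1):
    """Trust-weighted average rating of one item.

    ratings: user -> rating; trust: user -> weight in [0, 1].
    Users without a trust entry get the (assumed) low default weight.
    """
    total_weight = sum(trust.get(user, default_trust) for user in ratings)
    if total_weight == 0:
        return None
    weighted = sum(value * trust.get(user, default_trust) for user, value in ratings.items())
    return weighted / total_weight
```

In this scheme, a group of unknown accounts that all give the same extreme rating shifts the result far less than it would shift a plain average.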

Existing examples

Amazon - Online store with many products (books, CDs, DVDs etc.) that uses recommendations in many different ways. Lists such as "Customers who bought this item also bought" and "What do customers ultimately buy after viewing this item?" show relevant items, and the store also offers the usual collaborative recommendations.

Netflix - Online DVD rental service, where DVDs are recommended according to previous ratings. Recently they offered 1 million dollars for improving their algorithm by at least 10% (see Netflix Prize).

Last.fm - Social music recommendation system based on the listening habits of its users. It uses recommendations between all objects (artist, track, user etc.) in the system. Every user can recommend artists and songs to any user or group.

Pandora - Similar to the Last.fm service, but it works in a more content-driven manner, based on the Music Genome Project. Users can define a starting point for the radio (an artist or some songs), and the system will generate a playlist of similar music. Users can also give feedback and get better recommendations.

MovieLens - One of the first movie recommender systems, which is an ongoing research project of the University of Minnesota.

P2P file sharing examples

As far as we know, Tribler has the only implementation in this area.

Application in Fusion

  • There should be an easy way to add personal recommendations for a user or a group in the profile. It should be as easy as a comment box.
  • Users can get personal recommendations based on their downloads and ratings. There should be two sliders to filter the recommendations. One sets a threshold for the maximum popularity of an item that may be included, and the other specifies the time range of the collected data, so the recommendations are based only on the ratings and downloads from that period.
  • Item-to-item similarities can be defined, so one can easily find similar items. The similar items should be recomputed every time someone downloads or rates that content, and they should be hard-linked to the content.
  • We can recommend users with similar downloads and ratings. In the list of similar users, we should mark the most important similarities between them, so it's easier to relate to the other user. There should be an option to download the other user's activity profile and get detailed information about similarities and differences between them. It is a good starting point for a conversation.
  • Each peer should scan the network for open groups with similar taste, so we can recommend that the user join a group, and we can give detailed information about the group and the similarities.
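The user-similarity item in the list above can be sketched as follows. The example compares download sets and returns the shared items alongside each score, so that the similarities can be shown to the user. The data layout and the function name are hypothetical, and ratings are ignored for brevity.

```python
def similar_users(user, downloads, top_n=3):
    """Find users with overlapping downloads and report what they share.

    downloads: user -> set of downloaded items (hypothetical layout).
    Returns a list of (other_user, similarity, shared_items) tuples.
    """
    mine = downloads[user]
    results = []
    for other, theirs in downloads.items():
        if other == user:
            continue
        shared = mine & theirs
        if not shared:
            continue
        # Jaccard similarity of the two download sets.
        score = len(shared) / len(mine | theirs)
        results.append((score, other, sorted(shared)))
    # Most similar first; ties are broken by user name for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [(other, score, shared) for score, other, shared in results[:top_n]]
```

The list of shared items is what makes the recommendation easy to relate to, and it could serve as the starting point for the conversation mentioned above.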

For audio/video files

In P2P systems

Scenarios

Communities

Related tools

  • search, tagging - alternative ways for navigating through content
  • filtering - automatic recommendation is a special case of it
  • see Dependencies

External links
