Similar films

Computing similarity of films (FilmComparator object)

How to run it

To compute film comparators, run run_compute_film_comparators.sh in the main folder.

The procedure responsible for the whole operation, compute_film_comparators, is in film20/recommendations/bot_helper.py.

How does it work

At the moment the procedure is rather simple. We take four factors into consideration: common tags, common actors, common directors and the similarity of users' ratings.

The first three are rather obvious: we multiply the number of tags that films a and b share by a certain weight. The same goes for actors and directors, but the weights differ.

We sum all those influences.
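The weighted sum of the first three factors can be sketched as follows. This is only an illustration: the weight values, the function name and the dict-of-sets film representation are assumptions, not the ones used in bot_helper.py.

```python
# Hypothetical weights; the real values are defined in the project and may differ.
TAG_WEIGHT = 1.0
ACTOR_WEIGHT = 0.5
DIRECTOR_WEIGHT = 2.0


def shared_attribute_score(film_a, film_b):
    """Sum the weighted counts of tags, actors and directors shared by two films.

    Each film is assumed to be a dict with "tags", "actors" and "directors"
    keys, each holding a set of names.
    """
    common_tags = len(film_a["tags"] & film_b["tags"])
    common_actors = len(film_a["actors"] & film_b["actors"])
    common_directors = len(film_a["directors"] & film_b["directors"])
    return (TAG_WEIGHT * common_tags
            + ACTOR_WEIGHT * common_actors
            + DIRECTOR_WEIGHT * common_directors)


film_a = {"tags": {"comedy", "romance"}, "actors": {"A", "B"}, "directors": {"D"}}
film_b = {"tags": {"comedy"}, "actors": {"B", "C"}, "directors": {"D"}}
score = shared_attribute_score(film_a, film_b)  # one shared tag, actor and director
```

Using sets keeps each intersection cheap, but the number of film pairs still grows quadratically with the number of films.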

The only problem is optimization: some approaches could be too slow to be usable in practice.

The last factor is the influence of the ratings given to films a and b.

We compute the average difference between the ratings given to a and b over the set of users who rated both films.

We add this average to the final score with a weight that depends on the number of users who rated both films.
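A minimal sketch of the ratings factor is shown below. The text above does not say how the average difference is turned into a score or how the weight grows, so several things here are assumptions: the difference is taken as an absolute value, it is converted to a closeness between 0 and 1 using an assumed maximum rating gap, and the weight grows linearly with the number of common users up to a hypothetical saturation point.

```python
def average_rating_difference(ratings_a, ratings_b):
    """Return (mean absolute rating difference, number of common users).

    ratings_a and ratings_b map user id -> rating for films a and b.
    Returns (None, 0) when no user rated both films.
    """
    common_users = ratings_a.keys() & ratings_b.keys()
    if not common_users:
        return None, 0
    total = sum(abs(ratings_a[u] - ratings_b[u]) for u in common_users)
    return total / len(common_users), len(common_users)


def ratings_influence(ratings_a, ratings_b, max_rating_gap=9.0, saturation=50):
    """Weighted ratings factor (assumed form, not the project's formula).

    max_rating_gap: largest possible difference between two ratings
                    (9.0 assumes a 1-10 scale).
    saturation:     hypothetical number of common users at which the
                    weight stops growing.
    """
    avg_diff, n_common = average_rating_difference(ratings_a, ratings_b)
    if n_common == 0:
        return 0.0
    closeness = 1.0 - avg_diff / max_rating_gap  # 1.0 means identical ratings
    weight = min(n_common / saturation, 1.0)     # more common users, more trust
    return weight * closeness


ratings_a = {1: 8, 2: 6, 3: 10}
ratings_b = {2: 9, 3: 10, 4: 1}
avg_diff, n_common = average_rating_difference(ratings_a, ratings_b)
```

Only users 2 and 3 rated both films here, so the average is taken over those two users.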

Performance issues

The biggest problem is that the algorithm is still too slow. Therefore we have to introduce some limits.

The actor and rating influences are in fact included only for pairs of films that have more ratings than some defined number, and for the most popular films. The tag and director influences are still considered for all movies, but they are often insufficient. Unless the algorithm is optimized, we won't be able to improve its effectiveness.

What's next?

The described algorithm is very straightforward and is rather ineffective when the films are not tagged. These are some ideas on how to improve it:

  • Tag users - we can try to measure a user's taste with reference to tags. For example, if a user generally rates films tagged as comedy very low, we can assume they are not a comedy fan. Therefore, if they rate some movie high, it is less probable that it is a comedy. Of course a single user won't give us any real information, but if the same situation occurs for a reasonably large group of users, the probability that our guess is correct increases. The approach has been tested and has given some interesting results, but it still needs more work.
  • Tag actors and directors - similar to the above, but now we consider films starring or directed by a given person. Some comedy actors appear mainly in this type of film, so, again based on tags, we try to guess the actor's specialization with reference to some tag. Then we can increase that tag's weight in movies starring the actor. The same goes for directors.
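The "tag users" idea could start from a per-user tag affinity like the one sketched below. This is not the tested approach mentioned above, only one possible reading of it; the function name and the input shapes are assumptions.

```python
from collections import defaultdict


def user_tag_affinity(user_ratings, film_tags):
    """Return tag -> (user's mean rating for films with that tag) - (user's overall mean).

    user_ratings maps film id -> rating given by one user.
    film_tags maps film id -> set of tags (films without tags are skipped).
    A strongly negative value suggests the user dislikes films with that tag.
    """
    if not user_ratings:
        return {}
    overall_mean = sum(user_ratings.values()) / len(user_ratings)
    ratings_per_tag = defaultdict(list)
    for film_id, rating in user_ratings.items():
        for tag in film_tags.get(film_id, ()):
            ratings_per_tag[tag].append(rating)
    return {
        tag: sum(ratings) / len(ratings) - overall_mean
        for tag, ratings in ratings_per_tag.items()
    }


user_ratings = {"f1": 2, "f2": 3, "f3": 9, "f4": 10}
film_tags = {"f1": {"comedy"}, "f2": {"comedy"}, "f3": {"drama"}, "f4": {"drama"}}
affinity = user_tag_affinity(user_ratings, film_tags)
```

Aggregating these affinities over the users who rated an untagged film high or low would give a guess at its tags. The same computation applies to actors and directors if the ratings of a single user are replaced by the tags of the films a given person appears in.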
