We use django-tagging as the basis for our tagging mechanism and extend it with some Filmaster-specific features.
Tagging module
The module responsible for tagging is called simply tagging. All the code that affects tags is stored there.
Which objects can be tagged?
In principle any object can be tagged, but currently only films and film people are.
How are the tags localized?
Each tag belongs to a specific localization and is not visible when tagging objects in other localizations. In the current solution there is no translation between tags in different localizations (e.g. between ('drama', 'en') and ('dramat', 'pl')), and each localization of the service has its own, independent set of tags for every film. The objects actually tagged are therefore ObjectLocalized instances.
How do tag aliases work?
Tag aliases are simple mappings between texts and tag instances. When a tag is resolved from the name a user enters while tagging an object, a lookup is first made for a matching alias. If such an alias is found, the tag it points to is used and no further search takes place. Tag aliases are not localization-specific: they apply in all localizations and cannot map to a different tag in each of them.
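The lookup order described above can be sketched as follows. This is only an illustration, not the actual Filmaster code: the function name resolve_tag, the plain dictionaries standing in for the database tables, and the fallback of creating a missing tag in the current localization are all assumptions made for the example.

```python
def resolve_tag(name, lang, aliases, tags):
    """Resolve a user-entered tag name to a tag (hypothetical helper).

    aliases: dict mapping alias text -> tag; shared by all localizations
    tags:    dict mapping (name, lang) -> tag; one entry per localized tag
    Tags are represented here as (name, lang) tuples for simplicity.
    """
    # 1. An alias match wins and ends the search immediately.
    if name in aliases:
        return aliases[name]
    # 2. Otherwise look the name up in the current localization,
    #    creating the tag if it is missing (assumed fallback behaviour).
    return tags.setdefault((name, lang), (name, lang))
```

Because the aliases dictionary is not keyed by localization, the same alias resolves to the same tag regardless of the lang argument, which mirrors the limitation noted above.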
How are the similar movies computed?
Every tag is assigned a positive weight based on how frequently it occurs, so that rarer tags weigh more. The weight is calculated as: round(100 * log_2(2 + (number of tagged items in the localization + 10.0) / (number of items tagged with the given tag + 10.0)))
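The weight formula above translates directly into Python. This is a minimal sketch; the function and parameter names are made up for the example and are not taken from the Filmaster code base.

```python
import math


def tag_weight(total_tagged, tagged_with_tag):
    """Weight of a tag within one localization (hypothetical helper).

    total_tagged:    number of tagged items in the localization
    tagged_with_tag: number of items tagged with the given tag
    """
    ratio = (total_tagged + 10.0) / (tagged_with_tag + 10.0)
    return round(100 * math.log2(2 + ratio))
```

Adding 10.0 to both counts dampens the ratio for tags that occur only a few times, and the constant 2 inside the logarithm keeps the result positive even for a tag applied to every item.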
The formula was chosen experimentally so that, for the current data, the weights of the most popular tag (applied to over 1/3 of all items) and of a tag with a single occurrence differ by a factor of less than 6, and so that the weight doesn't differ drastically between tags with only a couple of occurrences. The objects considered most similar tag-wise to a given object are those for which the total sum of the weights of all tags shared with that object is the highest.
All films by the same director are considered to share a 'virtual' tag with a small weight (comparable to the weight of the 'drama' tag), which moves them up in the list of similar films. Films with the same similarity weight are sorted by popularity (number of ratings).
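Putting the rules above together, the ranking could look roughly like this. It is a sketch under several assumptions: films are plain dictionaries with a single director each, precomputed tag weights are passed in as a dictionary, films with no overlap at all are left out of the result, and the function name similar_films is hypothetical.

```python
def similar_films(target, films, weights, director_weight):
    """Return film ids ordered from most to least similar to `target`.

    films:   dict film_id -> {"tags": set, "director": str, "ratings": int}
    weights: dict tag -> weight of that tag
    director_weight: weight of the 'virtual' same-director tag
    """
    base = films[target]
    scored = []
    for film_id, film in films.items():
        if film_id == target:
            continue
        # Sum the weights of all tags shared with the target film.
        score = sum(weights[tag] for tag in base["tags"] & film["tags"])
        # Films by the same director share one extra 'virtual' tag.
        if film["director"] == base["director"]:
            score += director_weight
        if score > 0:  # assumption: drop films with nothing in common
            scored.append((film_id, score, film["ratings"]))
    # Highest similarity first; ties are broken by number of ratings.
    scored.sort(key=lambda item: (-item[1], -item[2]))
    return [film_id for film_id, _, _ in scored]
```

In this sketch a film that shares only the director with the target gets the same score as a film sharing a single tag of equal weight, and popularity then decides which of the two comes first.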