Surprise: Movie Recommender System Example
Surprise is an easy-to-use Python scikit for recommender systems.
This example uses the MovieLens 100K dataset (ml-100k), which contains 100,000 ratings on a 1-to-5 star scale from 943 users on 1,682 movies. It was collected by GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota.
import numpy
conda install -c conda-forge scikit-surprise
Evaluating the RMSE and MAE of the SVD algorithm.
from surprise import SVD, KNNBasic
from surprise import Dataset
from surprise.model_selection import cross_validate
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k', prompt = False)
# Use the famous SVD algorithm.
algo = SVD()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);
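For reference, the two measures requested above can be computed by hand. The sketch below is a plain-Python illustration, not Surprise's own implementation; the helper names `rmse` and `mae` and the toy ratings are assumptions made up for this example and are not taken from MovieLens.

```python
import math


def rmse(true_ratings, predicted_ratings):
    """Root mean squared error: large misses are penalized more heavily."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared) / len(squared))


def mae(true_ratings, predicted_ratings):
    """Mean absolute error: every miss counts in proportion to its size."""
    absolute = [abs(t - p) for t, p in zip(true_ratings, predicted_ratings)]
    return sum(absolute) / len(absolute)


# Toy values for illustration only (hypothetical, not MovieLens data).
true_ratings = [4.0, 3.0, 5.0, 1.0]
predicted_ratings = [3.0, 3.0, 4.0, 3.0]

print("RMSE:", rmse(true_ratings, predicted_ratings))
print("MAE: ", mae(true_ratings, predicted_ratings))
```

Because the errors are squared before averaging, RMSE is always at least as large as MAE on the same predictions, which is why the two columns in the cross-validation output differ.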
Evaluating the RMSE and MAE of the k-nearest neighbors algorithm.
# Use KNNBasic().
algo = KNNBasic()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);
Obs 1. SVD appears to be more accurate than KNNBasic, but it also appears to take longer to run.
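To compare accuracy and running time side by side, the per-fold values returned by cross_validate can be averaged. cross_validate returns a dictionary with keys such as 'test_rmse', 'test_mae', 'fit_time' and 'test_time', each holding one value per fold. The helper `summarize` below is a hypothetical name introduced for this sketch, not part of Surprise.

```python
from statistics import mean


def summarize(cv_results):
    """Average each per-fold sequence in a cross_validate-style result dict."""
    return {
        key: mean(float(value) for value in values)
        for key, values in cv_results.items()
    }


# Usage with Surprise (assumes `algo` and `data` from the cells above):
# results = cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=False)
# print(summarize(results))
```

Running this once per algorithm gives one row of averages each, which makes the accuracy versus time trade-off easier to read than the per-fold tables.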
Evaluating the RMSE and MAE of the SVD++ algorithm.
from surprise import SVDpp
from surprise import Dataset
from surprise.model_selection import cross_validate
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k', prompt = False)
# Use the SVD++ algorithm.
algo = SVDpp()
# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);
Obs 2. SVD++ takes considerably longer to run.
Obs 3. The best RMSE is around 0.93, on a 1-to-5 star scale. There is no single convention in the literature for normalizing RMSE, but dividing by the rating range (5 - 1 = 4) gives roughly 0.23, and dividing by the maximum rating (5) gives roughly 0.19. Loosely, this means the model undervalues or overvalues a rating by about 20% of the scale on this data.
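The normalization in Obs 3 is simple arithmetic and can be checked directly. This is a minimal sketch assuming a 1-to-5 rating scale and the approximate RMSE of 0.93 quoted above; the function name `normalized_rmse` is made up for this example.

```python
def normalized_rmse(rmse, low=1.0, high=5.0):
    """Divide RMSE by the rating range so it reads as a fraction of the scale."""
    return rmse / (high - low)


best_rmse = 0.93  # approximate value quoted in Obs 3

by_range = normalized_rmse(best_rmse)          # divide by 5 - 1 = 4
by_max = normalized_rmse(best_rmse, low=0.0)   # divide by the maximum rating, 5

print(f"Normalized by range: {by_range:.1%}")
print(f"Normalized by max:   {by_max:.1%}")
```

Both conventions land near 20%, which is why the figure in Obs 3 should be read as a rough indication rather than a precise one.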
Cheers,