Surprise: Movie Recommender System Example

Surprise is an easy-to-use Python scikit for recommender systems.
This example uses a MovieLens dataset with 100,000 5-star ratings and 3,600 tag applications applied to 9,000 movies by 600 users. It was collected at various times by GroupLens, a research lab in the Department of Computer Science and Engineering at the University of Minnesota.
import numpy

conda install -c conda-forge scikit-surprise

Evaluating the RMSE and MAE of the SVD algorithm.
from surprise import SVD, KNNBasic
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k', prompt=False)

# Use the famous SVD algorithm.
algo = SVD()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);

Evaluating the RMSE and MAE of the K-nearest-neighbors algorithm.
# Use KNNBasic().
algo = KNNBasic()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);

Obs 1. SVD seems to be more accurate than KNNBasic, but it also seems to take longer.
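The comparison in Obs 1 can be made concrete by averaging the per-fold values in the dictionary that cross_validate returns. What follows is a minimal sketch, assuming the result has entries such as test_rmse, fit_time and test_time, as described in the Surprise documentation. The helper name summarize is hypothetical, and the numbers in the example dict are made-up placeholders, not measured results.

```python
from statistics import mean


def summarize(results):
    """Average the per-fold values of a cross_validate-style result dict.

    `summarize` is a hypothetical helper, not part of Surprise.
    """
    return {key: mean([float(v) for v in values])
            for key, values in results.items()}


# Placeholder values shaped like cross_validate output (NOT measured results).
example = {
    "test_rmse": [0.94, 0.93, 0.95],
    "fit_time": [1.0, 1.2, 1.1],
    "test_time": [0.2, 0.2, 0.2],
}

summary = summarize(example)
print(summary)
```

With Surprise installed, passing the return value of cross_validate for each algorithm to a helper like this would put accuracy and timing side by side, so that "more accurate but slower" becomes a pair of numbers rather than an impression.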
Evaluating the RMSE and MAE of the SVD++ algorithm.

from surprise import SVDpp
from surprise import Dataset
from surprise.model_selection import cross_validate

# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin('ml-100k', prompt=False)

# Use the SVD++ algorithm.
algo = SVDpp()

# Run 5-fold cross-validation and print results.
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=5, verbose=True);

Obs 2. SVD++ takes unreasonably long to run.
Obs 3. The best error is around 0.93 RMSE, and this is on a 5-star rating scale. Although the literature does not agree on a single convention, we can normalize the RMSE by the rating range. Loosely, this means the model undervalues or overvalues a rating by roughly 20% of the scale, based on this data.
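The normalization in Obs 3 can be written out explicitly. This is a minimal sketch, assuming ratings lie on a 1 to 5 scale and using the RMSE of 0.93 quoted above. The helper name normalized_rmse is hypothetical, and dividing by the range is only one of several normalization conventions.

```python
def normalized_rmse(rmse, rating_min=1.0, rating_max=5.0):
    """Divide the RMSE by the width of the rating scale (range normalization).

    `normalized_rmse` is a hypothetical helper; the 1-5 scale is an assumption.
    """
    return rmse / (rating_max - rating_min)


nrmse = normalized_rmse(0.93)
print(f"Normalized RMSE: {nrmse:.4f}")
```

Under these assumptions the range is 4, so the normalized error comes out a little above 0.2, in line with the loose 20% figure.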
Cheers,