小白机器学期笔记1 Surprise 使用

1
Surprise python 推荐系统库
1
2
安装 pip install Surprise
`

基本使用

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# -*- coding: UTF-8 -*-
from surprise import SVD
from surprise import Dataset
from surprise import evaluate, print_perf

# 自动载入movielens 数据集

data = Dataset.load_builtin("ml-100k")

# k折交叉验证
data.split(n_folds=3)
# 试一把SVD矩阵分解
algo = SVD()
# 在数据集上测试一下效果
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
#输出结果
print_perf(perf)
RMSE: 0.9454
MAE: 0.7450
------------
Fold 2
RMSE: 0.9411
MAE: 0.7429
------------
Fold 3
RMSE: 0.9431
MAE: 0.7429
------------
------------
Mean RMSE: 0.9432
Mean MAE : 0.7436
------------
------------
Fold 1 Fold 2 Fold 3 Mean
RMSE 0.9454 0.9411 0.9431 0.9432
MAE 0.7450 0.7429 0.7429 0.7436
PyDev console: starting.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38

# 手动载入数据集,指定文件所在位置
file_path = os.path.expanduser('~/.surprise_data/ml-100k/ml-100k/u.data')

reader = Reader(line_format='user item rating timestamp', sep='\t')

data = Dataset.load_from_file(file_path, reader)

data.split(n_folds=5)

algo = SVD()

pref = evaluate(algo, data, measures=['rmse', 'mae'])
#输出结果
print_perf(pref)
RMSE: 0.9335
MAE: 0.7383
------------
Fold 2
RMSE: 0.9409
MAE: 0.7396
------------
Fold 3
RMSE: 0.9331
MAE: 0.7339
------------
Fold 4
RMSE: 0.9352
MAE: 0.7361
------------
Fold 5
RMSE: 0.9416
MAE: 0.7422
------------
------------
Mean RMSE: 0.9369
Mean MAE : 0.7380
------------