Full Pipeline of the Two-level Recommender System#

In this chapter, we will wrap up all steps from 1.2 to 1.5:

  • Preprocess the data with a proper two-level validation split;

  • Develop a candidate generation model with the LightFM library;

  • Then, move to CatBoost and train our reranker, the second-level model;

  • Finally, evaluate the models: LightFM alone vs LightFM + reranker.

First, let’s recall what we discussed in Metrics & Validation: in recommender systems we use a special two-level data split to validate our models – a time-based split for the candidate generator and a user-based split for the reranker. Now, we move on to coding.
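A condensed sketch of the split scheme we will implement step by step below (the variable and column names mirror the real code that follows; this preview itself is not a cell from the original notebook):

# 1) time-based split for the candidate model: hold out the last 14 days as a global test
test_max_date = interactions['last_watch_dt'].max() - dt.timedelta(days=14)
global_train = interactions[interactions['last_watch_dt'] < test_max_date]
global_test = interactions[interactions['last_watch_dt'] >= test_max_date]

# 2) inside global_train, a second time threshold separates candidate-model data (local_train)
#    from the interactions used to build reranker targets (local_test)
local_thresh = global_train['last_watch_dt'].quantile(0.7, interpolation='nearest')
local_train = global_train[global_train['last_watch_dt'] < local_thresh]
local_test = global_train[global_train['last_watch_dt'] >= local_thresh]

# 3) the reranker itself is validated with a random split by users of local_test
reranker_train_users, reranker_test_users = train_test_split(local_test['user_id'].unique(), test_size=0.2)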

0. Configuration#

# KION DATA
INTERACTIONS_PATH = 'https://drive.google.com/file/d/1MomVjEwY2tPJ845zuHeTPt1l53GX2UKd/view?usp=share_link'
ITEMS_METADATA_PATH = 'https://drive.google.com/file/d/1XGLUhHpwr0NxU7T4vYNRyaqwSK5HU3N4/view?usp=share_link'
USERS_DATA_PATH = 'https://drive.google.com/file/d/1MCTl6hlhFYer1BTwjzIBfdBZdDS_mK8e/view?usp=share_link'

1. Modules and functions#

# just to make it available to download w/o SSL verification
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

import shap
import numpy as np
import pandas as pd
import datetime as dt

from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split

from lightfm.data import Dataset
from lightfm import LightFM

from catboost import CatBoostClassifier

from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

pd.set_option('display.float_format', lambda x: '%.3f' % x)

1.1. Helper functions to avoid copy-paste#

def read_parquet_from_gdrive(url, engine: str = 'pyarrow'):
    """
    gets csv data from a given url (taken from file -> share -> copy link)
    :url: example https://drive.google.com/file/d/1BlZfCLLs5A13tbNSJZ1GPkHLWQOnPlE4/view?usp=share_link
    """
    file_id = url.split('/')[-2]
    file_path = 'https://drive.google.com/uc?export=download&id=' + file_id
    data = pd.read_parquet(file_path, engine = engine)

    return data

2. Main#

2.1. Load and preprocess data#

The interactions dataset lists the movies each user watched, along with total_dur (total watch time in seconds) and watched_pct (the proportion of the title watched)

# interactions data
interactions = read_parquet_from_gdrive(INTERACTIONS_PATH)
interactions.head()
user_id item_id last_watch_dt total_dur watched_pct
0 176549 9506 2021-05-11 4250 72.000
1 699317 1659 2021-05-29 8317 100.000
2 656683 7107 2021-05-09 10 0.000
3 864613 7638 2021-07-05 14483 100.000
4 964868 9506 2021-04-30 6725 100.000

The movies_metadata dataset describes the titles available in the platform catalogue

# information about films etc
movies_metadata = read_parquet_from_gdrive(ITEMS_METADATA_PATH)
movies_metadata.head(3)
item_id content_type title title_orig release_year genres countries for_kids age_rating studios directors actors description keywords
0 10711 film Поговори с ней Hable con ella 2002.000 драмы, зарубежные, детективы, мелодрамы Испания NaN 16.000 None Педро Альмодовар Адольфо Фернандес, Ана Фернандес, Дарио Гранди... Мелодрама легендарного Педро Альмодовара «Пого... Поговори, ней, 2002, Испания, друзья, любовь, ...
1 2508 film Голые перцы Search Party 2014.000 зарубежные, приключения, комедии США NaN 16.000 None Скот Армстронг Адам Палли, Брайан Хаски, Дж.Б. Смув, Джейсон ... Уморительная современная комедия на популярную... Голые, перцы, 2014, США, друзья, свадьбы, прео...
2 10716 film Тактическая сила Tactical Force 2011.000 криминал, зарубежные, триллеры, боевики, комедии Канада NaN 16.000 None Адам П. Калтраро Адриан Холмс, Даррен Шалави, Джерри Вассерман,... Профессиональный рестлер Стив Остин («Все или ... Тактическая, сила, 2011, Канада, бандиты, ганг...

The users_data dataset contains basic user info: gender, age group, income group and a kids flag

users_data = read_parquet_from_gdrive(USERS_DATA_PATH)
users_data.head()
user_id age income sex kids_flg
0 973171 age_25_34 income_60_90 М 1
1 962099 age_18_24 income_20_40 М 0
2 1047345 age_45_54 income_40_60 Ж 0
3 721985 age_45_54 income_20_40 Ж 0
4 704055 age_35_44 income_60_90 Ж 0

Now, a bit of preprocessing to avoid noisy data.

# remove redundant data points
interactions_filtered = interactions.loc[interactions['total_dur'] > 300].reset_index(drop = True)
print(interactions.shape, interactions_filtered.shape)
(5476251, 5) (4195689, 5)
# convert to datetime
interactions_filtered['last_watch_dt'] = pd.to_datetime(interactions_filtered['last_watch_dt'])

2.1.1. Train / Test split#

As we discussed in the Validation and Metrics chapter, we need a time-based split for candidate generation to avoid look-ahead bias. Therefore, let’s set the date thresholds

# set dates params for filter
MAX_DATE = interactions_filtered['last_watch_dt'].max()
MIN_DATE = interactions_filtered['last_watch_dt'].min()
TEST_INTERVAL_DAYS = 14
TEST_MAX_DATE = MAX_DATE - dt.timedelta(days = TEST_INTERVAL_DAYS)

print(f"min date in filtered interactions: {MAX_DATE}")
print(f"max date in filtered interactions:: {MIN_DATE}")
print(f"test max date to split:: {TEST_MAX_DATE}")
min date in filtered interactions: 2021-08-22 00:00:00
max date in filtered interactions:: 2021-03-13 00:00:00
test max date to split:: 2021-08-08 00:00:00
# define global train and test
global_train = interactions_filtered.loc[interactions_filtered['last_watch_dt'] < TEST_MAX_DATE]
global_test = interactions_filtered.loc[interactions_filtered['last_watch_dt'] >= TEST_MAX_DATE]

global_train = global_train.dropna().reset_index(drop = True)
print(global_train.shape, global_test.shape)
(3530223, 5) (665015, 5)

Here, we define “local” train and test sets, reserving part of the global train for the ranker

local_train_thresh = global_train['last_watch_dt'].quantile(q = .7, interpolation = 'nearest')

print(local_train_thresh)
2021-07-11 00:00:00
local_train = global_train.loc[global_train['last_watch_dt'] < local_train_thresh]
local_test = global_train.loc[global_train['last_watch_dt'] >= local_train_thresh]

print(local_train.shape, local_test.shape)
(2451040, 5) (1079183, 5)

As a final filter, we focus on the warm-start scenario and remove cold-start users from the local test set

local_test = local_test.loc[local_test['user_id'].isin(local_train['user_id'].unique())]
print(local_test.shape)
(579382, 5)

2.1.2. LightFM Dataset setup#

LightFM provides a built-in Dataset class that builds the interaction matrices and id mappings used to fit the model.

# init class
dataset = Dataset()

# fit tuple of user and movie interactions
dataset.fit(local_train['user_id'].unique(), local_train['item_id'].unique())

Next, we will need id mappers as usual; with LightFM this is straightforward – they can be extracted from the initialised dataset object

# now, we define lightfm mapper to use it later for checks
lightfm_mapping = dataset.mapping()
lightfm_mapping = {
    'users_mapping': lightfm_mapping[0],
    'user_features_mapping': lightfm_mapping[1],
    'items_mapping': lightfm_mapping[2],
    'item_features_mapping': lightfm_mapping[3],
}
print('user mapper length - ', len(lightfm_mapping['users_mapping']))
print('user features mapper length - ', len(lightfm_mapping['user_features_mapping']))
print('movies mapper length - ', len(lightfm_mapping['items_mapping']))
print('movie features mapper length - ', len(lightfm_mapping['item_features_mapping']))
user mapper length -  539173
user features mapper length -  539173
movies mapper length -  13006
movie features mapper length -  13006
# inverted mappers to check recommendations
lightfm_mapping['users_inv_mapping'] = {v: k for k, v in lightfm_mapping['users_mapping'].items()}
lightfm_mapping['items_inv_mapping'] = {v: k for k, v in lightfm_mapping['items_mapping'].items()}
# create mapper between item_id and title names
item_name_mapper = dict(zip(movies_metadata['item_id'], movies_metadata['title']))
# special iterator to use with lightfm
def df_to_tuple_iterator(df: pd.DataFrame):
    '''
    :df: pd.DataFrame, interactions dataframe
    returns an iterator of row tuples
    '''
    return zip(*df.values.T)

Finally, build the interaction matrices using user_id & item_id pairs

# build interaction and weight matrices from local_train (user_id, item_id) pairs
train_mat, train_mat_weights = dataset.build_interactions(df_to_tuple_iterator(local_train[['user_id', 'item_id']]))
train_mat
<539173x13006 sparse matrix of type '<class 'numpy.int32'>'
	with 2451040 stored elements in COOrdinate format>
train_mat_weights
<539173x13006 sparse matrix of type '<class 'numpy.float32'>'
	with 2451040 stored elements in COOrdinate format>
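
As a quick extra sanity check (not part of the original notebook), we can look at how sparse the resulting interaction matrix is:

# density of the user-item interaction matrix; expect a very sparse matrix (well below 0.1%)
n_users, n_items = train_mat.shape
print(f'{n_users} users x {n_items} items, density = {train_mat.nnz / (n_users * n_items):.4%}')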

2.2. Fit the model#

Set some default parameters for the model

# set params
NO_COMPONENTS = 64
LEARNING_RATE = .03
LOSS = 'warp'
MAX_SAMPLED = 5
RANDOM_STATE = 42
EPOCHS = 20
# init model
lfm_model = LightFM(
    no_components = NO_COMPONENTS,
    learning_rate = LEARNING_RATE,
    loss = LOSS,
    max_sampled = MAX_SAMPLED,
    random_state = RANDOM_STATE
    )

Run training pipeline

# execute training
for _ in tqdm(range(EPOCHS), total = EPOCHS):
    lfm_model.fit_partial(
        train_mat,
        num_threads = 4
    )
100%|██████████| 20/20 [00:22<00:00,  1.10s/it]

Let’s run a sanity check on the trained model

top_N = 10
user_id = local_train['user_id'][100]
row_id = lightfm_mapping['users_mapping'][user_id]
print(f'Rekko for user {user_id}, row number in matrix - {row_id}')
Rekko for user 713676, row number in matrix - 62
# item indices
all_cols = list(lightfm_mapping['items_mapping'].values())
len(all_cols)

# predictions
pred = lfm_model.predict(
    row_id,
    all_cols,
    num_threads = 4)
pred, pred.shape

# sort and final postprocessing
top_cols = np.argpartition(pred, -np.arange(top_N))[-top_N:][::-1]
top_cols
array([  5,  87, 298, 506, 675, 294, 435, 132, 933, 868])
# pandas dataframe for convenience
recs = pd.DataFrame({'col_id': top_cols})
recs['item_id'] = recs['col_id'].map(lightfm_mapping['items_inv_mapping'].get)
recs['title'] = recs['item_id'].map(item_name_mapper)
recs
col_id item_id title
0 5 7571 100% волк
1 87 16166 Зверополис
2 298 13915 Вперёд
3 506 10761 Моана
4 675 13159 Рататуй
5 294 9164 ВАЛЛ-И
6 435 13018 Король лев (2019)
7 132 11985 История игрушек 4
8 933 13243 Головоломка
9 868 7889 Миа и белый лев

Finally, we need to make predictions for all local_test users – this sample will be used to train the reranker. As mentioned earlier, for the reranker we split randomly by users.

# make predictions for all users in test
local_test_preds = pd.DataFrame({
    'user_id': local_test['user_id'].unique()
})
len(local_test_preds)
144739
def generate_lightfm_recs_mapper(
        model: object,
        item_ids: list,
        known_items: dict,
        user_features: list,
        item_features: list,
        N: int,
        user_mapping: dict,
        item_inv_mapping: dict,
        num_threads: int = 4
        ):
    """
    returns a function that maps a raw user_id to its top-N recommended item_ids,
    excluding items already listed in known_items for that user
    """
    def _recs_mapper(user):
        user_id = user_mapping[user]
        recs = model.predict(
            user_id,
            item_ids,
            user_features = user_features,
            item_features = item_features,
            num_threads = num_threads)
        
        additional_N = len(known_items[user_id]) if user_id in known_items else 0
        total_N = N + additional_N
        top_cols = np.argpartition(recs, -np.arange(total_N))[-total_N:][::-1]
        
        final_recs = [item_inv_mapping[item] for item in top_cols]
        if additional_N > 0:
            filter_items = known_items[user_id]
            final_recs = [item for item in final_recs if item not in filter_items]
        return final_recs[:N]
    return _recs_mapper
# init mapper to get predictions
mapper = generate_lightfm_recs_mapper(
    lfm_model, 
    item_ids = all_cols, 
    known_items = dict(),
    N = top_N,
    user_features = None, 
    item_features = None, 
    user_mapping = lightfm_mapping['users_mapping'],
    item_inv_mapping = lightfm_mapping['items_inv_mapping'],
    num_threads = 20
)
# get predictions
local_test_preds['item_id'] = local_test_preds['user_id'].map(mapper)

Prettify predictions for CatBoost – explode the recommendation lists into rows and add a rank column

local_test_preds = local_test_preds.explode('item_id')
local_test_preds['rank'] = local_test_preds.groupby('user_id').cumcount() + 1 
local_test_preds['item_name'] = local_test_preds['item_id'].map(item_name_mapper)
print(f'Data shape{local_test_preds.shape}')
local_test_preds.head()
Data shape(1447390, 4)
user_id item_id rank item_name
0 646903 16361 1 Doom: Аннигиляция
0 646903 10440 2 Хрустальный
0 646903 1554 3 Последний богатырь: Корень зла
0 646903 11237 4 День города
0 646903 13865 5 Девятаев
# sense check for diversity of recommendations
local_test_preds.item_id.nunique()
1763
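
To put that number into context, one extra check we could add (not in the original notebook) is catalogue coverage – the share of all candidate items that appear at least once in the top-10 lists:

# share of the catalogue covered by the top-10 recommendations (1763 of 13006 items here, ~14%)
coverage = local_test_preds['item_id'].nunique() / len(all_cols)
print(f'catalogue coverage of top-{top_N} recs: {coverage:.1%}')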

2.3. CatBoostClassifier (ReRanker)#

2.3.1. Data preparation#

We need to create a 0/1 indicator of interaction:

  • positive event – 1, if watched_pct is not null;

  • negative event – 0 otherwise

positive_preds = pd.merge(local_test_preds, local_test, how = 'inner', on = ['user_id', 'item_id'])
positive_preds['target'] = 1
positive_preds.shape
(77276, 8)
negative_preds = pd.merge(local_test_preds, local_test, how = 'left', on = ['user_id', 'item_id'])
negative_preds = negative_preds.loc[negative_preds['watched_pct'].isnull()].sample(frac = .2)
negative_preds['target'] = 0
negative_preds.shape
(274023, 8)

Randomly split by users to train the reranker

train_users, test_users = train_test_split(
    local_test['user_id'].unique(),
    test_size = .2,
    random_state = 13
    )

Set up train/test set and shuffle samples

cbm_train_set = shuffle(
    pd.concat(
    [positive_preds.loc[positive_preds['user_id'].isin(train_users)],
    negative_preds.loc[negative_preds['user_id'].isin(train_users)]]
    )
)
cbm_test_set = shuffle(
    pd.concat(
    [positive_preds.loc[positive_preds['user_id'].isin(test_users)],
    negative_preds.loc[negative_preds['user_id'].isin(test_users)]]
    )
)
print(f'TRAIN: {cbm_train_set.describe()} \n, TEST: {cbm_test_set.describe()}')
TRAIN:           user_id       rank   total_dur  watched_pct     target
count  280638.000 280638.000   61573.000    61573.000 280638.000
mean   550643.994      5.295   18589.373       65.233      0.219
std    316631.666      2.888   37194.814       36.878      0.414
min        11.000      1.000     301.000        0.000      0.000
25%    276530.750      3.000    3944.000       25.000      0.000
50%    551313.500      5.000    7791.000       80.000      0.000
75%    825260.000      8.000   22652.000      100.000      0.000
max   1097528.000     10.000 3086101.000      100.000      1.000 
, TEST:           user_id      rank  total_dur  watched_pct    target
count   70661.000 70661.000  15703.000    15703.000 70661.000
mean   548997.041     5.298  18806.697       65.001     0.222
std    317709.942     2.886  35039.585       37.018     0.416
min       106.000     1.000    301.000        0.000     0.000
25%    271675.000     3.000   3824.000       25.000     0.000
50%    548768.000     5.000   7750.000       80.000     0.000
75%    825085.000     8.000  22587.500      100.000     0.000
max   1097486.000    10.000 782421.000      100.000     1.000
# in this tutorial, I will not do any feature aggregation - use default ones from data
USER_FEATURES = ['age', 'income', 'sex', 'kids_flg']
ITEM_FEATURES = ['content_type', 'release_year', 'for_kids', 'age_rating']

Prepare the final datasets – join user and item features

cbm_train_set = pd.merge(cbm_train_set, users_data[['user_id'] + USER_FEATURES],
                         how = 'left', on = ['user_id'])
cbm_test_set = pd.merge(cbm_test_set, users_data[['user_id'] + USER_FEATURES],
                        how = 'left', on = ['user_id'])
# joins item features
cbm_train_set = pd.merge(cbm_train_set, movies_metadata[['item_id'] + ITEM_FEATURES],
                         how = 'left', on = ['item_id'])
cbm_test_set = pd.merge(cbm_test_set, movies_metadata[['item_id'] + ITEM_FEATURES],
                        how = 'left', on = ['item_id'])

print(cbm_train_set.shape, cbm_test_set.shape)
(280638, 16) (70661, 16)
cbm_train_set.head()
user_id item_id rank item_name last_watch_dt total_dur watched_pct target age income sex kids_flg content_type release_year for_kids age_rating
0 368152 10440 3 Хрустальный 2021-07-27 2177.000 10.000 1 age_25_34 income_20_40 М 1.000 series 2021.000 NaN 18.000
1 204531 9728 6 Гнев человеческий NaT NaN NaN 0 age_35_44 income_90_150 М 0.000 film 2021.000 NaN 18.000
2 1036370 13865 1 Девятаев NaT NaN NaN 0 age_35_44 income_20_40 Ж 1.000 film 2021.000 NaN 12.000
3 55723 9728 1 Гнев человеческий 2021-07-16 6531.000 96.000 1 age_18_24 income_40_60 М 1.000 film 2021.000 NaN 18.000
4 85953 3182 5 Ральф против Интернета NaT NaN NaN 0 age_25_34 income_40_60 Ж 1.000 film 2018.000 NaN 6.000

Define the column sets used to prepare the training sample

ID_COLS = ['user_id', 'item_id']
TARGET = ['target']
CATEGORICAL_COLS = ['age', 'income', 'sex', 'content_type']
DROP_COLS = ['item_name', 'last_watch_dt', 'watched_pct', 'total_dur']
X_train, y_train = cbm_train_set.drop(ID_COLS + DROP_COLS + TARGET, axis = 1), cbm_train_set[TARGET]
X_test, y_test = cbm_test_set.drop(ID_COLS + DROP_COLS + TARGET, axis = 1), cbm_test_set[TARGET]
print(X_train.shape, X_test.shape)
(280638, 9) (70661, 9)

Fill missing values with the column mode as a simple default

X_train = X_train.fillna(X_train.mode().iloc[0])
X_test = X_test.fillna(X_test.mode().iloc[0])

2.3.2. Train the model#

cbm_classifier = CatBoostClassifier(
    loss_function = 'CrossEntropy',
    iterations = 5000,
    learning_rate = .1,
    depth = 6,
    random_state = 1234,
    verbose = True
)
cbm_classifier.fit(
    X_train, y_train,
    eval_set=(X_test, y_test),
    early_stopping_rounds = 100, # to avoid overfitting,
    cat_features = CATEGORICAL_COLS,
    verbose = False
)
<catboost.core.CatBoostClassifier at 0x7f4dcfc2d0d0>
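
Since early stopping is enabled, it can be useful to check where training actually stopped and what the best score on the eval set was; the snippet below is an optional addition using standard CatBoost accessors:

# inspect where early stopping landed and the best eval metric value
print('best iteration:', cbm_classifier.get_best_iteration())
print('best score:', cbm_classifier.get_best_score())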

2.3.3. Model Evaluation#

Let’s make a basic Shapley plot to investigate feature importance. We expect rank – the predicted order from LightFM – to be at the top

explainer = shap.TreeExplainer(cbm_classifier)
shap_values = explainer.shap_values(X_train)
shap.summary_plot(shap_values, X_train, show = False, color_bar = False)
[SHAP summary plot of feature importances for the reranker]

Let’s check the classifier’s performance on the test set

# predictions on test
from sklearn.metrics import roc_auc_score
y_test_pred = cbm_classifier.predict_proba(X_test)

print(f"ROC AUC score = {roc_auc_score(y_test, y_test_pred[:, 1]):.2f}")
ROC AUC score = 0.68
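
As a sanity check that the reranker adds signal on top of the first-level model, we can compare this with the ROC AUC of the raw LightFM rank alone (an extra check, not part of the original notebook; the minus sign is there because a lower rank means a better candidate):

# baseline: how well does the LightFM rank by itself separate positives from negatives?
rank_auc = roc_auc_score(y_test['target'], -X_test['rank'])
print(f"ROC AUC of raw LightFM rank = {rank_auc:.2f}")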

2.4. Evaluation on global test#

Here, we compare predictions of two models - LightFM vs LightFM + CatBoost. First, let’s calculate predictions from both models - here we generate candidates via LightFM.

global_test_predictions = pd.DataFrame({
    'user_id': global_test['user_id'].unique()
        }
    )

# filter out cold start users
global_test_predictions = global_test_predictions.loc[global_test_predictions['user_id'].isin(local_train.user_id.unique())]
# set param for number of candidates
top_k = 100

# generate list of watched titles to filter
watched_movies = local_train.groupby('user_id')['item_id'].apply(list).to_dict()

mapper = generate_lightfm_recs_mapper(
    lfm_model, 
    item_ids = all_cols, 
    known_items = watched_movies,
    N = top_k,
    user_features = None, 
    item_features = None, 
    user_mapping = lightfm_mapping['users_mapping'],
    item_inv_mapping = lightfm_mapping['items_inv_mapping'],
    num_threads = 10
)

global_test_predictions['item_id'] = global_test_predictions['user_id'].map(mapper)
global_test_predictions = global_test_predictions.explode('item_id').reset_index(drop=True)
global_test_predictions['rank'] = global_test_predictions.groupby('user_id').cumcount() + 1 

Now, we can use the reranker to score the candidates and produce a new ordering. Beforehand, we need to prepare the data for the reranker

cbm_global_test = pd.merge(global_test_predictions, users_data[['user_id'] + USER_FEATURES],
                         how = 'left', on = ['user_id'])

cbm_global_test = pd.merge(cbm_global_test, movies_metadata[['item_id'] + ITEM_FEATURES],
                         how = 'left', on = ['item_id'])
cbm_global_test.head()
user_id item_id rank age income sex kids_flg content_type release_year for_kids age_rating
0 203219 15297 1 NaN NaN NaN NaN series 2021.000 NaN 18.000
1 203219 10440 2 NaN NaN NaN NaN series 2021.000 NaN 18.000
2 203219 4151 3 NaN NaN NaN NaN series 2021.000 NaN 18.000
3 203219 4880 4 NaN NaN NaN NaN series 2021.000 NaN 18.000
4 203219 2657 5 NaN NaN NaN NaN series 2021.000 NaN 16.000

Fill missing values with the most frequent values

cbm_global_test = cbm_global_test.fillna(cbm_global_test.mode().iloc[0])

Predict scores to get ranks

cbm_global_test['cbm_preds'] = cbm_classifier.predict_proba(cbm_global_test[X_train.columns])[:, 1]
cbm_global_test.head()
user_id item_id rank age income sex kids_flg content_type release_year for_kids age_rating cbm_preds
0 203219 15297 1 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 18.000 0.364
1 203219 10440 2 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 18.000 0.336
2 203219 4151 3 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 18.000 0.287
3 203219 4880 4 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 18.000 0.269
4 203219 2657 5 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 16.000 0.148
# define cbm rank
cbm_global_test = cbm_global_test.sort_values(by = ['user_id', 'cbm_preds'], ascending = [True, False])
cbm_global_test['cbm_rank'] = cbm_global_test.groupby('user_id').cumcount() + 1
cbm_global_test.head()
user_id item_id rank age income sex kids_flg content_type release_year for_kids age_rating cbm_preds cbm_rank
5673204 14 9728 5 age_35_44 income_20_40 М 0.000 film 2021.000 0.000 18.000 0.368 1
5673200 14 10440 1 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 18.000 0.364 2
5673201 14 15297 2 age_35_44 income_20_40 М 0.000 series 2021.000 0.000 18.000 0.336 3
5673202 14 13865 3 age_35_44 income_20_40 М 0.000 film 2021.000 0.000 12.000 0.327 4
5673205 14 3734 6 age_35_44 income_20_40 М 0.000 film 2021.000 0.000 16.000 0.279 5

Finally, let’s move on to the comparison:

  • define a function to calculate ranking metrics (Precision@k, Recall@k, MAP@k);

  • create a table of metrics for both models

def calc_metrics(df_true, df_pred, k: int = 10, target_col = 'rank'):
    """
    calculates confusion matrix based metrics
    :df_true: pd.DataFrame
    :df_pred: pd.DataFrame
    :k: int, 
    """
    # prepare dataset
    df = df_true.set_index(['user_id', 'item_id']).join(df_pred.set_index(['user_id', 'item_id']))
    df = df.sort_values(by = ['user_id', target_col])
    df['users_watch_count'] = df.groupby(level = 'user_id')[target_col].transform(np.size)
    df['cumulative_rank'] = df.groupby(level = 'user_id').cumcount() + 1
    df['cumulative_rank'] = df['cumulative_rank'] / df[target_col]
    
    # params to calculate metrics
    output = {}
    num_of_users = df.index.get_level_values('user_id').nunique()

    # calc metrics
    df[f'hit@{k}'] = df[target_col] <= k
    output[f'Precision@{k}'] = (df[f'hit@{k}'] / k).sum() / num_of_users
    output[f'Recall@{k}'] = (df[f'hit@{k}'] / df['users_watch_count']).sum() / num_of_users
    output[f'MAP@{k}'] = (df["cumulative_rank"] / df["users_watch_count"]).sum() / num_of_users
    print(f'Calculated metrics for top {k}')
    return output
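# toy example (an illustrative addition, not in the original notebook) to make the metric definitions concrete:
# one user watched items 10, 20 and 30; items 10 and 30 were recommended at ranks 1 and 3
toy_true = pd.DataFrame({'user_id': [1, 1, 1], 'item_id': [10, 20, 30]})
toy_pred = pd.DataFrame({'user_id': [1, 1], 'item_id': [10, 30], 'rank': [1, 3]})
calc_metrics(toy_true, toy_pred, k=10)
# -> Precision@10 = 0.2, Recall@10 ≈ 0.67, MAP@10 ≈ 0.56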
# first-level only - LightFM
lfm_metrics = calc_metrics(global_test, global_test_predictions)
lfm_metrics
Calculated metrics for top 10
{'Precision@10': 0.009759609148687794,
 'Recall@10': 0.04728422132567612,
 'MAP@10': 0.023561604793177825}
# LightFM + ReRanker
full_pipeline_metrics = calc_metrics(global_test, cbm_global_test, target_col = 'cbm_rank')
full_pipeline_metrics
Calculated metrics for top 10
{'Precision@10': 0.009676164122962514,
 'Recall@10': 0.04739903010213416,
 'MAP@10': 0.024951027231271457}

Prettify both metrics calculation results for convenience

metrics_table = pd.concat(
    [pd.DataFrame([lfm_metrics]),
    pd.DataFrame([full_pipeline_metrics])],
    ignore_index = True
)
metrics_table.index = ['LightFM', 'FullPipeline']

# calc relative diff of FullPipeline vs LightFM, in %
lift_row = metrics_table.pct_change().iloc[-1].mul(100).rename('lift_by_ranker, %')
metrics_table = pd.concat([metrics_table, lift_row.to_frame().T])

metrics_table
Precision@10 Recall@10 MAP@10
LightFM 0.010 0.047 0.024
FullPipeline 0.010 0.047 0.025
lift_by_ranker, % -0.855 0.243 5.897

Thus, even with a handful of features the reranker improved Recall@10 and, most notably, MAP@10, at a marginal cost in Precision@10. Imagine how much further it could be improved by adding more features and fine-tuning the reranker.

Source & further recommendations#