ML Zoomcamp 2023 – Decision Trees and Ensemble Learning – Part 11

  1. Gradient boosting and XGBoost – Part 2/2
    1. Performance Monitoring
    2. Parsing XGBoost’s monitoring output

Gradient boosting and XGBoost – Part 2/2

This is part 2 of ‘Gradient boosting and XGBoost.’ In the first part, we compared random forests with gradient boosting, installed XGBoost, and trained our first XGBoost model. In this part, we look at performance monitoring.

Performance Monitoring

XGBoost lets us monitor the training process closely: after each iteration, when a new tree has been added, it can immediately evaluate the model on our validation data. To enable this, we define a watchlist containing the datasets we want to evaluate on.

watchlist = [(dtrain, 'train'), (dval, 'val')]

By default, XGBoost reports log loss, the evaluation metric tied to the binary:logistic objective. Log loss works well for parameter tuning, but it is hard to interpret directly, so we will switch to a more familiar metric shortly.

xgb_params = {
    'eta': 0.3, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  evals=watchlist)

# Output:
# [0]	train-logloss:0.49703	val-logloss:0.54305
# [1]	train-logloss:0.44463	val-logloss:0.51462
# [2]	train-logloss:0.40707	val-logloss:0.49896
# [3]	train-logloss:0.37760	val-logloss:0.48654
# [4]	train-logloss:0.35990	val-logloss:0.48007
# [5]	train-logloss:0.33931	val-logloss:0.47563
# [6]	train-logloss:0.32586	val-logloss:0.47413
# [7]	train-logloss:0.31409	val-logloss:0.47702
# [8]	train-logloss:0.29962	val-logloss:0.48205
# [9]	train-logloss:0.29216	val-logloss:0.47996
# [10]	train-logloss:0.28407	val-logloss:0.47969
# ...
# [195]	train-logloss:0.02736	val-logloss:0.67492
# [196]	train-logloss:0.02728	val-logloss:0.67518
# [197]	train-logloss:0.02693	val-logloss:0.67791
# [198]	train-logloss:0.02665	val-logloss:0.67965
# [199]	train-logloss:0.02642	val-logloss:0.68022
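For reference, the logloss values above are binary cross-entropy, which rewards confident correct predictions and heavily penalizes confident wrong ones. A minimal sketch of the computation (not XGBoost’s internal implementation):

```python
import math

def binary_log_loss(y_true, y_pred):
    # mean binary cross-entropy over all predictions
    eps = 1e-15  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# two confident, correct predictions give a low loss
binary_log_loss([1, 0], [0.9, 0.1])  # ≈ 0.105
```

A lower value is better; a model predicting 0.5 for everything scores about 0.693, so the training logloss of 0.497 at round 0 is already an improvement over that baseline.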

For our monitoring purposes, we choose to use AUC as the metric, which we’ve previously employed. To do this, we set the eval_metric parameter to auc.

xgb_params = {
    'eta': 0.3, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  evals=watchlist)

# Output:
# [0]	train-auc:0.86730	val-auc:0.77938
# [1]	train-auc:0.89140	val-auc:0.78964
# [2]	train-auc:0.90699	val-auc:0.79010
# [3]	train-auc:0.91677	val-auc:0.79967
# [4]	train-auc:0.92246	val-auc:0.80443
# [5]	train-auc:0.93086	val-auc:0.80858
# [6]	train-auc:0.93675	val-auc:0.80981
# [7]	train-auc:0.94108	val-auc:0.80872
# [8]	train-auc:0.94809	val-auc:0.80456
# [9]	train-auc:0.95100	val-auc:0.80653
# [10]	train-auc:0.95447	val-auc:0.80851
# ...
# [195]	train-auc:1.00000	val-auc:0.80708
# [196]	train-auc:1.00000	val-auc:0.80759
# [197]	train-auc:1.00000	val-auc:0.80718
# [198]	train-auc:1.00000	val-auc:0.80719
# [199]	train-auc:1.00000	val-auc:0.80725

The AUC on the training data reaches a perfect 1.0, while on the validation data it plateaus around 0.81. This gap strongly suggests that the model is overfitting.
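One standard response to this overfitting is early stopping: xgb.train accepts an early_stopping_rounds argument and halts training once the metric on the last dataset in evals has not improved for that many consecutive rounds. A minimal sketch on synthetic stand-in data (in the notebook you would simply pass the existing xgb_params, dtrain, and watchlist):

```python
import numpy as np
import xgboost as xgb

# synthetic stand-in data; the notebook would use its own dtrain/dval
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=600) > 0).astype(int)

dtrain = xgb.DMatrix(X[:500], label=y[:500])
dval = xgb.DMatrix(X[500:], label=y[500:])
watchlist = [(dtrain, 'train'), (dval, 'val')]

xgb_params = {'eta': 0.3, 'max_depth': 6, 'objective': 'binary:logistic',
              'eval_metric': 'auc', 'seed': 1, 'verbosity': 0}

# stop once val AUC has not improved for 10 consecutive rounds
model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  evals=watchlist, early_stopping_rounds=10,
                  verbose_eval=False)

print(model.best_iteration, model.best_score)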

To make this output easier to follow, it helps to visualize it. Also, instead of printing a line for every boosting round, we can pass verbose_eval=5 to display results only for every 5th round, making the monitoring output more manageable.

xgb_params = {
    'eta': 0.3, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,
                  evals=watchlist)

# Output: 
# [0]	train-auc:0.86730	val-auc:0.77938
# [5]	train-auc:0.93086	val-auc:0.80858
# [10]	train-auc:0.95447	val-auc:0.80851
# [15]	train-auc:0.96554	val-auc:0.81334
# [20]	train-auc:0.97464	val-auc:0.81729
# [25]	train-auc:0.97953	val-auc:0.81686
# [30]	train-auc:0.98579	val-auc:0.81543
# [35]	train-auc:0.99011	val-auc:0.81206
# [40]	train-auc:0.99421	val-auc:0.80922
# [45]	train-auc:0.99548	val-auc:0.80842
# [50]	train-auc:0.99653	val-auc:0.80918
# [55]	train-auc:0.99765	val-auc:0.81114
# [60]	train-auc:0.99817	val-auc:0.81172
# [65]	train-auc:0.99887	val-auc:0.80798
# [70]	train-auc:0.99934	val-auc:0.80870
# [75]	train-auc:0.99965	val-auc:0.80555
# [80]	train-auc:0.99979	val-auc:0.80549
# [85]	train-auc:0.99988	val-auc:0.80374
# [90]	train-auc:0.99993	val-auc:0.80409
# [95]	train-auc:0.99996	val-auc:0.80548
# [100]	train-auc:0.99998	val-auc:0.80509
# [105]	train-auc:0.99999	val-auc:0.80629
# [110]	train-auc:1.00000	val-auc:0.80637
# [115]	train-auc:1.00000	val-auc:0.80494
# [120]	train-auc:1.00000	val-auc:0.80574
# [125]	train-auc:1.00000	val-auc:0.80727
# [130]	train-auc:1.00000	val-auc:0.80746
# [135]	train-auc:1.00000	val-auc:0.80753
# [140]	train-auc:1.00000	val-auc:0.80899
# [145]	train-auc:1.00000	val-auc:0.80733
# [150]	train-auc:1.00000	val-auc:0.80841
# [155]	train-auc:1.00000	val-auc:0.80734
# [160]	train-auc:1.00000	val-auc:0.80711
# [165]	train-auc:1.00000	val-auc:0.80707
# [170]	train-auc:1.00000	val-auc:0.80734
# [175]	train-auc:1.00000	val-auc:0.80704
# [180]	train-auc:1.00000	val-auc:0.80723
# [185]	train-auc:1.00000	val-auc:0.80678
# [190]	train-auc:1.00000	val-auc:0.80672
# [195]	train-auc:1.00000	val-auc:0.80708
# [199]	train-auc:1.00000	val-auc:0.80725

Parsing XGBoost’s monitoring output

If you want to visualize this information on a plot, one challenge is that XGBoost prints it to standard output rather than returning it, so there is no easy way to extract it directly. In Jupyter Notebook, however, the %%capture cell magic intercepts everything a cell prints and stores it in an object you can inspect later. Note that %%capture output must be the first line of the cell, and while the cell runs you will see nothing, because all output is being captured.

%%capture output

xgb_params = {
    'eta': 0.3, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,
                  evals=watchlist)
s = output.stdout

print(s)
# Output:
# [0]	train-auc:0.86730	val-auc:0.77938
# [5]	train-auc:0.93086	val-auc:0.80858
# [10]	train-auc:0.95447	val-auc:0.80851
# [15]	train-auc:0.96554	val-auc:0.81334
# [20]	train-auc:0.97464	val-auc:0.81729
# [25]	train-auc:0.97953	val-auc:0.81686
# [30]	train-auc:0.98579	val-auc:0.81543
# [35]	train-auc:0.99011	val-auc:0.81206
# [40]	train-auc:0.99421	val-auc:0.80922
# [45]	train-auc:0.99548	val-auc:0.80842
# [50]	train-auc:0.99653	val-auc:0.80918
# [55]	train-auc:0.99765	val-auc:0.81114
# [60]	train-auc:0.99817	val-auc:0.81172
# [65]	train-auc:0.99887	val-auc:0.80798
# [70]	train-auc:0.99934	val-auc:0.80870
# [75]	train-auc:0.99965	val-auc:0.80555
# [80]	train-auc:0.99979	val-auc:0.80549
# [85]	train-auc:0.99988	val-auc:0.80374
# [90]	train-auc:0.99993	val-auc:0.80409
# [95]	train-auc:0.99996	val-auc:0.80548
# [100]	train-auc:0.99998	val-auc:0.80509
# [105]	train-auc:0.99999	val-auc:0.80629
# [110]	train-auc:1.00000	val-auc:0.80637
# [115]	train-auc:1.00000	val-auc:0.80494
# [120]	train-auc:1.00000	val-auc:0.80574
# [125]	train-auc:1.00000	val-auc:0.80727
# [130]	train-auc:1.00000	val-auc:0.80746
# [135]	train-auc:1.00000	val-auc:0.80753
# [140]	train-auc:1.00000	val-auc:0.80899
# [145]	train-auc:1.00000	val-auc:0.80733
# [150]	train-auc:1.00000	val-auc:0.80841
# [155]	train-auc:1.00000	val-auc:0.80734
# [160]	train-auc:1.00000	val-auc:0.80711
# [165]	train-auc:1.00000	val-auc:0.80707
# [170]	train-auc:1.00000	val-auc:0.80734
# [175]	train-auc:1.00000	val-auc:0.80704
# [180]	train-auc:1.00000	val-auc:0.80723
# [185]	train-auc:1.00000	val-auc:0.80678
# [190]	train-auc:1.00000	val-auc:0.80672
# [195]	train-auc:1.00000	val-auc:0.80708
# [199]	train-auc:1.00000	val-auc:0.80725

Now that we have the captured output in a string, the first step is to split it into individual lines on the newline character \n. The result is one string per line of output.

s.split('\n')

# Output:
# ['[0]\ttrain-auc:0.86730\tval-auc:0.77938',
#  '[5]\ttrain-auc:0.93086\tval-auc:0.80858',
#  '[10]\ttrain-auc:0.95447\tval-auc:0.80851',
#  '[15]\ttrain-auc:0.96554\tval-auc:0.81334',
#  '[20]\ttrain-auc:0.97464\tval-auc:0.81729',
#  '[25]\ttrain-auc:0.97953\tval-auc:0.81686',
#  '[30]\ttrain-auc:0.98579\tval-auc:0.81543',
#  '[35]\ttrain-auc:0.99011\tval-auc:0.81206',
#  '[40]\ttrain-auc:0.99421\tval-auc:0.80922',
#  '[45]\ttrain-auc:0.99548\tval-auc:0.80842',
#  '[50]\ttrain-auc:0.99653\tval-auc:0.80918',
#  '[55]\ttrain-auc:0.99765\tval-auc:0.81114',
#  '[60]\ttrain-auc:0.99817\tval-auc:0.81172',
#  '[65]\ttrain-auc:0.99887\tval-auc:0.80798',
#  '[70]\ttrain-auc:0.99934\tval-auc:0.80870',
#  '[75]\ttrain-auc:0.99965\tval-auc:0.80555',
#  '[80]\ttrain-auc:0.99979\tval-auc:0.80549',
#  '[85]\ttrain-auc:0.99988\tval-auc:0.80374',
#  '[90]\ttrain-auc:0.99993\tval-auc:0.80409',
#  '[95]\ttrain-auc:0.99996\tval-auc:0.80548',
#  '[100]\ttrain-auc:0.99998\tval-auc:0.80509',
#  '[105]\ttrain-auc:0.99999\tval-auc:0.80629',
#  '[110]\ttrain-auc:1.00000\tval-auc:0.80637',
#  '[115]\ttrain-auc:1.00000\tval-auc:0.80494',
#  '[120]\ttrain-auc:1.00000\tval-auc:0.80574',
#  '[125]\ttrain-auc:1.00000\tval-auc:0.80727',
#  '[130]\ttrain-auc:1.00000\tval-auc:0.80746',
#  '[135]\ttrain-auc:1.00000\tval-auc:0.80753',
#  '[140]\ttrain-auc:1.00000\tval-auc:0.80899',
#  '[145]\ttrain-auc:1.00000\tval-auc:0.80733',
#  '[150]\ttrain-auc:1.00000\tval-auc:0.80841',
#  '[155]\ttrain-auc:1.00000\tval-auc:0.80734',
#  '[160]\ttrain-auc:1.00000\tval-auc:0.80711',
#  '[165]\ttrain-auc:1.00000\tval-auc:0.80707',
#  '[170]\ttrain-auc:1.00000\tval-auc:0.80734',
#  '[175]\ttrain-auc:1.00000\tval-auc:0.80704',
#  '[180]\ttrain-auc:1.00000\tval-auc:0.80723',
#  '[185]\ttrain-auc:1.00000\tval-auc:0.80678',
#  '[190]\ttrain-auc:1.00000\tval-auc:0.80672',
#  '[195]\ttrain-auc:1.00000\tval-auc:0.80708',
#  '[199]\ttrain-auc:1.00000\tval-auc:0.80725',
# '']

Each line consists of three components separated by tab characters \t: the iteration number, the evaluation on the training dataset, and the evaluation on the validation dataset. Splitting on \t gives us the three parts; we then strip the brackets from the iteration number and convert the strings to the proper types (integer, float, float). Note the trailing empty string in the list above, which comes from the final newline; the parsing function further down removes it by calling strip() before splitting. The following snippet demonstrates these steps.

line = s.split('\n')[0]
line
# Output: '[0]\ttrain-auc:0.86730\tval-auc:0.77938'

line.split('\t')
# Output: ['[0]', 'train-auc:0.86730', 'val-auc:0.77938']

num_iter, train_auc, val_auc = line.split('\t')
num_iter, train_auc, val_auc
# Output: ('[0]', 'train-auc:0.86730', 'val-auc:0.77938')

int(num_iter.strip('[]'))
# Output: 0
float(train_auc.split(':')[1])
# Output: 0.8673
float(val_auc.split(':')[1])
# Output: 0.77938

We can combine all these steps to transform the information (number of iterations, AUC on the training data, and AUC on the validation data) from the output into a dataframe. The following snippet encapsulates all these steps within a single function for ease of use. This allows us to plot the data and perform further analysis.

def parse_xgb_output(output):
    # output is the object produced by %%capture; its stdout attribute
    # holds everything the training cell printed
    results = []

    for line in output.stdout.strip().split('\n'):
        it_line, train_line, val_line = line.split('\t')

        it = int(it_line.strip('[]'))
        train = float(train_line.split(':')[1])
        val = float(val_line.split(':')[1])

        results.append((it, train, val))

    columns = ['num_iter', 'train_auc', 'val_auc']
    df_results = pd.DataFrame(results, columns=columns)  # assumes pandas imported as pd
    return df_results
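As a quick sanity check, the function can be exercised on a stub object that mimics the captured output. SimpleNamespace just gives us something with a stdout attribute; the two log lines are copied from the output above, and the function is repeated so the snippet runs on its own:

```python
from types import SimpleNamespace

import pandas as pd

def parse_xgb_output(output):
    # same function as above
    results = []
    for line in output.stdout.strip().split('\n'):
        it_line, train_line, val_line = line.split('\t')
        it = int(it_line.strip('[]'))
        train = float(train_line.split(':')[1])
        val = float(val_line.split(':')[1])
        results.append((it, train, val))
    columns = ['num_iter', 'train_auc', 'val_auc']
    return pd.DataFrame(results, columns=columns)

# stub standing in for the object produced by %%capture
stub = SimpleNamespace(stdout='[0]\ttrain-auc:0.86730\tval-auc:0.77938\n'
                              '[5]\ttrain-auc:0.93086\tval-auc:0.80858\n')
parse_xgb_output(stub)
```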

Now, let’s see how the function works in action.

df_score = parse_xgb_output(output)
df_score
    num_iter  train_auc  val_auc
0          0    0.86730  0.77938
1          5    0.93086  0.80858
2         10    0.95447  0.80851
3         15    0.96554  0.81334
4         20    0.97464  0.81729
5         25    0.97953  0.81686
6         30    0.98579  0.81543
7         35    0.99011  0.81206
8         40    0.99421  0.80922
9         45    0.99548  0.80842
10        50    0.99653  0.80918
11        55    0.99765  0.81114
12        60    0.99817  0.81172
13        65    0.99887  0.80798
14        70    0.99934  0.80870
15        75    0.99965  0.80555
16        80    0.99979  0.80549
17        85    0.99988  0.80374
18        90    0.99993  0.80409
19        95    0.99996  0.80548
20       100    0.99998  0.80509
21       105    0.99999  0.80629
22       110    1.00000  0.80637
23       115    1.00000  0.80494
24       120    1.00000  0.80574
25       125    1.00000  0.80727
26       130    1.00000  0.80746
27       135    1.00000  0.80753
28       140    1.00000  0.80899
29       145    1.00000  0.80733
30       150    1.00000  0.80841
31       155    1.00000  0.80734
32       160    1.00000  0.80711
33       165    1.00000  0.80707
34       170    1.00000  0.80734
35       175    1.00000  0.80704
36       180    1.00000  0.80723
37       185    1.00000  0.80678
38       190    1.00000  0.80672
39       195    1.00000  0.80708
40       199    1.00000  0.80725

Output of parse_xgb_output function

Since parse_xgb_output returns a dataframe, we can plot the learning curves with matplotlib’s plot function.

plt.plot(df_score.num_iter, df_score.train_auc, label='train')
plt.plot(df_score.num_iter, df_score.val_auc, label='val')
plt.xlabel('number of iterations')
plt.ylabel('auc')
plt.legend()

We can observe that the AUC on the training dataset improves consistently. The validation curve tells a different story: it peaks early, then declines and stagnates, which signals the onset of overfitting. The decline is easier to see when we plot only the validation AUC.

plt.plot(df_score.num_iter, df_score.val_auc, label='val')
plt.legend()
