ML Zoomcamp 2023 – Decision Trees and Ensemble Learning – Part 12

  1. XGBoost parameter tuning – Part 1/2
    1. Tuning Eta
      1. Eta = 0.3
      2. Eta = 1.0
      3. Eta = 0.1
      4. Eta = 0.05
      5. Eta = 0.01
    2. Plotting Eta

XGBoost parameter tuning – Part 1/2

This part is about XGBoost parameter tuning. It is the first half of a two-part series: here we tune the first parameter, ‘eta’. The follow-up article will tune ‘max_depth’ and ‘min_child_weight’, and then we’ll train the final model. Let’s start with the first parameter.

Tuning Eta

Eta, also known as the learning rate, determines how much weight each new tree gets when it corrects the errors of the ensemble built so far. If eta is set to 1.0, the new tree’s correction is applied in full; if eta is 0.3, only 30% of that correction is applied. In essence, eta governs the size of the steps taken during the learning process.
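As a toy illustration of this scaling (this is just the arithmetic of the update, not how xgboost is implemented internally), suppose the current ensemble predicts 0.40 for some example and the next tree proposes a correction of 0.20:

# Toy example: eta scales how much of the new tree's correction is applied
current = 0.40      # prediction of the ensemble built so far
correction = 0.20   # correction proposed by the next tree

for eta in [1.0, 0.3, 0.1]:
    updated = current + eta * correction
    print('eta=%s: prediction moves from %.2f to %.2f' % (eta, current, updated))

# Output:
# eta=1.0: prediction moves from 0.40 to 0.60
# eta=0.3: prediction moves from 0.40 to 0.46
# eta=0.1: prediction moves from 0.40 to 0.42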

Now, let’s explore how different values of eta impact model performance. To facilitate this, we’ll create a dictionary called ‘scores’ to store the performance scores for each value of eta, and we’ll use the ‘%%capture output’ cell magic to capture the training log of each run so that we can parse it afterwards.

Eta = 0.3

scores = {}
%%capture output

xgb_params = {
    'eta': 0.3,
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    
    'nthread': 8,
    'seed': 1,
    'verbosity': 1
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,   # log train/val AUC every 5 rounds
                  evals=watchlist)
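The ‘dtrain’ matrix and the ‘watchlist’ were created in the previous article. As a rough reminder of what that setup looks like (the names X_train, y_train, X_val, y_val, and features below are assumptions for illustration, not definitions from this post):

import xgboost as xgb

# Assumed setup from the previous part: DMatrix objects for train/validation
dtrain = xgb.DMatrix(X_train, label=y_train, feature_names=features)
dval = xgb.DMatrix(X_val, label=y_val, feature_names=features)

# Each (DMatrix, name) pair is evaluated every boosting round; the names
# 'train' and 'val' are what appear in the captured training log
watchlist = [(dtrain, 'train'), (dval, 'val')]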

We aim to structure keys in the format ‘eta=0.3’ to serve as identifiers in the scores dictionary.

'eta=%s' % (xgb_params['eta'])
# Output: 'eta=0.3'

In the next snippet, we once again employ the ‘parse_xgb_output’ function, which we defined in the previous article. It parses the captured training log and returns a dataframe with the train_auc and val_auc values for each num_iter.

key = 'eta=%s' % (xgb_params['eta'])
scores[key] = parse_xgb_output(output)
key

# Output: 'eta=0.3'
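For reference, here is a minimal sketch of what ‘parse_xgb_output’ might look like (the actual definition is in the previous article). It assumes the captured log contains xgboost’s default evaluation lines, e.g. ‘[0] train-auc:0.86730 val-auc:0.77938’, separated by tabs:

import pandas as pd

def parse_xgb_output(output):
    results = []

    # output is the object produced by %%capture; its stdout holds the log
    for line in output.stdout.strip().split('\n'):
        it_line, train_line, val_line = line.split('\t')

        it = int(it_line.strip('[]'))
        train = float(train_line.split(':')[1])
        val = float(val_line.split(':')[1])

        results.append((it, train, val))

    columns = ['num_iter', 'train_auc', 'val_auc']
    return pd.DataFrame(results, columns=columns)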

Now the dictionary scores contains the dataframe for this eta.

scores

# Output:
# {'eta=0.3':     num_iter  train_auc  val_auc
#  0          0    0.86730  0.77938
#  1          5    0.93086  0.80858
#  2         10    0.95447  0.80851
#  3         15    0.96554  0.81334
#  4         20    0.97464  0.81729
#  5         25    0.97953  0.81686
#   ...
#  36       180    1.00000  0.80723
#  37       185    1.00000  0.80678
#  38       190    1.00000  0.80672
#  39       195    1.00000  0.80708
#  40       199    1.00000  0.80725}
scores['eta=0.3']
    num_iter  train_auc  val_auc
0          0    0.86730  0.77938
1          5    0.93086  0.80858
2         10    0.95447  0.80851
3         15    0.96554  0.81334
4         20    0.97464  0.81729
5         25    0.97953  0.81686
6         30    0.98579  0.81543
7         35    0.99011  0.81206
8         40    0.99421  0.80922
9         45    0.99548  0.80842
10        50    0.99653  0.80918
11        55    0.99765  0.81114
12        60    0.99817  0.81172
13        65    0.99887  0.80798
14        70    0.99934  0.80870
15        75    0.99965  0.80555
16        80    0.99979  0.80549
17        85    0.99988  0.80374
18        90    0.99993  0.80409
19        95    0.99996  0.80548
20       100    0.99998  0.80509
21       105    0.99999  0.80629
22       110    1.00000  0.80637
23       115    1.00000  0.80494
24       120    1.00000  0.80574
25       125    1.00000  0.80727
26       130    1.00000  0.80746
27       135    1.00000  0.80753
28       140    1.00000  0.80899
29       145    1.00000  0.80733
30       150    1.00000  0.80841
31       155    1.00000  0.80734
32       160    1.00000  0.80711
33       165    1.00000  0.80707
34       170    1.00000  0.80734
35       175    1.00000  0.80704
36       180    1.00000  0.80723
37       185    1.00000  0.80678
38       190    1.00000  0.80672
39       195    1.00000  0.80708
40       199    1.00000  0.80725
Dataframe for ‘eta=0.3’

Eta = 1.0

Now, let’s set the eta value to its maximum, which is 1.0.

%%capture output

xgb_params = {
    'eta': 1.0, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,
                  evals=watchlist)
key = 'eta=%s' % (xgb_params['eta'])
scores[key] = parse_xgb_output(output)
key
# Output: 'eta=1.0'

With this second run recorded, the scores dictionary now contains two keys, each mapping to its own dataframe.

scores['eta=1.0']
    num_iter  train_auc  val_auc
0          0    0.86730  0.77938
1          5    0.95857  0.79136
2         10    0.98061  0.78355
3         15    0.99549  0.78050
4         20    0.99894  0.78591
5         25    0.99989  0.78401
6         30    1.00000  0.78371
7         35    1.00000  0.78234
8         40    1.00000  0.78184
9         45    1.00000  0.77963
10        50    1.00000  0.78645
11        55    1.00000  0.78644
12        60    1.00000  0.78545
13        65    1.00000  0.78612
14        70    1.00000  0.78515
15        75    1.00000  0.78516
16        80    1.00000  0.78420
17        85    1.00000  0.78570
18        90    1.00000  0.78793
19        95    1.00000  0.78865
20       100    1.00000  0.79075
21       105    1.00000  0.79107
22       110    1.00000  0.79022
23       115    1.00000  0.79036
24       120    1.00000  0.79021
25       125    1.00000  0.79025
26       130    1.00000  0.78994
27       135    1.00000  0.79084
28       140    1.00000  0.79048
29       145    1.00000  0.78967
30       150    1.00000  0.78969
31       155    1.00000  0.78992
32       160    1.00000  0.79064
33       165    1.00000  0.79067
34       170    1.00000  0.79115
35       175    1.00000  0.79126
36       180    1.00000  0.79199
37       185    1.00000  0.79179
38       190    1.00000  0.79165
39       195    1.00000  0.79201
40       199    1.00000  0.79199
Dataframe for ‘eta=1.0’

Eta = 0.1

Let’s go through the process once more for ‘eta=0.1’ and subsequently print out the dataframe.

%%capture output

xgb_params = {
    'eta': 0.1, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,
                  evals=watchlist)
key = 'eta=%s' % (xgb_params['eta'])
scores[key] = parse_xgb_output(output)
key

# Output: 'eta=0.1'
scores['eta=0.1']
    num_iter  train_auc  val_auc
0          0    0.86730  0.77938
1          5    0.90325  0.79290
2         10    0.91874  0.80510
3         15    0.93126  0.81380
4         20    0.93873  0.81804
5         25    0.94638  0.82065
6         30    0.95338  0.82063
7         35    0.95874  0.82404
8         40    0.96325  0.82644
9         45    0.96694  0.82602
10        50    0.97195  0.82549
11        55    0.97475  0.82648
12        60    0.97708  0.82781
13        65    0.97937  0.82775
14        70    0.98214  0.82681
15        75    0.98315  0.82728
16        80    0.98517  0.82560
17        85    0.98721  0.82503
18        90    0.98840  0.82443
19        95    0.98972  0.82389
20       100    0.99061  0.82456
21       105    0.99157  0.82359
22       110    0.99224  0.82274
23       115    0.99288  0.82147
24       120    0.99378  0.82154
25       125    0.99481  0.82195
26       130    0.99541  0.82252
27       135    0.99564  0.82190
28       140    0.99630  0.82219
29       145    0.99673  0.82177
30       150    0.99711  0.82136
31       155    0.99750  0.82154
32       160    0.99774  0.82102
33       165    0.99821  0.82060
34       170    0.99838  0.82060
35       175    0.99861  0.82012
36       180    0.99882  0.82053
37       185    0.99898  0.82028
38       190    0.99904  0.81973
39       195    0.99920  0.81909
40       199    0.99927  0.81864
Dataframe for ‘eta=0.1’

Eta = 0.05

Let’s do it again for ‘eta=0.05’ and print out the dataframe.

%%capture output

xgb_params = {
    'eta': 0.05, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,
                  evals=watchlist)
key = 'eta=%s' % (xgb_params['eta'])
scores[key] = parse_xgb_output(output)
key

# Output: 'eta=0.05'
scores['eta=0.05']
    num_iter  train_auc  val_auc
0          0    0.86730  0.77938
1          5    0.88650  0.79584
2         10    0.90368  0.79623
3         15    0.91072  0.79938
4         20    0.91774  0.80510
5         25    0.92385  0.80895
6         30    0.92987  0.81175
7         35    0.93379  0.81480
8         40    0.93856  0.81547
9         45    0.94316  0.81807
10        50    0.94753  0.81793
11        55    0.95028  0.81926
12        60    0.95324  0.81998
13        65    0.95581  0.82159
14        70    0.95762  0.82299
15        75    0.95944  0.82368
16        80    0.96100  0.82524
17        85    0.96308  0.82604
18        90    0.96572  0.82666
19        95    0.96798  0.82667
20       100    0.96955  0.82719
21       105    0.97133  0.82745
22       110    0.97288  0.82819
23       115    0.97426  0.82822
24       120    0.97578  0.82768
25       125    0.97702  0.82790
26       130    0.97788  0.82760
27       135    0.97923  0.82764
28       140    0.98012  0.82725
29       145    0.98113  0.82665
30       150    0.98190  0.82575
31       155    0.98285  0.82581
32       160    0.98381  0.82560
33       165    0.98457  0.82576
34       170    0.98541  0.82591
35       175    0.98652  0.82581
36       180    0.98711  0.82526
37       185    0.98789  0.82525
38       190    0.98876  0.82535
39       195    0.98932  0.82538
40       199    0.98977  0.82522
Dataframe for ‘eta=0.05’

Eta = 0.01

Once more, let’s assess the performance for ‘eta=0.01’.

%%capture output

xgb_params = {
    'eta': 0.01, 
    'max_depth': 6,
    'min_child_weight': 1,
    
    'objective': 'binary:logistic',
    'eval_metric': 'auc',

    'nthread': 8,
    'seed': 1,
    'verbosity': 1,
}

model = xgb.train(xgb_params, dtrain, num_boost_round=200,
                  verbose_eval=5,
                  evals=watchlist)
key = 'eta=%s' % (xgb_params['eta'])
scores[key] = parse_xgb_output(output)
key

# Output: 'eta=0.01'
scores['eta=0.01']
    num_iter  train_auc  val_auc
0          0    0.86730  0.77938
1          5    0.87157  0.77925
2         10    0.87247  0.78051
3         15    0.87541  0.78302
4         20    0.87584  0.78707
5         25    0.88406  0.79331
6         30    0.89027  0.79763
7         35    0.89559  0.79914
8         40    0.89782  0.79883
9         45    0.89983  0.79845
10        50    0.90182  0.79697
11        55    0.90394  0.79775
12        60    0.90531  0.79684
13        65    0.90630  0.79616
14        70    0.90796  0.79672
15        75    0.90955  0.79807
16        80    0.91116  0.79976
17        85    0.91227  0.80130
18        90    0.91368  0.80285
19        95    0.91515  0.80390
20       100    0.91654  0.80499
21       105    0.91791  0.80534
22       110    0.91902  0.80523
23       115    0.92032  0.80515
24       120    0.92135  0.80497
25       125    0.92262  0.80520
26       130    0.92405  0.80605
27       135    0.92547  0.80692
28       140    0.92667  0.80730
29       145    0.92771  0.80818
30       150    0.92919  0.80928
31       155    0.92990  0.81039
32       160    0.93077  0.81125
33       165    0.93155  0.81160
34       170    0.93246  0.81189
35       175    0.93348  0.81257
36       180    0.93466  0.81340
37       185    0.93585  0.81429
38       190    0.93685  0.81481
39       195    0.93777  0.81554
40       199    0.93862  0.81575
Dataframe for ‘eta=0.01’

Plotting Eta

Now that we’ve stored the results of the different runs as key-value pairs, we can examine the keys in the dictionary and then compare the runs for ‘eta=0.3’, ‘eta=1.0’, ‘eta=0.1’, ‘eta=0.05’, and ‘eta=0.01’.

scores.keys()
# Output: dict_keys(['eta=0.3', 'eta=1.0', 'eta=0.1', 'eta=0.05', 'eta=0.01'])

Let’s plot the validation AUC of all our runs.

for key, df_score in scores.items():
    plt.plot(df_score.num_iter, df_score.val_auc, label=key)
plt.legend()

Let’s concentrate on a few graphs for the initial analysis. We will plot three of them: ‘eta=1.0’, ‘eta=0.3’, and ‘eta=0.1’.

etas = ['eta=1.0', 'eta=0.3', 'eta=0.1']
for eta in etas:
    df_score = scores[eta]
    plt.plot(df_score.num_iter, df_score.val_auc, label=eta)
plt.legend()

This plot provides a clearer view of the results. Notably, ‘eta=1.0’ exhibits the worst performance. It quickly reaches peak performance but then experiences a sharp decline, maintaining a consistently poor level. ‘eta=0.3’ performs reasonably well until around iteration 25, after which it steadily deteriorates. On the other hand, ‘eta=0.1’ demonstrates a slower growth rate, reaching its peak at a later stage before descending. This pattern is a direct reflection of the learning rate’s influence.

The learning rate controls both the speed at which the model learns and the size of the steps it takes during each iteration. If the steps are too large, the model learns rapidly but eventually starts to degrade due to the excessive step size, resulting in overfitting. Conversely, a smaller learning rate signifies slower but more stable learning. Such models tend to degrade more gradually, and their overfitting tendencies are less pronounced compared to models with higher learning rates.

Next, let’s look at eta=0.3, eta=0.1, and eta=0.01.

etas = ['eta=0.3', 'eta=0.1', 'eta=0.01']
for eta in etas:
    df_score = scores[eta]
    plt.plot(df_score.num_iter, df_score.val_auc, label=eta)
plt.legend()

‘eta=0.01’ displays an extremely slow learning rate, making it challenging to estimate how long it might take to outperform the other model (represented by the orange curve). This model’s progress is painstakingly slow, as the steps it takes are exceedingly tiny.

On the other hand, ‘eta=0.3’ takes a few significant steps initially but succumbs to overfitting more rapidly. In this plot, ‘eta=0.1’ seems to strike the ideal balance, particularly between 50 and 75 iterations. It may take a bit longer to reach its peak performance, but the resulting performance improvement justifies the wait.

There was also eta=0.05, so let’s finally look at that plot as well.

etas = ['eta=0.1', 'eta=0.05', 'eta=0.01']
for eta in etas:
    df_score = scores[eta]
    plt.plot(df_score.num_iter, df_score.val_auc, label=eta)
plt.legend()

The ‘eta=0.05’ model requires approximately twice as many iterations to converge when compared to the blue model (‘eta=0.1’). Although it takes smaller steps and requires more time, the end result is still inferior to the blue model. Thus, it’s evident that the ‘eta=0.1’ model stands out as the best option, as it achieves better performance with fewer steps.
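One way to confirm this numerically is to look up, for each stored run, the boosting round with the highest validation AUC; a small sketch using only the scores dictionary we built above:

for key, df_score in scores.items():
    # row with the best validation AUC for this eta
    best = df_score.loc[df_score.val_auc.idxmax()]
    print('%s: best val_auc=%.5f at num_iter=%d' % (key, best.val_auc, best.num_iter))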

Now that we’ve found the best value for ‘eta’, the second part of XGBoost parameter tuning will focus on two more parameters, ‘max_depth’ and ‘min_child_weight’, before we train the final model.
