Oversample training data only

Joseph Parker requested to merge oversample_training_only into main

This MR swaps the order of oversampling and the train/test split, so that oversampling is applied to the training data only. The test data therefore keeps its original class imbalance, which gives a more realistic assessment of model performance.

For comparison, we also retain the option of oversampling the whole data set before splitting. This is controlled by the oversample_on_test_data argument to preprocess_data. The new default is False, which oversamples the training data only.
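To illustrate the two orderings, here is a minimal, standard-library-only sketch. It is not the code in this branch: only the names preprocess_data and oversample_on_test_data come from the description above, while the signature, the helpers random_oversample and split, the test_fraction and seed parameters, and the use of simple random duplication as the oversampling method are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, rng):
    """Hypothetical helper: duplicate rows at random until every class
    has as many rows as the largest class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def split(rows, labels, test_fraction, rng):
    """Hypothetical helper: shuffled train/test split."""
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([rows[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def preprocess_data(rows, labels, oversample_on_test_data=False,
                    test_fraction=0.25, seed=0):
    """Assumed signature; only the function and flag names are from the MR."""
    rng = random.Random(seed)
    if oversample_on_test_data:
        # Old order: oversample everything, then split. Duplicated rows
        # can land in both train and test, and the test set is balanced.
        rows, labels = random_oversample(rows, labels, rng)
        return split(rows, labels, test_fraction, rng)
    # New default: split first, then oversample the training part only.
    (train_x, train_y), test = split(rows, labels, test_fraction, rng)
    train_x, train_y = random_oversample(train_x, train_y, rng)
    return (train_x, train_y), test


# Toy data: 10 positive and 90 negative examples.
rows = list(range(100))
labels = [1] * 10 + [0] * 90
(train_x, train_y), (test_x, test_y) = preprocess_data(rows, labels)
print(Counter(train_y), Counter(test_y))
```

With the default setting, the training classes end up the same size, while the test set contains only original, unduplicated rows and none of them also appear in the training set.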

This branch builds on the preprocessing script added in !3 (merged).
