Oversample training data only (!5) · Merge requests · EuroCC-DIF-URGE / AI_Models-Carotid_diagnoses

Joseph Parker requested to merge oversample_training_only into main Feb 26, 2025

This MR swaps the order of oversampling and the train/test split, so that oversampling is applied to the training data only. The test data therefore keeps its original class imbalance, which gives a more realistic assessment of model performance.

For comparison, the option of oversampling the whole data set before splitting is retained. It is controlled by the oversample_on_test_data argument of preprocess_data. The new behaviour (oversample_on_test_data=False) is the default.
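To illustrate the two orderings, here is a minimal sketch, not the code in this MR. Only the flag name oversample_on_test_data comes from the description above. The function names (preprocess_sketch, random_oversample, split_rows), the test_fraction and seed parameters, the unstratified shuffled split, and the use of random duplication as the oversampling method are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_oversample(rows, rng):
    """Duplicate minority-class rows at random until every class
    present in `rows` is as large as the largest class.
    (Assumed oversampling method; the MR may use a different one.)"""
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing rows with replacement from the same class.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def split_rows(rows, test_fraction, rng):
    """Shuffle a copy of `rows` and return (train, test)."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def preprocess_sketch(rows, oversample_on_test_data=False,
                      test_fraction=0.25, seed=0):
    """Hypothetical stand-in for preprocess_data.

    rows: list of (features, label) tuples.
    """
    rng = random.Random(seed)
    if oversample_on_test_data:
        # Old order: oversample everything, then split.
        # Copies of one original row can land in both train and test.
        balanced = random_oversample(rows, rng)
        train, test = split_rows(balanced, test_fraction, rng)
    else:
        # New default: split first, then oversample the training part only.
        # The test set keeps the original class imbalance.
        train, test = split_rows(rows, test_fraction, rng)
        train = random_oversample(train, rng)
    return train, test
```

With the default setting, every test row is an original, unduplicated sample and none of them appears in the training set. With oversample_on_test_data=True, the split happens after duplication, so the test set is roughly balanced and may share rows with the training set.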

This branch builds on the preprocessing script added in !3 (merged).

