Understanding and verifying the outputs for 02_using_batches
After looking over the outputs from 02_using_batches/comparison.py, particularly the test_batches graphs, I'm not sure I fully understand the reason for the difference between the NumPy and PyTorch versions.
I've addressed some minor errors since the last update, which I believe has removed any user-side error from the results. The main issue I'm facing is that the output of the batch-size test function changes with the learning rate.
For learning rate = 0.1 the following is produced:
For learning rate = 0.01 the following is produced:
For learning rate = 0.001 the following is produced:
While I understand that for noisy data (in this case y += N(0, 0.5)) lower learning rates can help stop the SGD optimiser diverging, I'm not sure why the PyTorch implementation degrades so much more sharply than the NumPy version. I've looked into potential differences between the manual NumPy implementation of (S)GD and PyTorch's, and I believe the ones I found (namely the initial conditions) have been resolved. I also looked over how the error is stored in both versions and believe them to be functionally identical.
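For what it's worth, when I was trying to convince myself the two update rules really matched, the check that helped most was comparing the parameters after every individual step rather than only the final loss curves. A rough sketch of that idea (hypothetical toy model and variable names, not the actual comparison.py code) would be something like:

```python
import numpy as np
import torch

# Hypothetical sanity check: train the same 1-D linear model with plain SGD in
# NumPy and PyTorch side by side, comparing parameters after every update so
# the first step where they drift apart is visible.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(64, 1)).astype(np.float32)
y = (3.0 * x + 1.0 + rng.normal(0, 0.5, size=x.shape)).astype(np.float32)

lr = 0.1
batch_size = 8

# Identical initial conditions for both versions
w_np = np.zeros((1, 1), dtype=np.float32)
b_np = np.zeros(1, dtype=np.float32)

model = torch.nn.Linear(1, 1)
with torch.no_grad():
    model.weight.copy_(torch.from_numpy(w_np))
    model.bias.copy_(torch.from_numpy(b_np))
opt = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = torch.nn.MSELoss()

# Identical, unshuffled batch order so both versions see the same data each step
for start in range(0, len(x), batch_size):
    xb, yb = x[start:start + batch_size], y[start:start + batch_size]

    # Manual NumPy step: gradients of the mean squared error
    err = xb @ w_np + b_np - yb
    w_np -= lr * (2.0 * xb.T @ err / len(xb))
    b_np -= lr * (2.0 * err.mean(axis=0))

    # PyTorch step
    opt.zero_grad()
    loss_fn(model(torch.from_numpy(xb)), torch.from_numpy(yb)).backward()
    opt.step()

    # Parameters should agree to within float32 noise after every single step
    assert np.allclose(model.weight.detach().numpy(), w_np.T, atol=1e-5), start
    assert np.allclose(model.bias.detach().numpy(), b_np, atol=1e-5), start
```

If a check like this passes at lr = 0.01 but fails early at lr = 0.1, that at least narrows the divergence down to the step where the two updates first disagree.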
Additionally, the "good" results, namely those for lr = 0.01
and lr = 0.001
, differ slightly, and more so than in the 01 learning rate example (this might just be down to rounding differences, but coupled with the massively divergent lr = 0.1
results it leaves me slightly cautious).
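One way I could rule rounding in or out would be to compare the stored error histories numerically rather than by eye. A minimal sketch, assuming both versions can dump their loss histories to .npy files (the file names here are made up), might be:

```python
import numpy as np

# Hypothetical dump locations; substitute wherever comparison.py stores its error histories
numpy_losses = np.load("losses_numpy.npy")
torch_losses = np.load("losses_torch.npy")

diff = np.abs(numpy_losses - torch_losses)
print("max abs difference:", diff.max())

# Rounding noise stays roughly constant near float32 precision; a real
# implementation mismatch tends to grow step over step.
over = np.flatnonzero(diff > 1e-4)
print("first step exceeding 1e-4:", over[0] if over.size else "none")
```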
This might just be a case of me being overly paranoid, and the results may be exactly as expected and error-free, but I wanted to confirm them before committing them and moving on to the next topic.