Expose necessary interface to guide manual SGD process from Python#6238
Expose necessary interface to guide manual SGD process from Python#6238shelhamer merged 4 commits intoBVLC:masterfrom
Conversation
Noiredd
left a comment
There was a problem hiding this comment.
Good addition, I was about to implement something like this myself but first we need to pull some Pycaffe changes from the OpenCL branch to synchronize the whole repository. For this reason I recommend holding this until #6209 is reviewed and merged.
In the meantime, please address a few questions I've left in the code.
|
Thanks for the quick code review! |
python/caffe/_caffe.cpp
Outdated
| .add_property("display", &SolverParameter::display) | ||
| .add_property("layer_wise_reduce", &SolverParameter::layer_wise_reduce); | ||
| .add_property("layer_wise_reduce", &SolverParameter::layer_wise_reduce) | ||
| .add_property("base_lr", &SolverParameter::base_lr, &SolverParameter::set_base_lr); |
python/caffe/_caffe.cpp
Outdated
| shared_ptr<SGDSolver<Dtype> >, boost::noncopyable>( | ||
| "SGDSolver", bp::init<string>()); | ||
| "SGDSolver", bp::init<string>()) | ||
| .def("get_learning_rate", &SGDSolver<Dtype>::GetLearningRate); |
There was a problem hiding this comment.
What about other solvers, will they get this property only due to the inheritance?
There was a problem hiding this comment.
Other solvers extending from SGDSolver? I think only them will get this, but this is OK, no? this is on purpose.
There was a problem hiding this comment.
So if I do a = caffe.AdamSolver("solver.prototxt") in my pycaffe script, will I be able to call a.get_learning_rate() or not?
There was a problem hiding this comment.
From my understanding, the answer is yes, but I have not tested it.
There was a problem hiding this comment.
No, this does not inherit correctly b.c. the bp::bases of the child SGDSolvers are set to Solver and not SGDSolver. Switching the NesteroveSolver and other children to have the SGDSolver base fixes the issue. I second incorporating the apply_update() test from c5b5b55, so could you cherry-pick it to give @nitnelave credit for it?
| // Increment the internal iter_ counter -- its value should always indicate | ||
| // the number of times the weights have been updated. | ||
| ++iter_; | ||
|
|
There was a problem hiding this comment.
Why did this have to be moved to SGDSolver?
There was a problem hiding this comment.
Based on the comment: "its value should always indicate the number of times the weights have been updated."
Because now it is possible to update weights without having to call step, this has to be moved so that also calling directly ApplyUpdate increases it. I think that this is even cleaner from the design perspective. The code is next to the code which is updating weights. Not somewhere else.
There was a problem hiding this comment.
Ideally the base Solver would enforce the increment to iter_, but since all Caffe solvers are SGD solvers and other solvers that depart from this mold aren't expected I think the move is a pragmatic choice that avoids unnecessary interface code.
|
Please run |
Thanks. I can rebase after that is merged.
Working on that. BTW, I cannot see results from Travis CI run. Are they set as private? |
|
Pushed linting fixes. |
shelhamer
left a comment
There was a problem hiding this comment.
Thanks @mitar for exposing solver updating to pycaffe. This has come up many times and been done in forks and branches and so it certainly deserves to be done in master. I vote for merge once the inheritance issue is resolved.
Although #6209 came first and makes further improvements, this PR is simpler b.c. of its reduced scope and is ready for immediate use. #6209 can be adjusted once this is merged. Thanks @Noiredd for porting all those improvements and reviewing this PR too.
| // Increment the internal iter_ counter -- its value should always indicate | ||
| // the number of times the weights have been updated. | ||
| ++iter_; | ||
|
|
There was a problem hiding this comment.
Ideally the base Solver would enforce the increment to iter_, but since all Caffe solvers are SGD solvers and other solvers that depart from this mold aren't expected I think the move is a pragmatic choice that avoids unnecessary interface code.
python/caffe/_caffe.cpp
Outdated
| shared_ptr<SGDSolver<Dtype> >, boost::noncopyable>( | ||
| "SGDSolver", bp::init<string>()); | ||
| "SGDSolver", bp::init<string>()) | ||
| .def("get_learning_rate", &SGDSolver<Dtype>::GetLearningRate); |
There was a problem hiding this comment.
No, this does not inherit correctly b.c. the bp::bases of the child SGDSolvers are set to Solver and not SGDSolver. Switching the NesteroveSolver and other children to have the SGDSolver base fixes the issue. I second incorporating the apply_update() test from c5b5b55, so could you cherry-pick it to give @nitnelave credit for it?
a sketch of `solver.step()` done out manually: 1. `solver.net.forward()` 2. `solver.net.backward()` 3. `solver.net.apply_update()` 4. `solver.net.clear_param_diffs()`
with update exposed it is important to increment the iteration when an update is made, whether by step or update alone. more fundementally, it's the update that defines an iterationa, so this is a natural place for the increment.
`solver.lr` is the effective learning rate in use while `solver.base_lr` is the configured learning rate at initialization. the solver parameter is now editable for setting fields that are in use throughout the lifetime of the solver, such as the maximum iteration.
|
@mitar I had time to do a pass so I updated this branch by rebasing on master, rewording for detail, and including a test for updating from pycaffe. I'm planning to merge once the tests pass to double-check my local results. |
|
Oh, thanks! |
|
What are plans for a release which would include this? Or do you have some daily build I could pip install without compiling the repo myself? |
|
The current state of affairs with master is that you have to compile it yourself.
There is no definite plan right now but I could imagine a release in early July to collect improvements since 1.0. I will bring it up with the core devs.
No, we do not have a daily build, although it would obviously be useful. I've never rigged up such a system myself, but perhaps a daily linux build (specifically for the Ubuntu LTS) would be feasible. |
You could even push artifacts from Travis CI. I think you also publish a PyPi package like other libraries do. Using manylinux and if you need more space there for the package, you can request it here. |
[pycaffe] expose interface for manual, step-by-step optimization
[pycaffe] expose interface for manual, step-by-step optimization
This fixes #3959 and #2686.
EDIT by @Noiredd: do not merge until #6209 is merged first (or else terrible conflicts will arise during
mastertoopenclpull).