Skip to content

Expose necessary interface to guide manual SGD process from Python#6238

Merged
shelhamer merged 4 commits intoBVLC:masterfrom
mitar:manual-sgd
Jun 8, 2018
Merged

Expose necessary interface to guide manual SGD process from Python#6238
shelhamer merged 4 commits intoBVLC:masterfrom
mitar:manual-sgd

Conversation

@mitar
Copy link
Contributor

@mitar mitar commented Feb 14, 2018

This fixes #3959 and #2686.

EDIT by @Noiredd: do not merge until #6209 is merged first (or else terrible conflicts will arise during master to opencl pull).

@mitar mitar mentioned this pull request Feb 14, 2018
Copy link
Member

@Noiredd Noiredd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good addition, I was about to implement something like this myself but first we need to pull some Pycaffe changes from the OpenCL branch to synchronize the whole repository. For this reason I recommend holding this until #6209 is reviewed and merged.

In the meantime, please address a few questions I've left in the code.

@BVLC BVLC deleted a comment from mitar Feb 14, 2018
@mitar
Copy link
Contributor Author

mitar commented Feb 14, 2018

Thanks for the quick code review!

Copy link
Member

@Noiredd Noiredd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion @mitar, I am afraid I accidentally removed your comment. Sorry about that, I only wanted to merge my review comments for brevity.

.add_property("display", &SolverParameter::display)
.add_property("layer_wise_reduce", &SolverParameter::layer_wise_reduce);
.add_property("layer_wise_reduce", &SolverParameter::layer_wise_reduce)
.add_property("base_lr", &SolverParameter::base_lr, &SolverParameter::set_base_lr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is included in #6209 which will be merged prior to any other changes to _caffe.cpp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oho, looking at #6209 looks like everything in this pull request is done also there, no?

shared_ptr<SGDSolver<Dtype> >, boost::noncopyable>(
"SGDSolver", bp::init<string>());
"SGDSolver", bp::init<string>())
.def("get_learning_rate", &SGDSolver<Dtype>::GetLearningRate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other solvers, will they get this property only due to the inheritance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other solvers extending from SGDSolver? I think only them will get this, but this is OK, no? this is on purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I do a = caffe.AdamSolver("solver.prototxt") in my pycaffe script, will I be able to call a.get_learning_rate() or not?

Copy link
Contributor Author

@mitar mitar Feb 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the answer is yes, but I have not tested it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to know this. Maybe consider implementing some tests for this change? See #4342, it is somewhat similar to this (a bit more clumsy) but also had some very simple tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this does not inherit correctly b.c. the bp::bases of the child SGDSolvers are set to Solver and not SGDSolver. Switching the NesteroveSolver and other children to have the SGDSolver base fixes the issue. I second incorporating the apply_update() test from c5b5b55, so could you cherry-pick it to give @nitnelave credit for it?

// Increment the internal iter_ counter -- its value should always indicate
// the number of times the weights have been updated.
++iter_;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this have to be moved to SGDSolver?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the comment: "its value should always indicate the number of times the weights have been updated."

Because now it is possible to update weights without having to call step, this has to be moved so that also calling directly ApplyUpdate increases it. I think that this is even cleaner from the design perspective. The code is next to the code which is updating weights. Not somewhere else.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the base Solver would enforce the increment to iter_, but since all Caffe solvers are SGD solvers and other solvers that depart from this mold aren't expected I think the move is a pragmatic choice that avoids unnecessary interface code.

@Noiredd
Copy link
Member

Noiredd commented Feb 14, 2018

Please run make lint and fix the style issues and we will resume the review.
Also, I'm not sure if you noticed but I accidentally removed your comment - please repost it if needed.

@mitar
Copy link
Contributor Author

mitar commented Feb 14, 2018

Good addition, I was about to implement something like this myself but first we need to pull some Pycaffe changes from the OpenCL branch to synchronize the whole repository. For this reason I recommend holding this until #6209 is reviewed and merged.

Thanks. I can rebase after that is merged.

Please run make lint and fix the style issues and we will resume the review.

Working on that.

BTW, I cannot see results from Travis CI run. Are they set as private?

@mitar
Copy link
Contributor Author

mitar commented Feb 14, 2018

Pushed linting fixes.

@shelhamer
Copy link
Member

Thanks @mitar! I have been meaning to make a PR like this myself for some time. I'll take a look at both #6209 and this soon.

Copy link
Member

@shelhamer shelhamer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mitar for exposing solver updating to pycaffe. This has come up many times and been done in forks and branches and so it certainly deserves to be done in master. I vote for merge once the inheritance issue is resolved.

Although #6209 came first and makes further improvements, this PR is simpler b.c. of its reduced scope and is ready for immediate use. #6209 can be adjusted once this is merged. Thanks @Noiredd for porting all those improvements and reviewing this PR too.

// Increment the internal iter_ counter -- its value should always indicate
// the number of times the weights have been updated.
++iter_;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally the base Solver would enforce the increment to iter_, but since all Caffe solvers are SGD solvers and other solvers that depart from this mold aren't expected I think the move is a pragmatic choice that avoids unnecessary interface code.

shared_ptr<SGDSolver<Dtype> >, boost::noncopyable>(
"SGDSolver", bp::init<string>());
"SGDSolver", bp::init<string>())
.def("get_learning_rate", &SGDSolver<Dtype>::GetLearningRate);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this does not inherit correctly b.c. the bp::bases of the child SGDSolvers are set to Solver and not SGDSolver. Switching the NesteroveSolver and other children to have the SGDSolver base fixes the issue. I second incorporating the apply_update() test from c5b5b55, so could you cherry-pick it to give @nitnelave credit for it?

mitar and others added 4 commits June 7, 2018 15:13
a sketch of `solver.step()` done out manually:

1. `solver.net.forward()`
2. `solver.net.backward()`
3. `solver.net.apply_update()`
4. `solver.net.clear_param_diffs()`
with update exposed it is important to increment the iteration when an
update is made, whether by step or update alone. more fundementally,
it's the update that defines an iterationa, so this is a natural place
for the increment.
`solver.lr` is the effective learning rate in use while `solver.base_lr`
is the configured learning rate at initialization. the solver parameter
is now editable for setting fields that are in use throughout the
lifetime of the solver, such as the maximum iteration.
@shelhamer
Copy link
Member

@mitar I had time to do a pass so I updated this branch by rebasing on master, rewording for detail, and including a test for updating from pycaffe. I'm planning to merge once the tests pass to double-check my local results.

@mitar
Copy link
Contributor Author

mitar commented Jun 8, 2018

Oh, thanks!

@shelhamer shelhamer merged commit 2a1c552 into BVLC:master Jun 8, 2018
@mitar
Copy link
Contributor Author

mitar commented Jun 8, 2018

What are plans for a release which would include this? Or do you have some daily build I could pip install without compiling the repo myself?

@shelhamer
Copy link
Member

The current state of affairs with master is that you have to compile it yourself.

What are plans for a release which would include this?

There is no definite plan right now but I could imagine a release in early July to collect improvements since 1.0. I will bring it up with the core devs.

Or do you have some daily build I could pip install without compiling the repo myself?

No, we do not have a daily build, although it would obviously be useful. I've never rigged up such a system myself, but perhaps a daily linux build (specifically for the Ubuntu LTS) would be feasible.

@mitar
Copy link
Contributor Author

mitar commented Jun 8, 2018

No, we do not have a daily build, although it would obviously be useful. I've never rigged up such a system myself, but perhaps a daily linux build (specifically for the Ubuntu LTS) would be feasible.

You could even push artifacts from Travis CI.

I think you also publish a PyPi package like other libraries do. Using manylinux and if you need more space there for the package, you can request it here.

sjb7749 pushed a commit to sjb7749/caffe that referenced this pull request Jul 2, 2018
[pycaffe] expose interface for manual, step-by-step optimization
XinYao1994 pushed a commit to XinYao1994/caffe that referenced this pull request Aug 29, 2018
[pycaffe] expose interface for manual, step-by-step optimization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python manual sgd

4 participants