[BUGFIX] fix log_sigmoid bugs #20372
Conversation
Hey @Adnios, thanks for submitting the PR.
CI supported jobs: [edge, centos-gpu, sanity, unix-gpu, windows-gpu, miscellaneous, unix-cpu, centos-cpu, clang, windows-cpu, website]
@mxnet-bot run ci [unix-gpu]

Jenkins CI successfully triggered: [unix-gpu]

@mxnet-bot run ci [unix-gpu]

Jenkins CI successfully triggered: [unix-gpu]

@szha @bartekkuncer @bgawrych could you help review?
```cpp
)code" ADD_FILELINE)
.set_attr<FCompute>("FCompute<cpu>", UnaryOp::Compute<cpu, mshadow_op::log_sigmoid>)
.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_log_sigmoid"});
```
The previous version looks correct: `ElemwiseGradUseIn` makes the input of the elementwise function the input to the gradient function. Could you elaborate on which cases would fail, and why you need to change it to `ElemwiseGradUseOut` and change the definition?
I'm not sure how a scalar array would trigger the problem yet.
Hi @szha,
The reason a "scalar array would trigger the problem" is:
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/activation.cc#L126-L140
- If the input `x` is a scalar array, `SupportMKLDNNAct(param, inputs[0])` will return `false`.
- If the input `x` is a vector array, `SupportMKLDNNAct(param, inputs[0])` will return `true`, and the function `MKLDNNActivationBackward` will make it work. Maybe the following code takes effect (a simplified sketch of this dispatch follows below):
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/mkldnn/mkldnn_act.cc#L266-L272
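To make that dispatch concrete, here is a minimal standalone C++ sketch. Everything in it is illustrative: the `Tensor` stand-in, the `ndim >= 1` rule, and the function bodies are my simplification of `activation.cc`, not the real implementation.

```cpp
#include <cstdio>

// Stand-in for the real NDArray; only the rank matters here.
struct Tensor { int ndim; };

// Simplified stand-in for SupportMKLDNNAct: oneDNN activations
// reject 0-dim (scalar) arrays, so those fall back to the generic kernel.
bool SupportMKLDNNAct(const Tensor& in) { return in.ndim >= 1; }

// Simplified shape of the backward dispatch described above.
void ActivationGradComputeExCPU(const Tensor& in) {
  if (SupportMKLDNNAct(in)) {
    std::printf("MKLDNNActivationBackward path (uses y, gradient correct)\n");
  } else {
    std::printf("generic fallback path (was fed x, gradient wrong)\n");
  }
}

int main() {
  ActivationGradComputeExCPU(Tensor{0});  // scalar input -> buggy fallback
  ActivationGradComputeExCPU(Tensor{1});  // vector input -> oneDNN path
}
```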
There are two solutions to make scalar array input work.
- The input of `log_sigmoid_grad` should be `y`, so we can modify the following code, which takes `x` as its input. This is also what I am doing in this PR (see the sanity check after this list).
  https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/mshadow_op.h#L416
  ```cpp
  MXNET_UNARY_MATH_OP(log_sigmoid_grad, 1.0f - math::exp(a));
  ```
- Since `log_sigmoid_grad` takes `x` as input, we can also change the following code; with this change, `x` becomes the input to `log_sigmoid_grad`.
  https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/activation-inl.h#L207-L210
  ```cpp
  case activation::kLogSigmoid:
    ActivationBackward<xpu, mshadow_op::log_sigmoid, mshadow_op::log_sigmoid_grad>(
        ctx, inputs[0], inputs[2], req[0], outputs[0]);
    break;
  ```
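As a sanity check on solution_1 (a standalone sketch with made-up values, independent of MXNet): since `y = log(sigmoid(x))`, the derivative is `dy/dx = 1 - sigmoid(x) = 1 - exp(y)`, so the formula `1.0f - math::exp(a)` is only correct when `a` is the output `y`, not the input `x`:

```cpp
#include <cmath>
#include <cstdio>

double log_sigmoid(double x) { return -std::log1p(std::exp(-x)); }

int main() {
  const double x = 1.5, eps = 1e-6;
  const double y = log_sigmoid(x);
  // Central finite difference as the reference gradient.
  const double numeric = (log_sigmoid(x + eps) - log_sigmoid(x - eps)) / (2 * eps);
  std::printf("numeric   : %.8f\n", numeric);            // ~0.18242552
  std::printf("1 - exp(y): %.8f\n", 1.0 - std::exp(y));  // matches the numeric value
  std::printf("1 - exp(x): %.8f\n", 1.0 - std::exp(x));  // wrong (~ -3.48)
}
```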
I think solution_1 is better. For `y = log_sigmoid(x)`, it calculates `dx` based on `(dy, y)` instead of `(dy, x)`, which enables in-place operation during `y = log_sigmoid(x)` (i.e. `y` and `x` share the same memory).
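To illustrate the in-place point with a standalone sketch (again illustrative, not MXNet code): when backward only reads `y`, the forward pass is free to overwrite `x`'s buffer with `y`, and the gradient can still be computed from that single buffer:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  double buf = 1.5;                              // buf holds x
  buf = -std::log1p(std::exp(-buf));             // forward in-place: buf now holds y
  const double dy = 1.0;                         // incoming gradient
  const double dx = dy * (1.0 - std::exp(buf));  // backward needs only y, not x
  std::printf("dx = %.8f\n", dx);                // ~0.18242552, x no longer needed
}
```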
Another problem arose when I adopted solution_1: the gradient of `sym.log_sigmoid()` became wrong. The reason is that the input of `_backward_log_sigmoid` was `x`, while with solution_1 the input of `_backward_log_sigmoid` should be `y`. The source code of `sym.log_sigmoid()` is the following:
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/tensor/elemwise_unary_op_basic.cc#L152-L167
So I changed it to `ElemwiseGradUseOut`, following the source code of `sym.sigmoid()`.
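For reference, the switch amounts to a one-line change in the registration quoted at the top of this thread (sketched here, not the literal diff from the PR):

```diff
-.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseIn{"_backward_log_sigmoid"});
+.set_attr<nnvm::FGradient>("FGradient", ElemwiseGradUseOut{"_backward_log_sigmoid"});
```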
Thanks for the detailed analysis. The proposed change looks good and I have no further concerns.
Description
Solves #20371
Checklist
Essentials
Comments
- When the input's shape is `()`, the backward log_sigmoid activation is incorrect. For `y = log_sigmoid(x)`, the inputs for `log_sigmoid_grad` should be `(dy, y)`. This problem is similar to "_backward_softsign activation is incorrect" #10868.
- There is an error when running log_sigmoid on GPU; see `cudnn_activation-inl.h`:
  https://github.com/apache/incubator-mxnet/blob/da4ff3a4dc0bd6a54af3d75c492021d18ba1867b/src/operator/nn/cudnn/cudnn_activation-inl.h#L48-L65