torch.Tensor.backward
- Tensor.backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)
Computes the gradient of the current tensor with respect to graph leaves.

The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying a `gradient`. It should be a tensor of matching type and shape that represents the gradient of the differentiated function w.r.t. `self`.

This function accumulates gradients in the leaves; you might need to zero the `.grad` attributes or set them to `None` before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients. A minimal usage sketch is given after the parameter list below.

Note

If you run any forward ops, create `gradient`, and/or call `backward` in a user-specified CUDA stream context, see Stream semantics of backward passes.

Note

When `inputs` are provided and a given input is not a leaf, the current implementation will call its grad_fn (though it is not strictly needed to get these gradients). It is an implementation detail on which the user should not rely. See pytorch/pytorch#60521 for more details.

Parameters
- gradient (Tensor, optional) – The gradient of the function being differentiated w.r.t. `self`. This argument can be omitted if `self` is a scalar. Defaults to `None`.
- retain_graph (bool, optional) – If `False`, the graph used to compute the grads will be freed; if `True`, it will be retained. The default is `None`, in which case the value is inferred from `create_graph` (i.e., the graph is retained only when higher-order derivative tracking is requested). Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way.
- create_graph (bool, optional) – If `True`, a graph of the derivative will be constructed, allowing higher-order derivative products to be computed (see the higher-order sketch after this parameter list). Defaults to `False`.
- inputs (Sequence[Tensor], optional) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the current tensor (see the `inputs` sketch after this parameter list). Defaults to `None`.
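A minimal usage sketch (the tensor values are illustrative): a scalar output needs no `gradient` argument, a non-scalar output does, and gradients accumulate in leaf `.grad` attributes between calls.

```python
import torch

# Leaf tensor; backward() accumulates gradients into its .grad attribute.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Scalar output: no `gradient` argument is needed.
y = (x ** 2).sum()
y.backward()
print(x.grad)  # tensor([2., 4., 6.])

# Gradients accumulate across backward calls, so zero .grad
# (or set it to None) before the next pass.
x.grad = None

# Non-scalar output: supply a `gradient` tensor of matching type and shape
# (the vector in the vector-Jacobian product).
z = x ** 2
z.backward(gradient=torch.ones_like(z))
print(x.grad)  # tensor([2., 4., 6.])
```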
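A sketch of `retain_graph` and `create_graph`, using arbitrary example values: `retain_graph=True` keeps the graph alive for a second backward call, and `create_graph=True` records the backward pass itself so the resulting gradient can be differentiated again.

```python
import torch

# retain_graph=True keeps the graph so backward can run a second time;
# note that the two calls accumulate into w.grad.
w = torch.tensor(3.0, requires_grad=True)
out = w * w
out.backward(retain_graph=True)
out.backward()
print(w.grad)  # tensor(12.)  (6 + 6, accumulated)

# create_graph=True builds a graph of the backward pass, so the gradient
# itself can be differentiated (higher-order derivatives).
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
y.backward(create_graph=True)
first = x.grad.clone()  # dy/dx = 3 * x**2 = 12, still attached to the graph

x.grad = None
first.backward()        # differentiate the first derivative w.r.t. x
print(x.grad)           # d2y/dx2 = 6 * x = 12
```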
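A sketch of restricting accumulation with `inputs` (tensor names are illustrative): only the listed leaves get their `.grad` populated.

```python
import torch

a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
loss = a * b

# Accumulate only into a.grad; b.grad is left untouched (None here).
loss.backward(inputs=[a])
print(a.grad)  # tensor(2.)
print(b.grad)  # None
```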