Accelerated training with floating point fp16 #3

@milliema

Description

Thanks for the work!
I'd like to know whether the SAM optimizer is also applicable to accelerated training, i.e. automatic mixed precision (fp16). I tried to adopt SAM in my own training code with fp16 in PyTorch, but the loss becomes NaN and the computed grad norm is NaN. Regular training with SGD gives no error. So I'm wondering whether this is caused by some error in the PyTorch reimplementation or by a limitation of SAM itself.
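
For reference, here is a minimal sketch of how SAM's two-step update might be combined with `torch.cuda.amp.GradScaler`. It assumes a SAM implementation that exposes `first_step()` / `second_step()` (as in this repo); `model`, `criterion`, and `loader` are placeholder names, not code from the repo. The usual failure mode is computing the grad norm on the *scaled* fp16 gradients, which can overflow to inf/NaN, so the sketch unscales first and skips batches with non-finite gradients, the same way `GradScaler.step()` normally would:

```python
import torch
from sam import SAM  # assumed import of the SAM reimplementation

# Hypothetical setup: any small model and loss, with SAM wrapping SGD.
model = torch.nn.Linear(10, 2).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = SAM(model.parameters(), torch.optim.SGD, lr=0.1, momentum=0.9)
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:  # `loader` is a placeholder DataLoader
    inputs, targets = inputs.cuda(), targets.cuda()

    # First forward/backward pass: gradient at the current weights.
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()

    # Unscale before SAM computes ||g||; scaled fp16 grads can overflow to inf/NaN.
    scaler.unscale_(optimizer)
    grads_finite = all(
        torch.isfinite(p.grad).all()
        for group in optimizer.param_groups
        for p in group["params"]
        if p.grad is not None
    )

    if grads_finite:
        optimizer.first_step(zero_grad=True)  # w <- w + rho * g / ||g||

        # Second forward/backward pass: gradient at the perturbed weights.
        with torch.cuda.amp.autocast():
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()

        # GradScaler allows only one unscale_() per update(), so unscale the
        # second set of gradients manually before the base optimizer steps.
        inv_scale = 1.0 / scaler.get_scale()
        for group in optimizer.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.grad.mul_(inv_scale)
        optimizer.second_step(zero_grad=True)  # undo perturbation, then SGD step
    else:
        # Overflow in the first pass: drop this batch, as GradScaler.step() would,
        # so the NaNs never reach the weight perturbation.
        optimizer.zero_grad()

    scaler.update()  # adjusts the loss scale based on the recorded overflow checks
```

This is only a sketch under those assumptions; the point is that without the unscale and the finite-gradient check, a single overflowed batch perturbs the weights with NaN and the loss cannot recover.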
