Is there a reason not to work with AMP (automatic mixed precision)?
According to "Introducing native PyTorch automatic mixed precision for faster training on NVIDIA GPUs", it's better to use AMP (16-bit floating point, FP16) due to:
- Shorter training time.
- Lower memory requirements, enabling larger batch sizes, larger models, or larger inputs.
So:

- Is there a reason not to work with FP16?
- For which models / datasets / solutions will we need to use FP32 and not FP16?
- Can I find an example on Kaggle where we must use FP32 and not FP16?
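For context, here is a minimal sketch of the AMP training pattern I'm asking about (the model, optimizer, and data are toy placeholders, just to illustrate):

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

for step in range(100):
    inputs = torch.randn(32, 128, device=device)
    targets = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()

    # Forward pass runs in mixed precision: FP16 where it is considered safe,
    # FP32 for ops that need the extra range/precision.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)

    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then runs the optimizer step
    scaler.update()                # adjusts the scale factor for the next iteration
```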