Which Amazon EC2 instance for Deep Learning tasks?

I have discovered that Amazon offers a dedicated Deep Learning AMI with TensorFlow, Keras, etc. preinstalled (not to mention other prebuilt custom AMIs). I tried it with a typical job on several GPU-based instances to compare performance. There are five such instance types in the Ireland region (other regions may offer even more; I don't know, this variation is a bit confusing):

  • g2.2xlarge
  • g2.8xlarge
  • p2.xlarge
  • p2.8xlarge
  • p2.16xlarge

My first question is: what is the difference between the two groups (g-something and p-something)? Both groups mention "GPUs along with CPUs", but there is no further clue about their suitability for Deep Learning.

My second problem is that I have been running my job on g2.2xlarge and g2.8xlarge as well, and while the processing took quite a long time, GPU utilization was relatively low (20-40%). Why didn't the framework increase the workload if there was spare processor capacity? Is it necessary/possible to set any parameters to optimize the work?

Topic amazon-ml deep-learning

Category Data Science


[two images comparing the g2 and p2 instance types]

I think the differences and use cases are well illustrated above. As for the workload, there are settings that can help you optimise it. According to the official documentation, you can try:

  1. Enable persistence mode:

    sudo nvidia-smi -pm 1

  2. Disable the autoboost feature:

    sudo nvidia-smi --auto-boost-default=0

  3. Set all GPU clock speeds to their maximum frequency:

    sudo nvidia-smi -ac 2505,875
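Taken together, the steps above can be wrapped in a small setup script. A sketch, with one caveat: the `-ac 2505,875` clock values are the maximums for the K80 GPUs in p2 instances; the K520s in g2 instances support different values, which you can query with `nvidia-smi -q -d SUPPORTED_CLOCKS`.

```shell
#!/bin/bash
# Sketch: apply the GPU settings above on a p2 instance (K80 GPUs).
# Run as root (or via sudo); requires the NVIDIA driver to be installed.
set -e

# 1. Keep the driver loaded even when no process is using the GPU
nvidia-smi -pm 1

# 2. Disable autoboost so the clocks stay fixed
nvidia-smi --auto-boost-default=0

# 3. Pin memory and graphics clocks to their maximum
#    (K80 values; query supported clocks first on other GPUs)
nvidia-smi -ac 2505,875
```

You could run this once per boot, e.g. from a user-data script, so the settings survive instance restarts.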
