How to make my Neural Network run on GPU instead of CPU

I have installed Anaconda3 and the latest versions of Keras and Tensorflow.

Running this command:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

I find the notebook is running on the CPU:

[name: "/device:CPU:0"
 device_type: "CPU"
 memory_limit: 268435456
 locality { }
 incarnation: 2888773992010593937
]

This is my CUDA version (output of nvcc --version):

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:04_Central_Daylight_Time_2018
Cuda compilation tools, release 10.0, V10.0.130

Running nvidia-smi, I get this result:

I want to train the neural network on the GPU. Please help me switch from CPU to GPU.



First, some stupid sanity-check questions: do you have a GPU in your local machine? (You didn't mention that explicitly.) I ask because it will not work on, e.g., an integrated Intel graphics card found in some laptops.

Second, you installed Keras and Tensorflow, but did you install the GPU version of Tensorflow? Using Anaconda, this would be done with the command:

conda install -c anaconda tensorflow-gpu 
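
If you are not sure which build ended up installed, a quick way to check from Python is the sketch below (it uses Tensorflow 1.x test utilities; adjust if your version differs):

import tensorflow as tf

# False here means a CPU-only build of Tensorflow is installed,
# in which case the GPU can never be used, regardless of drivers.
print("Built with CUDA:", tf.test.is_built_with_cuda())

# True only if a GPU device is present, the CUDA libraries load,
# and Tensorflow can actually use the card.
print("GPU available:", tf.test.is_gpu_available())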

Other useful things to know:

  • What operating system are you using? (I assume Linux, e.g. Ubuntu.)
  • What GPU do you expect to be shown as available?
  • Can you run the command nvidia-smi in a terminal and update your question with the output?

If you have installed the correct package (the above method is one of a few possible ways of doing it) and you have an Nvidia GPU available, Tensorflow will by default reserve essentially all of the GPU's memory as soon as it starts building the static graph.
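
If that default behaviour is a problem, for example because you share the GPU with other processes, you can ask a session to allocate memory on demand instead. A minimal sketch using the same Tensorflow 1.x Session/ConfigProto API as the script further down (the graph-building part is just a placeholder):

import tensorflow as tf

# Grow GPU memory usage on demand instead of reserving
# the whole card as soon as the session starts.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    pass  # build and run your graph here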


If you were not already, it is probably a good idea to use a conda environment, which keeps the requirements of your model separate from whatever your system might already have installed. Have a look at this nice little walkthrough on getting started - this will likely be a good test to see whether your system is able to run models on a GPU, as it removes all other possible problems created by components unrelated to your script. In short, create and activate a new environment that includes the GPU version of Tensorflow like this:

conda create --name gpu_test tensorflow-gpu    # creates the env and installs tf
conda activate gpu_test                        # activate the env
python test_gpu_script.py                      # run the script given below

UPDATE

I would suggest running a small script that executes a few operations in Tensorflow on the CPU and on the GPU. This will rule out the possibility that you simply don't have enough memory for the network you're trying to train.

I made a script that runs the same operations on a CPU and on a GPU and prints a summary. You should be able to just copy-paste the code and run it:

import numpy as np
import tensorflow as tf
from datetime import datetime

# Devices to test: both 'cpu' and 'gpu' (Tensorflow accepts these short device strings)
devices = ['cpu', 'gpu']

# Shapes of the square matrices to be used.
# Make them bigger to see bigger benefits from parallel computation
shapes = [(50, 50), (500, 500), (1000, 1000), (10000, 10000)]


def compute_operations(device, shape):
    """Run a simple set of operations on a matrix of given shape on given device

    Parameters
    ----------
    device : the type of device to use, either 'cpu' or 'gpu' 
    shape : a tuple for the shape of a 2d tensor, e.g. (10, 10)

    Returns
    -------
    out : tuple of (result of the operations, time taken to run them)
    """

    # Define operations to be computed on selected device
    with tf.device(device):
        random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
        dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
        sum_operation = tf.reduce_sum(dot_operation)

    # Time the actual runtime of the operations
    start_time = datetime.now()
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
        result = session.run(sum_operation)
    elapsed_time = datetime.now() - start_time

    return result, elapsed_time



if __name__ == '__main__':

    # Run the computations and print summary of each run
    for device in devices:
        print("--" * 20)

        for shape in shapes:
            _, time_taken = compute_operations(device, shape)

            # Print the shape, device and time taken for this run
            print("Computation on shape:", shape, "using Device:", device,
                  "took: {:.2f}s".format(time_taken.seconds + time_taken.microseconds / 1e6))

    print("--" * 20)

Results from running on CPU:

Computation on shape: (50, 50),       using Device: 'cpu' took: 0.04s
Computation on shape: (500, 500),     using Device: 'cpu' took: 0.05s
Computation on shape: (1000, 1000),   using Device: 'cpu' took: 0.09s
Computation on shape: (10000, 10000), using Device: 'cpu' took: 32.81s

Results from running on GPU:

Computation on shape: (50, 50),       using Device: 'gpu' took: 0.03s
Computation on shape: (500, 500),     using Device: 'gpu' took: 0.04s
Computation on shape: (1000, 1000),   using Device: 'gpu' took: 0.04s
Computation on shape: (10000, 10000), using Device: 'gpu' took: 0.05s
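
Once the GPU build of Tensorflow is installed and the script above shows the GPU being used, Keras itself needs no code changes: with the Tensorflow backend it places operations on the GPU automatically. As a final check, here is a minimal sketch (layer sizes and random data are arbitrary, purely for illustration) that you can run while watching nvidia-smi in another terminal:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Tiny throwaway model on random data, just to confirm GPU usage
x = np.random.rand(10000, 100)
y = np.random.randint(0, 2, size=(10000, 1))

model = Sequential([
    Dense(256, activation='relu', input_shape=(100,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# While this runs, nvidia-smi should show a python process with
# non-zero GPU memory usage and utilisation if the GPU is in use.
model.fit(x, y, epochs=3, batch_size=256)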
