First, some sanity-check questions: do you actually have a GPU in your local machine? (You didn't mention that explicitly.) I ask because Tensorflow's GPU support will not work on, for example, the integrated Intel graphics found in some laptops.
Second, you installed Keras and Tensorflow, but did you install the GPU version of Tensorflow? Using Anaconda, this would be done with the command:
conda install -c anaconda tensorflow-gpu
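Once it is installed, you can quickly verify that you actually got the GPU build of Tensorflow. This is just a minimal check using the Tensorflow 1.x test utilities (matching the tf.Session API used further down):

import tensorflow as tf

# True only if this Tensorflow build was compiled with CUDA support,
# i.e. you got the GPU package rather than the plain CPU one
print(tf.test.is_built_with_cuda())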
Other useful things to know:
- What operating system are you using? (I assume Linux, e.g. Ubuntu.)
- What GPU do you expect to be shown as available?
- Can you run the command:
nvidia-smi
in a terminal and update your question with the output? (A Python-side check is also sketched just below.)
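For the last point, it also helps to check from the Python side which devices Tensorflow itself can see. Here is a small sketch using the Tensorflow 1.x device_lib utility; a working setup should list a '/device:GPU:0' entry alongside the CPU:

from tensorflow.python.client import device_lib

# Prints one entry per device Tensorflow can use;
# a working GPU setup should include a '/device:GPU:0' entry
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)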
If you have installed the correct package (the method above is one of a few possible ways of doing it) and you have an Nvidia GPU available, Tensorflow will by default reserve almost all of the GPU's available memory as soon as it starts building the static graph.
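As an aside, if that default behaviour is a problem (for example, you share the GPU with other processes), you can ask Tensorflow to allocate memory on demand instead. This is only a minimal sketch using the standard ConfigProto options in Tensorflow 1.x together with the Keras Tensorflow backend:

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand instead of reserving (almost) all of it up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Hand the configured session to Keras so your model picks it up
K.set_session(tf.Session(config=config))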
If you are not already doing so, it is probably a good idea to use a conda environment, which keeps the requirements of your model separate from whatever your system might already have installed. Have a look at this nice little walkthrough on getting started - it is likely a good test of whether your system can run models on a GPU, as it removes all other possible problems created by components unrelated to your script. In short, create and activate a new environment that includes the GPU version of Tensorflow like this:
conda create --name gpu_test tensorflow-gpu # creates the env and installs tf
conda activate gpu_test # activate the env
python test_gpu_script.py # run the script given below
UPDATE
I would suggest running a small script that executes a few operations in Tensorflow on a CPU and on a GPU. This will rule out the possibility that you simply don't have enough GPU memory for the RNN you're trying to train.
I made a script to do that; it times the same operations on a CPU and on a GPU and prints a summary. You should be able to just copy-paste the code and run it:
import numpy as np
import tensorflow as tf
from datetime import datetime

# Choose which devices you want to test on: either 'cpu' or 'gpu'
devices = ['cpu', 'gpu']

# Choose the sizes of the matrices to be used.
# Make them bigger to see bigger benefits of parallel computation
shapes = [(50, 50), (500, 500), (1000, 1000), (10000, 10000)]


def compute_operations(device, shape):
    """Run a simple set of operations on a matrix of the given shape on the given device

    Parameters
    ----------
    device : the type of device to use, either 'cpu' or 'gpu'
    shape : a tuple for the shape of a 2d tensor, e.g. (10, 10)

    Returns
    -------
    out : result of the operations and the time taken
    """
    # Define operations to be computed on the selected device
    with tf.device('/{}:0'.format(device)):
        random_matrix = tf.random_uniform(shape=shape, minval=0, maxval=1)
        dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
        sum_operation = tf.reduce_sum(dot_operation)

    # Time the actual runtime of the operations
    start_time = datetime.now()
    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
        result = session.run(sum_operation)
    elapsed_time = datetime.now() - start_time

    return result, elapsed_time


if __name__ == '__main__':

    # Run the computations and print a summary of each run
    for device in devices:
        print("--" * 20)

        for shape in shapes:
            _, time_taken = compute_operations(device, shape)

            # Print the shape, the device used and the time taken
            print("Computation on shape:", shape, "using Device:", device,
                  "took: {:.2f}s".format(time_taken.seconds + time_taken.microseconds / 1e6))

    print("--" * 20)
Results from running on CPU:
Computation on shape: (50, 50), using Device: 'cpu' took: 0.04s
Computation on shape: (500, 500), using Device: 'cpu' took: 0.05s
Computation on shape: (1000, 1000), using Device: 'cpu' took: 0.09s
Computation on shape: (10000, 10000), using Device: 'cpu' took: 32.81s
Results from running on GPU:
Computation on shape: (50, 50), using Device: 'gpu' took: 0.03s
Computation on shape: (500, 500), using Device: 'gpu' took: 0.04s
Computation on shape: (1000, 1000), using Device: 'gpu' took: 0.04s
Computation on shape: (10000, 10000), using Device: 'gpu' took: 0.05s