CUDA's 'libcudart.so' libraries cannot be loaded

I'm trying to run my TensorFlow model (v2.4.1) on an AWS instance with CUDA installed. nvcc --version reports:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

However, TensorFlow cannot load 'libcudart.so.11.0':

$ python3 -c 'import tensorflow as tf; print(tf.__version__)'
2021-04-29 07:36:15.555572: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-29 07:36:15.555616: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2.4.1

When I run my original script, to which I added

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

it prints more errors:

2021-04-29 07:39:38.636400: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-29 07:39:38.636444: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-04-29 07:39:40.336298: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-29 07:39:40.337286: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-29 07:39:40.338236: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-29 07:39:42.253057: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-29 07:39:42.253696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla M60 computeCapability: 5.2
coreClock: 1.1775GHz coreCount: 16 deviceMemorySize: 7.44GiB deviceMemoryBandwidth: 149.31GiB/s
2021-04-29 07:39:42.253810: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.253891: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.253962: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.255917: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-29 07:39:42.256253: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-29 07:39:42.258383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-29 07:39:42.258482: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.258559: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.258580: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-29 07:39:42.354292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-29 07:39:42.354329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-04-29 07:39:42.354348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
[name: /device:CPU:0
device_type: CPU
memory_limit: 268435456
locality {
}
incarnation: 851600583298219116
]

I searched for 'libcudart.so.11.0', and it really doesn't exist. My system only has the following libraries:

$ sudo find / -name 'libcudart.*'
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/libcudart.so.10.1.243
/usr/lib/x86_64-linux-gnu/libcudart.so.10.1
/usr/share/man/man7/libcudart.7.gz
/usr/share/man/man7/libcudart.so.7.gz
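A file search shows what exists on disk. It does not show what the dynamic loader can actually resolve by soname. The following is a minimal sketch that uses Python's standard ctypes module to try to dlopen each library that TensorFlow 2.4.1 asked for in the log. The soname list is copied from that log. The helper name probe_libraries is hypothetical. The sketch assumes a Linux system where ctypes.CDLL goes through dlopen() and its standard search paths.

```python
import ctypes

# Sonames that TensorFlow 2.4.1 tried to dlopen in the log in the question.
TF_241_GPU_LIBS = [
    "libcuda.so.1",
    "libcudart.so.11.0",
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.11",
    "libcudnn.so.8",
]


def probe_libraries(names):
    """Map each soname to None if it loads, else to the loader's error text (hypothetical helper)."""
    results = {}
    for name in names:
        try:
            # ctypes.CDLL calls dlopen(), which searches LD_LIBRARY_PATH,
            # the ld.so cache and the default library directories.
            ctypes.CDLL(name)
            results[name] = None
        except OSError as err:
            results[name] = str(err)
    return results


if __name__ == "__main__":
    for name, error in probe_libraries(TF_241_GPU_LIBS).items():
        status = "OK" if error is None else "MISSING"
        print(f"{status:8}{name}")
```

On the system described above, libcudart.so.11.0, libcublas.so.11, libcublasLt.so.11, libcusparse.so.11 and libcudnn.so.8 should be reported as missing, matching the TensorFlow warnings.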

How can I solve this issue? I've seen people recommend creating a symlink, but I'm hesitant because I'm new to TensorFlow.

Many thanks!



It seems that you have CUDA 10.1 installed, not CUDA 11, which is what TensorFlow 2.4.1 is looking for. TensorFlow's GPU support has rather specific version requirements.
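To confirm the mismatch, a minimal sketch could compare two values:

- the CUDA release reported by nvcc --version, as shown in the question
- the CUDA version TensorFlow was built against

The sketch assumes TensorFlow 2.3 or newer, where tf.sysconfig.get_build_info() is available. The helper names are hypothetical. Keep in mind that nvcc reports the toolkit on your PATH, which is not necessarily the runtime library TensorFlow ends up loading.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract e.g. '10.1' from the 'release 10.1, V10.1.243' line of nvcc --version."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_major(a, b):
    """Compare only the major component of two 'X.Y' version strings."""
    return a.split(".")[0] == b.split(".")[0]


def tensorflow_build_cuda():
    """CUDA version TensorFlow was built against, or None if TensorFlow is unavailable."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # Assumption: TensorFlow >= 2.3, which provides tf.sysconfig.get_build_info().
    return tf.sysconfig.get_build_info().get("cuda_version")


if __name__ == "__main__":
    built_for = tensorflow_build_cuda()
    installed = None
    nvcc_path = shutil.which("nvcc")
    if nvcc_path:
        out = subprocess.run([nvcc_path, "--version"], capture_output=True, text=True).stdout
        installed = parse_nvcc_release(out)
    print(f"TensorFlow built against CUDA {built_for}; nvcc reports CUDA {installed}")
    if built_for and installed and not same_major(built_for, installed):
        print("Major versions differ: TensorFlow will likely not find the libraries it needs")
```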

TensorFlow 2.4.1 requires CUDA 11.0 and cuDNN 8 (see the GPU requirements table in the TensorFlow documentation). I would also suggest checking your NVIDIA driver version: 450.x or higher is required.
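The driver check can be scripted. The following is a minimal sketch that asks nvidia-smi for the driver version of the first GPU and compares its major number with the 450 minimum stated above. It assumes nvidia-smi is on the PATH. The helper names are hypothetical.

```python
import shutil
import subprocess

# Minimum driver major version for CUDA 11, as stated in the answer.
MIN_DRIVER_MAJOR = 450


def driver_meets_minimum(version, min_major=MIN_DRIVER_MAJOR):
    """True if a version string such as '450.80.02' has a major number >= min_major."""
    return int(version.strip().split(".")[0]) >= min_major


def query_driver_version():
    """Driver version nvidia-smi reports for the first GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    lines = proc.stdout.strip().splitlines()
    if proc.returncode != 0 or not lines:
        return None
    return lines[0]  # one line per GPU; all GPUs share the same driver


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not available; cannot determine the driver version")
    else:
        status = "OK" if driver_meets_minimum(version) else "too old for CUDA 11"
        print(f"NVIDIA driver {version}: {status}")
```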

The official TensorFlow GPU guide (https://www.tensorflow.org/install/gpu, also linked in your log) explains how to install all the requirements for GPU training and inference. Follow the instructions for the OS you are running.

If you just want to update CUDA via conda, you could run conda install cudatoolkit=11.0.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.