Cuda's 'libcudaart.so' libraries cannot be loaded
I'm trying to run my tensorflow model (v2.4.1) in an AWS instance with CUDA drivers installed:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Tensorflow cannot load 'libcudart.so.11.0', however:
python3 -c 'import tensorflow as tf; print(tf.__version__)'
2021-04-29 07:36:15.555572: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-29 07:36:15.555616: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2.4.1
When I run my original script (I added from tensorflow.python.client import device_lib print(device_lib.list_local_devices())
) it prints more errors:
2021-04-29 07:39:38.636400: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-29 07:39:38.636444: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-04-29 07:39:40.336298: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-29 07:39:40.337286: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-29 07:39:40.338236: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-04-29 07:39:42.253057: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-04-29 07:39:42.253696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:00:1e.0 name: Tesla M60 computeCapability: 5.2
coreClock: 1.1775GHz coreCount: 16 deviceMemorySize: 7.44GiB deviceMemoryBandwidth: 149.31GiB/s
2021-04-29 07:39:42.253810: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.253891: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.253962: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.255917: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-04-29 07:39:42.256253: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-04-29 07:39:42.258383: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-04-29 07:39:42.258482: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.258559: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2021-04-29 07:39:42.258580: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-04-29 07:39:42.354292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-29 07:39:42.354329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-04-29 07:39:42.354348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
[name: /device:CPU:0
device_type: CPU
memory_limit: 268435456
locality {
}
incarnation: 851600583298219116
]
I tried to look for the 'libcudart.so.11.0' and it really doesn't exist. My system has the following library:
$ sudo find / -name libcudart.*
/usr/lib/x86_64-linux-gnu/libcudart.so
/usr/lib/x86_64-linux-gnu/libcudart.so.10.1.243
/usr/lib/x86_64-linux-gnu/libcudart.so.10.1
/usr/share/man/man7/libcudart.7.gz
/usr/share/man/man7/libcudart.so.7.gz
How can I solve this issue? I've seen people recommending using a symlink, but I'm a bit hesitant because I'm new to tensorflow.
Many thanks!
Topic tensorflow library
Category Data Science