Why is TensorFlow LSTM training slower on a machine with far better components?
Training an LSTM with the exact same code and dataset on two machines with different hardware should give different training times. In my case, however, the result was the opposite of what I expected: the more powerful machine was slower. Is there a reason for this? Perhaps I'm not making full use of the second machine.
Both machines are running identical versions of CUDA (10.1), cuDNN (7.6.5.32), and Python (3.8), along with the relevant modules, all installed at the same time a few days ago (tensorflow, tensorflow-gpu, keras, scikit-learn, numpy, pandas, finnhub-python).
The first machine is a laptop with a mobile Intel Core i7 and a GTX 1080. Here are the results (about 13s per epoch):
C:\Users\Keanu\AppData\Local\Programs\Python\Python38\python.exe C:/Users/Keanu/PycharmProjects/OptionPriceLookback/Backmon/Backmon.py
2020-06-19 12:39:56.922182: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Using TensorFlow backend.
Train size: (9004,)
Test size: (3000,)
2020-06-19 12:39:59.604040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-06-19 12:39:59.618801: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.771GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-06-19 12:39:59.618962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-19 12:39:59.623763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 12:39:59.627131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-19 12:39:59.628207: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-19 12:39:59.631473: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-19 12:39:59.633438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-19 12:39:59.639704: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-19 12:39:59.639846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-19 12:39:59.640277: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-19 12:39:59.646988: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x24845fa17f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-19 12:39:59.647131: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-19 12:39:59.647314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.771GHz coreCount: 20 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 298.32GiB/s
2020-06-19 12:39:59.647464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-19 12:39:59.647544: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 12:39:59.647616: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-19 12:39:59.647688: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-19 12:39:59.647851: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-19 12:39:59.647924: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-19 12:39:59.648039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-19 12:39:59.648156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-19 12:40:00.057990: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-19 12:40:00.058074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-06-19 12:40:00.058121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-06-19 12:40:00.058289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6280 MB memory) - physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2020-06-19 12:40:00.060791: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2486ed35b50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-19 12:40:00.060890: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
Epoch 1/5
2020-06-19 12:40:05.451220: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 12:40:05.622984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
1940/1940 - 14s - loss: 0.0044
Epoch 2/5
1940/1940 - 14s - loss: 0.0025
Epoch 3/5
1940/1940 - 13s - loss: 0.0022
Epoch 4/5
1940/1940 - 13s - loss: 0.0018
Epoch 5/5
1940/1940 - 13s - loss: 0.0018
Process finished with exit code 0
The second machine is a full desktop with two RTX 2080 Tis and an overclocked Intel Core i7-8700K. Here are the results (18s per epoch):
C:\Users\Keanu\AppData\Local\Programs\Python\Python38\python.exe C:/Users/Keanu/PycharmProjects/DeepLearning/playground.py
2020-06-19 15:48:22.355678: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Using TensorFlow backend.
Train size: (9003,)
Test size: (3000,)
2020-06-19 15:48:24.766386: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-06-19 15:48:24.817440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:48:24.817765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:48:24.817897: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-19 15:48:24.821972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 15:48:24.824697: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-19 15:48:24.825604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-19 15:48:24.828585: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-19 15:48:24.830096: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-19 15:48:24.841935: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-19 15:48:24.842761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
2020-06-19 15:48:24.843164: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-19 15:48:24.849308: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x26ff9825290 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-19 15:48:24.849407: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-19 15:48:25.203999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:48:25.204287: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:48:25.204421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-19 15:48:25.204489: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 15:48:25.204557: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-19 15:48:25.204625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-19 15:48:25.204692: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-19 15:48:25.204763: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-19 15:48:25.204831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-19 15:48:25.205450: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
2020-06-19 15:48:25.915277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-19 15:48:25.915355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0 1
2020-06-19 15:48:25.915400: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N Y
2020-06-19 15:48:25.915445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1: Y N
2020-06-19 15:48:25.916204: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8513 MB memory) - physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-06-19 15:48:25.917148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 8513 MB memory) - physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:02:00.0, compute capability: 7.5)
2020-06-19 15:48:25.919099: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x26fb09bc490 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-19 15:48:25.919190: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-06-19 15:48:25.919259: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce RTX 2080 Ti, Compute Capability 7.5
Epoch 1/5
2020-06-19 15:48:30.751854: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 15:48:31.004408: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
1940/1940 - 18s - loss: 0.0041
Epoch 2/5
1940/1940 - 18s - loss: 0.0025
Epoch 3/5
1940/1940 - 18s - loss: 0.0020
Epoch 4/5
1940/1940 - 18s - loss: 0.0018
Epoch 5/5
1940/1940 - 18s - loss: 0.0017
Process finished with exit code 0
Here's a run on the same desktop using only one of the GPUs (again 18s per epoch). Note that although two GPUs are detected at the start, only one is listed as a StreamExecutor device:
C:\Users\Keanu\AppData\Local\Programs\Python\Python38\python.exe C:/Users/Keanu/PycharmProjects/DeepLearning/playground.py
2020-06-19 15:52:22.600070: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Using TensorFlow backend.
2020-06-19 15:52:23.889151: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-06-19 15:52:23.935351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:52:23.935655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties:
pciBusID: 0000:02:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:52:23.935793: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-19 15:52:23.939843: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 15:52:23.942405: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-19 15:52:23.943265: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-19 15:52:23.946439: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-19 15:52:23.948040: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-19 15:52:23.959539: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-19 15:52:23.965942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1
Train size: (9003,)
Test size: (3000,)
2020-06-19 15:52:25.432354: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-06-19 15:52:25.438156: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x152c9051da0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-19 15:52:25.438306: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-19 15:52:25.438750: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.65GHz coreCount: 68 deviceMemorySize: 11.00GiB deviceMemoryBandwidth: 573.69GiB/s
2020-06-19 15:52:25.438885: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
2020-06-19 15:52:25.438956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 15:52:25.439026: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cufft64_10.dll
2020-06-19 15:52:25.439093: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library curand64_10.dll
2020-06-19 15:52:25.439158: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusolver64_10.dll
2020-06-19 15:52:25.439226: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cusparse64_10.dll
2020-06-19 15:52:25.439307: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-19 15:52:25.439688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2020-06-19 15:52:26.034751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-19 15:52:26.034836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-06-19 15:52:26.034882: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2020-06-19 15:52:26.035478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8513 MB memory) - physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-06-19 15:52:26.037626: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x152f50b35b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-19 15:52:26.037730: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
Epoch 1/5
2020-06-19 15:52:30.912600: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-06-19 15:52:31.166746: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
1940/1940 - 18s - loss: 0.0046
Epoch 2/5
1940/1940 - 18s - loss: 0.0023
Epoch 3/5
1940/1940 - 18s - loss: 0.0020
Epoch 4/5
1940/1940 - 18s - loss: 0.0018
Epoch 5/5
1940/1940 - 18s - loss: 0.0016
Process finished with exit code 0
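For anyone who wants to reproduce the single-GPU comparison, here is a minimal sketch of one way to limit which GPUs a process can see. It is not necessarily how the run above was restricted: it relies on the `CUDA_VISIBLE_DEVICES` environment variable, which hides devices from CUDA before TensorFlow enumerates them, so unlike the log above only one GPU would be detected at all. The helper name `restrict_visible_gpus` is made up for illustration.

```python
import os


def restrict_visible_gpus(indices):
    """Expose only the given GPU indices to CUDA (hypothetical helper).

    Sets CUDA_VISIBLE_DEVICES and returns the value that was written.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# This must run before TensorFlow (or Keras) is imported,
# otherwise the devices have already been enumerated.
restrict_visible_gpus([0])
```

Passing an empty list sets the variable to an empty string, which should hide all GPUs and force a CPU-only run, useful as a third data point.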
Code, if interested (you should be able to run it as-is with Python 3.8 and all the modules installed):
import finnhub
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
#for deep learning model
from keras import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import math
# Configure API key
configuration = finnhub.Configuration(
    api_key={
        'token': 'brm0q6vrh5re8ma1ote0'  # Replace this
    }
)
finnhub_client = finnhub.DefaultApi(finnhub.ApiClient(configuration))
ticker = 'PENN'
start_date = datetime.fromisoformat('2019-11-04')
end_date = datetime.fromisoformat('2020-06-10')
stocks = finnhub_client.stock_candles(ticker, '5', int(start_date.timestamp()), int(end_date.timestamp()),adjusted=True)
keys = ['o','c','h','l','v','t']
closed_candles = stocks.c
timestamps = stocks.t
train_size = 0.75
test_size = 0.25
df = pd.DataFrame(zip(closed_candles,timestamps),columns=['close','dates'])
df.dates = df.dates.apply(lambda x: datetime.fromtimestamp(x) )
train_set = df.close[:math.ceil(len(df.index)*0.75)].values
test_set = df.close[math.ceil(len(df.index)*0.75)+1:].values
print('Train size:', train_set.shape)
print('Test size:', test_set.shape)
plt.plot_date(df.dates, df.close,fmt='-')
plt.suptitle(ticker)
plt.show()
sc = MinMaxScaler()
train_set_scaled = sc.fit_transform(np.array(train_set).reshape(-1,1))
x_train = []
y_train = []
# Build 1940 training windows: 60 past closes as input, the next close as target
for i in range(60,2000):
    x_train.append(train_set_scaled[i-60:i,0])
    y_train.append(train_set_scaled[i,0])
x_train = np.array(x_train)
y_train = np.array(y_train)
x_train = np.reshape(x_train,(x_train.shape[0],x_train.shape[1],1))
reg = Sequential()
reg.add(LSTM(units = 50,return_sequences=True,input_shape=(x_train.shape[1],1)))
reg.add(Dropout(0.2))
reg.add(LSTM(units = 50,return_sequences=True))
reg.add(Dropout(0.2))
reg.add(LSTM(units = 50,return_sequences=True))
reg.add(Dropout(0.2))
reg.add(LSTM(units=50))
reg.add(Dropout(0.2))
reg.add(Dense(units=1))
reg.compile(optimizer = 'adam',loss='mean_squared_error')
reg.fit(x_train,y_train, epochs=5, batch_size =1,verbose=2)
# Named `inputs` to avoid shadowing the built-in input()
inputs = df.close[len(df.close)-len(test_set)-60:].values
inputs = sc.transform(np.array(inputs).reshape(-1,1))
x_test = []
for i in range(60,95):
    x_test.append(inputs[i-60:i,0])
x_test = np.array(x_test)
x_test = np.reshape(x_test,(x_test.shape[0],x_test.shape[1],1))
pred = reg.predict(x_test)
pred = sc.inverse_transform(pred)
plt.plot(test_set,color='green')
plt.plot(pred,color='red')
plt.title('Stock_prediction')
plt.show()
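To put the epoch times in perspective, here is a small worked calculation using only numbers from the logs and the script above: `range(60, 2000)` yields 1940 windows, and `batch_size=1` turns each window into its own training step, which matches the `1940/1940` in the output. The function names are mine, for illustration only.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of optimizer steps Keras runs per epoch."""
    return math.ceil(n_samples / batch_size)


def ms_per_step(epoch_seconds, steps):
    """Average wall-clock time per training step, in milliseconds."""
    return 1000.0 * epoch_seconds / steps


n_samples = len(range(60, 2000))            # windows built in the script
steps = steps_per_epoch(n_samples, batch_size=1)

laptop_ms = ms_per_step(13, steps)          # GTX 1080 laptop, 13s per epoch
desktop_ms = ms_per_step(18, steps)         # RTX 2080 Ti desktop, 18s per epoch

print(steps, round(laptop_ms, 1), round(desktop_ms, 1))
```

So each step takes only a few milliseconds on either machine. My guess, which I have not verified, is that at this granularity fixed per-step overhead could matter more than raw GPU throughput, which might be why the faster cards do not help.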