Loading saved model fails

I've trained a model and saved it in .h5 format. When I try to load it, I get this error:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_588/661726548.py in <module>
      9 # returns a compiled model
     10 # identical to the previous one
---> 11 reconstructed_model = keras.models.load_model("./custom_model.h5")

~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

~\Anaconda3\lib\site-packages\keras\utils\generic_utils.py in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
    560 …
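
This traceback usually means Keras cannot deserialize an object stored in the .h5 file, most often a custom layer, loss, or metric that is not registered under its serialized name. A minimal sketch of the usual workaround, assuming a hypothetical custom layer called MyCustomLayer was part of the original model (substitute whatever custom objects you actually used):

from tensorflow import keras

# Hypothetical custom layer; stand-in for whatever custom classes or
# functions the saved model actually contains.
class MyCustomLayer(keras.layers.Layer):
    def call(self, inputs):
        return inputs

# custom_objects maps the serialized names back to the Python objects,
# which lets class_and_config_for_serialized_keras_object resolve them.
reconstructed_model = keras.models.load_model(
    "./custom_model.h5",
    custom_objects={"MyCustomLayer": MyCustomLayer},
)
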
Category: Data Science

Variational AutoEncoder giving negative loss

I'm learning about variational autoencoders and I've implemented a simple example in Keras (model summary below). I've copied the loss function from one of Francois Chollet's blog posts and I'm getting really, really negative losses. What am I missing here?

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 224)]        0
__________________________________________________________________________________________________
encoding_flatten (Flatten)      (None, 224)          0           input_1[0][0]
__________________________________________________________________________________________________
encoding_layer_2 (Dense)        (None, 256)          57600       encoding_flatten[0][0]
__________________________________________________________________________________________________
encoding_layer_3 (Dense)        (None, 128)          32896       encoding_layer_2[0][0]
__________________________________________________________________________________________________
encoding_layer_4 …
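
For reference, a minimal sketch of the loss used in Chollet-style VAE examples (reconstruction term plus KL divergence); the usual causes of strongly negative totals are inputs that are not scaled to [0, 1] while binary cross-entropy is used for reconstruction, or a sign flip in the KL term. The names z_mean and z_log_var are assumptions about what the encoder outputs:

import tensorflow as tf

def vae_loss(x, x_decoded, z_mean, z_log_var, original_dim=224):
    # Reconstruction term: binary cross-entropy assumes x is scaled to [0, 1].
    # binary_crossentropy averages over the feature axis, so rescale by original_dim.
    reconstruction = original_dim * tf.keras.losses.binary_crossentropy(x, x_decoded)
    # KL divergence between q(z|x) and a unit Gaussian prior; this term is >= 0.
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    return tf.reduce_mean(reconstruction + kl)
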
Category: Data Science

Val Loss and manually calculated loss produce different values

I have a CNN classification model that uses binary cross-entropy as its loss:

optimizer_instance = Adam(learning_rate=learning_rate, decay=learning_rate / 200)
model.compile(optimizer=optimizer_instance, loss='binary_crossentropy')

We save the best model, so the latest saved model is the one that achieved the best val_loss:

es = EarlyStopping(monitor='val_loss', mode='min', verbose=0, patience=Config.LearningParameters.Patience)
modelPath = modelFileFolder + Config.LearningParameters.ModelFileName
checkpoint = keras.callbacks.ModelCheckpoint(modelPath, monitor='val_loss',
                                             save_best_only=True, save_weights_only=False, verbose=1)
callbacks = [checkpoint, es]
history = model.fit(x=training_generator,
                    batch_size=Config.LearningParameters.Batch_size,
                    epochs=Config.LearningParameters.Epochs,
                    validation_data=validation_generator,
                    callbacks=callbacks,
                    verbose=1)

Over the course of training, the logs show that the …
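
A common reason a manually computed loss disagrees with the reported val_loss is the averaging order: Keras averages the per-batch mean losses (weighted by batch size) and also folds any regularization terms into the reported number. A minimal sketch of a manual check that mirrors that reduction, assuming validation_generator is a finite iterable of (x, y) batches:

import numpy as np
from tensorflow import keras

bce = keras.losses.BinaryCrossentropy()  # same per-batch reduction Keras applies

losses, counts = [], []
for x_batch, y_batch in validation_generator:
    y_pred = model.predict_on_batch(x_batch)
    losses.append(float(bce(y_batch, y_pred)))
    counts.append(len(y_batch))

# Weight per-batch means by batch size so a smaller final batch does not skew the average.
manual_val_loss = np.average(losses, weights=counts)
print(manual_val_loss)
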
Category: Data Science

Negative loss values for adaptive loss in tensorflow

I have used an adaptive loss implementation on a neural network; however, after training the model long enough, I am getting negative loss values. Any help/suggestion would be highly appreciated! Please let me know if you need additional info.

Model definition:

hyperparameter_space = {"gru_up": 64, "up_dropout": 0.2, "learning_rate": 0.004}

def many_to_one_model(params):
    input_1 = tf.keras.Input(shape=(1, 53), name='input_1')
    input_2 = tf.keras.Input(shape=(1, 19), name='input_2')
    input_3 = tf.keras.Input(shape=(1, 130), name='input_3')
    input_3_flatten = …
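
One thing worth checking: if the adaptive loss is a negative log-likelihood with a learned scale (as in Barron's general and adaptive robust loss), negative values are not automatically a bug, because the density of a continuous distribution can exceed 1 once its scale becomes small. A tiny numpy sketch of the same effect with a plain Gaussian negative log-likelihood (illustrative only, not the adaptive-loss code itself):

import numpy as np

def gaussian_nll(residual, sigma):
    # Negative log-likelihood of a zero-mean Gaussian with scale sigma.
    return 0.5 * (residual / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)

print(gaussian_nll(0.001, sigma=1.0))   # ~0.92: positive
print(gaussian_nll(0.001, sigma=0.01))  # ~-3.68: negative, because the density exceeds 1
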
Category: Data Science

Trying to implement a loss function from a journal article in Python

Computer science undergrad here. I am trying to understand Eqn 12 from this paper so that I can implement it in Python code. In this paper, the NN model takes a blurred image as input and outputs a sharp (deblurred) image and the kernel that can produce the same blurred image after multiplying with the sharp image. Here:
$\widetilde{K_t}$ = predicted kernel matrix
$K_t^{train}$ = ground-truth kernel matrix (for training)
$\widetilde{X_t}$ = predicted sharp image matrix
$X_t^{train}$ = …
Category: Data Science

Using Transaction Amount to Guide Learning in a Fraud Detection Machine Learning Model

I am currently using transaction amount as a feature in an XGBoost classification model designed to identify fraudulent transactions. Furthermore, transaction amount is bounded for this problem between 0 and 500. Using transaction amount as a feature does improve target class separability. However, I can't help but wonder if there is a better way to use this variable. To explain, I care more about getting the high transaction amount values correct than I do the low ones. However, the model …
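
One way to act on "high-amount errors matter more" is to keep the amount as a feature and additionally pass it (or a scaled version of it) as a per-row sample weight, so misclassified expensive transactions contribute more to the training objective. A minimal sketch with synthetic stand-in data (X, y and amount would come from your real dataset; the 1 + amount/500 scaling is just one arbitrary choice):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))            # stand-in features
amount = rng.uniform(0, 500, size=1000)    # bounded transaction amount
y = rng.integers(0, 2, size=1000)          # stand-in fraud labels

# Rows with larger amounts get larger weights, so their errors cost more.
sample_weight = 1.0 + amount / 500.0

clf = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X, y, sample_weight=sample_weight)
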
Category: Data Science

High MAE and Loss with good performance, but low MAE and Loss with worse performance?

I have a deep-Q reinforcement learning model, and when I train it with neural network A, I get high scores, 2-3x better than random (random scores an average of 0 per step and completes the task after 223,000 steps, while this scores 1-2 per step and completes it in more like 80,000 steps). The reported mean absolute error for this run is in the 200-250 range, and the loss is at something like 2000-2100. When I train with neural network B, …
Category: Data Science

How to implement my own loss function for Prototype learning using Keras Model

I'm trying to migrate this code, "Omniglot Character Set Classification Using Prototypical Network", to TensorFlow 2.1.0 and Keras 2.3.1. My problem is how to use the Euclidean distance between train data and validation data. Look at this code:

def convolution_block(inputs, out_channels, name='conv'):
    conv = tf.layers.conv2d(inputs, out_channels, kernel_size=3, padding='SAME')
    conv = tf.contrib.layers.batch_norm(conv, updates_collections=None, decay=0.99, scale=True, center=True)
    conv = tf.nn.relu(conv)
    conv = tf.contrib.layers.max_pool2d(conv, 2)
    return conv

def get_embeddings(support_set, h_dim, z_dim, reuse=False):
    net = convolution_block(support_set, h_dim)
    net = convolution_block(net, h_dim)
    net = convolution_block(net, …
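
For what it's worth, tf.layers and tf.contrib no longer exist in TensorFlow 2.x, so the block above has to be expressed with tf.keras layers, and the prototypical-network Euclidean distance can be written with plain broadcasting. A sketch of one possible port (not a drop-in replacement for the original repository):

import tensorflow as tf

def convolution_block(out_channels):
    # TF2/Keras counterpart of the tf.layers / tf.contrib block above.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(out_channels, kernel_size=3, padding='same'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.MaxPool2D(2),
    ])

def euclidean_distance(queries, prototypes):
    # queries: (n_queries, d), prototypes: (n_classes, d)
    # returns (n_queries, n_classes) squared distances via broadcasting
    q = tf.expand_dims(queries, axis=1)      # (n_queries, 1, d)
    p = tf.expand_dims(prototypes, axis=0)   # (1, n_classes, d)
    return tf.reduce_sum(tf.square(q - p), axis=-1)
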
Category: Data Science

how to tune hyperparameters in a regression neural network

Hope you are enjoying good health. I am trying to build a simple neural network that has to predict shear-wave well-log values from other well logs, but my model is stuck at a mean absolute error of 2.45 and is not improving further. I have changed the number of neurons, the learning rate, and the loss function, but to no avail. Here is my model:

tf.random.set_seed(42)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(22, activation='relu'),
    tf.keras.layers.Dense(1)
])

# compiling:
model.compile(
    loss=tf.losses.mae,
    optimizer=tf.optimizers.Adam(learning_rate=0.006),
    metrics=['mae']
)

# fitting:
history = model.fit(x_train, y_train, epochs=1000, verbose=0)

# evaluation:
model.evaluate(x_test, y_test)

Here is the boxplot of …
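
Beyond changing numbers by hand, a more systematic route is a small automated search over depth, width and learning rate, for example with KerasTuner; standardizing the input logs (e.g. with sklearn's StandardScaler) often helps at least as much as the architecture. A sketch under those assumptions, reusing x_train and y_train from the question:

import tensorflow as tf
import keras_tuner as kt  # pip install keras-tuner

def build_model(hp):
    model = tf.keras.Sequential()
    # Search over depth and width instead of hand-picking 22 units.
    for i in range(hp.Int('num_layers', 1, 3)):
        model.add(tf.keras.layers.Dense(hp.Int(f'units_{i}', 16, 128, step=16),
                                        activation='relu'))
    model.add(tf.keras.layers.Dense(1))
    model.compile(
        loss='mae',
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=hp.Choice('learning_rate', [1e-2, 3e-3, 1e-3, 3e-4])),
        metrics=['mae'])
    return model

tuner = kt.RandomSearch(build_model, objective='val_mae', max_trials=20,
                        overwrite=True, directory='tuning', project_name='shear_log')
tuner.search(x_train, y_train, validation_split=0.2, epochs=200, verbose=0)
best_model = tuner.get_best_models(1)[0]
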
Category: Data Science

Regression sequence output loss function

I am fairly new to deep learning, and I have the following task: based on an audio sequence of shape (200, 1024), I have to predict two sequences of shape (200, 1) of continuous values (e.g. 0.5687) that represent the emotion at each timestep (valence "v" and arousal "a"). So I've created the following LSTM:

inputs_audio = Input(shape=(200, 1024))
audio_net = LSTM(256, return_sequences=True)(inputs_audio)
audio_net = LSTM(256, return_sequences=True)(audio_net)
audio_net = LSTM(256, return_sequences=False)(audio_net)
audio_net = Dropout(0.3)(audio_net)
final_model = audio_net
target_names = …
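
Since the targets are per-timestep, one common pattern is to keep return_sequences=True on the last LSTM as well and attach one TimeDistributed Dense head per target, so each output keeps the (200, 1) shape. A sketch of that variant (plain MSE is used here; CCC-style losses are also common for valence/arousal but are an extra assumption):

from tensorflow import keras
from tensorflow.keras.layers import Input, LSTM, Dropout, TimeDistributed, Dense

inputs_audio = Input(shape=(200, 1024))
x = LSTM(256, return_sequences=True)(inputs_audio)
x = LSTM(256, return_sequences=True)(x)   # keep the time axis for per-step outputs
x = Dropout(0.3)(x)

valence = TimeDistributed(Dense(1), name='v')(x)  # (None, 200, 1)
arousal = TimeDistributed(Dense(1), name='a')(x)  # (None, 200, 1)

model = keras.Model(inputs_audio, [valence, arousal])
model.compile(optimizer='adam', loss={'v': 'mse', 'a': 'mse'})
model.summary()
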
Category: Data Science

Large jumps in loss in simple transformer model?

As an exercise, I created a very simple transformer model that just sees the same simple batch of dummy data repeatedly and (one would assume) should quickly learn to fit it perfectly. And indeed, training reaches a loss of zero quickly. However I noticed that the loss does not stay at zero, or even close to it: there are occasional large jumps in the loss. The script below counts every time that the loss jumps by 10 or more between …
Category: Data Science

Why is each successive tree in GBM fit on the negative gradient of the loss function?

Page 359 of The Elements of Statistical Learning (2nd edition) says the following. Can someone explain the intuition and simplify it in layman's terms?

Questions:
1. What is the reason/intuition and the math behind fitting each successive tree in GBM on the negative gradient of the loss function?
2. Is it done to make GBM generalize better on an unseen test dataset? If so, how does fitting on the negative gradient achieve this generalization on test data?
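
The short version: for squared-error loss $L = \frac{1}{2}(y - F(x))^2$, the negative gradient with respect to the current prediction $F(x)$ is exactly the residual $y - F(x)$, so "fit the next tree to the negative gradient" generalizes "fit the next tree to the residuals" to any differentiable loss; each tree is one gradient-descent step in function space on the training loss, while generalization mainly comes from shrinkage, tree constraints and early stopping rather than from the gradient itself. A toy numpy sketch of that step, with a crude two-leaf stump standing in for a real regression tree:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=200)

F = np.zeros_like(y)          # current ensemble prediction F_0(x)
learning_rate = 0.1

for m in range(50):
    # Negative gradient of L = 0.5 * (y - F)^2 with respect to F is the residual y - F.
    negative_gradient = y - F
    # "Fit a tree" to the negative gradient; a two-leaf stump on the sign of x
    # stands in for a real regression tree, just to show the functional step.
    left = x < 0
    h = np.where(left, negative_gradient[left].mean(), negative_gradient[~left].mean())
    F += learning_rate * h    # gradient-descent step in function space

print(np.mean((y - F) ** 2))  # training MSE shrinks as boosting steps accumulate
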
Category: Data Science

uncertainties in non-convex optimization problems (neural networks)

How do you treat statistical uncertainties coming from non-convex optimization problems? More specifically, suppose you have a neural network. It is well known that the loss is not convex; the optimization procedure with any approximate stochastic optimizer, together with the random weight initialization, introduces some randomness into the training process, which translates into different "optimal" regions being reached at the end of training. Now, supposing that any minimum of the loss is an acceptable solution, there are no guarantees that those minima …
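
In practice this is usually treated empirically: retrain the same model with several random seeds (initialization and data shuffling) and report the spread of the metric, e.g. mean ± standard deviation or quantiles over runs. A minimal sketch, with sklearn's MLPRegressor standing in for "a neural network" purely to keep the example short (an assumption, not a statement about your setup):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = []
for seed in range(10):
    # Same data and architecture; only the initialization / SGD noise changes.
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=seed)
    model.fit(X_tr, y_tr)
    scores.append(model.score(X_te, y_te))

print(f"test R^2 over seeds: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
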
Category: Data Science

calculating gradient descent

When using mini-batch gradient descent, we perform backpropagation after each batch, i.e. we calculate the gradient after each batch. We also capture y-hat for each sample in the batch and finally compute the loss function over the whole batch; we then use this loss to calculate the gradient, correct? Now, as the chain rule states, we calculate the gradient this way for the neural network below. The question is: if we calculate the gradient after passing …
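
One detail that often causes confusion here: averaging the per-sample losses over the batch and then differentiating gives exactly the same gradient as averaging the per-sample gradients, because differentiation is linear. A small numpy check for a linear model with squared error (a toy stand-in for the network in the figure):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))       # one mini-batch of 8 samples, 3 features
y = rng.normal(size=8)
w = rng.normal(size=3)

# Gradient of the batch loss L = mean_i 0.5 * (x_i . w - y_i)^2
residual = X @ w - y
grad_of_mean_loss = X.T @ residual / len(y)

# Mean of the per-sample gradients x_i * (x_i . w - y_i)
per_sample_grads = X * residual[:, None]
mean_of_grads = per_sample_grads.mean(axis=0)

print(np.allclose(grad_of_mean_loss, mean_of_grads))  # True
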
Category: Data Science

Converting a negative loss term to inverse

I'm training a classifier using this loss function:
$$\mathcal{L} = \mathcal{L}_{CE} - \lambda_1 \mathcal{L}_{push} + \lambda_2 \mathcal{L}_{pull}$$
I need to maximize a certain value using $\mathcal{L}_{push}$, and that's why it has a negative coefficient. The problem is that while I'm training the model, the loss value becomes negative and I keep getting random accuracy results. I tried changing $-\lambda_1 \mathcal{L}_{push}$ to $\lambda_1 \frac{1}{\mathcal{L}_{push}}$ to get numerical stability, and the results are not bad anymore. The thing is, I'm not …
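
For comparison, the two formulations side by side: both reward a larger $\mathcal{L}_{push}$, but the subtractive form is unbounded below, while the inverse form stays non-negative (given a small epsilon), which is why the total no longer goes negative. A small numeric sketch with made-up loss values:

import tensorflow as tf

def total_loss(ce, push, pull, lambda1=1.0, lambda2=1.0, eps=1e-6):
    # Original form: unbounded below, so the total turns negative as push grows.
    subtractive = ce - lambda1 * push + lambda2 * pull
    # Inverse form: maximizing push now means shrinking a positive term toward
    # zero, so the total stays bounded below for non-negative ce and pull.
    inverse = ce + lambda1 / (push + eps) + lambda2 * pull
    return subtractive, inverse

sub, inv = total_loss(tf.constant(0.7), tf.constant(5.0), tf.constant(0.2))
print(float(sub), float(inv))   # -4.1 vs ~1.1
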
Category: Data Science

how to calculate loss function?

I hope you are doing well. I want to ask a question about the loss function in a neural network. I know that the loss function is calculated for each data point in the training set, and then backpropagation is done depending on whether we are using batch gradient descent (backpropagation is done after all the data points are passed), mini-batch gradient descent (backpropagation is done after each batch), or stochastic gradient descent (backpropagation is done after each data point). …
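
As a concrete picture of that bookkeeping: the loss is evaluated per data point, and the three variants only differ in how many of those per-point values are averaged before one backpropagation/update step. A tiny numpy sketch with squared error as the per-point loss (an arbitrary choice for illustration):

import numpy as np

def loss_per_sample(y_true, y_pred):
    # One loss value per data point.
    return (y_true - y_pred) ** 2

y_true = np.array([1.0, 0.0, 2.0, 1.5])
y_pred = np.array([0.8, 0.3, 1.7, 1.5])

per_sample = loss_per_sample(y_true, y_pred)
print(per_sample)         # individual losses
print(per_sample.mean())  # scalar that a (mini-)batch update would backpropagate

# batch GD:      one update per epoch, using the mean over all samples
# mini-batch GD: one update per batch, using the mean over that batch
# SGD:           one update per sample, using that sample's loss alone
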
Category: Data Science

Is the Cross Entropy Loss important at all, given that at Backpropagation only the Softmax probability and the one-hot vector are relevant?

Is the Cross Entropy Loss (CEL) important at all, given that at Backpropagation (BP) only the Softmax (SM) probability and the one-hot vector are relevant? When applying BP, the derivative of CEL is the difference between the output probability (SM) and the one-hot encoded vector. To me, the CEL output, which is very sophisticated, does not seem to play any role in learning. I'm expecting a fallacy in my reasoning, so could somebody please help me out?
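
The short answer is that the simple "probability minus one-hot" gradient is the derivative of the composition of softmax and cross-entropy; it is precisely the choice of cross-entropy that makes the expression collapse to that form, so the loss does matter. A quick autodiff check with made-up logits:

import tensorflow as tf

logits = tf.Variable([[2.0, -1.0, 0.5]])
y_onehot = tf.constant([[0.0, 1.0, 0.0]])

with tf.GradientTape() as tape:
    p = tf.nn.softmax(logits)
    ce = -tf.reduce_sum(y_onehot * tf.math.log(p))

grad = tape.gradient(ce, logits)
print(grad.numpy())             # derivative of cross-entropy composed with softmax
print((p - y_onehot).numpy())   # identical: softmax(logits) - one-hot
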
Category: Data Science

Differentiable loss function for ranking problem in regression model

In regression problems, we may need a loss function that measures the relative ranking accuracy between the targets $y$ and the predicted values $y_{pred}$. Obviously, the simple MSE does not consider such ranking relations. A straightforward choice is the so-called IC (Information Coefficient)
$$IC \propto corr(\text{rank}(y), \text{rank}(y_{pred}))$$
which uses the correlation between the two rank vectors. However, the rank function is not differentiable, so it can't be used in a loss function for regression that relies on gradient propagation to update the parameters. Another choice might be a …
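
One common workaround is to replace the hard ranks with "soft ranks" built from pairwise sigmoid comparisons, which are differentiable and can then be fed into a Pearson-correlation style objective; this is only one of several surrogates (SoftRank-style and Spearman surrogates exist too). A minimal sketch, where tau is a hypothetical smoothing temperature:

import tensorflow as tf

def soft_rank(x, tau=0.1):
    # Differentiable surrogate for rank(x): for each element, the sum of
    # sigmoid((x_i - x_j) / tau) approximates how many elements it exceeds.
    diff = tf.expand_dims(x, -1) - tf.expand_dims(x, -2)   # (n, n) pairwise diffs
    return tf.reduce_sum(tf.sigmoid(diff / tau), axis=-1)

def soft_ic_loss(y_true, y_pred, tau=0.1):
    r_true, r_pred = soft_rank(y_true, tau), soft_rank(y_pred, tau)
    r_true -= tf.reduce_mean(r_true)
    r_pred -= tf.reduce_mean(r_pred)
    corr = tf.reduce_sum(r_true * r_pred) / (
        tf.norm(r_true) * tf.norm(r_pred) + 1e-8)
    return -corr   # minimize the negative (soft) rank correlation

y = tf.constant([0.1, 0.4, 0.35, 0.8])
p = tf.constant([0.2, 0.5, 0.30, 0.9])
print(float(soft_ic_loss(y, p)))
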
Category: Data Science

If two functions are close, can I prove that the difference of their empirical losses is also small?

I am trying to understand the proof of Theorem 3 in the paper "A Universal Law of Robustness via Isoperimetry" by Bubeck and Sellke. Basically, there exists at least one $w_{L,\epsilon}$ in $\mathcal{W}_{L,\epsilon}$ for which there is another $w_{L}$ in $\mathcal{W}_{L}$ at most $\frac{\epsilon}{6J}$ apart. This makes clear that
$$\left\|f_{w_{L,\epsilon}} - f_{w_{L}}\right\|_{\infty} \leq J \cdot \frac{\epsilon}{6J} = \frac{\epsilon}{6} \quad (a)$$
using Assumption 1:
$$\boxed{\left\|f_{\boldsymbol{w}_{1}}-f_{\boldsymbol{w}_{2}}\right\|_{\infty} \leq J\left\|\boldsymbol{w}_{1}-\boldsymbol{w}_{2}\right\|}$$
This equation (a) denotes that those two functions …
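
The step this usually hinges on has the following standard shape, assuming the per-sample loss $\ell(\cdot, y)$ is $C$-Lipschitz in its first argument (an assumption that holds, for instance, for the squared loss restricted to a bounded range):
$$\left|\widehat{\mathcal{L}}(f_{w_1}) - \widehat{\mathcal{L}}(f_{w_2})\right| = \left|\frac{1}{n}\sum_{i=1}^{n} \big(\ell(f_{w_1}(x_i), y_i) - \ell(f_{w_2}(x_i), y_i)\big)\right| \leq \frac{1}{n}\sum_{i=1}^{n} C\,\big|f_{w_1}(x_i) - f_{w_2}(x_i)\big| \leq C\,\left\|f_{w_1} - f_{w_2}\right\|_{\infty}$$
So a sup-norm bound like equation (a) immediately bounds the difference of the empirical losses by $C\epsilon/6$.
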
Category: Data Science
