Loading saved model fails

I've trained a model and saved it in .h5 format. When I try loading it, I receive this error:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_588/661726548.py in <module>
      9 # returns a compiled model
     10 # identical to the previous one
---> 11 reconstructed_model = keras.models.load_model("./custom_model.h5")

~\Anaconda3\lib\site-packages\keras\utils\traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

~\Anaconda3\lib\site-packages\keras\utils\generic_utils.py in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
    560 …
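The failing frame (class_and_config_for_serialized_keras_object) is where Keras looks up a class name from the saved config, so this error usually means the .h5 file references a custom object that isn't registered at load time. If that is the case here, the usual fix is a custom_objects mapping; a minimal sketch, with a hypothetical MyCustomLayer standing in for whatever custom class the model actually uses:

from tensorflow import keras

# Hypothetical stand-in: replace with the real custom layer/loss/metric
# classes the saved model was built with.
class MyCustomLayer(keras.layers.Layer):
    def call(self, inputs):
        return inputs

reconstructed_model = keras.models.load_model(
    "./custom_model.h5",
    custom_objects={"MyCustomLayer": MyCustomLayer},
)

The key is that custom_objects maps the class name stored in the saved config to the Python class that implements it.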
Category: Data Science

What enables transformers or very deep models to "plan" ahead for sequential decision making?

I was watching this amazing lecture by Oriol Vinyals. On one slide, there is a question asking whether very deep models plan. Transformer models, or models employed in applications like dialogue generation, do not have an explicit planning component, yet they behave as if they already have the dialogue planned out. Dr. Vinyals mentioned that there are papers on "how transformers are building up knowledge to answer questions or do all sorts of very interesting analyses". Can anyone please refer me to a few …
Category: Data Science

Neural network / machine learning approach to model a specific sequencing-classification problem in industry

I am working on a project that involves developing a machine learning/deep learning model for an application in the roll-to-roll industry. For a long time I have been looking for similar problems as a way to get some guidance, but I was never able to find anything related. Basically, the problem can be seen as follows: an industrial machine is producing a roll of some material, which tends to have visible defects throughout the roll. I have already available a machine …
Category: Data Science

Class token in ViT and BERT

I'm trying to understand the architecture of the ViT paper, and noticed they use a CLASS token like in BERT. To the best of my understanding, this token is used to gather knowledge of the entire image, and is then solely used to predict the class of the image. My question is: why does this token exist as an input in all the transformer blocks, and why is it treated the same as the word/patch tokens? Treating the class token …
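For concreteness, here is a minimal sketch (my own illustration, not the paper's code) of how a learnable class token is typically prepended to the patch embeddings, so that it flows through every transformer block alongside them and the head reads only its final state:

import tensorflow as tf

batch, num_patches, dim = 8, 196, 768

# Patch embeddings for a batch of images: (batch, num_patches, dim)
patches = tf.random.normal((batch, num_patches, dim))

# One learnable class token, shared across the batch: (1, 1, dim)
cls_token = tf.Variable(tf.zeros((1, 1, dim)))

# Broadcast to the batch and prepend, giving (batch, 1 + num_patches, dim).
# Every transformer block attends over the class token and the patches
# together; the classification head reads only position 0 at the end.
tokens = tf.concat([tf.tile(cls_token, [batch, 1, 1]), patches], axis=1)
print(tokens.shape)  # (8, 197, 768)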
Category: Data Science

Intuitively, why do Non-monotonic Activations Work?

The swish/SiLU activation is very popular, and many would argue it has dethroned ReLU. However, it is non-monotonic, which seems to go against popular intuition (at least on this site: example 1, example 2). Reading the swish paper, the justification that the authors give is that non-monotonicity "increases expressivity and improves gradient flow... [and] may also provide some robustness to different initializations and learning rates." The authors provide an image to back up this claim, but at best this argument …
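To make the non-monotonicity concrete, here is a small sketch (my own, not from the paper) showing that swish(x) = x * sigmoid(x) first decreases and then turns back up on the negative axis:

import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))  # equivalent to x * sigmoid(x)

x = np.linspace(-5, 0, 6)
print(np.round(swish(x), 3))
# [-0.033 -0.072 -0.142 -0.238 -0.269  0.   ]
# The values fall and then rise again (the minimum is near x ≈ -1.28),
# so swish is non-monotonic, unlike ReLU.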
Category: Data Science

Computing probabilities in Plackett-Luce model

I am trying to implement a Plackett-Luce model for learning to rank from click data. Specifically, I am following the paper: Doubly-Robust Estimation for Correcting Position-Bias in Click Feedback for Unbiased Learning to Rank. The objective function is a reward function similar to the one used in reinforcement learning. Here $R_d$ is the reward for document $d$, $\pi(k \vert d)$ is the probability of document $d$ being placed at position $k$ for a given query $q$, and $w_k$ is the weight of position …
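For reference, a minimal sketch of the probability of a full ranking under a Plackett-Luce model (my own, using the standard PL definition rather than the paper's exact notation):

import numpy as np

def plackett_luce_log_prob(scores, ranking):
    """Log-probability of `ranking` (a permutation of item indices)
    under Plackett-Luce with per-item `scores`:
    P(ranking) = prod_k exp(s[r_k]) / sum_{j >= k} exp(s[r_j])."""
    s = np.asarray(scores, dtype=float)[np.asarray(ranking)]
    log_p = 0.0
    for k in range(len(s)):
        # The item at slot k is chosen among the items not yet placed.
        log_p += s[k] - np.log(np.sum(np.exp(s[k:])))
    return log_p

scores = [2.0, 1.0, 0.5]  # higher score = more likely to be ranked first
print(np.exp(plackett_luce_log_prob(scores, [0, 1, 2])))  # ~0.39

Each factor is a softmax over the items still unplaced, which is also how $\pi(k \vert d)$ decomposes over positions.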
Category: Data Science

Facebook picture labeling

I want to train a neural network and use OpenCV for facial recognition. Nicholas Martin, who's a user here on SE, told me that this is a supervised learning problem (clearly). So I need pictures and corresponding labels. So I thought, hey! Maybe Facebook could be of help. So how can I label potentially millions of Facebook pictures? Would it be by the user's profile name, or is there a way to find out the name of the person …
Category: Data Science

Is a dense layer required for implementing Bahdanau attention?

I saw that everyone adds a Dense() layer in their custom Bahdanau attention layer, which I think isn't needed. This is an image from a tutorial here. Here, we are just multiplying two vectors and then doing several operations on those vectors only. So what is the need for the Dense() layer? Is the tutorial on 'how does attention work' wrong?
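For reference, in Bahdanau (additive) attention the score is e = v^T tanh(W1 h + W2 s), and the Dense layers in tutorials are exactly the learned parameters W1, W2, and v. A minimal Keras-style sketch of that scoring step (my own illustration, not the tutorial's code):

import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        # These Dense layers are the trainable W1, W2 and v in
        # score(s, h) = v^T tanh(W1 h + W2 s); without them the
        # attention mechanism would have nothing to learn.
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.v = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: (batch, dim), values: (batch, time, dim)
        q = tf.expand_dims(query, 1)  # (batch, 1, dim)
        scores = self.v(tf.nn.tanh(self.W1(values) + self.W2(q)))  # (batch, time, 1)
        weights = tf.nn.softmax(scores, axis=1)
        context = tf.reduce_sum(weights * values, axis=1)  # (batch, dim)
        return context, weights

If a diagram only shows vectors being multiplied and added, the Dense layers are still where the trainable parameters live.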
Category: Data Science

Why does my convolutional layer learn only biases?

I'm training a siamese CNN to distinguish between pairs of images, and though my train/val binary cross-entropy loss values show a negative trend, implying some of the model parameters are being updated, I noticed that the convolution kernels barely change while their biases change significantly (TensorBoard weights histogram image). Also, while the loss value decreases, accuracy appears to be frozen for some epochs and then instantly shoots up (accuracy and loss plots). Q1: If this is caused by vanishing gradients, why would it …
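One way to check whether the kernel gradients really are vanishing relative to the bias gradients is to log per-variable gradient norms for a batch; a minimal Keras sketch (my own, shown with a toy model standing in for the siamese CNN):

import tensorflow as tf

# Toy stand-in for the real model and batch from the question.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(32, 32, 1)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
x_batch = tf.random.normal((4, 32, 32, 1))
y_batch = tf.ones((4, 1))

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(
        tf.keras.losses.binary_crossentropy(y_batch, model(x_batch)))
grads = tape.gradient(loss, model.trainable_variables)

# If the kernel gradient norms are orders of magnitude below the bias
# norms, the kernels will barely move while the biases keep updating.
for var, g in zip(model.trainable_variables, grads):
    print(var.name, float(tf.norm(g)))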
Category: Data Science

Fast AI Lesson 4 - MNIST. Confused about multiplying weights by pixels?

I’m on lesson 4 of the Fast AI "Deep Learning for Coders" course, and have been back through the same lesson a few times now but I don’t think I’m quite getting a few things. I want to have an understanding of what’s going on before moving on. This lesson is on MNIST - and Jeremy is recognising 3s vs 7s. So he has 12000 images (ignoring mini-batches) of about 800 pixels each, and his tensor has a shape of …
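For intuition, the step in question is just a matrix product: each flattened image has one weight per pixel, and the weighted pixels are summed into a single score per image. A minimal sketch of that shape arithmetic (my own, with PyTorch tensors as in the course; the 12000 images and 784 ≈ 800 pixels come from the question's setup):

import torch

n_images, n_pixels = 12_000, 28 * 28     # 12000 images, 784 (~800) pixels each
images = torch.rand(n_images, n_pixels)  # each row is one flattened image

weights = torch.randn(n_pixels, 1)       # one learnable weight per pixel
bias = torch.zeros(1)

# (12000, 784) @ (784, 1) -> (12000, 1): one score per image; a positive
# score can be read as "3" and a negative one as "7" (or vice versa).
scores = images @ weights + bias
print(scores.shape)  # torch.Size([12000, 1])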
Category: Data Science

How to use text as an input for a neural network - regression problem? How many likes/claps an article will get

I am trying to predict the number of likes an article or a post will get using a NN. I have a dataframe with ~70,000 rows and 2 columns: "text" (predictor - strings of text) and "likes" (target - continuous int variable). I've been reading about the approaches taken in NLP problems, but I feel somewhat lost as to what the input for the NN should look like. Here is what I did so far: Text cleaning: removing …
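One common shape for this kind of input, sketched below under my own assumptions (vocabulary size, sequence length, and layer sizes are placeholders, not from the question): tokenize each text to a fixed-length sequence of integer ids, map them through an Embedding layer, pool to one vector per text, and end with a single linear unit for the like count.

import numpy as np
import tensorflow as tf

vocab_size, max_len = 20_000, 200  # placeholder choices

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),  # token ids -> dense vectors
    tf.keras.layers.GlobalAveragePooling1D(),   # one vector per text
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                   # linear output: predicted likes
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

dummy_ids = np.random.randint(0, vocab_size, size=(2, max_len))
print(model(dummy_ids).shape)  # (2, 1)

The last layer is linear (no activation) because this is regression, not classification.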
Category: Data Science

Find highest reward for epsilon-greedy bandit program

I started to learn reinforcement learning. The first example is handling a bandit problem using the epsilon-greedy method. In this example, three bandit machines are used; the output is the mean value for each bandit machine and the cumulative average with respect to the epsilon value. The code:

class Bandit:
    def __init__(self, m):
        self.m = m
        self.mean = 0
        self.N = 0

    def pull(self):
        return np.random.randn() + self.m

    def update(self, x):
        self.N += 1
        self.mean = (1 - 1.0/self.N)*self.mean + 1.0/self.N*x
…
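For context, such programs usually continue with an epsilon-greedy loop over the bandits; a minimal sketch of that loop (my own, not the asker's truncated code):

import numpy as np

class Bandit:
    def __init__(self, m):
        self.m, self.mean, self.N = m, 0.0, 0  # true mean, estimate, pull count
    def pull(self):
        return np.random.randn() + self.m
    def update(self, x):
        self.N += 1
        self.mean = (1 - 1.0 / self.N) * self.mean + (1.0 / self.N) * x

def run(eps, n_steps=10_000, true_means=(1.0, 2.0, 3.0)):
    bandits = [Bandit(m) for m in true_means]
    rewards = np.empty(n_steps)
    for t in range(n_steps):
        if np.random.random() < eps:                        # explore
            j = np.random.randint(len(bandits))
        else:                                               # exploit best estimate
            j = int(np.argmax([b.mean for b in bandits]))
        rewards[t] = bandits[j].pull()
        bandits[j].update(rewards[t])
    return np.cumsum(rewards) / (np.arange(n_steps) + 1)    # cumulative average

print(run(eps=0.1)[-1])  # approaches the best true mean (3.0) as eps shrinks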
Category: Data Science

False positives in multi-class image classification

I am training a neural network with some convolution layers for multi-class image classification. I am using Keras to build and train the model, with 1600 training images for each category. I used softmax as the final layer's activation function. The model predicts well on all true categories, with high softmax probability. But when I test the model on new or unknown data, it still predicts a known class with high softmax probability. How can I reduce that? Should I make …
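One common mitigation, sketched here as my own illustration (the threshold value is an assumption to tune on validation data), is to reject predictions whose maximum softmax probability falls below a confidence threshold:

import numpy as np

def predict_with_rejection(probs, threshold=0.9):
    """probs: (n_samples, n_classes) softmax outputs.
    Returns the argmax class, or -1 ("unknown") when the model
    is not confident enough."""
    top = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    return np.where(top >= threshold, labels, -1)

probs = np.array([[0.97, 0.02, 0.01],   # confident -> class 0
                  [0.40, 0.35, 0.25]])  # uncertain -> rejected
print(predict_with_rejection(probs))    # [ 0 -1]

Note that softmax models are known to stay overconfident on out-of-distribution inputs, so thresholding alone may not fully solve the problem.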
Category: Data Science

Val Loss and manually calculated loss produce different values

I have a CNN classification model that uses the binary cross-entropy loss:

optimizer_instance = Adam(learning_rate=learning_rate, decay=learning_rate / 200)
model.compile(optimizer=optimizer_instance, loss='binary_crossentropy')

We are saving the best model, so the latest saved model is the one that achieved the best val_loss:

es = EarlyStopping(monitor='val_loss', mode='min', verbose=0,
                   patience=Config.LearningParameters.Patience)
modelPath = modelFileFolder + Config.LearningParameters.ModelFileName
checkpoint = keras.callbacks.ModelCheckpoint(modelPath, monitor='val_loss',
                                             save_best_only=True,
                                             save_weights_only=False,
                                             verbose=1)
callbacks = [checkpoint, es]
history = model.fit(x=training_generator,
                    batch_size=Config.LearningParameters.Batch_size,
                    epochs=Config.LearningParameters.Epochs,
                    validation_data=validation_generator,
                    callbacks=callbacks,
                    verbose=1)

Over the course of the training, the logs show that the …
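One subtlety when comparing a manually computed loss against the reported val_loss: any regularization penalties attached to the model are included in the loss Keras reports, but not in a plain binary cross-entropy computation. A minimal sketch of a like-for-like manual check (my own; modelPath and validation_generator are assumed from the question, and the generator is assumed finite):

import numpy as np
from tensorflow import keras

best_model = keras.models.load_model(modelPath)  # best checkpoint

bce = keras.losses.BinaryCrossentropy()
batch_losses = []
for x_val, y_val in validation_generator:
    preds = best_model.predict(x_val, verbose=0)
    batch_losses.append(float(bce(y_val, preds)))

# best_model.evaluate(validation_generator) should match this closely;
# the val_loss printed during fit() may differ if regularization terms
# or different batch weighting are involved.
print(np.mean(batch_losses))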
Category: Data Science

Overfitting in CNN

I am training a VGG net on the STL-10 dataset. I am getting about 98% Top-5 validation accuracy and about 83% Top-1 validation accuracy, but both the Top-1 and Top-5 training accuracy are reaching 100%. Does this mean that the network is over-fitting? Or not? Code:

def conv2d(inp, name, kshape, s):
    with tf.variable_scope(name) as scope:
        kernel = get_weights('weights', shape=kshape)
        conv = tf.nn.conv2d(inp, kernel, [1, s, s, 1], 'SAME')
        bias = get_bias('biases', shape=kshape[3])
        preact = tf.nn.bias_add(conv, bias)
        convlayer = tf.nn.relu(preact, name=scope.name)
    return convlayer

def maxpool(inp, name, k, s):
    return tf.nn.max_pool(inp, ksize=[1, k, k, 1], strides=[1, s, s, 1],
                          padding='SAME', name=name)

def loss(logits, labels):
    labels = tf.reshape(tf.cast(labels, tf.int64), [-1])
    # print labels.get_shape().as_list(), logits.get_shape().as_list()
    cross_entropy …
Category: Data Science

In a Time Series Problem, is it possible to forecast quantities by learning the patterns of other items? What are my options?

Suppose I own a store that sells a variety of apples, and I have the following stats each month:

Report Date
Type of Apple (TA)
Quantity Available (QA)
Quantity Sold in the Past 30 Days (QS30)
Quantity Shipping In (QSI)
Quantity Needed to Order (QN)

Let's make the following assumptions/givens: there are three types of apples (red, green, and yellow); T(1) denotes the first month and T(60) denotes the 60th month; QA @ T(i+1) = QA @ T(i) + QSI @ T(i) …
Category: Data Science

NLP Deep Learning Project (or Paper)

I'm trying to find a good repo/course/paper or something that can get me up to speed on an NLP problem, for example classifying email or something else. It should also be relevant to the latest state of the art (transformers, multi-head attention, etc.). Thank you
Category: Data Science
