Initializing weights that are a pointwise product of multiple variables

In two-layer perceptrons that slide across words of text, such as word2vec and fastText, the hidden layer weights may be a pointwise product of two random variables, such as positional embeddings and word embeddings (Mikolov et al. 2017, Section 2.2): $$v_c = \sum_{p\in P} d_p \odot u_{t+p}$$ However, it's unclear to me how best to initialize the two variables. When only word embeddings are used for the hidden layer weights, word2vec and fastText initialize them to $\mathcal{U}(-1 / \text{fan\_out};\ 1 / \text{fan\_out})$. …
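For concreteness, here is a minimal NumPy sketch of the setup in the question. The sizes, and the choice of initializing the positional embeddings to ones (so the product starts out identical to the plain word-embedding model), are my own assumptions, not something taken from the paper:

    import numpy as np

    vocab_size, dim = 10_000, 100        # hypothetical sizes
    window = 5                            # positions p in P = {-5, ..., -1, 1, ..., 5}
    bound = 1.0 / dim                     # word2vec/fastText-style uniform bound

    rng = np.random.default_rng(0)
    U = rng.uniform(-bound, bound, size=(vocab_size, dim))   # word embeddings u
    D = np.ones((2 * window, dim))                           # positional embeddings d_p (assumed init)

    # Hidden/context vector for one window: v_c = sum_p d_p * u_{t+p}
    word_ids = rng.integers(0, vocab_size, size=2 * window)  # fake context word ids
    v_c = (D * U[word_ids]).sum(axis=0)
    print(v_c.shape)  # (100,)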
Category: Data Science

NNs for fitting highly oscillatory functions

In a scientific computing application of neural networks, I have to maximize several neural networks with scalar output with respect to a target/loss function (coming from a weak form of a PDE). It is known from theoretical considerations that the functions that would be optimal with respect to the target function (i.e. the maximizers) are typically extremely oscillatory. I suppose this is the reason why, according to my first numerical experiments, typical network architectures, initializations and …
Category: Data Science

Should weight distribution change more when fine-tuning transformers-based classifier?

I'm using a pre-trained DistilBERT model from Huggingface with a custom classification head, which is almost the same as in the reference implementation:

    class PretrainedTransformer(nn.Module):
        def __init__(self, target_classes):
            super().__init__()
            base_model_output_shape = 768
            self.base_model = DistilBertModel.from_pretrained("distilbert-base-uncased")
            self.classifier = nn.Sequential(
                nn.Linear(base_model_output_shape, out_features=base_model_output_shape),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(base_model_output_shape, out_features=target_classes),
            )
            for layer in self.classifier:
                if isinstance(layer, nn.Linear):
                    layer.weight.data.normal_(mean=0.0, std=0.02)
                    if layer.bias is not None:
                        layer.bias.data.zero_()

        def forward(self, input_, y=None):
            X, length, attention_mask = input_
            base_output = self.base_model(X, attention_mask=attention_mask)[0]
            base_model_last_layer = base_output[:, 0]
            cls = self.classifier(base_model_last_layer)
            return cls

During …
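To actually measure how much the weight distributions move during fine-tuning, one option is to snapshot the parameters before training and compare per-parameter statistics afterwards. A minimal sketch, assuming the class defined above (plus its imports) is available; the comparison logic is my own, not from any reference:

    model = PretrainedTransformer(target_classes=2)
    before = {n: p.detach().clone() for n, p in model.named_parameters()}

    # ... fine-tune the model here ...

    for name, p in model.named_parameters():
        delta = p.detach() - before[name]
        print(f"{name}: mean shift {delta.mean():.2e}, std of shift {delta.std():.2e}")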
Category: Data Science

Update of mean and variance of weights

I'm trying to understand the Bayes by Backprop algorithm from the paper Weight Uncertainty in Neural Networks; the idea is to build a NN in which each weight has its own probability distribution. I get the theory, but I don't understand how to update the mean and variance in the learning part. I found PyTorch code which simply does:

    class BayesianLinear(nn.Module):
        def __init__(self, in_features, out_features):
            (...)
            # Weight parameters
            self.weight_mu = nn.Parameter(torch.Tensor(out_features, in_features).uniform_(-0.2, 0.2))
            self.weight_rho = nn.Parameter(torch.Tensor(out_features, in_features).uniform_(-5, -4))
…
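The key point is that weight_mu and weight_rho are ordinary nn.Parameters: the forward pass samples w = mu + log(1 + exp(rho)) * eps with eps ~ N(0, 1), so the loss is differentiable with respect to mu and rho and the optimizer updates them like any other weight. A minimal sketch of that idea (not the full code from the paper; shapes and the stand-in loss are assumptions):

    import torch

    mu = torch.nn.Parameter(torch.zeros(3, 2).uniform_(-0.2, 0.2))
    rho = torch.nn.Parameter(torch.zeros(3, 2).uniform_(-5, -4))
    opt = torch.optim.SGD([mu, rho], lr=0.1)

    x = torch.randn(8, 2)
    y = torch.randn(8, 3)

    opt.zero_grad()
    sigma = torch.log1p(torch.exp(rho))      # softplus keeps sigma positive
    eps = torch.randn_like(sigma)
    w = mu + sigma * eps                     # reparameterized sample of the weights
    loss = ((x @ w.t() - y) ** 2).mean()     # stand-in for the (negative) ELBO
    loss.backward()                          # gradients flow into mu and rho
    opt.step()                               # mu and rho are updated here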
Category: Data Science

Is it wrong to use Glorot Initialization with ReLU Activation?

I'm reading that Keras' default initialization is glorot_uniform. However, all of the tutorials I see use ReLU activation as the go-to for hidden layers, yet I do not see them specifying the initialization for those layers as He. Would it be better for these ReLU layers to use He instead of Glorot? As seen in O'Reilly's Hands-On Machine Learning with Scikit-Learn & TensorFlow:

    | initialization | activation                    |
    +----------------+-------------------------------+
    | glorot         | none, tanh, logistic, softmax |
    | he             | …
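If you do want He initialization for the ReLU layers, Keras lets you set it per layer via kernel_initializer; a minimal sketch (the layer sizes are arbitrary):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal",
                           input_shape=(20,)),
        keras.layers.Dense(64, activation="relu", kernel_initializer="he_uniform"),
        keras.layers.Dense(10, activation="softmax"),  # output layer keeps the default glorot_uniform
    ])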
Category: Data Science

Matrix factorization how to initialize weights and biases?

I have a matrix factorization model and I'm wondering how I should initialize its weights and biases. When getting a prediction (recommendation), after computing a dot product and adding the biases, I want to apply a sigmoid to get a value from 0 to 1. But introducing a sigmoid here also introduces a possible vanishing/exploding gradient problem. For that reason I think the weights can be initialized with Xavier initialization. But what about the biases? Should I just use a uniform distribution from …
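A minimal PyTorch sketch of the kind of setup described, with Xavier-initialized factor embeddings and zero-initialized biases (zero biases are my assumption here, not a recommendation from any reference):

    import torch
    import torch.nn as nn

    class MF(nn.Module):
        def __init__(self, n_users, n_items, dim=32):
            super().__init__()
            self.user_f = nn.Embedding(n_users, dim)
            self.item_f = nn.Embedding(n_items, dim)
            self.user_b = nn.Embedding(n_users, 1)
            self.item_b = nn.Embedding(n_items, 1)
            nn.init.xavier_uniform_(self.user_f.weight)
            nn.init.xavier_uniform_(self.item_f.weight)
            nn.init.zeros_(self.user_b.weight)
            nn.init.zeros_(self.item_b.weight)

        def forward(self, u, i):
            dot = (self.user_f(u) * self.item_f(i)).sum(dim=1, keepdim=True)
            return torch.sigmoid(dot + self.user_b(u) + self.item_b(i)).squeeze(1)

    model = MF(n_users=100, n_items=50)
    print(model(torch.tensor([0, 1]), torch.tensor([3, 7])))  # two predictions in (0, 1)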
Category: Data Science

Why are deep learning models unstable compared to machine learning models?

I would like to understand why deep learning models are so unstable. Suppose I use the same dataset to train a machine learning model multiple times (for example logistic regression) and a deep learning model multiple times as well (for example an LSTM). After that, I compute the average score of each model and its standard deviation. The standard deviation of the deep learning model will be much higher than that of the machine learning model. Why is this so? Does …
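A minimal sketch of the kind of comparison described, training the same models several times with different random seeds and comparing the spread of the scores (the dataset and model choices here are placeholders, not from the question):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    def scores(make_model, n_runs=5):
        return [make_model(seed).fit(X_tr, y_tr).score(X_te, y_te) for seed in range(n_runs)]

    lr = scores(lambda s: LogisticRegression(max_iter=1000, random_state=s))
    nn = scores(lambda s: MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=200, random_state=s))

    print("logistic regression: mean %.3f, std %.4f" % (np.mean(lr), np.std(lr)))
    print("neural network:      mean %.3f, std %.4f" % (np.mean(nn), np.std(nn)))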
Category: Data Science

Keras model's embedding weight get NaN value

I am working with 3 categorical and 19 numerical features, and I plan to reuse the trained embedding weights (from the categorical features). After training, when I get the weights from the embedding layers, I get NaN values. Please help me if you know the problem. This is the model:

    def create_model(embedding1_vocab_size=7, embedding1_dim=3,
                     embedding2_vocab_size=7, embedding2_dim=3,
                     embedding3_vocab_size=7, embedding3_dim=3):
        embedding1_input = Input((1,))
        embedding1 = Embedding(input_dim=embedding1_vocab_size,
                               output_dim=embedding1_dim,
                               name='embedding1')(embedding1_input)
        embedding2_input = Input((1,))
        embedding2 = Embedding(input_dim=embedding2_vocab_size,
                               output_dim=embedding2_dim,
                               name='embedding2')(embedding2_input)
        …
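One quick way to narrow this down is to check, after (or during) training, which layers actually contain non-finite values; a minimal sketch, assuming create_model above returns a compiled Keras model (the recompile with gradient clipping at the end is just one thing to try, not a confirmed fix):

    import numpy as np
    import tensorflow as tf

    model = create_model()  # assumed to return a compiled keras.Model

    # After model.fit(...), inspect every layer's weights for NaNs/Infs.
    for layer in model.layers:
        for w in layer.get_weights():
            if not np.all(np.isfinite(w)):
                print(f"non-finite values in layer '{layer.name}'")

    # Exploding gradients are a common culprit; clipping is one option:
    model.compile(optimizer=tf.keras.optimizers.Adam(clipnorm=1.0), loss="mse")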
Category: Data Science

Is saddle point a cause for the vanishing gradient problem

I am a beginner to neural networks and I am writing a report summarising the causes of and solutions to the vanishing gradient problem. From what I have read, the two main causes are the repeated multiplication of saturated activation function derivatives and the repeated multiplication of large weights from bad initialisation. I tend to consider both of them as poor choices of neural network components, leading to computational trouble. Additionally, the proliferation of saddle points on the cost function …
Category: Data Science

Where Does the Normal Glorot Initialization Come from?

The famous Glorot initialization is described first in the paper Understanding the difficulty of training deep feedforward neural networks. In this paper, they derive the following uniform initialization, cf. Eq. (16) in their paper: \begin{equation} W \sim U\left[ -\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}, \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]. \tag{16}\end{equation} If we take a look at the PyTorch documentation for weight initialization, then there are two Glorot (Xavier) initializations, namely torch.nn.init.xavier_uniform_(tensor, gain=1.0) and torch.nn.init.xavier_normal_(tensor, gain=1.0). According to the documentation, the initialization for the latter is …
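For reference, the two PyTorch variants side by side: the normal version draws from $\mathcal{N}(0, \sigma^2)$ with $\sigma = \text{gain}\cdot\sqrt{2/(n_j + n_{j+1})}$, which you can check empirically (the layer sizes below are arbitrary):

    import math
    import torch

    fan_in, fan_out = 400, 200
    w_uniform = torch.empty(fan_out, fan_in)
    w_normal = torch.empty(fan_out, fan_in)

    torch.nn.init.xavier_uniform_(w_uniform, gain=1.0)  # U[-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]
    torch.nn.init.xavier_normal_(w_normal, gain=1.0)    # N(0, 2/(fan_in+fan_out))

    expected_std = math.sqrt(2.0 / (fan_in + fan_out))
    print(expected_std)                  # ~0.0577
    print(w_normal.std().item())         # close to the value above
    print(w_uniform.abs().max().item())  # below sqrt(6/(fan_in+fan_out)) ~ 0.1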
Category: Data Science

Create weights network with randomly initialized weights for Keras Models

I work with a tool for audio feature extraction which has several layers (DenseNet, etc.) for the extraction. The default is to use pre-trained ImageNet weights; however, I want to evaluate the performance with randomly initialized weights. I can pass a path to the weight network (stored in h5); however, I don't know how to create such a weight network for a layer whose exact dimensions/architecture I do not know. I know how to create randomly initialized weights for …
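If the backbone is one of the standard Keras applications, one option is to instantiate it with weights=None (i.e. randomly initialized) and save that to an .h5 file, which can then be passed to the tool in place of the ImageNet weights. A sketch under the assumption that the backbone really is DenseNet121:

    from tensorflow.keras.applications import DenseNet121

    # weights=None gives the architecture with freshly (randomly) initialized weights.
    model = DenseNet121(weights=None, include_top=False)
    model.save_weights("densenet121_random.h5")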
Category: Data Science

Why checkpoint loss is different?

I am training a Mask RCNN model in Keras. I used checkpoints to save the weights so I can resume training with the last optimized values. However, the loss is different when I save the checkpoint and when I resume training: it was ~1.66 at checkpoint time and ~2.62 when I resumed. I assumed that since I saved the weights, the loss would continue to drop from the point where it stopped. Could anyone explain this?
Category: Data Science

Question regarding weight initialization of an artificial neural network

This is what I'm trying to implement in Python:

    w0,...,w8  = vector w1 of shape (9, 1)
    w9,...,w11 = vector w2 of shape (3, 1)
    b0 (first bias) is of shape (3, 1)
    b1 is of shape (1, 1)
    vector X is of shape (99, 3)

I don't know where the problem resides, because when I try to forward propagate I get the "not aligned" error when doing the dot product, since the multiplication is not possible... Is my neural network wrong?
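For the dot products to line up, the nine weights w0,...,w8 have to act as a (3, 3) matrix between the 3 inputs and 3 hidden units, and w9,...,w11 as a (3, 1) matrix to the single output. A minimal NumPy sketch under that assumption (the 3-3-1 architecture is my reading of the shapes, not something stated in the question):

    import numpy as np

    X = np.random.randn(99, 3)                 # 99 samples, 3 features
    w1 = np.random.randn(9, 1).reshape(3, 3)   # 9 weights viewed as a (3, 3) matrix
    b0 = np.zeros((1, 3))                      # bias as a row vector so it broadcasts; a (3, 1) shape would need a transpose
    w2 = np.random.randn(3, 1)
    b1 = np.zeros((1, 1))

    h = np.tanh(X @ w1 + b0)                   # (99, 3) @ (3, 3) -> (99, 3)
    out = h @ w2 + b1                          # (99, 3) @ (3, 1) -> (99, 1)
    print(out.shape)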
Category: Data Science

Shared classifier for 3 neural networks (is this weights sharing?)

I would like to create 3 different VGGs with a shared classifier. Basically, each of these architectures has only the convolutions, and then I combine all the nets with a classifier. For a better explanation, see this image: I have no idea how to do this in PyTorch. Do you have any examples that I can study? Is this a case of weight sharing? Edit: my actual code. Do you think it is correct?

    class VGGBlock(nn.Module):
        def __init__(self, in_channels, …
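A minimal sketch of one way to wire this up in PyTorch, with three separate convolutional backbones and a single classifier module whose parameters are used by all three branches (the backbone and classifier sizes are placeholders, not your VGGBlock code):

    import torch
    import torch.nn as nn

    class SharedClassifierNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            def backbone():
                return nn.Sequential(
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
            self.backbones = nn.ModuleList([backbone() for _ in range(3)])
            # One classifier instance -> its weights are shared by all three branches.
            self.classifier = nn.Linear(16, num_classes)

        def forward(self, x1, x2, x3):
            return [self.classifier(b(x)) for b, x in zip(self.backbones, (x1, x2, x3))]

    net = SharedClassifierNet()
    x = torch.randn(4, 3, 32, 32)
    y1, y2, y3 = net(x, x, x)
    print(y1.shape)  # torch.Size([4, 10])

Because self.classifier is a single module applied to all three branches, its gradients accumulate contributions from each of them, which is what weight sharing amounts to here.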
Category: Data Science

Bad marshal data when loading YOLO model

I tried to run a project from a repo and got the following log which, I believe, indicates a problem with loading the weights.

    python3 main.py
    Traceback (most recent call last):
      File "main.py", line 66, in <module>
        yolo = YOLO()
      File "/home/matalan/venv/survalance/Traffic-Survalance/ObjectDetection.py", line 32, in __init__
        self.model = self._get_model()
      File "/home/matalan/venv/survalance/Traffic-Survalance/ObjectDetection.py", line 39, in _get_model
        return load_model(self.model_path)
      File "/home/matalan/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 182, in load_model
        return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
      File "/home/matalan/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 177, in load_model_from_hdf5
        model = model_config_lib.model_from_config(model_config,
      File "/home/matalan/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/model_config.py", line …
Category: Data Science

CNN Design for Counting on Simple Images

This is the first CNN I'm designing, following college examples and assignments. I'm working on a CNN that I'd like to use to classify images by the number of shapes in them. My basic problem is that I can't seem to get the CNN to respond (accuracy and val_accuracy stay flat) after n epochs (I have varied n, along with the steps and batch size). The images are 98 x 150 pixels and look like this: This is 10 data …
Category: Data Science

How to force a NN to output the same output given a reversed input?

I want to choose an architecture that can deal with an input symmetry. As input I have a sequence of zeros and ones, like [1, 1, 1, 0, 1, 0], and at the output layer I have N neurons that output a categorical distribution like [0.3, 0.4, 0.3]. How can I force a NN to output the same distribution when I feed its reversed copy, i.e. [0, 1, 0, 1, 1, 1]? A simple way is just to learn twice: feed …
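Besides training on both orientations, one way to enforce the symmetry exactly is to build it into the architecture, e.g. by summing the network's logits over the input and its reversal before the softmax; a minimal PyTorch sketch (the base network here is a placeholder):

    import torch
    import torch.nn as nn

    class ReverseInvariant(nn.Module):
        def __init__(self, base: nn.Module):
            super().__init__()
            self.base = base

        def forward(self, x):                       # x: (batch, seq_len)
            logits = self.base(x) + self.base(torch.flip(x, dims=[1]))
            return torch.softmax(logits, dim=1)     # identical for x and its reversal

    base = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))
    model = ReverseInvariant(base)

    x = torch.tensor([[1., 1., 1., 0., 1., 0.]])
    print(model(x))
    print(model(torch.flip(x, dims=[1])))           # same distribution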
Category: Data Science

class weights formula for imbalanced dataset

I am working on semantic segmentation, and in my case I have 7 imbalanced classes. Among the methods for handling class imbalance in a dataset are undersampling the majority classes and oversampling the minority classes, but the most commonly used one is introducing weights into the loss function. I found several formulas to calculate the weights, such as: $$w_j = \frac{n_{\text{samples}}}{n_{\text{classes}} \cdot n_{\text{samples},j}} \quad \text{or} \quad w_j = \frac{1}{n_{\text{samples},j}}$$ Which is the best one?
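For reference, both formulas are easy to compute from the per-class counts; the first one is the same heuristic scikit-learn uses for class_weight='balanced'. A minimal sketch (the pixel counts below are made up):

    import numpy as np

    counts = np.array([50_000, 12_000, 8_000, 3_000, 1_500, 900, 400])  # pixels per class, made up
    n_samples, n_classes = counts.sum(), len(counts)

    w_balanced = n_samples / (n_classes * counts)    # w_j = n_samples / (n_classes * n_samples_j)
    w_inverse = 1.0 / counts                         # w_j = 1 / n_samples_j

    print(np.round(w_balanced, 3))
    print(w_inverse / w_inverse.sum())               # often normalized so the weights sum to 1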
Category: Data Science

Mathematical bias and weight vs machine learning bias and weight

I am a little confused about the terms bias and weight with respect to machine learning. Say we want to predict the heights of people whose weights are given, so we plot weight on the x-axis and height on the y-axis. To find the linear relationship between height and weight, we draw a straight line that shows the relationship between the two. Using the equation of a line, you could write down this relationship as $$y = mx + b \tag{i}$$ more specifically, in the …
Category: Data Science
