Initializing weights that are a pointwise product of multiple variables

In two-layer perceptrons that slide across words of text, such as word2vec and fastText, the hidden layer weights may be a pointwise product of two random variables, such as positional embeddings and word embeddings (Mikolov et al. 2017, Section 2.2): $$v_c = \sum_{p\in P} d_p \odot u_{t+p}$$ However, it's unclear to me how best to initialize the two variables. When only word embeddings are used for the hidden layer weights, word2vec and fastText initialize them to $\mathcal{U}(-1 / \text{fan\_out};\ 1 / \text{fan\_out})$. …
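For concreteness, here is a minimal NumPy sketch of the setup in the question. The sizes, and the choice of initializing the positional embeddings to ones (so the product starts out identical to the plain word-embedding model), are my own assumptions, not something taken from the paper:

    import numpy as np

    vocab_size, dim = 10_000, 100        # hypothetical sizes
    window = 5                            # positions p in P = {-5, ..., -1, 1, ..., 5}
    bound = 1.0 / dim                     # word2vec/fastText-style uniform bound

    rng = np.random.default_rng(0)
    U = rng.uniform(-bound, bound, size=(vocab_size, dim))   # word embeddings u
    D = np.ones((2 * window, dim))                           # positional embeddings d_p (assumed init)

    # Hidden/context vector for one window: v_c = sum_p d_p * u_{t+p}
    word_ids = rng.integers(0, vocab_size, size=2 * window)  # fake context word ids
    v_c = (D * U[word_ids]).sum(axis=0)
    print(v_c.shape)  # (100,)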
Category: Data Science

NNs for fitting highly oscillatory functions

In a scientific computing application of neural networks, I have to maximize several neural networks with scalar output with respect to a target/loss function (coming from a weak form of a PDE). It is known from theoretical considerations that the functions that would be optimal with respect to the target function (i.e. the maximizers) are typically extremely oscillatory. I suppose this is the reason why, according to my first numerical experiments, typical network architectures, initializations and …
Category: Data Science

Should weight distribution change more when fine-tuning transformers-based classifier?

I'm using a pre-trained DistilBERT model from Huggingface with a custom classification head, which is almost the same as in the reference implementation:

    class PretrainedTransformer(nn.Module):
        def __init__(self, target_classes):
            super().__init__()
            base_model_output_shape = 768
            self.base_model = DistilBertModel.from_pretrained("distilbert-base-uncased")
            self.classifier = nn.Sequential(
                nn.Linear(base_model_output_shape, out_features=base_model_output_shape),
                nn.ReLU(),
                nn.Dropout(0.2),
                nn.Linear(base_model_output_shape, out_features=target_classes),
            )
            for layer in self.classifier:
                if isinstance(layer, nn.Linear):
                    layer.weight.data.normal_(mean=0.0, std=0.02)
                    if layer.bias is not None:
                        layer.bias.data.zero_()

        def forward(self, input_, y=None):
            X, length, attention_mask = input_
            base_output = self.base_model(X, attention_mask=attention_mask)[0]
            base_model_last_layer = base_output[:, 0]
            cls = self.classifier(base_model_last_layer)
            return cls

During …
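To actually measure how much the weight distributions move during fine-tuning, one option is to snapshot the parameters before training and compare per-parameter statistics afterwards. A minimal sketch, assuming the class defined above (plus its imports) is available; the comparison logic is my own, not from any reference:

    model = PretrainedTransformer(target_classes=2)
    before = {n: p.detach().clone() for n, p in model.named_parameters()}

    # ... fine-tune the model here ...

    for name, p in model.named_parameters():
        delta = p.detach() - before[name]
        print(f"{name}: mean shift {delta.mean():.2e}, std of shift {delta.std():.2e}")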
Category: Data Science

Update of mean and variance of weights

I'm trying to understand the Bayes by Backprop algorithm from the paper Weight Uncertainty in Neural Networks; the idea is to build a NN in which each weight has its own probability distribution. I get the theory, but I don't understand how to update the mean and variance in the learning part. I found PyTorch code which simply does:

    class BayesianLinear(nn.Module):
        def __init__(self, in_features, out_features):
            (...)
            # Weight parameters
            self.weight_mu = nn.Parameter(torch.Tensor(out_features, in_features).uniform_(-0.2, 0.2))
            self.weight_rho = nn.Parameter(torch.Tensor(out_features, in_features).uniform_(-5, -4))
…
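The key point is that weight_mu and weight_rho are ordinary nn.Parameters: the forward pass samples w = mu + log(1 + exp(rho)) * eps with eps ~ N(0, 1), so the loss is differentiable with respect to mu and rho and the optimizer updates them like any other weight. A minimal sketch of that idea (not the full code from the paper; shapes and the stand-in loss are assumptions):

    import torch

    mu = torch.nn.Parameter(torch.zeros(3, 2).uniform_(-0.2, 0.2))
    rho = torch.nn.Parameter(torch.zeros(3, 2).uniform_(-5, -4))
    opt = torch.optim.SGD([mu, rho], lr=0.1)

    x = torch.randn(8, 2)
    y = torch.randn(8, 3)

    opt.zero_grad()
    sigma = torch.log1p(torch.exp(rho))      # softplus keeps sigma positive
    eps = torch.randn_like(sigma)
    w = mu + sigma * eps                     # reparameterized sample of the weights
    loss = ((x @ w.t() - y) ** 2).mean()     # stand-in for the (negative) ELBO
    loss.backward()                          # gradients flow into mu and rho
    opt.step()                               # mu and rho are updated here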
Category: Data Science

Is it wrong to use Glorot Initialization with ReLU Activation?

I'm reading that Keras' default initialization is glorot_uniform. However, all of the tutorials I see use ReLU activation as the go-to for hidden layers, yet I do not see them specifying the initialization for those layers as He. Would it be better for these ReLU layers to use He instead of Glorot? As seen in O'Reilly's Hands-On Machine Learning with Scikit-Learn & TensorFlow:

    | initialization | activation                    |
    +----------------+-------------------------------+
    | glorot         | none, tanh, logistic, softmax |
    | he             | …
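If you do want He initialization for the ReLU layers, Keras lets you set it per layer via kernel_initializer; a minimal sketch (the layer sizes are arbitrary):

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Dense(128, activation="relu", kernel_initializer="he_normal",
                           input_shape=(20,)),
        keras.layers.Dense(64, activation="relu", kernel_initializer="he_uniform"),
        keras.layers.Dense(10, activation="softmax"),  # output layer keeps the default glorot_uniform
    ])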
Category: Data Science

Matrix factorization how to initialize weights and biases?

I have a matrix factorization model and I'm wondering how I should initialize its weights and biases. When getting a prediction (recommendation), after computing a dot product and adding the biases, I want to apply a sigmoid to get a value from 0 to 1. But introducing a sigmoid here also introduces a possible vanishing/exploding gradient problem. For that reason I think the weights can be initialized with Xavier initialization. But what about the biases? Should I just use a uniform distribution from …
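A minimal PyTorch sketch of the kind of setup described, with Xavier-initialized factor embeddings and zero-initialized biases (zero biases are my assumption here, not a recommendation from any reference):

    import torch
    import torch.nn as nn

    class MF(nn.Module):
        def __init__(self, n_users, n_items, dim=32):
            super().__init__()
            self.user_f = nn.Embedding(n_users, dim)
            self.item_f = nn.Embedding(n_items, dim)
            self.user_b = nn.Embedding(n_users, 1)
            self.item_b = nn.Embedding(n_items, 1)
            nn.init.xavier_uniform_(self.user_f.weight)
            nn.init.xavier_uniform_(self.item_f.weight)
            nn.init.zeros_(self.user_b.weight)
            nn.init.zeros_(self.item_b.weight)

        def forward(self, u, i):
            dot = (self.user_f(u) * self.item_f(i)).sum(dim=1, keepdim=True)
            return torch.sigmoid(dot + self.user_b(u) + self.item_b(i)).squeeze(1)

    model = MF(n_users=100, n_items=50)
    print(model(torch.tensor([0, 1]), torch.tensor([3, 7])))  # two predictions in (0, 1)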
Category: Data Science

Why are deep learning models unstable compared to machine learning models?

I would like to understand why deep learning models are so unstable. Suppose I use the same dataset to train a machine learning model multiple times (for example logistic regression) and a deep learning model multiple times as well (for example an LSTM). After that, I compute the average score of each model and its standard deviation. The standard deviation of the deep learning model will be much higher than that of the machine learning model. Why is this so? Does …
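A minimal sketch of the kind of comparison described, training the same models several times with different random seeds and comparing the spread of the scores (the dataset and model choices here are placeholders, not from the question):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    def scores(make_model, n_runs=5):
        return [make_model(seed).fit(X_tr, y_tr).score(X_te, y_te) for seed in range(n_runs)]

    lr = scores(lambda s: LogisticRegression(max_iter=1000, random_state=s))
    nn = scores(lambda s: MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=200, random_state=s))

    print("logistic regression: mean %.3f, std %.4f" % (np.mean(lr), np.std(lr)))
    print("neural network:      mean %.3f, std %.4f" % (np.mean(nn), np.std(nn)))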
Category: Data Science

Keras model's embedding weight get NaN value

I am working with 3 categorical and 19 numerical features, and I plan to reuse the trained embedding weights (from the categorical features). After training, when I get the weights from the embedding layers, I get NaN values. Please help me if you know the problem. This is the model:

    def create_model(embedding1_vocab_size=7, embedding1_dim=3,
                     embedding2_vocab_size=7, embedding2_dim=3,
                     embedding3_vocab_size=7, embedding3_dim=3):
        embedding1_input = Input((1,))
        embedding1 = Embedding(input_dim=embedding1_vocab_size,
                               output_dim=embedding1_dim,
                               name='embedding1')(embedding1_input)
        embedding2_input = Input((1,))
        embedding2 = Embedding(input_dim=embedding2_vocab_size,
                               output_dim=embedding2_dim,
                               name='embedding2')(embedding2_input)
        …
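One quick way to narrow this down is to check, after (or during) training, which layers actually contain non-finite values; a minimal sketch, assuming create_model above returns a compiled Keras model (the recompile with gradient clipping at the end is just one thing to try, not a confirmed fix):

    import numpy as np
    import tensorflow as tf

    model = create_model()  # assumed to return a compiled keras.Model

    # After model.fit(...), inspect every layer's weights for NaNs/Infs.
    for layer in model.layers:
        for w in layer.get_weights():
            if not np.all(np.isfinite(w)):
                print(f"non-finite values in layer '{layer.name}'")

    # Exploding gradients are a common culprit; clipping is one option:
    model.compile(optimizer=tf.keras.optimizers.Adam(clipnorm=1.0), loss="mse")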
Category: Data Science

Is saddle point a cause for the vanishing gradient problem

I am a beginner to neural networks and I am writing a report summarising the causes of and solutions to the vanishing gradient problem. From what I have read, the two main causes are the repeated multiplication of saturated activation function derivatives and the repeated multiplication of large weights from bad initialisation. I tend to consider both of them as poor choices of neural network components, leading to computational trouble. Additionally, the proliferation of saddle points on the cost function …
Category: Data Science

Where Does the Normal Glorot Initialization Come from?

The famous Glorot initialization is described first in the paper Understanding the difficulty of training deep feedforward neural networks. In this paper, they derive the following uniform initialization, cf. Eq. (16) in their paper: \begin{equation} W \sim U\left[ -\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}, \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]. \tag{16}\end{equation} If we take a look at the PyTorch documentation for weight initialization, then there are two Glorot (Xavier) initializations, namely torch.nn.init.xavier_uniform_(tensor, gain=1.0) and torch.nn.init.xavier_normal_(tensor, gain=1.0). According to the documentation, the initialization for the latter is …
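For reference, the two PyTorch variants side by side: the normal version draws from $\mathcal{N}(0, \sigma^2)$ with $\sigma = \text{gain}\cdot\sqrt{2/(n_j + n_{j+1})}$, which you can check empirically (the layer sizes below are arbitrary):

    import math
    import torch

    fan_in, fan_out = 400, 200
    w_uniform = torch.empty(fan_out, fan_in)
    w_normal = torch.empty(fan_out, fan_in)

    torch.nn.init.xavier_uniform_(w_uniform, gain=1.0)  # U[-sqrt(6/(fan_in+fan_out)), +sqrt(6/(fan_in+fan_out))]
    torch.nn.init.xavier_normal_(w_normal, gain=1.0)    # N(0, 2/(fan_in+fan_out))

    expected_std = math.sqrt(2.0 / (fan_in + fan_out))
    print(expected_std)                  # ~0.0577
    print(w_normal.std().item())         # close to the value above
    print(w_uniform.abs().max().item())  # below sqrt(6/(fan_in+fan_out)) ~ 0.1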
Category: Data Science

Create weights network with randomly initialized weights for Keras Models

I work with a tool for audio feature extraction which has several layers (DenseNet, etc.) for the extraction. The default is to use pre-trained ImageNet weights; however, I want to evaluate the performance with randomly initialized weights. I can pass a path to the weight network (stored in h5); however, I don't know how to create such a weight network for a layer whose exact dimensions/architecture I do not know. I know how to create randomly initialized weights for …
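If the backbone is one of the standard Keras applications, one option is to instantiate it with weights=None (i.e. randomly initialized) and save that to an .h5 file, which can then be passed to the tool in place of the ImageNet weights. A sketch under the assumption that the backbone really is DenseNet121:

    from tensorflow.keras.applications import DenseNet121

    # weights=None gives the architecture with freshly (randomly) initialized weights.
    model = DenseNet121(weights=None, include_top=False)
    model.save_weights("densenet121_random.h5")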
Category: Data Science

Why checkpoint loss is different?

I am training a Mask RCNN model in Keras. I used checkpoints to save the weights so I can resume training with the last optimized values. However, the loss is different when I save the checkpoint and when I resume training: it was ~1.66 at checkpoint time and ~2.62 when I resumed. I assumed that since I saved the weights, the loss would continue to drop from the point where it stopped. Could anyone explain this?
Category: Data Science

Question regarding weight initialization of an artificial neural network

This is what I'm trying to implement in Python:

    w0,...,w8  = vector w1 of shape (9, 1)
    w9,...,w11 = vector w2 of shape (3, 1)
    b0 (first bias) is of shape (3, 1)
    b1 is of shape (1, 1)
    vector X is of shape (99, 3)

I don't know where the problem resides, because when I try to forward propagate I get the "not aligned" error when doing the dot product, since the multiplication is not possible... Is my neural network wrong?
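For the dot products to line up, the nine weights w0,...,w8 have to act as a (3, 3) matrix between the 3 inputs and 3 hidden units, and w9,...,w11 as a (3, 1) matrix to the single output. A minimal NumPy sketch under that assumption (the 3-3-1 architecture is my reading of the shapes, not something stated in the question):

    import numpy as np

    X = np.random.randn(99, 3)                 # 99 samples, 3 features
    w1 = np.random.randn(9, 1).reshape(3, 3)   # 9 weights viewed as a (3, 3) matrix
    b0 = np.zeros((1, 3))                      # bias as a row vector so it broadcasts; a (3, 1) shape would need a transpose
    w2 = np.random.randn(3, 1)
    b1 = np.zeros((1, 1))

    h = np.tanh(X @ w1 + b0)                   # (99, 3) @ (3, 3) -> (99, 3)
    out = h @ w2 + b1                          # (99, 3) @ (3, 1) -> (99, 1)
    print(out.shape)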
Category: Data Science

Shared classifier for 3 neural networks (is this weights sharing?)

I would like to create 3 different VGGs with a shared classifier. Basically, each of these architectures has only the convolutions, and then I combine all the nets with a classifier. For a better explanation, see this image: I have no idea how to do this in PyTorch. Do you have any examples that I can study? Is this a case of weight sharing? Edit: my actual code. Do you think it is correct?

    class VGGBlock(nn.Module):
        def __init__(self, in_channels, …
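A minimal sketch of one way to wire this up in PyTorch, with three separate convolutional backbones and a single classifier module whose parameters are used by all three branches (the backbone and classifier sizes are placeholders, not your VGGBlock code):

    import torch
    import torch.nn as nn

    class SharedClassifierNet(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            def backbone():
                return nn.Sequential(
                    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                )
            self.backbones = nn.ModuleList([backbone() for _ in range(3)])
            # One classifier instance -> its weights are shared by all three branches.
            self.classifier = nn.Linear(16, num_classes)

        def forward(self, x1, x2, x3):
            return [self.classifier(b(x)) for b, x in zip(self.backbones, (x1, x2, x3))]

    net = SharedClassifierNet()
    x = torch.randn(4, 3, 32, 32)
    y1, y2, y3 = net(x, x, x)
    print(y1.shape)  # torch.Size([4, 10])

Because self.classifier is a single module applied to all three branches, its gradients accumulate contributions from each of them, which is what weight sharing amounts to here.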
Category: Data Science

Bad marshal data when loading YOLO model

I tried to run a project from a repo and got the following log which, I believe, indicates a problem with loading the weights.

    python3 main.py
    Traceback (most recent call last):
      File "main.py", line 66, in <module>
        yolo = YOLO()
      File "/home/matalan/venv/survalance/Traffic-Survalance/ObjectDetection.py", line 32, in __init__
        self.model = self._get_model()
      File "/home/matalan/venv/survalance/Traffic-Survalance/ObjectDetection.py", line 39, in _get_model
        return load_model(self.model_path)
      File "/home/matalan/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 182, in load_model
        return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
      File "/home/matalan/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 177, in load_model_from_hdf5
        model = model_config_lib.model_from_config(model_config,
      File "/home/matalan/.local/lib/python3.8/site-packages/tensorflow/python/keras/saving/model_config.py", line …
Category: Data Science

CNN Design for Counting on Simple Images

This is the first CNN I'm designing, following college examples and assignments. I'm working on a CNN that I'd like to use to classify images by the number of shapes in them. My basic problem is that I can't seem to get the CNN to respond (accuracy and val_accuracy stay flat) after n epochs (I have varied n, along with the steps and batch size). The images are 98 x 150 pixels and look like this: This is 10 data …
Category: Data Science

How to force a NN to output the same output given a reversed input?

I want to choose an architecture that can deal with an input symmetry. As input I have a sequence of zeros and ones, like [1, 1, 1, 0, 1, 0], and at the output layer I have N neurons that output a categorical distribution like [0.3, 0.4, 0.3]. How can I force a NN to output the same distribution when I feed its reversed copy, i.e. [0, 1, 0, 1, 1, 1]? A simple way is just to learn twice: feed …
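Besides training on both orientations, one way to enforce the symmetry exactly is to build it into the architecture, e.g. by summing the network's logits over the input and its reversal before the softmax; a minimal PyTorch sketch (the base network here is a placeholder):

    import torch
    import torch.nn as nn

    class ReverseInvariant(nn.Module):
        def __init__(self, base: nn.Module):
            super().__init__()
            self.base = base

        def forward(self, x):                       # x: (batch, seq_len)
            logits = self.base(x) + self.base(torch.flip(x, dims=[1]))
            return torch.softmax(logits, dim=1)     # identical for x and its reversal

    base = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 3))
    model = ReverseInvariant(base)

    x = torch.tensor([[1., 1., 1., 0., 1., 0.]])
    print(model(x))
    print(model(torch.flip(x, dims=[1])))           # same distribution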
Category: Data Science

class weights formula for imbalanced dataset

I am working on semantic segmentation, and in my case I have 7 imbalanced classes. Among the methods for handling class imbalance in a dataset are undersampling the majority classes and oversampling the minority classes, but the most commonly used one is introducing weights into the loss function. I found several formulas to calculate the weights, such as: $$w_j = \frac{n_{\text{samples}}}{n_{\text{classes}} \cdot n_{\text{samples},j}} \quad \text{or} \quad w_j = \frac{1}{n_{\text{samples},j}}$$ Which is the best one?
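For reference, both formulas are easy to compute from the per-class counts; the first one is the same heuristic scikit-learn uses for class_weight='balanced'. A minimal sketch (the pixel counts below are made up):

    import numpy as np

    counts = np.array([50_000, 12_000, 8_000, 3_000, 1_500, 900, 400])  # pixels per class, made up
    n_samples, n_classes = counts.sum(), len(counts)

    w_balanced = n_samples / (n_classes * counts)    # w_j = n_samples / (n_classes * n_samples_j)
    w_inverse = 1.0 / counts                         # w_j = 1 / n_samples_j

    print(np.round(w_balanced, 3))
    print(w_inverse / w_inverse.sum())               # often normalized so the weights sum to 1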
Category: Data Science

Mathematical bias and weight vs machine learning bias and weight

I am a little confused about the terms bias and weight with respect to machine learning. Say we want to predict the heights of people whose weights are given, so we plot weight on the x-axis and height on the y-axis. To find the linear relationship between height and weight, we draw a straight line that shows the relationship between the two. Using the equation of a line, you could write down this relationship as $$y = mx + b \tag{i}$$ more specifically, in the …
Category: Data Science
