As far as I know, mini-batches can be used to reduce the variance of the gradient estimate, but I am also wondering whether we can achieve the same effect by using a decreasing step size with only a single sample in each iteration. Can we compare their convergence rates?
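A minimal sketch of the two schemes on a toy least-squares problem (the data, the batch size of 32, and the 1/(1+t) step-size schedule are all assumptions for illustration; this is not a convergence-rate proof):

```python
# Compares (a) mini-batch SGD with a constant step size against
# (b) single-sample SGD with a decreasing step size, on toy least squares.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

def grad(w, idx):
    # gradient of the mean squared error restricted to the rows in idx
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

# (a) mini-batch SGD, constant step size
w = np.zeros(5)
for t in range(2000):
    idx = rng.integers(0, len(X), size=32)
    w -= 0.05 * grad(w, idx)
print("mini-batch error:", np.linalg.norm(w - w_true))

# (b) single-sample SGD, decreasing step size eta_t = 0.05 / (1 + t / 100)
w = np.zeros(5)
for t in range(2000):
    idx = rng.integers(0, len(X), size=1)
    w -= (0.05 / (1 + t / 100)) * grad(w, idx)
print("single-sample error:", np.linalg.norm(w - w_true))
```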
Suppose we have a dataset of two classes (0 and 1) divided into over 12k mini-batches, where the first half of the dataset (over 6k mini-batches) belongs to class 0 and the other half belongs to class 1. What will happen if a model is trained on this dataset without shuffling the samples?
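A small illustration (my own construction, with an assumed batch size of 100) of what the batch composition looks like with and without shuffling when the data are sorted by class:

```python
import numpy as np

# 12k samples sorted by class: first half class 0, second half class 1
labels = np.array([0] * 6000 + [1] * 6000)

def class1_fraction_per_batch(order, batch_size=100):
    batches = order.reshape(-1, batch_size)
    return np.array([labels[b].mean() for b in batches])

ordered = np.arange(len(labels))
shuffled = np.random.default_rng(0).permutation(len(labels))

print(class1_fraction_per_batch(ordered)[:3])    # [0. 0. 0.] -> every early batch is pure class 0
print(class1_fraction_per_batch(ordered)[-3:])   # [1. 1. 1.] -> every late batch is pure class 1
print(class1_fraction_per_batch(shuffled)[:3])   # roughly 0.5 in each batch
```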
If I do online learning in a setting where I have a HUGE amount of data, is that faster than doing mini-batch learning (even if I optimize my batch size for GPU use, that is, use a multiple of 32 examples per mini-batch)? Details: I have 12600 time series examples, each with 24 time steps, and each time step has 972196 binary labels. This is a multilabel problem. Assuming float32 numbers, loading the entire dataset should require about 1095 GB …
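A quick check of that memory estimate (assuming 4 bytes per float32 value):

```python
examples, time_steps, labels_per_step = 12600, 24, 972196
bytes_total = examples * time_steps * labels_per_step * 4      # float32 = 4 bytes
print(bytes_total / 2**30)                                     # ~1095 GiB, matching the question
```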
What is the technical name for a "batch element" in machine learning? Given a batch of data (size: batchSize*numberOfFeatures), what is the technical name used to refer to an element within the batch (data[batchElementIndex,:])?
I am reading this paper on session-based recommenders with RNNs: https://arxiv.org/abs/1511.06939. During the training phase, the authors apply what they call "session-parallel mini-batches," as depicted in the image below. What is not clear to me is how they take items from different sessions and feed them into the network while maintaining a separate hidden state for each session. The only explanation I could come up with is to maintain as many networks as there are parallel sessions, and use one network …
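A hedged sketch of one possible reading: a single GRU whose hidden state has one row per active session, with a row reset to zero when its session ends and a new session takes that slot. The GRUCell, sizes, and reset logic below are assumptions for illustration, not the authors' code:

```python
import torch
import torch.nn as nn

n_items, emb_dim, hidden_dim, n_parallel = 1000, 32, 64, 3
embed = nn.Embedding(n_items, emb_dim)
gru = nn.GRUCell(emb_dim, hidden_dim)

h = torch.zeros(n_parallel, hidden_dim)      # one hidden-state row per active session

def step(h, item_ids, session_ended):
    """item_ids: LongTensor [n_parallel]; session_ended: BoolTensor [n_parallel]."""
    h = gru(embed(item_ids), h)                                          # one update for all parallel sessions
    h = torch.where(session_ended.unsqueeze(1), torch.zeros_like(h), h)  # reset finished slots
    return h

h = step(h, torch.tensor([5, 17, 42]), torch.tensor([False, True, False]))
```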
Vowpal Wabbit (VW) uses online normalization as explained here [1]. When running VW with multiple workers, workers synchronize their models with an AllReduce at the end of each epoch. Is it possible or is there any code/paper that explores the idea of doing online learning with multiple workers in a parameter server setting? [1] https://arxiv.org/abs/1305.6646
I have a loss function that's a weighted cross-entropy loss for binary classification:

```python
import numpy as np
from tensorflow.keras import backend as K

def BinaryCrossEntropy_weighted(y_true, y_pred, class_weight):
    y_true = y_true.astype(np.float64)                       # np.float is deprecated; use float64
    y_pred = K.clip(y_pred, K.epsilon(), 1 - K.epsilon())
    first_term = class_weight[1] * y_true * K.log(y_pred + K.epsilon())
    second_term = class_weight[0] * (1.0 - y_true) * K.log(1.0 - y_pred + K.epsilon())
    loss = -K.mean(first_term + second_term, axis=0)
    return loss
```

And when I run this:

```python
loss = BinaryCrossEntropy_weighted(np.array(y), np.array(predict), class_weight)
```

I got the output `<tf.Tensor: shape=(1,), dtype=float64, numpy=array([0.16916199])>`. If one can observe carefully, can …
I'm creating mini-batches to put into a CNN. Is it best to try to get an even mix of classes into each mini-batch (Scenario 1), or can/should this be a random assortment of my classes (Scenario 2)? Scenario 1: I have 2 classes and a mini-batch size of 32, and I try to have 16 samples from each class in each mini-batch. Scenario 2: Same as 1, but the samples in each mini-batch are randomly distributed. So …
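A small sketch (toy labels, my own helper) of how Scenario-1-style balanced mini-batches could be drawn:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=10_000)                     # toy binary labels
idx0, idx1 = np.where(labels == 0)[0], np.where(labels == 1)[0]

def balanced_batch(batch_size=32):
    half = batch_size // 2
    picks = np.concatenate([rng.choice(idx0, half, replace=False),
                            rng.choice(idx1, half, replace=False)])
    rng.shuffle(picks)
    return picks                                             # indices to feed to the CNN

batch = balanced_batch()
print(np.bincount(labels[batch]))                            # [16 16]
```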
I am trying to make a linear model that predicts user preferences and that can be trained in mini-batches, so that it can be trained incrementally. I think sklearn's partial_fit method would work well for this, allowing me to train the linear model as the data comes in gradually. The question I have is whether it is possible to have the model gradually forget the data it was trained on in the past. For example, if for a few …
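A minimal sketch of that incremental setup, assuming SGDRegressor and synthetic mini-batches; how quickly older batches are "forgotten" is governed mainly by the learning-rate schedule, since past data survive only through the current weights:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)    # constant rate keeps recent batches influential

rng = np.random.default_rng(0)
for step in range(100):                                      # data arriving in mini-batches
    X_batch = rng.normal(size=(32, 10))
    y_batch = X_batch @ np.arange(10) + rng.normal(scale=0.1, size=32)
    model.partial_fit(X_batch, y_batch)                      # updates the weights in place
```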
I have made a convolutional neural network from scratch in Python to classify the MNIST handwritten digits (centered). It is composed of a single convolutional layer with eight 3x3 kernels, a 2x2 max-pooling layer, and a 10-node dense layer with softmax as the activation function. I am using cross-entropy loss and SGD. When I train the network on the whole training set for a single epoch with a batch size of 1, I get 95% accuracy. However, when …
Please read the question completely before you mark it as duplicate. I was trying to understand the syntax of using an LSTM in PyTorch, and I came across the following in the PyTorch docs: h_0: tensor of shape $(D * \text{num\_layers}, N, H_{out})$ containing the initial hidden state for each element in the batch. Defaults to zeros if (h_0, c_0) is not provided. where: \begin{aligned} N ={} & \text{batch size} \\ L ={} & \text{sequence length} \\ D ={} & 2 \text{ …
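A small sketch (toy sizes of my own) confirming the documented h_0 shape on an nn.LSTM; with bidirectional=True the D factor is 2:

```python
import torch
import torch.nn as nn

N, L, H_in, H_out, num_layers = 4, 7, 10, 20, 2
lstm = nn.LSTM(input_size=H_in, hidden_size=H_out, num_layers=num_layers,
               bidirectional=True, batch_first=False)        # bidirectional -> D = 2

x = torch.randn(L, N, H_in)                                  # (sequence, batch, features)
h0 = torch.zeros(2 * num_layers, N, H_out)                   # (D * num_layers, N, H_out)
c0 = torch.zeros(2 * num_layers, N, H_out)

output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape, hn.shape)        # torch.Size([7, 4, 40]) torch.Size([4, 4, 20])
```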
The following multiple-choice question is provided in the "Exam Readiness: AWS Certified Machine Learning - Specialty" document. The correct answer has been marked in the document, but I am not able to understand why this option is correct. Question: "A data scientist is working on optimizing a model during the training process by varying multiple parameters. The data scientist observes that, during multiple runs with identical parameters, the loss function converges to different, yet stable, values. What should the data scientist …
I have a question regarding the intuition behind RMSprop. As shown in the lecture video of the Deep Learning Specialization by Andrew Ng, RMSprop helps to reduce the oscillation (along the vertical axis, $b$, as in the example figure) and speeds up convergence toward the minimum by taking longer steps along the horizontal axis. This is achieved by updating our weights as: $$w := w - \alpha\frac{d_{w}}{\sqrt{S_{dw}}}$$ $$b := b - \alpha\frac{d_{b}}{\sqrt{S_{db}}}$$ where $\alpha$ is the learning rate. So, if initially $W$ is small so that $\sqrt{S_{dw}}$ is small, …
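A toy sketch of that update, with made-up gradients (small and steady along $w$, large and oscillating along $b$) to make the intuition concrete:

```python
import numpy as np

w, b = 0.0, 0.0
S_dw, S_db = 0.0, 0.0
beta, alpha, eps = 0.9, 0.01, 1e-8

for t in range(100):
    dw, db = 0.1, 2.0 * (-1) ** t                 # assumed gradients: flat in w, oscillating in b
    S_dw = beta * S_dw + (1 - beta) * dw ** 2     # running averages of the squared gradients
    S_db = beta * S_db + (1 - beta) * db ** 2
    w -= alpha * dw / (np.sqrt(S_dw) + eps)       # step is relatively long where gradients are small
    b -= alpha * db / (np.sqrt(S_db) + eps)       # and damped where they are large and oscillating

print(w, b)   # w moves steadily; b barely drifts despite its large raw gradients
```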
Suppose I have two datasets, $X$ and $Y$, of different sizes. I am training two networks together: one takes inputs $x\in X$ and the other takes inputs $y\in Y$. The two networks share parameters and are therefore trained together. Are there any guidelines on how to choose the batch sizes for the samples from $X$ versus those from $Y$? That is, should the batches from $X$ have the same size as the batches from $Y$? In general, the two …
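One possible arrangement, sketched with assumed shapes, a shared linear trunk, and batch sizes 32 from $X$ and 8 from $Y$; it only shows the mechanics of mixing two batch sizes in a single step, not a recommendation for how to choose them:

```python
import torch
import torch.nn as nn

shared = nn.Linear(16, 8)                                     # shared parameters
net_x = nn.Sequential(shared, nn.ReLU(), nn.Linear(8, 1))     # network for samples from X
net_y = nn.Sequential(shared, nn.ReLU(), nn.Linear(8, 1))     # network for samples from Y

params = {p for m in (net_x, net_y) for p in m.parameters()}  # the set dedups the shared weights
opt = torch.optim.SGD(params, lr=0.01)
loss_fn = nn.MSELoss()

batch_x, target_x = torch.randn(32, 16), torch.randn(32, 1)   # batch of 32 from X
batch_y, target_y = torch.randn(8, 16), torch.randn(8, 1)     # batch of 8 from Y

opt.zero_grad()
loss = loss_fn(net_x(batch_x), target_x) + loss_fn(net_y(batch_y), target_y)
loss.backward()
opt.step()
```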
In deep learning model training, inputs are generally passed in batches. For example, when training a deep learning model with a [512]-dimensional input feature vector and, say, a batch size of 4, we pass a [4, 512]-dimensional input. I am curious about the logical significance of passing the same input after flattening it across the batch and channel dimensions, i.e. as [2048]. Logically the locality structure will be destroyed, but will it significantly speed up my implementation? And can it …
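A small illustration (toy tensors) of what that flattening does to the shapes:

```python
import torch

batch = torch.randn(4, 512)        # 4 examples, 512 features each
flat = batch.reshape(1, 2048)      # one "example" with 2048 features; example boundaries are gone
print(batch.shape, flat.shape)     # torch.Size([4, 512]) torch.Size([1, 2048])
```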
I am using both mini-batch SGD (with momentum) and Adam for training a region proposal network. The library used is Keras. The batch size in both cases is 5 and the initial learning rate is 0.01. The learning-rate decay schedule is also the same for both optimizers. The RPN classification loss steadily decreases with SGD with momentum but diverges with Adam. The performance of SGD with momentum is noticeably better after about 500 epochs. Given that everything …
I used mini-batch gradient descent to train the model, but I am unable to get a proper loss graph; it always comes out as a straight line. I know there is something wrong, but would anyone be able to guide me?

```python
from sklearn import metrics

error = []
for epoch in range(epochs):
    for i in range(0, x_train.shape[0], minibatch_size):
        # note: slicing with "i:i + minibatch_size - 1" (as originally written) silently drops
        # the last row of every mini-batch, because the end index of a slice is already exclusive
        x_mini = x_train[i:i + minibatch_size, :]
        y_mini = y_train[i:i + minibatch_size, :]
        # feed forward
        # layer 1
        in1 = x_mini @ w1 + b1
        out1 = sigmoid(in1)
        …
```
What's the proper way to train the algorithm with bigger batches or otherwise speed it up? The Rainbow algorithm is a deep Q-learning, reinforcement-learning algorithm with two neural networks that I would like to speed up or scale up during training. You can read the paper here. Training is fairly slow because the observations have to be converted to tensors and a model update has to be performed after each step. It's kind of a special and unique model, so I hope …
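A generic, hedged sketch (plain DQN-style replay sampling, not the asker's Rainbow code): one common speed-up is to sample a whole batch of transitions and convert the observations to a tensor in a single call, rather than converting them one by one at every step:

```python
import numpy as np
import torch

replay_obs = np.random.rand(1_000, 4, 84, 84).astype(np.float32)   # toy replay buffer of observations

def sample_batch(batch_size=32):
    idx = np.random.randint(0, len(replay_obs), size=batch_size)
    return torch.from_numpy(replay_obs[idx])      # one conversion for the whole batch

batch = sample_batch()
print(batch.shape)                                # torch.Size([32, 4, 84, 84])
```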
In a neural network multilayer perceptron, I understand that the main difference between stochastic gradient descent (SGD) and gradient descent (GD) lies in how many samples are used during training. That is, SGD iteratively chooses one sample, performs a forward pass, and then backpropagates to adjust the weights, as opposed to GD, where backpropagation starts only after the forward pass has been computed for the entire set of samples. My question is: when gradient descent (or even mini-batch …
For a neural network, the weight update equation is: $$W_i := W_i - \eta\,\frac{\partial L}{\partial W_i}$$ However, there are millions of such weights $W_i$. If I am interested in capturing how much each weight/connection $W_i$ changes compared to the other weights, I am using the summed absolute magnitude of the gradient for each weight $W_i$: $$\sum_{t=1}^{k} \left|\frac{\partial L^{(t)}}{\partial W_i}\right|$$ where you are summing the absolute magnitude of the gradients over the entirety of the $k$ training iterations, and the number of training iterations is $k$ = (training dataset size) / (batch size). After computing this summation …
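A hedged PyTorch sketch (placeholder model, data, and optimizer) of accumulating that per-weight sum of absolute gradients over $k$ iterations:

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 2)                                     # placeholder network
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# one accumulator tensor per parameter, same shape as the parameter
abs_grad_sum = {name: torch.zeros_like(p) for name, p in model.named_parameters()}

k = 100                                                      # e.g. dataset_size // batch_size
for step in range(k):
    x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))   # placeholder mini-batch
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    for name, p in model.named_parameters():
        abs_grad_sum[name] += p.grad.abs()                   # |gradient| accumulated per weight
    opt.step()

print({name: t.sum().item() for name, t in abs_grad_sum.items()})
```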