How to make XGBOOST capture trend in time series forecasting?

I am trying to forecast monthly sales data and have been trying some classical models as well as ML models like XGBOOST. My data, with a feature set, looks like this; it covers 110 months, and I am trying to forecast the next 12 months. When it comes to XGBOOST, I've been spending most of my time on hyperparameter optimization with grid search and with state-of-the-art packages like Optuna. My current best set of parameters looks like this, parameters …
Category: Data Science

Input Signal Shape Optimization

I have a system, described by a black-box (a fully connected neural network), that takes as input a signal in time (let's say something similar to a sine wave) and returns a single scalar as output. My goal is to find the optimum signal that maximizes/minimizes the output. As constraints, the time average of the signal must be kept constant and the minimum and maximum value of the signal must be within a specific range. I wonder what kind of …
Category: Data Science

Understanding Learning Rate in depth

I am trying to understand why a learning rate does not work universally. I have two different data sets and have tested three learning rates: 0.001, 0.01 and 0.1. For the first data set, optimization with stochastic gradient descent converged for all three learning rates. For the second data set, the learning rate 0.1 did not converge. I understand the logic behind it overshooting the gradients, however, I'm failing to understand why this …
Category: Data Science
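
One way to picture the overshooting described above is a one-dimensional quadratic loss, where the curvature stands in for the scale of the data. This is only a sketch under stated assumptions: `gradient_descent` and the two curvature values are hypothetical and do not come from the question.

```python
def gradient_descent(curvature, lr, steps=50, w0=1.0):
    """Minimize f(w) = 0.5 * curvature * w**2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = curvature * w   # f'(w)
        w = w - lr * grad      # each step multiplies w by (1 - lr * curvature)
    return w

# Hypothetical curvatures standing in for two differently scaled data sets.
flat = gradient_descent(curvature=1.0, lr=0.1)    # |1 - 0.1| < 1: shrinks toward 0
steep = gradient_descent(curvature=25.0, lr=0.1)  # |1 - 2.5| > 1: grows every step
```

On this toy loss the iteration is stable only when lr < 2 / curvature, so the same learning rate of 0.1 converges on the flat problem and diverges on the steep one.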

Machine learning with constraints on features

I am working on a learning-to-rank problem. I have queries, and documents related to each query that I have to rank. I used the lightgbm ranker to fit the model. Some of the features are very important, and if they are changed the fitted model predicts a better score for that document and thus a better rank. Let's say, for a single query id, I have a group of documents d1...d5, each having features f1...fn. I change the features f1,f2,f3 …
Category: Data Science

Why is each successive tree in GBM fit on the negative gradient of the loss function?

Page 359 of Elements of Statistical Learning, 2nd edition, says the below. Can someone explain the intuition and simplify it in layman's terms? Questions: What is the reason, intuition and math behind fitting each successive tree in GBM on the negative gradient of the loss function? Is it done to make GBM generalize better on an unseen test dataset? If so, how does fitting on the negative gradient achieve this generalization on test data?
Category: Data Science
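
As a small illustration of the question above: for squared-error loss, the negative gradient of the loss with respect to the current prediction is exactly the ordinary residual, so fitting the next tree to it is a gradient descent step on the training loss. The sketch below checks this numerically; the data values and function names are made up for illustration.

```python
def squared_loss(y, f):
    """L(y, f) = 0.5 * (y - f)^2 for a single observation."""
    return 0.5 * (y - f) ** 2

def negative_gradient(y, f, eps=1e-6):
    """Numerical -dL/df at the current prediction f (central difference)."""
    return -(squared_loss(y, f + eps) - squared_loss(y, f - eps)) / (2 * eps)

# Hypothetical targets and current ensemble predictions.
y_true = [3.0, -1.0, 2.5]
current_pred = [2.0, 0.0, 2.5]

pseudo_residuals = [negative_gradient(y, f) for y, f in zip(y_true, current_pred)]
residuals = [y - f for y, f in zip(y_true, current_pred)]
```

The two lists agree up to floating-point error. For other losses the negative gradient is no longer the plain residual, which is why the general recipe is stated in terms of gradients.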

uncertainties in non-convex optimization problems (neural networks)

How do you treat statistical uncertainties coming from non-convex optimization problems? More specifically, suppose you have a neural network. It is well known that the loss is not convex; the optimization procedure with any approximate stochastic optimizer, together with the random weight initialization, introduces some randomness in the training process, translating into different "optimal" regions reached at the end of training. Now, supposing that any minimum of the loss is an acceptable solution, there are no guarantees that those minima …
Category: Data Science

Is it beneficial to use a batch size > 1 even when all computing power can be used?

With regard to training a neural network, it is often said that increasing the batch size decreases the network's ability to generalize, as alluded to here. This is due to the fact that training on large batches causes the network to converge to sharp minima, as opposed to wide ones, as explained here. This raises the question: In situations where all available computing power can be used by training on a batch size of one, is there a benefit to …
Category: Data Science

error while running lasso.py

The following is the error generated while running lasso.py. Can anybody help fix it? Here is the code: from cvxpy import * import numpy as np import cvxopt from multiprocessing import Pool # Problem data. n = 10 m = 5 A = cvxopt.normal(n,m) b = cvxopt.normal(n) gamma = Parameter(nonneg=True) # Construct the problem. x = Variable(m) objective = Minimize(sum_squares(A*x - b) + gamma*norm(x, 1)) p = Problem(objective) # Assign a value to gamma and find the …
Category: Data Science

'Solvers' in Machine Learning

What role do 'Solvers' play in optimization problems? Surprisingly, I could not find any definition for 'Solvers' online. All the sources I've referred to just explain the types of solvers and the conditions under which each one is supposed to be used. Examples of Solvers - ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
Category: Data Science

How does the construction of a decision tree differ for different optimization metrics?

I understand how a decision tree is constructed (in the ID3 algorithm) using criteria such as entropy, Gini index, and variance reduction. But the formulae for these criteria do not care about optimization metrics such as accuracy, recall, AUC, kappa, F1-score, and others. R and Python packages allow me to optimize for such metrics when I construct a decision tree. What do they do differently for each of these metrics? Where does the change happen? Is there a pattern to …
Category: Data Science
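
For reference, the split criteria named in the question can be written in a few lines. This is a minimal sketch with hypothetical labels and helper names; it only shows how entropy and Gini impurity score a candidate split, and it does not show how any particular R or Python package handles metrics such as AUC or F1.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of the class distribution in labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(impurity, left, right):
    """Size-weighted impurity of the two children of a candidate split."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Hypothetical node with two classes and two candidate splits.
parent = ["a", "a", "b", "b"]
perfect_split = (["a", "a"], ["b", "b"])
useless_split = (["a", "b"], ["a", "b"])

gain_perfect = entropy(parent) - weighted_impurity(entropy, *perfect_split)
gain_useless = entropy(parent) - weighted_impurity(entropy, *useless_split)
```

Both criteria depend only on the class counts in each child, which is the point the question raises: nothing in these formulae refers to accuracy, recall, or any other evaluation metric.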

Reinforcement Learning : Why acting greedily with the optimal value function gives you the optimal policy?

David Silver's course on Reinforcement Learning explains how you get the optimal policy from the optimal value function. It seems to be very simple: you just have to act greedily, by maximizing the value function at each step. In the case of a small grid world, once you have applied the Policy Evaluation algorithm, you get for example the following matrix for the value function: You start from the upper-left corner and the unique actions are the …
Category: Data Science
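
Acting greedily with respect to a value function can be sketched as a one-step lookahead. The example below is an assumption-laden toy, not the grid from the course: a 1x4 corridor with a terminal state at the right end, a step reward of -1, no discounting, and a value table chosen to match those assumptions.

```python
# Hypothetical corridor of 4 states; state 3 is terminal.
V = [-3.0, -2.0, -1.0, 0.0]      # optimal values under the assumptions below
ACTIONS = {"left": -1, "right": +1}
REWARD = -1.0                    # assumed cost per step
GAMMA = 1.0                      # assumed: no discounting

def step(state, action):
    """Deterministic transition; moving off the grid leaves the state unchanged."""
    nxt = state + ACTIONS[action]
    return nxt if 0 <= nxt < len(V) else state

def greedy_action(state):
    """One-step lookahead: pick the action maximizing r + gamma * V(next state)."""
    return max(ACTIONS, key=lambda a: REWARD + GAMMA * V[step(state, a)])

# Greedy policy for every non-terminal state.
policy = [greedy_action(s) for s in range(len(V) - 1)]
```

With these values the greedy policy moves right in every state, which is the shortest path to the terminal state.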

Two steps optimization of a credit card limit

I have a problem similar to what is in the title, but not the same. The problem in the title allows me to explain the dynamics of my need. I have to determine the optimal value for a variable called QUOTA or LIMIT for a credit card. The goal of the model is to allow me to minimize the probability of default, given this variable and others that characterize my customer. What is the best way to determine …
Category: Data Science

My authors need to be able to preview their upload images and manipulate and scale

I am using a form that allows for IMAGE UPLOAD. After submission the form redirects, and the authors can then see the reviewed post. However, if the image they uploaded blindly does not look good and needs to be moved and scaled to best advantage, how can they do this in the post? When I upload my profile pic to most apps I can place it and move it to best advantage within their mask - I want …
Category: Web

What is the most appropriate machine learning approach for this scenario?

The scenario is pretty simple, and I'm sure it's been done a million times. The problem is I don't know the terminology to find the correct resources on the web. Scenario: I have an environment that can be described in terms of 5 parameters, including an input value A and an output variable B. There is a dataset containing 100 rows and values for each parameter. The output B depends on A as well as the remaining environmental variables. The …
Category: Data Science

Determining the optimal number of clusters by elbow method

I have a dataset that consists of 700 categorical columns and around 6000 rows. I created 2-50 clusters with the k-modes algorithm and plotted the cost function to determine the optimal number of clusters. This is what the plot looks like. I am unsure how to determine the optimal number of clusters. The cost function seems to converge at 48 clusters, which seems a lot considering I have only 700 categorical columns. On the other hand at 24 clusters the …
Category: Data Science

How to interpret arg min in the following equation?

I am studying the following equation: $\hat{s}_m(n) = \arg\min_{s_m(n)\in A_s}\left|\frac{\psi_m^H}{\|\psi_m^H\|^2}y_m(n)-s_m(n)\right|^2 \quad (1)$ Here $A_s$ is a $1 \times N$ vector of QPSK symbols, $s_m(n)$ belongs to $A_s$, $\psi_m$ is a random complex number, $y_m(n)$ is a $1 \times N$ vector, and $n$ ranges from $1$ to $N$. I have all these values. My question is what arg min signifies in equation (1). Any help in this regard will be highly appreciated.
Category: Data Science
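
The difference between min and arg min can be shown in code: min is the smallest value of the objective, arg min is the candidate that attains it. The sketch below is illustrative only. It assumes a unit-energy QPSK alphabet for $A_s$ and uses a made-up sample `z` in place of the scaled observation $\frac{\psi_m^H}{\|\psi_m^H\|^2}y_m(n)$.

```python
import math

# Assumed unit-energy QPSK alphabet; the A_s in the question may be scaled differently.
A_s = [complex(re, im) / math.sqrt(2) for re in (1, -1) for im in (1, -1)]

def detect(z):
    """Return the symbol s in A_s that minimizes |z - s|^2 (the arg min)."""
    return min(A_s, key=lambda s: abs(z - s) ** 2)

z = complex(0.9, -0.4)            # hypothetical scaled observation
s_hat = detect(z)                 # arg min: the minimizing symbol itself
min_value = abs(z - s_hat) ** 2   # min: the smallest squared distance
```

Here `s_hat` is a QPSK symbol (the one in the same quadrant as `z`), while `min_value` is just a non-negative real number.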

SGD versus Adam Optimization Clarification

Reading the Adam paper, I need some clarification. It states that SGD optimization updates the parameters with the same learning rate (i.e. it does not change throughout training). They state Adam is different, as its learning rate is variable (adaptive) and can change during training. Is this the primary reason why Adam performs (in most cases) better than SGD? Also, it states that it is computationally cheaper; how can this be, given that it seems more complex than SGD? I hope …
Category: Data Science
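
To make the comparison concrete, here is a minimal sketch of one SGD step and one Adam step for a single scalar parameter. The Adam hyperparameter defaults follow the values suggested in the Adam paper; the function names, the SGD learning rate, and the gradient values are assumptions for illustration.

```python
import math

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: the step is proportional to the raw gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step from w = 1.0 with two very different gradient magnitudes.
w_small, _, _ = adam_step(1.0, grad=0.01, m=0.0, v=0.0, t=1)
w_large, _, _ = adam_step(1.0, grad=100.0, m=0.0, v=0.0, t=1)
```

In this sketch both Adam steps move the parameter by roughly `lr`, regardless of gradient scale, whereas `sgd_step` moves it by `lr * grad`. The extra cost per parameter is two running averages and a square root.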

Defer Code in Widgets - Page Speed

How can I improve page speed by deferring a widget? Is there a way to defer a widget in the footer? I have an external API in a footer widget which is slowing down my page. It is not needed until the page is loaded. The caching plugin I use (W3 Total Cache) gives me the option to defer other scripts, but not scripts directly coded into the widget. What is the best way to manually defer custom code API that is …
Category: Web

"Invalid value" in RMSprop implementation from scratch in Python

Edit 2: The regularization term (reg_term) is sometimes negative due to negative parameters. Hence S[f"dW{l}"] contains some negative values. I realize the reg_term has to be added before squaring, like this: S[f"dW{l}"] = beta2 * S[f"dW{l}"] + (1 - beta2) * (np.square(gradients[f"dW{l}"] + reg_term)) Edit 1: I see that S[f"dW{l}"] contains some negative values. How is this possible when np.square(gradients[f"dW{l}"]) always contains non-negative values? I have implemented a neural network from scratch which uses mini-batch gradient descent. The network …
Category: Data Science
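
The fix described in Edit 2 can be reproduced on scalars. This is a sketch under assumptions: the "buggy" variant is a guess at the original failure mode (reg_term added outside the square), and the numeric values are made up.

```python
import math

def rmsprop_accumulate(S, grad, reg_term, beta2=0.9):
    """Running average of the squared (gradient + regularization) term."""
    return beta2 * S + (1 - beta2) * (grad + reg_term) ** 2

def rmsprop_accumulate_buggy(S, grad, reg_term, beta2=0.9):
    # Assumed reconstruction of the bug: reg_term is added after squaring,
    # so a negative reg_term can push the accumulator below zero.
    return beta2 * S + (1 - beta2) * (grad ** 2 + reg_term)

grad, reg_term = 0.1, -0.5   # hypothetical values with a negative reg_term
good = rmsprop_accumulate(0.0, grad, reg_term)
bad = rmsprop_accumulate_buggy(0.0, grad, reg_term)

# The update divides by sqrt(S); this is only defined when S is non-negative.
step_scale = 1.0 / (math.sqrt(good) + 1e-8)
```

With a negative accumulator, `np.sqrt` returns NaN and emits an "invalid value" warning, which matches the symptom in the title.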

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.