I have run a LightGBM regression model, optimizing on RMSE and measuring the performance on RMSE: model = LGBMRegressor(objective="regression", n_estimators=500, n_jobs=8) model.fit(X_train, y_train, eval_metric="rmse", eval_set=[(X_train, y_train), (X_test, y_test)], early_stopping_rounds=20) The model keeps improving during the 500 iterations. Here are the performances I obtain on MAE: MAE on train : 1.080571 MAE on test : 1.258383 But the metric I'm really interested in is MAE, so I decided to optimize it directly (and choose it as the evaluation metric): model …
I have been thinking about the following problem: Given some task, we assume there is a magical function that perfectly solves this task. For example, if we want to distinguish cats and dogs, then we can train a neural network that hopefully converges over time to a function "similar" to our magical function. The problem is now: How can we help/encourage our network to converge to a good/better function? In theory a single layer + a non-linearity can be enough, …
When creating a multi-objective optimisation/MCDM algorithm such as NSGA-II, does it make sense to use a deep neural network trained on a supervised tabular regression prediction task, in place of a simple equation for the objective function? Is it possible or advantageous to replace a nonlinear equation with the model.predict() function in Keras, to be able to model more complex objective functions? I am using pymoo with NSGA-II.
I'm not an expert in the AI topic but for my underlying problem I need to find a function which rates data samples based on a specific value x. This means that based on the output of the function it should be determined, whether the data example is a good one or not. The score(=y of function) should be between 0 and 1. The rules I need to follow for the rating are the following: x should never be below …
Let's say we want to predict the probability of rain. So just the binary case: rain or no rain. In many cases it makes sense to have this in the [5%, 95%] interval. And for many applications this will be enough. And it is actually desired to make the classifier not too confident. Hence cross entropy (CE) is chosen: $$H_{y'} (y) := - \sum_{i} y_{i}' \log (y_i)$$ But cross entropy practically makes it very hard for the classifier to learn …
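As a small worked example of the cross-entropy formula above, the following sketch evaluates $H_{y'}(y)$ for the binary rain/no-rain case. The function name `cross_entropy` and the sample probabilities are illustrative assumptions, not taken from the question.

```python
import math

def cross_entropy(target, predicted):
    """H_{y'}(y) = -sum_i y'_i * log(y_i); terms with y'_i == 0 contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

# It rained: the target distribution over (rain, no rain) is (1, 0).
target = [1.0, 0.0]

# Prediction at the upper end of the [5%, 95%] interval, and correct.
loss_confident = cross_entropy(target, [0.95, 0.05])
# Equally confident prediction, but wrong.
loss_wrong = cross_entropy(target, [0.05, 0.95])
```

Under these assumed numbers the wrong prediction is penalized far more heavily than the correct one is rewarded, which is the asymmetry the question is concerned with.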
I implemented a custom objective and metric for an XGBoost regression. In order to see if I'm doing this correctly, I started with a quadratic loss. The implementation seems to work well, but I cannot reproduce the results from a standard "reg:squarederror" objective. Question: I wonder if my current approach is correct (especially the implementation of the first and second order gradient)? If so, what could be a possible reason for the difference? Gradient and Hessian are defined as: grad …
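One way to sanity-check the first and second derivatives of a custom loss is to compare them with finite differences. The sketch below does this in plain NumPy for a quadratic loss written as $\tfrac{1}{2}(\hat{y}-y)^2$. That convention, and all function names, are assumptions for illustration; the sketch does not reproduce the definitions elided in the question.

```python
import numpy as np

def squared_error_loss(preds, labels):
    # Assumed convention: 0.5 * (pred - label)^2 per sample.
    return 0.5 * (preds - labels) ** 2

def squared_error_grad_hess(preds, labels):
    """First and second derivatives of the assumed loss with respect to preds."""
    grad = preds - labels
    hess = np.ones_like(preds)
    return grad, hess

def numerical_grad_hess(loss, preds, labels, eps=1e-4):
    """Central finite differences, used only to check the analytic derivatives."""
    up = loss(preds + eps, labels)
    mid = loss(preds, labels)
    down = loss(preds - eps, labels)
    return (up - down) / (2 * eps), (up - 2 * mid + down) / eps ** 2

preds = np.array([0.5, 2.0, -1.0])
labels = np.array([1.0, 1.5, 0.0])
grad, hess = squared_error_grad_hess(preds, labels)
num_grad, num_hess = numerical_grad_hess(squared_error_loss, preds, labels)
```

If the loss is written without the factor ½, both the gradient and the Hessian double, so the convention is worth stating explicitly when comparing implementations.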
I am using a genetic algorithm to maximize a few hundred thousand real-valued variables. Each of the variables, $x_i$, has its own independent boundary condition. The fitness function uses each of these variables to compute another value and returns the sum of everything: $$\text{fitness} = g(x_1) + g(x_2) + g(x_3) + \dots$$ This is taking incredibly long in Python. In this situation, what do I gain by maximizing all values at the same time, i.e. using the genetic …
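If each term $g(x_i)$ depends only on its own variable and the bounds are independent, as the formula above suggests, then the maximum of the sum equals the sum of the per-variable maxima. The following sketch illustrates that with a vectorized grid search. The function `g`, the bounds, and the grid size are hypothetical placeholders, and a grid search is only one possible one-dimensional optimizer.

```python
import numpy as np

def g(x):
    # Hypothetical per-variable term; stands in for the real g.
    return -(x - 1.0) ** 2

def maximize_separable(g, lower, upper, n_grid=1001):
    """Maximize sum_i g(x_i) by maximizing each g(x_i) on its own interval.

    Uses a dense grid per variable, evaluated in one vectorized call.
    """
    t = np.linspace(0.0, 1.0, n_grid)                      # shape (n_grid,)
    grid = lower[:, None] + (upper - lower)[:, None] * t   # shape (n_vars, n_grid)
    values = g(grid)
    best = np.argmax(values, axis=1)
    rows = np.arange(len(lower))
    return grid[rows, best], values[rows, best].sum()

lower = np.array([-2.0, 0.0, 3.0])
upper = np.array([0.5, 4.0, 5.0])
x_best, fitness = maximize_separable(g, lower, upper)
```

The cost of this approach grows linearly with the number of variables, whereas a genetic algorithm over the joint vector has to search a space whose dimension is the number of variables.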
I have some candidate items, and I want to choose a subset of them that maximizes an objective function. I don't know what the target is, or which subset is really best according to my objective function. Is it possible to solve this problem with a neural network? For clarification, imagine I have N candidate items, each of which has a score; I want the subset that has the maximum score among all possible subsets.
I am trying to formulate a problem in which we minimize the average resource allocated to different users. Due to some inherent properties of the environment, the allocation can be minimized easily for some users but not for others, which raises a fairness issue. While the main objective is to minimize the average resource consumed by all the users, I also want to ensure that the allocation is fair so the variance of the resource allocation …
I would like to approximate a function $f(\cdot)$ by means of a neural network given a finite set of observations $f(x_i)$ where $x_i\in\mathbb{R}^n$ and $i=1,\dots,N$. However, I have some prior knowledge on how this function should behave, for example that it is monotonic in the first coordinate. Are there methodologies accounting for this type of shape constraints when training a (D)NN?
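One common way to encode monotonicity in the first coordinate is to constrain the relevant weights to be positive and use an increasing activation. The sketch below shows only the forward pass of such a one-hidden-layer network in NumPy. The architecture, the softplus reparametrization, and all names are assumptions chosen for illustration, and training is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(n_inputs, n_hidden):
    """Unconstrained raw parameters; constraints are applied in the forward pass."""
    return {
        "W": rng.normal(size=(n_hidden, n_inputs)),
        "b": rng.normal(size=n_hidden),
        "a": rng.normal(size=n_hidden),
    }

def softplus(z):
    # Smooth map from the reals to the positive reals.
    return np.logaddexp(0.0, z)

def forward(params, x):
    """One-hidden-layer network that is non-decreasing in x[..., 0].

    The weights on the first input and all output weights pass through softplus,
    so they are positive; tanh is increasing, so the composition is monotone
    in the first coordinate. Weights on the other coordinates stay unconstrained.
    """
    W = params["W"].copy()
    W[:, 0] = softplus(W[:, 0])
    a = softplus(params["a"])
    hidden = np.tanh(x @ W.T + params["b"])
    return hidden @ a

params = init_params(n_inputs=3, n_hidden=8)
x = rng.normal(size=(5, 3))
x_shifted = x.copy()
x_shifted[:, 0] += 0.7   # increase only the first coordinate
```

Because the constraint is built into the parametrization, any gradient-based optimizer can update the raw parameters freely and the network stays monotone in the first coordinate throughout training.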
I am working on a binary classifier using LightGBM. I try to see the results of the classifiers when changing the costs of false positives and false negatives, still working on the same training and validating datasets. As I want to have probabilities as a result of my modelling, I use isotonic regression as a final part of the pipeline. Applying exactly the same methodology and code, but only changing those variables of customized objective function, I can see that …
I am reading these two pages: the XGBoost documentation and a post on evaluation metrics. I have a dataset where I am trying to predict future spend at the user level. A lot of our spend comes from large spenders, outliers. So, we care about them. I am using XGBoost. I have tried XGBoost with objective reg:squarederror. This tended to underpredict a little. I then tried reg:squaredlogerror, and this resulted in predictions that underpredict by much more than with squarederror. …
I am training an XGBoost model and, as I care most about the resulting probabilities rather than the classification itself, I have chosen the Brier score as the metric for my model, so that the probabilities are well calibrated. I tuned my hyperparameters using GridSearchCV and brier_score_loss as a metric. Here's an example of a tuning step: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0) cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123) model = XGBClassifier(learning_rate=0.1, n_estimators=200, gamma=0, subsample=0.8, colsample_bytree=0.8, scale_pos_weight=1, verbosity=1, seed=0) parameters …
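The objective function is $$J(\theta) = \left[\frac{1}{n}\sum_{i=1}^{n} \text{Loss}_h\big(y^{(i)}\,\theta\cdot x^{(i)}\big)\right] + \frac{\lambda}{2}\|\theta\|^2$$ where $\text{Loss}_h(z)=\max\{0,1-z\}$ is the hinge loss function, $(x^{(i)}, y^{(i)})$ with $i=1,\dots,n$ are the training examples, and $y^{(i)}\in\{1,-1\}$ is the label for the vector $x^{(i)}$. How do I find the gradient with respect to $\theta$ used in SGD when $y\,\theta\cdot x\le 1$? Is it $y x$, and is it $0$ when $y\,\theta\cdot x>1$?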
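For a single example, a subgradient of the hinge term with respect to theta is $-y\,x$ (note the minus sign) when $y\,\theta\cdot x \le 1$ and $0$ otherwise, and a squared-norm regularizer with weight `lam` adds `lam * theta`. The sketch below writes this out. The names `hinge_subgradient`, `sgd_step`, `lam`, and `lr` are illustrative assumptions, and picking $-y\,x$ exactly at the kink is one valid choice among several.

```python
import numpy as np

def hinge_subgradient(theta, x, y, lam):
    """Subgradient of Loss_h(y * theta.x) + (lam / 2) * ||theta||^2 w.r.t. theta."""
    margin = y * np.dot(theta, x)
    # At margin == 1 the hinge has a kink; using -y*x there is one valid choice.
    loss_part = -y * x if margin <= 1 else np.zeros_like(x)
    return loss_part + lam * theta

def sgd_step(theta, x, y, lam, lr):
    """One stochastic (sub)gradient step on a single example."""
    return theta - lr * hinge_subgradient(theta, x, y, lam)

theta = np.array([0.0, 0.0])
x = np.array([1.0, 2.0])
y = 1.0
theta_new = sgd_step(theta, x, y, lam=0.1, lr=0.5)
```

Since the step subtracts the subgradient, the update adds `lr * y * x` to theta when the margin condition is violated, which may be where the `y*x` in the question comes from.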
I'm new to optimization problems. I want to find optimum values for my objective function. You can imagine my function as E = f(t1, t2, t3). I want to minimize E, and the following constraints limit the variables: 1- 0 < t1, t2, t3 < 255 2- t1, t2, and t3 should be as small as possible 3- having one of these parameters equal to zero is more important than the overall values being low. For example, (80, 150, 0) is better than (140, 50, …
Let's say we have a regular photo and three low-light photos illuminated in different colors. Each pixel is a three-component vector $q=(R,G,B)$. Then $q_k^{A}$ is the $k$-th pixel of the regular photo and $q_k^{B}$, $q_k^{C}$, $q_k^{D}$ are the $k$-th pixels of the three low-light versions. The task is to reconstruct the regular photo from the three low-light photos, where $q_k^{A} = F^{B}q_k^{B} + F^{C}q_k^{C} + F^{D}q_k^{D} + q_{const}$. Clearly, $F^{B}, F^{C}, F^{D}$ are $3 \times 3$ matrices and $q_{const}$ is …
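Because the reconstruction model is affine in the unknown matrices and the offset, one possible way to estimate them is ordinary least squares over all pixels. The sketch below assumes that approach, which the question does not confirm. It writes the matrices that multiply $q_k^{B}$, $q_k^{C}$, $q_k^{D}$ as `FB`, `FC`, `FD`, and the data are synthetic.

```python
import numpy as np

def fit_reconstruction(qB, qC, qD, qA):
    """Fit q^A ~ F^B q^B + F^C q^C + F^D q^D + q_const by ordinary least squares.

    Each input has shape (n_pixels, 3). Returns three 3x3 matrices and the offset.
    """
    ones = np.ones((qA.shape[0], 1))
    features = np.hstack([qB, qC, qD, ones])                # (n_pixels, 10)
    coeffs, *_ = np.linalg.lstsq(features, qA, rcond=None)  # (10, 3)
    FB, FC, FD = coeffs[0:3].T, coeffs[3:6].T, coeffs[6:9].T
    q_const = coeffs[9]
    return FB, FC, FD, q_const

# Synthetic data, generated only to exercise the function.
rng = np.random.default_rng(0)
qB, qC, qD = (rng.random((200, 3)) for _ in range(3))
true_FB, true_FC, true_FD = (rng.normal(size=(3, 3)) for _ in range(3))
true_const = np.array([0.1, 0.2, 0.3])
qA = qB @ true_FB.T + qC @ true_FC.T + qD @ true_FD.T + true_const

FB, FC, FD, q_const = fit_reconstruction(qB, qC, qD, qA)
```

On noise-free synthetic data the fit recovers the generating matrices; on real photos the residual would indicate how well the affine model holds.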
I know that a first-degree polynomial is considered a linear function. But I find some things confusing in linear regression. 1. f(x) = w1 x1 + w2 x2 + w3 x3 --> linear function 2. f(x) = w1 x1 + w2 x2 + w3 x1 x3 --> is it linear? If not, then why? 3. f(x) = w1 x1 + w2 x2 + w3 w4 x3 --> is it linear? If not, then why? 4. f(x) = w1 x1 + w2 x2 + …
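In linear regression, "linear" usually refers to linearity in the weights rather than in the inputs. As a sketch of what that means for the second model in the list, the product x1 x3 can be treated as one more feature column and the weights fitted by ordinary least squares. The data and the weight values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2, x3 = rng.normal(size=(3, 50))

# Model 2: f(x) = w1*x1 + w2*x2 + w3*x1*x3 with hypothetical weights.
true_w = np.array([2.0, -1.0, 0.5])
features = np.column_stack([x1, x2, x1 * x3])  # x1*x3 is just another feature column
targets = features @ true_w

# Ordinary least squares recovers the weights because f is linear in w.
fitted_w, *_ = np.linalg.lstsq(features, targets, rcond=None)
```

The same trick does not apply directly to a model containing a product of two weights, since that term is not linear in the parameters as written.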
In this article, the author talks about how deep learning models are no longer trained on an objective function that humans specify, but find their own objective function. Specifically, he is talking about GANs. Is there a good resource explaining this idea that GANs find their own objective function? Based on what I've read about GANs, I don't think of them this way.
For example, suppose I have a data set which looks like: [[x,y,z], [1,2,5], [2,3,8], [4,5,14]] It's easy to find the theta parameters from such a tiny data set: theta = [1,2,0], i.e. z = 1*x + 2*y + 0. But what if my data set is non-linear? Suppose: [[x,y,z], [1,2,6], [2,3,15]] If I choose the mapping function to be z = xy+yy, it would return the theta parameters theta = [1,1,0]. So my question is how to choose such mapping …
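The two examples above can be checked numerically. The sketch below builds the feature columns for each case, verifies that the stated theta values reproduce z, and fits the mapped features by least squares. The variable names are illustrative, and the feature order (x, y, 1) and (xy, yy, 1) is an assumption about how theta is laid out.

```python
import numpy as np

# Linear case: rows are (x, y, z) from the first data set.
linear_data = np.array([[1, 2, 5], [2, 3, 8], [4, 5, 14]], dtype=float)
x, y, z = linear_data.T
linear_features = np.column_stack([x, y, np.ones_like(x)])   # features: x, y, 1
linear_pred = linear_features @ np.array([1.0, 2.0, 0.0])     # theta = [1, 2, 0]

# Non-linear case: rows are (x, y, z) from the second data set.
nonlinear_data = np.array([[1, 2, 6], [2, 3, 15]], dtype=float)
x2, y2, z2 = nonlinear_data.T
mapped_features = np.column_stack([x2 * y2, y2 * y2, np.ones_like(x2)])  # features: xy, yy, 1
mapped_pred = mapped_features @ np.array([1.0, 1.0, 0.0])                 # theta = [1, 1, 0]

# Least squares on the mapped features also reproduces z, although with so few
# rows the fitted theta need not equal [1, 1, 0].
fitted_theta, *_ = np.linalg.lstsq(mapped_features, z2, rcond=None)
```

With only two or three rows, several theta vectors fit the data exactly, so the fitted parameters are not unique even once a mapping has been chosen.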