I am trying to assign costs to the confusion matrix. That is, in my problem an FP does not have the same cost as an FN, so I want to assign each of these cases its own cost "x" so that the algorithm learns based on those costs. I will explain my case a little more with an example: when we want to detect credit card fraud, it does not have the same cost to predict that it is not fraud when …
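A common way to encode unequal FP/FN costs without manipulating the confusion matrix directly is per-class weighting during training; a minimal sketch using scikit-learn's class_weight parameter (the 1:5 cost ratio and the synthetic data are illustrative assumptions, not taken from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data standing in for fraud/non-fraud.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Misclassifying class 1 (fraud) is made 5x as costly as class 0,
# so the learner trades extra FPs for fewer FNs.
clf = LogisticRegression(class_weight={0: 1, 1: 5})
clf.fit(X, y)
```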
I'm building a CNN to make a binary classification (1 or 0). For this, I'm using the cost function sigmoid_cross_entropy_with_logits. But for some reason, the cost using this function is never equal to zero even when the prediction equals the correct value. I tried plotting the output using the formula on TensorFlow's website: https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits This formula: max(x, 0) - x * z + log(1 + exp(-abs(x))) And by making this plot, I realized that it really isn't zero …
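Evaluating the documented formula directly confirms this: for any finite logit x, the loss is strictly positive and only approaches zero asymptotically. A quick numpy check (the logit values are illustrative):

```python
import numpy as np

def sigmoid_xent(x, z):
    # TensorFlow's numerically stable form: max(x, 0) - x*z + log(1 + exp(-|x|))
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

# A confidently correct prediction (large positive logit, label 1)
# still yields a small positive loss, never exactly zero.
print(sigmoid_xent(5.0, 1.0))   # ~0.0067
print(sigmoid_xent(20.0, 1.0))  # ~2.1e-09
```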
I'm very new to writing cost functions for optimization, and I have what may be a basic question or just a misinterpretation. I have multiple cost functions that I'd like to add up into one total cost function. Here is a simplified example: say I want to maximize the bounciness $b$ of a bouncy ball while minimizing its weight $w$. The weight value goes into a function that computes the bounciness, but we don't know what the function looks like. …
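For reference, the usual first step is weighted-sum scalarization: fold both objectives into one cost with a trade-off weight. A sketch where bounce() is a stand-in for the unknown black-box function and alpha is a hypothetical trade-off parameter:

```python
import numpy as np

def bounce(w):
    # Stand-in for the unknown bounciness function; purely illustrative
    # (here, heavier balls happen to bounce more).
    return np.sqrt(w)

def total_cost(w, alpha=0.5):
    # Weighted sum: penalize weight, reward bounciness. Minimizing this
    # trades the two objectives off via alpha in [0, 1].
    return alpha * w - (1 - alpha) * bounce(w)

ws = np.linspace(0.01, 5.0, 200)
best_w = ws[np.argmin([total_cost(w) for w in ws])]
print(best_w)  # interior optimum near 0.25 for this stand-in
```

If the objectives live on very different scales, normalizing each term before weighting keeps alpha interpretable.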
I've often come across the Mean Absolute Error loss function when dealing with regression problems in Artificial Neural Networks, but I'm still slightly confused about the difference between the words 'loss' and 'cost' function in this context. I understand that the 'cost' function is an average of the 'loss' functions, for instance when dealing with mini-batches. The loss is a single value for a single sample in the mini-batch, and the cost is the mean of the summed-up losses over …
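In that reading the distinction is one reduction away; a tiny numpy illustration with made-up values:

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # one mini-batch (illustrative)
y_pred = np.array([2.5,  0.0, 2.0, 8.0])

losses = np.abs(y_true - y_pred)  # 'loss': one value per sample
cost = losses.mean()              # 'cost': scalar average over the batch
print(losses, cost)               # [0.5 0.5 0.  1. ] 0.5
```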
I'm not an expert in AI, but for my underlying problem I need to find a function which rates data samples based on a specific value x. That is, the output of the function should determine whether the data example is a good one or not. The score (the y of the function) should be between 0 and 1. The rules I need to follow for the rating are the following: x should never be below …
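Since the rules are truncated above, only the bounded-output requirement can be illustrated; the usual building block for squashing an arbitrary real x into (0, 1) is the logistic function, sketched below with hypothetical midpoint/steepness knobs:

```python
import numpy as np

def score(x, midpoint=0.0, steepness=1.0):
    # Logistic squash: any real x maps into (0, 1). midpoint shifts
    # where the score crosses 0.5; steepness controls the transition.
    return 1.0 / (1.0 + np.exp(-steepness * (x - midpoint)))

print(score(np.array([-10.0, 0.0, 10.0])))  # ~[0.0000454, 0.5, 0.9999546]
```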
In the case of a classification problem where a cost matrix is used to maximize the model performance, it is common to apply a rebalancing technique. Let's say, for example, that I have the following costs for the two classes: C(a,a) = 0, C(b,b) = 0, C(a,b) = 2, C(b,a) = 1. Then, with a rebalancing technique, I would need twice as many examples of class b as examples of class a. But what should my rebalancing strategy be when …
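For concreteness, cost-proportional rebalancing can be done by replicating examples in proportion to their misclassification cost; a minimal numpy sketch under the stated 2:1 ratio (which class gets duplicated depends on the cost-matrix convention in use):

```python
import numpy as np

# Illustrative data: label 0 = class a, label 1 = class b.
X = np.arange(10).reshape(5, 2).astype(float)
y = np.array([0, 0, 0, 1, 1])

# Duplicate class b examples twice (cost ratio 2:1).
repeats = np.where(y == 1, 2, 1)
X_bal = np.repeat(X, repeats, axis=0)
y_bal = np.repeat(y, repeats)
print(np.bincount(y_bal))  # [3 4]
```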
Why is the regularization parameter not applied to the intercept parameter? From what I have read about the cost functions for linear and logistic regression, the regularization parameter (λ) is applied to all terms except the intercept. For example, here are the cost functions for linear and logistic regression respectively (notice that $j$ starts from 1):
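Presumably these are the standard regularized forms, in which the penalty sum starts at $j = 1$ and therefore skips the intercept $\theta_0$:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$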
I want to ask a fairly simple question, I think. I have a deep background in pure mathematics, so I don't have too much trouble understanding the mathematics of the cost function, but I would just like to clarify what exactly the cost function is in a neural network in practice (i.e. implementing it on real datasets). Given a fixed training sample, we can view the cost function as a function of the weights and biases, and thus optimizing this …
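To make the "function of the weights and biases" view concrete, a minimal numpy sketch with the training sample frozen (the toy data and squared-error loss are illustrative):

```python
import numpy as np

# Fixed training sample (illustrative).
X = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def cost(w, b):
    # With X and y held fixed, this is purely a function of (w, b).
    return np.mean((w * X + b - y) ** 2)

print(cost(2.0, 0.0))  # 0.0 at the parameters that fit the data
print(cost(1.0, 0.5))  # positive anywhere else
```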
I have a problem where I want to minimize the monetary cost associated with the prediction error (Mean Error, ME) of the feature I want to predict. The monetary cost is calculated by multiplying the ME with the cost tensor. Therefore, I want to use a custom conditional loss function that uses one type of loss (MAE) if the loss is above a threshold and returns another loss if it is below the threshold. The second loss is just the mean …
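A sketch of such a conditional loss, written here for TensorFlow/Keras as an assumed framework, with the threshold and the second (truncated) branch filled in as labelled assumptions:

```python
import tensorflow as tf

THRESHOLD = 1.0  # hypothetical; the actual threshold is cut off above

def conditional_loss(y_true, y_pred):
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    # Assumed second branch: plain mean error (the excerpt is truncated).
    me = tf.reduce_mean(y_true - y_pred)
    # Switch between the two losses based on the current MAE.
    return tf.where(mae > THRESHOLD, mae, me)
```

It would then be passed as model.compile(loss=conditional_loss, ...); tf.where keeps the branch selection inside the graph, whereas a plain Python if on a tensor would fail in graph mode.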
I would like to find the optimal combination of parameters for an algorithm affecting the disk space used by some storage. Several algorithm parameters (x1, x2, x3, where 0 < x1 < 1, 10 < x2 < 100, 0.1 < x3 < 0.5) are used as input for the model, and the disk space occupied by the storage, S(x1, x2, x3), is the cost function I'd like to minimize. The problem is that every function call S(x1, x2, x3) is …
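Assuming S is merely expensive rather than unavailable, a bounded minimization sketch with scipy (the quadratic stand-in for S is illustrative):

```python
from scipy.optimize import minimize

def S(params):
    # Stand-in for the real (expensive) storage-size measurement.
    x1, x2, x3 = params
    return (x1 - 0.5) ** 2 + ((x2 - 40.0) / 30.0) ** 2 + (x3 - 0.3) ** 2

bounds = [(0.0, 1.0), (10.0, 100.0), (0.1, 0.5)]
result = minimize(S, x0=[0.5, 50.0, 0.3], bounds=bounds, method="L-BFGS-B")
print(result.x, result.fun)
```

When each call is very costly, sample-efficient derivative-free methods (e.g. Bayesian optimization via scikit-optimize's gp_minimize) are the usual substitute, since they are designed to keep the number of evaluations of S small.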
I am working with linear regression and I would like to know its time complexity in big-O notation. The cost function of linear regression without an optimisation algorithm (such as gradient descent) needs to be computed over iterations of the weight combinations (as a brute-force approach). This makes computation time dependent on the number of weights and, obviously, on the number of training data. If $n$ is the number of training data, $W$ is the number of weights and …
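For orientation, a single evaluation of the cost already takes on the order of $nW$ operations (one $W$-term dot product per training example); if each weight is additionally searched over $p$ candidate values (an assumed discretization), the brute-force total becomes

$$\underbrace{O(nW)}_{\text{one evaluation of } J} \times \underbrace{p^{W}}_{\text{weight combinations}} \;=\; O\!\left(nW\,p^{W}\right).$$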
In the formula below, could one understand $y^{(i)}$ as $y_i$? If not, what is the fundamental difference? $$ j(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})^2 $$
As we all know, the cost function for linear regression is the mean squared error, whereas when we use Ridge regression we simply add $\lambda \cdot \text{slope}^2$. But I always see the cost function of linear regression also written in the form below, where it's not divided by the number of records. So I just want to know what the correct cost function is; I know both are correct, but when doing Ridge or Lasso, why do we ignore the division part?
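Presumably the two forms being contrasted are the mean-scaled squared error and the unscaled penalized sum (the $\frac{1}{2m}$ scaling in the first is an assumption):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \qquad\text{vs.}\qquad J(\theta) = \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2$$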
So, firstly, I have a network that I'm using to approximate the value of a function. Recently, at about 50,000 training iterations, it began to show no further improvement in training, at any learning rate. The question is: what design or training flaw could this be a symptom of? To track progress, after training each value in each epoch, I run all the inputs from the training data through the model immediately after each individual backpropagation. I subtract the …
Is the cross-entropy cost function defined as $J(\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}y_{k}^{(i)}\log(\hat{p}_{k}^{(i)})$ the same as the one implemented in sklearn.metrics.log_loss? If not, what's the difference between them? $m=\text{number of samples}$, $K=\text{number of classes}$.
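The two can be compared directly; a small numpy/sklearn check with illustrative probabilities (log_loss with its defaults should match the formula when every class appears in y_true):

```python
import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 2, 1])               # illustrative labels, K = 3
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.6, 0.2]])

# Manual J(Theta): one-hot encode y, then average -sum_k y_k * log(p_k).
onehot = np.eye(3)[y_true]
manual = -np.mean(np.sum(onehot * np.log(probs), axis=1))

print(manual, log_loss(y_true, probs))     # identical up to float error
```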
I am implementing the cost function for logistic regression and have a question. The formulation of the cost function is $J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right)$ So in Python I code the function as follows: cost = -1/m*np.sum(np.dot(Y.T,np.log(A))+ np.dot((1-Y.T),np.log(1-A))) # m=3 However, if I interchange the order of the elements in np.dot as below: cost = -1/m * np.sum(np.dot(np.log(A), Y.T) + np.dot(np.log(1-A), (1-Y.T))) then the outcomes are different. I don't understand why one version is correct and the other is not. Can …
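The difference is a shape effect; a quick sketch assuming Y and A are (1, 3) row vectors, matching m = 3 (the values are illustrative):

```python
import numpy as np

Y = np.array([[1.0, 0.0, 1.0]])       # labels, shape (1, 3)
A = np.array([[0.9, 0.2, 0.8]])       # predictions, shape (1, 3)

# (3,1) @ (1,3) is a 3x3 outer product; summing it pairs every y_i
# with every log(a_j), which is not the cost.
outer = np.dot(Y.T, np.log(A))

# (1,3) @ (3,1) is a 1x1 inner product: sum_i y_i * log(a_i).
inner = np.dot(np.log(A), Y.T)

print(np.sum(outer), np.sum(inner))   # different values
print(np.sum(Y * np.log(A)))          # matches the inner product
```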
I have a very basic question which relates to Python, numpy and the multiplication of matrices in the setting of logistic regression. First, let me apologise for not using math notation. I am confused about the use of matrix dot multiplication versus element-wise multiplication. The cost function is given by: $J = -\frac{1}{m}\sum_{i=1}^m \left[y^{(i)}\log(a^{(i)})+(1 - y^{(i)})\log(1-a^{(i)})\right]$ And in Python I have written this as cost = -1/m * np.sum(Y * np.log(A) + (1-Y) * (np.log(1-A))) But for …
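For what it's worth, when Y and A are equal-shaped row vectors, the summed element-wise product and the corresponding inner products agree; a quick check with illustrative values:

```python
import numpy as np

Y = np.array([[1.0, 0.0, 1.0]])      # illustrative labels, shape (1, 3)
A = np.array([[0.9, 0.2, 0.8]])      # illustrative predictions

elementwise = np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
dotted = (np.dot(np.log(A), Y.T) + np.dot(np.log(1 - A), (1 - Y).T)).item()

print(elementwise, dotted)  # same scalar, ~-0.5516
```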
I am trying to build a binary classifier to predict a pulsar star with a single-hidden-layer neural network. But the cost on the training dataset shows no change after almost 100 iterations. The following is the implementation with Python numpy:

```python
import os
import csv
import numpy as np

def load_dataset(file):
    with open(file, 'r') as work_file:
        reader = list(csv.reader(work_file))
        total = len(reader)
        train_set = reader[:round(total * 0.8)]
        # Take the *last* 20% for validation; the original slice
        # reader[:round(total * 0.2)] overlapped the training set.
        val_set = reader[round(total * 0.8):]
        features = len(train_set[0][:8])
        x_train = np.zeros((len(train_set), features))
        y_train = np.zeros((len(train_set), …
```