Difference between loss and cost function in the specific context of MAE in multiple-regression?

I've often encountered the Mean Absolute Error (MAE) loss function when dealing with regression problems in artificial neural networks, but I'm still slightly confused about the difference between the terms 'loss' function and 'cost' function in this context.

I understand that the 'cost' function is an average of the 'loss' function, for instance over a mini-batch. The loss is a single value for a single sample in the mini-batch, and the cost is the mean of the losses over the whole mini-batch.

However, consider a multiple-regression problem where the output of the network is a vector of 5 values and the true label is also a vector of 5 values. Would the function in this context still be labeled a 'loss' function, or would it now be a 'cost' function? After all, we have to compute the absolute error element-wise and take the mean within each sample, but we also have to do the same over every sample in the mini-batch.
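To make the computation I'm describing concrete, here's a minimal NumPy sketch (the array shapes and names are just my own illustration):

```python
import numpy as np

# A mini-batch of 4 samples, each with a 5-dimensional target.
y_true = np.random.rand(4, 5)
y_pred = np.random.rand(4, 5)

# Per-sample value: mean absolute error over the 5 outputs.
per_sample = np.mean(np.abs(y_true - y_pred), axis=1)  # shape (4,)

# Mini-batch value: mean of the per-sample values.
batch_value = per_sample.mean()  # scalar
```

Is `per_sample` here the 'loss' and `batch_value` the 'cost', or are both just called 'loss'?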

Tags: cost-function, loss-function, neural-network



You've mixed up two concepts. In your example, where the output has multiple dimensions, you need some way to measure the distance by which each prediction misses its true value; call these distances $d_i$. The loss function is then a function of all of the $d_i$ values. You are free to use absolute differences for either step, squared differences for either step, or mix the two in any order.

For example, let the true values be $\{(1, 2, 1), (2, 5, 0), (-2, 3, 3)\}$ and the predictions be $\{(4, 1, 1), (2, 4, 1), (0, 4, 2)\}$.

First, calculate the distance between the pairs of points, using some distance you find interesting, such as $L1$ or $L2$.

$$
\begin{aligned}
d_{1, L1}\big((1, 2, 1), (4, 1, 1)\big) &= \vert 1-4\vert + \vert 2-1\vert + \vert 1-1\vert = 4 \\
d_{1, L2}\big((1, 2, 1), (4, 1, 1)\big) &= \sqrt{(1-4)^2 + (2-1)^2 + (1-1)^2} = \sqrt{10}
\end{aligned}
$$
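If it helps to see the same computation in code, here is a minimal NumPy sketch (the variable names are mine, nothing standard):

```python
import numpy as np

true_point = np.array([1, 2, 1])
pred_point = np.array([4, 1, 1])

# L1 distance: sum of absolute coordinate differences.
d_l1 = np.sum(np.abs(true_point - pred_point))           # 4

# L2 distance: Euclidean norm of the difference vector.
d_l2 = np.sqrt(np.sum((true_point - pred_point) ** 2))   # sqrt(10)
```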

In this multivariate setting, these distance values are analogous to residuals in a simple linear regression, so stick those "residuals" into the loss function. You might elect to square all of the $d_i$ values (for either $L1$ or $L2$ distances), add those squares, and take the square root (a square-style loss), or you might take the absolute values of the $d_i$ and add them (an absolute-style loss). You can do either with $L1$ or $L2$ distances (or some other distance) from the previous step.
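As a sketch of those mix-and-match choices (the helper names here are my own, not from any particular library):

```python
import numpy as np

def distances(y_true, y_pred, metric="l1"):
    """Per-sample distance between each prediction and its true vector."""
    diff = y_true - y_pred
    if metric == "l1":
        return np.sum(np.abs(diff), axis=1)
    return np.sqrt(np.sum(diff ** 2, axis=1))  # "l2"

def loss(d, how="absolute"):
    """Collapse the per-sample distances d_i into a single loss value."""
    if how == "absolute":
        return np.sum(np.abs(d))       # add the absolute d_i values
    return np.sqrt(np.sum(d ** 2))     # square the d_i, add, take the root

y_true = np.array([[1, 2, 1], [2, 5, 0], [-2, 3, 3]], dtype=float)
y_pred = np.array([[4, 1, 1], [2, 4, 1], [0, 4, 2]], dtype=float)

# All four combinations are legitimate; pick whichever suits your problem.
for metric in ("l1", "l2"):
    for how in ("absolute", "square"):
        print(metric, how, loss(distances(y_true, y_pred, metric), how))
```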

Addressing "loss" vs "cost" function: those terms are loose. It is fair to separate the function you aim to optimize during model training from the profit/loss that accounts for the consequences of being wrong, such as giving an unneeded treatment to a patient you have incorrectly diagnosed (causing them the annoyance of a trip to the pharmacy) versus withholding life-saving treatment from a patient whose disease you missed. The function you optimize in model training would most likely be cross-entropy loss ("log loss" in some circles), with your assessment of the relative costs of misdiagnosis coming into play later.
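As a hedged sketch of that separation (the cost numbers below are invented purely for illustration):

```python
import numpy as np

def log_loss(y, p, eps=1e-12):
    """Cross-entropy ("log loss") for binary labels y and probabilities p."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Deployment-time costs (made-up numbers): missing a disease is assumed
# far worse than an unneeded trip to the pharmacy.
COST_FALSE_NEGATIVE = 100.0
COST_FALSE_POSITIVE = 1.0

def treat(p_disease):
    """Treat when the expected cost of withholding exceeds that of treating."""
    return p_disease * COST_FALSE_NEGATIVE > (1 - p_disease) * COST_FALSE_POSITIVE
```

The model is trained against `log_loss`; the misdiagnosis costs only enter when turning the predicted probability into a decision.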

Depending on your reference, people are not consistent about which term they apply to what, and some even speak of an "objective" function.


I'm not very knowledgeable about the formal distinction between loss and cost. As far as I know, there's not much difference between the two: in the context of NNs, 'loss' is used specifically as a measure of the difference between the predicted and true values, typically on a validation set. 'Cost' is more general: a cost function describes any way to measure performance with a "lower is better" value. Note that these terms are used in many different contexts, so the specific way they are defined in any particular setting is not very important.

In the multiple-regression problem described in the question, I would say the most important point is not to average MAE across different target variables: unless these variables are known to have exactly the same range, their MAE values are not comparable and therefore cannot be meaningfully averaged. In general, multiple regression corresponds to multiple problems that should be evaluated independently. But if the specific context justifies it, it might make sense to average them or to define some ad-hoc cost function.
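If a sketch helps, here is how I would report per-target MAE instead of a single averaged value (assuming NumPy; the shapes and scale factors are invented):

```python
import numpy as np

# 100 samples, 5 target variables with deliberately different ranges.
scales = np.array([1, 10, 100, 1, 5])
y_true = np.random.rand(100, 5) * scales
y_pred = y_true + np.random.randn(100, 5) * scales * 0.1  # ~10% error each

# One MAE per target variable, evaluated independently:
mae_per_target = np.mean(np.abs(y_true - y_pred), axis=0)  # shape (5,)
print(mae_per_target)

# Averaging them lets the large-range target dominate the number,
# even though the relative error is the same for every target:
print(mae_per_target.mean())  # only meaningful if the ranges are comparable
```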
