It is sometimes said that accuracy has no relationship to the loss, but from a theoretical perspective, there IS a relationship.
Accuracy is $1 - \text{(error rate)}$ and the error rate can be seen as the expectation of the 0-1 loss:
\begin{equation}
l_{01}(f(x), y) :=
\begin{cases}
0 & (f(x) = y) \\
1 & (f(x) \neq y)
\end{cases}
\end{equation}
\begin{equation}
\text{error rate} = \mathbb{E}_{x, y} \left[ l_{01}(f(x), y) \right]
\end{equation}
where $f$ is the model, $x$ is its input and $y$ is the ground truth label for $x$.
In order to maximize the accuracy, we want to minimize the error rate.
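As a concrete illustration, here is a minimal sketch in NumPy (the arrays and their values are made up for the example) showing that accuracy is exactly one minus the empirical mean of the 0-1 loss:

```python
import numpy as np

# Hypothetical predictions f(x) and ground-truth labels y for 6 examples.
y_pred = np.array([0, 1, 1, 0, 1, 0])
y_true = np.array([0, 1, 0, 0, 1, 1])

# 0-1 loss: 0 where the prediction matches the label, 1 otherwise.
zero_one_loss = (y_pred != y_true).astype(float)

error_rate = zero_one_loss.mean()  # empirical expectation of the 0-1 loss
accuracy = 1.0 - error_rate

print(error_rate)  # 0.333...
print(accuracy)    # 0.666...
```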
However, because the 0-1 loss is discontinuous (and non-convex), minimizing it directly is practically impossible. Instead, a variety of "surrogate losses" are used. A surrogate loss function $l$ is required to have some properties:
- $l$ is continuous.
- $l$ is convex.
- $l$ bounds $l_{01}$ from above.
Surrogate losses with these properties can be minimized via the well-known gradient descent algorithm, and because they bound $l_{01}$ from above, driving the surrogate down also drives the error rate down.
Popular surrogate losses include the hinge loss, used in support vector machines (SVMs), and the logistic loss, used in logistic regression and standard neural networks.
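To see the upper-bound property concretely, here is a small sketch of my own comparing both surrogates against the 0-1 loss as functions of the margin $m = y f(x)$, with labels $y \in \{-1, +1\}$. One subtlety: the logistic loss needs the base-2 logarithm for the bound to hold exactly (at $m = 0$ the natural-log version gives $\ln 2 < 1$):

```python
import numpy as np

# Margins m = y * f(x); m > 0 means a correct classification.
m = np.linspace(-3.0, 3.0, 601)

zero_one = (m <= 0).astype(float)      # 0-1 loss as a function of the margin
hinge = np.maximum(0.0, 1.0 - m)       # hinge loss (SVM)
logistic = np.log2(1.0 + np.exp(-m))   # logistic loss, base-2 so l(0) = 1

# Both surrogates are continuous, convex, and bound the 0-1 loss from above.
assert np.all(hinge >= zero_one)
assert np.all(logistic >= zero_one)
```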
So, from a theoretical viewpoint, the accuracy and the loss displayed at every epoch of your training are indeed related. That is,
- Accuracy has a direct connection to the error rate, which we want to minimize during training.
- Loss (usually the cross-entropy loss, which is equivalent to the logistic loss in a sense; see the sketch below) is a surrogate loss that bounds the error rate from above.
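To make "equivalent in a sense" concrete: for binary labels $t \in \{0, 1\}$, writing $y = 2t - 1 \in \{-1, +1\}$ and $p = \sigma(f(x))$ for the sigmoid output, the cross-entropy loss $-t \log p - (1 - t) \log(1 - p)$ is algebraically identical to the logistic loss $\log(1 + e^{-y f(x)})$. A quick numerical check (my own sketch, not tied to any framework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
f_x = rng.normal(size=1000)        # raw model scores f(x)
t = rng.integers(0, 2, size=1000)  # labels in {0, 1}
y = 2 * t - 1                      # the same labels mapped to {-1, +1}

p = sigmoid(f_x)
cross_entropy = -t * np.log(p) - (1 - t) * np.log(1 - p)
logistic = np.log(1.0 + np.exp(-y * f_x))

# The two losses agree element-wise (up to floating-point error).
assert np.allclose(cross_entropy, logistic)
```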