Learning parameters when the loss is a piecewise-constant function
I have a network that outputs a single number $T$. I know in advance that the loss is constant on intervals of $T$: when $T \in [a_1, a_2)$, the loss has the value $L_1$; when $T \in [a_2, a_3)$, it has another value $L_2$; and so on. In other words, the loss is a piecewise-constant (step) function of $T$.
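Written out, assuming the breakpoints are ordered as $a_1 < a_2 < \dots$ and each interval is taken as half-open so that neighbouring pieces do not overlap at a breakpoint (both are assumptions made here for definiteness), the loss looks roughly like this:

$$
\mathcal{L}(T) =
\begin{cases}
L_1, & a_1 \leq T < a_2 \\
L_2, & a_2 \leq T < a_3 \\
\;\vdots
\end{cases}
\qquad\Longrightarrow\qquad
\frac{\partial \mathcal{L}}{\partial T} = 0 \quad \text{for all } T \notin \{a_1, a_2, \dots\}
$$

So the gradient with respect to $T$ vanishes wherever it is defined, and backpropagating through this loss directly gives the network weights no signal.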
A concrete, simplified example of this problem is object classification. I have a set of objects and their distances to a category $C$ into which I want to classify them. The distances are $[d_1, d_2, \dots, d_K]$. Assume without loss of generality that $d_1 \leq d_2 \leq \dots \leq d_K$. I want to learn a threshold $T$ such that an object belongs to $C$ if its distance is at most $T$, and does not belong to $C$ otherwise. For example, if $d_3 \leq T < d_4$, then objects $1$, $2$ and $3$ (with distances $d_1, d_2, d_3$) belong to $C$ and the remaining objects do not.
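To make the difficulty concrete, here is a minimal Python sketch of the threshold example. The distances, the ground-truth labels, and the choice of "number of misclassified objects" as the loss are all made up for illustration. The helper name `threshold_loss` is hypothetical as well.

```python
# Hypothetical sorted distances d_1 <= ... <= d_K (illustrative values only).
distances = [0.1, 0.4, 0.5, 0.9, 1.3]
# Hypothetical ground truth: whether each object really belongs to category C.
is_member = [True, True, True, False, False]


def threshold_loss(T):
    """Count objects misclassified by the rule 'distance <= T means member of C'."""
    return sum((d <= T) != m for d, m in zip(distances, is_member))


# Any T lying between the same two consecutive distances gives the same loss.
for T in [0.05, 0.2, 0.45, 0.6, 0.8, 1.0, 1.5]:
    print(f"T={T:.2f}  loss={threshold_loss(T)}")

# A central finite-difference estimate of dL/dT, taken away from the breakpoints.
eps = 1e-4
grad_estimate = (threshold_loss(0.6 + eps) - threshold_loss(0.6 - eps)) / (2 * eps)
print("finite-difference gradient at T=0.6:", grad_estimate)
```

The loss only changes when $T$ crosses one of the $d_i$, so the gradient estimate is zero everywhere else. That is exactly the obstacle to training the network that produces $T$ with plain gradient descent.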
What learning techniques can I use to learn the weights of the network? Any help will be greatly appreciated.
I will later combine this network with other differentiable learning components, so ideally the approach should be compatible with gradient descent.
Topic gradient-descent machine-learning
Category Data Science