Truncating floats/doubles for reproducibility

I deploy machine learning models (typically on GPUs) to a variety of environments. I work at the edge of ML R&D and DevOps, so I care a lot about reproducibility, and one thing that drives me nuts is when models output similar but not byte-for-byte identical values, frustrating any hash-based automated testing. For example, here is a score for the same sample, inference model, code, container image, etc., where one run is on a Titan X and the other is on an RTX 2080.

Titan X  = 0.9887396097183228
RTX 2080 = 0.9887396693229675

That's a relative error of 6.0e-08, or about 60 parts per billion. Obviously this is the "same number", well within the weirdness of IEEE 754 and GPU arithmetic.
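For reference, that figure follows directly from the two scores; a quick sanity check in Python:

```python
# Relative error between the two scores quoted above.
titan_x  = 0.9887396097183228
rtx_2080 = 0.9887396693229675

rel_err = abs(titan_x - rtx_2080) / abs(titan_x)
print(rel_err)  # ~6.0e-08, i.e. roughly 60 parts per billion
```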

I'd like to round the output of my algorithms, as this would simplify automated testing, but the question then becomes: how should I round, and to how much precision? It sounds simple, but take for example this note from the numpy.around documentation:

Results may also be surprising due to the inexact representation of decimal fractions in the IEEE floating point standard and errors introduced when scaling by powers of ten.

Rounding to one part per million (for confidence values that typically lie in 0.0-1.0) seems reasonable, but I'm not sure whether there are any subtle gotchas with this approach.
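To make the idea concrete, here is a minimal sketch of quantising scores to one part per million before hashing, plus the kind of surprise the numpy note warns about. The `quantize` helper and the choice of `decimals=6` are my own, taken from the one-part-per-million figure above, not an established recipe:

```python
import numpy as np

# Hypothetical helper: quantise a score to one part per million (decimals=6
# is an assumption based on the figure discussed above).
def quantize(score, decimals=6):
    return float(np.around(score, decimals))

titan_x  = 0.9887396097183228
rtx_2080 = 0.9887396693229675
print(quantize(titan_x) == quantize(rtx_2080))  # True: both round to 0.98874

# The gotcha from the numpy docs: decimal fractions are inexact in binary,
# so an apparent tie can round the "wrong" way.
print(np.around(2.675, 2))  # 2.67, not 2.68 (2.675 is stored as 2.67499999...)

# A subtler gotcha: two scores that straddle a rounding boundary can still
# round to different values, so a hash-based test can still fail.
print(np.around(0.1234564999, 6), np.around(0.1234565001, 6))  # 0.123456 0.123457
```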

Tags: auc, finite-precision, cross-validation, deep-learning, experiments

Category: Data Science


I'm no expert in this, but as far as I know the proper way to test for equality modulo floating-point imprecision is to compare the difference of the two values, i.e. instead of:

trunc(a) == trunc(b) 

one would do:

abs(a-b) <= epsilon

where epsilon is a constant representing an acceptable difference, e.g. $10^{-6}$. Of course this requires reading both values and computing their difference, instead of simply comparing one against the other.
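As a sketch, this is how that check looks in Python; `math.isclose` and `numpy.isclose` are standard built-ins, and the epsilon above maps onto their absolute-tolerance arguments:

```python
import math
import numpy as np

a = 0.9887396097183228  # Titan X
b = 0.9887396693229675  # RTX 2080

eps = 1e-6  # acceptable absolute difference, per the answer above

# Plain absolute-difference check, exactly as described.
print(abs(a - b) <= eps)                              # True

# Standard-library and NumPy equivalents; both also support relative
# tolerances (rel_tol / rtol), which can be more appropriate when the
# compared values vary widely in magnitude.
print(math.isclose(a, b, rel_tol=0.0, abs_tol=eps))   # True
print(np.isclose(a, b, rtol=0.0, atol=eps))           # True
```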
