Parametrization of a model with data-sheet sensor information or with empirical data
When working with a 3D laser sensor (LiDAR), the volumetric point density versus distance $\rho_r$ can be worked out theoretically from the physical properties of the laser (number of layers, TOF, etc.). Alternatively, an estimate $\hat \rho_r$ of this quantity can be computed from the available training data.
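For concreteness, this is roughly how I would estimate $\hat \rho_r$ from the training data: bin the returns by range and divide each bin's count by the volume of the corresponding spherical shell, scaled by the sensor's field of view. A minimal Python sketch (the bin width and FOV fraction are placeholder values, not from any specific sensor):

```python
import numpy as np

def empirical_density(points, bin_width=1.0, fov_fraction=1.0):
    """Estimate volumetric point density vs. distance from a point cloud.

    points       : (N, 3) array of x, y, z coordinates in the sensor frame.
    bin_width    : width of each range bin in metres (placeholder value).
    fov_fraction : fraction of the full sphere covered by the sensor's
                   field of view (1.0 = omnidirectional).
    """
    r = np.linalg.norm(points, axis=1)                  # range of each return
    edges = np.arange(0.0, r.max() + bin_width, bin_width)
    counts, _ = np.histogram(r, bins=edges)
    # Volume of each spherical shell, scaled to the sensor's FOV.
    shell_vol = fov_fraction * (4.0 / 3.0) * np.pi * (edges[1:]**3 - edges[:-1]**3)
    rho_hat = counts / shell_vol                        # points per cubic metre
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, rho_hat
```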
When this density-versus-distance relationship is to be included in a classification problem (as a parameter in the preprocessing stage), which value should be used? These are my concerns about the two approaches:
- **Theoretical value $\rho_r$:** it gives an upper bound, since it only accounts for the nature of the sensor. Moreover, if only physical properties from the data-sheet are used and no statistical information about measurement uncertainty is considered, this value is deterministic (a sketch of a simple model follows this list).
- **Empirical value $\hat \rho_r$:** it captures both the nature of the sensor and the scenario. A clear disadvantage is that it may work well for one scenario and not for others. If the estimator is consistent and asymptotically unbiased, the more data, the better the parametrization.
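For the theoretical side, the simple model I have in mind assumes one return per angular-range resolution cell, whose volume at range $r$ is approximately $r^2 \, \Delta\theta_h \, \Delta\theta_v \, \Delta r$. A hedged sketch (the resolution values below are illustrative placeholders, to be replaced with the actual data-sheet figures):

```python
import numpy as np

def theoretical_density(r, h_res_deg=0.2, v_res_deg=2.0, range_res=0.03):
    """Upper-bound volumetric density from data-sheet resolutions.

    Assumes one return per angular-range resolution cell of volume
    r^2 * dtheta_h * dtheta_v * dr. The defaults (0.2 deg horizontal,
    2 deg between layers, 3 cm range resolution) are placeholders.
    """
    dth = np.deg2rad(h_res_deg)   # horizontal angular resolution
    dtv = np.deg2rad(v_res_deg)   # vertical spacing between layers
    return 1.0 / (np.maximum(r, 1e-6) ** 2 * dth * dtv * range_res)
```

As expected under this model, $\rho_r$ falls off as $1/r^2$ and never depends on the scene, which is exactly why it behaves as a deterministic upper bound.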
The classification task is approached with a CNN. Should the empirical value be used only for specific types of scenarios/applications? If a general classification model is intended, should the theoretical value be used, or will having enough variability in the training data make the empirical value work better?
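In case it clarifies the setting, this is roughly how the parameter would enter the preprocessing in my pipeline: per-voxel point counts are divided by the expected count at that voxel's range, so the CNN input becomes approximately invariant to distance-dependent sparsity. A hedged sketch (`density_fn` could be either the theoretical $\rho_r$ or an interpolation of the empirical $\hat \rho_r$; the function and its arguments are my own illustration, not an established API):

```python
import numpy as np

def normalize_voxel_counts(voxel_counts, voxel_centers, density_fn, voxel_volume):
    """Normalize raw per-voxel counts by the expected count at that range.

    voxel_counts  : (M,) raw number of points that fell in each voxel.
    voxel_centers : (M, 3) voxel centre coordinates in the sensor frame.
    density_fn    : callable r -> expected points per cubic metre
                    (theoretical rho_r or an interpolator of rho_hat).
    voxel_volume  : volume of one voxel in cubic metres.
    """
    r = np.linalg.norm(voxel_centers, axis=1)
    expected = density_fn(r) * voxel_volume   # expected points per voxel
    return voxel_counts / np.maximum(expected, 1e-9)
```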