Joining technical replicates with experimental data
I need to join data from non-destructive biological sensor analyses with data from various microbiological wet-lab methods (e.g. colony counting). The join key is the observation/sample name, which represents various environmental conditions. The goal is to build machine learning models that predict microbiological status from the sensor output.
However, I am unsure how to handle technical duplicates/repeats, i.e. additional plates from the same biological sample, re-runs/re-evaluations of samples with the sensor devices, etc. What seemed like a basic problem now has me doubting the best course of action.
It is my understanding that joining a single observation in one table (e.g. A1) against multiple replicates in the other table (e.g. A1_1, A1_2, A1_3), in either direction, as a one-to-many relationship is not best practice for statistical or machine learning model development. It will influence the resulting model in some way, because the technical replicates did not arise independently and have no corresponding entry in the opposing table. (Is this data leakage/pseudoreplication?)
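To make the concern concrete, here is a minimal pandas sketch of such a one-to-many join. The table contents, the column names (`sample`, `signal`, `log_cfu`, `replicate_id`) and the assumption that replicate IDs end in `_<number>` are all invented for illustration and are not my real data.

```python
import pandas as pd

# Hypothetical sensor table: one reading per biological sample.
sensor = pd.DataFrame({"sample": ["A1", "B1"], "signal": [0.82, 0.40]})

# Hypothetical plate-count table: several technical replicates per sample.
plates = pd.DataFrame({
    "replicate_id": ["A1_1", "A1_2", "A1_3", "B1_1"],
    "log_cfu": [5.0, 5.2, 5.4, 3.1],
})

# Assumes the replicate suffix follows the last underscore ("A1_2" -> "A1").
plates["sample"] = plates["replicate_id"].str.rsplit("_", n=1).str[0]

# One-to-many join: each sensor row is repeated once per matching plate.
expanded = sensor.merge(plates, on="sample", validate="one_to_many")

# Number of joined rows contributed by each biological sample.
rows_per_sample = expanded.groupby("sample").size()
print(expanded)
print(rows_per_sample)
```

In this toy case the single A1 sensor reading appears three times in the joined table. If those rows were then split at random into training and test sets, identical sensor values from the same sample could end up on both sides of the split, which is the leakage I am worried about.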
My intuition is that the variability between these technical replicates isn't significant for my immediate use case. Would it be appropriate to average the replicates into a single tabular observation/row to allow a 1-to-1 join?
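A minimal sketch of the averaging approach I have in mind, assuming pandas and assuming replicate IDs of the form `<sample>_<number>`. The data, the column names and the helper `collapse_replicates` are hypothetical. The plain arithmetic mean is also an assumption; for colony counts it may make more sense to average on a log scale.

```python
import pandas as pd

# Hypothetical plate counts: technical replicates per biological sample.
plates = pd.DataFrame({
    "replicate_id": ["A1_1", "A1_2", "A1_3", "B1_1", "B1_2"],
    "log_cfu": [5.0, 5.2, 5.4, 3.0, 3.2],
})

# Hypothetical sensor readings, including one re-run of sample A1.
sensor = pd.DataFrame({
    "replicate_id": ["A1_1", "A1_2", "B1_1"],
    "signal": [0.80, 0.84, 0.40],
})


def collapse_replicates(df, value_col):
    """Average technical replicates so each biological sample has one row."""
    out = df.copy()
    # Assumes the replicate suffix follows the last underscore ("A1_2" -> "A1").
    out["sample"] = out["replicate_id"].str.rsplit("_", n=1).str[0]
    return out.groupby("sample", as_index=False)[value_col].mean()


plates_mean = collapse_replicates(plates, "log_cfu")
sensor_mean = collapse_replicates(sensor, "signal")

# validate="one_to_one" raises an error if either side still has duplicate keys.
joined = plates_mean.merge(
    sensor_mean, on="sample", how="inner", validate="one_to_one"
)
print(joined)
```

The `validate="one_to_one"` argument acts as a guard: the merge fails loudly if any sample still has more than one row on either side, rather than silently duplicating observations.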
Topic data-leakage machine-learning-model data-wrangling machine-learning
Category Data Science