Correlation/distance between sparse vectors
I am looking for a metric for comparing gene count tables. These are long columns of data (a few million genes by a few dozen samples) with all non-negative entries, about 90% of which are zeros. The goal is to compare the performance of the several tools/algorithms that produce these tables, by comparing the resulting tables with each other or with the expected counts (in the case of simulated data). In principle, one compares on a sample-by-sample basis, but comparing different samples might also be of interest, e.g., to filter out spurious correlations.
What I am using now is Spearman's rank correlation coefficient, accounting for the fact that some entries have identical ranks (certainly the zeros). I am looking for an approach better adapted to this setting (and preferably robust to outliers) and would appreciate suggestions.
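To make the baseline concrete, here is a minimal sketch of a tie-aware Spearman coefficient: tied values (such as the many zeros) receive the average of the ranks they span, and the coefficient is the Pearson correlation of those ranks. This is an illustration under assumptions, not necessarily the exact computation used above. The function names (`average_ranks`, `pearson`, `spearman_with_ties`) are hypothetical, and the two toy vectors are made up to mimic a sparse column. In practice, `scipy.stats.spearmanr` applies the same average-rank treatment of ties.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied entries get the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman_with_ties(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up toy columns: mostly zeros, as in a sparse gene count table.
expected = [0, 0, 0, 0, 0, 0, 0, 12, 3, 40]
observed = [0, 0, 0, 0, 0, 0, 1, 10, 0, 55]
rho = spearman_with_ties(expected, observed)
```

With about 90% zeros, most entries share a single tied rank, so the coefficient is driven largely by the non-zero entries and by disagreements about which entries are zero. That is part of why a measure better adapted to sparse data may be preferable.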
Topic sparse spearmans-rank-correlation distance correlation
Category Data Science