Dimensionality reduction of vectors with null values

I have vectors of the same length, where each entry is 0, 1, or null.

V = {[0,1,1,1,null,0], [null,1,0,null,0,1], ...}

How can I reduce these vectors to a lower-dimensional space (in this case 2D)?

Topic vector-space-models missing-data dimensionality-reduction

Category Data Science


You have several options:

  • Drop rows that have null values.

  • Impute the null values.

  • Pick a dimensionality reduction algorithm that can handle null values. One example is the NIPALS (Nonlinear Iterative Partial Least Squares) algorithm, which fits each component using only the observed entries. It is discussed in "Multivariate Analysis of Quality: An Introduction" by Martens and Martens.
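To make the third option concrete, here is a minimal sketch of a NIPALS-style PCA that skips missing entries. It is an illustration under assumptions, not a reference implementation: the function name nipals_pca, its parameters, and every row of V after the first two are made up for this example, nulls are represented as NaN, and each column is assumed to have at least one observed value.

```python
import numpy as np

def nipals_pca(X, n_components=2, max_iter=500, tol=1e-8):
    """Sketch of PCA via NIPALS; NaN entries are skipped in every regression."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                      # True where a value is present
    M = observed.astype(float)
    # Center each column using only its observed values.
    R = np.where(observed, X - np.nanmean(X, axis=0), 0.0)
    n, p = X.shape
    scores = np.zeros((n, n_components))
    loadings = np.zeros((p, n_components))
    for k in range(n_components):
        # Start from the residual column with the largest sum of squares.
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        for _ in range(max_iter):
            # Loadings: regress each column on t over its observed rows.
            denom = M.T @ (t ** 2)
            p_vec = (R.T @ t) / np.where(denom > 0, denom, 1.0)
            norm = np.linalg.norm(p_vec)
            if norm == 0:
                break
            p_vec = p_vec / norm
            # Scores: regress each row on p_vec over its observed columns.
            denom = M @ (p_vec ** 2)
            t_new = (R @ p_vec) / np.where(denom > 0, denom, 1.0)
            converged = np.linalg.norm(t_new - t) < tol * max(np.linalg.norm(t_new), 1.0)
            t = t_new
            if converged:
                break
        scores[:, k] = t
        loadings[:, k] = p_vec
        # Deflate: remove this component from the observed entries only.
        R = np.where(observed, R - np.outer(t, p_vec), 0.0)
    return scores, loadings

nan = np.nan
V = np.array([
    [0,   1,   1,   1,   nan, 0],   # from the question
    [nan, 1,   0,   nan, 0,   1],   # from the question
    [1,   0,   nan, 1,   1,   0],   # hypothetical rows below
    [0,   nan, 1,   0,   1,   1],
    [1,   1,   0,   0,   nan, 1],
    [0,   0,   1,   nan, 1,   0],
])

scores, loadings = nipals_pca(V, n_components=2)
print(scores)   # one 2D point per input vector
```

Each row of scores is the 2D representation of the corresponding vector. With many nulls the components are no longer guaranteed to be orthogonal, so it is worth comparing the result against a simple imputation baseline.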


This is a data wrangling problem: you will need to experiment, and it helps a great deal to know what the nulls in your data actually mean.

  • If you suspect that null means 0 and the user simply omitted it, replace it with zero.
  • If you can work with negative numbers, replace the nulls with -1, as Nicolas mentioned, unless -1 is a value your data can take naturally.
  • If the nulls carry meaning for your dataset, create an indicator column B that is 1 wherever column A is null and 0 otherwise.
  • If your values are categorical, you can also one-hot encode these columns, treating null as its own category.
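The encodings above can be sketched with NumPy as follows. This is only an illustration: nulls are represented as NaN, the helper names one_hot_with_null and pca_2d are hypothetical, the third row of V is made up, and plain SVD-based PCA is just one possible reduction step to apply once the nulls are gone.

```python
import numpy as np

nan = np.nan
V = np.array([
    [0,   1, 1, 1,   nan, 0],   # from the question
    [nan, 1, 0, nan, 0,   1],   # from the question
    [1,   0, 1, 0,   1,   nan], # hypothetical extra row
])

# Option 1: treat null as 0.
filled_zero = np.where(np.isnan(V), 0.0, V)

# Option 2: give null its own numeric value, -1.
filled_minus_one = np.where(np.isnan(V), -1.0, V)

# Option 3: keep a zero-filled copy plus one "was null" indicator per column.
with_flags = np.hstack([filled_zero, np.isnan(V).astype(float)])

def one_hot_with_null(X):
    """Option 4: three indicators per column (is 0, is 1, is null)."""
    X = np.asarray(X, dtype=float)
    return np.concatenate([X == 0, X == 1, np.isnan(X)], axis=1).astype(float)

def pca_2d(X, n_components=2):
    """Project complete (null-free) data onto its top principal components."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

encoded = one_hot_with_null(V)
embedding = pca_2d(encoded)
print(embedding.shape)
```

Which encoding is appropriate depends on what null means in your data, so it is reasonable to try several and compare the resulting 2D plots.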
