Error on multitask neural nets where not all outputs are observed for every example

Let's say I have 2 datasets, each from a set of experiments. Dataset A measures a set of properties X for set S, while dataset B measures properties Y for set T. X and Y are highly correlated, and S and T overlap, though not perfectly. To give an example, I might have

A   |   ID   |   pKa    |  log P 
    |  cid12 |   3.51   |   1.2
    |  cid51 |   2.32   |   .9

B   |   ID   | Pol Srf A|  cLogP 
    |  cid12 |   48.5   |   1.5
    |  cid88 |   61.2   |   .9

I would like to use the shared information here (since all of these are physical properties) in a multitask neural net setup. For examples in both S and T, I have a target vector as long as X union Y. But for members exclusive to S or T, I am missing values.

Merging the examples, I have

 C   |   ID   |   pKa    |  log P  | Pol Srf A|  cLogP 
     |  cid12 |   3.51   |   1.2   |   48.5   |   1.5
     |  cid51 |   2.32   |   .9    |    na    |   na
     |  cid88 |    na    |   na    |   61.2   |   .9

When I train, do I simply zero-fill these outputs and then not backpropagate the error coming from those nodes? So for cid12, error from all output nodes backpropagates, but for cid51, only the error from pKa and log P?
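Concretely, I imagine building the zero-filled targets together with an observation mask along these lines (a minimal pandas sketch of table C above; the zero fill is only a placeholder, and the column names are shortened):

import numpy as np
import pandas as pd

# Merged table C; NaN marks targets that were never measured.
C = pd.DataFrame(
    {"pKa":     [3.51,   2.32,   np.nan],
     "logP":    [1.2,    0.9,    np.nan],
     "PolSrfA": [48.5,   np.nan, 61.2],
     "cLogP":   [1.5,    np.nan, 0.9]},
    index=["cid12", "cid51", "cid88"])

mask = C.notnull().values.astype("float32")       # 1 = observed, 0 = missing
targets = C.fillna(0.0).values.astype("float32")  # zero fill is a placeholder only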

If this is the case, is this a standard procedure implemented in most NN libraries (Theano, for example), and what is it called?



You should not impute the missing values as zero. A missing value (the absence of a measurement) is different from a measured value of zero: zero has a numeric meaning, and if you impute missing values as zero, the model will learn to predict that pattern.

Depending on the specific goal, there are two options:

  1. Drop rows without complete data.
  2. Update weights with the partial data.

To update with partial data, find the indices that have data, then calculate the loss only on those values. Most deep learning frameworks have a gather function to retrieve the elements at a set of indices in a tensor.
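For instance, a minimal sketch in plain Theano, assuming a 0/1 mask marks which targets were observed (all names here are illustrative):

import theano
import theano.tensor as T

y_true = T.matrix("y_true")  # targets, with missing entries filled arbitrarily
y_pred = T.matrix("y_pred")  # network outputs
mask = T.matrix("mask")      # 1.0 where a target was observed, 0.0 otherwise

# gather only the observed entries and compute the loss on them alone
obs_idx = mask.nonzero()
loss = T.sqr(y_pred[obs_idx] - y_true[obs_idx]).mean()

loss_fn = theano.function([y_true, y_pred, mask], loss)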


You have already found the answer by yourself: if there are missing values, you should not backpropagate on them. I do not know of any particular name for this technique.

I do not think there are ready-made functions for this, but it is quite straightforward to wrap an existing loss function (e.g. mean squared error) in a custom one.

For example, let's say that you filled all missing values with $-1$:

(Disclaimer: I did not test this code, but took inspiration from this answer)

from keras import backend as K

def custom_objective(y_true, y_pred):
    # keep only the targets that were actually observed
    # (the missing ones were filled with -1)
    # (.nonzero() works with the Theano backend, where K tensors are Theano tensors)
    indices = K.not_equal(y_true, -1).nonzero()
    y_pred2 = y_pred[indices]
    y_true2 = y_true[indices]
    # then you could use any already implemented loss function, e.g. MSE
    mse = K.mean(K.square(y_pred2 - y_true2))
    return mse

Note that if you code the missing values as 0 instead, you could get the observed indices directly with flatnonzero() (provided no real target is exactly zero).
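The custom loss is then passed to Keras like any built-in one. A usage sketch (the architecture, input size, and output count are made up for illustration):

from keras.models import Sequential
from keras.layers import Dense

# hypothetical toy network: 100 input features, 4 outputs
# (pKa, log P, Pol Srf A, cLogP)
model = Sequential([Dense(32, activation="relu", input_dim=100),
                    Dense(4)])
model.compile(optimizer="adam", loss=custom_objective)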
