Error in multitask neural nets when not all outputs are observed for every example
Let's say I have two datasets, each from a set of experiments. Dataset A measures a set of properties X for a set of compounds S, while dataset B measures properties Y for a set T. X and Y are highly correlated, and S and T overlap partially (but not perfectly). For example, I might have
Dataset A:

| ID    | pKa  | log P |
|-------|------|-------|
| cid12 | 3.51 | 1.2   |
| cid51 | 2.32 | 0.9   |

Dataset B:

| ID    | Pol Srf A | cLogP |
|-------|-----------|-------|
| cid12 | 48.5      | 1.5   |
| cid88 | 61.2      | 0.9   |
I would like to exploit the shared information here (since all of these are physical properties) in a multitask neural net setup. For examples in both S and T, I have a full target vector of length |X ∪ Y|; for members exclusive to S or T, some target values are missing.
Merged into one table, the examples look like this:
| ID    | pKa  | log P | Pol Srf A | cLogP |
|-------|------|-------|-----------|-------|
| cid12 | 3.51 | 1.2   | 48.5      | 1.5   |
| cid51 | 2.32 | 0.9   | na        | na    |
| cid88 | na   | na    | 61.2      | 0.9   |
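In NumPy/pandas terms, here is a sketch of what I mean (the toy frames just reproduce the tables above; column names are mine): an outer join builds table C, and a 0/1 array records which entries are real.

```python
import numpy as np
import pandas as pd

# Toy versions of tables A and B above
A = pd.DataFrame({'ID': ['cid12', 'cid51'],
                  'pKa': [3.51, 2.32],
                  'logP': [1.2, 0.9]})
B = pd.DataFrame({'ID': ['cid12', 'cid88'],
                  'PolSrfA': [48.5, 61.2],
                  'cLogP': [1.5, 0.9]})

# Outer join produces table C: NaN wherever a compound lacks a measurement
C = A.merge(B, on='ID', how='outer')

targets = C[['pKa', 'logP', 'PolSrfA', 'cLogP']].values  # shape (3, 4), NaN = missing
mask = (~np.isnan(targets)).astype('float32')            # 1 = observed, 0 = missing
targets = np.nan_to_num(targets).astype('float32')       # zero-fill the NaNs
```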
When I train, do I simply zero-fill these missing outputs and then not backpropagate the error coming from those nodes? So for cid12, error from all four output nodes backpropagates, but for cid51, only the error from the pKa and log P nodes?
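To make that concrete, here is a minimal Theano sketch of what I have in mind (layer sizes and variable names are made up; the point is the mask `M`, which zeroes the squared error at unobserved outputs so `T.grad` sends no error back from them):

```python
import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX
rng = np.random.RandomState(0)

n_in, n_hidden, n_out = 10, 32, 4   # 4 outputs: pKa, log P, Pol Srf A, cLogP

# One shared hidden layer feeding all output tasks (sizes are illustrative)
W1 = theano.shared(rng.normal(scale=0.1, size=(n_in, n_hidden)).astype(floatX))
b1 = theano.shared(np.zeros(n_hidden, dtype=floatX))
W2 = theano.shared(rng.normal(scale=0.1, size=(n_hidden, n_out)).astype(floatX))
b2 = theano.shared(np.zeros(n_out, dtype=floatX))
params = [W1, b1, W2, b2]

X = T.matrix('X')   # input features
Y = T.matrix('Y')   # targets, with missing entries zero-filled
M = T.matrix('M')   # mask: 1 where the target was observed, 0 where it was 'na'

hidden = T.tanh(T.dot(X, W1) + b1)
pred = T.dot(hidden, W2) + b2

# Squared error only over observed targets; masked terms are identically zero,
# so no gradient flows back from the missing output nodes.
# (Assumes each batch contains at least one observed target.)
loss = (((pred - Y) ** 2) * M).sum() / M.sum()

grads = T.grad(loss, params)
lr = 0.01
updates = [(p, p - lr * g) for p, g in zip(params, grads)]
train = theano.function([X, Y, M], loss, updates=updates)
```

A training step would then be `train(X_np, targets, mask)` with `targets` and `mask` built as in the pandas sketch above.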
If this is the right approach, is it a standard procedure implemented in most NN libraries (Theano, for example), and what is it called?
Topic theano backpropagation multitask-learning machine-learning
Category Data Science