Getting mean and covariance matrix for multivariate normal from keras model

I have a dataset that has 6 input features and 5 output features. I want to use a Keras sequential model to estimate the mean vector and covariance matrix from any row of input features, assuming the output features follow a multivariate normal distribution.

That is, for any row of 6 input features in my dataset, I want to get a mean vector of 5 values and a 5x5 covariance matrix.

import pandas as pd

sample=pd.DataFrame({'X1':[1,2,3,4,5,6],
              'X2':[1,3,1,5,2,7],
              'X3':[3,0,0,7,5,0],
              'X4':[0,4,3,2,5,8],
              'X5':[9,7,0,2,4,5],
              'X6':[1,1,8,7,0,0],
              'Y1':[0.5,1.2,6.3,4.5,1.5,6.6],
              'Y2':[6.1,4.3,2.1,1.5,4.2,8.7],
              'Y3':[0,0,3.2,3.7,5.5,0.2],
              'Y4':[0.5,1.4,8.3,5.2,1.5,1.8],
              'Y5':[2.9,1.7,6.3,5.2,9.4,1.5]})
sample
    X1  X2  X3  X4  X5  X6  Y1  Y2  Y3  Y4  Y5
0   1   1   3   0   9   1   0.5 6.1 0.0 0.5 2.9
1   2   3   0   4   7   1   1.2 4.3 0.0 1.4 1.7
2   3   1   0   3   0   8   6.3 2.1 3.2 8.3 6.3
3   4   5   7   2   2   7   4.5 1.5 3.7 5.2 5.2
4   5   2   5   5   4   0   1.5 4.2 5.5 1.5 9.4
5   6   7   0   8   5   0   6.6 8.7 0.2 1.8 1.5

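For the loss function I am using the following, which maximizes the log probability by minimizing the mean negative log-likelihood.

import tensorflow as tf
import tensorflow_probability as tfp

def lossF(y_true, mu, cov):
    # scale_tril expects the lower-triangular Cholesky factor of cov
    dist = tfp.distributions.MultivariateNormalTriL(loc=mu, scale_tril=tf.linalg.cholesky(cov))
    return tf.reduce_mean(-dist.log_prob(y_true))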

I am trying something like the code below, but I get confused in the middle.

#X_train has 6 values in each row
#y_train has 5 values in each row
#y_pred should be either a distribution function or mu and cov for each row

opt = Adam(learning_rate=0.001)
inputs = Input(shape=(6,))
layer1 = Dense(24, activation='relu')(inputs)
layer2 = Dense(12, activation='relu')(layer1)
predictions = ???
model = Model(inputs=???, outputs=???)
model.compile(optimizer=opt, loss=loss_fn)
model.fit(X_train, y_train, epochs=100, batch_size=100)
y_pred=model.predict(X_test)

Note: instead of getting mu and cov separately, if it's possible to get a distribution function as output, that would be helpful too.

Topic multivariate-distribution keras tensorflow python

Category Data Science


Because the covariance matrix has to be positive definite, parametrizing it through its Cholesky decomposition is a good way to solve this problem. The network outputs the mean vector mu and the entries of the lower-triangular Cholesky factor (denoted T here), which is what MultivariateNormalTriL expects as scale_tril. The diagonal of T must be strictly positive, which the exponential activation guarantees; the covariance matrix is then T times its transpose, so its diagonal holds the variances:

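from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

p = y_train.shape[1]  # dimension of the covariance matrix
inputs = Input(shape=(6,))
layer1 = Dense(24, activation='relu')(inputs)
layer2 = Dense(12, activation='relu')(layer1)
mu = Dense(p, activation="linear")(layer2)
T1 = Dense(p, activation="exponential")(layer2)  # diagonal of T, kept positive
T2 = Dense(p * (p - 1) // 2, activation="linear")(layer2)  # off-diagonal entries of T (unit count must be an int)
outputs = Concatenate()([mu, T1, T2])
model = Model(inputs=inputs, outputs=outputs)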

Now let's define the loss function. First, we need a function that extracts mu and T from the network output:

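def mu_sigma(output):
    # Split every row of the batch into mean, diagonal and off-diagonal parts
    mu = output[:, 0:p]
    T1 = output[:, p:2*p]
    T2 = output[:, 2*p:]
    ones = tf.ones((p, p), dtype=tf.float32)
    mask_a = tf.linalg.band_part(ones, -1, 0)  # lower triangle incl. diagonal
    mask_b = tf.linalg.band_part(ones, 0, 0)   # diagonal only
    mask = tf.subtract(mask_a, mask_b)         # strictly lower triangle
    zero = tf.constant(0, dtype=tf.float32)
    indices = tf.where(tf.not_equal(mask, zero))  # (row, col) pairs in row-major order
    # One-hot matrix that places each T2 entry at its flattened (row, col) slot
    scatter = tf.one_hot(indices[:, 0] * p + indices[:, 1], p * p, dtype=tf.float32)
    T2 = tf.reshape(tf.matmul(T2, scatter), (-1, p, p))
    T1 = tf.linalg.diag(T1)
    sigma = T1 + T2  # lower-triangular Cholesky factor, one per row of the batch
    return mu, sigma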
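
Note that the matrix returned as sigma is the Cholesky factor passed to scale_tril, not the covariance matrix itself. As a minimal NumPy sketch of how the 5x5 covariance asked for in the question could be recovered from one predicted row, assuming the row is laid out as [mu, diagonal, strictly lower entries in row-major order] (the helper name unpack_row and the sample values are illustrative assumptions, shown with p = 3 for brevity):

```python
import numpy as np

def unpack_row(row, p):
    """Split one output row into the mean and the lower-triangular factor L."""
    row = np.asarray(row, dtype=float)
    mu = row[:p]
    diag = row[p:2 * p]                  # assumed already positive (e.g. after exp)
    off = row[2 * p:]                    # p*(p-1)/2 strictly lower entries
    L = np.diag(diag)
    L[np.tril_indices(p, k=-1)] = off    # fill below the diagonal, row by row
    return mu, L

p = 3
row = np.array([0.1, 0.2, 0.3,           # mu
                1.0, 2.0, 0.5,           # diagonal of L
                0.4, -0.3, 0.7])         # L[1,0], L[2,0], L[2,1]
mu, L = unpack_row(row, p)
cov = L @ L.T                            # covariance implied by the factor
```

Because L has a strictly positive diagonal, cov is symmetric positive definite by construction, which is why the network never has to output a valid covariance matrix directly.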

Now for the loss function:

from tensorflow_probability import distributions as tfd

def gnll_loss(y, pred):
    mu, sigma = mu_sigma(pred)  # sigma is used as the Cholesky factor (scale_tril)
    gm = tfd.MultivariateNormalTriL(loc=mu, scale_tril=sigma)
    log_likelihood = gm.log_prob(y)
    return -tf.math.reduce_sum(log_likelihood)  # negative log-likelihood summed over the batch
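
To make clear what this loss computes, here is a small NumPy sketch of the negative log-density of a multivariate normal written directly in terms of the Cholesky factor L, cross-checked against the textbook formula that uses the full covariance. The function name mvn_nll and the numbers are illustrative assumptions, not part of TensorFlow Probability:

```python
import numpy as np

def mvn_nll(y, mu, L):
    """Negative log-density of N(mu, L @ L.T) at y, using only the factor L."""
    p = len(mu)
    z = np.linalg.solve(L, y - mu)                 # z = L^{-1} (y - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log|Sigma| = 2 * sum(log diag(L))
    return 0.5 * (z @ z + log_det + p * np.log(2.0 * np.pi))

mu = np.array([0.0, 1.0])
L = np.array([[1.0, 0.0],
              [0.5, 2.0]])
y = np.array([0.3, -0.2])

nll = mvn_nll(y, mu, L)

# Cross-check against the formula written with the full covariance matrix.
cov = L @ L.T
d = y - mu
nll_direct = 0.5 * (d @ np.linalg.inv(cov) @ d
                    + np.log(np.linalg.det(cov))
                    + len(mu) * np.log(2.0 * np.pi))
```

The log-determinant term only needs the logarithms of the diagonal of L, which is one reason a positive diagonal (here enforced with an exponential activation) keeps the loss well defined.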
