1D CNN Variational Autoencoder Conv1D Size

I am trying to create a 1D variational autoencoder to take in a 931x1 vector as input, but I have been having trouble with two things:

  1. Getting an output size of 931, since max pooling and upsampling produce even sizes
  2. Getting the layer sizes right

This is what I have so far. I added one zero of padding on each side of my input array before training (which is why you'll see h+2 for the input: 931 + 2 = 933), and then cropped the output to match the 933 size. Using a 931 input gives a 928 output, and I'm not sure of the best way to get back to 931 from there without cropping.
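To see why an odd length like 931 cannot survive the pool/upsample stages unchanged, the length arithmetic can be traced by hand. This is a pure-Python sketch that mirrors the Keras output-length rules for pooling (default stride equals the pool size, flooring) and `UpSampling1D` (integer doubling); the layer counts match the model below.

```python
# Pure-arithmetic trace of the sequence length through the model.
# Pooling with pool_size=2 floors the length; UpSampling1D(2) doubles it,
# so an odd length like 931 can never round-trip exactly.

def pooled(n, size=2):
    return n // size        # MaxPooling1D / AveragePooling1D output length

def upsampled(n, size=2):
    return n * size         # UpSampling1D output length

n = 931
for _ in range(3):          # two max pools + one average pool in the encoder
    n = pooled(n)
assert n == 116

for _ in range(3):          # three UpSampling1D(2) stages in the decoder
    n = upsampled(n)
assert n == 928             # 3 samples short of 931

# One option: pad to the next multiple of 8 so the length round-trips exactly,
# then crop the surplus at the very end.
m = 936                     # nearest multiple of 8 above 931
for _ in range(3):
    m = pooled(m)
for _ in range(3):
    m = upsampled(m)
assert m == 936             # crop 5 samples to recover 931
```

Since there are three halving stages, any input length that is a multiple of 2³ = 8 comes back unchanged, which is why some amount of padding (or cropping) is unavoidable for 931.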


input_sig = Input(batch_shape=(w, h + 2, 1))
x = Conv1D(8, 3, activation='relu', padding='same', dilation_rate=2)(input_sig)
# x = ZeroPadding1D((2,1))(x)
x1 = MaxPooling1D(2)(x)
x2 = Conv1D(4, 3, activation='relu', padding='same', dilation_rate=2)(x1)
x3 = MaxPooling1D(2)(x2)
x4 = AveragePooling1D()(x3)
flat = Flatten()(x4)
x = Dense(2)(flat)
z_mean = Dense(latent_dim, name='z_mean')(x)
z_log_var = Dense(latent_dim, name='z_log_var')(x)
z = Sampling()([z_mean, z_log_var])
encoder = Model(input_sig, [z_mean, z_log_var, z], name='encoder')
encoder.summary()

latent_inputs = keras.Input(shape=(latent_dim,))
# d1 = Dense(464)(latent_inputs)
d1 = Dense(468)(latent_inputs)
# d2 = Reshape((117,4))(d1)
d2 = Reshape((117,4))(d1)
d3 = Conv1D(4,1,strides=1, activation='relu', padding='same')(d2)
d4 = UpSampling1D(2)(d3)
d5 = Conv1D(8,1,strides=1, activation='relu', padding='same')(d4)
d6 = UpSampling1D(2)(d5)
d7 = UpSampling1D(2)(d6)
d8 = Conv1D(1,1, strides=1, activation='sigmoid', padding='same')(d7)
decoded = Cropping1D(cropping=(1,2))(d8) # this is the added step

decoder = Model(latent_inputs, decoded, name='decoder')
decoder.summary()
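For completeness, the `Sampling` layer used in the encoder is not shown in the snippet above; in the Keras VAE example this model was adapted from, it is the reparameterization-trick layer, roughly:

```python
import tensorflow as tf
from tensorflow import keras

class Sampling(keras.layers.Layer):
    """Draw z from N(z_mean, exp(z_log_var)) via the reparameterization trick."""

    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon
```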

This is the summary printed:

Model: encoder
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_99 (InputLayer)           [(1, 933, 1)]        0                                            
__________________________________________________________________________________________________
conv1d_209 (Conv1D)             (1, 933, 8)          32          input_99[0][0]                   
__________________________________________________________________________________________________
max_pooling1d_90 (MaxPooling1D) (1, 466, 8)          0           conv1d_209[0][0]                 
__________________________________________________________________________________________________
conv1d_210 (Conv1D)             (1, 466, 4)          100         max_pooling1d_90[0][0]           
__________________________________________________________________________________________________
max_pooling1d_91 (MaxPooling1D) (1, 233, 4)          0           conv1d_210[0][0]                 
__________________________________________________________________________________________________
average_pooling1d_45 (AveragePo (1, 116, 4)          0           max_pooling1d_91[0][0]           
__________________________________________________________________________________________________
flatten_45 (Flatten)            (1, 464)             0           average_pooling1d_45[0][0]       
__________________________________________________________________________________________________
dense_89 (Dense)                (1, 2)               930         flatten_45[0][0]                 
__________________________________________________________________________________________________
z_mean (Dense)                  (1, 2)               6           dense_89[0][0]                   
__________________________________________________________________________________________________
z_log_var (Dense)               (1, 2)               6           dense_89[0][0]                   
__________________________________________________________________________________________________
sampling_45 (Sampling)          (1, 2)               0           z_mean[0][0]                     
                                                                 z_log_var[0][0]                  
==================================================================================================
Total params: 1,074
Trainable params: 1,074
Non-trainable params: 0
__________________________________________________________________________________________________
Model: decoder
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_100 (InputLayer)       [(None, 2)]               0         
_________________________________________________________________
dense_90 (Dense)             (None, 468)               1404      
_________________________________________________________________
reshape_44 (Reshape)         (None, 117, 4)            0         
_________________________________________________________________
conv1d_211 (Conv1D)          (None, 117, 4)            20        
_________________________________________________________________
up_sampling1d_117 (UpSamplin (None, 234, 4)            0         
_________________________________________________________________
conv1d_212 (Conv1D)          (None, 234, 8)            40        
_________________________________________________________________
up_sampling1d_118 (UpSamplin (None, 468, 8)            0         
_________________________________________________________________
up_sampling1d_119 (UpSamplin (None, 936, 8)            0         
_________________________________________________________________
conv1d_213 (Conv1D)          (None, 936, 1)            9         
_________________________________________________________________
cropping1d_18 (Cropping1D)   (None, 933, 1)            0         
=================================================================
Total params: 1,473
Trainable params: 1,473
Non-trainable params: 0
_________________________________________________________________

However when I try to fit my model I get the following exception:

ValueError: Invalid reduction dimension 2 for input with 2 dimensions. for '{{node Sum}} = Sum[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=false](Mean, Sum/reduction_indices)' with input shapes: [1,933], [2] and with computed input tensors: input[1] = 1 2.

Has anyone run into this error, or can you see what I am doing wrong in my model construction? I am new at this and not sure where the mistake is.

Note that I modified this from a working 28x28 MNIST VAE in the Keras documentation.

Thanks in advance

Topic vae keras convolutional-neural-network tensorflow autoencoder

Category Data Science


I think the input dimension of your autoencoder and its output dimension are different: the input is (1, 933, 1) while the output is (933, 1). These should actually be the same.
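The traceback also points at the reconstruction loss. Assuming the training loop was copied from the 2D MNIST example, its binary cross-entropy is summed over `axis=(1, 2)` (image height and width). For a 1D signal, `binary_crossentropy` already collapses the channel axis, leaving only a (batch, length) tensor, so axis 2 does not exist, which matches the `[1,933]` shape in the error message. A minimal sketch of the 1D version, with dummy tensors standing in for the real data and model output:

```python
import tensorflow as tf
from tensorflow import keras

# Dummy stand-ins for a batch of signals and their reconstructions.
data = tf.random.uniform((1, 933, 1))
reconstruction = tf.random.uniform((1, 933, 1))

# binary_crossentropy reduces the last (channel) axis: result is (batch, length).
per_sample = keras.losses.binary_crossentropy(data, reconstruction)

# Sum over the length axis only; axis=(1, 2), as in the 2D MNIST example,
# would raise "Invalid reduction dimension 2 for input with 2 dimensions".
loss = tf.reduce_mean(tf.reduce_sum(per_sample, axis=1))
```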