1D CNN Variational Autoencoder Conv1D Size
I am trying to create a 1D variational autoencoder to take in a 931x1 vector as input, but I have been having trouble with two things:
- Getting an output size of 931, since max pooling and upsampling give even sizes
- Getting the layer sizes right
I added one zero of padding on each side of my input array before training (this is why you'll see h+2 for the input: 931 + 2 = 933), and then cropped the decoder output so it is also 933 long. Using the raw 931-length input gives a 928-length output, and I am not sure of the best way to get back to 931 from there without cropping.
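As far as I can tell, the mismatch is just integer division: each pooling layer floors an odd length and each UpSampling1D(2) doubles it, so three rounds of each turn 931 into 928. A quick length-bookkeeping check:

n = 931
for _ in range(3):   # three pooling stages (two MaxPooling1D, one AveragePooling1D)
    n //= 2          # 931 -> 465 -> 232 -> 116; odd lengths lose a sample
for _ in range(3):   # three UpSampling1D(2) stages
    n *= 2           # 116 -> 232 -> 464 -> 928
print(n)             # 928, not 931

With that padding/cropping workaround in place, this is what I have so far: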
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import (Conv1D, MaxPooling1D, AveragePooling1D,
                                     Flatten, Dense, Reshape, UpSampling1D,
                                     Cropping1D)

w, h = 1, 931          # batch size and raw signal length
latent_dim = 2

# Encoder: length 933 -> 466 -> 233 -> 116
input_sig = Input(batch_shape=(w, h + 2, 1))   # h + 2 = 933 after zero padding
x = Conv1D(8, 3, activation='relu', padding='same', dilation_rate=2)(input_sig)
# x = ZeroPadding1D((2, 1))(x)
x1 = MaxPooling1D(2)(x)
x2 = Conv1D(4, 3, activation='relu', padding='same', dilation_rate=2)(x1)
x3 = MaxPooling1D(2)(x2)
x4 = AveragePooling1D()(x3)
flat = Flatten()(x4)                           # 116 * 4 = 464 units
bottleneck = Dense(2)(flat)
z_mean = Dense(latent_dim, name='z_mean')(bottleneck)
z_log_var = Dense(latent_dim, name='z_log_var')(bottleneck)
z = Sampling()([z_mean, z_log_var])
encoder = Model(input_sig, [z_mean, z_log_var, z], name='encoder')
encoder.summary()

# Decoder: length 117 -> 234 -> 468 -> 936, then crop back to 933
latent_inputs = keras.Input(shape=(latent_dim,))
# d1 = Dense(464)(latent_inputs)               # earlier attempt: 464 = 116 * 4
d1 = Dense(468)(latent_inputs)                 # 468 = 117 * 4
d2 = Reshape((117, 4))(d1)
d3 = Conv1D(4, 1, strides=1, activation='relu', padding='same')(d2)
d4 = UpSampling1D(2)(d3)
d5 = Conv1D(8, 1, strides=1, activation='relu', padding='same')(d4)
d6 = UpSampling1D(2)(d5)
d7 = UpSampling1D(2)(d6)
d8 = Conv1D(1, 1, strides=1, activation='sigmoid', padding='same')(d7)
decoded = Cropping1D(cropping=(1, 2))(d8)      # this is the added step: 936 -> 933
decoder = Model(latent_inputs, decoded, name='decoder')
decoder.summary()
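(Sampling is the reparameterization-trick layer from the Keras VAE example I adapted; I am showing the tutorial version here on the assumption that it is unchanged in my code.)

class Sampling(keras.layers.Layer):
    """Reparameterization trick: z = z_mean + exp(0.5 * z_log_var) * epsilon."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        batch = tf.shape(z_mean)[0]
        dim = tf.shape(z_mean)[1]
        epsilon = tf.random.normal(shape=(batch, dim))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon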
This is the summary printed:
Model: encoder
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_99 (InputLayer) [(1, 933, 1)] 0
__________________________________________________________________________________________________
conv1d_209 (Conv1D) (1, 933, 8) 32 input_99[0][0]
__________________________________________________________________________________________________
max_pooling1d_90 (MaxPooling1D) (1, 466, 8) 0 conv1d_209[0][0]
__________________________________________________________________________________________________
conv1d_210 (Conv1D) (1, 466, 4) 100 max_pooling1d_90[0][0]
__________________________________________________________________________________________________
max_pooling1d_91 (MaxPooling1D) (1, 233, 4) 0 conv1d_210[0][0]
__________________________________________________________________________________________________
average_pooling1d_45 (AveragePo (1, 116, 4) 0 max_pooling1d_91[0][0]
__________________________________________________________________________________________________
flatten_45 (Flatten) (1, 464) 0 average_pooling1d_45[0][0]
__________________________________________________________________________________________________
dense_89 (Dense) (1, 2) 930 flatten_45[0][0]
__________________________________________________________________________________________________
z_mean (Dense) (1, 2) 6 dense_89[0][0]
__________________________________________________________________________________________________
z_log_var (Dense) (1, 2) 6 dense_89[0][0]
__________________________________________________________________________________________________
sampling_45 (Sampling) (1, 2) 0 z_mean[0][0]
z_log_var[0][0]
==================================================================================================
Total params: 1,074
Trainable params: 1,074
Non-trainable params: 0
__________________________________________________________________________________________________
Model: decoder
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_100 (InputLayer) [(None, 2)] 0
_________________________________________________________________
dense_90 (Dense) (None, 468) 1404
_________________________________________________________________
reshape_44 (Reshape) (None, 117, 4) 0
_________________________________________________________________
conv1d_211 (Conv1D) (None, 117, 4) 20
_________________________________________________________________
up_sampling1d_117 (UpSamplin (None, 234, 4) 0
_________________________________________________________________
conv1d_212 (Conv1D) (None, 234, 8) 40
_________________________________________________________________
up_sampling1d_118 (UpSamplin (None, 468, 8) 0
_________________________________________________________________
up_sampling1d_119 (UpSamplin (None, 936, 8) 0
_________________________________________________________________
conv1d_213 (Conv1D) (None, 936, 1) 9
_________________________________________________________________
cropping1d_18 (Cropping1D) (None, 933, 1) 0
=================================================================
Total params: 1,473
Trainable params: 1,473
Non-trainable params: 0
_________________________________________________________________
However, when I try to fit my model, I get the following exception:
ValueError: Invalid reduction dimension 2 for input with 2 dimensions. for '{{node Sum}} = Sum[T=DT_FLOAT, Tidx=DT_INT32, keep_dims=false](Mean, Sum/reduction_indices)' with input shapes: [1,933], [2] and with computed input tensors: input[1] = 1 2.
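My train_step is unchanged from the MNIST tutorial, and I suspect its reconstruction loss is where this comes from: keras.losses.binary_crossentropy collapses the channel axis, so on (1, 933, 1) data it returns a rank-2 tensor of shape (1, 933), matching the traceback, and the tutorial's sum over axis=(1, 2) then asks for an axis that no longer exists. A minimal shape check (assuming the tutorial loss):

import tensorflow as tf
from tensorflow import keras

data = tf.zeros((1, 933, 1))             # stand-in batch with my input shape
reconstruction = tf.zeros((1, 933, 1))   # stand-in decoder output
bce = keras.losses.binary_crossentropy(data, reconstruction)
print(bce.shape)                         # (1, 933): rank 2, as in the traceback
# tf.reduce_sum(bce, axis=(1, 2))        # fails: invalid reduction dimension 2
loss = tf.reduce_sum(bce, axis=1)        # the 1D-appropriate reduction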
Has anyone experienced this error, or can you see what I am doing wrong in my model construction? I am new at this.
Note that I have modified this from a working 28x28 MNIST VAE from the Keras documentation.
Thanks in advance
Topic vae keras convolutional-neural-network tensorflow autoencoder
Category Data Science