Using large CNNs (e.g., ResNet) in convolutional autoencoders for image representation learning

I am confused about which CNNs are generally used inside autoencoder architectures for learning image representations. Is it more common to use a large existing network like ResNet or VGG, or do most people write their own smaller networks? What are the pros and cons of each? If people are using a large network like ResNet or VGG, does the decoder mirror the same steps taken by the encoder, or can a more simple decoding network be used? I am having a hard time finding papers where people describe which networks they use inside their autoencoders. Any help would be greatly appreciated! Thank you!

Topic vgg16 representation cnn autoencoder computer-vision

Category Data Science

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.