How do CNNs use a model and find the object(s) desired?
Background: I'm studying CNN's outside of my undergraduate CS course on ML. I have a few questions related to CNNs.
1) When training a CNN, we desire tightly bounded/cropped images of the desired classes, correct? I.e. if we were trying to recognize dogs, we would use thousands of images of tightly cropped dogs. We would also feed images of non-dogs, correct? These images are scaled to a specific size, i.e. 255x255.
2) Let's say training is complete. Our model's accuracy seems sufficient, with no problems. From here, let's have a large, HD image of a non-occluded dog running through a field with various obstacles. With a typical NN and some data, we just take the model, cross it with some input, and bam it's going to output some class. How will the CNN view this large image, and then 'find' the dog? Do we run some type of preprocessing on the image to partition it, and feed the partitions?