I would separate three terms: A loss function, regularizing terms and the function you want to optimize.
For example, you start out with a problem: You want to distinguish cat and dog images. For this you have already function that is dependent on some parameters. Your function gets an inputs images and its output is then either the word "cat" or "dog".
If you want to measure how good your function actually is, you need a loss function. For example, whenever it says "dog" to a cat image the loss function should indicate so.
In reality, you will only have a certain number of images and there will be arbitrary many and complex functions f* solving this optimization problem. These functions can perfectly work for your problem, your simple dataset, but in general not necessarily do what you want it to do. For example this function f* works perfectly on your dataset, but otherwise says "dog" to any kind of image regardless of it's actually dog or cat image.
Clearly, there are good and bad functions f* you can get after optimization. So, what you want to do is now guide the optimization process in such a way, that something like the above scenario doesn't happen. You may not be able to remove this possibility completely, but at least, make it much less likely that you end up with a function that solves your problem but is otherwise useless.
One of the tools at your deposal are regularizes. In physics you want to describe the world and you can make up arbitrary complex models that explain everything there is. Physicists though prefer simple models they can actually test and verify. What use is a model that explains everything just as well as a small model, but is magnitudes bigger? Bigger models may also do stuff that you don't want it do.
In Machine Learning we have a similar perspective: We dislike complex models, because their behaviour might not be easy to understand/predict or verify. That's why you apply a regularizing term to your loss function that penalizes complexity. So if you optimize your model, it naturally tries to move towards simpler versions of it that use less parameters.
Did this help?