The short answer is no, you don't always need to transform your data to a normal distribution.
This depends a lot on the learning algorithm you're using. Additionally, you should treat continuous and categorical variables differently.
Continuous variables:
Tree-based models such as Decision Trees, Random Forest, Gradient Boosting, XGBoost, and others are not affected by the distribution or scale of your data: they split on feature thresholds, so they are invariant to monotonic transformations like scaling or log transforms.
However, algorithms like Linear Regression, Logistic Regression, KNN, or Neural Nets can be highly affected by both the distribution and the scale of your data. You will likely get better results and finish training the model faster if you transform and scale the data for these algorithms.
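For example, here is a minimal sketch using scikit-learn (the toy data and the choice of `StandardScaler` / `PowerTransformer` are just illustrations, not the only options):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, PowerTransformer

# Skewed toy data standing in for your continuous features
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 3))

# Standardize to zero mean / unit variance (fixes scale, not shape)
X_scaled = StandardScaler().fit_transform(X)

# Yeo-Johnson power transform pushes each feature toward a Gaussian shape
X_gaussian = PowerTransformer(method="yeo-johnson").fit_transform(X)

print(X_scaled.mean(axis=0).round(2), X_scaled.std(axis=0).round(2))
```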
Categorical variables:
Regardless of which algorithm you're using, you should one-hot encode nominal categorical variables. This is the most common approach, but alternatives such as Feature Hashing and Bin-counting may work better if you have many categories. If the variables are ordinal, encode them as integers that preserve the implied order (or keep them as they are if they're already integers in the right order).
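A rough sketch with scikit-learn's `OneHotEncoder` and `OrdinalEncoder` (the column names and category levels are made up; `sparse_output` requires scikit-learn 1.2+):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],      # nominal
    "size": ["small", "large", "medium", "small"],   # ordinal
})

# Nominal: one-hot encode, since there is no order between categories
onehot = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
color_encoded = onehot.fit_transform(df[["color"]])

# Ordinal: map to integers that respect small < medium < large
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_encoded = ordinal.fit_transform(df[["size"]])

print(color_encoded)
print(size_encoded.ravel())  # [0. 2. 1. 0.]
```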
Side note:
Also, make sure not to fit the scaler on the entire dataset at once, as that causes data leakage. Instead, fit the scaler on your train set only, then use it to transform the test set, as explained in this SO answer.
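Something along these lines (a minimal sketch with scikit-learn; the toy data and split size are arbitrary):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse train statistics on test
```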