You have $98\%$ in one class, right? This means that, knowing nothing about the data, you could get $98\%$ of the predictions right by always guessing that majority class. If your model gets $97\%$ of them right, that sounds like an $\text{A}$ in school and thus a good model, yet it does worse than blindly guessing the majority class every time!
Better yet, compare using proper scoring rules like log loss (cross-entropy) or Brier score, against a baseline model that always predicts the prior probability $P(y=1) = 0.02$. This is analogous to how $R^2$ works in linear regression, where the baseline always guesses the mean of the $y$ variable; in your case, the mean of the $y$ variable is the class ratio. If you can't beat the model that always guesses $P(y=1) = 0.02$, perhaps you have a poor model. (Specifics would depend on the misclassification costs, which you might or might not know.)
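A minimal sketch of that baseline comparison, assuming scikit-learn and NumPy are available; the arrays `y_true` and `y_prob` are hypothetical stand-ins for your labels and your model's predicted probabilities of class $1$:

```python
import numpy as np
from sklearn.metrics import log_loss, brier_score_loss

rng = np.random.default_rng(0)
n = 10_000
y_true = (rng.random(n) < 0.02).astype(int)                 # ~2% positives
y_prob = np.clip(rng.beta(1, 40, size=n), 1e-6, 1 - 1e-6)   # stand-in model output

# Baseline: always predict the class ratio, i.e. the prior P(y=1).
prior = np.full(n, y_true.mean())

print("model    log loss:", log_loss(y_true, y_prob))
print("baseline log loss:", log_loss(y_true, prior))
print("model    Brier   :", brier_score_loss(y_true, y_prob))
print("baseline Brier   :", brier_score_loss(y_true, prior))
```

If the model's scores are not lower (better) than the baseline's, its predicted probabilities carry no more information than the class ratio alone.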
$$
\text{Log Loss}\\
L(y, \hat y) = -\frac{1}{N}\sum_{i = 1}^N \bigg( y_i\log(\hat y_i) + (1 - y_i)\log(1 - \hat y_i) \bigg)
$$
$$
\text{Brier Score}\\
L(y, \hat y) = \frac{1}{N}\sum_{i = 1}^N \big( y_i - \hat y_i \big)^2
$$
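Direct NumPy translations of the two formulas, under the same assumptions as below ($y_i \in \{0, 1\}$ and $\hat y_i$ a predicted probability); the function names are just illustrative:

```python
import numpy as np

def log_loss_manual(y, y_hat):
    """Mean negative Bernoulli log-likelihood of labels y given probabilities y_hat."""
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def brier_score_manual(y, y_hat):
    """Mean squared difference between labels y and probabilities y_hat."""
    return np.mean((y - y_hat) ** 2)
```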
This assumes your $y_i\in\{0, 1\}$. If you use $y_i\in\{-1, 1\}$, you would have to modify the loss functions or change how you label your categories. The $\hat y_i$ values are probabilities. Note that the log loss blows up if you predict a probability of exactly $0$ or $1$ and turn out to be wrong, since $\log(0)$ is undefined (the penalty is infinite, as the sketch below shows). Some see this infinite penalty for confident wrong predictions as an upside of log loss, while others see it as a downside.
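A quick illustration of that edge case, again with hypothetical data; the clipping constant `eps` is a common workaround, a judgment call rather than part of the formula:

```python
import numpy as np

y = np.array([1, 0])
y_hat = np.array([0.0, 1.0])   # both predictions confidently wrong

with np.errstate(divide="ignore"):
    raw = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(raw)                      # inf: log(0) makes the penalty infinite

eps = 1e-15                     # clip probabilities away from the endpoints
y_hat_clipped = np.clip(y_hat, eps, 1 - eps)
clipped = -np.mean(y * np.log(y_hat_clipped) + (1 - y) * np.log(1 - y_hat_clipped))
print(clipped)                  # large but finite (~34.5)
```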
This kind of evaluation of the predicted probabilities, rather than of the hard classifications, is why statisticians tend not to see class imbalance as a problem in and of itself.