I'm going through a notebook on the well-known Kaggle Favorita sales forecasting competition. One puzzle: after the data is split into train and test sets, y_train seems to have two columns, unit_sales and transactions, both of which are being predicted and eventually compared with ground truth. But why would someone pass these two columns to one model.fit() call instead of developing two models, one per column? Or is that what sklearn does internally anyway, i.e. training two models …
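For context, many sklearn estimators do accept a two-column y natively (fitting both targets in one call), while others have to be wrapped so that one independent model is trained per column. A minimal sketch with synthetic stand-ins for the two targets (the data here is invented purely for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# hypothetical stand-ins for unit_sales and transactions
y = np.column_stack([X @ [1.0, 2.0, 0.5], X @ [0.5, -1.0, 2.0]])

# RandomForestRegressor supports multi-output natively:
# a single fit() handles both target columns at once.
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
print(rf.predict(X).shape)  # one prediction column per target

# SVR does not support multi-output, so MultiOutputRegressor
# fits one independent SVR per target column under the hood.
mor = MultiOutputRegressor(SVR()).fit(X, y)
print(mor.predict(X).shape)
```

So whether one fit() call really means "one model" depends on the estimator: tree ensembles share one model across targets, while the wrapper approach is literally the "two separate models" the question describes.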
I have some data I'm trying to analyze in SAS Studio (University Edition). I am using the Distribution Analysis feature to test some data for normality. It gives me the following histogram: skewness is approximately 2.934 and kurtosis is approximately 9.013. Based on that (and the fact that the shape of the histogram looks so different from the normal curve), I would have assumed this is not normally distributed. However, my goodness-of-fit tests are: the Kolmogorov-Smirnov D …
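The apparent contradiction (heavy skew and kurtosis, yet a non-significant goodness-of-fit test) often comes down to sample size: with few observations, these tests have little power. A rough illustration in Python with scipy (the lognormal sample and sample size here are assumptions, not the original data; note also that a KS test with parameters estimated from the same sample gives an optimistic p-value, which is the Lilliefors issue):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# a clearly right-skewed sample (lognormal), small n
x = rng.lognormal(mean=0.0, sigma=0.8, size=30)

# descriptive shape statistics: both well above 0 for a skewed sample
print("skewness:", stats.skew(x))
print("excess kurtosis:", stats.kurtosis(x))

# KS test against a normal with mean/sd estimated from the sample;
# with n = 30 this can easily fail to reject despite the visible skew
stat, p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))
print("KS statistic:", stat, "p-value:", p)
```

The shape statistics and the histogram can both flag non-normality that the formal test simply lacks the power to detect.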
I am fitting mixture models to data and assessing how mixtures with more or fewer components fit the data. To do this, I am going to plot the cdf of the empirical data and the cdf of my mixture model with k components. As an example, here is a cdf of the empirical data plotted beside a mixture of lognormal distributions with 2 components. My question is: how do I use scipy's kstest to determine the goodness of fit …
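One approach: scipy's kstest accepts a callable CDF, so the mixture CDF can be passed directly as the weighted sum of the component CDFs. A sketch with made-up mixture parameters (the weights, shapes, and scales below are assumptions; also note that if those parameters were fitted to the same data being tested, the resulting p-value is optimistic):

```python
import numpy as np
from scipy import stats

# hypothetical 2-component lognormal mixture
weights = [0.6, 0.4]
shapes  = [0.5, 0.3]   # lognorm shape parameter s (sigma of the log)
scales  = [1.0, 3.0]   # lognorm scale parameter (exp of the log-mean)

def mixture_cdf(x):
    # CDF of a mixture = weighted sum of the component CDFs
    return sum(w * stats.lognorm.cdf(x, s, scale=sc)
               for w, s, sc in zip(weights, shapes, scales))

# simulate data from the same mixture, just for illustration
rng = np.random.default_rng(0)
comp = rng.choice(2, size=500, p=weights)
data = stats.lognorm.rvs(np.array(shapes)[comp],
                         scale=np.array(scales)[comp],
                         random_state=rng)

# kstest compares the empirical CDF of `data` to the callable CDF
stat, p = stats.kstest(data, mixture_cdf)
print("KS statistic:", stat, "p-value:", p)
```

The KS statistic is the largest vertical gap between the two CDF curves being plotted, so it quantifies exactly the visual comparison described above; comparing it across k values gives a rough (if informal) ranking of the fits.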
Does statsmodels compute R² and other metrics on a validation set? I am using OLS from statsmodels.api; when printing the summary, an R² and an adjusted R² are presented. I did not trust that 0.88, so I computed my own adjusted R² with scikit-learn's r2_score plus the adjusted-R² function from this answer, and it also came out to 0.88. Hence the question.