Does statsmodels fully support MultiIndex?

The code snippet below shows that statsmodels appears to flatten MultiIndex column tuples by joining their levels with an underscore (_).

import numpy as np
import pandas as pd
from statsmodels.regression.linear_model import OLS

K = 2
N = 10
ERROR_VOL = 1

np.random.seed(0)
X = np.random.rand(N, K)
coefs = np.linspace(0.1, 1, K)
noise = np.random.rand(N)
y = X @ coefs + noise * ERROR_VOL

index_ = pd.MultiIndex.from_tuples([('some_var','feature_0'), ('some_var','feature_1')])
df = pd.DataFrame(X, columns=index_)
ols_fit = OLS(y, df, hasconst=False).fit()
print(ols_fit.params)

The output is:

some_var_feature_0    0.230474
some_var_feature_1    1.646789
dtype: float64
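The labels in this output can be rebuilt from df.columns by joining each tuple's levels with an underscore. The pandas-only sketch below does that. The joining rule is inferred from the output above, not taken from the statsmodels source. The sketch also shows why the underscore cannot be split back reliably when the level names themselves contain underscores, and outlines one possible workaround for a custom separator: flatten the columns yourself before fitting, using a separator that appears in no level name. It assumes that statsmodels passes ordinary (non-MultiIndex) string column names through unchanged, and SEP = "|" is an arbitrary, hypothetical choice.

```python
import pandas as pd

# Same two-level column index as in the question.
columns = pd.MultiIndex.from_tuples(
    [("some_var", "feature_0"), ("some_var", "feature_1")]
)

# Rebuild the underscore-joined labels seen in ols_fit.params.
# Assumption: statsmodels joins each tuple's levels with "_".
underscore_names = ["_".join(map(str, col)) for col in columns]
print(underscore_names)

# The level names contain "_" themselves, so splitting cannot recover them.
print(underscore_names[0].split("_"))

# Workaround sketch: flatten with a separator that occurs in no level name.
SEP = "|"  # hypothetical choice; any string absent from the level names works
flat_names = [SEP.join(map(str, col)) for col in columns]

# These flat names round-trip back to the original MultiIndex.
restored = pd.MultiIndex.from_tuples([tuple(name.split(SEP)) for name in flat_names])
print(restored.equals(columns))
```

In the question's setting, you would fit on a copy of df with its columns set to flat_names. You would then rebuild the MultiIndex from ols_fit.params.index using the same split.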

Because of this flattening, the following operation fails, as do similar operations that rely on aligning by label:

params_stdzd = ols_fit.params * df.std()
ValueError: cannot join with no level specified and no overlapping names
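The failure can be reproduced with pandas alone. The sketch below uses two stand-ins. For ols_fit.params, it puts the two coefficients printed above into a Series with flat labels. For df.std(), it uses a Series of hypothetical scale values indexed by the MultiIndex. When one index is flat and the other has two levels, pandas tries to join them on shared level names. None of the names overlap here, so it raises ValueError. The exact message varies between pandas versions. Once both operands carry the same MultiIndex, the product aligns normally.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("some_var", "feature_0"), ("some_var", "feature_1")]
)

# Stand-in for ols_fit.params: coefficients from the question, flat labels.
params = pd.Series(
    [0.230474, 1.646789], index=["some_var_feature_0", "some_var_feature_1"]
)

# Stand-in for df.std(): hypothetical values, indexed by the MultiIndex.
scale = pd.Series([0.5, 2.0], index=columns)

try:
    params * scale
except ValueError as exc:
    # Flat labels and MultiIndex labels cannot be aligned by name.
    print("ValueError:", exc)

# Giving params the same MultiIndex (by position) lets alignment succeed.
params_mi = pd.Series(params.to_numpy(), index=columns)
print(params_mi * scale)
```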

Questions

  1. Is there a way to get statsmodels to respect a pandas MultiIndex rather than flatten it?

If not:

  1. Is there a way to set the flattening character to something other than underscore?

  2. Can I rely on ols_fit.params following the order of df.columns? If so, I could simply replace the index of ols_fit.params with df.columns to get a properly indexed params Series.

  3. Are there better ways to get MultiIndex interoperability with statsmodels?
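If the order can be relied on, the relabelling idea in question 2 can be made defensive. Instead of swapping the index blindly, you first check that each flat label equals the underscore-joined version of the corresponding column tuple. The helper restore_multiindex below is hypothetical and not part of statsmodels. Its check assumes the underscore-joining rule seen in the output above. The demo uses the printed coefficients as stand-in values so it runs without fitting a model.

```python
import pandas as pd


def restore_multiindex(params: pd.Series, columns: pd.MultiIndex, sep: str = "_") -> pd.Series:
    """Hypothetical helper: re-label a flattened params Series with a MultiIndex.

    Assumes params follows the column order of the design matrix and that each
    flat label is the column's levels joined with `sep`; both are checked.
    """
    expected = [sep.join(map(str, col)) for col in columns]
    if list(params.index) != expected:
        raise ValueError(
            f"params labels {list(params.index)} do not match flattened columns {expected}"
        )
    # Labels match position by position, so swapping in the MultiIndex is safe.
    return pd.Series(params.to_numpy(), index=columns, name=params.name)


# Demo with stand-in values (the coefficients printed in the question).
columns = pd.MultiIndex.from_tuples(
    [("some_var", "feature_0"), ("some_var", "feature_1")]
)
flat_params = pd.Series(
    [0.230474, 1.646789], index=["some_var_feature_0", "some_var_feature_1"]
)
params = restore_multiindex(flat_params, columns)
print(params)
```

With a fitted model, the call would be restore_multiindex(ols_fit.params, df.columns). The result should then align with df.std() on the MultiIndex. The swap is positional. A label-based reindex would not help here, because none of the flat labels appear in df.columns.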
