Convert a single-index pandas data frame to multi-index

I have a data frame with the following structure:

df.columns
Index(['first_post_date', 'followers_count', 'friends_count',
       'last_post_date','min_retweet', 'retweet_count', 'screen_name',
       'tweet_count',  'tweet_with_max_retweet', 'tweets', 'uid'],
        dtype='object')

Inside the tweets series, each cell is another data frame containing all the tweets of a user.

df.tweets[0].columns
Index(['created_at', 'id', 'retweet_count', 'text'], dtype='object')

I want to convert this data frame to a multi-index frame, essentially by breaking up the cells containing tweets. One index level will be the uid, and the other will be the id inside each tweets frame.

How can I do that?

link to sample data



One way to pull the embedded dataframes up into the main dataframe and build a multi-index is:

Code:

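import pandas as pd

def expand_tweets(tweets_df):
    tweets = []
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for uid, sub_df in tweets_df.set_index('uid').tweets.items():
        # assign() returns a copy, so the nested frame is left unmodified
        tweets.append(sub_df.assign(uid=uid))
    # Attach the user-level columns to every tweet, then set the two-level index
    return pd.concat(tweets).merge(
        tweets_df.drop('tweets', axis=1),
        how='outer', on='uid').set_index(['uid', 'id'])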

How:

  1. Pull all of the tweet dataframes out of the main dataframe using uid as an index, tag each with its uid, and concat() them together.

  2. Then merge the main dataframe into the concatenated tweets dataframe.

  3. Set the desired index.
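
The three steps can also be traced one at a time on a tiny made-up frame. This is only a sketch: the uids, ids, screen names, and text values below are invented for illustration and are not taken from the sample data.

```python
import pandas as pd

# Hypothetical stand-in for the real data: two users with nested tweet records.
raw = [
    {'uid': 10, 'screen_name': 'alice',
     'tweets': [{'id': 1, 'text': 'a1'}, {'id': 2, 'text': 'a2'}]},
    {'uid': 20, 'screen_name': 'bob',
     'tweets': [{'id': 3, 'text': 'b1'}]},
]
df = pd.DataFrame(raw)
# Turn each cell of the tweets column into its own dataframe.
df['tweets'] = df['tweets'].apply(lambda rows: pd.DataFrame(rows))

# Step 1: pull out each nested frame, tag it with its uid, and concatenate.
parts = [sub.assign(uid=uid)
         for uid, sub in df.set_index('uid')['tweets'].items()]
flat = pd.concat(parts, ignore_index=True)

# Step 2: merge the user-level columns back onto every tweet row.
merged = flat.merge(df.drop(columns='tweets'), how='outer', on='uid')

# Step 3: set the two-level index.
result = merged.set_index(['uid', 'id'])
print(result)
```

Each row of `result` is one tweet, indexed by `(uid, id)`, with the user-level columns repeated alongside it.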

Test Code:

import json
import pandas as pd
with open('test.json') as f:
    df = pd.DataFrame(json.load(f))
df['tweets'] = df.tweets.apply(lambda x: pd.DataFrame(x))

print(expand_tweets(df).text.head())

Results:

uid         id                
1153859336  655060275025047552    Article on my new Haunted Stevenage book Paran...
            653912439940120576    Big thank you to @bobfmuk for interviewing me ...
            643709869908996096    Another interesting non-toadstool tweet today,...
            547107275681579008    @sisax67 Thanks, Simon. All the best to you &a...
            546693940024733696    Paul Adams @SkySportsDarts The Wanderer from W...
Name: text, dtype: object
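
One thing to watch for: going by the two column listings in the question, both the main frame and each nested tweets frame have a retweet_count column. The merge keeps both and, by default, renames them retweet_count_x and retweet_count_y. A minimal sketch of giving them clearer names with the suffixes parameter of merge(); the values and the suffix strings '_tweet' and '_user' are invented here for illustration:

```python
import pandas as pd

# Hypothetical flattened tweets and user-level rows that share a column name.
tweets = pd.DataFrame({'uid': [10, 10], 'id': [1, 2], 'retweet_count': [5, 0]})
users = pd.DataFrame({'uid': [10], 'retweet_count': [5]})

# suffixes=(left, right) controls how the overlapping column is renamed.
merged = tweets.merge(users, how='outer', on='uid',
                      suffixes=('_tweet', '_user'))
result = merged.set_index(['uid', 'id'])
print(result.columns.tolist())
# ['retweet_count_tweet', 'retweet_count_user']
```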
