How to create dictionary with multiple keys from dataframe in python?

I have a pandas dataframe as follows. I want to convert it to a dictionary whose keys are 2-tuples of (name, column), as shown:

    id              name                        energy             fibre    
0   11005   4-Grain Flakes                          1404            11.5    
1   35146   4-Grain Flakes, Gluten Free             1569             6.1    
2   32570   4-Grain Flakes, Riihikosken Vehnämylly  1443            11.2     

I expect the result to be:

 nutritionValues = {
  ('4-Grain Flakes', 'id'): 11005,
  ('4-Grain Flakes', 'energy'): 1404,
  ('4-Grain Flakes', 'fibre'):  11.5,
  ('4-Grain Flakes, Gluten Free', 'id'): 35146,
  ('4-Grain Flakes, Gluten Free', 'energy'): 1569,
  ('4-Grain Flakes, Gluten Free', 'fibre'):  6.1,
  ('4-Grain Flakes, Riihikosken Vehnämylly', 'id'): 32570,
  ('4-Grain Flakes, Riihikosken Vehnämylly', 'energy'): 1443,
  ('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'):  11.2}

foods, fiber = multidict({
  '4-Grain Flakes': 11.5,
  '4-Grain Flakes, Gluten Free':   6.1,
  '4-Grain Flakes, Riihikosken Vehnämylly':   11.2})

How can I achieve this?


To create a dictionary from your dataframe whose keys are (name, column) tuples, as in your example output, my idea would be to use a pandas MultiIndex. Once the values sit under a two-level index, the dataframe's to_dict() method produces exactly those tuple keys.

First, I recreate your example dataframe (it would be nice if you provided this code in the future!):

import pandas as pd

# Create the example dataframe, one dict entry per column
df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free", "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})

print(df)
                                     name     id  energy  fibre
0                          4-Grain Flakes  11005    1404   11.5
1             4-Grain Flakes, Gluten Free  35146    1569    6.1
2  4-Grain Flakes, Riihikosken Vehnämylly  32570    1443   11.2

Now we can pair each value in "name" with each of the other column names. A nested list comprehension bundles each pair into a tuple, so we end up with a list of tuples:

names = df.name.tolist()
others = list(df.columns)
others.remove("name")         # We don't want "name" to be included

index_tuples = [(name, other) for name in names for other in others]
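The same list of tuples can also be produced with itertools.product from the standard library. It yields the pairs in the same order as the nested comprehension: names in the outer loop, column names in the inner loop. This is a self-contained sketch that rebuilds the example dataframe. It is an alternative, not something the approach requires.

```python
import itertools

import pandas as pd

# Rebuild the example dataframe so the snippet runs on its own
df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free",
             "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})

names = df["name"].tolist()
others = [col for col in df.columns if col != "name"]  # every column except "name"

# product(names, others) is equivalent to: for name in names: for other in others
index_tuples = list(itertools.product(names, others))
print(index_tuples[:3])
```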

We can create the MultiIndex from this list of tuples as follows:

multi_ix = pd.MultiIndex.from_tuples(index_tuples)
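If you would rather skip building the list of tuples yourself, pandas can form the cartesian product directly with pd.MultiIndex.from_product. The sketch below checks that it gives the same index, in the same order, as from_tuples on the comprehension above. The dataframe is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free",
             "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})
names = df["name"].tolist()
others = ["id", "energy", "fibre"]

# Every (name, other) combination; the first list varies slowest
multi_ix = pd.MultiIndex.from_product([names, others])

# Same entries, same order as the tuple-based construction
from_tuples = pd.MultiIndex.from_tuples([(n, o) for n in names for o in others])
print(multi_ix.equals(from_tuples))
```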

Now we can create a new dataframe using our multi_ix. To populate it, notice that we simply need the row-wise values from the columns ["id", "energy", "fibre"]. We can get these by extracting them as an n x 3 NumPy array (via the dataframe's values attribute) and then flattening that matrix with NumPy's ravel method, which reads it row by row:

df1 = pd.DataFrame(df[others].values.ravel(), index=multi_ix, columns=["data"])

print(df1)

                                                  data
4-Grain Flakes                         id      11005.0
                                       energy   1404.0
                                       fibre      11.5
4-Grain Flakes, Gluten Free            id      35146.0
                                       energy   1569.0
                                       fibre       6.1
4-Grain Flakes, Riihikosken Vehnämylly id      32570.0
                                       energy   1443.0
                                       fibre      11.2
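To see why the flattened values line up with multi_ix, here is a small check on the same data. By default, ravel() flattens in C (row-major) order. Each row's id, energy and fibre therefore come out consecutively, which matches the name-major order of the index. The check also shows why the ids print as 11005.0: pulling an int64 column and a float64 column into one NumPy array upcasts everything to float64. The variable names row_major and col_major are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free",
             "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})
others = ["id", "energy", "fibre"]

values = df[others].to_numpy()       # one 3 x 3 array; int64 + float64 -> float64
row_major = values.ravel()           # default order="C": row by row
col_major = values.ravel(order="F")  # column by column, shown for comparison

print(values.dtype)   # float64
print(row_major[:3])  # id, energy, fibre of "4-Grain Flakes"
print(col_major[:3])  # the whole "id" column instead
```

Only the row-major result matches the (name, column) pairs in multi_ix. That is why the answer relies on the default order.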

Now we can simply use the to_dict() method of the dataframe to create the dictionary you are looking for:

nutritionValues = df1.to_dict()["data"]

print(nutritionValues)

{('4-Grain Flakes', 'energy'): 1404.0,
 ('4-Grain Flakes', 'fibre'): 11.5,
 ('4-Grain Flakes', 'id'): 11005.0,
 ('4-Grain Flakes, Gluten Free', 'energy'): 1569.0,
 ('4-Grain Flakes, Gluten Free', 'fibre'): 6.1,
 ('4-Grain Flakes, Gluten Free', 'id'): 35146.0,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'energy'): 1443.0,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'): 11.2,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'id'): 32570.0}
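For comparison, here are two more compact sketches that should give the same keys without building the MultiIndex by hand. Both use only standard pandas methods, but they are alternatives to the approach above, not part of it. The first stacks the non-"name" columns under each name. Like .values, it puts all values in one float64 Series. The second goes through per-row dictionaries, so each column keeps its own type and the ids stay integers. It assumes the names are unique, because to_dict(orient="index") raises a ValueError on a duplicated index. The names stacked, per_row and nutrition_values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free",
             "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})

# Option 1: stack the columns under each name -> Series keyed by (name, column).
# The stacked Series has a single dtype, so the ints become float64.
stacked = df.set_index("name").stack().to_dict()

# Option 2: per-row dicts keep each column's own dtype (requires unique names).
per_row = df.set_index("name").to_dict(orient="index")
nutrition_values = {
    (name, col): value
    for name, row in per_row.items()
    for col, value in row.items()
}

print(stacked[("4-Grain Flakes", "id")])           # 11005.0
print(nutrition_values[("4-Grain Flakes", "id")])  # 11005
```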

It is also possible to get your final multidict example directly from the multi-indexed dataframe. You just need multi-index slicing:

fibre_df = df1.loc[(slice(None), ["fibre"]), :]
print(fibre_df)

                                              data
4-Grain Flakes                         fibre  11.5
4-Grain Flakes, Gluten Free            fibre   6.1
4-Grain Flakes, Riihikosken Vehnämylly fibre  11.2
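pandas also provides pd.IndexSlice, which lets you write the same selection with ordinary slice syntax. idx[:, ["fibre"]] builds exactly the tuple (slice(None), ["fibre"]) used above. This sketch rebuilds df1 the same way as in the answer so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free",
             "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})
others = ["id", "energy", "fibre"]
multi_ix = pd.MultiIndex.from_tuples([(n, o) for n in df["name"] for o in others])
df1 = pd.DataFrame(df[others].values.ravel(), index=multi_ix, columns=["data"])

idx = pd.IndexSlice
# All names on level 0, only "fibre" on level 1
fibre_df = df1.loc[idx[:, ["fibre"]], :]
print(fibre_df)
```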

You can then generate a dictionary as before:

d = df1.loc[(slice(None), ["fibre"]), :].to_dict()["data"]
print(d)

{('4-Grain Flakes', 'fibre'): 11.5,
 ('4-Grain Flakes, Gluten Free', 'fibre'): 6.1,
 ('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'): 11.2}

And you can drop the "fibre" value from the tuple-keys with a simple dictionary comprehension:

final_dict = {k[0]: v for k, v in d.items()}
print(final_dict)

{'4-Grain Flakes': 11.5,
 '4-Grain Flakes, Gluten Free': 6.1,
 '4-Grain Flakes, Riihikosken Vehnämylly': 11.2}
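The slice-then-strip steps can also be folded into one call. DataFrame.xs with level=1 selects the "fibre" rows and drops that index level, so the keys are already plain names. For this particular single-column dictionary, the original dataframe is an even shorter route, because the "name" column can simply serve as the index. Both are sketches on the example data. The name direct is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free",
             "4-Grain Flakes, Riihikosken Vehnämylly"],
    "id": [11005, 35146, 32570],
    "energy": [1404, 1569, 1443],
    "fibre": [11.5, 6.1, 11.2],
})
others = ["id", "energy", "fibre"]
multi_ix = pd.MultiIndex.from_tuples([(n, o) for n in df["name"] for o in others])
df1 = pd.DataFrame(df[others].values.ravel(), index=multi_ix, columns=["data"])

# Cross-section on level 1: keep the "fibre" rows, drop that level from the index
final_dict = df1.xs("fibre", level=1)["data"].to_dict()

# Shortcut straight from the original dataframe, no MultiIndex needed
direct = df.set_index("name")["fibre"].to_dict()
print(final_dict == direct)  # True
```

Either dictionary has the same shape as the one passed to multidict in the question.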

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.