To build a dictionary from your dataframe whose keys are tuples of (name, column) combinations, as in your example output, my idea is to use a pandas MultiIndex. Converting the multi-indexed dataframe to a dictionary then gives the form you want.
First, I recreate your example dataframe (it would be nice if you provided this code in the future!):
import pandas as pd
# Create the example dataframe
df = pd.DataFrame(["4-Grain Flakes", "4-Grain Flakes, Gluten Free", "4-Grain Flakes, Riihikosken Vehnämylly"])
df["id"] = [11005, 35146, 32570]
df["energy"] = [1404, 1569, 1443]
df["fibre"] = [11.5, 6.1, 11.2]
df.columns = ["name"] + list(df.columns[1:])
print(df)
                                     name     id  energy  fibre
0                          4-Grain Flakes  11005    1404   11.5
1             4-Grain Flakes, Gluten Free  35146    1569    6.1
2  4-Grain Flakes, Riihikosken Vehnämylly  32570    1443   11.2
Now we can create the combinations of each value in "name" with each of the other column names. I use a list comprehension that bundles each pair of values into a tuple, so we end up with a list of tuples:
names = df.name.tolist()
others = list(df.columns)
others.remove("name") # We don't want "name" to be included
index_tuples = [(name, other) for name in names for other in others]
We can create the MultiIndex from this list of tuples as follows:
multi_ix = pd.MultiIndex.from_tuples(index_tuples)
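As a side note, the list comprehension plus from_tuples step can probably be collapsed into a single call to pd.MultiIndex.from_product, which builds the same name-by-column combinations. Here is a minimal sketch that checks this on a shortened version of the example (the two-name list is just for illustration):

```python
import pandas as pd

# Shortened stand-ins for the lists built above
names = ["4-Grain Flakes", "4-Grain Flakes, Gluten Free"]
others = ["id", "energy", "fibre"]

# The explicit route: list of (name, column) tuples, then from_tuples
index_tuples = [(name, other) for name in names for other in others]
from_tuples = pd.MultiIndex.from_tuples(index_tuples)

# The shortcut: cartesian product of the two lists, in the same order
from_product = pd.MultiIndex.from_product([names, others])

# Both indexes should hold the same entries in the same order
same = from_tuples.equals(from_product)
print(same)
```

I keep the explicit tuples in the rest of the answer because they make it easier to see what the keys of the final dictionary will look like.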
Now we can create a new dataframe using our multi_ix. To populate this dataframe, notice that we simply need the values from the columns ["id", "energy", "fibre"], read row by row. We can do this easily by extracting them as an n × 3 NumPy array (using the values attribute of the dataframe) and then flattening that matrix with NumPy's ravel method:
df1 = pd.DataFrame(df[others].values.ravel(), index=multi_ix, columns=["data"])
print(df1)
                                                  data
4-Grain Flakes                         id      11005.0
                                       energy   1404.0
                                       fibre      11.5
4-Grain Flakes, Gluten Free            id      35146.0
                                       energy   1569.0
                                       fibre       6.1
4-Grain Flakes, Riihikosken Vehnämylly id      32570.0
                                       energy   1443.0
                                       fibre      11.2
Now we can simply use the to_dict() method of the dataframe to create the dictionary you are looking for:
nutritionValues = df1.to_dict()["data"]
print(nutritionValues)
{('4-Grain Flakes', 'energy'): 1404.0,
('4-Grain Flakes', 'fibre'): 11.5,
('4-Grain Flakes', 'id'): 11005.0,
('4-Grain Flakes, Gluten Free', 'energy'): 1569.0,
('4-Grain Flakes, Gluten Free', 'fibre'): 6.1,
('4-Grain Flakes, Gluten Free', 'id'): 35146.0,
('4-Grain Flakes, Riihikosken Vehnämylly', 'energy'): 1443.0,
('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'): 11.2,
('4-Grain Flakes, Riihikosken Vehnämylly', 'id'): 32570.0}
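If you do not need the intermediate steps, I think the same tuple-keyed dictionary can also be reached more directly by moving "name" into the index and calling stack(), which pivots the remaining columns into a second index level. This is only a sketch on a shortened copy of the example data, and it assumes there are no missing values (depending on the pandas version, stack() may drop them):

```python
import pandas as pd

# Shortened copy of the example dataframe
df = pd.DataFrame({
    "name": ["4-Grain Flakes", "4-Grain Flakes, Gluten Free"],
    "id": [11005, 35146],
    "energy": [1404, 1569],
    "fibre": [11.5, 6.1],
})

# "name" becomes the outer index level, the column labels the inner one
stacked = df.set_index("name").stack()

# A Series with a two-level index converts to a dict keyed by tuples
nutrition_values = stacked.to_dict()
print(nutrition_values)
```

As with the ravel approach, the integer columns end up as floats because all values share a single column.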
It is also possible to get your final example of a multidict directly from the multi-indexed dataframe. You just need to use multi-index slicing:
fibre_df = df1.loc[(slice(None), ["fibre"]), :]
print(fibre_df)
data
4-Grain Flakes fibre 11.5
4-Grain Flakes, Gluten Free fibre 6.1
4-Grain Flakes, Riihikosken Vehnämylly fibre 11.2
You can then generate a dictionary as before:
d = df1.loc[(slice(None), ["fibre"]), :].to_dict()["data"]
print(d)
{('4-Grain Flakes', 'fibre'): 11.5,
('4-Grain Flakes, Gluten Free', 'fibre'): 6.1,
('4-Grain Flakes, Riihikosken Vehnämylly', 'fibre'): 11.2}
And you can drop the "fibre" value from the tuple keys with a simple dictionary comprehension:
final_dict = {k[0]: v for k, v in d.items()}
print(final_dict)
{'4-Grain Flakes': 11.5,
'4-Grain Flakes, Gluten Free': 6.1,
'4-Grain Flakes, Riihikosken Vehnämylly': 11.2}
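The slicing and the dictionary comprehension can probably be combined by taking a cross-section with xs, which selects one label on a given level and, by default, drops that level from the result. A minimal sketch, rebuilding a shortened multi-indexed dataframe by hand so that it runs on its own:

```python
import pandas as pd

# Shortened stand-in for the multi-indexed dataframe built above
index = pd.MultiIndex.from_product(
    [["4-Grain Flakes", "4-Grain Flakes, Gluten Free"], ["id", "energy", "fibre"]]
)
df1 = pd.DataFrame(
    [11005.0, 1404.0, 11.5, 35146.0, 1569.0, 6.1],
    index=index,
    columns=["data"],
)

# Select "fibre" on the inner level; that level is dropped from the result,
# so the remaining index holds only the names
fibre_by_name = df1.xs("fibre", level=1)["data"].to_dict()
print(fibre_by_name)
```

This gives a dictionary keyed by name only, so no extra step is needed to strip "fibre" from the keys.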