Pandas: Assign back to table from grouping by column and index

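I am trying to implement an Exponential Moving Average (EMA) calculation on a DataFrame. The formula is

$$S_0 = Y_0, \qquad S_t = a\,Y_t + (1 - a)\,S_{t-1}, \qquad a = \frac{2}{n+1}$$

where $Y_t$ is the price in bin $t$ and $S_t$ is its EMA.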

An additional complication is that my table is grouped, and bin numbers are unique within each group. This is what I tried:

import numpy as np
import numpy.random as rand
import pandas as pd

n = 5
groups = np.array(['one', 'two', 'three'])
data = pd.DataFrame({
    'price': rand.random(3 * n) * 10,
    'group': np.repeat(groups, n),
    'bin': np.tile(np.arange(n),3)}, index=np.arange(3 * n))

print(data)

       price  group  bin
0   1.601310    one    0
1   3.190662    one    1
2   4.419421    one    2
3   3.817510    one    3
4   2.440774    one    4
5   6.832265    two    0
6   5.636502    two    1
7   4.630515    two    2
8   5.856423    two    3
9   0.916452    two    4
10  4.247134  three    0
11  7.146746  three    1
12  8.049161  three    2
13  7.852168  three    3
14  0.246720  three    4

This is how I am trying to implement the EMA calculation:

data['EMA'] = np.zeros(len(data.index))
data.loc[data['bin'] == 0, 'EMA'] = data.loc[data['bin'] == 0, 'price']

a = 2 / (n + 1)
for _, group in data.groupby('group'):
    for index, row in group.iloc[1:].iterrows():
        prev = group[group['bin'] == row['bin'] - 1].iloc[0]
        row['EMA'] = a * row['price'] + (1 - a) * prev['EMA'] # nope
        data.loc[index, 'EMA'] = a * row['price'] + (1 - a) * prev['EMA'] # nope

Unfortunately neither of these last lines update the values in the group. On the second iteration, the value of prev['EMA'] is still 0. What is the correct way to assign the values back to the table such that it is updated dynamically? Do I need to write out to a temporary array and write it back afterwards?
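A minimal sketch of why neither line sticks, using a hypothetical three-row frame shaped like the question's. With these mixed column dtypes, `row` from iterrows() is built from a copy of the row, and the `group` DataFrame yielded by groupby is a separate object from `data`, so a write made through data.loc is not visible through `group`. That is why prev['EMA'] still reads 0.

```python
import pandas as pd

# Hypothetical toy frame mirroring the question: EMA pre-filled with zeros.
data = pd.DataFrame({
    'price': [1.0, 2.0, 3.0],
    'group': ['one', 'one', 'one'],
    'EMA': [0.0, 0.0, 0.0],
})

# 1) With mixed column dtypes, `row` is a Series built from a copy of the
#    row values, so assigning to it does not write back into `data`.
for index, row in data.iterrows():
    row['EMA'] = 5.0
after_iterrows = data['EMA'].tolist()
print(after_iterrows)  # [0.0, 0.0, 0.0]

# 2) Writing through data.loc does update `data`, but `group` is a separate
#    DataFrame produced by groupby, so it keeps the value it had when yielded.
for _, group in data.groupby('group'):
    first = group.index[0]
    data.loc[first, 'EMA'] = 99.0
    stale = group.loc[first, 'EMA']

print(data.loc[first, 'EMA'], stale)  # 99.0 0.0
```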

Additionally, I could not think of an elegant way to do this using assign or transform. If someone can solve that it might be a very good alternative.
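One option for the transform route, offered as a sketch rather than the method used in this thread: Series.ewm(alpha=a, adjust=False).mean() applies $S_0 = Y_0$, $S_t = aY_t + (1-a)S_{t-1}$, and groupby(...).transform runs it per group and returns a result on the original row labels. The prices below are hypothetical random data shaped like the question's frame.

```python
import numpy as np
import pandas as pd

n = 5
a = 2 / (n + 1)  # same smoothing factor as in the question

# Hypothetical random prices shaped like the question's frame.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    'price': rng.random(3 * n) * 10,
    'group': np.repeat(['one', 'two', 'three'], n),
    'bin': np.tile(np.arange(n), 3),
})

# With adjust=False, ewm uses S_0 = Y_0 and S_t = a*Y_t + (1-a)*S_{t-1}.
# Sorting by bin fixes the order inside each group; transform returns a
# Series on the original index labels, so the assignment lines up row by row.
data['EMA'] = (
    data.sort_values('bin')
        .groupby('group')['price']
        .transform(lambda s: s.ewm(alpha=a, adjust=False).mean())
)
print(data)
```

Since pandas maps span to alpha = 2 / (span + 1), ewm(span=n, adjust=False) should give the same smoothing factor.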


Response

Thank you @DaFanat for your help. Unfortunately your code doesn't work. I tried the following

data.loc[:, 'EMA2'] = map(lambda x, y: x if pd.isnull(y) else x*a + (1-a) * y, 
data['price'], data.groupby('group')['price'].shift(1))

But I get the error TypeError: object of type 'map' has no len(). I tried changing it to this:

data['EMA2'] = list(map(lambda x, y: x if pd.isnull(y) else x*a + (1-a) * y, 
data['price'], data.groupby('group')['price'].shift(1)))

And I do indeed get some results, but they do not look correct:

      price  group  bin       EMA      EMA2
0  5.407722    one    0  5.407722  5.407722
1  0.495734    one    1  3.770393  3.770393
2  7.911491    one    2  5.150759  2.967653
3  1.085836    one    3  3.795785  5.636272
4  7.326432    one    4  4.972667  3.166035

I am inclined to believe my implementation: the price goes up from 0.5 to 7.9, so how can the moving average go down? I think the indices are getting lost and the values are being put against the wrong cells. How do I retain the indices when performing this calculation?
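The EMA2 column can be reproduced from the prices alone, which points to a different cause than lost indices: the shift(1) expression blends each price with the previous price, not the previous EMA, so it is a two-point weighted average rather than the recursion. A minimal check, using the printed (rounded) group 'one' values from the output above:

```python
n = 5
a = 2 / (n + 1)

# Prices for group 'one', copied from the output above (rounded to 6 places).
price = [5.407722, 0.495734, 7.911491, 1.085836, 7.326432]

# The shift(1) lambda: current price blended with the previous PRICE.
shifted = [price[0]] + [a * p + (1 - a) * q for p, q in zip(price[1:], price[:-1])]

# The recursive EMA: current price blended with the previous EMA.
ema = [price[0]]
for p in price[1:]:
    ema.append(a * p + (1 - a) * ema[-1])

print(shifted)  # agrees with the EMA2 column up to rounding
print(ema)      # agrees with the EMA column up to rounding
```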


Solution

Thanks @DaFanat, you took me a long way. I finally figured out how to do it with a slight modification of your original method:

data['EMA2'] = data.groupby('group') \
                   .apply(lambda x: x['price'] * a + x['EMA'].shift(1) * (1-a)) \
                   .reset_index(level=0, drop=True)
data.loc[data['bin'] == 0, 'EMA2'] = data.loc[data['bin'] == 0, 'price']

print(data)

      price  group  bin       EMA      EMA2
0  3.556171    one    0  3.556171  3.556171
1  5.637241    one    1  4.249861  4.249861
2  3.278771    one    2  3.926164  3.926164
3  7.343718    one    3  5.065349  5.065349
4  6.128884    one    4  5.419861  5.419861

Not using list(map()) means the result is a Series that keeps the original index, so pandas knows which rows to put the individual values in.
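One detail worth checking in the method above: the lambda reads x['EMA'], the column already filled by the loop in the question, so EMA2 matches EMA because that column is already populated; the expression does not run the recursion on its own. The index-alignment point holds in general, though. A minimal sketch with a hypothetical three-row frame: assigning a Series aligns on index labels, while assigning a list is positional.

```python
import pandas as pd

# Hypothetical frame whose index is deliberately not in 0..n order.
df = pd.DataFrame({'x': [10, 20, 30]}, index=[2, 0, 1])

# A Series is matched to rows by index label...
df['by_label'] = pd.Series([100, 200, 300], index=[0, 1, 2])
# ...whereas a plain list is written positionally, top to bottom.
df['by_position'] = [100, 200, 300]

print(df)
```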

Topic dataframe pandas indexing python

Category Data Science


EDIT:

Couldn't find a clean solution, but this function should do the trick:

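import pandas as pd

# `a` is the global smoothing factor, e.g. a = 2 / (n + 1)
def calc_ema(cur, *prevs):
    # prevs are the shifted prices Y_{t-1}, Y_{t-2}, ...; NaN marks rows before the group start
    rele_prevs = [x for x in prevs if not pd.isnull(x)]
    rele_prevs_cnt = len(rele_prevs)
    if rele_prevs_cnt == 0:  # first item of the group
        res = cur
    else:
        x0 = rele_prevs[rele_prevs_cnt - 1]  # earliest price in the window
        prev_items_sum = sum([x * ((1 - a) ** (i + 1))
                              for i, x in enumerate(rele_prevs)])
        st = a * (cur + prev_items_sum)
        res = st + ((1 - a) ** (rele_prevs_cnt + 1)) * x0
    return res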

You should use it as follows:

grpd = data.groupby('group')
data['EMA3'] = list(map(calc_ema, data['price'], *[grpd['price'].shift(i) for i in range(1, n + 1)]))
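Unrolling the recursion shows what calc_ema is summing. Assuming a group starts at $Y_0$ and has no more than n earlier rows, the EMA at position $k$ is

$$S_k = a \sum_{j=0}^{k-1} (1-a)^j \, Y_{k-j} + (1-a)^k \, Y_0$$

calc_ema includes $Y_0$ inside its sum with weight $a(1-a)^k$ and then adds $(1-a)^{k+1} Y_0$; since $a(1-a)^k + (1-a)^{k+1} = (1-a)^k$, the two pieces combine into the last term above. If a group has more than n earlier rows, the shifted window is truncated and the result only approximates the full recursion.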

Original Answer:

-- Doesn't answer the OP's needs --

You can use the shift function on a groupby object. This lets you calculate with $Y_t$ and $Y_{t-1}$:

data.sort_values(by='bin', inplace=True)
a = 0.1
data.loc[:, 'ema'] = map(lambda x, y: x if pd.isnull(y) else x*a + (1-a) * y, 
data['price'], data.groupby('group')['price'].shift(1))

## Python 3
data.loc[:, 'ema'] = list(map(lambda x, y: x if pd.isnull(y) else x*a + (1-a) * y,
data['price'], data.groupby('group')['price'].shift(1)))
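A small sketch of what the grouped shift returns, on a hypothetical two-group frame: the lag stays inside each group, and the first row of each group gets NaN, which is what the pd.isnull(y) test in the lambdas catches. It yields $Y_{t-1}$, the previous price, not the previous EMA.

```python
import pandas as pd

# Hypothetical two-group frame.
df = pd.DataFrame({
    'price': [1.0, 2.0, 3.0, 4.0],
    'group': ['one', 'one', 'two', 'two'],
})

# shift(1) lags within each group: the first row of every group gets NaN
# instead of the previous group's last price.
df['prev'] = df.groupby('group')['price'].shift(1)
print(df)
```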

I figured it out by using a temporary array, as mentioned, but the result is ugly as hell:

a = 2 / (n + 1)
for _, group in data.groupby('group'):
    ema = np.zeros(len(group.index))
    ema[0] = group.iloc[0]['price']
    i = 1
    for index, row in group.iloc[1:].iterrows():
        ema[i] = a * row['price'] + (1 - a) * ema[i-1]
        data.loc[index, 'EMA'] = ema[i]
        i = i + 1

print(data)

       price  group  bin       EMA
0   5.121409    one    0  5.121409
1   6.359162    one    1  5.533993
2   0.841837    one    2  3.969941
3   9.348758    one    3  5.762880
4   3.671258    one    4  5.065673
5   1.651604    two    0  1.651604
6   7.940099    two    1  3.747769
7   8.777750    two    2  5.424429
8   8.227762    two    3  6.358873
9   3.283300    two    4  5.333682
10  8.290492  three    0  8.290492
11  8.589680  three    1  8.390221
12  0.159063  three    2  5.646502
13  5.504104  three    3  5.599036
14  9.392295  three    4  6.863456
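The same temporary-array idea can be kept a little tidier: compute each group's EMA into a NumPy array with a small helper (ema_array is a hypothetical name, not a pandas function) and write it back once per group with data.loc[group.index, 'EMA']. A minimal sketch on hypothetical prices:

```python
import numpy as np
import pandas as pd

def ema_array(prices, a):
    """Recursive EMA of a 1-D array: S_0 = Y_0, S_t = a*Y_t + (1-a)*S_{t-1}."""
    out = np.empty(len(prices))
    out[0] = prices[0]
    for t in range(1, len(prices)):
        out[t] = a * prices[t] + (1 - a) * out[t - 1]
    return out

n = 5
a = 2 / (n + 1)

# Hypothetical prices for two groups of n bins each.
data = pd.DataFrame({
    'price': [5.0, 6.0, 1.0, 9.0, 4.0, 2.0, 8.0, 9.0, 8.0, 3.0],
    'group': ['one'] * n + ['two'] * n,
    'bin': list(range(n)) * 2,
})

# One write per group, keyed by that group's own index labels.
data['EMA'] = np.nan
for _, group in data.sort_values('bin').groupby('group'):
    data.loc[group.index, 'EMA'] = ema_array(group['price'].to_numpy(), a)
print(data)
```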

I'm not going to accept my answer because there has to be a more elegant way of doing it. Any offers?

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.