Isolation forest - grouped by

Question


codecodecode

April 23, 2022 06:01

I'm trying to use the isolation forest algorithm for outlier detection.
The data has 2 columns: id and REV. The code below gives me an ungrouped result.
Could you please advise how to get the result grouped by the first column (id)?

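import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

data = pd.read_excel(my_path)
outliers_fraction = 0.1
scaler = StandardScaler()
np_scaled = scaler.fit_transform(data)
data = pd.DataFrame(np_scaled)
# train isolation forest on all rows at once (no grouping by id)
model = IsolationForest(contamination=outliers_fraction)
model.fit(data)
data['anomaly'] = pd.Series(model.predict(data))
print(data)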

I have two columns, id and REV. I attached a picture of the final result I expect to see.

I tried to use a function applied per group, but half of the resulting values are NaN. Any thoughts, please?

def groupreg(g):
    outliers_fraction = 0.1
    # train isolation forest
    model = IsolationForest(contamination=outliers_fraction)
    model.fit(g)
    g['anomalyX'] = pd.Series(model.predict(g))
    return g

df1 = data2.groupby('id').apply(groupreg)
print(df1)

Topic outlier random-forest scikit-learn python

Category Data Science

codecodecode · Accepted Answer · August 19, 2019 14:51

Here is the solution, in case somebody needs it someday:

def groupreg(df):
    a = df[['EUR']].values
    # train isolation forest on this group's values only
    model = IsolationForest()
    model.fit(a)
    return pd.Series(model.predict(a))

result = df.groupby('id').apply(groupreg)

# flatten the per-group predictions back into one column
# (this lines up with the rows only if df is already sorted by id)
df['outlier1'] = result.tolist()
print(df)
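
A note on why the first attempt produced NaN: pd.Series(model.predict(g)) gets a fresh 0..n-1 index, so assigning it back to a group whose rows carry other index labels leaves every non-matching row empty. One way to avoid relying on row order is to return the predictions with the group's own index and let groupby(...).transform put them back. The following is only a minimal sketch under some assumptions: the value column is called REV as in the question, the toy data is made up for illustration, flag_outliers is a hypothetical helper name, and random_state is fixed only to make the run repeatable.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_outliers(values: pd.Series) -> pd.Series:
    """Fit an isolation forest on one group's values; 1 = inlier, -1 = outlier."""
    model = IsolationForest(contamination=0.1, random_state=0)
    labels = model.fit_predict(values.to_frame())
    # Reuse the group's own index so the labels align with the original rows.
    return pd.Series(labels, index=values.index)


# Hypothetical toy data: two ids, each with one obviously extreme REV value.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": ["a"] * 20 + ["b"] * 20,
    "REV": np.concatenate([rng.normal(100, 5, 19), [1000.0],
                           rng.normal(10, 1, 19), [-500.0]]),
})
# Shuffle the rows to show that the result does not depend on row order.
df = df.sample(frac=1, random_state=0)

# One model per id; transform returns a column aligned to df's index.
df["anomaly"] = df.groupby("id")["REV"].transform(flag_outliers)
print(df.sort_index())
```

Because the labels are aligned by index rather than by position, this should also work when the rows are not sorted by id or when all groups have the same size.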
