Isolation forest - grouped by
I'm trying to use isolation forest algorithm for outliers detection.
Data has 2 columns: id
and REV
. Below code gives me ungrouped result.
Could you pls advise how to get result grouped by first column (id
)?
data= pd.read_excel (my_path)
outliers_fraction=0.1
scaler = StandardScaler()
np_scaled = scaler.fit_transform(data)
data = pd.DataFrame(np_scaled)
model =IsolationForest(contamination=outliers_fraction)
model.fit(data)
data['anomaly'] = pd.Series(model.predict(data))
print(data)
I have 2 columns: id
and REV
. I added a picture of what I expect to see as the final result:
Tried to use function, but have half of data NaN. Any thoughts please?
def groupreg(g):
outliers_fraction = 0.1
# train isolation forest
model = IsolationForest(contamination=outliers_fraction)
model.fit(g)
g['anomalyX'] = pd.Series(model.predict(g))
return g
df1=data2.groupby('id').apply(groupreg)
print(df1)
Topic outlier random-forest scikit-learn python
Category Data Science