How to find the count of consecutive same string values in a pandas dataframe?
Assume that we have the following pandas dataframe:
import pandas as pd
df = pd.DataFrame({'col1':['AG','CT','CT','GT','CT','AG','AG','AG','CA','CT','CT','CT'],
                   'col2':['TCT','ACA','TCA','TCA','GCT','ACT','CTG','ATG','TCT','ACA','TCA','TCA'],
                   'start':[1000,2000,3000,4000,5000,6000,10000,20000,10000,2000,3000,4000]})
input:
col1 col2 start
0 AG TCT 1000
1 CT ACA 2000
2 CT TCA 3000
3 GT TCA 4000
4 CT GCT 5000
5 AG ACT 6000
6 AG CTG 10000
7 AG ATG 20000
8 CA TCT 10000
9 CT ACA 2000
10 CT TCA 3000
11 CT TCA 4000
What I want is, for each run of two or more consecutive identical values in col1, the value itself (type), the length of the run, and the difference between the start of the run's last row and the start of its first row:
output:
type length diff
0 CT 2 1000
1 AG 3 14000
2 CT 3 2000
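One possible way to get this, sketched under the assumption that the input is the 12-row table shown above and that runs of length 1 should be dropped (as the expected output suggests): label each run by comparing col1 with its shifted copy, then aggregate per run. The names run_id and result are just illustrative.

```python
import pandas as pd

# Assumed input: the 12 rows from the table in the question.
df = pd.DataFrame({
    'col1': ['AG', 'CT', 'CT', 'GT', 'CT', 'AG', 'AG', 'AG', 'CA', 'CT', 'CT', 'CT'],
    'col2': ['TCT', 'ACA', 'TCA', 'TCA', 'GCT', 'ACT', 'CTG', 'ATG', 'TCT', 'ACA', 'TCA', 'TCA'],
    'start': [1000, 2000, 3000, 4000, 5000, 6000, 10000, 20000, 10000, 2000, 3000, 4000],
})

# A new run starts wherever col1 differs from the previous row;
# the cumulative sum of those change points gives each run its own id.
run_id = df['col1'].ne(df['col1'].shift()).cumsum().rename('run')

# Per run: the repeated value, how many rows it spans, and its first/last start.
result = df.groupby(run_id).agg(
    type=('col1', 'first'),
    length=('col1', 'size'),
    first_start=('start', 'first'),
    last_start=('start', 'last'),
)
result['diff'] = result['last_start'] - result['first_start']

# Assumption: only runs of two or more rows are wanted.
result = result.loc[result['length'] > 1, ['type', 'length', 'diff']].reset_index(drop=True)
print(result)
```

If single-row runs should be kept as well, the length filter on the last step can simply be removed.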