Create Period column based on a date column where the first month is 1, second 2, etc

I have a dataset with many project's monthly expendituries (cost curve), like this one:

Project Date Expenditure(USD)
Project A 12-2020 500
Project A 01-2021 1257
Project A 02-2021 125889
Project A 03-2021 102447
Project A 04-2021 1248
Project A 05-2021 1222
Project A 06-2021 856
Project B 01-2021 5589
Project B 02-2021 52874
Project B 03-2021 5698745
Project B 04-2021 2031487
Project B 05-2021 2359874
Project B 06-2021 25413
Project B 07-2021 2014
Project B 08-2021 2569

Using python, I want to create a Period column that replace the month value for a integer that represents the count of months of the project, like this: Where the line is the first month of the Project A (12-2020) the code should put 1 in the Period column, the second month (01-2021) is 2, the third (02-2021) is 3, etc. because I need to focus on the number of months that the projects of my dataframe had an expediture (month 1, month 2, month 3...)

Project Date Period Expenditure(USD)
Project A 12-2020 1 500
Project A 01-2021 2 1257
Project A 02-2021 3 125889
Project A 03-2021 4 102447
Project A 04-2021 5 1248
Project A 05-2021 6 1222
Project A 06-2021 7 856
Project B 01-2021 1 5589
Project B 02-2021 2 52874
Project B 03-2021 3 5698745
Project B 04-2021 4 2031487
Project B 05-2021 5 2359874
Project B 06-2021 6 25413
Project B 07-2021 7 2014
Project B 08-2021 8 2569

Topic project-planning data-science-model python-3.x scipy dataset

Category Data Science


The easiest thing is for you to calculate for each row:

  1. The start date of the corresponding project.
  2. The months since current date and start date of the project.

Below is a sample code that does that for you:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [
        ["Project A", "12-2020", 500],
        ["Project A", "01-2021", 1257],
        ["Project A", "02-2021", 125889],
        ["Project A", "03-2021", 102447],
        ["Project A", "04-2021", 1248],
        ["Project A", "05-2021", 1222],
        ["Project A", "06-2021", 856],
        ["Project B", "01-2021", 5589],
        ["Project B", "02-2021", 52874],
        ["Project B", "03-2021", 5698745],
        ["Project B", "04-2021", 2031487],
        ["Project B", "05-2021", 2359874],
        ["Project B", "06-2021", 25413],
        ["Project B", "07-2021", 2014],
        ["Project B", "08-2021", 2569],
    ],
    columns=["Project", "Date", "Expenditure(USD)"],
)

df["Date"] = pd.to_datetime(df["Date"], format="%m-%Y")  # Convert date column type

# get the start date of the project
# i.e find the lowest date of rows that have the same project as the current row
df["Project Start Date"] = df.apply(lambda row: min(df[df["Project"] == row["Project"]]["Date"]), axis=1)

# calculate the period
# i.e. the number of months of the current date since the start of the project + 1
df["Period"] = ((df["Date"] - df["Project Start Date"]) / np.timedelta64(1, "M") + 1).round().astype(int)

print(df)

It gives you the following:

      Project        Date  Expenditure(USD) Project Start Date  Period
0   Project A  2020-12-01               500         2020-12-01       1
1   Project A  2021-01-01              1257         2020-12-01       2
2   Project A  2021-02-01            125889         2020-12-01       3
3   Project A  2021-03-01            102447         2020-12-01       4
4   Project A  2021-04-01              1248         2020-12-01       5
5   Project A  2021-05-01              1222         2020-12-01       6
6   Project A  2021-06-01               856         2020-12-01       7
7   Project B  2021-01-01              5589         2021-01-01       1
8   Project B  2021-02-01             52874         2021-01-01       2
9   Project B  2021-03-01           5698745         2021-01-01       3
10  Project B  2021-04-01           2031487         2021-01-01       4
11  Project B  2021-05-01           2359874         2021-01-01       5
12  Project B  2021-06-01             25413         2021-01-01       6
13  Project B  2021-07-01              2014         2021-01-01       7
14  Project B  2021-08-01              2569         2021-01-01       8

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.