How to fill missing enteries in column A, and add respective corresponding enteries to column-B with value of previous cell

I am facing an issue with an excel file. I have an excel sheet with 2 columns

Column A : Time Increment with per second

Column B : A particular value of a machine sensor

The problem i am facing is when the machine is stopped (not in motion), the depth increment stop for that particular time and do not make entries in the excel sheet, and once the machine start moving it again add the entries for the starting point. Let me explain in better way as below.

04:23:12    2709.082597
04:23:13    2708.747333
04:23:14    2708.442548
04:23:15    2708.229198
**04:23:16  2708.229198
04:24:01    2708.137763**
04:24:02    2707.924413

The machine is stopped at 04:23:16 and it resumed its activity at 04:24:01 with a gap of 44 seconds.So the excel sheet missed the time entries for those 44 seconds. There are more than 1000 entries like this in the file. What i want that Column A (Time) should be incremented from Cell A to Cell Z, without missing any entry of the second (even when the machine is stopped), and corresponding value at column B should be added to all 44 cells which should be the value of sensor corresponding to the time 04:23:16 (when the machine stopped) i.e 2708.229198.

Like for example the machine is stopped at 04:23:16, Sensor value at Column B is 2708.229198, i want that file should fill all the missing entries for the time between 04:23:16 to 04:24:01 (44 seconds), and for all those 44 seconds the value at Column B should be 2708.229198 (A value of column B corresponding to 04:23:16), and this can be done for the whole sheet. As i explained its a very lengthy file and to do the option manually is not suitable, that may take hours or even days, I want some practical solution to sort out this issue.

Any help or guidance will be appreciated in this regard.

Jawad

Topic excel bigdata

Category Data Science


If you can use Python and Pandas, this is really easy in like 2 lines of code. Copied some code from n1k31t4's example. resample() fills the missing datetime entries w/ NaN and bfill() fills the NaN w/ prior #'s.

>>> df = pd.DataFrame(index=pd.to_datetime(["04:23:12", "04:23:13", "04:23:14", "04:23:15", "04:23:16", "04:24:01", "04:24:02"]), data=[2709.082597, 2708.747333, 2708.442548, 2708.229198, 2708.229198, 2708.137763, 2707.924413])
>>> df
                               0
2020-08-23 04:23:12  2709.082597
2020-08-23 04:23:13  2708.747333
2020-08-23 04:23:14  2708.442548
2020-08-23 04:23:15  2708.229198
2020-08-23 04:23:16  2708.229198
2020-08-23 04:24:01  2708.137763
2020-08-23 04:24:02  2707.924413


>>> df.resample('1S').bfill()
                               0

2020-08-23 04:23:12  2709.082597
2020-08-23 04:23:13  2708.747333
2020-08-23 04:23:14  2708.442548
2020-08-23 04:23:15  2708.229198
2020-08-23 04:23:16  2708.229198
2020-08-23 04:23:17  2708.137763
2020-08-23 04:23:18  2708.137763
2020-08-23 04:23:19  2708.137763
2020-08-23 04:23:20  2708.137763
2020-08-23 04:23:21  2708.137763
2020-08-23 04:23:22  2708.137763
2020-08-23 04:23:23  2708.137763
2020-08-23 04:23:24  2708.137763
2020-08-23 04:23:25  2708.137763
2020-08-23 04:23:26  2708.137763
2020-08-23 04:23:27  2708.137763
2020-08-23 04:23:28  2708.137763
2020-08-23 04:23:29  2708.137763
2020-08-23 04:23:30  2708.137763
2020-08-23 04:23:31  2708.137763
2020-08-23 04:23:32  2708.137763
2020-08-23 04:23:33  2708.137763
2020-08-23 04:23:34  2708.137763
2020-08-23 04:23:35  2708.137763
2020-08-23 04:23:36  2708.137763
2020-08-23 04:23:37  2708.137763
2020-08-23 04:23:38  2708.137763
2020-08-23 04:23:39  2708.137763
...

There are two main steps involved to get where you want. First we need to get the timestamps that you are expecting (going up in seconds without gaps).

Scondly we need to assign your data onto those timestamps and fill any missing values with the previously recorded value.

This solution is using python. It might be a good exercise to start with if you are new to Python and data frames, which shows the power of programming compared to spreadsheets.

Basic setup

import pandas as pd
import numpy as np

This just create the example dataframe you privded. The times will be used to make a date-time index, and the values will be out "data" - our only column.

times = ["04:23:12", "04:23:13", "04:23:14", "04:23:15", "04:23:16", "04:24:01", "04:24:02"]
values = [2709.082597, 2708.747333, 2708.442548, 2708.229198, 2708.229198, 2708.137763, 2707.924413]

We create a proper date-time Series to use as our index

index = pd.to_datetime(times)

Throw it into a dataframe

df = pd.DataFrame(index=index, data=values, columns=['data'])
df.head()
                        data    
2018-09-27 04:23:12  2709.082597
2018-09-27 04:23:13  2708.747333
2018-09-27 04:23:14  2708.442548
2018-09-27 04:23:15  2708.229198
2018-09-27 04:23:16  2708.229198

Part one

Now I will create a second empty dataframe, that will however have the desired timestamps in the index i.e. it will go up in seconds without any gaps.

We can use the first and last timestamps that we recorded above (or anythin else you need)

new_start = df.index[0]
new_end = df.index[-1]

We can specify the frequency as seconds, using the parameter freq='s' as shown:

new_index = pd.date_range(new_start, new_end, freq='s')

Now we create the empty dateframe with the desired index:

new_df = pd.DataFrame(index=new_index)

Part two

Now we combine the two dataframe (the one with your data and the one with the target index), and finally we fill the missing values.

We merge the two dataframes based on their indexes. Using the method="outer" means that we keep all values in both indexes, so no timestamps would be lost. In your case we just basically end up with new_index again, as it was already perfect.

combined_df = pd.merge(df, new_df, method="outer", left_index=True, right_index=True)

The last step is very simple: fill missing values with the last recorded value. Here ffill means forward fill, which does exactly what you want:

final_df = combined_df.fillna(method='ffill')

final_df
                        data    
2018-09-27 04:23:12  2709.082597
2018-09-27 04:23:13  2708.747333
2018-09-27 04:23:14  2708.442548
2018-09-27 04:23:15  2708.229198
2018-09-27 04:23:16  2708.229198
2018-09-27 04:23:17  2708.229198
2018-09-27 04:23:18  2708.229198
...
...
2018-09-27 04:23:58  2708.229198
2018-09-27 04:23:59  2708.229198
2018-09-27 04:24:00  2708.229198
2018-09-27 04:24:01  2708.137763
2018-09-27 04:24:02  2707.924413

This solution obviously only works for processing such static data, but a better solution might be to implement a short loop during the recording process to simply repeat the last recording if no new one is given, ensuring a datapoint is recorded every second.


try to make a sheet derived from the source. first, put all possible time in the new sheet in an ordered way. second, do vlookup from source for value. third, put vlookup inside a iferror to find missing value and for second part of iferror use previouse value.

About

Geeks Mental is a community that publishes articles and tutorials about Web, Android, Data Science, new techniques and Linux security.