How to extract and classify data from a column in Excel?
I have a column in an Excel sheet where each cell contains several pieces of data separated by || delimiters.
Each piece belongs to a class such as entity, IFSC code, transaction reference ID, etc.
A single cell looks like this:
EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5
Not every cell has the same number of classes or even the same type of classes. Another example:
COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C||00.00||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5
I tried extracting this information using regular expressions, and I am able to get all the ref-ids or all the IFSC codes as a single list. But I need to break each cell into multiple cells, one per class, each holding the individual piece of information. If a cell does not have data for some class, that class's cell should remain blank.
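To make the desired output concrete, here is a minimal sketch of the per-cell breakdown. It assumes that each ||-separated piece can be recognized by a pattern on its own. The column names (`ifsc`, `trn_ref`, `unclassified`), the function `classify_cell`, and both regular expressions are hypothetical, chosen only to fit the two sample cells above; classes such as entity names would likely need something beyond a regex.

```python
import re

# Hypothetical patterns, one per class; these are assumptions based on the
# two sample cells and would need adjusting for the real data.
PATTERNS = {
    # Assumed IFSC shape: 4 letters, a zero, then 6 alphanumeric characters.
    "ifsc": re.compile(r"[A-Z]{4}0[A-Z0-9]{6}"),
    # Assumed reference shape: the literal prefix followed by the id.
    "trn_ref": re.compile(r"TRN REF NO:\s*(\S+)"),
}


def classify_cell(cell):
    """Split one cell on '||' and place each piece in the column it matches."""
    row = {name: "" for name in PATTERNS}  # blank when a class is absent
    row["unclassified"] = []               # pieces that match no pattern
    for token in cell.split("||"):
        token = token.strip()
        if not token:
            continue
        for name, pattern in PATTERNS.items():
            match = pattern.fullmatch(token)
            if match and not row[name]:
                # Use the captured group if the pattern has one, else the whole match.
                row[name] = match.group(match.lastindex or 0)
                break
        else:
            row["unclassified"].append(token)
    return row


cell_1 = ("EFT INCOMING||0141201||NHFI0141201||UTR||SBIN118121948660 M S"
          "||some-name ||some-purpose||TRN REF NO:a1b2c3d4e5")
cell_2 = ("COMM/CHARGES/FEES||CHECK/REF.6546644473||BILPAY CCTY BEARING C"
          "||00.00||00012||18031358||BLPY||TRN REF NO:a1b2c3d4e5")

rows = [classify_cell(cell_1), classify_cell(cell_2)]
```

Each dictionary in `rows` is one output row with the same keys, so a list of them could be passed to `pandas.DataFrame` to get one column per class, with blanks where a class is missing.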
I also tried named entity recognition, but the same problem arises: I get a list of entities as output, not the per-cell breakdown.
Please help me identify what kind of problem this is. Is it text classification? And what would be the approach to solve it?
Topic text preprocessing named-entity-recognition classification python
Category Data Science