Natural Language text categorization using RapidMiner
I'm new to data mining, so this might sound like a very simple task to some.
I work in reliability engineering in aviation and have a set of data, generated daily, on system failures and failure rectification. Each record is categorized with the numerical tags of the corresponding maintenance manual task (the reference data): chapter, section, and paragraph. However, because the data is entered manually, people sometimes enter the wrong chapter/section tags, so the records have to be checked manually to ensure the data's validity.
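The manual check boils down to comparing the tag someone entered with the tag the reference data gives for the same system. A minimal sketch in plain Python (not RapidMiner), assuming a tag is represented as a hypothetical `(chapter, section)` tuple and that the correct tag has already been looked up:

```python
from typing import Optional, Tuple

# Assumption: a tag is a (chapter, section) pair of integers.
Tag = Tuple[int, int]

def flag_mismatch(entered: Tag, expected: Tag) -> Optional[str]:
    """Return a note describing the correction, or None if the tags agree."""
    if entered == expected:
        return None
    return (f"entered chapter {entered[0]} section {entered[1]}; "
            f"reference says chapter {expected[0]} section {expected[1]}")

# Entered as Chapter 3 Section 44, reference says Chapter 4 Section 33.
print(flag_mismatch((3, 44), (4, 33)))
# Tags agree, so nothing is flagged.
print(flag_mismatch((4, 33), (4, 33)))
```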
The failure/resolution data is available in table format (CSV, Excel, ...), and I also have the maintenance manual keywords, including their chapters/sections, in table format.
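Since both tables can be exported as CSV, the reference keywords can be loaded into a lookup from keyword to tag. A minimal Python sketch (outside RapidMiner), assuming hypothetical column names `keyword`, `chapter`, and `section`; the CSV text is inlined for illustration and the `system y` row is made up:

```python
import csv
import io

# Illustrative reference table; column names and the "system y" row are assumptions.
REFERENCE_CSV = """keyword,chapter,section
system x,4,33
system y,5,12
"""

def load_reference(lines):
    """Map each lower-cased keyword to its (chapter, section) tag."""
    reference = {}
    for row in csv.DictReader(lines):
        key = row["keyword"].strip().lower()
        reference[key] = (int(row["chapter"]), int(row["section"]))
    return reference

reference = load_reference(io.StringIO(REFERENCE_CSV))
print(reference["system x"])  # (4, 33)
```

With a real export, pass the open file instead, e.g. `with open("reference.csv", newline="") as f: reference = load_reference(f)` (the file name is hypothetical).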
My question is: is it possible, using RapidMiner, to take these tables, cross-check keywords in the failure/rectification text against the reference data, and output each record with the proper reference tags (chapter, section, ...), taking into account spelling errors, acronyms, and abbreviations? Or is there a program/application more specialized than RapidMiner for this kind of task?
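One way to tolerate spelling errors is fuzzy string matching: slide a window over the free text and score each window against each reference keyword. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever similarity measure RapidMiner or another tool offers; the `REFERENCE` entries, the `suggest_tag` name, and the 0.8 cutoff are illustrative assumptions, not tuned values.

```python
import difflib
import re

# Hypothetical reference lookup: keyword -> (chapter, section).
REFERENCE = {
    "system x": (4, 33),
    "system y": (5, 12),
}

def suggest_tag(text, reference=REFERENCE, cutoff=0.8):
    """Return (keyword, tag) for the best fuzzy keyword match in text, or None."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    best = None  # (score, keyword)
    for keyword in reference:
        n = len(keyword.split())
        # Compare every run of n consecutive words with the keyword.
        for i in range(len(words) - n + 1):
            candidate = " ".join(words[i:i + n])
            score = difflib.SequenceMatcher(None, candidate, keyword).ratio()
            if score >= cutoff and (best is None or score > best[0]):
                best = (score, keyword)
    if best is None:
        return None
    return best[1], reference[best[1]]

print(suggest_tag("Failure on sytem X was rectified."))  # ('system x', (4, 33))
```

The cutoff is a trade-off: lowering it catches more misspellings but also more false matches. In this example `system y` already scores 0.8 against the misspelled `sytem x`, so it is only the higher score of `system x` that picks the right keyword.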
Example: A failure on system X was rectified and logged in the database. System X is under Chapter 4 Section 33; however, when entering the data, the person put it under Chapter 3 Section 44. I have a document that lists system X under Chapter 4 Section 33. Is it possible for RapidMiner to check the failure text and the rectification text, cross-check them against a predefined list that places system X under Chapter 4 Section 33, and output the record with the correct chapter/section, taking into account spelling mistakes and the fact that people write abbreviations and acronyms differently (for example, I.B.M / IBM / I.B.M.)?
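For the acronym variants, one simple normalization step is to strip the periods from dotted acronyms before matching, so that `I.B.M`, `IBM`, and `I.B.M.` all become `IBM`. A regex-based Python sketch; the pattern is an assumption that only covers single letters separated by periods, and it would also collapse ordinary abbreviations such as `e.g.` into `EG`:

```python
import re

# Assumed pattern: single letters separated by periods, optional trailing period.
DOTTED_ACRONYM = re.compile(r"\b(?:[A-Za-z]\.)+[A-Za-z]\b\.?")

def normalize_acronyms(text: str) -> str:
    """Replace dotted acronyms with their undotted, upper-case form."""
    return DOTTED_ACRONYM.sub(lambda m: m.group(0).replace(".", "").upper(), text)

for variant in ["I.B.M", "IBM", "I.B.M."]:
    print(normalize_acronyms(variant))  # IBM
```

Abbreviations that differ by more than punctuation would need a separate synonym table mapping each variant to one canonical keyword before the lookup.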
Topic rapidminer classification data-cleaning data-mining
Category Data Science