How to create a system that detects the text structure of a file?
Let's say I want to build a machine learning system that has many log files of a few types (F1, F2, ..., Fn),
and I get a new log file that may contain errors or missing data.
How do I classify it into one of these types, or flag it as an anomaly if it doesn't belong to any of them?
I thought about anomaly detection,
but couldn't figure out how to extract structural information from the text classes (F1, F2, etc.).
Also, what kind of structural information should be extracted from text files?
Each document of each class type contains 100 to 1000 lines of code.
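To make the question concrete, here is a minimal sketch of the kind of naive baseline I have in mind. It assumes that a "line shape" (letter runs collapsed to `a`, digit runs to `9`, first few tokens only) is a usable structural feature, and that a document is an anomaly when its shape profile is not similar enough to any class. The function names, the toy reference documents, and the `0.5` threshold are all made up for illustration, not taken from any library.

```python
import math
import re
from collections import Counter

def line_shape(line, n_tokens=4):
    """Collapse letter runs to 'a' and digit runs to '9' in the first few tokens."""
    head = " ".join(line.split()[:n_tokens])
    head = re.sub(r"[A-Za-z]+", "a", head)
    return re.sub(r"\d+", "9", head)

def profile(lines):
    """Bag of line shapes for one document (blank lines are skipped)."""
    return Counter(line_shape(l) for l in lines if l.strip())

def cosine(p, q):
    """Cosine similarity between two shape profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def classify(lines, class_profiles, threshold=0.5):
    """Return the best-matching class name, or 'anomaly' below the threshold."""
    doc = profile(lines)
    best_name, best_score = "anomaly", 0.0
    for name, ref in class_profiles.items():
        score = cosine(doc, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else "anomaly"

# Toy reference documents, invented purely for illustration.
f1_doc = ["11-02-11 16:47:35,985 +0000 E something failed",
          "12-02-11 17:47:35,985 +0000 I something started"]
f2_doc = ["Error:", "detail one", "Warning:", "detail two"]
class_profiles = {"F1": profile(f1_doc), "F2": profile(f2_doc)}

print(classify(["14-03-11 10:00:00,000 +0000 I ok"], class_profiles))
print(classify(["<xml><a>1</a></xml>"], class_profiles))
```

With real data the threshold would have to be tuned on held-out files, and I am not sure whether such a bag-of-shapes profile keeps enough of the structure, which is part of what I am asking.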
I looked into linting and DeepCode
...
A sample log file looks like this:
11-02-11 16:47:35,985 +0000 E Activity class {com.trackingeng/LandingActivity} does not exist.
12-02-11 17:47:35,985 +0000 I Starting: Intent { act=android.intent.action.MAIN
.....
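For lines like these, one kind of structure I could imagine extracting is a per-line template in which the variable parts are masked. The sketch below assumes the layout `<date> <time> <timezone> <level> <message>`, which I inferred only from the two sample lines; the regular expression and the `line_template` helper are my own guesses, not an established parser.

```python
import re

# Assumed layout, inferred only from the sample lines:
# "<dd-mm-yy> <HH:MM:SS,mmm> <+zzzz> <LEVEL> <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<tz>[+-]\d{4}) "
    r"(?P<level>[A-Z]) "
    r"(?P<message>.*)$"
)

def line_template(line):
    """Return a structural template for one line, or None if the layout does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    msg = m.group("message")
    # Mask brace-delimited payloads, including ones that are never closed.
    msg = re.sub(r"\{[^}]*\}?", "<BRACES>", msg)
    # Mask any remaining digit runs.
    msg = re.sub(r"\d+", "<NUM>", msg)
    return f"<DATE> <TIME> <TZ> {m.group('level')} {msg}"

sample = "11-02-11 16:47:35,985 +0000 E Activity class {com.trackingeng/LandingActivity} does not exist."
print(line_template(sample))
# prints: <DATE> <TIME> <TZ> E Activity class <BRACES> does not exist.
```

Lines that return `None` would then be candidates for missing or corrupted data, but I don't know whether hand-written templates like this scale to n file types.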
A log file may also contain a stack trace like this:
Error:
Error detail 1
Error detail 2
....
Non-Error:
.....
Warning:
.....
and so on.
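For this sectioned format, a rough sketch of the structure I could extract is the ordered list of section headers with the number of detail lines under each. It assumes that a header is any line ending with a colon and that everything up to the next header belongs to it; `split_sections` and `section_signature` are hypothetical helper names.

```python
def split_sections(lines):
    """Group lines under the most recent header line (assumed: a line ending with ':')."""
    sections = []  # list of (header, [detail lines])
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            sections.append((line[:-1], []))
        elif sections:
            sections[-1][1].append(line)
        else:
            # Detail lines that appear before any header.
            sections.append(("<NO HEADER>", [line]))
    return sections

def section_signature(lines):
    """Structural summary: header names in order, with their detail-line counts."""
    return [(header, len(details)) for header, details in split_sections(lines)]

sample = ["Error:", "Error detail 1", "Error detail 2", "Non-Error:", "Warning:"]
print(section_signature(sample))
# prints: [('Error', 2), ('Non-Error', 0), ('Warning', 0)]
```

Whether such a signature is a good input for a classifier or an anomaly detector is exactly what I am unsure about.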
Any pointers on which direction to look in would be greatly appreciated.