Is there any way to analyze the format of text strings?
I have a lot of data which basically consists of alphanumeric text on individual lines, which can vary in length and contain delimiters.
Since there are many thousands of lines of text, I'm looking to see whether there is an automated way to determine the different formats of text.
Here is a sample:
90665013-163
90731046-103
90840069-009
90847069-009
90880046-103
90889046-103
90897-051
9089744-103
9089844-103
90901-46909
90901-lep
9091046-103
9091046-909
90764046-1037
can10043E
can90065-op016
9094344-103
90669j4-4438718
90666ie79
90664046-103
90710-077
004-919
4A1900935
can90064-op016
can90066-E016
9094544-103
9094646-103
4A1900597
4A1900588
4A9443198
4A94431
So, from this sample, we can see that several lines have the format of 8 digits, a dash, and then 3 digits. Another format is 3 digits, a dash, and then another 3 digits.
I'm looking for a way to determine all of the unique formats available...
I'm not sure whether this is possible, whether through some code, an online web service, or a feature of some software, but I'm asking on the off chance that it is.
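To make concrete what I mean by a "format", here is a rough Python sketch of the idea: replace every digit with 9 and every letter with A, keep delimiters such as the dash unchanged, and then count how often each resulting pattern occurs. The function names `to_pattern` and `count_formats` and the 9/A symbols are just my own placeholders, not from any library, and this is only an illustration of the kind of result I'm after, not necessarily the best approach.

```python
from collections import Counter


def to_pattern(text):
    """Reduce a string to its format: digit -> '9', letter -> 'A', other characters kept."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            # Delimiters (dashes etc.) are kept as-is so they remain part of the format.
            out.append(ch)
    return "".join(out)


def count_formats(lines):
    """Count how many non-empty lines share each format pattern."""
    return Counter(to_pattern(line.strip()) for line in lines if line.strip())


# A few lines taken from the sample above.
sample = ["90665013-163", "90731046-103", "004-919", "4A1900935", "can10043E"]

# Print patterns from most to least frequent.
for pattern, n in count_formats(sample).most_common():
    print(pattern, n)
```

On those five lines this would group the first two under 99999999-999 and leave 999-999, 9A9999999 and AAA99999A with one line each. It treats upper and lower case letters the same, which may or may not be what is wanted.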