What are some methods for pre-processing data in OCR?

I have a dataset for a supervised learning task.

Each row is a vector of grayscale pixel values in the range [0, 255], and each vector is labeled with a character. My task is to assign a character to each vector.

My Question:

What are some methods that I can try to pre-process the data to gain better accuracy?



An interesting one is to slice the image using energy measurements. The idea is to separate the letters: you subdivide the image by having lines "move" across it.

Every line is like a "lightning strike" in the sense that it follows the path of least resistance. There is a general direction, for instance top to bottom. You draw it by placing a point at the top of the image and moving down: at every step it either goes straight down, or one pixel down and one to the left, or one pixel down and one to the right. It picks the step with the least energy, meaning the minimum color change. You can use plain Euclidean distance, and you'll be amazed how well this deals with things like gradient backgrounds, but it works even better with information-theoretic metrics. Another slight improvement is to add a small cost to horizontal offsets, so the lines run more or less, but not quite, straight down.
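Here is a minimal dynamic-programming sketch of that idea in Python with NumPy. The function name `min_energy_seam` is my own, and the default value of the `offset_cost` penalty for diagonal steps is an arbitrary pick:

```python
import numpy as np

def min_energy_seam(gray, offset_cost=0.5):
    """Cheapest top-to-bottom path ("lightning strike") through a grayscale
    image with values in [0, 255], found by dynamic programming.

    offset_cost is the small penalty for diagonal steps mentioned above;
    the default of 0.5 is an arbitrary pick, tune it for your images.
    """
    img = gray.astype(float)
    h, w = img.shape
    cost = np.zeros((h, w))             # cost[y, x]: cheapest path from row 0 to (y, x)
    back = np.zeros((h, w), dtype=int)  # dx of the step that reached (y, x)
    for y in range(1, h):
        for x in range(w):
            best = np.inf
            for dx in (-1, 0, 1):       # the parent pixel is (y - 1, x + dx)
                px = x + dx
                if 0 <= px < w:
                    step = abs(img[y, x] - img[y - 1, px])  # color change
                    if dx != 0:
                        step += offset_cost                 # keep it near-vertical
                    if cost[y - 1, px] + step < best:
                        best = cost[y - 1, px] + step
                        back[y, x] = dx
            cost[y, x] = best
    # Trace the cheapest path back up from the bottom row.
    xs = [int(np.argmin(cost[-1]))]
    for y in range(h - 1, 0, -1):
        xs.append(xs[-1] + back[y, xs[-1]])
    xs.reverse()
    return xs, float(cost[-1].min())    # x per row (top to bottom), total energy
```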

You keep subdividing the image, and then subdividing the subdivisions, until you would have to accept quite large jumps in energy, while enforcing a minimum subdivision size. This should "mostly" give you individual letters, and those are easy to OCR.
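Continuing the sketch above (so `np` and `min_energy_seam` are in scope), the recursive subdivision might look like this; `max_energy` and `min_width` are hypothetical stopping knobs, and for simplicity it cuts at the seam's mean column rather than along the exact ragged path:

```python
def split_letters(gray, max_energy=200.0, min_width=8):
    """Recursively slice a text strip along cheap seams into letter-sized pieces.

    max_energy and min_width are made-up defaults: stop once the cheapest
    seam costs too much, or the strip is already too narrow to split.
    """
    h, w = gray.shape
    if w < 2 * min_width:
        return [gray]                   # enforce a minimum subdivision size
    seam, energy = min_energy_seam(gray)
    if energy > max_energy:
        return [gray]                   # no cheap cut left: accept this piece
    # Simplification: cut at the seam's mean column so each piece stays a
    # rectangular array instead of following the ragged path exactly.
    cut = int(np.clip(np.mean(seam), min_width, w - min_width))
    return (split_letters(gray[:, :cut], max_energy, min_width)
            + split_letters(gray[:, cut:], max_energy, min_width))
```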


It could be advantageous to set the background level to 0.

If your background is a constant value $a$, the best thing to do first is to subtract this value, so the constant background contributes nothing to the network (or whatever model you are using). Secondly, it is best to normalize your data.
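A minimal sketch of those two steps, assuming the rows sit in a NumPy array; estimating the background value $a$ from your own data is left to you:

```python
import numpy as np

def preprocess(X, background):
    """Subtract the constant background level, then normalize.

    `background` plays the role of the constant a above: after subtraction
    the background pixels are exactly 0, and dividing by 255 scales the
    remaining values into a unit-sized range.
    """
    X = X.astype(float) - background    # background level becomes 0
    return X / 255.0                    # normalize
```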

And lastly, search the web: "preprocessing data" as a query will probably give you all the information you need.

Learn how to search the web ;)


You can do image thresholding. First normalize the data by dividing each element of the vector by 255, then threshold: for each $x_i \in \mathbf{x}$, set $x_i = 1$ if $x_i \ge \text{threshold}$ and $x_i = 0$ otherwise, where $\mathbf{x}$ is your data vector and $x_i$ is an element of it. You can choose the threshold by experiment; personally I choose $0.5$. By thresholding the image we reduce its noise, and hence increase the accuracy.
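A minimal sketch in Python with NumPy (the name `binarize` is my own):

```python
import numpy as np

def binarize(x, threshold=0.5):
    """Normalize a pixel vector to [0, 1], then threshold it to {0, 1}.

    threshold=0.5 matches the value chosen above; in practice you may want
    to pick it by experiment on a validation set.
    """
    x = np.asarray(x, dtype=float) / 255.0   # normalize first
    return (x >= threshold).astype(float)    # 1 where x_i >= threshold, else 0
```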
