So if anyone is still reading this: the whole article is about OCR (which stands for Optical Character Recognition) by using a method called Naive Bayes (aka the probabilistic approach). Considering the amount of information about this subject I thought some code/example might come in handy.
// ps: if you’re still wondering, it’s the same Bayes from the math class.
About Naive Bayes
This is a method that is used to solve classification problems. Basically OCR can be classified as a classification problem (see what I did there? :D). Here I’m going to explain this in a generic manner and then add the OCR part.
The idea is this: given an input, the program should be able to match it with a known output by taking into account various properties and comparing them to the ones that we used for training.
Naive Bayes implies a probabilistic approach; so it outputs a bunch of probabilities that our object is, in fact, an object that it already knows.
It is all based on Bayes’ theorem which looks like this:
However, this theorem is supposed to work with independent variables, and we can’t always guarantee that. So, it is incorrect to expand the denominator like this:
You can try it but you may get probabilities higher than 1 for some cases.
To avoid this, it might be better to think this in terms of likelihoods and to use this property:
The equation above says that the likelihood is directly proportional to the probability itself, so we can consider only the likelihood when we do the classification (highest likelihood = highest chance).
You can later convert the likelihoods to probabilities between 0 and 1 - I’ll explain below.
Implementing a basic OCR
Moving on, it should be clear by now how our program should look:
- take a set of training images
- train using each image by taking the color of each pixel (where white color = ~P and black = P)
- for each unknown image: using its pixels compute the likelihood it resembles an image that was used during the training procedure
Where P(one|…) is the probability the image matches the one I used to train the classifier.
Computing the likelihoods
First, there’s going to be one big table containing both the data from the images and some additional sums that we’ll need in order to compute some probabilities - usually it’s called Frequency Table.
The table looks like this:
one | two | three | ... | ||
P1 | 0 or 1 | 0 or 1 | 0 or 1 | ... | |
P2 | 0 or 1 | 0 or 1 | 0 or 1 | ... | |
P3 | 0 or 1 | 0 or 1 | 0 or 1 | ... | |
... | ... | ... | ... | ... | |
sums_cols | sum_col#1 | sum_col#2 | sum_col#3 | ... |
Where table[P1][one]
is:
1
if the training image containing an 1 has the first pixel(P1) colored black (so the pixel contributes to drawing the digit)0
if the pixel is white
sum_col#1
means the sum of the first (current) column.
We need these in order to compute some probabilities while applying Bayes’ theorem.
Once all the sums are computed we can start determining likelihoods.
Now, we implement the Bayes’ theorem and apply the data from the table.
Intermission.
The formula, for an image with 3 pixels, should look like this:
Where:
P(one) = 1 / number_of_digits_we_are_training_for
P(P1|one) = table[P1][one] / sum_col#1
P(~P3|one) = 1 - table[P3][one] / sum_col#1
^ ehm…these are actually likelihoods, not probabilities. Don’t get confused.
Now the real deal - applying this for N_OF_IMAGES
images, each one containing IMG_SIZE
* IMG_SIZE
pixels.
// for each known image
for (int k = 0; k < N_OF_IMAGES; k++)
{
// determining P(one) / P(two) / P(three)
likelihoods[k] = 1 / (double)N_OF_IMAGES;
int crtPixel = 0;
for (int i = 0; i < IMG_SIZE; i++)
for (int j = 0; j < IMG_SIZE; j++)
{
// getting the current pixel's position in the table
crtPixel = i * IMG_SIZE + j;
// applying the formula
likelihoods[k] *= (testBmp.GetPixel(i, j).ToArgb() != Color.White.ToArgb() ? frequencies[crtPixel, k] / frequencies[N_OF_PIXELS, k] : (1 - frequencies[crtPixel, k] / frequencies[N_OF_PIXELS, k]));
}
// I use this to convert from likelihoods to probabilities (0-1)
totalLikelihood += likelihoods[k];
}
Short note:
You can get the probability for each image by doing: likelihoods[k] / totalLikelihood
.
Laplace smoothing
Given the fact that the frequency table may contain 0’s (not all pixels are black), we need to apply some smoothing to prevent working with null probabilities in our formula.
We add some “fake” inputs in the frequency table - kind of like a black background image for each digit. If it makes it easier think of it as adding 1 to the numerator and n to the denominator (n = number of pixels) when computing probabilities.
So…if in our frequency table there’s something like: table[P1][one] == 0
and sum_col#1 == 10
, then we’d get P(P1|one) = 0/10 = 0
Now, with Laplace smoothing: table[P1][one] == 1
and sum_col#1 == 10 + n
(let’s say n = 100 pixels, because the images I’m using are 10x10).
Notice that P(P1|one) = 1/(10+100) = 0.009
.
Results
Complete Sourcecode
The sourcecode that I used in this article; I guess there’s no need for additional documentation.
Final note: to me it seems to be working, but it’s the first time I’m coding stuff like this, so… it might contain mistakes.
using System;
using System.Drawing;
using System.IO;
namespace Bayes_Classifier
{
class Program
{
static void Main(string[] args)
{
string[] files = Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory + "/train/", "*.png");
int N_OF_IMAGES = files.Length;
int IMG_SIZE = 10; // 10x10 image
int N_OF_PIXELS = IMG_SIZE * IMG_SIZE;
double[,] table = new double[N_OF_PIXELS+1, N_OF_IMAGES];
// Laplace smoothing
for (int i = 0; i < N_OF_IMAGES; i++)
for (int j = 0; j < N_OF_PIXELS; j++)
table[j, i] = 1;
Bitmap bmp = null;
// copying data in the frequency table
for (int i = 0; i < N_OF_IMAGES; i++)
{
bmp = (Bitmap)Image.FromFile(files[i]);
for (int j = 0; j < IMG_SIZE; j++)
{
for (int k = 0; k < IMG_SIZE; k++)
if (bmp.GetPixel(j, k).ToArgb() != Color.White.ToArgb())
table[j * IMG_SIZE + k, i]++;
}
}
// -- computing the sums
for (int i = 0; i < N_OF_PIXELS; i++)
{
for (int j = 0; j < N_OF_IMAGES; j++)
table[N_OF_PIXELS, j] += table[i, j];
}
string[] testFiles = Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory + "/test/", "*.png");
for (int testFileIndex = 0; testFileIndex < testFiles.Length; testFileIndex++)
{
Console.WriteLine("File: " + testFiles[testFileIndex]);
Bitmap testBmp = (Bitmap)Image.FromFile(testFiles[testFileIndex]);
double[] likelihoods = new double[N_OF_IMAGES];
double totalLikelihood = 0;
for (int k = 0; k < N_OF_IMAGES; k++)
{
likelihoods[k] = 1 / (double)N_OF_IMAGES;
int crtPixel = 0;
for (int i = 0; i < IMG_SIZE; i++)
for (int j = 0; j < IMG_SIZE; j++)
{
crtPixel = i * IMG_SIZE + j;
likelihoods[k] *= (testBmp.GetPixel(i, j).ToArgb() != Color.White.ToArgb() ? table[crtPixel, k] / table[N_OF_PIXELS, k] : (1 - table[crtPixel, k] / table[N_OF_PIXELS, k]));
}
totalLikelihood += likelihoods[k];
}
for (int i = 0; i < N_OF_IMAGES; i++)
Console.WriteLine((i + 1) + " : " + likelihoods[i] / totalLikelihood);
Console.WriteLine();
}
Console.ReadLine();
}
}
}
No comments:
Post a Comment