EE 4367 / EE5367: Image Processing
Instructor: Dr. Sari-Sarraf
Authors: Jason Plumb, Cody Singleton, Chris Turner
This paper describes the design and implementation of a computer program that uses image processing techniques to determine the number of dots on a single face of a 6-sided game die. An image, in portable gray map (PGM) format, is presented to the program as input. After analysis, the program plays a sound to indicate which die roll the image represents. Images are preprocessed with threshholding to generate a binary image before actual analysis is performed. In this implementation, image analysis involves the creation of blob regions comprised of 8-connected components. Blob regions are counted and then sorted by area so that the differential blob area can be used in the final dot count determination. The algorithms used are rather insensitive to image scale and image rotation, and a brief analysis is presented to support this hypothesis.
The purpose of this project is to implement an image processing algorithm using 8-connected component analysis in order to distinguish images of each side of a six-sided die. The algorithm must be able to recognize which side of the die (1, 2, 3, 4, 5, or 6) is showing regardless of changes in scale or rotation, and it must produce an audio output indicating the number that appears in the image.
This project was implemented using Microsoft Visual C++ 6.0. There are four main C++ files:
Each of these files has a corresponding header file. The project was compiled and built as a Win32 console application to create dice.exe. Execution is performed by typing the name of the application at a command prompt followed by the input filename including extension.
dice[.exe] <image filename>
The current status of the program is output to the screen as the image is being processed. When the processing is complete, the result will be output to the screen as well as the sound card.
There are three main aspects to this project that deserve special attention: preprocessing, connected component analysis, and decision-making. File I/O, output to the screen, and audio output are somewhat trivial portions of the program and will not be discussed here.
The connected component analysis requires a binary image to be passed to it, so the first task is to convert the input image from 8-bit gray scale to binary black and white. This is accomplished by passing the pixel array through a very simple contrasting algorithm. The contrasting function is passed integer as one of its arguments that acts as a pixel bias. The bias ranges between plus and minus one half the maximum gray value of the image.
As the image passes through the algorithm, the bias is added to each pixel value and compared to MaxGrayVal/2. If it is greater than or less than the MaxGrayVal/2 it will be set to the MaxGrayVal or 0, respectively, thus creating a binary image. Under the current implementation the application uses the maximum bias. The implementation of this contrasting algorithm is by no means intended to be viewed a “real” contrasting algorithm, but it suited the purposes of this project adequately.
Once preprocessing is complete, the binary image gets sent to the second stage of the application: 8-connected component analysis. The first function to process the image in this particular stage is doBlob(). The primary functions of doBlob() are to use connected component analysis to create a label matrix from the binary pixel array and to create the B matrix. The label matrix, usually designated pBlobMatrix in the code, is the same size as the pixel array and contains the labels of the corresponding pixels. Eight-connected component analysis is much like the 4-connected version described in the textbook except that it must also check the top two diagonals. For the first pass through the image, the algorithm checks the neighbors in this order: top center, left center, top left, then top right. If any of these is labeled, it assigns the first encountered label to the current pixel. Else, it assigns a new label to the current pixel (if it’s part of an object). If it encounters a neighbor that is already labeled, along with labeling the current pixel, it will check the other neighbors and log the appropriate equivalencies in the B matrix. The labels range from 1 to 254. This is the range of values of an unsigned character with 0 being reserved for “not-a-label”.
The final part of connected component analysis is the resolution of equivalencies, which is performed in resolvEquiv(). The B matrix is first transformed into the B+ matrix by the function bToBPlus(). This is done exactly as described by Warshall’s algorithm outlined in the textbook. Using the B+ matrix, resolvEquiv() then proceeds to create a mapping of each label to its earliest used equivalent label. During this process it also keeps track of which labels are still used since some labels will drop out. Using this, it can realign the mapping to order the objects in numerical order. For example, without realigning the mapping after equivalency resolution, the objects would probably be labeled something like 1, 3, 6, 8, 9. Using the realignment method described here they would be labeled 1, 2, 3, 4, 5. This makes it easier to index each object after the image leaves the connected component analysis stage. This equivalency mapping is finally applied to the label matrix, pBlobMatrix, to complete the second pass through the image.
The final step in the system is the decision-making. The primary property used in this step is the area of the objects. This area is calculated using the basic formula of length * width. The dimensions length and width are based on the overall column size and overall row size of an object. For example, length is found by subtracting the column number of the object’s rightmost pixel from the column number of its leftmost pixel. Height is found similarly by subtracting the lowest row from the highest row. This method allows for the differentiation of the background or the die face from the dots. Even if the background takes up the same amount of pixels as the dots, its height and/or width would be much greater thus making its area calculation obviously larger.
Up to this point, the program, as it is currently implemented, has actually processed two versions of the image—i.e. there are two “blob” matrices. The first represents the original image, and the second represents an inverted copy. Both the original and the inverted image have passed through all the processes described above. The purpose of using both versions is to get a count of all objects in the image—black or white. Originally it was thought keeping track of all the objects would help alleviate some problems with speckles in the background, but, as it turns out, this was not a problem.
The rest of this portion of the project is relatively simple. The first task of the decision algorithm is to record the areas of the objects in both images. This is performed for each object by calling an ImageOp function called getBlobArea(). The areas are calculated as described above and are all recorded into a 1-D array. The objects are then sorted from largest to smallest. It is assumed that the first two objects in the sorted array are the background and the die face (in either order). Knowing this, the next object in the array must be a dot. The dots are counted until a 30% or more decrease is encountered in the sorted array of objects going from one to the next. An audio file is then played based on the number of dots counted to conclude the program.
The dice program performs a number of crucial processing algorithms that allow it to correct discrepancies in the images. Some of the correctable discrepancies include poor contrast, rotation, and erroneous dots. On the contrary like most programs there are a few irregularities that cannot be resolved. Both of these cases are discussed to the highest degree in this performance analysis.
A trial and error technique was used to comprehensively test the performance of the dice program. Many crude test images were prepared in Paint Shop Pro to imitate the various die images that the program could encounter. Pass and fail were the two possible outcomes of these image tests. If the program failed the test the code was modified to accommodate the image.
Notice in Figure 1 that the edges between the die and the background are very ragged and not precisely formed and the dot gradually fades from black to white. Even though the image is not sharp the program does not have any trouble detecting the background and dot. The dice program will identify this die as a one.
Figure 1. Rough Hand Drawing of Die Face With 1 Dot
Dr. Sari-Sarraf provided the image in Figure 2 as well as five other images corresponding to the other five sides of the die. Each of the six images are identical with the exception of the number dots on the face of the dice. These images composed a set of the highest quality images used to test the dice program. The only flaw of interest in these images was the slightly skewed contrast. The dice program effectively resolved this flaw by converting the image to binary. The outcome of these test images concluded that each of these images could be correctly processed.
Figure 2. Original Image Showing Die Face with 2 Dots
The image in figure 3 was used to test the program's behavior when an input image did not conform to the original requirement specifications. As one can clearly see from the image, there is not a clear dark background surrounding the light die face. Because this image does not meet the input image requirements, the output is technically undefined.
Figure 3 will not be identified as a five as you might think. The dice program will identify this die as a one. How can this be identified as a one when there are five dots located on the face of the die? The dice program cannot determine that the background is cut in half by the die. The program determines that the bottom portion of the background is the background and the top portion of the background is the first dot. The area of each individual dot on the face of the die is thirty percent less then the top portion of the background. Therefore the program disregards the dots on the face of the die and assumes the only dot in the image is the top portion of the background. The number of dots on the face of the die does not make a difference in this situation.
Figure 3. Hand Drawn Image That Does Not Meet Image Requirements
The die in Figure 4 is rotated counterclockwise 45 degrees. The dice program does not incur any problems when determining the number dots on this die. The only problem encountered during rotation was related to the preceding topic. If the die is rotated in such a way that two corners of the die touch edges of the image simultaneously the program again determines that the die is a one. As of now to determine the correct number of dots on the die the image must contain a background that is not disturbed by the presence of the die face.
Figure 4. Image of Die Face With 6 Dots, Rotated 45 Degrees
An interesting question that had to be answered while coding our program was, how do we deal with poorly contrasted images? The answer to this question is revealed in Figures 5 and 6. Due to poor contrast it is more difficult to differentiate between the die face and background in Figure 5. Without preprocessing it would be incredibly hard to successfully run blob analysis on the image in Figure 5. The preprocessing algorithm improves the contrast of the image by converting it to binary. The algorithm divides the maximum number of gray levels in half then assigns a zero to every pixel value below that number and a one to every pixel above that value. As you can see in Figure 6, the result of preprocessing the image in Figure 5, the algorithm is very successful. The program can determine that the die in Figure 5 is a six.
Figure 5. Image of Die Face With Contrast Lowered
Figure 6. Binary Image Created After Preprocessing the Image in Figure 5 (above)
It should be noted, however, that the threshholding technique described above does have its limits. If the contrast of the input image is too poor, or if the image is too dark or too bright, the resulting binary image will be solid black or solid white.
The die in Figure 7 has been rotated clockwise 45 degrees and enlarged. We have already stated that the rotation of the object doesn’t have any affect on the outcome of the dice program. If you suspected that the program would determine that this die is a six, you would be right. So why would we include this figure in the analysis? The size of the image must be considered when running this program. The program does not have an infinite number of labels to use in blob analysis. If the image is enlarged and noisy there is a chance that the labels will be used up before the equivalencies are resolved. In this case the program will indicate that the die has fewer dots than is actually true. How large is too large? This question can only be answered for an individual image. A tremendously noisy image obviously cannot be enlarged to the same size as a sharp image.
Figure 7. Image of Die Face With 6 Dots, Rotated and Scaled to 492 x 500
The image in Figure 8 consists of two large dots and two small erroneous dots. The size of the two small dots was exaggerated to effectively test the programs erroneous dot algorithm. In order to differentiate between real dots and false dots the dice program compares the area of each separate dot. The dots must be within thirty percent of the largest dot to be included in the dot set. In this case the program will deduct that this die is a two. The two erroneous dots do not meet the area requirements.
Figure 8. Image of Die Face With 2 Non-Dot Blob Regions
The image processing techniques used in this project are successful at detecting the number of dots on a die face with a high degree of accuracy. The algorithms used are relatively insensitive to image rotation and image scale. There are several nuances of the software's behavior, however, that should be discussed in more detail.
First, the issue of border requirements must be addressed. The software implemented here assumes that the input image will contain a distinct die face that does not touch more than one border of the image. If the die face does touch more than one border or if the image does not contain a border region, the differential dot area algorithm will fail and the output will be inaccurate. Additionally, the border must not contain any connected components that are the same size as the dot regions. If this situation occurs, the final dot count will also be inaccurate. This is due to the fact that the software does not check to ensure that the dot regions are contained within the die face. Additional code could be written to perform this check and make the program less prone to erroneous output.
Next is the issue of dot sizing and how it relates to the differential dot area algorithm. If the input image contains dot regions that have an area 30% larger or smaller than other dot regions, the smaller regions will be discarded. In some cases, this can give the appearance of incorrect output. For example, consider an image of a die that contains one "normal" dot with 4 additional dots 31% smaller in area. The software would count the largest, normal dot and disregard the remaining 4 as noise or speckle. To the human eye, however, this die appears to contain 5 dots and should be considered a 5. Increasing the hard-coded differential area threshhold above 30% in software could lessen this effect.
Additionally, the preprocessing method used to convert from grayscale to binary is prone to error. Because the threshhold value is computed simply by taking the maximum gray value and dividing by two, images that are too dark, too light, or with poor contrast may result in incorect output. The software could be modified to use more sophisticated techniques to determine a better threshhold value.
With respect to image size, the program does have one significant limitation. Internally, the program can only handle a maximum of 253 equivalency classes when resolving connected component equivalencies. Although this number is sufficient for all the images tested, it is possible that very large or noisy images will reach the maximum number and cause incorrect output. Increasing the size of the data type used to hold class labels would solve this problem.
Overall, the software developers have concluded that the program works well within its requirements. If the software were to be used in an industrial application, however, it is recommended that additional functionality be added.
The program source code can be viewed by clicking on one of the following links:
dice.h Main program header file dice.cpp Main program implementation file pgmfile.h Wrapper class for dealing with PGM files (class declaration) pgmfile.cpp Wrapper class for dealing with PGM files (class implementation) pgmfileOperations.h Wrapper class for modifying PGM images (class declaration) pgmfileOperations.cpp Wrapper class for modifying PGM images (class implementation) ImageOps.h Wrapper class for performing image operations, including blob analysis (connected component analysis), equivalence class resolution, blob height, blob width, and blob area. (class declaration) ImageOps.cpp Wrapper class for performing image operations, including blob analysis (connected component analysis), equivalence class resolution, blob height, blob width, and blob area. (class implementation)
The Win32 console application can be downloaded by clicking the following link: dice.exe