Home
This assignment was made for the Image Processing course. We wanted to create a practical algorithm that could be applied in a real-life situation. Inspiration came from FAIR2Media, a design studio we are acquainted with, which is working on a table with a beamer (projector) projecting images on top of it. In the current iteration, the projection is always the same map. Through interaction with the table, the projection would update, allowing users to interact with the software. Since hands are a natural way to interact, our goal is to develop an algorithm that can detect the position of a hand above the projected table and distinguish between pointing hands and spread hands.
The assignment did not allow the use of external image processing libraries, so all code that manipulates the image was written by ourselves.
Because the beamer is fixed at a certain height above the table, there is little variation in the size of hands other than their real-life size, and this difference in scale is near negligible. Our collection of photos has no variation in lighting conditions. Many different photos of hands were taken, with variations in rotation relative to the table, partial occlusions caused by the projection, arms with or without sleeves, and hands either pointing or spread out. We also assume only one hand at a time can be detected. This resulted in the following context table:
| Criterium | Possible values |
| --- | --- |
| Minimum/maximum size | Any size, but automatically cropped to 438x264 |
| Lighting variations | Constant, caused by the beamer |
| Rotation variations | Multiple different rotations |
| Occlusion | Often partially occluded by the projection |
| Other | Arms with/without sleeves, hands pointing/spread |
The main processing pipeline works as follows:
- Preprocessing
- Extracting the hand (or other largest detected object) from the image
- Object recognition
- Refinement
The preprocessing pipeline to threshold and extract the hand from the image was custom built. Standard automatic thresholding methods could not be used, because the histogram peak to be extracted is very small, while most automatic thresholding algorithms are built to extract relatively large peaks.
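To make the peak-size problem concrete, here is a minimal sketch of the intensity histogram the threshold search operates on, in pure Python since external image-processing libraries were not allowed. The function name is ours, not from the original code.

```python
def histogram(grey):
    """Count how often each intensity value 0-255 occurs in a greyscale image,
    given as a list of rows of ints."""
    counts = [0] * 256
    for row in grey:
        for p in row:
            counts[p] += 1
    return counts
```

A global criterion such as Otsu's method favours splits between two large pixel classes, so a hand occupying only a small fraction of the image barely influences the chosen threshold.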
The hand lies within the red-labelled region of the histogram, from about 70 to 90. This peak is too small to extract with standard automatic thresholding methods. Our algorithm consists of the following steps:

0. Crop the image to the edge of the table, convert it to greyscale and auto-contrast it.
1. Threshold the greyscale image, starting at a threshold value of 70.
2. Apply an opening filter (3x3) to get rid of any noisy pixels.
3. Apply a region labelling algorithm, count the number of regions and take the bounding box of the largest region.
4. If the number of regions increased compared to the previous iteration, do not update the bounding box.
5. Repeat steps 1-4 on the image derived from step 0 with an incrementing threshold value, until the difference between the smallest number of regions seen so far and the current number of regions is larger than 3.
6. Return the thresholded, opened cut-out of the bounding box of the largest labelled region.
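The threshold search can be sketched in pure Python, per the assignment's no-external-libraries constraint. All function names are ours, and we assume "threshold at t" means keeping pixels with intensity at least t; the original implementation may differ in these details.

```python
def threshold(grey, t):
    """Binary image: 1 where the pixel intensity is at least t (assumption)."""
    return [[1 if p >= t else 0 for p in row] for row in grey]

def opening3x3(binary):
    """Opening: erosion followed by dilation with a 3x3 structuring element."""
    h, w = len(binary), len(binary[0])

    def window(img, y, x):
        return [img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))]

    eroded = [[1 if all(window(binary, y, x)) else 0 for x in range(w)]
              for y in range(h)]
    return [[1 if any(window(eroded, y, x)) else 0 for x in range(w)]
            for y in range(h)]

def label_regions(binary):
    """4-connected flood-fill labelling; returns (count, bbox of largest region)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    count, largest, bbox = 0, 0, None
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                stack, pixels = [(y, x)], []
                while stack:
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if len(pixels) > largest:
                    largest = len(pixels)
                    ys = [p for p, _ in pixels]
                    xs = [q for _, q in pixels]
                    bbox = (min(ys), min(xs), max(ys), max(xs))
    return count, bbox

def find_hand_bbox(grey, start=70, tolerance=3):
    """Steps 1-5: raise the threshold until the image starts to fragment."""
    bbox, prev_n, min_n = None, None, None
    for t in range(start, 256):
        n, candidate = label_regions(opening3x3(threshold(grey, t)))
        min_n = n if min_n is None else min(min_n, n)
        if n - min_n > tolerance:   # stop: too many new regions appeared
            break
        if candidate is not None and (prev_n is None or n <= prev_n):
            bbox = candidate        # step 4: keep old box if regions increased
        prev_n = n
    return bbox
```

The bounding box is frozen whenever the region count rises, so the result reflects the last threshold at which the hand was still one connected blob.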
Example of an image thresholded at 80, 88 and 92. 88 was found to be the optimal threshold, while 92 was the point where the algorithm stopped because more than 3 new regions had been introduced.
This step posed more challenges than the thresholding algorithm. We considered a Fourier transform, but the arm is not always the same length and shape, rotation poses an additional challenge, and some hands are partially cut off during thresholding due to white occlusion by the beamer. We therefore chose corner detection: we try to detect the corners near the extended finger(s) and match them against expected values.
Steps for this first design:
- Harris corner detection. To find the order in which to connect the points so they can be compared correctly, the next steps are performed.
- Create the convex hull by performing the Gift-wrapping algorithm on these corners.
- Find the convexity defects by walking over the boundary trace of the original hand from corner to corner and selecting the boundary point closest to the centroid of the corners. This proved harder than expected, because the centroid was often not in the middle of the palm of the hand. Even when there was a point close to the middle of the hand, this step would sometimes return one of the convex hull corners instead, so a corner near a finger is occasionally missed, which increases the measured angle of that finger. This is part of the reason why some hands do not yield a correct result. To get a better approximation of the centroid of the hand, the centroid is recalculated from the determined hull points; this gives a better approximation because most of those points lie on the hand.
- Now we walk over the list of alternating convex hull and defect points and compare the angles against a threshold. Currently, everything between 8 and 40 degrees is considered a finger. With this, we can determine how many fingers are held up.
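The finger-counting step can be sketched as follows, assuming alternating lists of hull points (fingertip candidates) and defect points as (x, y) tuples, with each hull point sitting between two defects. The 8-40 degree window matches the threshold mentioned above; the helper names and exact pairing are our own assumptions.

```python
import math

def angle_at(tip, a, b):
    """Angle in degrees at `tip` between the segments tip->a and tip->b."""
    v1 = (a[0] - tip[0], a[1] - tip[1])
    v2 = (b[0] - tip[0], b[1] - tip[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    # clamp to guard against floating-point error before acos
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def count_fingers(hull_points, defect_points, lo=8.0, hi=40.0):
    """Count hull points whose angle to the neighbouring defects is finger-like."""
    fingers = 0
    m = len(defect_points)
    for i, tip in enumerate(hull_points):
        left = defect_points[i % m]         # defect after this hull point
        right = defect_points[(i - 1) % m]  # defect before this hull point
        if lo <= angle_at(tip, left, right) <= hi:
            fingers += 1
    return fingers
```

A narrow angle at a hull point means the two adjacent defects lie close together far from the tip, the geometry of an extended finger; a wide angle indicates a knuckle or wrist corner.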
The last step is to take the original input image and show the object in it. For this, we use the upper-left corner of the bounding box as a reference and put coloured pixels at the position of the convex hull and convexity defect points. We use green pixels for a pointing hand, yellow pixels for a spread hand and red pixels for an unidentified object. With this, the object recognition function is complete. The points for the convex hull are "stamped" on the original image, and the lines between points are drawn as well.
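The stamping and line drawing described above can be sketched like this, assuming the image is a list of rows of (r, g, b) tuples and points are (y, x) pairs relative to the bounding box. The function names are ours; the line routine is Bresenham's algorithm, which the original code may or may not have used.

```python
GREEN, YELLOW, RED = (0, 255, 0), (255, 255, 0), (255, 0, 0)

def stamp(image, points, offset, colour):
    """Put coloured pixels at bounding-box-relative points, shifted by the
    bounding box's upper-left corner (offset = (top, left))."""
    oy, ox = offset
    for y, x in points:
        image[oy + y][ox + x] = colour

def draw_line(image, p0, p1, colour):
    """Bresenham line between two (y, x) points."""
    y0, x0 = p0
    y1, x1 = p1
    dy, dx = abs(y1 - y0), abs(x1 - x0)
    sy = 1 if y0 < y1 else -1
    sx = 1 if x0 < x1 else -1
    err = dx - dy
    while True:
        image[y0][x0] = colour
        if (y0, x0) == (y1, x1):
            break
        e2 = 2 * err
        if e2 > -dy:
            err -= dy
            x0 += sx
        if e2 < dx:
            err += dx
            y0 += sy
```

Connecting consecutive hull points with `draw_line` in the label's colour yields the outline drawn on the original image.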
The main reasons why some hands are not labelled correctly as a spread or pointing hand are the following:
- Due to holes in the hand, sometimes corners are found in weird places. This can cause extra 'fingers' to be detected.
- In addition, the convexity-defect step sometimes misses corners, so no fingers are detected at all. These are the most common reasons why the current images are not handled correctly.
Of course, in the real world, other difficulties are present. The most lacking part of our function is that we assume that the biggest region described in the image is a hand. If something simple like a package is on the table while someone is pointing, the function will fail, simply because the package will describe a larger region than the hand, meaning that the object recognition part will only happen on the package. A way to fix this would be to apply the object recognition for all other regions beside the largest one as well, but we feel this couldn’t have been realized for this assignment. The function currently only works for the photos we took ourselves, because we hardcoded the removal of the unprojected table parts from the image. Next time, we would have to implement this step in the pre-processing pipeline so that images in a different context would be suitable for this function as well.