L2. Vision

Setup and Data Acquisition

A 15" by 15" wooden board and a 5" by 5" wooden board each had tennis balls mounted in holes drilled at their corners, so that the balls protruded from one of the square faces. Each board was stood vertically, with its opposite face resting on a white board against the wall in the REL.

Using a measuring tape, distances of 2, 4, 8, and 16 feet were marked with tape. Since the target rested with its bottom on the ground, photos were taken with the camera near the ground. The camera was raised off the ground by half the height of the target, approximated with a ruler.
For each picture, the target was kept near the center of the image to avoid inaccuracy in the later calculation of the target's distance from the camera.

The camera used was that of a 4th generation iPod touch. This camera has a focal length of 3.85 mm and a field of view of 43.47°.

Both boards were placed against the white background and pictures were taken at each mark. This provided a useful data set with which to perform image analysis.

Gathered Data:

Small Target 2 Feet
Big Target 2 Feet

Small Target 4 Feet
Big Target 4 Feet

Small Target 8 Feet
Big Target 8 Feet

Small Target 16 Feet
Big Target 16 Feet

Threshold Images:

The personal goal for this project was to develop the simplest approach that still segments the tennis balls effectively. Most of that simplicity was sought at the thresholding stage. The first threshold requires a hue value between 58 and 84, which covers the yellowish-green hue of the ball very well. Most non-ball material that falls within this hue range is removed by the second and final threshold, which excludes saturation values under 110. The intent was to remain applicable across a wide range of images and to steer away from unnecessary complexity. The following output contains a binary image of each photo after passing through the thresholds.

Threshold Small Target 2 Feet
Threshold Big Target 2 Feet

Threshold Small Target 4 Feet
Threshold Big Target 4 Feet

Threshold Small Target 8 Feet
Threshold Big Target 8 Feet

Threshold Small Target 16 Feet
Threshold Big Target 16 Feet
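
The two-stage threshold described in this section can be sketched in a few lines. This is only an illustration, not the code used for the images above. The report does not state the scale of the hue and saturation values, so the sketch assumes both run from 0 to 255; the function names and the list-of-RGB-tuples image format are also hypothetical.

```python
import colorsys

# Thresholds from the text. The 0-255 scaling of hue and saturation is an assumption.
HUE_MIN, HUE_MAX = 58, 84
SAT_MIN = 110


def rgb_to_hs255(r, g, b):
    """Convert 8-bit RGB to (hue, saturation), each scaled to 0-255."""
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 255.0, s * 255.0


def threshold(image):
    """Return a binary mask (1 = ball candidate) for a 2-D list of RGB tuples."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            hue, sat = rgb_to_hs255(r, g, b)
            # First threshold: hue window. Second threshold: minimum saturation.
            keep = HUE_MIN <= hue <= HUE_MAX and sat >= SAT_MIN
            mask_row.append(1 if keep else 0)
        mask.append(mask_row)
    return mask
```

A washed-out green such as (200, 220, 180) falls inside the hue window but fails the saturation test, which is the role the text assigns to the second threshold.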

Initial Segmentation Images:

In keeping with the goal of simplicity, a raster scan is performed, assigning ID numbers to each segment. An additional sweep is needed to merge neighboring IDs that belong to the same segment (as shown in the Vision II PowerPoint). The output image here shows all segments that passed the thresholding phase. Note that this is not the final segmentation, as filtering is applied in the following step.

Segmented Small Target 2 Feet
Segmented Big Target 2 Feet

Segmented Small Target 4 Feet
Segmented Big Target 4 Feet

Segmented Small Target 8 Feet
Segmented Big Target 8 Feet

Segmented Small Target 16 Feet
Segmented Big Target 16 Feet
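
A raster scan followed by a merging sweep is the classic two-pass connected-component labeling scheme. The sketch below shows one common way to write it. It assumes 4-connectivity and uses a union-find table to record which IDs touch. Neither detail is stated in the report, and `label_segments` is a hypothetical name.

```python
def label_segments(mask):
    """Two-pass connected-component labeling of a binary mask (4-connectivity)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] is the union-find parent of id i; id 0 is background

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # First pass: raster scan, assigning provisional ids and recording equivalences.
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))      # new id is its own parent
                labels[y][x] = len(parent) - 1
            else:
                neighbors = [n for n in (up, left) if n]
                root = min(find(n) for n in neighbors)
                labels[y][x] = root
                for n in neighbors:
                    parent[find(n)] = root      # merge touching ids

    # Second pass: replace every provisional id with its group's representative.
    for y in range(rows):
        for x in range(cols):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```

A U-shaped region is the typical case where the first pass hands out two IDs to the two arms and the second sweep joins them.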

Filtered Images (Final Set):

The filter finalizes the image segmentation process, and it too has been kept simple. Pixel groupings of 80 pixels or fewer are set to the background. Then a radial sweep is done around each ball to eliminate gross outliers: the ball's area in pixels is divided by pi and the square root is taken, giving a radius in pixels to sweep about the centroid of that ball. The effect is most evident in the set of pictures depicting the small target at 4 feet.

Filtered Small Target 2 Feet
Filtered Big Target 2 Feet

Filtered Small Target 4 Feet
Filtered Big Target 4 Feet

Filtered Small Target 8 Feet
Filtered Big Target 8 Feet

Filtered Small Target 16 Feet
Filtered Big Target 16 Feet
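
The filter described in this section can be sketched as follows. The 80-pixel cutoff and the radius formula come from the text. How the sweep treats pixels right at the radius is not specified, so the `margin` parameter, the function name, and the choice to discard everything beyond the radius are assumptions made for illustration.

```python
import math

MIN_PIXELS = 80  # groupings of this size or smaller become background (from the text)


def filter_segments(labels, margin=1.0):
    """Drop small segments, then drop pixels outside each segment's equivalent circle."""
    # Collect the pixel coordinates belonging to each segment id.
    segments = {}
    for y, row in enumerate(labels):
        for x, seg_id in enumerate(row):
            if seg_id:
                segments.setdefault(seg_id, []).append((x, y))

    filtered = [[0] * len(row) for row in labels]
    for seg_id, pixels in segments.items():
        area = len(pixels)
        if area <= MIN_PIXELS:
            continue  # treated as background
        cx = sum(x for x, _ in pixels) / area
        cy = sum(y for _, y in pixels) / area
        # Radius of a circle with the same area as the segment (margin is hypothetical).
        radius = math.sqrt(area / math.pi) * margin
        for x, y in pixels:
            if math.hypot(x - cx, y - cy) <= radius:
                filtered[y][x] = seg_id
    return filtered
```

Because stray pixels attached to a ball add little area but sit far from the centroid, the equivalent radius barely grows and the strays fall outside it.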

Numerical Calculations:

Small Target 2ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (204,360) | (216,658) | (475,369) | (477,652)
Number of Pixels | 39478 | 37167 | 38457 | 35824
Covariance (X 1.0e+09) | 1.4307 | 2.6138 | 3.3317 | 5.4772

Small Target 4ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (274,433) | (273,567) | (396,439) | (397,562)
Number of Pixels | 6982 | 7284 | 7336 | 7222
Covariance (X 1.0e+08) | 4.0189 | 5.5386 | 6.1879 | 7.6433

Small Target 8ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (280,478) | (279,541) | (337,481) | (337,540)
Number of Pixels | 1493 | 1661 | 1501 | 1572
Covariance (X 1.0e+08) | 0.9372 | 1.2142 | 1.1482 | 1.3940

Small Target 16ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (294,491) | (293,521) | (321,493) | (321,520)
Number of Pixels | 242 | 312 | 318 | 314
Covariance (X 1.0e+07) | 1.5448 | 2.2683 | 2.3094 | 2.4349

Big Target 2ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (32,117) | (36,898) | (706,132) | (702,866)
Number of Pixels | 12839 | 14612 | 3734 | 5339
Covariance (X 1.0e+09) | 0.0226 | 0.2275 | 0.1377 | 1.3231

Big Target 4ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (103,277) | (114,657) | (480,261) | (485,652)
Number of Pixels | 6576 | 5651 | 6859 | 6428
Covariance (X 1.0e+08) | 0.9155 | 2.0665 | 4.2245 | 9.9799

Big Target 8ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (171,420) | (174,607) | (354,416) | (357,605)
Number of Pixels | 1411 | 1240 | 1244 | 1419
Covariance (X 1.0e+08) | 0.4937 | 0.6294 | 0.8710 | 1.4850

Big Target 16ft:

Quantity | Ball 1 | Ball 2 | Ball 3 | Ball 4
Centroids | (239,454) | (241,547) | (328,453) | (329,547)
Number of Pixels | 240 | 190 | 236 | 172
Covariance (X 1.0e+07) | 1.1852 | 1.0002 | 1.5903 | 1.3311

Results:

Final Distance Calculation (with error)

The distance was also found through a brief calculation. Because the target rested against the backdrop of the photo and the camera was known to have a field of view of 43.47° at a resolution of (960,720), the following equation was set up, where d is the distance to the target, l is the real-world length spanned by the image at the target, and θ is the field of view. (see distance calculations)

d = l / (2 tan(θ/2))

At worst, this equation gave a 13.75% error. That case was an outlier; the remaining measurements had an average error of 2.32%. The results are as follows.

Small Target 2 ft:

1.725ft (13.75%)

Small Target 4 ft:

3.915ft (2.25%)

Small Target 8 ft:

8.191ft (2.38%)

Small Target 16 ft:

17.220ft (7.63%)

Big Target 2 ft:

1.987ft (0.65%)

Big Target 4 ft:

3.912ft (2.20%)

Big Target 8 ft:

8.004ft (0.05%)

Big Target 16 ft:

16.174ft (1.09%)
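
As a check on the distance equation, the sketch below reworks the big target at 8 feet from its reported centroids. It is a reconstruction rather than the original calculation, and it rests on three assumptions: the field of view spans the 960-pixel axis, the second centroid coordinate runs along that axis, and the ball centers are 15" apart. The function names are hypothetical.

```python
import math

FOV_DEG = 43.47      # field of view quoted in the text
IMAGE_PIXELS = 960   # image extent along the axis assumed to span the field of view


def estimate_distance_ft(pixel_spacing, ball_spacing_in):
    """Estimate camera-to-target distance in feet from the centroid spacing in pixels."""
    # l: real-world length covered by the full image at the target plane (inches).
    l = IMAGE_PIXELS * ball_spacing_in / pixel_spacing
    d_inches = l / (2.0 * math.tan(math.radians(FOV_DEG) / 2.0))
    return d_inches / 12.0


def percent_error(estimate, actual):
    """Absolute error as a percentage of the true value."""
    return abs(estimate - actual) / actual * 100.0


# Big target at 8 ft: centroids (171,420), (174,607), (354,416), (357,605).
# Spacing along the second coordinate, averaged over both ball pairs.
spacing = ((607 - 420) + (605 - 416)) / 2.0
distance = estimate_distance_ft(spacing, 15.0)
print(round(distance, 3), round(percent_error(distance, 8.0), 2))
```

Under these assumptions the estimate lands within about a hundredth of a foot of the reported 8.004ft, which suggests the reconstruction is close to the calculation actually used, though not necessarily identical to it.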

Error Graphs:

Small Target Error
Big Target Error

The error graphs display different cases. For the small target, the error is larger at the closest and farthest distances. The close-range error may have been caused by error in positioning the camera: a small human error in that parameter has larger implications up close. For the far case, one might ask whether the distance calculation was oversimplified, although its accuracy would plausibly increase with distance. A possible source of error for the farthest picture is that more varied content creeps into the background, potentially triggering the camera's automatic brightness adjustment, which could introduce more error at the initial threshold. This would change the saturation values in particular.

For the larger target, all pictures report a very low error, which is surprising given the simplicity of the distance calculation. This time the closest picture had a negligible error, but that result is not actually valid. Only part of each ball is included in the picture, so the centroids are slightly displaced, closer together than they would normally be. This makes the target look smaller, so some error must actually be present that makes the balls appear too close to the camera. An explanation for that error would again be that any human error in camera placement is magnified the closer the camera is to the target. The largest error lies at 4 feet, where a thumb may be obscuring part of the picture and keeping it from effectively picking up the hue and saturation values that the thresholding procedure relies on. Again, the farthest picture has more content in its background, which can affect these same hue and saturation values.

Comparing the two targets, the smaller target's images experienced more error overall. This is most likely the same phenomenon that occurs when taking pictures at closer distances. Any error across the 5" ball spread propagates with larger magnitude than an error of the same size across the 15" spread. A more spread-out target allows less precise measurements to still give accurate results. Consider the extreme case: if the balls were placed half an inch apart, any small error would have a huge effect on the distance output.
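
The effect of target size on error can be made concrete with a small calculation. Since the estimated distance is inversely proportional to the pixel spacing between centroids, a fixed pixel error matters more when that spacing is small. The sketch below applies a one-pixel error to the 8-foot centroid spacings from the numerical calculations. The one-pixel figure and the function name are assumptions chosen for illustration.

```python
def relative_distance_error(pixel_spacing, pixel_error=1.0):
    """Fractional change in estimated distance when the spacing is off by pixel_error."""
    # Distance is proportional to 1 / pixel_spacing, so compare d(p + e) with d(p).
    return abs(pixel_spacing / (pixel_spacing + pixel_error) - 1.0)


# Centroid spacings at 8 ft along the second coordinate, from the reported centroids.
small_spacing = ((541 - 478) + (540 - 481)) / 2.0   # 5" target
big_spacing = ((607 - 420) + (605 - 416)) / 2.0     # 15" target

small_err = relative_distance_error(small_spacing)
big_err = relative_distance_error(big_spacing)
print(f"small target: {small_err:.2%}, big target: {big_err:.2%}")
```

The same one-pixel slip costs the small target roughly three times the relative error of the big target, in line with the 5" versus 15" ball spread.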

Other possible sources of error include:

The balls sit loosely in their drilled holes rather than rigidly. When the bottom balls rested against the ground, they were pushed up because of this lack of rigidity, which affected the pixel length between centroids.

Inaccuracy in camera height and orientation due to human error. This affects the centering of the image and therefore the assumptions made in the distance calculation.

Brian Bittner, Carnegie Mellon University