Omnidirectional Vision for Mobile Robots

This page will contain materials and progress reports related to my diploma thesis. The task is to enable robot navigation on a checkerboard floor of the Eurobot contest and to detect specific colored objects (skittles) with an omnidirectional visual sensor.

06-02-07 Defended!

The thesis was defended and marked as excellent by both the opponent and the entire committee.

05-12-16 Thesis Online

The master thesis is finished and published online! You can either read the individual chapters or download the entire thesis.

05-09-22 Relevant Papers

Started compiling a summary of all the known Vision-Based Localization - Relevant Papers. Each paper has its own wiki page with an abstract, a short annotation, and linked citations.

05-04-15 Monte Carlo Localization

05-03-30 Input normalization

Definitively sick and tired of the camera’s unpredictable white balancing, I decided to try prepending a color normalization step to the filter chain.

The first filter now processes the captured frame so that all three channels share the same statistics. RGB values are recomputed so that each channel has a mean (average value) of 65 and a standard deviation (square root of variance) of 50. With these values, the RGB cube optimally fills the YCrCb color space:

Histogram of the YCrCb transformed image

Thanks to the normalization, all the tracked colors are always present in the thresholded image (even in the camera startup phase :!:). This makes it possible to track the White playing table borders (which used to be very elusive).

Border peaks Directrices of the border

Thanks to the new mounting of the camera, the whole playing field is visible all the time, and even the Checkerboard Pattern in Eurobot 2005 boundaries are easier to track:

Boundary peaks Directrices of the boundaries

05-03-21 Boundary Transform Evaluation

The current algorithm detects the orientation and shift of the Checkerboard Pattern in Eurobot 2005 boundaries by performing the following filter chain:

The input image is thresholded and the browN-beigE boundary is detected

Input image Thresholded Boundaries

The boundary image is then transformed to the ground plane, where it is summed along 128 different directions,

Transformed Peaks in summed image

and the peaks in the sum image denote orientation and shift of straight lines in the transformed image:


05-03-15 New Vision Framework

The vision testing application has been rewritten to support capturing into several so-called filter chains, each of which consists of several image filters that perform some operation on the image as it passes from the camera through the chain. The image below shows the output of the new BoundaryFilter, which extracts browN / beigE boundary information from the thresholded image.

Screenshot of the BoundaryChain in action
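The filter-chain concept can be sketched with function pointers. The Image type, the filter names, and the in-place convention below are illustrative assumptions, not the actual framework API:

```c
#include <stddef.h>

/* A toy grayscale image and a filter that transforms it in place. */
typedef struct {
    unsigned char *data;   /* w * h pixels */
    int w, h;
} Image;

typedef void (*Filter)(Image *img);

/* A chain is just an ordered array of filters; each filter sees the
 * output of the previous one. */
void run_chain(Image *img, Filter *chain, size_t n)
{
    for (size_t i = 0; i < n; i++)
        chain[i](img);
}

/* Two toy filters: binary threshold, then inversion. */
void threshold_filter(Image *img)
{
    for (int i = 0; i < img->w * img->h; i++)
        img->data[i] = img->data[i] > 128 ? 255 : 0;
}

void invert_filter(Image *img)
{
    for (int i = 0; i < img->w * img->h; i++)
        img->data[i] = 255 - img->data[i];
}
```

A chain such as `{ threshold_filter, invert_filter }` is then executed with `run_chain(&img, chain, 2)`; swapping or inserting filters requires no change to the capture code, which is the point of the design.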

The next step is to sum these boundaries in different directions to get the most probable orientation and shift of the Checkerboard Pattern in Eurobot 2005.

05-03-06 New Beacon Support

Dense transformation of the new camera image Finished the final mounting of the new camera to the Beacon Support with Omnivisual Sensor.

The mount allows fine-tuning of the camera position and orientation, resulting in an evidently more precise shape of the projected surface.

05-02-24 Mirror Projection

First try of dense transformation Example of sparse pixelwise transformation Applied the pixelwise transformation to the camera image to get a sparse matrix of transformed camera pixels. It is encouraging that the computed transformation seems to be correct :-) On the other hand, proper camera mounting is necessary to see the Checkerboard Pattern in Eurobot 2005 as actual squares.

I also tried to make a dense image using a naive interpolation algorithm (sparse pixels are interpolated along horizontal lines only). It looks bad because the pixels are very sparse in the top and bottom parts… It would be necessary to find the nearest pixels in the Euclidean metric.
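The proposed fix, taking the nearest sample in the Euclidean metric, could look like this brute-force sketch (the Sample type and function name are illustrative; a spatial index would be needed to make this fast enough for real-time use):

```c
#include <float.h>

/* One transformed camera pixel: its ground-plane position and value. */
typedef struct { double x, y; unsigned char value; } Sample;

/* Fill a dense w x h grid: each output pixel copies the value of the
 * Euclidean-nearest sparse sample.  O(w * h * n) brute force. */
void fill_nearest(unsigned char *out, int w, int h,
                  const Sample *s, int n)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            double best = DBL_MAX;
            unsigned char v = 0;
            for (int i = 0; i < n; i++) {
                double dx = s[i].x - x, dy = s[i].y - y;
                double d = dx * dx + dy * dy;   /* squared distance */
                if (d < best) { best = d; v = s[i].value; }
            }
            out[y * w + x] = v;
        }
}
```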

Future work:

05-02-17 Mirror Geometry

A simple schematic of the H3G mirror - courtesy of An Excel chart showing some of the reflected rays Performed a few computations to derive the formulas necessary to describe the reflection of a ray corresponding to an image pixel.

I finally assumed that the center of camera projection (the first principal point) is located in the second focus F of the mirror surface (the one not contained in the mirror). As a result, the prolongations of all the reflected rays intersect in the first focal point E.

The intersection of a ray passing from the camera chip towards the mirror with the mirror surface lies at the coordinates [t, F(t)], where F is the mirror surface function and t can be computed as:



Or alternatively (when fixing )
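Since the original formulas are not reproduced above, the following is a hedged reconstruction under standard catadioptric assumptions: a hyperbolic mirror profile with semi-axes a and b, foci at ±c, and a pixel ray of slope k emanating from the second focus. The symbols a, b, c, k are introduced for this sketch only and need not match the thesis notation.

```latex
% Hyperbolic mirror profile and its focal distance:
F(t) = a\sqrt{1 + \frac{t^2}{b^2}}, \qquad c = \sqrt{a^2 + b^2}

% A pixel defines a ray of slope k through the camera center (0, -c):
z(t) = k t - c

% Intersecting z(t) = F(t) and squaring yields a quadratic in t:
\left(k^2 - \frac{a^2}{b^2}\right) t^2 - 2 k c\, t + b^2 = 0

% Using c^2 - a^2 = b^2, the discriminant collapses to a^2 (1 + k^2):
t = \frac{k c \pm a\sqrt{1 + k^2}}{k^2 - a^2/b^2}
```

The ± sign selects the physically relevant intersection on the mirror lobe actually hit by the ray.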

05-02-01 Meeting

Today’s meeting brought some specifications for the corner detection algorithm (checkerboard processing).

  • Using browN/beigE boundary pixels to locate edges and corners.
  • Transformation of such pixels into plane using a predicted camera location.
  • If location is well predicted, boundary pixels make rows and columns → SUM them.
  • Fix rotation error either using:
    1. repetition of the same transform using different alpha,
    2. estimated alpha from the location of static beacons.

Yet another possible approach:

More notes:

  1. You almost never get a whole square segment (it is either cut off or merged with a neighbor).
  2. Use information of the white border - to stop processing squares?
  3. Add another segmentation color (split White to “pure White”/graY).

05-01-23 Thresholding Visualization

Part of the color space defined by the YUV thresholds All the 6 tracked colors displayed after thresholding Today I just created pieces of code to better understand yesterday’s work :-)

The code performs all 6 thresholding operations at once (as described earlier) and shows the image in the six representative colors. The fact that the different cones in YUV space overlap is not a big problem because all 6 classes are maintained, i.e. when searching for browN and beigE, it doesn’t matter that some of the beigE is classified as White too.

From the right image it is obvious how the YUV space is oriented within the RGB one. The bigger window contains a UV plane for a fixed Y, while the upper one contains the result of the segmentation process applied to that UV plane.

The color definitions will definitely be tuned further, but I think this can be postponed until the final cameras arrive, because I fear that the colors will be a bit different.

05-01-23 Color Classification

Today’s testing was based on the CMVision idea:

  1. The image is transformed to the YUV (brightness, chromaticity) color space, where an object color can be described as a cone with a wide span in the brightness dimension.
  2. The cone is described as an intersection of three subspaces described by three binary tables.
  3. If 32 such binary tables are stored in different bits of one int table, one can track at most 32 different colors at a time using just two binary AND operations on every pixel.
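The bit-parallel classification above can be sketched directly. This is a minimal sketch; the table names are illustrative, and the classes here are axis-aligned boxes rather than the calibrated cones, but the lookup mechanism is identical:

```c
#include <stdint.h>

enum { LEVELS = 256 };

/* For every channel value, each table stores a 32-bit mask with bit k
 * set iff color class k accepts that value.  A pixel belongs to every
 * class whose bit survives the two ANDs. */
uint32_t y_tab[LEVELS], u_tab[LEVELS], v_tab[LEVELS];

/* Register class `bit` as the box [y0,y1] x [u0,u1] x [v0,v1]. */
void add_color_class(int bit, int y0, int y1, int u0, int u1, int v0, int v1)
{
    for (int i = y0; i <= y1; i++) y_tab[i] |= 1u << bit;
    for (int i = u0; i <= u1; i++) u_tab[i] |= 1u << bit;
    for (int i = v0; i <= v1; i++) v_tab[i] |= 1u << bit;
}

/* Classify one YUV pixel: all 32 classes at once, two binary ANDs. */
uint32_t classify(int y, int u, int v)
{
    return y_tab[y] & u_tab[u] & v_tab[v];
}
```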

beigE squares highlighted by a YUV thresholding The test code (color_segmentation.c) allows the area in the YUV space to be calibrated interactively. The left mouse button adds the selected pixel to the set, while the right mouse button removes all three associated coordinates. The visually selected region is then manually modified to form a cone. The selected areas will be described as (y, u, v, y_span, u_span, v_span), which makes it possible to automatically adjust the span tolerances in reaction to ambient conditions.


  • Blue, Red, beigE and browN – perfect sharp recognition (although beigE and browN result in Red when blurred together)
  • Green and White – poor recognition (Green is very dark - dark Blue and blacK are often misclassified as Green; the chromaticity of the White stripe differs between Green and Red skittles, so the resulting cone is very big)


  1. Mask out the camera, robot and other static stuff.
  2. Color segmentation – all colors at once.
  3. Detect and process floor segments (Blue, beigE, browN).
  4. Crop the skittle segments far beyond table.
  5. Process individual skittle segments (Red, Green) and find an adjacent stripe (White) if any.

Future work:

  • Extract color information to enable auto-calibration.
  • Color thresholding to segments (maybe using CMVision).

05-01-23 Pyramid Segmentation

Pyramid segmentation marking a red skittle Today I tested the OpenCV cvPyrSegmentation function. This function combines color thresholding with descending an image pyramid to create initial connected segments. It then merges these into larger connected components using a second threshold.


  • Supports both 1-channel and 3-channel images (incorporating color-perception weights into the distance function).
  • Returns a list of all the connected components – FPS dramatically decreases for a small threshold and thus a large number of components.
  • The resulting segment color as well as the size of the connected component varies a lot from frame to frame – segment color classification will be difficult.


  • ROI mask to speed up the process throwing away uninteresting pixels.
  • Good illumination, fine-tuned thresholds (maybe individual for different tasks – skittles/table).
  • Merging (or at least preserving) information across frames.
  • Better color classification for joining segments.

Pyramid segmentation applied on the hue channel I tried to improve the results using cvCvtColor. The idea is that if the game elements are strongly differentiated in color, I can throw away the brightness and saturation information and process only the hue channel at a much higher frame rate.

In fact, the speed-up is the only positive of this approach. The frame rate climbed to 4 fps (from around 2.5 fps for a 3-channel source), which is still not really impressive anyway.

Although the red skittles are pretty distinctive, the bad news is that the green skittle color (dark green-yellow) is very close to the floor colors (beigE and browN become yellow and orange-yellow respectively). It is almost impossible to get any useful results by adjusting the thresholds.
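The hue-only idea boils down to the standard hexcone conversion, sketched below (this is not the OpenCV internal code). It also illustrates why the green skittles collapse onto the floor: beigE, browN and dark green-yellow all land in the narrow yellow band of the hue circle.

```c
/* Compute hue in degrees [0, 360) from 8-bit RGB using the standard
 * hexcone formula; returns -1 for achromatic pixels (r == g == b),
 * where hue is undefined. */
double rgb_to_hue(int r, int g, int b)
{
    int max = r > g ? (r > b ? r : b) : (g > b ? g : b);
    int min = r < g ? (r < b ? r : b) : (g < b ? g : b);
    if (max == min)
        return -1.0;                   /* gray: hue undefined */
    double d = max - min, h;
    if (max == r)      h = (g - b) / d;          /* between yellow & magenta */
    else if (max == g) h = 2.0 + (b - r) / d;    /* between cyan & yellow    */
    else               h = 4.0 + (r - g) / d;    /* between magenta & cyan   */
    h *= 60.0;                         /* sextant -> degrees */
    return h < 0.0 ? h + 360.0 : h;
}
```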

Future work:

05-01-13 Localization Algorithm Outline

Meeting with Zbynek Winkler.
Created the outline of the localization algorithm:

  1. Apply a ROI mask to the omnidirectional image.
  2. Perform the color-based segmentation:
  3. Recalculate the segment coordinates using a predefined lookup table.
  4. The centroids of floor segments now make a rectangular grid …
  5. … match the grid with a known map (using a predicted viewpoint).
  6. Update the lookup table to minimize the grid-matching error.
  7. Update the current viewpoint.

05-01-13 Skittle Detection Algorithm Outline

Based on the e-mail discussion transcribed below, the skittle detection algorithm evolved into the following steps:

  1. Perform color-based segmentation
  2. A skittle consists of 2 or 3 individual segments:
    • Red/green body and white stripe (standing upside-down or lying with the screws towards the camera)
    • Red/green head and body separated by a white stripe (standing upright or lying with the head towards the camera)
  3. Identify the corresponding segments (a heuristic is needed to join neighboring segments - consider the direction of the principal body axis)
  4. Decide if the skittle is standing (consider the direction from the camera - both panoramic and omnidirectional):
    • 2 segments - white stripe first, then the body (upside down)
    • 3 segments - larger (rectangular) body first, then the white stripe, finally the smaller (elliptical) head (on screws)


Mail Archive

Computer Vision Libraries

  • OpenCV (home) — Camera calibration, capture, color conversions etc. (C, Windows / Linux).
  • CMVision (home) — Realtime color segmentation (C++, Linux).
  • LTI-lib (home) — Corner detection etc. (C++, Windows / Linux).

Other Material

  omni/start.txt · Last modified: 2006/02/21 17:41