Digital Filters
Introduction
Mean Filter
Median Filter
Gaussian Smoothing
Conservative Smoothing
Crimmins Speckle Removal
Frequency Filters
Laplacian/Laplacian of Gaussian Filter
Unsharp Filter
©1994 Bob Fisher, Simon Perkins, Ashley Walker and Erik Wolfart
Department of Artificial Intelligence
University of Edinburgh
UK
HIPR - Hypermedia Image Processing Reference
User Guide
Welcome to HIPR!
What is HIPR?
Guide to Contents
How to Use HIPR
Advanced Topics
Local Information
Image Transforms
Image Synthesis
Appendices
A to Z of Common Image Processing Concepts
The Image Library
Common Software Implementations
HIPRscript Reference Manual
Bibliography
Acknowledgements
The HIPR Copyright
Main Index
Morphology
Introduction
Dilation
Erosion
Opening
Closing
Hit and Miss Transform
Thinning
Thickening
Skeletonization/Medial Axis Transform
Feature Detectors
Introduction
Digital Filters - Introduction
Introduction
In image processing, filters are mainly used to suppress either the high frequencies in the image, i.e.
smoothing the image, or the low frequencies, i.e. enhancing or detecting edges in the image. Filtering
can be carried out either in the frequency domain or in the spatial domain.
The first involves transforming the image into the frequency domain, multiplying it with the frequency
filter function and re-transforming the result into the spatial domain. The filter function is shaped so as
to attenuate some frequencies and enhance others. For example, a simple lowpass function is 1 for
frequencies smaller than the cut-off frequency and 0 for all others.
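The frequency-domain procedure just described can be sketched in Python with NumPy. This is an illustrative example, not part of the original HIPR materials; the function name and the cutoff convention (cycles per pixel) are our own choices:

```python
import numpy as np

def ideal_lowpass(image, cutoff):
    """Frequency-domain filtering sketch: transform the image,
    multiply by an ideal lowpass mask (1 below the cut-off
    frequency, 0 above it), and transform back.
    'cutoff' is given in cycles per pixel (0 to 0.5)."""
    F = np.fft.fft2(image)
    # Spatial frequency of every FFT bin, in cycles per pixel
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    mask = (np.sqrt(fx**2 + fy**2) <= cutoff).astype(float)
    return np.real(np.fft.ifft2(F * mask))

# A constant image contains only the zero frequency, so it passes unchanged.
flat = np.full((8, 8), 10.0)
out = ideal_lowpass(flat, 0.25)
```

A real implementation would usually use a smoother filter shape than this ideal step, since the sharp cut-off causes ringing in the spatial domain.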
The corresponding process in the spatial domain is to convolve the input image f(i,j) with the filter function
h(i,j). This can be written as

    g(i,j) = h(i,j) * f(i,j)

The mathematical operation is identical to the multiplication in frequency space, but the results of
the digital implementations vary, since we have to approximate the filter function with a discrete and
finite mask (kernel).
The discrete convolution can be defined as a `shift and multiply' operation, where we shift the kernel
over the image and multiply its value with the corresponding pixel values of the image. For a square
kernel of size M×M, we can calculate the output image with the following formula:

    g(i,j) = Σ_{m=1}^{M} Σ_{n=1}^{M} h(m,n) f(i−m, j−n)
Various standard kernels exist for specific applications, where the size and the form of the mask
determine the characteristics of the operation. The most important of them are discussed in this chapter.
The masks for two examples, the mean and the Laplacian operator, can be seen in Figure 1.
Figure 1 Convolution kernel for a mean filter and one form of the discrete Laplacian.
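The `shift and multiply' operation can be sketched as follows (Python/NumPy, illustrative only; borders where the kernel overhangs the image are handled naively, and since the kernels of Figure 1 are symmetric, no kernel flip is shown):

```python
import numpy as np

def convolve2d(f, h):
    """`Shift and multiply': slide the M x M kernel h over the image f
    and sum the products of kernel values and the pixels beneath them.
    (For the symmetric kernels discussed here, convolution and
    correlation coincide, so no kernel flip is performed.)
    Border pixels, where the kernel would overhang the image,
    are copied through unchanged in this simple sketch."""
    M = h.shape[0]
    a = M // 2
    g = f.astype(float).copy()
    for i in range(a, f.shape[0] - a):
        for j in range(a, f.shape[1] - a):
            # multiply kernel values with the pixels under the window
            window = f[i - a:i + a + 1, j - a:j + a + 1]
            g[i, j] = np.sum(h * window)
    return g

# The 3x3 mean kernel of Figure 1: all weights equal to 1/9.
mean_kernel = np.full((3, 3), 1.0 / 9.0)
```

Convolving with `mean_kernel` performs the mean filtering discussed below; substituting a discrete Laplacian kernel gives the other example of Figure 1.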
In contrast to the frequency domain, it is possible to implement non-linear filters in the spatial domain. In
this case, the summation in the convolution function is replaced with some kind of non-linear operator O:

    g(i,j) = O{ f(i−m, j−n) : (m,n) in the kernel }

For most non-linear filters the elements of h(i,j) are all 1. A very commonly used non-linear operator is
the median, which returns the `middle' of the input values.
Mean Filter
Common Names: Mean filtering, Smoothing, Averaging, Box filtering
Brief Description
Mean filtering is a simple, intuitive and easy to implement method of smoothing images, i.e. reducing
the amount of intensity variation between one pixel and the next. It is often used to reduce noise in
images.
How It Works
The idea of mean filtering is simply to replace each pixel value in an image with the mean (`average')
value of its neighbours, including itself. This has the effect of eliminating pixel values which are
unrepresentative of their surroundings. Mean filtering is usually thought of as a convolution filter. Like
other convolutions it is based around a kernel, which represents the shape and size of the neighbourhood
to be sampled when calculating the mean. Often a 3×3 square kernel is used, as shown in Figure 1,
although larger kernels (e.g. 5×5 squares) can be used for more severe smoothing. (Note that a small
kernel can be applied more than once in order to produce a similar - but not identical - effect as a single
pass with a large kernel.)
Computing the straightforward convolution of an image with this kernel carries out the mean filtering
process.
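As a concrete illustration (not HIPR reference code), mean filtering can also be written directly as a neighbourhood average rather than a general convolution:

```python
import numpy as np

def mean_filter(image, size=3):
    """Replace each pixel by the mean of the size x size neighbourhood
    centred on it (including itself). Border pixels, where the
    neighbourhood would overhang the image, are left unchanged
    in this sketch."""
    a = size // 2
    out = image.astype(float).copy()
    for i in range(a, image.shape[0] - a):
        for j in range(a, image.shape[1] - a):
            out[i, j] = image[i - a:i + a + 1, j - a:j + a + 1].mean()
    return out
```

Passing `size=5` gives the more severe smoothing described below.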
shows this image corrupted by Gaussian noise with a mean of zero and a standard deviation (SD)
of 8.
shows the effect of applying a 3×3 mean filter. Note that the noise is less apparent, but the image
has been `softened'. If we increase the size of the mean filter to 5×5, we obtain an image with less noise
shows the same image more severely corrupted by Gaussian noise with a mean of zero and a SD
of 13.
provides an even more challenging task. It shows an image containing `salt and pepper' shot noise.
shows the effect of smoothing this image with a 3×3 mean filter. Since the shot noise pixel values
are often very different from the surrounding values, they tend to distort the pixel average calculated by
the mean filter significantly.
Using a 5×5 filter instead gives . This result is not a significant improvement in noise reduction and,
furthermore, the image is now very blurred.
These examples illustrate the two main problems with mean filtering, which are:
● A single pixel with a very unrepresentative value can significantly affect the mean value of all the
pixels in its neighbourhood.
● When the filter neighbourhood straddles an edge, the filter will interpolate new values for pixels
on the edge and so will blur that edge. This may be a problem if sharp edges are required in the
output.
Both of these problems are tackled by the median filter. The median filter is often a better filter for
reducing noise than the mean filter, but it takes longer to compute.
In general the mean filter acts as a lowpass frequency filter and, therefore, reduces the spatial intensity
derivatives present in the image. We have already seen this effect as a `softening' of the facial features in
the above example. Now consider the image which depicts a scene containing a wider range of
different spatial frequencies. After smoothing once with a 3×3 mean filter we obtain . Notice that
the low spatial frequency information in the background has not been affected significantly by filtering,
but the (once crisp) edges of the foreground subject have been appreciably smoothed. After filtering
with a 7×7 filter, we obtain an even more dramatic illustration of this phenomenon . Compare this
result to that obtained by passing a 3×3 filter over the original image three times .
Common Variants
Variations on the mean smoothing filter discussed here include Threshold Averaging wherein smoothing
is applied subject to the condition that the center pixel value is changed only if the difference between its
original value and the average value is greater than a preset threshold. This has the effect that noise is
smoothed with a less dramatic loss in image detail.
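Threshold Averaging as described above can be sketched as follows (illustrative only; the threshold value is arbitrary and the border handling naive):

```python
import numpy as np

def threshold_average(image, size=3, threshold=10):
    """Threshold Averaging sketch: the centre pixel is replaced by the
    neighbourhood mean only when it differs from that mean by more
    than the preset threshold; otherwise it is left untouched, so
    genuine detail below the threshold survives unchanged."""
    a = size // 2
    out = image.astype(float).copy()
    for i in range(a, image.shape[0] - a):
        for j in range(a, image.shape[1] - a):
            m = image[i - a:i + a + 1, j - a:j + a + 1].mean()
            if abs(image[i, j] - m) > threshold:
                out[i, j] = m
    return out
```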
Other convolution filters that do not calculate the mean of a neighbourhood are also often used for
smoothing. One of the most common of these is the Gaussian smoothing filter.
Exercises
1. The mean filter is computed using a convolution. Can you think of any ways in which the special
properties of the mean filter kernel can be used to speed up the convolution? What is the
computational complexity of this faster convolution?
2. Use an edge detector on the image and note the strength of the output. Then apply a 3×3
mean filter to the original image and run the edge detector again. Comment on the difference.
What happens if a 5×5 or a 7×7 filter is used?
3. Applying a 3×3 mean filter twice does not produce quite the same result as applying a 5×5 mean
filter once. However, a 5×5 convolution mask can be constructed which is equivalent. What does
this mask look like?
4. Create a 7×7 convolution mask which has an equivalent effect to three passes with a 3×3 mean
filter.
5. How do you think the mean filter would cope with Gaussian noise which was not symmetric
about zero? Try some examples.
References
R. Boyle and R. Thomas, Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp. 32-34.
E. Davies, Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 3.
Median Filter
Common Names: Median filtering, Rank filtering
Brief Description
The median filter is normally used to reduce noise in an image, somewhat like the mean filter. However, it
often does a better job than the mean filter of preserving useful detail in the image.
How It Works
Like the mean filter, the median filter considers each pixel in the image in turn and looks at its nearby
neighbours to decide whether or not it is representative of its surroundings. Instead of simply replacing the
pixel value with the mean of neighbouring pixel values, it replaces it with the median of those values. The
median is calculated by first sorting all the pixel values from the surrounding neighbourhood into numerical
order and then replacing the pixel being considered with the middle pixel value. (If the neighbourhood
under consideration contains an even number of pixels, the average of the two middle pixel values is used.)
Figure 1 illustrates an example calculation.
Figure 1 Calculating the median value of a pixel neighbourhood. As can be seen the central
pixel value of 150 is rather unrepresentative of the surrounding pixels and is replaced with the
median value: 124. A 3×3 square neighbourhood is used here --- larger neighbourhoods will
produce more severe smoothing.
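The calculation of Figure 1 can be sketched as follows (Python/NumPy, illustrative; the neighbourhood values below are invented so that the median works out to the 124 quoted in the caption):

```python
import numpy as np

def median_filter(image, size=3):
    """Sort the neighbourhood values into numerical order and take
    the middle one. Border pixels, where the neighbourhood would
    overhang the image, are left unchanged in this sketch."""
    a = size // 2
    out = image.copy()
    for i in range(a, image.shape[0] - a):
        for j in range(a, image.shape[1] - a):
            window = np.sort(image[i - a:i + a + 1, j - a:j + a + 1], axis=None)
            out[i, j] = window[window.size // 2]
    return out

# A hypothetical 3x3 neighbourhood with an unrepresentative centre of 150:
neighbourhood = np.array([[124, 126, 127],
                          [120, 150, 125],
                          [115, 119, 123]])
smoothed = median_filter(neighbourhood)
```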
The median filter has two main advantages over the mean filter:
● The median is a more robust average than the mean and so a single very unrepresentative pixel in a
neighbourhood will not affect the median value significantly.
● Since the median value must actually be the value of one of the pixels in the neighbourhood, the
median filter does not create new unrealistic pixel values when the filter straddles an edge. For this
reason the median filter is much better at preserving sharp edges than the mean filter.
shows an image that has been corrupted by Gaussian noise with mean 0 and standard deviation (SD)
8. The original image is for comparison. Applying a 3×3 median filter produces . Note how the
noise has been reduced at the expense of a slight degradation in image quality. shows the same image
with even more noise added (Gaussian noise with mean 0 and SD 13), and is the result of 3×3 median
filtering. The median filter is sometimes not as subjectively good at dealing with large amounts of Gaussian
noise as the mean filter.
Where median filtering really comes into its own is when the noise produces extreme `outlier' pixel values,
as for instance in which has been corrupted with `salt and pepper' noise - i.e. bits have been flipped
with probability 1%. Median filtering this with a 3×3 neighbourhood produces , in which the noise has
been entirely eliminated with almost no degradation to the underlying image. Compare this with the similar
test on the mean filter.
Consider another example wherein the original image has been corrupted with higher levels (i.e. p=5%
that a bit is flipped) of salt and pepper noise . After smoothing with a 3×3 filter, most of the noise has
been eliminated . If we smooth the noisy image with a larger median filter, e.g. 7×7, all the noisy pixels
disappear, as shown in . Note that the image is beginning to look a bit `blotchy', as greylevel regions
are mapped together. Alternatively, we can pass a 3×3 median filter over the image three times in order to
remove most of the noise while preserving rather more detail.
In general, the median filter allows a great deal of high spatial frequency detail to pass while remaining very
effective at removing noise on images where less than half of the pixels in a smoothing neighbourhood have
been affected. (As a consequence of this, median filtering can be less effective at removing noise from
images corrupted with Gaussian noise.)
One of the major problems with the median filter is that it is relatively expensive and complex to compute.
To find the median it is necessary to sort all the values in the neighbourhood into numerical order and this is
relatively slow, even with fast sorting algorithms such as quicksort. The basic algorithm can however be
enhanced somewhat for speed. A common technique is to notice that when the neighbourhood window is
slid across the image, many of the pixels in the window are the same from one step to the next, and the
relative ordering of these with each other will obviously not have changed. Clever algorithms make use of
this to improve performance.
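The incremental idea described above can be sketched with a sliding greylevel histogram (a classic technique due to Huang et al.; this is an illustrative implementation for 8-bit images, not the HIPR reference code):

```python
import numpy as np

def median_filter_rows(image, size=3):
    """Sliding-histogram median filter sketch for 8-bit images.
    As the window slides one column to the right, only one column of
    pixels leaves the window and one enters, so the greylevel histogram
    is updated incrementally instead of re-sorting every window.
    Border pixels are left unchanged."""
    a = size // 2
    target = (size * size) // 2 + 1          # rank of the median value
    out = image.copy()
    for i in range(a, image.shape[0] - a):
        hist = np.zeros(256, dtype=int)
        # build the histogram for the first window on this row
        for v in image[i - a:i + a + 1, 0:size].ravel():
            hist[v] += 1
        for j in range(a, image.shape[1] - a):
            if j > a:
                # slide: drop the column that left, add the one that entered
                for v in image[i - a:i + a + 1, j - a - 1]:
                    hist[v] -= 1
                for v in image[i - a:i + a + 1, j + a]:
                    hist[v] += 1
            # walk the histogram until the median rank is reached
            count, level = 0, 0
            while count < target:
                count += hist[level]
                level += 1
            out[i, j] = level - 1
    return out
```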
Exercises
1. Using the image , explore the effect of median filtering with different neighbourhood sizes.
2. Compare the relative speed of mean and median filters using the same sized neighbourhood and
image. How does the performance of each scale with size of image and size of neighbourhood?
3. Unlike the mean filter, the median filter is non-linear. This means that for two images A(x) and B(x):

    median(A(x) + B(x)) ≠ median(A(x)) + median(B(x))

Illustrate this to yourself by performing smoothing and pixel addition (in the order indicated on each
side of the above equation!) on a set of test images. Carry out this experiment on some simple images.
References
R. Boyle and R. Thomas, Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp. 32-34.
E. Davies, Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap. 3.
Gaussian Smoothing
Common Names: Gaussian smoothing
Brief Description
The Gaussian smoothing operator is a 2-D convolution operator that is used to `blur' images and remove
detail and noise. In this sense it is similar to the mean filter, but it uses a different kernel that represents
the shape of a Gaussian (`bell-shaped') hump. This kernel has some special properties which are detailed
below.
How It Works
The Gaussian distribution in 1-D has the form:

    G(x) = (1 / (√(2π) σ)) e^(−x² / (2σ²))

where σ is the standard deviation of the distribution. We have also assumed that the distribution has a
mean of zero (i.e. it is centered about the line x=0). The distribution is illustrated in Figure 1. In 2-D,
an isotropic (i.e. circularly symmetric) Gaussian has the form:

    G(x,y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))

The idea of Gaussian smoothing is to use this 2-D distribution as a `point-spread' function, and this is
achieved by convolution. Since the image is stored as a collection of discrete pixels we need to produce
a discrete approximation to the Gaussian function before we can perform the convolution. In theory, the
Gaussian distribution is non-zero everywhere, which would require an infinitely large convolution mask,
but in practice it is effectively zero more than about three standard deviations from the mean, and so we
can truncate the mask at this point. Figure 3 shows a suitable integer-valued convolution mask that
approximates a Gaussian with a σ of 1.4.
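Constructing such a mask can be sketched as follows (illustrative; we truncate at three standard deviations as discussed, and normalise the weights to sum to 1 rather than scaling them to integers as Figure 3 does):

```python
import numpy as np

def gaussian_kernel(sigma):
    """Discrete Gaussian mask sketch: sample the 2-D Gaussian on an
    integer grid, truncating where it is effectively zero, at about
    three standard deviations from the mean, then normalise so the
    weights sum to 1 (so smoothing preserves overall brightness)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(x, x)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

# For sigma = 1.4 this truncation rule gives an 11x11 mask.
k = gaussian_kernel(1.4)
```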
Once a suitable mask has been calculated, then the Gaussian smoothing can be performed using standard
convolution methods. The convolution can in fact be performed fairly quickly since the equation for the
2-D isotropic Gaussian shown above is separable into x and y components. Thus the 2-D convolution
can be performed by first convolving with a 1-D Gaussian in the x direction, and then convolving with
another 1-D Gaussian in the y direction. (The Gaussian is in fact the only completely circularly
symmetric operator which can be decomposed in such a way.) Figure 4 shows the 1-D x component
mask that would be used to produce the full mask shown in Figure 3. The y component is exactly the
same but is oriented vertically.
Figure 4 One of the pair of 1-D convolution masks used to calculate the full mask shown
in Figure 3 more quickly.
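Separability can be exploited as sketched below (illustrative Python/NumPy; borders are handled crudely by `np.convolve`'s 'same' mode, and the truncation rule is the three-sigma one discussed above):

```python
import numpy as np

def gauss_1d(sigma):
    """1-D Gaussian mask, truncated at about three standard deviations
    and normalised to sum to 1."""
    r = int(np.ceil(3 * sigma))
    g = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    return g / g.sum()

def separable_gaussian(image, sigma):
    """Exploit separability: convolve every row with the 1-D mask,
    then every column, instead of performing one 2-D convolution.
    For an M x M mask this costs O(M) per pixel instead of O(M^2)."""
    g = gauss_1d(sigma)
    tmp = np.apply_along_axis(
        lambda row: np.convolve(row, g, mode='same'), 1, image.astype(float))
    return np.apply_along_axis(
        lambda col: np.convolve(col, g, mode='same'), 0, tmp)
```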
A further way to compute a Gaussian smoothing with a large standard deviation is to convolve an image
several times with a smaller Gaussian. While this is computationally complex, it can have applicability if
the processing is carried out using a hardware pipeline.
The Gaussian filter not only has utility in engineering applications. It is also attracting attention from
computational biologists because it has been attributed with some amount of biological plausibility, e.g.
some cells in the visual pathways of the brain often have an approximately Gaussian response.
The Gaussian outputs a `weighted average' of each pixel's neighbourhood, with the average weighted
more towards the value of the central pixels. This is in contrast to the mean filter's uniformly weighted
average. Because of this, a Gaussian provides gentler smoothing and preserves edges better than a
similarly sized mean filter.
One of the principal justifications for using the Gaussian as a smoothing filter is its frequency
response. Most convolution-based smoothing filters act as lowpass frequency filters. This means that
their effect is to remove high spatial frequency components from an image. The frequency response of a
convolution filter, i.e. its effect on different spatial frequencies, can be seen by taking the Fourier
transform of the filter. Figure 5 shows the frequency responses of a 1-D mean filter with width 7 and
also of a Gaussian filter with σ = 3.
Figure 5 Frequency responses of Box (i.e. mean) filter (width 7 pixels) and Gaussian
filter (σ = 3 pixels). The spatial frequency axis is marked in cycles per pixel, and hence
no value above 0.5 has a real meaning.
Both filters attenuate high frequencies more than low frequencies, but the mean filter exhibits
oscillations in its frequency response. The Gaussian on the other hand shows no oscillations. In fact, the
shape of the frequency response curve is itself (half a) Gaussian. So by choosing an appropriately sized
Gaussian filter we can be fairly confident about what range of spatial frequencies are still present in the
image after filtering, which is not the case with the mean filter. This has consequences for some edge
detection techniques, as mentioned in the section on zero crossings. (The Gaussian filter also turns out to
be very similar to the optimal smoothing filter for edge detection under the criteria used to derive the
Canny edge detector.)
We use to illustrate the effect of smoothing with successively larger and larger Gaussian filters.
shows the effect of filtering with a Gaussian of σ = 1.0 (and mask size 5×5).
shows the effect of filtering with a Gaussian of σ = 2.0 (and mask size 9×9).
shows the effect of filtering with a Gaussian of σ = 4.0 (and mask size 15×15).
We now consider using the Gaussian filter for noise reduction. For example, consider the image
which has been corrupted by Gaussian noise with a mean of zero and σ = 8. Smoothing this with a 5×5
Gaussian yields . (Compare this result with that achieved by the mean and median filters.)
Salt and pepper noise is more challenging for a Gaussian filter. Here we will smooth the image ,
which has been corrupted by 1% salt and pepper noise (i.e. individual bits have been flipped with
probability 1%). shows the result of Gaussian smoothing (using the same convolution as above).
(Compare this with the original .) Notice that much of the noise still exists and that, although it has
decreased in magnitude somewhat, it has been smeared out over a larger spatial region. Increasing the
standard deviation continues to reduce/blur the intensity of the noise, but also attenuates high frequency
detail (e.g. edges) significantly, as shown in . This type of noise is better reduced using median
filtering, conservative smoothing or Crimmins Speckle Removal.
Exercises
1. Starting from the Gaussian noise (mean 0, σ = 13) corrupted image , compute both mean
filter and Gaussian filter smoothing at various scales, and compare each in terms of noise
removal vs. loss of detail.
2. At how many standard deviations from the mean does a Gaussian fall to 5% of its peak value? On
the basis of this suggest a suitable square mask size for a Gaussian filter with σ = s.
3. Estimate the frequency response for a Gaussian filter by Gaussian smoothing an image, and
taking its Fourier transform both before and afterwards. Compare this with the frequency
response of a mean filter.
4. How does the time taken to smooth with a Gaussian filter compare to the time taken to smooth
with a mean filter for a mask of the same size? Notice that in both cases the convolution can be
sped up considerably by exploiting certain features of the kernel.
References
E. Davies, Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp. 42-44.
R. Gonzalez and R. Woods, Digital Image Processing, Addison-Wesley, 1992, p. 191.
R. Haralick and L. Shapiro, Computer and Robot Vision, Addison-Wesley, 1992, Vol. 1, Chap. 7.
Conservative Smoothing
Common Names: Conservative Smoothing
Brief Description
Conservative smoothing is a noise filtering technique which derives its name from the fact that it
employs a simple, fast filtering algorithm that sacrifices noise suppression power in order to preserve the
high spatial frequency detail (e.g. sharp edges) in an image. It is explicitly designed to remove noise
spikes -- i.e. isolated pixels of exceptionally low or high pixel intensity (e.g. salt and pepper noise) and
is, therefore, less effective at removing additive noise (e.g. Gaussian noise) from an image.
How It Works
Like most noise filters, conservative smoothing operates on the assumption that noise has a high spatial
frequency and, therefore, can be attenuated by a local operation which makes each pixel's intensity
roughly consistent with those of its nearest neighbours. However, whereas mean filtering accomplishes
this by averaging local intensities and median filtering by a non-linear rank selection technique,
conservative smoothing simply ensures that each pixel's intensity is bounded within a range of
intensities defined by its neighbours.
This is accomplished by a procedure which first finds the minimum and maximum intensity values of all
the pixels within a windowed region around the pixel in question. If the intensity of the central pixel lies
within the intensity range spread of its neighbours, it is passed onto the output image unchanged.
However, if the central pixel intensity is greater than the maximum value, it is set equal to the maximum
value; if the central pixel intensity is less than the minimum value, it is set equal to the minimum value.
Figure 1 illustrates this idea.
Figure 1 Conservatively smoothing a local pixel neighbourhood. The central pixel of this
figure contains an intensity spike (intensity value 150). In this case, conservative
smoothing replaces it with the maximum intensity value (127) selected amongst those of
its 8 nearest neighbours.
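The procedure described above can be sketched as follows (illustrative; here, as in Figure 1, the centre pixel itself is excluded when computing the neighbourhood range):

```python
import numpy as np

def conservative_smooth(image, size=3):
    """Conservative smoothing sketch: clamp each pixel into the
    [min, max] range of its neighbours. Pixels already inside the
    range pass through unchanged; intensity spikes are pulled back
    to the nearest bound. Borders are left unchanged."""
    a = size // 2
    out = image.copy()
    for i in range(a, image.shape[0] - a):
        for j in range(a, image.shape[1] - a):
            window = image[i - a:i + a + 1, j - a:j + a + 1].astype(float).copy()
            window[a, a] = np.nan            # exclude the centre pixel
            lo = np.nanmin(window)
            hi = np.nanmax(window)
            out[i, j] = min(max(float(image[i, j]), lo), hi)
    return out

# The hypothetical neighbourhood of Figure 1: a spike of 150 whose
# neighbours range from 115 to 127, so it is clamped to 127.
nb = np.array([[124, 126, 127],
               [120, 150, 125],
               [115, 119, 123]])
```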
If we compare the result of conservative smoothing on the image segment of Figure 1 with the result
obtained by mean filtering and median filtering, we see that it produces a more subtle effect than both
the former (whose central pixel value would become 125) and the latter (124). Furthermore,
conservative smoothing is less corrupting at image edges than either of these noise suppression filters.
Consider the image which has been corrupted with Gaussian noise with mean 0 and standard deviation 13.
is the result after mean filtering with a 3×3 kernel. Comparing this result with the original image
, it is obvious that in suppressing the noise, edges were blurred and detail was lost.
This example illustrates a major limitation of linear filtering, namely that a weighted average smoothing
process tends to reduce the magnitude of an intensity gradient. Rather than employing a filter which
inserts intermediate intensity values between high contrast neighbouring pixels, we can employ a non-
linear noise suppression technique, such as the median filtering or conservative smoothing, to preserve
spatial resolution by re-using pixel intensity values already existent in the original image. For example,
consider which is the Gaussian noise corrupted image considered above passed through a median
filter with a 3×3 kernel. Here, noise is dealt with less effectively, but detail is better preserved than in the
case of mean filtering.
If we classify smoothing filters along this Noise Suppression vs. Detail Preservation continuum,
conservative smoothing would be rated near the tail end of the former category. shows the same
image conservatively smoothed, using a 3×3 neighbourhood. Maximum high spatial frequency detail is
preserved, but at the price of noise suppression. Conservative smoothing is unable to reduce much
Gaussian noise as individual noisy pixel values do not vary much from their neighbours.
The real utility of conservative smoothing (and median filtering) is in suppressing salt and pepper, or
impulse, noise. A linear filter cannot totally eliminate impulse noise, as a single pixel which acts as an
intensity spike can contribute significantly to the weighted average of the filter. Non-linear filters
can be robust to this type of noise because single outlier pixel intensities can be eliminated entirely.
For example, consider which has been corrupted by 1% salt and pepper noise (i.e. bits have been
flipped with probability 1%). After mean filtering, the image is still noisy, as shown in . After
median filtering, all noise is suppressed, as shown in . Conservative smoothing produces an image
which still contains some noise in places where the pixel neighbourhoods were contaminated by more
than one intensity spike . However, no image detail has been lost -- e.g. notice how conservative
smoothing is the only operator which preserved the reflection in the subject's eye.
Conservative smoothing works well for low levels of salt and pepper noise. However, when the image
has been corrupted such that more than 1 pixel in the local neighbourhood has been affected,
conservative smoothing is less successful. For example, smoothing the image which has been
infected with 5% salt and pepper noise (i.e. bits flipped with probability 5%), yields . (The original
image is .) Compare this result to that achieved by smoothing with a 3×3 median filter . You
may also compare the result achieved by conservative smoothing to that obtained with 10 iterations of
the Crimmins Speckle Removal algorithm . Notice that although the latter is effective at noise
removal, it smoothes away so much detail that it is of little more general utility than the conservative
smoothing operator on images badly corrupted by noise.
Exercises
1. Explore the effects of conservative smoothing on images corrupted by increasing amounts of
Gaussian noise. At what point does the algorithm become incapable of producing significant
noise suppression?
2. Compare the effectiveness of conservative smoothing, median filtering and Crimmins Speckle
Removal on an image corrupted by low levels (e.g. 0.1%) of salt and pepper noise. Use the image and use the
original as a benchmark for assessing which algorithm reduces the most noise whilst
preserving image detail. (Note, you should not need more than 8 iterations of Crimmins to clean
up this image.)
3. As the size of the kernel increases, the magnitude of intensity gradients in the original image is
decreased after low-pass filtering (e.g. after smoothing with a mean filter). Consider the effects of
increasing the neighbourhood size used by the conservative smoothing algorithm. Does this trend
exist? Could repeated calls to the conservative smoothing operator yield increased smoothing?
Figure 2 Five different structuring elements, for use in exercise 3. These local
neighbourhoods can be used in conservative smoothing by moving the central
(white) portion of the structuring element over the image pixel of interest and then
computing the maximum and minimum (and, hence the range of) intensities of the
image pixels which are covered by the blackened portions of the structuring
element. Using this range, a pixel can be conservatively smoothed as described in
this worksheet.
References
R. Boyle and R. Thomas, Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp. 32-34.
Crimmins Speckle Removal
Brief Description
Crimmins Speckle Removal reduces speckle from an image using the Crimmins complementary hulling
algorithm. The algorithm has been specifically designed to reduce the intensity of salt and pepper noise
in an image. Increased iterations of the algorithm yield increased levels of noise removal, but also
introduce a significant amount of blurring of high frequency details.
How It Works
Crimmins Speckle Removal works by passing an image through a speckle removing filter which uses
the complementary hulling technique to reduce the speckle index of that image. The algorithm uses a
non-linear noise reduction technique which compares the intensity of each pixel in an image with those
of its 8 nearest neighbours and, based upon the relative values, increments or decrements the value of the
pixel in question such that it becomes more representative of its surroundings. The noisy pixel alteration
(and detection) procedure used by Crimmins is more complicated than the ranking procedure used by the
non-linear median filter. It involves a series of pairwise operations in which the value of the `middle'
pixel within each neighbourhood window is compared, in turn, with each set of neighbours (N-S, E-W,
NW-SE, NE-SW) in a search for intensity spikes. The operation of the algorithm is illustrated in Figure
1 and described in more detail below.
For each iteration and for each pair of pixel neighbours, the entire image is sent to a Pepper Filter and a
Salt Filter as shown above. In the example case, the Pepper Filter is first called to determine whether
each image pixel is darker_than its northern neighbour - i.e. by more than 2 intensity levels.
Comparisons where this condition proves true cause the intensity value of the pixel under examination to
be incremented (i.e. lightened); otherwise no change is effected. Once these changes have been
recorded, the entire image is passed through the Pepper Filter again and the same series of comparisons
is made between the current pixel and its southern neighbour. This sequence is repeated by the Salt
Filter, where the conditions lighter_than and darken are, again, instantiated using 2 intensity levels.
Note that, over several iterations, the effects of smoothing in this way propagate out from the intensity
spike to affect neighbouring pixels. In other words, the algorithm smoothes by reducing the magnitude
of a locally inconsistent pixel, as well as increasing the magnitude of pixels in the neighbourhood
surrounding the spike. It is important to notice that a spike is defined here as a pixel whose value is more
than 2 intensity levels different from its surroundings. This means that after 2 iterations of the algorithm,
the immediate neighbours of such a spike may themselves become spikes with respect to pixels lying in
a wider neighbourhood.
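A much-simplified sketch of a single North-South pepper pass as described above follows (illustrative only; the full Crimmins algorithm applies a longer schedule of comparisons over all four direction pairs, together with the complementary salt pass, and the increment size here is an assumption):

```python
import numpy as np

def pepper_pass_ns(image):
    """Simplified North-South pepper-filter sketch: a pixel darker than
    its northern neighbour by more than 2 intensity levels is lightened
    by 1; once those changes are recorded, the same test is made
    against the southern neighbour. The complete algorithm repeats
    this for the E-W, NW-SE and NE-SW pairs, and a symmetric salt
    filter darkens bright spikes instead."""
    out = image.astype(int)
    # compare each pixel (rows 1..end) with its northern neighbour
    north = np.roll(out, 1, axis=0)
    out[1:][north[1:] - out[1:] > 2] += 1
    # then, on the updated image, with its southern neighbour
    south = np.roll(out, -1, axis=0)
    out[:-1][south[:-1] - out[:-1] > 2] += 1
    return out
```

Note how a single dark spike gains 2 intensity levels per N-S pass (one from each comparison), which is consistent with the gradual spike reduction described below.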
We begin examining the Crimmins Speckle Removal algorithm using the image , which is a
contrast stretched version of . We corrupt this image with a small amount of salt and pepper
noise (i.e. the probability that a bit is flipped is p=0.1%) and then use several iterations of the
Crimmins Speckle Removal algorithm to clean it up. The results after 1, 4 and 8 iterations of the
algorithm are , , and , respectively. It took 8 iterations to produce the relatively
noise-free version shown in the last image.
In this case, it is instructive to examine the images where significant noise still exists. For example, we
can quantify what we see qualitatively - that the intensity of the speckle noise is decreasing with
increased iterations of the algorithm - by measuring the intensity values of a particular (arbitrarily
chosen) noisy pixel in each of the noisy images. If we zoom a small portion of (i) the original noisy
corrupted image , (ii) the speckle filtered image after 1 iteration and (iii) 4 iterations
, we find that the pepper intensity spike just under the eye takes on intensity values: 51, 67, and
115 respectively. This confirms what we would expect from an algorithmic analysis: each iteration of
Crimmins Speckle Removal reduces the magnitude of a noise spike by 16 intensity levels.
We can also see from this example that a noisy spike (i.e. any pixel whose magnitude differs from that
of its neighbours by more than 2 levels) is reduced by driving its pixel intensity value towards those of
its neighbours and driving the neighbouring values towards that of the spike (although the latter
phenomenon occurs rather more slowly). By increasing the number of iterations of the algorithm, we
increase the extent of this effect, and hence incur blurring. (If we kept increasing the number of
iterations, we would obtain an image with very little contrast, as all sharp gradients would be smoothed
down to a magnitude of 2 intensity levels.)
An extreme example of this can be demonstrated using the image , which has been corrupted by
salt and pepper noise with p=1% (the probability that a bit is flipped). (The original is .) In order
to remove all the noise, as shown in , 13 iterations of Crimmins Speckle Removal are required. Much detail has
been sacrificed. We can obtain better performance out of Crimmins Speckle Removal if we use fewer
iterations of the algorithm. However, because this algorithm reduces noise spikes by only a few intensity
levels at each iteration, we can only expect to remove the noise within a few iterations if the noise has
intensity values similar to those of the underlying image. For example, applying 8 iterations of Crimmins
Speckle Removal to the face corrupted with 5% salt noise (as shown in ) yields . Here the
snow has been removed from the light regions on the subject's face and sweater, but remains in areas
where the background is dark.
The foregoing discussion has pointed to the fact that the Crimmins Speckle Removal algorithm is most
useful on images corrupted by noise whose values are no more than a couple of intensity levels different
from those in the underlying image. For example, we can use Crimmins to smooth the Gaussian noise
corrupted image (zero mean and σ=8) . The result, after only 2 iterations, is shown in .
Below results are tabulated for other smoothing operators applied to this noisy image.
If we allow a little noise in the output of the Crimmins filter (though not as much as we see in some of
the above filter outputs), we can retain a good amount of detail, as shown in . If you now return to
examine the cropped and zoomed versions of the first series of examples in this worksheet, you can see
the Gaussian noise components being smoothed away after very few iterations (i.e. long before the more
dramatic noise spikes are reduced).
Exercises
1. How does the Crimmins algorithm reduce the spatial extent of pixel alteration in the region
around an intensity spike? (In other words, when the algorithm finds an isolated pepper spike
against a uniform light background, how do the conditions within the algorithmic specification
given above limit the amount of darkening that affects pixels outside the local neighbourhood of
the spike?)
2. Investigate the effects of Crimmins Speckle Removal on the image which has poor
contrast and a limited dynamic range centered in the middle of the greyscale spectrum. First filter
a p=3% salt and peppered version of this image. Then take the resultant image and contrast
stretch it using a cutoff fraction of 0.03. Compare your result to , which was filtered (and
noise corrupted - using p=3%) after contrast stretching. It took 11 iterations to produce the latter.
Why did it take fewer filtering iterations to remove the noise in your result? Why doesn't your
result look as good?
3. Corrupt the image with Gaussian noise with a large σ and then filter it using Crimmins
Speckle removal. Compare your results with that achieved by mean filtering, median filtering,
and conservative smoothing.
References
T. Crimmins The Geometric Filter for Speckle Reduction, Applied Optics, Vol. 24, No. 10, 15 May
1985.
Local Information
General advice about the local HIPR installation is available here
©1994 Bob Fisher, Simon Perkins, Ashley Walker and Erik Wolfart
Department of Artificial Intelligence
University of Edinburgh
UK
Frequency Filter
Common Names: Frequency Filters
Brief Description
Frequency filters process an image in the frequency domain. The image is Fourier transformed,
multiplied with the filter function and then re-transformed into the real domain. Attenuating high
frequencies results in a smoother image in the real domain, while attenuating low frequencies enhances
the edges.
All frequency filters can also be implemented in the spatial domain and, if there exists a simple mask for
the desired filter effect, it is computationally less expensive to perform the filtering in the real domain.
Frequency filtering is more appropriate if no straightforward mask can be found in the spatial domain.
How It Works
Frequency filtering is based on the Fourier Transform. (For the following discussion we assume some
knowledge about the Fourier Transform, so it is advantageous if you have already read the
corresponding worksheet.) The operator usually takes an image and a filter function in the Fourier
domain. This input image is then multiplied with the filter function in a pixel by pixel fashion:
G(k,l) = F(k,l) H(k,l)
where F(k,l) is the input image in the Fourier domain, H(k,l) is the filter function and G(k,l) is the
filtered image. To obtain the resulting image in real space, G(k,l) has to be re-transformed using the
inverse Fourier Transform.
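The whole pipeline - forward transform, pixel-by-pixel multiplication with H(k,l), inverse transform - might be sketched as follows using NumPy's FFT routines. The ideal lowpass mask shown here, and the convention of centring the spectrum before masking, are our own illustrative choices:

```python
import numpy as np

def frequency_filter(image, filter_func):
    """Filter an image in the frequency domain: Fourier transform,
    multiply pixel by pixel with H(k,l), inverse transform."""
    F = np.fft.fftshift(np.fft.fft2(image))    # F(k,l), zero frequency centred
    G = F * filter_func(image.shape)           # G(k,l) = F(k,l) H(k,l)
    return np.real(np.fft.ifft2(np.fft.ifftshift(G)))

def ideal_lowpass(shape, cutoff=0.3):
    """H(k,l) is 1 inside the cut-off radius and 0 outside; the cut-off
    is given as a fraction of the highest represented frequency."""
    rows, cols = shape
    k = np.arange(rows) - rows // 2
    l = np.arange(cols) - cols // 2
    radius = np.hypot(k[:, None], l[None, :])
    return (radius <= cutoff * min(rows, cols) / 2).astype(float)
```

For example, `smooth = frequency_filter(noisy, lambda s: ideal_lowpass(s, 0.3))` applies an ideal lowpass with a cut-off of 0.3.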
Since multiplication in Fourier space is identical to convolution in real space, all frequency
filters can in theory be implemented as spatial filters. However, in practice, the Fourier domain filter
function can only be approximated by the filtering mask in real space.
The form of the filter function determines the effects of the operator. There are basically three different
kinds of filters: lowpass, highpass and bandpass filters. A low-pass filter attenuates high frequencies and
retains low frequencies unchanged. The result in the real domain is equivalent to that of a smoothing
filter - as the blocked high frequencies correspond to sharp intensity changes, i.e. to the fine-scale details
and noise in the real space image.
A highpass filter, on the other hand, yields edge enhancement or edge detection in the real domain,
because edges contain many high frequencies. Areas of rather constant greylevel consist of mainly low
frequencies and are therefore suppressed.
A bandpass filter attenuates very low and very high frequencies, but retains a middle-range band of
frequencies. Bandpass filtering can be used to enhance edges (suppressing low frequencies) while
reducing the noise at the same time (attenuating high frequencies).
The simplest lowpass filter is the ideal lowpass. It suppresses all frequencies higher than the cut-off
frequency D0 and leaves smaller frequencies unchanged:
H(k,l) = 1 if sqrt(k² + l²) ≤ D0, and H(k,l) = 0 otherwise.
In most implementations, D0 is given as a fraction of the highest frequency represented in the Fourier
domain image.
The drawback of this filter function is a ringing effect that occurs along the edges of the filtered real
domain image. This phenomenon is illustrated in Figure 1, which shows the shape of the one-
dimensional filter in both the frequency and real domains for two different cut-off frequencies. We obtain the
shape of the two-dimensional filter by rotating these functions about the y-axis. As mentioned earlier,
multiplication in the Fourier domain corresponds to a convolution in the real domain. Due to the
multiple peaks in the ideal filter in the real domain, the filtered image produces ringing along intensity
edges in the real domain.
Better results can be achieved with a Gaussian shaped filter function. The advantage is that the Gaussian
has the same shape in real and Fourier space and therefore does not incur the ringing effect in the real
space of the filtered image. A commonly used discrete approximation to the Gaussian is the Butterworth
filter. Applying this filter in the frequency domain shows a similar result to the Gaussian smoothing in
the real domain. One difference is that the computational cost of the spatial filter increases with the
standard deviation (i.e. with the size of the filter mask), whereas the costs for a frequency filter are
independent of the filter function. Hence, the spatial Gaussian filter is more appropriate for narrow
lowpass filters, while the Butterworth filter is a better implementation for wide lowpass filters.
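A Butterworth lowpass of the kind described above might be written as follows. The parameterization - cut-off given as a fraction of the highest represented frequency, and a filter order n - is an assumption for illustration:

```python
import numpy as np

def butterworth_lowpass(shape, cutoff=0.3, order=2):
    """Butterworth lowpass: H = 1 / (1 + (D/D0)^(2n)). The gentle
    roll-off avoids the sharp transition (and hence the ringing)
    of the ideal lowpass."""
    rows, cols = shape
    k = np.arange(rows) - rows // 2
    l = np.arange(cols) - cols // 2
    D = np.hypot(k[:, None], l[None, :])      # distance from zero frequency
    D0 = cutoff * min(rows, cols) / 2         # cut-off radius in samples
    return 1.0 / (1.0 + (D / D0) ** (2 * order))
```

The mask is 1 at the zero frequency and decays smoothly with distance; a higher order makes the transition steeper, approaching the ideal lowpass.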
The same principles apply to highpass filters. We obtain a highpass filter function by inverting the
corresponding lowpass filter, e.g. an ideal highpass filter blocks all frequencies smaller than the cut-off
frequency and leaves the others unchanged.
Bandpass filters are a combination of both lowpass and highpass filters. They attenuate all frequencies
smaller than a lower cut-off frequency and higher than an upper cut-off frequency, while the frequencies
between the two cut-offs remain in the resulting output image. We obtain the filter function of a
bandpass by multiplying the filter functions of a lowpass and of a highpass in the frequency domain,
where the cut-off frequency of the lowpass is higher than that of the highpass.
Instead of using one of the standard filter functions, we can also create our own filter mask, thus
enhancing or suppressing only certain frequencies. In this way we could, for example, remove periodic
patterns with a certain direction in the resulting real domain image.
We illustrate the filter's performance with . Corrupting this image with Gaussian noise with a zero mean
and a standard deviation of 8 yields . We can reduce the noise using a lowpass filter, because the noise consists
largely of high frequencies, which are attenuated by a lowpass filter.
is the result of applying an ideal lowpass filter to the noisy image with the cut-off frequency
being . Although we managed to reduce the high-frequency noise, this image is of no practical use. We
lost too many of the fine-scale details and the image exhibits strong ringing due to the shape of the ideal
lowpass filter.
Applying the same filter with a cut-off frequency of 0.5 yields . Since this filter keeps a greater
number of frequencies, more details remain in the output image. The image is less blurred, but also
contains more noise. The ringing is less severe, but still exists.
Better results can be achieved with a Butterworth filter. We obtain with a cut-off frequency of .
This image doesn't show any visible ringing and only little noise. However, it also lost some image
information, i.e. the edges are blurred and the image contains less details than the original.
In order to retain more details, we increase the cut-off frequency to 0.5, as can be seen in . This
image is less blurred, but also contains a reasonable amount of noise. In general, when using a lowpass
filter to reduce the high frequency noise, we have to compromise some desirable high frequency information as well.
The ringing effect originating from the shape of the ideal lowpass can be better illustrated on the
following artificial image. is a binary image of a rectangle. Filtering this image with an ideal
lowpass filter (cut-off frequency ) yields . The ringing is already recognizable in this image but
is much more obvious in which is obtained after a histogram equalization. The effect gets even
worse if we block more of the frequencies contained in the input image. In order to obtain we
used a cut-off frequency of . Apart from the (desired) smoothing, the image also contains severe
ringing which is clearly visible even without histogram equalization. We can also see that the cut-off
frequency directly determines the frequency of the ringing: halving the cut-off frequency doubles the
distance between two rings.
shows an image filtered with a Butterworth filter with a cut-off frequency of . In contrast to the
above examples, this image doesn't exhibit any ringing.
We will illustrate the effects of highpass frequency filtering using as well. As a result of
attenuating (or blocking) the low frequencies, areas of constant intensity in the input image are zero in
the output of the highpass filter. Areas of a strong intensity gradient, containing the high frequencies,
have positive and negative intensity values in the filter output. In order to display the image on the
screen, an offset is added to the output in the real domain and the image intensities are scaled. This
results in a middle grey-value for low frequency areas and dark and light values for the edges.
shows the output of a Butterworth highpass with the cut-off frequency being 0.5.
An alternative way to display the filter output is to take the absolute value of the filtered real domain
image. If we apply this method to the clown image (and threshold the result with 13) we obtain .
This image may be compared with , which is an edge image produced by the Sobel operator and,
thus, shows the absolute value of the edge magnitude. We can see that the Sobel operator detects the
edges better than the highpass filter. In general, spatial filters are more commonly used for edge
detection while frequency filters are more often used for high frequency emphasis. Here, the filter
doesn't totally block low frequencies, but magnifies high frequencies relative to low frequencies. This
technique is used in the printing industries for crispening image edges.
Frequency filters are quite useful when processing parts of an image which can be associated with
certain frequencies. For example, in each part of the house is made of stripes of a different
frequency and orientation. The corresponding Fourier Transform (after histogram equalization) can be
seen in . We can see the main peaks in the image corresponding to the periodic patterns in the real
space image which now can be accessed separately. For example, we can smooth the vertical stripes (i.e.
those components which make up the wall in the real space image) by multiplying the Fourier image
with the frequency mask . The effect is that all frequencies within the black rectangle are set to
zero, the others remain unchanged. Applying the inverse Fourier Transform and normalizing the
resulting image yields in the real domain. Although the image shows some regular patterns in the
formerly constant background, the vertical stripes are almost totally removed whereas the other patterns
remained mostly unchanged.
We can also use frequency filtering to achieve the opposite effect, i.e. finding all features in the image
with certain characteristics in the frequency domain. For example, if we want to keep the vertical stripes
(i.e. the wall) in the above image, we can use as a mask. To perform the frequency filtering we
transform both the image of the house and the mask into the Fourier domain where we multiply the two
images with the effect that the frequencies occurring in the mask remain in the output while the others
are set to zero. Re-transforming the output into the real space and normalizing it yields . In this
image, the dominant pattern is the one defined by the mask. The pixel values are the highest at places
which were composed of this vertical pattern in the input image and are zero in most of the background
areas. It is now possible to identify the desired area by applying a threshold, as can be seen in . To
understand this process we should keep in mind that a multiplication in the Fourier domain is identical to
a convolution in the real domain.
Frequency filters are also commonly used in image reconstruction. Here, the aim is to remove the effects
of a non-ideal imaging system by multiplying the image in the Fourier space with an appropriate
function. The easiest method, called inverse filtering, is to divide the image in the Fourier space with the
optical transfer function (OTF). We illustrate this technique, also known as deconvolution, using .
We simulate a non-ideal OTF by multiplying the Fourier Transform of the image with the Fourier
Transform of a Gaussian image with a standard deviation of 5. Re-transforming the result into the real
domain yields the blurred image . We can now reconstruct the original image using inverse
filtering by taking the Fourier Transform of the blurred image and dividing it by the Fourier Transform
of the Gaussian mask, which was used to blur the image initially. The reconstructed image is shown in .
Although we obtain, in the above case, exactly the original image, this method has two major problems.
First, it is very sensitive to noise. If we, for example, add 0.1% spike noise to the blurred image, we
obtain . Inverse filtering the image (as described above) using this image in order to de-blur yields
the low contrast result . (Note that doing contrast enhancement to emphasize the original image
features can produce an image very similar to the original, except for a loss of fine details). The situation
can be slightly improved if we ignore all values of the Fourier space division in which the divisor (i.e.
the value of the OTF) is below a certain threshold. The effect of using a threshold of 3 can be seen in
. However, if we increase the threshold we have to discard more of the Fourier values and
therefore lose more image information. Hence, we will be less successful in reconstructing the original
image.
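Thresholded inverse filtering of this kind might be sketched as follows, assuming the blurred image and the OTF are available as arrays of the same size; the default threshold value is purely illustrative:

```python
import numpy as np

def inverse_filter(blurred, otf, threshold=0.01):
    """Divide the blurred image by the OTF in the Fourier domain,
    skipping frequencies where |OTF| falls below the threshold
    (those frequencies simply stay unrecovered)."""
    B = np.fft.fft2(blurred)
    H = np.asarray(otf, dtype=complex)
    restored = np.zeros_like(B)
    ok = np.abs(H) >= threshold               # safe to divide here
    restored[ok] = B[ok] / H[ok]
    return np.real(np.fft.ifft2(restored))
```

As discussed above, the threshold trades noise amplification against lost image information: raising it discards more Fourier values and degrades the reconstruction.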
The second problem with this image restoration method is that we need to know the OTF which
corrupted the image in the first place. If we, for example, blur the image by convolving it with the
Gaussian image in the real space, we obtain . Although this should theoretically be the same
image as obtained from the multiplication in the Fourier Space, we obtain small differences due to
quantization errors and effects around the border of an image when convolving it in real space.
Reconstructing the original image by dividing the blurred image in the Fourier space by the Fourier
Transform of the Gaussian therefore no longer recovers the original exactly.
We face a similar problem if we want to deconvolve a real blurred image like . Since we do not
know the transfer function which caused the blurring, we have to estimate it. and are the
results of estimating the OTF with a Gaussian image with a standard deviation of 3 and 10, respectively,
and applying inverse filtering with a minimum OTF threshold of 10. We can see that the image has
improved only very little, if at all.
Due to the above problems, in most practical cases more sophisticated reconstruction methods are used.
For example, Wiener filtering and Maximum Entropy filtering are two techniques which are based on the
same principle as inverse filtering, but produce better results on real world images.
Finally, frequency filtering can also be used for pattern matching. For example, we might want to find
all locations in which are occupied by a certain letter, say X. To do this, we need an image of an
isolated X which can act as a mask, in this case . To perform the pattern matching, we transform
both image and mask into the Fourier space and multiply them. We apply the inverse Fourier Transform
to the resulting Fourier image and scale the output to obtain in the real domain. This image is
(theoretically) identical to the result of convolving image and mask in the real space. Hence, the image
shows high values at locations in the image which match the mask well. However, apart from the X,
there are also other letters, like the R and the K, which match well. In fact, if we threshold the image at
255 (as can be seen in ) we get two locations indicating an X, one of them being incorrect.
Since we multiplied the two complex Fourier images, we also changed the phase of the original text
image. This results in a constant shift between the position of the letter in the original and its response in
the processed image. The example shows that this straightforward method runs into problems if we want
to distinguish between similar patterns or if the mask and the corresponding pattern in the data differ
slightly. Another problem includes the fact that this operation is neither rotation- nor scale invariant.
(Note that we also run into these problems if we implement the operation as a simple convolution in the
real space.) The size of the pattern determines whether it is better to perform the matching in the real or
frequency domain. In our case (the letter was approximately 10×20 pixels), it is substantially faster to do
the matching in the frequency domain.
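As an illustration of Fourier-domain matching, the following sketch multiplies by the complex conjugate of the template spectrum, which yields correlation rather than the plain multiplication (convolution) described above and so avoids the constant phase shift mentioned earlier. This is a common variant, not the exact method used in the example:

```python
import numpy as np

def fft_match(image, template):
    """Locate a template by cross-correlation in the Fourier domain.
    Multiplying by the complex conjugate of the template spectrum
    gives correlation rather than convolution, so the response peak
    lands at the template's true position."""
    F = np.fft.fft2(image)
    T = np.fft.fft2(template, s=image.shape)  # zero-pad to image size
    score = np.real(np.fft.ifft2(F * np.conj(T)))
    return np.unravel_index(np.argmax(score), score.shape)
```

Like the method in the text, this is neither rotation- nor scale-invariant, and similar patterns can still produce competing peaks.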
The above method might be modified in the following way: instead of multiplying the Fourier
Transforms of the image and the mask as a first step, we threshold the Fourier image of the mask to
identify the most important frequencies which make up the letter X in the real space. For example,
scaling the Fourier magnitude of the above mask to 255 and thresholding it at a value of 10 yields all the
frequencies with at least 4% of the peak magnitude, as can be seen in . Now, we multiply this
modified mask with the Fourier image of the text, thus retaining only frequencies which also appear in
the letter X. Inverse Fourier Transforming this image yields . We can see that the X is the letter
which preserved its shape the best and also has the highest intensity values. Thresholding this image yields .
Exercises
1. Apply median, mean and Gaussian smoothing to and compare the results with the images
obtained via lowpass filtering.
2. Add `salt and pepper' noise to and then enhance the resulting image using a lowpass filter.
Which method would be more suitable and why?
3. Remove single parts from (e.g. a window, the roof or the wall) by creating an appropriate
mask and multiplying it with the Fourier Transform of the image.
References
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, Chap 9.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992,
Chap 4.
IEEE Trans. Circuits and Syst. Special Issue on Digital Filtering and Image Processing, Vol CAS-2,
1975.
Laplacian/Laplacian of Gaussian
Common Names: Laplacian, Laplacian of Gaussian, LoG, Marr Filter
Brief Description
The Laplacian is a 2D isotropic measure of the 2nd spatial derivative of an image. The Laplacian of an
image highlights regions of rapid intensity change and is therefore often used for edge detection (see
zero crossing edge detectors). The Laplacian is often applied to an image that has first been smoothed
with something approximating a Gaussian smoothing filter in order to reduce its sensitivity to noise, and
hence the two variants will be described together here. The operator normally takes a single greylevel
image as input and produces another greylevel image as output.
How It Works
The Laplacian L(x,y) of an image having pixel intensity values I(x,y) is given by:
L(x,y) = ∂²I/∂x² + ∂²I/∂y²
Since the input image is represented as a set of discrete pixels, we have to find a discrete convolution
mask that can approximate the second derivatives in the definition of the Laplacian. Three commonly
used small masks are shown in Figure 1.
Figure 1 Three commonly used discrete approximations to the Laplacian filter. (Note: we
have defined the Laplacian using a negative peak because this is more common; however,
it is equally valid to use the opposite sign convention.)
Using one of these masks, the Laplacian can be calculated using standard convolution methods.
However, because these masks are approximating a second derivative measurement on the image, they
are very sensitive to noise. To counter this, the image is often Gaussian smoothed before applying the
Laplacian filter. This pre-processing step reduces the high frequency noise components prior to the
differentiation step.
In fact, since the convolution operation is associative, we can convolve the Gaussian smoothing filter
with the Laplacian filter first of all, and then convolve this hybrid filter with the image to achieve the
required result. Doing things this way has two advantages:
● Since both the Gaussian and the Laplacian masks are usually much smaller than the image, this
method usually requires far fewer arithmetic operations.
● The LoG (`Laplacian of Gaussian') mask can be precalculated in advance so only one
convolution needs to be performed at run-time on the image.
The 2D LoG function centered on zero and with Gaussian standard deviation σ has the form:
LoG(x,y) = -(1/(πσ⁴)) (1 - (x² + y²)/(2σ²)) exp(-(x² + y²)/(2σ²))
Figure 2 The 2D Laplacian of Gaussian (LoG) function. The x and y axes are marked in
standard deviations (σ).
A discrete mask that approximates this function (for a Gaussian σ of 1.4) is shown in Figure 3.
Note that as the Gaussian is made increasingly narrow, the LoG mask becomes the same as the simple
Laplacian masks shown in Figure 1. This is because smoothing with a very narrow Gaussian (σ < 0.5
pixels) on a discrete grid has no effect. Hence on a discrete grid, the simple Laplacian can be seen as a
limiting case of the LoG for narrow Gaussians.
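A discrete LoG mask like the one in Figure 3 can be generated by sampling the continuous function directly. The following sketch uses the negative-peak convention of Figure 1 and ignores the overall scale factor; the zero-sum re-centring step is our own addition:

```python
import numpy as np

def log_kernel(sigma, size=None):
    """Sample the continuous LoG (negative-peak convention, up to an
    overall scale factor) on a discrete grid, then re-centre so the
    entries sum to zero and the mask gives no response on constant
    regions."""
    if size is None:
        size = int(2 * np.ceil(3 * sigma)) + 1   # cover about +/- 3 sigma
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    s2 = sigma ** 2
    k = (x**2 + y**2 - 2 * s2) / s2**2 * np.exp(-(x**2 + y**2) / (2 * s2))
    return k - k.mean()
```

Because the mask sums to zero, convolving it with any region of uniform intensity produces zero output, which is the behaviour expected of a second-derivative operator.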
Figure 4 Response of 1D LoG filter to a step edge. The left hand graph shows a 1D
image, 200 pixels long, containing a step edge. The right hand graph shows the response
of a 1D LoG filter with Gaussian σ = 3 pixels.
is the result of applying a LoG filter with Gaussian σ = 1.0. A 7×7 mask was used. Note that the
output contains negative and non-integer values, so for display purposes, the image has been normalized
to the range 0-255.
If a portion of the filtered, or gradient, image is added to the original image, then the result will be to
make any edges in the original image much sharper and more contrasty. This is commonly used as an
enhancement technique in remote sensing applications.
is the effect of applying an LoG filter with Gaussian σ = 1.0, again using a 7×7 mask.
Finally, is the result of combining (i.e. subtracting) the filtered image and the original image. Note
that the filtered image had to be suitably scaled before combining in order to produce a sensible
enhancement. Also, it may be necessary to translate the filtered image by half the width of the
convolution mask in both the x and y directions in order to register the images correctly.
The enhancement has made edges sharper but has also increased the effect of noise. If we simply filter
the image with a Laplacian (i.e. use a LoG filter with a very narrow Gaussian) we obtain .
Performing edge enhancement using this sharpening image yields the noisy result . (Note that
unsharp filtering may produce an equivalent result since it can be defined by adding the negative
Laplacian image (or any suitable edge image) onto the original.) Conversely, widening the Gaussian
smoothing component of the operator can reduce some of this noise, but, at the same time, the
enhancement effect becomes less pronounced.
The fact that the output of the filter passes through zero at edges can be used to detect those edges. See
the section on zero crossing edge detection.
Note that since the LoG is an isotropic filter, it is not possible to directly extract edge orientation
information from the LoG output in the same way that it is for other edge detectors such as the Roberts
cross and Sobel operators.
Convolving with a mask such as the one shown in Figure 3 can very easily produce output pixel values
that are much larger than any of the input pixel values, and which may be negative. Therefore it is
important to use an image type (e.g. floating point) that supports negative numbers and a large range in
order to avoid overflow or saturation. The mask can also be scaled down by a constant factor in order to
reduce the range of output values.
Common Variants
It is possible to approximate the LoG filter with a filter that is just the difference of two differently sized
Gaussians. Such a filter is known as a DoG filter (short for `Difference of Gaussians').
As an aside it has been suggested (Marr 1982) that LoG filters (actually DoG filters) are important in
biological visual processing.
An even cruder approximation to the LoG (but much faster to compute) is the DoB filter (`Difference of
Boxes'). This is simply the difference between two mean filters of different sizes. It produces a kind of
squared off approximate version of the LoG.
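A DoB filter can be sketched as the difference of two box (mean) filters. The kernel sizes and the wrap-around border handling below are illustrative choices:

```python
import numpy as np

def mean_filter(img, size):
    """Box (mean) filter implemented separably; borders are handled
    by wrap-around for brevity."""
    out = img.astype(float)
    for axis in (0, 1):
        acc = np.zeros_like(out)
        for shift in range(-(size // 2), size // 2 + 1):
            acc += np.roll(out, shift, axis=axis)
        out = acc / size
    return out

def dob(img, small=3, large=7):
    """Difference of Boxes: narrow mean minus wide mean, a crude
    approximation to the LoG (with a positive centre, i.e. the
    negated convention of Figure 1)."""
    return mean_filter(img, small) - mean_filter(img, large)
```

Like the LoG, the result is zero on uniform regions and responds to isolated spikes and edges, but with a squared-off response profile.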
Exercises
1. Try the effect of LoG filters using different width Gaussians on the image . What is the
general effect of increasing the Gaussian width? Notice particularly the effect on features of
different sizes and thicknesses.
2. Construct a LoG filter where the mask size is much too small for the chosen Gaussian width (i.e.
the LoG becomes truncated). What is the effect on the output? In particular, what do you notice
about the LoG output in different regions, each of uniform but different intensity?
3. Devise a rule to determine how big an LoG mask should be made in relation to the σ of the
underlying Gaussian if severe truncation is to be avoided.
4. If you were asked to construct an edge detector that simply looked for peaks (both positive and
negative) in the output from an LoG filter, what would such a detector produce as output from a
single step edge?
References
R. Haralick and L. Shapiro Computer and Robot Vision, Vol 1, Addison-Wesley Publishing Company,
1992, pp 346-351.
Unsharp Filter
Common Names: Unsharp Filter, Unsharp Sharpening Mask
Brief Description
The unsharp filter is a simple sharpening operator which derives its name from the fact that it enhances
edges (and other high frequency components in an image) via a procedure which subtracts an unsharp,
or smoothed, version of an image from the original image. The unsharp filtering technique is commonly
used in the photographic and printing industries for crispening edges.
How It Works
We can better understand the operation of the unsharp sharpening filter by examining its frequency
response characteristics. If we have a signal as shown in Figure 2(a), subtracting away the lowpass
component of that signal (as in Figure 2(b)) yields the highpass, or `edge', representation shown in
Figure 2(c).
This edge image can be used for sharpening if we add it back onto the original signal, as shown in
Figure 3.
That is, the sharpened signal is given by
Sharpened = Original + k × (Highpass component)
where k is a scaling constant. Reasonable values for k vary between 0.2 and 0.7, with larger values
providing increasing amounts of sharpening.
For example, consider the simple image object , whose strong edges have been slightly blurred
by camera focus. In order to extract a sharpened view of the edges, we smooth this image using a mean
filter (kernel size 3×3) and then subtract the smoothed result from the original image. The resulting
image is . (Note, the gradient image contains positive and negative values and, therefore, must
be normalized for display purposes.)
Because we subtracted off all low frequency components from the original image (i.e., we highpass
filtered the image) we are left with only high frequency edge descriptions. Normally, we would require
that a sharpening operator give us back our original image with the high frequency components
enhanced. In order to achieve this effect, we now add some proportion of this gradient image back onto
our original image. shows the image sharpened according to this formula, where the scaling
constant k is set to 0.7.
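The complete procedure just described (mean-smooth, subtract to obtain the gradient image, then add a scaled copy back onto the original) can be sketched in a few lines of Python with NumPy. The function names, the edge-replicating border policy and the default k=0.7 are illustrative assumptions, not part of the worksheet:

```python
import numpy as np

def mean3x3(image):
    """3x3 mean filter; border pixels are handled by edge replication (an assumed policy)."""
    p = np.pad(image.astype(float), 1, mode="edge")
    h, w = image.shape
    # Average the nine shifted copies of the padded image.
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def unsharp_sharpen(image, k=0.7):
    """Unsharp masking: original plus k times (original minus smoothed)."""
    image = image.astype(float)
    edges = image - mean3x3(image)   # the highpass "gradient" image
    return image + k * edges
```

On a blurred step edge this overshoots on the bright side and undershoots on the dark side, which is exactly the edge crispening described above.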
A more common way of implementing the unsharp mask is by using the negative Laplacian operator to
extract the highpass information directly. See Figure 5.
Some unsharp masks for producing an edge image of this type are shown in Figure 6. These are simply
negative, discrete Laplacian filters. After convolving an original image with a mask such as one of these,
it need only be scaled and then added to the original. (Note that in the Laplacian of Gaussian
worksheet, we demonstrated edge enhancement using the correct, or positive, Laplacian and LoG masks.
In that case, because the mask peak was positive, the edge image was subtracted, rather than added, back
onto the original.)
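The convolve-scale-add recipe can be sketched directly. The particular 4-neighbour negative Laplacian and the scaling constant k=0.3 below are assumptions chosen for illustration; the worksheet's Figure 6 masks are not reproduced here:

```python
import numpy as np

# A negative discrete Laplacian (positive centre peak). This common
# 4-neighbour form is an assumed stand-in for the Figure 6 masks.
NEG_LAPLACIAN = np.array([[ 0, -1,  0],
                          [-1,  4, -1],
                          [ 0, -1,  0]], dtype=float)

def convolve3x3(image, kernel):
    """Direct 3x3 convolution with edge padding (illustrative, not optimized)."""
    p = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(3):
        for j in range(3):
            # Convolution flips the kernel relative to correlation.
            out += kernel[2 - i, 2 - j] * p[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def laplacian_sharpen(image, k=0.3):
    """Scale the negative-Laplacian edge image and add it to the original."""
    return image.astype(float) + k * convolve3x3(image, NEG_LAPLACIAN)
```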
With this in mind, we can compare the unsharp and Laplacian of Gaussian filters. First, notice that the
gradient images produced by both filters (e.g. produced by unsharp and produced by
LoG) exhibit the side-effect of ringing, or the introduction of additional intensity image structure. (Note
also that the rings have opposite signs due to the difference in signs of the masks used in each case.)
This ringing occurs at high contrast edges. Figure 7 describes how oscillating (i.e. positive, negative,
positive, etc.) terms in the output (i.e. ringing) are induced by the oscillating terms in the filter.
Figure 7 Ringing effect introduced by the unsharp mask in the presence of a 2 pixel wide,
high intensity stripe. (Grey levels: -1=Dark, 0=Grey, 1=Bright.) a) 1-D input intensity
image slice. b) Corresponding 1-D slice through unsharp filter. c) 1-D output intensity
image slice.
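The ringing of Figure 7 can be reproduced numerically. The specific 1-D kernel [-1, 3, -1] (the identity plus a negative 1-D Laplacian) is an illustrative assumption, not the exact filter of Figure 7(b):

```python
import numpy as np

# A 2-pixel-wide bright stripe on a dark background, as in Figure 7(a).
signal = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])

# Identity [0, 1, 0] plus the negative 1-D Laplacian [-1, 2, -1]: a simple
# 1-D sharpening kernel (an assumed stand-in for the filter in Figure 7(b)).
kernel = np.array([-1.0, 3.0, -1.0])

output = np.convolve(signal, kernel, mode="same")
print(output)  # [ 0. -1.  2.  2. -1.  0.] -- negative dips flank the stripe (ringing)
```

The oscillating negative-positive-negative terms in the output are the ringing described in the text: the negative lobes of the kernel push the pixels adjacent to the stripe below the background level.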
Another interesting comparison of the two filters can be made by examining their edge enhancement
capabilities. Here we begin with reference to . shows the sharpened image produced by a
7×7 Laplacian of Gaussian. shows that due to unsharp sharpening with an equivalently sized
Laplacian. In comparing the unsharp mask defined using the Laplacian with the LoG, it is obvious that
the latter is more robust to noise, as it has been designed explicitly to remove noise before enhancing
edges. (Note that we can obtain a slightly less noisy, but also less sharp, image using a smaller (i.e. 3×3) kernel.)
The unsharp filter is a powerful sharpening operator, but does indeed produce a poor result in the
presence of noise. For example, consider which has been deliberately corrupted by Gaussian
noise. (For reference, is a mean filtered version of this image.) Now compare this (and the original
) with the output of the unsharp filter . The unsharp mask has accentuated the noise.
Common Variants
Adaptive Unsharp Masking
A powerful technique for sharpening images in the presence of low noise levels is adaptive
filtering. Here we look at a method of re-defining a highpass filter (such as the one shown in
Figure 8) as the sum of a collection of edge directional masks.
This filter can be re-written as times the sum of the eight edge sensitive masks shown in Figure 9.
Adaptive filtering using these masks can be performed by filtering the image with each mask, in turn,
and then summing those outputs which exceed a threshold. As a final step, this result is added to the
original image. (See Figure 10.)
This use of a threshold makes the filter adaptive in the sense that it overcomes the directionality of any
single mask by combining the results of filtering with a selection of masks -- each of which is tuned to
an edge directionality inherent in the image.
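The overall scheme (filter with each directional mask in turn, keep responses that exceed a threshold, sum them, and add the result to the original) can be sketched as follows. Since Figures 8-10 are not reproduced here, the eight Kirsch-style compass masks and the threshold value are assumptions standing in for the masks of Figure 9:

```python
import numpy as np

def rotate45(mask):
    """Rotate a 3x3 compass mask by 45 degrees (cycle the eight border entries)."""
    m = mask.copy()
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    vals = [mask[r, c] for r, c in ring]
    vals = vals[-1:] + vals[:-1]
    for (r, c), v in zip(ring, vals):
        m[r, c] = v
    return m

def compass_masks():
    """Eight directional masks (Kirsch-style stand-ins for the Figure 9 masks)."""
    m = np.array([[ 5,  5,  5],
                  [-3,  0, -3],
                  [-3, -3, -3]], dtype=float)
    masks = []
    for _ in range(8):
        masks.append(m)
        m = rotate45(m)
    return masks

def conv3x3(image, kernel):
    """Direct 3x3 convolution with edge padding."""
    p = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for i in range(3):
        for j in range(3):
            out += kernel[2 - i, 2 - j] * p[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def adaptive_sharpen(image, threshold=10.0):
    """Sum the directional responses that exceed the threshold, then add to the original."""
    total = np.zeros_like(image, dtype=float)
    for mask in compass_masks():
        response = conv3x3(image, mask)
        total += np.where(response > threshold, response, 0.0)
    return image.astype(float) + total
```

The thresholding step is what makes the filter adaptive: a mask tuned to the wrong direction produces a weak response that is discarded, so only masks aligned with a genuine edge contribute.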
Exercises
1. Consider the image , which, after unsharp sharpening (using a mean smoothing filter, with
kernel size 3×3) becomes . a) Perform unsharp sharpening on the raw image using a
Gaussian filter (with the same kernel size). How do the sharpened images produced by the two
different smoothing functions compare? b) Try re-sharpening this image using a filter with larger
kernel sizes (e.g. 5×5, 7×7 and 9×9). How does increasing the kernel size affect the result? c)
What would you expect to see if the mask size were allowed to approach the image size?
3. What result would you expect from an unsharp sharpening operator defined using a smoothing
filter (e.g. the median) which does not produce a lowpass image?
4. Enhance the edges of the 0.1% salt and pepper noise corrupted image using both the
unsharp and Laplacian of Gaussian filters. Which performs best under these conditions?
5. Investigate the response of the unsharp masking filter to edges of various orientations. (Some
useful example images include , and .) Compare your results with those
produced by adaptive unsharp sharpening.
References
R. Haralick and L. Shapiro, Computer and Robot Vision, Addison-Wesley Publishing Company, 1992.
R. Schalkoff, Digital Image Processing and Computer Vision, John Wiley & Sons, 1989, Chap. 4.
Local Information
General advice about the local HIPR installation is available here
The HIPR Copyright
While HIPR is intended to be used locally in a networked configuration using HTML viewers like
Netscape, this copyright notice restricts the range of access.
Under the current version of this notice, HIPR can only be used at the same site and organization where
the HIPR is installed. The geography of a site can be extended to cover a complete University campus,
and an organization can be as large as what is normally contained at a single University campus.
It is prohibited to make HIPR available over the WWW outside the site and organization.
(It's also stupid to do so because of the many image files involved. If you want to use HIPR effectively,
spend the small amount of money to get the source at your site.)
Welcome to HIPR!
You are looking at HIPR - The Hypermedia Image Processing Reference, a new source of on-line
assistance for users of image processing everywhere. If you are a new user then this section is intended
to help you explore the facilities of HIPR and so enable you to get started using it effectively as quickly
as possible.
Hypermedia vs Hardcopy
There are in fact two versions of HIPR --- a hypermedia version which must be viewed on a computer
screen, and a hardcopy (paper and ink) version which you can read like any other book. The first part of
this welcome is an introduction to using the hypermedia version, for those who have not used
hypermedia documents before.
If you are viewing the hypermedia version of HIPR, then what you are seeing now is the Welcome Page
of HIPR. You are viewing it with the aid of a piece of software called a hypermedia browser, probably
one called Netscape although others can be used just as easily. The central portion of the screen contains
this text and around its edges are various other buttons and menus that will be explained later. In fact
you will probably not be able to see the whole Welcome Page since it is quite large. To see more of the
page, look to the left or right of the text for a scroll-bar. Clicking in this with the left mouse button at
different points along its length will cause different parts of the Welcome page to be displayed. Try
clicking in the different parts of the bar with different mouse buttons to see what effect they have. When
you are happy with this method of moving around within a page, return to this point again.
You may also be able to move around a page by pressing keyboard keys, which you might prefer. If you
are using Netscape or Mosaic then <Space> will scroll the page forward one screenful, and
<BackSpace> or <Delete> will scroll backwards.
The Welcome Page is just one of many pages that make up HIPR. These pages are linked together using
hyperlinks. A hyperlink usually appears as a highlighted word or phrase that refers to another part of
HIPR, often to a place where an explanation of that word or phrase may be found. The magic of
hyperlinks is that simply clicking on this highlighted text with the mouse takes you to the bit of HIPR
that is being referred to. This is one of the most powerful features of HIPR, since it allows rapid
cross-references and explanations to be checked with the minimum of effort. In Netscape, hyperlinks appear
underlined by default.
Note that if you are reading the hardcopy version of HIPR, then hyperlinks simply appear in parentheses
as a cross reference to the relevant page.
If you are using the hypermedia version of HIPR then you can try this out right now. For instance, this
link merely takes you to the top of the Welcome Page. You can return here after trying out the link using
the scrollbar. You could have got there in the first place simply by using the scroll-bar, but sometimes a
hyperlink is more convenient even for just moving around within a page. On the other hand this link
(don't follow it until you've read the rest of this paragraph!) takes you to the Top-Level Page of HIPR,
which is where you will usually enter HIPR when you start using it for real. Near the top of that page is
a hyperlink titled `Welcome to HIPR!' which will bring you back here. Try it.
Hyperlinks don't have to be words or phrases --- they can also be images. For instance if you go to the
bottom of this page you will see a small icon with a picture of a house in it. Clicking on this will take
you to the Top-Level Page again. Incidentally, this button, which appears at the bottom of almost every
page in HIPR, is a good way of reorienting yourself if you get `lost in hyperspace'.
Hyperlinks don't always just take you to another chunk of text. Sometimes they cause other sorts of
information to be displayed such as pictures, or even movies. They can also cause sound clips to be
played.
Once you have mastered moving around within a page using the scroll-bar or short-cut keys, and moving
around between pages using hyperlinks, you know all that you need to start exploring HIPR by yourself
without getting lost.
What Next?
HIPR includes a complete user guide that contains much more information than this Welcome Page, and
you should familiarize yourself with the most important parts of this next. In the hardcopy version this
means at least the first two chapters, while in the hypermedia version, the same material is found in the
sections titled `What is HIPR?', `Guide to Contents' and `How to Use HIPR'. If you are using the
hypermedia version then all these sections can be accessed from the Top-Level Page of HIPR.
What is HIPR?
Description
The Hypermedia Image Processing Reference (HIPR) was developed at the Department of Artificial
Intelligence in the University of Edinburgh in order to provide a set of computer-based tutorial materials
for use in taught courses on image processing and machine vision.
The package provides on-line reference and tutorial information on a wide range of image processing
operations, extensively illustrated with actual digitized images, and bound together in a hypermedia
format for easy browsing, searching and cross referencing. Amongst the features offered by HIPR:
● Reference information on around 50 of the most common classes of image processing operations
in use today.
● Guidelines for the use of each operation, including their particular advantages and disadvantages,
and suggestions as to when they are appropriate.
● Example input and output images for each operation illustrating typical results. The images are
viewable on screen and are also available to the student as an image library for further
exploration using an image processing package.
● Encyclopedic glossary of common image processing concepts and terms, cross-referenced with
the image processing operation reference.
● Bibliographic information.
● Tables of equivalent operators for several common image processing packages: VISILOG,
Khoros, the Matlab image processing toolbox, and HIPS.
● Software and detailed instructions for editing and extending the structure of HIPR.
Motivations
The motivation behind HIPR is to bridge the gap between image processing textbooks which provide
good technical detail, but do not generally provide very high quality or indeed very many example
images; and image processing software packages which readily provide plenty of interactivity with real
images and real computers, but often lack much in the way of a tutorial component.
By providing example input and output images for all the image processing operations covered, and
making these easily available to the student through the use of hypermedia, HIPR presents image
processing in a much more `hands on' fashion than is traditional. It is the authors' belief that this
approach is essential for gaining real understanding of what can be done with image processing. In
addition, the use of hypertext structure allows the reference to be efficiently searched, and
cross-references can be followed at the click of a mouse button. Since the package can easily be provided over
a local area network, the information is readily available at any suitably equipped computer connected to
that network.
Another important goal of the package was that it should be usable by people using almost any sort of
computer platform, so much consideration has been given to portability issues. The package should be
suitable for many machine architectures and operating systems, including UNIX workstations,
PC/Windows and Apple Macintosh.
Guide to Contents
HIPR is split into three main parts.
User Guide
The user guide provides a wealth of information about how to use, install and extend the HIPR package.
It also describes in detail the structure of the package, and some of the motivations and philosophy
behind the design. In this section:
Introduction to HIPR
Welcome to HIPR!
Where to start if you're completely new to HIPR.
What is HIPR?
Introduction to the motivation and philosophy behind HIPR and a brief overview of the
structure.
Guide to Contents
What you're reading.
General Overview
General background information about how HIPR is organized.
Hypermedia Basics
An introduction to hypermedia and using hypermedia browsers.
How to Use HIPR
How to use HIPR effectively, illustrated with examples of typical tasks that users might
use HIPR for.
Advanced Topics
Filename Conventions
Describes the naming conventions used for the various types of files found in the HIPR
distribution.
Installation Guide
Instructions for installing HIPR on your system.
Local Information
This is a convenient place for the maintainer of your HIPR system to add local information about
the particular image processing setup you use.
The bulk of HIPR is in this section, which consists of detailed descriptions of around 50 of the most
commonly found image processing operations. The operations are grouped into nine categories:
Image Arithmetic
Applying the four standard arithmetic operations of addition, subtraction, multiplication and
division to images. Also Boolean logical operations on images.
Point Operations
Operations that simply remap pixel values without altering the spatial structure of an image.
Geometric Operations
Altering the shape and size of images.
Image Analysis
Statistical and other measures of image attributes.
Morphology
Operations based on the shapes of features in images.
Digital Filters
Largely operations that can be implemented using convolution.
Feature Detectors
Operations designed to identify and locate particular image features such as edges or corners.
Image Transforms
Changing the way in which an image is represented, e.g. representing an image in terms of the
spatial frequency components it contains.
Image Synthesis
Generating artificial images and adding artificial features to images.
Appendices
Additional reference information, including particularly the HIPR A to Z and the index.
Bibliography
Useful general references and texts for image processing and machine vision.
Acknowledgements
Our thanks to our many helpers.
Index
Main index for all of HIPR. The hypertext version includes `hyperlinks' to each indexed item.
Advanced Topics
The Directory Structure of HIPR
Images and Image Formats
Filename Conventions
Producing the Hardcopy Version of HIPR
Installing HIPR on Your System
Making Changes to HIPR
Local Information
General advice about the local HIPR installation is available here
Image Arithmetic
Introduction
Addition
Subtraction
Multiplication and Scaling
Division
Blending
Logical AND/NAND
Logical OR/NOR
Logical XOR/XNOR
Invert/Logical NOT
Bitshift Operators
Point Operations
Introduction
Thresholding
Adaptive Thresholding
Contrast Stretching
Histogram Equalization
Logarithm Operator
Exponential/`Raise to Power' Operator
Geometric Operations
Introduction
Scale
Rotate
Reflect
Translate
Affine Transformation
Image Analysis
Introduction
Intensity Histogram
Classification
Connected Components Labeling
Image Transforms
Introduction
Distance Transform
Fourier Transform
Hough Transform
Image Synthesis
Introduction
Noise Generation
A to Z of Image Processing Concepts
Common Software Implementations
The four packages we have chosen are: Visilog, Khoros, the Matlab Image Processing Toolbox and HIPS.
Information about these packages, and advice as to where they can be obtained is given below.
If your image processing software is not mentioned here then you will have to consult the
documentation that came with it for help on operator equivalents.
Note that while we have done our best to describe the contents of these packages accurately, it is
possible that we have made some omissions, or that the implementation/version that you are using is
different from ours. Where a package has more than one operator that does similar things to an operator
documented in HIPR, we have mentioned only the one we think is closest.
Visilog
Khoros
Matlab
HIPRscript Reference Manual
What is HIPRscript?
See the introductory section on Making Changes with HIPRscript for an introduction to the role of
HIPRscript in generating HIPR.
Almost 300 HIPRscript source files go together to make HIPR. In general, each HIPRscript source file
gives rise to one HTML file and one LaTeX file. Each HTML file corresponds to a `scrollable page' of
hypertext, while the LaTeX files are merged together to generate the hardcopy version of HIPR. There
are a few exceptions to this rule --- for instance the .loc source files used for entering local
information are included into other files.
To convert HIPRscript into HTML and LaTeX, a Perl program called hiprgen.pl is used. This
program can be found in the progs sub-directory. The effect of the program when run on a HIPRscript
source file is to generate corresponding HTML and LaTeX files in the appropriate directories.
To run hiprgen.pl you need to have at the very least a recent version of Perl installed on your
system. In addition, if you wish to have the program automatically generate equations, figures and
thumbnails as described below, then you will have to install additional utilities. The Installation Guide
has all the details.
Apart from the hpr, html and tex sub-directories, four other sub-directories are also important to the
running of HIPRscript.
The eqns sub-directory contains inline images representing equations for use in HTML documents.
These image files are generated automatically by hiprgen.pl from information in the HIPRscript
files and are incorporated into the displayed document by the HTML browser.
The figs sub-directory contains the images used for figures in two different formats: GIF for HTML
pages, and encapsulated PostScript for inclusion into LaTeX output. As a HIPRscript author you must
create the PostScript version of the figure yourself and put it in this directory. hiprgen.pl will then
create a matching GIF file automatically if one does not already exist.
The thumbs sub-directory contains `thumbnails' (miniature versions of images that are used for
imagelinks). These are normally generated automatically by hiprgen.pl from corresponding full-size
images.
The index sub-directory contains information used in generating the HIPR main index. The files in this
directory are created automatically by hiprgen.pl and should not be edited by hand.
For instance, on my UNIX system, in order to translate dilate.hpr into HTML and LaTeX, I type
(from within the src sub-directory):
The program will run and the appropriate HTML and LaTeX files will be generated. If there are any
errors, the translator will stop with an error message and no output files are generated. Error messages
are detailed in a later section.
\links{}{erode}{morops}
\index{Dilation}{\section{Dilation}}
\title{Morphology - Dilation}
\subsection{Brief Description}
Have a look at the dilation worksheet to see what this becomes (note that if you are using Netscape or
Mosaic then you might want to click on the link with the middle mouse button which will display the
worksheet in a separate window).
As with both HTML and LaTeX, a HIPRscript source file is an ASCII file containing a mixture of raw
text and tags which define how that raw text is to be displayed. Tags in HIPRscript have the following
syntax:
● Every tag begins with a backslash character: \.
● This is then followed by the tagname. Most tagnames contain only alphanumeric characters, but
there are also a few tagnames consisting of a single non-alphanumeric character. Note that there
is no space between the backslash and the tagname.
● Finally there comes the list of arguments associated with the tag. Each separate argument is
enclosed in its own pair of curly braces: {...}. Note that there must not be any whitespace
between arguments, or between the arguments and the tagname.
The simplest tags take no arguments. For instance the tag: \times produces a multiplication symbol
like this: ×.
Slightly more complicated are tags which take a single argument. An example of this is the \em tag
which produces emphasized italic text. For instance, the phrase emphasized italic text in the last sentence
was produced by the following HIPRscript fragment: \em{emphasized italic text}.
Finally, some tags take multiple arguments. A common example of this is the \ref tag which is used
for cross references to other HIPR pages. To create a reference to the worksheet on dilation for example,
we could write \ref{dilate}{This is a link to the dilation worksheet}, which
produces: This is a link to the dilation worksheet.
Many tags can be nested inside one another. If we modify the example in the last paragraph to \ref
{dilate}{This is a link to the \em{dilation} worksheet}, then we get: This is a
link to the dilation worksheet.
Arguments can usually contain newlines without any problems, but you should make sure that there are
no newlines between arguments belonging to the same tag. For instance:
\ref{dilate}{This is a link
to the dilation worksheet}
is fine, whereas:
\ref{dilate}
{This is a link to the dilation worksheet}
is not.
One very important point is that like LaTeX and HTML, HIPRscript pays very little attention to the
amount of whitespace (spaces, tabs and newlines) that you put in your document. One space between
words is treated exactly the same as one hundred spaces --- both will cause just a single space to be
displayed between the words.
Single newlines in sentences are also largely ignored. In general HIPRscript will decide for itself where
it wants to break lines in order to make them fit onto the page. Note: In actual fact this tricky decision is
taken care of by the magic of LaTeX and HTML browsers.
Two or more newlines in succession are treated as significant. They indicate that you want a paragraph
break at that point. This will normally cause a vertical space to be inserted between the preceding and
following lines on the page.
As mentioned earlier, tags that don't take any arguments are terminated by any non-alphanumeric
character. This can cause a problem if you do want to immediately follow such a tag with an
alphanumeric character without putting a space in between. For instance if you want to display `2×2',
you cannot write 2\times2 since \times2 is not a recognized tag and will cause an error. And if you
write 2\times 2 then what you get is `2× 2', i.e. with an extra unwanted space. The solution is to
terminate the tag with a caret. This special character terminates a tag without being printed itself. So
2\times^2 will produce the display you want. If you do want to follow a tag immediately with a caret
for some reason, then simply use two of them. The other time you might want to use a caret to terminate
a tag is if you want to put a backslash at the end of an argument. Without the caret, the final backslash
would escape the closing brace and hence cause HIPRscript to think that you haven't terminated the
argument properly. So you would write: ...\\^.
Since the backslash and braces are special characters, there are tags for displaying them normally. \\,
\{ and \} will display \, { and } respectively.
It is often useful to be able to write into HIPRscript files chunks of text that will be completely ignored
by the interpreter. Such a piece of text is known as a comment. They can be used to explain to anyone
reading the source file why you chose to say certain things or why you chose to express things in certain
ways. This can be useful if someone else wishes to work on the file after you have finished with it, or if
you wish to be reminded what you were doing last when you come back to it. Comments can also be
used to force the translator to ignore large chunks of your source file. This is particularly useful for
tracking down errors in your source file. There are two forms of comments in HIPRscript:
Line Comments
Anything on a line following a hash sign (#) is ignored, including the hash sign itself. If you want
a hash sign then use the \# tag.
Block Comments
The special tag \comment{...} simply causes everything in its single argument to be ignored.
This is good for commenting out large blocks of text in a single swoop.
#
Line comment - the remainder of the current line is ignored.
\\
Backslash: \
\#
Hash sign: #
\begin
Must be present right at the start of the first HIPR page --- sets up various LaTeX preliminaries
and creates the HIPR banner.
\blob or \blob{COLOUR}
Produces a small coloured blob in the HTML output only. The optional COLOUR argument can
take the values yellow or red. The default value is yellow.
\br
Forces a line break. Note that in a few situations this will cause a LaTeX error since LaTeX
doesn't like being told where to break lines. If this happens use the \html tag to ignore the line
break in the LaTeX output.
\chapter{HEADING}
Starts a new chapter with a title given by HEADING. This is the second largest document
division in HIPR.
\comment{TEXT}
Block comment --- simply ignores everything within its argument and produces no output.
\deg
Degree symbol: °
\dd{TEXT}
See the \description tag.
\description{DESCRIPTION_LIST}
Description list. One of three HIPRscript list types --- this one is used for lists where each list
item consists of a brief label (known as the topic) followed by a block of text (known as the
discussion). The argument of this tag must consist solely of alternate \dt and \dd tags. The
\dt tag comes first and its argument gives the Description Topic. The \dd tag then follows with
the Description Discussion as its argument. Any number of pairs of \dt and \dd tags may be
inserted.
\description{
\dt{HIPR}
\dd{The Hypermedia Image Processing Reference.}
\dt{HIPRscript}
\dd{The language used to create HIPR.}
}
HIPR
The Hypermedia Image Processing Reference.
HIPRscript
The language used to create HIPR.
Note that the \dd tag's argument can contain any other tags, including additional lists. The \dt
tag on the other hand should only contain raw text and appearance changing tags such as \em.
\dt{TEXT}
See the \description tag.
\eg
Inserts an italicized `e.g.': e.g.
\end
Must go at the end of the last page. Its principal purpose is to tell LaTeX that it has reached the
end.
\enumerate{ENUMERATE_LIST}
Enumerated list. One of three HIPRscript list types --- this one is used for ordered lists where
each item is to be numbered in consecutive order. HIPRscript will take care of the numbering
automatically. The argument of this tag consists of a mixture of text and \item tags. Each
\item tag encountered tells HIPRscript that a new list item is starting, complete with a new
number.
\enumerate{
\item Capture an image.
\item Choose a threshold value.
\item Apply thresholding to produce a binary image.
}
1. Capture an image.
2. Choose a threshold value.
3. Apply thresholding to produce a binary image.
\enumerate lists can contain most other sorts of tag, including other list tags. Usually a
different numbering scheme (e.g. roman numerals or letters) is used for nested \enumerate
lists.
\eqnd{EQUATION}{GIF_FILE} or \eqnd{EQUATION}{GIF_FILE}{SIZE}
Displayed equation. Used for mathematical equations which are to be set apart from the text that
describes them. The EQUATION argument describes the equation in LaTeX format, i.e. the
code that goes here is exactly what you would type to produce the equation in LaTeX. The
GIF_FILE argument is the name of a file (without the .gif suffix) in the eqns sub-directory
where an image representing that equation will be stored. It is possible to get HIPRscript to
generate the GIF file automatically from the EQUATION argument, but note that this involves
many extra complications which are described below in the subsection on Using Figure,
Equation and Imageref Tags. The optional SIZE argument can take values of small, large or
huge and determines the scaling of the displayed equation. The default is large.
\eqnl{EQUATION}{GIF_FILE}
Similar to the \eqnd tag except that it produces an in-line equation and takes no optional SIZE
argument.
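As an illustration (the GIF_FILE stems gauss1d and linear here are invented for the example), a displayed equation and an in-line equation might be coded as:

```
\eqnd{g(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{x^2}{2\sigma^2}}}{gauss1d}{small}

The fitted line \eqnl{y = mx + c}{linear} is then subtracted from the image.
```

The EQUATION argument is passed to LaTeX unchanged, so any valid LaTeX mathematics may appear there.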
\etc
Inserts an italicized `etc.': etc.
\fig{FIG_FILE}{CAPTION} or \fig{FIG_FILE}{CAPTION}{SCALE}
Include a figure at this point in the text. The FIG_FILE argument refers to two similarly named
files in the figs sub-directory that contain the image to be included in two different formats,
GIF and encapsulated PostScript. The FIG_FILE argument should specify just the stem-name of
the files. To this HIPRscript will add .eps for the PostScript file and .gif for the GIF file. It is
possible to get HIPRscript to generate the GIF file automatically from the PostScript file, but note
that this involves many extra complications which are described below in the subsection on
Using Figure, Equation and Imageref Tags. The CAPTION argument gives the text that will go
with the figure, and may contain appearance changing tags such as \em and also cross-reference
tags such as \ref. The optional SCALE argument gives the scaling of the figure. A size of 1
(the default) means that the image will be included at its `natural' size. A number less than this
will reduce the size, larger numbers will increase it. Note that HIPRscript will assign a figure
number to your figure automatically.
\figref{FIG_FILE}
Used to reference a particular figure in the text. The FIG_FILE argument should match up with
the FIG_FILE argument of a \fig tag somewhere in the same HIPRscript file. That will be the
figure to which the \figref refers. The visible effect of the tag is to insert the text: `Figure N'
where N is the number of the figure concerned.
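For example (the stem-name histfig and its caption are hypothetical), a figure and a reference to it elsewhere in the same file might look like:

```
\fig{histfig}{Intensity histogram of the input image, with the
chosen threshold marked.}{0.8}

As \figref{histfig} shows, the histogram is clearly bimodal.
```

In the output, the \figref{histfig} tag expands to `Figure N', where N is the number that HIPRscript assigned to the figure.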
\html{TEXT}
Process the TEXT argument as if it were normal HIPRscript, but only produce HTML output.
Nothing appears in the LaTeX output for this tag.
\ie
Inserts an italicized `i.e.': i.e.
\imageref{IMAGE_FILE} or \imageref{IMAGE_FILE}{MODE}
Refer to an image in the images sub-directory. The IMAGE_FILE argument should give the
name of the image file concerned, minus any file suffix such as .gif. The visible effect of this
tag depends upon whether you are looking at the HTML or the LaTeX output. In the HTML
output, by default it creates a hyperlink to the named image, in the form of a small thumbnail
image. The thumbnail image is in fact just a reduced size version of the full image and is found in
the thumbs sub-directory. It is possible to get HIPRscript to generate the thumbnail
automatically from the full size image, but note that this involves many extra complications
which are described below in the subsection on Using Figure, Equation and Imageref Tags. In
the LaTeX output, this tag simply prints the IMAGE_FILE argument in typewriter font.
The optional argument MODE can be used to alter the appearance of the HTML output slightly
(it doesn't affect the LaTeX though). Setting MODE to text means that the imagelink will
simply appear as the name of the image file rather than as a thumbnail. Setting MODE to both
causes the link to appear both as text and as a thumbnail. Setting MODE to thumbnail
produces the default behaviour.
\inc{FILE}{TEXT}
Includes the HIPRscript file named by the FILE argument plus a .hpr extension, into the current
file. In the HTML output, this inclusion appears as a hyperlink to the named file. The text of the
hyperlink is given by the TEXT argument. In the LaTeX output there is no such link --- the
named file is included as if its entire contents had been typed into the current file at that point.
The TEXT argument is ignored in the LaTeX output.
\inc2{FILE}{TEXT}
Includes the HIPRscript file named by the FILE argument plus a .hpr extension, into the current
file. This is rather like the \inc tag except that TEXT is printed in both the HTML and the
LaTeX output. In the HTML output, TEXT acts as a hyperlink to the included file. In the LaTeX
output, TEXT is merely printed immediately before the named file is included.
\index{ENTRY1}{ENTRY2}...{TEXT}
Creates an entry in the HIPR index. ENTRY1, ENTRY2 and so on give the names of topic
entries in the index. Each index entry may have up to three levels of nesting in order to specify
the entry precisely. An entry with more than one level of nesting is indicated by an ENTRY
argument containing one or two | symbols. The | symbols are used as separators between the
different levels in the entry.
For instance if we wanted an index entry to appear under the general category of `Edge detectors'
and then within that category, under `Canny', then we would have an entry argument that looked
like: {Edge detectors|Canny}.
Note that every ENTRY argument in the whole of HIPR must be unique. We can only have one
place in HIPR which has the index entry {Edge detectors|Canny} for instance. However,
we could have another entry for {Edge detectors|Roberts} for instance.
The TEXT argument indicates which chunk of HIPRscript is to be pointed to by the index entry
and can contain almost anything. It should not however contain any \target or \title tags.
A particular place in HIPR can have more than one index entry associated with it. Simply use as
many ENTRY arguments as you need.
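For instance, the following (sketched with an abbreviated TEXT argument; the second, single-level entry is invented for the example) attaches two index entries to the same chunk of HIPRscript:

```
\index{Edge detectors|Canny}{Hysteresis thresholding}{
The Canny operator uses hysteresis thresholding to track edges ...
}
```

A reader looking up either `Canny' under `Edge detectors', or `Hysteresis thresholding' on its own, is taken to the same place.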
\input{FILE}
Like the \inc tag, this tag includes a named file into the current one. This time however, the
effect for both HTML and LaTeX is as if the contents of the named file had been inserted directly
into the current file in the tag's place. No links are created.
\item
Marks the beginning of a list item in \enumerate and \itemize lists. It gives an error if
used anywhere else.
\itemize{ITEMIZE_LIST}
Itemized list. One of three HIPRscript list types --- this one is used for unordered lists where each
item is to be marked with a `bullet', but not numbered. The argument of this tag consists of a
mixture of text and \item tags. Each \item tag encountered tells HIPRscript that a new list
item is starting, and causes a bullet to be printed.
\itemize{
\item Available on-line.
\item Extensive cross-referencing.
\item Uses Netscape browser.
}
❍ Available on-line.
❍ Extensive cross-referencing.
❍ Uses Netscape browser.
\itemize lists can contain most other sorts of tag, including other list tags. Usually different
bullet styles are used for nested \itemize lists.
\latex{TEXT}
Process the TEXT argument as if it were normal HIPRscript, but only produce LaTeX output.
Nothing appears in the HTML output for this tag.
\LaTeX
A special tag that produces a nicely formatted version of the LaTeX tradename. In the hardcopy
version it prints as the specially typeset LaTeX logo. In the hypermedia version it prints simply as: `LaTeX'.
\links{LEFT}{RIGHT}{UP}
Causes navigation buttons to appear in the HTML version of HIPR. It has no effect on the LaTeX
output. LEFT, RIGHT and UP should be the names of other HIPRscript files within HIPR minus
the .hpr suffix. The navigation buttons appear at the top of the HTML page, and, if the page is
more than about a screenful, are duplicated at the bottom. If there is no appropriate link for any of
the arguments, then simply use a pair of empty braces for that argument, and no navigation
button will be generated for that direction.
\links{}{erode}{morops}
creates a `right' navigation button leading to the erode.html worksheet, and an `up' button
leading to the morops.html section.
See the section on Navigation Buttons for more information about navigation buttons.
\newpage
Causes a pagebreak. Only affects the LaTeX output.
\part{HEADING}
Starts a new `part' with a title given by the HEADING argument. This is the largest document
division in HIPR.
\pm
Plus-or-minus symbol: ±
\quote{TEXT}
Indents the HIPRscript contained in the TEXT argument from the left margin slightly.
\rawhtml{HTML}
Passes the HTML argument directly into the HTML output with no processing. It differs from the
\html tag in that the HTML argument is not treated as HIPRscript to be further processed. Has
no effect on the LaTeX output.
\rawlatex{LATEX}
Passes the LATEX argument directly into the LaTeX output with no processing. It differs from
the \latex tag in that the LATEX argument is not treated as HIPRscript to be further
processed. Has no effect on the HTML output.
\ref{FILE}{TEXT}
In the HTML output, this creates a hyperlink using TEXT pointing at the HIPRscript file
specified by FILE (minus the .hpr file extension as usual).
\section{HEADING}
Starts a new section with a title given by the HEADING argument. This is the third largest
document division in HIPR.
\sqr
A squared symbol: ²
\strong{TEXT}
Causes the HIPRscript within TEXT to be displayed in a bold font.
\subsection{HEADING}
Starts a new subsection with a title given by the HEADING argument. This is the fourth largest
document division in HIPR.
\subsubsection{HEADING}
Starts a new subsubsection with a title given by the HEADING argument. This is the fifth largest
document division in HIPR, and the smallest.
\tab{COL1}{COL2}...
Used for specifying the data to go into a table created with the \table tag. The text in COL1
goes in the first column, the text in COL2 goes in the second column and so on. There should be
the same number of arguments as there are data columns in the table, and the number of
characters in each argument should be less than the width of the table columns. See the \table
tag for details.
\table{COL1}{COL2}...{DATA}
Creates a table at the current place in the document. The last argument contains the actual data
that will go into the table. The previous arguments define the column layout of the table. If a
COL argument is a number, then that indicates a data column. The number gives the width in
characters of that column. Data appears left justified within the column. If the COL argument is a
`|' character, this indicates an internal vertical dividing line. If you use two such arguments in a
row, a double dividing line is produced.
The DATA argument contains the body of the table. It must consist solely of \tab tags
(specifying the data to go in each column), and \tabline tags (specifying horizontal lines in
the table).
Note that HIPRscript will automatically put lines around the outside of a table and so these do not
need to be specified.
\table{6}{|}{|}{8}{8}{8}{
\tab{Type}{Test A}{Test B}{Test C}
\tabline
\tab{1}{Yes}{Yes}{No}
\tab{2}{No}{Yes}{No}
\tab{3}{Yes}{Yes}{Yes}
}
-----------------------------------
| Type  || Test A  Test B  Test C |
|-------++------------------------|
| 1     || Yes     Yes     No     |
| 2     || No      Yes     No     |
| 3     || Yes     Yes     Yes    |
-----------------------------------
Note that at the time of writing HTML does not provide proper support for tables, and so tables
appear rather crudely in HTML. Significantly better looking tables appear in the LaTeX output.
\tabline
Used between \tab tags within a table to produce a horizontal line across the table. Note that the
\table tag itself produces horizontal lines at the top and bottom of the table.
\target{LABEL}{TEXT}
Associates the chunk of HIPRscript in TEXT with LABEL for later reference by a \textref tag.
LABEL must be a single word and can only contain alphanumeric characters. TEXT can be any
bit of HIPRscript.
\textref{FILE}{LABEL}{TEXT}
Like the \ref tag, this creates a hyperlink around TEXT to the HIPRscript file named in FILE.
However, whereas following a \ref tag automatically takes a user to the top of that file, the
\textref tag allows you to jump into the middle of a file. The point to be jumped to must be
marked with a \target tag and that tag's LABEL argument must match the LABEL argument
used here.
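For example, assuming a worksheet file gsmooth.hpr (the file name, label and text here are all invented for the example), a point could be marked and then jumped to from another file as follows:

```
\comment{In gsmooth.hpr:}
\target{kernel}{The Gaussian kernel is separable, so the 2-D
convolution can be performed as two 1-D convolutions.}

\comment{In some other file:}
... as explained in the \textref{gsmooth}{kernel}{discussion of the
Gaussian kernel}.
```

Following the resulting hyperlink takes the reader directly to the \target{kernel}{...} chunk, rather than to the top of gsmooth.html.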
\times
A multiplication symbol: ×
\title{TEXT}
Most HTML documents are associated with a document title which does not appear on the page
itself, but which is often shown separately by the HTML browser. This tag allows you to specify
the HTML document title. It has no effect on the LaTeX output. This tag must be positioned near
the top of the HIPRscript file and should only be used once. It should be placed after any
\links tag, and also after the first document sectioning command in the file (i.e. after the first
\chapter, \section etc. tag in the file).
\tt{TEXT}
Causes the HIPRscript contained in TEXT to be displayed in a typewriter font.
\verbatim{TEXT}
TEXT is displayed in both HTML and LaTeX output exactly as it appears in the HIPRscript file.
Unlike normal HIPRscript text, all whitespace is preserved as is. The text is also normally
displayed in a fixed space typewriter font.
What actually happens is that every time a HIPRscript file is processed, all the index entry information
in that file is written into a corresponding IDX file in the index sub-directory. To generate the index
section, what we have to do is scan this sub-directory, collate all the information in all the IDX files
there, and then use this information to produce the index pages. In fact, the index section is itself written
to the src sub-directory as a HIPRscript file. The scanning and analysis of the IDX files is performed
by another Perl program called hiprindx.pl, also found in the progs sub-directory.
Finally the HIPRscript file for the index is processed as normal to produce the required HTML and
LaTeX index pages.
1. Run hiprgen.pl on each HIPRscript file in the src sub-directory to both generate the
corresponding HTML and LaTeX files, and also to write the relevant index information to IDX
files in the index sub-directory.
2. Run hiprindx.pl in order to analyze the IDX files and generate index.hpr in the src sub-
directory.
3. Finally, run hiprgen.pl on index.hpr in order to produce the HTML and LaTeX index
sections.
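Assuming the directory layout described above (working from the src sub-directory, with the programs in the progs sub-directory), the three steps might be scripted as follows. The exact invocation syntax for the Perl scripts is an assumption, and the `echo` makes this a dry run:

```shell
# Dry-run sketch of the three-step build; remove `echo` to run for real.
# Run from the src sub-directory. Script paths and the exact hiprgen.pl
# invocation are assumptions based on the text.
HIPRGEN=../progs/hiprgen.pl
HIPRINDX=../progs/hiprindx.pl

for f in *.hpr; do
    echo perl "$HIPRGEN" "$f"      # step 1: HTML, LaTeX and IDX files
done
echo perl "$HIPRINDX"              # step 2: collate IDX files into index.hpr
echo perl "$HIPRGEN" index.hpr     # step 3: HTML and LaTeX index pages
```

In practice the supplied makefile performs exactly this sequence, rebuilding only the files whose sources have changed.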
Note that this procedure can be simplified somewhat through the use of a makefile as described later.
This means that an error was found on line 2 of the file chngehip.hpr. In this case the translator is
saying that it encountered the end of the file before it had finished reading the argument of an \index tag.
The most likely cause of this is a missing end-brace (}).
We present here a list of all the error messages you are likely to encounter when using HIPRscript,
together with brief explanations of the likely cause.
Couldn't create the HTML output file corresponding to the current HIPRscript file. Make sure
that the html sub-directory exists and is writeable. Check also that if the HTML file to be
produced already exists, then it should also be writeable, so that it can be overwritten with the
new version.
Make utilities solve exactly the problem described above of keeping track of large collections of files
which depend upon one another, and of regenerating just those files that need to be regenerated in a
simple way. They are available for almost all computer systems and can generally be obtained free from
the Internet at the standard software archives for your machine if you don't have one already.
A make utility is used in conjunction with a special text file known as a makefile. This file is
conventionally just called makefile and in HIPR you will find a ready prepared one in the src sub-
directory suitable for use with UNIX make utilities (and with many MS-DOS implementations as well
that automatically convert `/' to `\' in pathnames). If you are not using one of these systems and wish to
use the makefile then you will need to edit it slightly in order to set the pathnames to the important sub-
directories correctly. Look at the top of the supplied makefile for guidance and consult the
documentation on the make utility for your system for details.
You do not need to understand what this file does in order to use it --- just make sure that the current
working directory is the src sub-directory, and then run the make program, normally by just typing
make. If you have installed everything else correctly then this is all you will need to do. The make
program will check to see which HIPRscript files have been modified since the corresponding HTML
and LaTeX files were created, and will run hiprgen.pl on just those files.
The makefile will also allow you to generate the index relatively painlessly. Simply run make with the
argument `index', e.g. by typing make index.
If you cannot use a make utility then it is of course still possible to regenerate HIPRscript files by hand
and this is entirely appropriate for relatively small changes.
Note that the facilities that enable hiprgen.pl to generate these additional files automatically are
amongst the least portable aspects of HIPR. They were used extensively during the development of
HIPR and are included as a bonus here for those experienced enough to make use of them. However, we
cannot provide any support or advice other than that supplied here. Note, however, that you should
not even consider using this feature unless you have a UNIX system. You will also need to obtain
several other programs and utilities such as the PBMplus image manipulation library, and GhostScript.
Details are given in the relevant section of the Installation Guide.
Since the automatic generation of additional files for use with figures, equations and \imageref tags
is so non-portable, this feature is disabled by default.
\fig Tags
With both HTML and LaTeX, the picture that goes into a figure is included into the HTML or LaTeX
from an external file in the figs sub-directory. In the case of HTML, this picture must be a GIF image
file, whereas in the case of LaTeX, the picture must be stored in encapsulated PostScript (EPS) format.
Therefore both GIF and EPS versions of each figure must be available in the figs sub-directory. The
two versions are distinguished by their different file extensions: .gif and .eps, but otherwise have
identical stem-names (which must match up with the first argument of a corresponding \fig tag).
It is possible to get hiprgen.pl to generate the GIF file automatically from the corresponding EPS
file. This requires that you first install the special HIPR utility hfig2gif, as well as the PBMplus
library, GhostScript and pstoppm.ps. Then you must edit the hiprgen.pl file, and near the top of
the file, you should change the line:
$AUTO_FIGURE = 0;
to:
$AUTO_FIGURE = 1;
From this point on, if, during the course of processing a HIPRscript file, a \fig tag is encountered for
which there is a corresponding EPS file, but no GIF file, then a corresponding GIF file will be generated
from the EPS file. If you subsequently change the EPS file and you want to force hiprgen.pl to
regenerate a new matching GIF file, then simply delete the old GIF file.
At the time of writing, HTML provides no support for mathematical equations, unlike LaTeX. This
situation may be remedied in the future, but for now we have had to use a workaround: equations in
HTML are included as small in-line images, stored as GIF files in the eqns sub-directory. Since
LaTeX does support equations directly, no external files are required for the LaTeX versions of
equations.
While these image-equations for use with HTML could be generated by hand, it is possible to get
hiprgen.pl to generate the GIF file automatically from the LaTeX equation code contained in the
equation tag. This requires that you first install the special HIPR utility heqn2gif, as well as the
PBMplus library, GhostScript, pstoppm.ps and LaTeX. Then you must edit the hiprgen.pl file,
and near the top of the file, you should change the line:
$AUTO_EQUATION = 0;
to:
$AUTO_EQUATION = 1;
From this point on, if, during the course of processing a HIPRscript file, an \eqnl or \eqnd tag is
encountered for which there is no corresponding GIF file then one will be generated from the LaTeX
description of the equation. If you subsequently change this description and you want to force
hiprgen.pl to regenerate a new matching GIF file, then simply delete the old GIF file.
\imageref Tags
In the HTML output files, \imageref tags cause a hyperlink to an image in the images sub-directory
to be created. By default, this imagelink takes the form of a miniature version of the full-size image,
known as a thumbnail. This thumbnail must be a GIF file and is stored in the thumbs sub-directory.
The files have the unconventional suffix .thm in order to avoid confusion with the full-size images.
Conventionally, thumbnails are 32 pixels high, and keep the same aspect ratio as their parent image.
Various image manipulation utilities could be used to generate thumbnails by hand, but it is possible to
get hiprgen.pl to generate the thumbnail automatically from the full-size image. This requires that
you first install the special HIPR utility himg2thm and the PBMplus library. Then you must edit the
hiprgen.pl file, and near the top of the file, you should change the line:
$AUTO_THUMBNAIL = 0;
to:
$AUTO_THUMBNAIL = 1;
From this point on, if, during the course of processing a HIPRscript file, an \imageref tag is
encountered for which there is no corresponding thumbnail file then one will be generated automatically.
If you subsequently change the full-size image and you want to force hiprgen.pl to regenerate a new
matching thumbnail, then simply delete the old thumbnail file.
©1994 Bob Fisher, Simon Perkins, Ashley Walker and Erik Wolfart
Department of Artificial Intelligence
University of Edinburgh
UK
Bibliography
General References
I. Aleksander (ed.) Artificial Vision for Robots, Chapman and Hall, 1983.
N. Ahuja and B. Schachter Pattern Models, John Wiley & Sons, 1983.
T. Avery and G. Berlin Fundamentals of Remote Sensing and Airphoto Interpretation, Maxwell
Macmillan International, 1985.
B. Batchelor, D. Hill and D. Hodgson Automated Visual Inspection, IFS (Publications) Ltd, 1985.
R. Bates and M. McDonnell Image Restoration and Reconstruction, Oxford University Press, 1986.
R. Blahut Fast Algorithms for Digital Signal Processing, Addison-Wesley Publishing Company, 1985.
R. Bolles, H. Baker and M. Hannah The JISCT Stereo Evaluation, ARPA Image Understanding
Workshop Proceedings, 1993.
R. Boyle and R. Thomas Computer Vision: A First Course, Blackwell Scientific Publications, 1988.
M. Brady and R. Paul (eds) Robotics Research: The First International Symposium, MIT Press, 1984.
D. Braggins and J. Hollingham The Machine Vision Sourcebook, IFS (Publications) Ltd, 1986.
A. Browne and L. Norton-Wayne Vision and Information Processing for Automation, Plenum Press,
1986.
V. Bruce and P. Green Visual Perception: Physiology, Psychology and Ecology, 2nd ed., Lawrence
Erlbaum Associates, 1990.
J. Canny A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol 8, No. 6, Nov 1986.
T. Crimmins The Geometric filter for Speckle Reduction, Applied Optics, Vol. 24, No. 10, 15 May
1985.
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990.
G. Dodd and L. Rossol (eds) Computer Vision and Sensor-Based Robotics, Plenum Press, 1979.
R. Duda and P. Hart Pattern Classification and Scene Analysis, John Wiley & Sons, 1978.
D. Elliott and K. Rao Fast Transforms: Algorithms and Applications, Academic Press, 1983.
K. Fu, R. Gonzalez and C. Lee Robotics: Control, Seeing, Vision and Intelligence, McGraw-Hill,
1987.
R. Gonzalez and P. Wintz Digital Image Processing, 2nd edition, Addison-Wesley Publishing
Company, 1987.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992.
W. Green Digital Image Processing - A Systems Approach, Van Nostrand Reinhold Co., 1983.
R. Haralick and L. Shapiro Computer and Robot Vision, Addison-Wesley Publishing Company, 1992.
B. Horn and M. Brooks (eds) Shape from Shading, MIT Press, 1989.
T. Huang (ed) Advances in Computer Vision and Image Processing, Vol 2, JAI Press, 1986.
IEEE Trans. Circuits and Syst. Special Issue on Digital Filtering and Image Processing, Vol CAS-2,
1975.
IEEE Trans. Inform. Theory, Special Issue on Quantization, Vol IT-28, 1982.
IEEE Trans. Pattern Analysis and Machine Intelligence Special Issue on Industrial Machine Vision
and Computer Vision Technology, Vol 10, 1988.
L. Kanal and A. Rosenfeld (eds) Progress in Pattern Recognition, Vol 2, North Holland, 1985.
J. Kittler and M. Duff (eds) Image Processing System Architectures, Research Studies Press Ltd, 1985.
Y. Pao Adaptive Pattern Recognition and Neural Networks, Addison-Wesley Publishing Company,
1989.
W. Pratt Digital Image Processing, 2nd edition, John Wiley & Sons, 1991.
Proc. IEEE Special Issue on Digital Picture Processing, Vol 60, 1972.
A. Rosenfeld and A. Kak Digital Picture Processing, Vols 1 and 2, Academic Press, 1982.
A. Rosenfeld and J.L. Pfaltz Distance Functions in Digital Pictures, Pattern Recognition, Vol 1, 1968.
J. Serra Image Analysis and Mathematical Morphology, Academic Press, 1982.
R. Schalkoff Digital Image Processing and Computer Vision, John Wiley & Sons, 1989.
J. Stoffel (ed) Graphical and Binary Image Processing and Applications, Artech House, 1982.
R. Tsai A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology
Using Off-the-Shelf TV Cameras and Lenses, IEEE Journal of Robotics and Automation, 1987.
S. Ullman and W. Richards (eds) Image Understanding 1984, Ablex Publishing Co, 1984.
A. Zisserman Notes on Geometric Invariance in Vision, British Machine Vision Association and
Society for Pattern Recognition, 1992, Ch. 2.
Local References
Additional useful references may be added here by the person who installed HIPR on your system.
Acknowledgements
The creation of HIPR was a team effort which would not have been possible without the advice,
contributions and criticisms of many people.
In particular, we would like to thank the following for advice on what should be included in HIPR: Greg
Briarty (Nottingham University), Charles Duncan (Edinburgh University) and Ata Etemadi (Imperial
College).
We would like to thank the staff and students of Edinburgh's Meteorology (Charles Duncan), Physics
(William Hossack) and Artificial Intelligence for being guinea pigs with HIPR as it was being
developed.
Many of the input images were acquired specifically for HIPR, however, we would also like to thank
those individuals and organizations who, without endorsing the material in the HIPR package, donated
images from their own research or private photo collections. They (and their contributions) include: Ms.
R. Aguilar-Chongtay (wom3, cel4), Calibrated Imaging Laboratory at Carnegie Mellon University (st00
- st11), Dr J. Darbyshire (soi1), Dr. A. Dil (puf1, crs1, cot1, goo1, dov1, pdc1, bri1, cas1, lao1, win1),
Ms. S. Flemming (hse2, hse3, hse4), Dr M. Foster (mri1, mri2), Mr. X. Huang (bab4, wom5, wom6,
txt3), Mr. A. Hume (mam1, mam2), Dr. G. Hayes (wom4), Dr. N. Karssemeijer (stl1, stl2, slt1, slt2), Dr.
B. Lotto (cel1, cel2, cla3, clb3, clc3, cel5, cel6, cel7, axn1, axn2), Dr C. Maltin (mus1), Dr N. Martin
(alg1), Mr. A. MacLean (pcb1, pcb2), Meteorology Department, University of Edinburgh (air1, avs1,
bir1, bvs1, eir1, evs1, sir1, svs1, uir1, uvs1), Mr. A. Miller (tst2), NASA & (specifically) Dryden
Research Aircraft Photo Archive (lun1, shu1, shu3, fei1, arp1, arp2, ctr1, mof1), Pilot European Image
Processing Archive (PEIPA) (rot1), Ms. S. Price (grd1), Dr K. Ritz (fun1), Rutgers University (and
specifically J. Kouhia for notes on file conversion) (bur1, cor1, stw1, gra1, tre1, mic1, mar1, fld1, ppr1,
ppr2, rep1, pig1), Dr G. Simm (usd1, xra1), Dr N. Strachan (fsh2), Ms. S. Smith (grd1), Ms. V.
Temperton (rck1, rck2, rck3), Dr. P. Thanish (mam1, mam2), Mr. D. Walker (sff1, sfn1, sfc1), Mr. M.
Westhead (rob1), Dr F. Wright (dna1).
Thanks go to many helpful people in the Department of Artificial Intelligence for technical assistance
with the hypertext and image productions, including Neil Brown, Tim Colles, Andrew Fitzgibbon and
Martin Westhead.
Funding for the development of this project was provided by the United Kingdom's Joint Information
Systems Committee, under their New Technology Initiative, Project 194, and by the University of
Edinburgh. We greatly appreciate their support and advice through Tom Franklin (Manchester
University).
Index
Note: You may find it easiest to use your browser's searching facilities on this index in order to find the
entries you require.
● Distance metrics
● Distance transform
● Dithering
● Division
● Edge detectors
● Editing HTML and LaTeX directly
● Enhancement,
❍ Using LoG filter
● Erosion
● Exponential operator
● Filename conventions
● Fourier transform
● Frequency filters
● Gamma Correction
● Gaussian smoothing
● Greyscale images
● Guide to contents
● HIPRscript
● Hardcopy version of HIPR
● Histogram equalization
● Histogram, intensity
● Hit-and-miss transform
● Hough Transform
● How to use HIPR
● Idempotence
● Image editors
● Image formats
❍ Changing the default
● Image library
● Installation guide
● Invert
● Isotropic operators
● Kernel
● Khoros
● Laplacian filter
● Laplacian of Gaussian filter
● Line detection
● Local information
❍ Customization
● Logarithm
● Logical operators
● Look up tables
● Making changes to HIPR
● Marr edge detector
● Marr filter
● Masking
● Mathematical morphology
● Matlab
● Mean filter
● Medial axis transform
● Median filter
● Multi-spectral images
● Multiplication
● NAND
● NOR
● NOT
● Noise generation
● Normalization
● OR
● Opening
● Pixel connectivity
● Pixel values
● Pixels
● Prewitt edge detectors,
❍ Compass operator
❍ Gradient operator
● Primary colours
● RGB and colourspaces
● Raise to power
● Reflect
● Roberts cross
● Rotate
● Saturation
● Scale, geometric
● Scaling, greylevel
● Skeleton by zone of influence (SKIZ)
● Skeletonization
● Sobel edge detector
● Structuring elements
● Subtraction
● Thickening
● Thinning
● Thresholding
● Thresholding, adaptive
● Translate
● Unsharp Filter
● Visilog
● Voronoi diagram
● Welcome to HIPR!
● What is HIPR?
● Wrapping
● XNOR
● XOR
● Zero crossing detector
● LaTeX,
❍ in practice
Introduction
Morphological operators often take a binary image and a structuring element as input and combine them
using a set operator (intersection, union, inclusion, complement). They process objects in the input
image based on characteristics of their shape, which are encoded in the structuring element. The
mathematical details are explained in Mathematical Morphology.
Usually, the structuring element is 3×3 in size and has its origin at the center pixel. It is shifted over the
image, and at each pixel of the image its elements are compared with the set of underlying pixels. If
the two sets of elements match the condition defined by the set operator (e.g. if the set of pixels in the
structuring element is a subset of the underlying image pixels), the pixel underneath the origin of the
structuring element is set to a pre-defined value (0 or 1 for binary images). A morphological operator is
therefore defined by its structuring element and the applied set operator.
For the basic morphological operators the structuring element contains only foreground pixels (i.e. ones)
and `don't care's'. These operators, which are all a combination of erosion and dilation, are often used to
select or suppress features of a certain shape, e.g. removing noise from images or selecting objects with
a particular direction.
The more sophisticated operators take zeros as well as ones and `don't care's' in the structuring element.
The most general operator is the hit and miss; in fact, all the other morphological operators can be
deduced from it. Its variations are often used to simplify the representation of objects in a (binary) image
while preserving their structure, e.g. producing a skeleton of an object using skeletonization and tidying
the result up using thinning.
Morphological operators can also be applied to grey-level images, e.g. to reduce noise or to brighten the
image. However, for many applications, other methods like a more general spatial filter produce better
results.
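The mechanism described above — shift the structuring element over the image and apply a set comparison at each pixel — can be sketched directly. This is a minimal illustration in Python/NumPy; the function and variable names are our own, not part of HIPR:

```python
import numpy as np

def morph(image, se, set_test):
    """Generic binary morphological operator: slide the structuring element
    over the image and apply a set comparison at every pixel. `se` is a
    boolean mask with its origin at the centre; `set_test` compares the
    element's pixels with the underlying image pixels."""
    r, c = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(image, ((r, r), (c, c)), constant_values=False)
    out = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            window = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = set_test(se, window)
    return out

se = np.ones((3, 3), dtype=bool)
X = np.zeros((5, 5), dtype=bool)
X[1:4, 1:4] = True  # a 3x3 foreground block
# Erosion: the element must be a subset of the underlying foreground pixels.
erosion = morph(X, se, lambda k, w: bool(np.all(w[k])))
# Dilation: the element need only intersect the underlying foreground pixels.
dilation = morph(X, se, lambda k, w: bool(np.any(w[k])))
print(erosion.sum(), dilation.sum())  # -> 1 25
```

Swapping in a different `set_test` gives a different operator, which is exactly the sense in which a morphological operator is defined by its structuring element and its set operator.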
Dilation
Common Names: Dilate, Grow, Expand
Brief Description
Dilation is one of the two basic operators in the area of mathematical morphology, the other being erosion.
It is typically applied to binary images, but there are versions that work on greyscale images. The basic
effect of the operator on a binary image is to gradually enlarge the boundaries of regions of foreground
pixels (i.e. white pixels, typically). Thus areas of foreground pixels grow in size while holes within those
regions become smaller.
How It Works
Useful background to this description is given in the mathematical morphology section of the Glossary.
The dilation operator takes two pieces of data as inputs. The first is the image which is to be dilated. The
second is a (usually small) set of coordinate points known as a structuring element (also known as a kernel).
It is this structuring element that determines the precise effect of the dilation on the input image.
Suppose that X is the set of Euclidean coordinates corresponding to the input binary image,
and that K is the set of coordinates for the structuring element.
Then the dilation of X by K is simply the set of all points x such that the intersection of Kx
with X is non-empty.
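This set definition translates almost literally into code: the dilation is the union of copies of X shifted by (the negation of) each offset in K. The following sketch in Python/NumPy uses hypothetical helper names of our own:

```python
import numpy as np

def shift(X, dr, dc):
    """Translate a boolean image by (dr, dc), filling the exposed border with False."""
    out = np.zeros_like(X)
    rows, cols = X.shape
    out[max(0, dr):rows + min(0, dr), max(0, dc):cols + min(0, dc)] = \
        X[max(0, -dr):rows + min(0, -dr), max(0, -dc):cols + min(0, -dc)]
    return out

def dilate(X, K):
    """A point x is in the dilation of X by K iff the translated element Kx
    meets X, i.e. X[x + k] is foreground for some offset k in K."""
    out = np.zeros_like(X)
    for dr, dc in K:
        out |= shift(X, -dr, -dc)
    return out

# A single foreground pixel dilated by the 3x3 square grows into a 3x3 block.
X = np.zeros((5, 5), dtype=bool)
X[2, 2] = True
K = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
print(dilate(X, K).sum())  # -> 9
```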
The mathematical definition of greyscale dilation is identical except for the way in which the set of
coordinates associated with the input image is derived. In addition, these coordinates are 3-D rather than 2-D. More details can be found under mathematical morphology.
As an example of binary dilation, suppose that the structuring element is a 3×3 square, with the origin at its
center as shown in Figure 1. Note that in this and subsequent diagrams, foreground pixels are represented
by 1's and background pixels by 0's.
To compute the dilation of a binary input image by this structuring element, we consider each of the
background pixels in the input image in turn. For each background pixel (which we will call the input pixel)
we superimpose the structuring element on top of the input image so that the origin of the structuring
element coincides with the input pixel position. If at least one pixel in the structuring element coincides
with a foreground pixel in the image underneath, then the input pixel is set to the foreground value. If all
the corresponding pixels in the image are background however, the input pixel is left at the background
value.
For our example 3×3 structuring element, the effect of this operation is to set to the foreground colour any
background pixels that have a neighbouring foreground pixel (assuming 8-connectedness). Such pixels
must lie at the edges of white regions, and so the practical upshot is that foreground regions grow (and
holes inside a region shrink).
Dilation is the dual of erosion i.e. dilating foreground pixels is equivalent to eroding the background pixels.
The structuring element may have to be supplied as a small binary image, or in a special matrix format, or it
may simply be hardwired into the implementation, and not require specifying at all. In this latter case, a 3×3
square structuring element is normally assumed which gives the expansion effect described above. The
effect of a dilation using this structuring element on a binary image is shown in Figure 2.
The 3×3 square is probably the most common structuring element used in dilation operations, but others can
be used. A larger structuring element produces a more extreme dilation effect, although usually very similar
effects can be achieved by repeated dilations using a smaller but similarly shaped structuring element. With
larger structuring elements, it is quite common to use an approximately disk shaped structuring element, as
opposed to a square one.
shows a thresholded image of . illustrates the basic effect of dilation on the binary
image, produced by two dilation passes using a disc shaped structuring element of 11 pixels radius. Note
that the corners have been rounded off. In general, when dilating by a disc shaped structuring element,
convex boundaries will become rounded, and concave boundaries will be preserved as they are.
Dilations can be made directional by using less symmetrical structuring elements. e.g. a structuring element
that is 10 pixels wide and 1 pixel high will dilate in a horizontal direction only. Similarly, a 3×3 square
structuring element with the origin in the middle of the top row rather than the center, will dilate the bottom
of a region more strongly than the top.
Greyscale dilation with a flat disc shaped structuring element will generally brighten the image. Bright
regions surrounded by dark regions grow in size, and dark regions surrounded by bright regions shrink in
size. Small dark spots in images will disappear as they get `filled in' to the surrounding intensity value.
Small bright spots will become larger spots. The effect is most marked at places in the image where the
intensity changes rapidly and regions of fairly uniform intensity will be largely unchanged except at their
edges. Figure 3 shows a vertical cross section through a greylevel image and the effect of dilation using a
disc shaped structuring element.
Figure 3 Greylevel dilation using a disc shaped structuring element. The graphs show a
vertical cross section through a greylevel image.
shows the basic effects of greylevel dilation. This was produced from by two dilation
passes using a 3×3 flat square structuring element. The highlights on the bulb surface have increased in size
and have also become squared off as an artifact of the structuring element shape. The dark body of the cube
has shrunk in size since it is darker than its surroundings, while within the outlines of the cube itself, the
darkest top surface has shrunk the most. Many of the surfaces have a more uniform intensity since dark
spots have been filled in by the dilation. shows the effect of five passes of the same dilation
operator on the original image.
There are many specialist uses for dilation. For instance it can be used to fill in small spurious holes
(`pepper noise') in images. shows an image containing pepper noise, and shows the result of
dilating this image with a 3×3 square structuring element. Note that although the noise has been effectively
removed, the image has been degraded significantly. Compare the result with that described under closing.
Dilation can also be used for edge detection by taking the dilation of an image and then subtracting away
the original image, thus highlighting just those new pixels at the edges of objects that were added by the
dilation. For example starting with again, we first dilate it using 3×3 square structuring element,
and then subtract away the original image to leave just the edge of the object as shown in .
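This dilate-and-subtract edge detector can be sketched in a few lines of Python using SciPy's ndimage module; the toy square image here is our own substitute for the example figure:

```python
import numpy as np
from scipy import ndimage

# Hypothetical example image: a filled 4x4 square on an empty background.
img = np.zeros((8, 8), dtype=bool)
img[2:6, 2:6] = True

# Dilate with a 3x3 square structuring element, then subtract the original:
# only the one-pixel ring of newly added pixels around the object remains.
dilated = ndimage.binary_dilation(img, structure=np.ones((3, 3), dtype=bool))
edge = dilated & ~img
print(edge.sum())  # -> 20 (the ring around a 4x4 square)
```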
Finally, dilation is also used as the basis for many other mathematical morphology operators, often in
combination with some logical operators. A simple example is region filling which is illustrated using
. This image and all the following results were zoomed with a factor of 16 for a better display, i.e.
each pixel during the processing corresponds to a 16×16 pixel square in the displayed images. Region
filling applies logical NOT, logical AND and dilation iteratively. The process can be described with the
following formula:
X_k = (X_{k-1} ⊕ J) AND (NOT A),   k = 1, 2, 3, ...
where X_k is the region which after convergence fills the boundary, J is the structuring element, ⊕ denotes
dilation and NOT A is the negative of the boundary. This combination of the dilation operator and a logical
operator is also known as conditional dilation.
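The iteration can be sketched as follows in Python/SciPy. The hollow-square boundary and the 4-connected cross element are illustrative assumptions of ours, not the figures from the text:

```python
import numpy as np
from scipy import ndimage

def fill_region(boundary, seed, structure):
    """Conditional dilation: repeatedly dilate the seed region and AND it
    with the complement of the boundary until the result stops changing,
    then OR in the boundary itself."""
    X = seed.copy()
    outside = ~boundary
    while True:
        grown = ndimage.binary_dilation(X, structure=structure) & outside
        if np.array_equal(grown, X):   # convergence reached
            return X | boundary
        X = grown

# Hollow square boundary and a single seed pixel inside it.
boundary = np.zeros((7, 7), dtype=bool)
boundary[1, 1:6] = boundary[5, 1:6] = True
boundary[1:6, 1] = boundary[1:6, 5] = True
seed = np.zeros_like(boundary)
seed[3, 3] = True
# A 4-connected cross element stops the region leaking through the boundary.
cross = ndimage.generate_binary_structure(2, 1)
filled = fill_region(boundary, seed, cross)
print(filled[1:6, 1:6].all())  # -> True: boundary plus interior is filled
```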
Imagine that we know , i.e. one pixel which lies inside the region shown in the above image, e.g. .
First, we dilate the image containing the single pixel using a mask as shown in Figure 1 resulting in .
To prevent the growing region from crossing the boundary, we AND it with which is the negative of
the boundary. Dilating the resulting image, , yields . ANDing this image with the inverted
and finally . ORing this image with the initial boundary yields the final result, as can be seen
in .
Many other morphological algorithms make use of dilation, and some of the most common ones are
described here. An example in which dilation is used in combination with other morphological operators is
the pre-processing for automated character recognition described in the thinning section.
Exercises
1. What would be the effect of a dilation using the cross shaped structuring element shown in Figure 4?
2. What would happen if the boundary shown in the region filling example is disconnected at one
point? What could you do to fix that problem?
3. What would happen if the boundary in the region filling example is 8-connected? What should the
structuring element look like in this case?
4. How might you use conditional dilation to determine a connected component given one point of this
component?
5. What problems are there with using dilation to fill small noisy holes in objects?
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp
518 - 519, 549.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol. 1, Chap. 5, Addison-Wesley Publishing
Company, 1992.
Local Information
General advice about the local HIPR installation is available here
Erosion
Common Names: Erode, Shrink, Reduce
Brief Description
Erosion is one of the two basic operators in the area of mathematical morphology, the other being dilation.
It is typically applied to binary images, but there are versions that work on greyscale images. The basic
effect of the operator on a binary image is to erode away the boundaries of regions of foreground pixels (i.e.
white pixels, typically). Thus areas of foreground pixels shrink in size, and holes within those areas become
larger.
How It Works
Useful background to this description is given in the mathematical morphology section of the Glossary.
The erosion operator takes two pieces of data as inputs. The first is the image which is to be eroded. The
second is a (usually small) set of coordinate points known as a structuring element (also known as a kernel).
It is this structuring element that determines the precise effect of the erosion on the input image.
Suppose that X is the set of Euclidean coordinates corresponding to the input binary image,
and that K is the set of coordinates for the structuring element.
Then the erosion of X by K is simply the set of all points x such that Kx is a subset of X.
The mathematical definition for greyscale erosion is identical except in the way in which the set of
coordinates associated with the input image is derived. In addition, these coordinates are 3-D rather than 2-D. More details can be found under mathematical morphology.
As an example of binary erosion, suppose that the structuring element is a 3×3 square, with the origin at its
center as shown in Figure 1. Note that in this and subsequent diagrams, foreground pixels are represented
by 1's and background pixels by 0's.
To compute the erosion of a binary input image by this structuring element, we consider each of the
foreground pixels in the input image in turn. For each foreground pixel (which we will call the input pixel)
we superimpose the structuring element on top of the input image so that the origin of the structuring
element coincides with the input pixel coordinates. If for every pixel in the structuring element, the
corresponding pixel in the image underneath is a foreground pixel, then the input pixel is left as it is. If any
of the corresponding pixels in the image are background however, the input pixel is also set to background
value.
For our example 3×3 structuring element, the effect of this operation is to remove any foreground pixel that
is not completely surrounded by other white pixels (assuming 8-connectedness). Such pixels must lie at the
edges of white regions, and so the practical upshot is that foreground regions shrink (and holes inside a
region grow).
Erosion is the dual of dilation i.e. eroding foreground pixels is equivalent to dilating the background pixels.
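The duality can be checked numerically. This is a sketch using SciPy; border conventions vary between implementations, so the test blob is kept away from the image edge where the identity holds cleanly:

```python
import numpy as np
from scipy import ndimage

# A small blob that does not touch the image border.
X = np.zeros((9, 9), dtype=bool)
X[3:7, 2:6] = True
s = np.ones((3, 3), dtype=bool)

# Eroding the foreground equals complementing the dilation of the background.
eroded = ndimage.binary_erosion(X, structure=s)
dual = ~ndimage.binary_dilation(~X, structure=s)
print(np.array_equal(eroded, dual))  # -> True
```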
The structuring element may have to be supplied as a small binary image, or in a special matrix format, or it
may simply be hardwired into the implementation, and not require specifying at all. In this latter case, a 3×3
square structuring element is normally assumed which gives the shrinking effect described above. The
effect of an erosion using this structuring element on a binary image is shown in Figure 2.
The 3×3 square is probably the most common structuring element used in erosion operations, but others can
be used. A larger structuring element produces a more extreme erosion effect, although usually very similar
effects can be achieved by repeated erosions using a smaller similarly shaped structuring element. With
larger structuring elements, it is quite common to use an approximately disk shaped structuring element, as
opposed to a square one.
is the result of eroding four times with a disc shaped structuring element 11 pixels in
diameter. It shows that the hole in the middle of the image increases in size as the border shrinks. Note that
the shape of the region has been quite well preserved due to the use of a disc shaped structuring element. In
general, erosion using a disc shaped structuring element will tend to round concave boundaries, but will
preserve the shape of convex boundaries.
Erosions can be made directional by using less symmetrical structuring elements. e.g. a structuring element
that is 10 pixels wide and 1 pixel high will erode in a horizontal direction only. Similarly, a 3×3 square
structuring element with the origin in the middle of the top row rather than the center, will erode the bottom
of a region more severely than the top.
Greyscale erosion with a flat disc shaped structuring element will generally darken the image. Bright
regions surrounded by dark regions shrink in size, and dark regions surrounded by bright regions grow in
size. Small bright spots in images will disappear as they get eroded away down to the surrounding intensity
value, and small dark spots will become larger spots. The effect is most marked at places in the image
where the intensity changes rapidly, and regions of fairly uniform intensity will be left more or less
unchanged except at their edges. Figure 3 shows a vertical cross section through a greylevel image and the
effect of erosion using a disc shaped structuring element. Note that the flat disc shaped kernel causes small bright peaks to be eroded away.
Figure 3 Greylevel erosion using a disc shaped structuring element. The graphs show a
vertical cross section through a greylevel image.
illustrates greylevel erosion. It was produced from by two erosion passes using a 3×3 flat
square structuring element. Note that the highlights have disappeared, and that many of the surfaces seem
more uniform in appearance due to the elimination of bright spots. The body of the cube has grown in size
since it is darker than its surroundings. shows the effect of five passes of the same erosion operator
on the original image.
There are many specialist uses for erosion. One of the more common is to separate touching objects in a
binary image so that they can be counted using a labeling algorithm. shows a number of dark discs
(coins in fact) silhouetted against a light background. shows the result of thresholding the image at
pixel value 90. It is required to count the coins. However, this is not going to be easy since the touching
coins form a single fused region of white, and a counting algorithm would have to first segment this region
into separate coins before counting - a non-trivial task. The situation can be much simplified by eroding the
image. shows the result of eroding twice using a disc shaped structuring element 11 pixels in
diameter. All the coins have been separated neatly and the original shape of the coins has been largely
preserved. At this stage a labeling algorithm can be used to count the coins. The relative sizes of the coins
can be used to distinguish the various types by, for example, measuring the area of each distinct region.
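The erode-then-label approach can be sketched as follows in Python/SciPy; two blocks fused by a thin bridge stand in here for the touching coins:

```python
import numpy as np
from scipy import ndimage

img = np.zeros((9, 15), dtype=bool)
img[2:7, 1:6] = True     # left "coin"
img[2:7, 9:14] = True    # right "coin"
img[4, 6:9] = True       # thin one-pixel bridge fusing the two

n_before = ndimage.label(img)[1]   # counted as a single fused region
eroded = ndimage.binary_erosion(img, structure=np.ones((3, 3), dtype=bool))
n_after = ndimage.label(eroded)[1]  # bridge eroded away: two regions
print(n_before, n_after)  # -> 1 2
```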
is derived from the same input picture, but a 9×9 square structuring element is used instead of a
disc (the two structuring elements have approximately the same area). The coins have been clearly
separated as before, but the square structuring element has led to distortion of the shapes, which in some
situations could cause problems in identifying the regions after erosion.
Erosion can also be used to remove small spurious bright spots (`salt noise') in images. shows an
image with salt noise, and shows the result of erosion with a 3×3 square structuring element. Note
that although the noise has been removed, the rest of the image has been degraded significantly. Compare
this with the same task using opening.
We can also use erosion for edge detection by taking the erosion of an image and then subtracting it away
from the original image, thus highlighting just those pixels at the edges of objects that were removed by the
erosion. An example of a very similar technique using is given in the section dealing with dilation.
Finally, erosion is also used as the basis for many other mathematical morphology operators.
Exercises
1. What would be the effect of an erosion using the cross shaped structuring element shown in Figure
4?
2. Is there any difference in the final result between applying a 3×3 square structuring element twice to
an image, and applying a 5×5 square structuring element just once to the image? Which do you think
would be faster and why?
3. When using large structuring elements, why does a disc shaped structuring element tend to preserve
the shapes of convex objects better than a square structuring element?
4. Use erosion in the way described above to detect the edges of . Is the result different from the
one obtained with dilation?
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp
518, 512, 550.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol. 1, Chap. 5, Addison-Wesley Publishing
Company, 1992.
Opening
Common Names: Opening
Brief Description
Opening and closing are two very important operators from mathematical morphology. They are both
derived from the fundamental operations of erosion and dilation. Like those operators they are normally
applied to binary images, although there are also greylevel versions. The basic effect of an opening is
somewhat like erosion in that it tends to remove some of the foreground (bright) pixels from the edges of
regions of foreground pixels. However it is less destructive than erosion in general. As with other
morphological operators, the exact operation is determined by a structuring element. The effect of the
operator is to preserve foreground regions that have a similar shape to this structuring element, or that can
completely contain the structuring element, while eliminating all other regions of foreground pixels.
How It Works
Very simply, an opening is defined as an erosion followed by a dilation using the same structuring element
for both operations. See the sections on erosion and dilation for details of the individual steps. The opening
operator therefore requires two inputs: an image to be opened, and a structuring element.
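This definition, and the idempotence property of opening, can be verified directly. The following is a sketch using SciPy's binary morphology routines on an arbitrary random test image:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
X = rng.random((20, 20)) > 0.5          # arbitrary binary test image
s = np.ones((3, 3), dtype=bool)

# Opening by hand: an erosion followed by a dilation with the same element.
opened = ndimage.binary_dilation(ndimage.binary_erosion(X, s), s)

# SciPy's ready-made operator performs exactly this composition.
print(np.array_equal(opened, ndimage.binary_opening(X, s)))  # -> True

# Idempotence: opening an already-opened image changes nothing.
reopened = ndimage.binary_dilation(ndimage.binary_erosion(opened, s), s)
print(np.array_equal(opened, reopened))  # -> True
```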
Opening is the dual of closing i.e. opening the foreground pixels with a particular structuring element is
equivalent to closing the background pixels with the same element.
The effect of the operator can be visualized by taking the structuring element and sliding it around inside
each foreground region, without changing its orientation. All foreground pixels that can be covered by the
structuring element while it lies entirely within the foreground region will be preserved. However, all
foreground pixels which can not be reached by the structuring element without parts of it moving out of the
foreground region will be eroded away. After the opening has been carried out, the new boundaries of
foreground regions will all be such that the structuring element fits inside them, and so further openings
with the same element have no effect. This property is known as idempotence. The
effect of an opening on a binary image using a 3×3 square structuring element is illustrated in Figure 1.
As with erosion and dilation, it is very common to use this 3×3 structuring element. The effect in the above
figure is rather subtle since the structuring element is quite compact and so it fits into the foreground
boundaries quite well even before the opening operation. To increase the effect, multiple erosions are often
performed with this element followed by the same number of dilations. This effectively performs an
opening with a larger square structuring element.
The effect is more pronounced with larger structuring elements. Consider which is a binary image
containing a mixture of circles and lines. Suppose that we want to separate out the circles from the lines, so
that they can be counted. Opening with a disc shaped structuring element 11 pixels in diameter gives
. Some of the circles are slightly distorted, but in general, the lines have been almost completely
removed while the circles remain almost completely unaffected.
shows another binary image. Suppose that this time we wish to separately extract the horizontal and
vertical lines. shows the result of an opening with a 3×9 vertically oriented structuring element.
shows what happens if we use a 9×3 horizontally oriented structuring element instead. Note that
there are a few glitches in this last image where the diagonal lines cross vertical lines. These could easily
be eliminated however using a slightly longer structuring element.
Unlike erosion and dilation, the position of the origin of the structuring element does not really matter for
opening and closing; the result is independent of it.
Greylevel opening can similarly be used to select and preserve particular intensity patterns while
attenuating others. As a simple example we start with and then perform greylevel opening with a
flat 5×5 square structuring element to produce . The important thing to notice here is the way in
which bright features smaller than the structuring element have been greatly reduced in intensity, while
larger features have remained more or less unchanged in intensity. Thus the fine grained hair and whiskers
in the image have been much reduced in intensity, while the nose region is still at much the same intensity
as before. Note that the image does have a more matt appearance than before since the opening has
eliminated small specularities and texture fluctuations.
Similarly, opening can be used to remove `salt noise' in images. shows an image containing salt
noise, and shows the result of opening with a 3×3 square structuring element. The noise has been
entirely removed with relatively little degradation of the underlying image. However, if the noise consists
of dark points (i.e. `pepper noise') as it can be seen in , greylevel opening yields . Here, no
noise has been removed. At some places where two nearby noise pixels have merged into one larger point,
the noise level has even been increased. In this case of `pepper noise', greylevel closing is a more
appropriate operator.
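The asymmetry between salt and pepper noise under greylevel opening can be demonstrated on a synthetic image. This is a sketch; the flat image and the single spike pixels are our own toy data:

```python
import numpy as np
from scipy import ndimage

flat = np.full((7, 7), 100, dtype=np.uint8)
salt = flat.copy()
salt[3, 3] = 255      # isolated bright pixel
pepper = flat.copy()
pepper[3, 3] = 0      # isolated dark pixel

# A flat 3x3 greylevel opening (min filter followed by max filter)
# removes the bright spike but leaves the dark one untouched.
print(ndimage.grey_opening(salt, size=(3, 3)).max())    # -> 100: salt removed
print(ndimage.grey_opening(pepper, size=(3, 3)).min())  # -> 0: pepper survives
```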
As we have seen, opening can be very useful for separating out particularly shaped objects from the
background, but it is far from being a universal 2-D object recognizer/segmenter. For instance if we try and
use a long thin structuring element to locate, say, pencils in our image, any one such element will only find
pencils at a particular orientation. If it is necessary to find pencils at other orientations then differently
oriented elements must be used to look for each desired orientation. It is also necessary to be very careful
that the structuring element chosen does not eliminate too many desirable objects, or retain too many
undesirable ones, and sometimes this can be a delicate or even impossible balance.
Consider, for example, , which contains two kinds of cells: small, black ones and larger, grey ones.
Thresholding the image at a value of 210 yields , in which both kinds of cells are separated from the
background. We want to retain only the large cells in the image, whilst removing the small ones. This can
be done with straightforward opening. Using an 11 pixel circular structuring element yields . Most of
the desired cells are in the image, whereas none of the black cells remain. However, we cannot find any
structuring element which allows us to detect the small cells and remove the large ones. Every mask that is
small enough so that the dark cells remain in the image would not remove the large cells, either. This is
illustrated in , which is the result of applying a 7 pixel wide circular mask to the thresholded image.
Common Variants
It is common for opening to be used in conjunction with closing to achieve more subtle effects as described
in the section on closing.
Exercises
1. Apply opening to using square structuring elements of increasing size. Compare the results
obtained with the different sizes. If your implementation of the operator does not support greylevel
opening, threshold the input image.
2. How can you detect the small cells in the above example while removing the large cells? Use
the closing operator with structuring elements at different sizes in combination with some logical
operator.
3. Describe two 2-D object shapes (different from the ones shown in ) that simple opening
could distinguish between, when the two are mixed together in a loose flat pile. What would be the
appropriate structuring elements to use?
4. Now describe two 2-D shapes that opening couldn't distinguish between.
5. Can you explain why the position of the origin within the structuring element does not affect the
result of the opening, when it does make a difference for both erosion and dilation?
References
R. Haralick and L. Shapiro Computer and Robot Vision, Vol 1, Addison-Wesley Publishing Company,
1992, Chap 5, pp 174-185.
Closing
Common Names: Closing
Brief Description
Closing is an important operator from the field of mathematical morphology. Like its dual operator
opening, it can be derived from the fundamental operations of erosion and dilation. Like those operators it
is normally applied to binary images, although there are also greylevel versions. Closing is similar in some
ways to dilation in that it tends to enlarge the boundaries of foreground (bright) regions in an image (and
shrink background colour holes in such regions), but it is less destructive of the original boundary shape.
As with other morphological operators, the exact operation is determined by a structuring element. The
effect of the operator is to preserve background regions that have a similar shape to this structuring
element, or that can completely contain the structuring element, while eliminating all other regions of
background pixels.
How It Works
A closing is just an opening performed in reverse. It is defined simply as a dilation followed by an erosion
using the same structuring element for both operations. See the sections on erosion and dilation for details
of the individual steps. The closing operator therefore requires two inputs: an image to be closed, and a
structuring element.
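A minimal sketch of the definition in Python/SciPy: composing dilation and erosion by hand fills a one-pixel hole in a toy square (the test image is our own):

```python
import numpy as np
from scipy import ndimage

# A square with a one-pixel hole, surrounded by a background margin.
X = np.zeros((7, 7), dtype=bool)
X[1:6, 1:6] = True
X[3, 3] = False
s = np.ones((3, 3), dtype=bool)

# Closing is a dilation followed by an erosion with the same element; the
# small hole is filled by the dilation and stays filled after the erosion.
closed = ndimage.binary_erosion(ndimage.binary_dilation(X, s), s)
print(closed[3, 3], closed.sum())  # -> True 25
```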
Closing is the dual of opening i.e. closing the foreground pixels with a particular structuring element is
equivalent to opening the background pixels with the same element.
The effect of the operator can be visualized by taking the structuring element and sliding it around outside
each foreground region, without changing its orientation. For any background
boundary point, if the structuring element can be made to touch that point, without any part of the element
being inside a foreground region, then that point remains background. If this is not possible, then the pixel
is set to foreground. After the closing has been carried out the background region will be such that the
structuring element can be made to cover any point in the background without any part of it also covering a
foreground point, and so further closings will have no effect. This property is known as idempotence. The
effect of a closing on a binary image using a 3×3 square structuring element is illustrated in Figure 1.
As with erosion and dilation, this particular 3×3 structuring element is the most commonly used, and in
fact many implementations will have it hardwired into their code, in which case it is obviously not
necessary to specify a separate structuring element. To achieve the effect of a closing with a larger
structuring element, it is possible to perform multiple dilations followed by the same number of erosions.
Closing can sometimes be used to selectively fill in particular background regions of an image. Whether or
not this can be done depends upon whether a suitable structuring element can be found that fits well inside
regions that are to be preserved, but doesn't fit inside regions that are to be removed.
is an image containing large holes and small holes. If it is desired to remove the small holes while
retaining the large holes, then we can simply perform a closing with a disc-shaped structuring element with
a diameter larger than the smaller holes, but smaller than the large holes.
is the result of a closing with a 22 pixel diameter disc. Note that the thin black ring has also been
filled in as a result of the closing operation.
In real world applications, closing can, for example, be used to enhance binary images of objects obtained
from thresholding. Suppose that we want to compute the skeleton of . To do this we first need to
transform the greylevel image into a binary image. Simply thresholding the image at a value of 100 yields
. We can see that the threshold classified some parts of the receiver as background. is the
result of closing the thresholded image with a circular mask of size 20. The advantage of this image
becomes obvious when we compare the skeletons of the two binary images. is the skeleton of the
image which was only thresholded and is the skeleton of the image produced by the closing
operator. We can see that the latter skeleton is less complex and better represents the shape of the object.
Unlike erosion and dilation, the position of the origin of the structuring element does not really matter for
opening and closing. The result is independent of it.
Greylevel closing can similarly be used to select and preserve particular intensity patterns while
attenuating others.
is the result of greylevel closing with a flat 5×5 square structuring element. Notice how the dark
specks in between the bright spots in the hair have been largely filled in to the same colour as the bright
spots, while the more uniformly coloured nose area is largely the same intensity as before. Similarly the
gaps between the white whiskers have been filled in.
shows the result of a closing with a 3×3 square structuring element. The noise has been completely
removed with only a little degradation to the underlying image. If, on the other hand, the noise consists of
bright spots (i.e. `salt noise'), as can be seen in , closing yields . Here, no noise has been
removed. The noise has even been increased at locations where two nearby noise pixels have merged
together into one larger spot. Compare these results with the ones achieved on the same image using
opening.
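The closing operator itself is simply a dilation followed by an erosion with the same structuring element. The pepper-noise behaviour described above can be sketched in a few lines of Python with a 3×3 square element (an illustrative toy implementation, not the software used to produce these examples):

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            # max over the 3x3 neighbourhood, clipped at the borders
            out[y, x] = img[max(0, y - 1):y + 2, max(0, x - 1):x + 2].max()
    return out

def erode(img):
    """Binary erosion with a 3x3 square; pixels beyond the image
    border count as background, so border pixels always erode away."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = img[y - 1:y + 2, x - 1:x + 2].min()
    return out

def closing(img):
    """Closing = dilation followed by erosion."""
    return erode(dilate(img))

img = np.zeros((7, 7), dtype=int)
img[1:6, 1:6] = 1    # a solid 5x5 square ...
img[3, 3] = 0        # ... with one pixel of `pepper noise'
result = closing(img)  # the hole is filled; the square is unchanged
```

A disc-shaped element, as in the hole-filling example above, would simply use a round kernel in place of the 3×3 square.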
Although closing can sometimes be used to preserve particular intensity patterns in an image while
attenuating others, this is not always the case. Some aspects of this problem are discussed under opening.
Common Variants
Opening and closing are themselves often used in combinations together to achieve more subtle results. If
we represent the closing of an image f by C(f), and its opening by O(f), then some common combinations
include:
Proper Opening
Min(f, C(O(C(f))))
Proper Closing
Max(f, O(C(O(f))))
Automedian Filter
Max(O(C(O(f))), Min(f, C(O(C(f)))))
Exercises
1. Use closing to remove the lines from while keeping the circles. Do you manage
to remove all the lines?
Now use closing to remove the circles whilst keeping the lines. Is it possible to achieve this with
only one structuring element?
2. Can you use closing to remove certain features (e.g. the diagonal lines) from ? Try it out.
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp
524, 552.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol 1, Addison-Wesley Publishing Company,
1992, pp 174-185.
Local Information
General advice about the local HIPR installation is available here
©1994 Bob Fisher, Simon Perkins, Ashley Walker and Erik Wolfart
Department of Artificial Intelligence
University of Edinburgh
UK
Hit-and-Miss Transform
Common Names: Hit-and-miss Transform, Hit-or-miss Transform
Brief Description
The hit-and-miss transform is a general binary morphological operation that can be used to look for
particular patterns of foreground and background pixels in an image. It is actually the basic operation of
binary morphology since almost all the other binary morphological operators can be derived from it. As
with other binary morphological operators it takes as input a binary image and a structuring element, and
produces another binary image as output.
How It Works
The structuring element used in the hit-and-miss transform is a slight extension of the type that has been introduced
for erosion and dilation, in that it can contain both foreground and background pixels, rather than just
foreground pixels, i.e. both ones and zeros. Note that the simpler type of structuring element used with
erosion and dilation is often depicted containing both ones and zeros as well, but in that case the zeros
really stand for `don't cares', and are just used to fill out the structuring element to a conveniently shaped
kernel, usually a square. In all our illustrations, these `don't cares' are shown as blanks in the kernel in
order to avoid confusion. An example of the extended kind of structuring element is shown in Figure 1. As
usual we denote foreground pixels using ones, and background pixels using zeros.
The hit-and-miss operation is performed in much the same way as other morphological operators, by
translating the origin of the structuring element to all points in the image, and then comparing the
structuring element with the underlying image pixels. If the foreground and background pixels in the
structuring element exactly match foreground and background pixels in the image, then the pixel
underneath the origin of the structuring element is set to the foreground colour. If it doesn't match, then
that pixel is set to the background colour.
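The matching procedure just described can be sketched directly in Python (a minimal illustration; the helper name and the use of -1 to encode the blank `don't care' positions are our own conventions). Here it is demonstrated with an isolated-point detector, i.e. a foreground pixel surrounded entirely by background:

```python
import numpy as np

def hit_and_miss(img, se):
    """Binary hit-and-miss transform.

    img -- 2-D numpy array of 0s and 1s.
    se  -- 2-D numpy array: 1 = must be foreground, 0 = must be
           background, -1 = don't care. Origin at the centre of se.
    Positions where se overlaps the image border never match, so the
    corresponding output pixels stay 0.
    """
    h, w = img.shape
    sh, sw = se.shape
    oy, ox = sh // 2, sw // 2
    care = se != -1
    out = np.zeros_like(img)
    for y in range(oy, h - (sh - 1 - oy)):
        for x in range(ox, w - (sw - 1 - ox)):
            win = img[y - oy:y + sh - oy, x - ox:x + sw - ox]
            if np.array_equal(win[care], se[care]):
                out[y, x] = 1
    return out

# Isolated-point detector: a foreground pixel with no foreground
# neighbours at all.
point_se = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]])

img = np.zeros((7, 7), dtype=int)
img[2, 2] = 1                # an isolated point ...
img[4, 4] = img[4, 5] = 1    # ... and a two-pixel blob
isolated = hit_and_miss(img, point_se)
```

Running this on the example image marks only the isolated pixel; neither pixel of the two-pixel blob matches, since each has a foreground neighbour.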
For instance, the structuring element shown in Figure 1 can be used to find right angle convex corner
points in images. Notice that the pixels in the element form the shape of a bottom-left convex corner. We
assume that the origin of the element is at the center of the 3×3 element. In order to find all the corners in a
binary image we need to run the hit-and-miss transform four times with four different elements
representing the four kinds of right angle corners found in binary images. Figure 2 shows the four different
elements used in this operation.
Figure 2 Four structuring elements used for corner finding in binary images using the hit-
and-miss transform. Note that they are really all the same element, but rotated by different
amounts.
After obtaining the locations of corners in each orientation, we can simply OR all these images
together to get the final result showing the locations of all right angle convex corners in any orientation.
Figure 3 shows the effect of this corner detection on a simple binary image.
Figure 3 Effect of the hit-and-miss based right angle convex corner detector on a simple
binary image. Note that the `detector' is rather sensitive.
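As a concrete sketch, the four-rotation corner search can be written as follows. The 3×3 element below is an assumed stand-in for the one in Figure 1 (foreground at the centre, above and to the right; background to the left and below; blanks elsewhere), since the figure itself is not reproduced here:

```python
import numpy as np

def hit_and_miss(img, se):
    """3x3 hit-and-miss: 1 = foreground, 0 = background, -1 = don't care."""
    h, w = img.shape
    care = se != -1
    out = np.zeros_like(img)
    for y in range(1, h - 1):          # origin at the element centre
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            if np.array_equal(win[care], se[care]):
                out[y, x] = 1
    return out

# Assumed bottom-left convex corner element (stand-in for Figure 1).
corner_se = np.array([[-1, 1, -1],
                      [ 0, 1,  1],
                      [ 0, 0, -1]])

img = np.zeros((6, 6), dtype=int)
img[1:5, 1:5] = 1                      # a filled 4x4 square

# Run the transform with the element in all four 90-degree rotations
# and OR the results together.
corners = np.zeros_like(img)
for k in range(4):
    corners |= hit_and_miss(img, np.rot90(corner_se, k))
```

On the filled square this marks exactly the four corner pixels, one per rotation of the element.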
Implementations vary as to how they handle the hit-and-miss transform at the edges of images where the
structuring element overlaps the edge of the image. A simple solution is to simply assume that any
structuring element that overlaps the image does not match underlying pixels, and hence the corresponding
pixel in the output should be set to zero.
The hit-and-miss transform has many applications in more complex morphological operations. It is
used to construct the thinning and thickening operators, and hence underlies all the applications explained in those
worksheets.
The operations of erosion, dilation, opening, closing, thinning and thickening can all be derived from the
hit-and-miss transform in conjunction with simple set operations.
Figure 4 illustrates some structuring elements that can be used for locating various binary features.
Figure 4 Some applications of the hit-and-miss transform. 1 is used to locate isolated points
in a binary image. 2 is used to locate the end points on a binary skeleton. Note that this
structuring element must be used in all its rotations so four hit-and-miss passes are required.
3a and 3b are used to locate the triple points (junctions) on a skeleton. Both structuring
elements must be run in all orientations so eight hit-and-miss passes are required.
shows the triple points (i.e. points where three lines meet) of the skeleton. Note that the hit-and-miss
transform itself merely outputs single foreground pixels at each triple point (the rest of the output image
being black). To produce our example here, this image was then dilated once using a cross-shaped
structuring element in order to mark these isolated points clearly, and this was then ORed with the original
skeleton in order to produce the overlay.
shows the end points of the skeleton. This image was produced in a similar way to the triple point
image above, except of course that a different structuring element was used for the hit-and-miss operation.
In addition, the isolated points produced by the transform were dilated with a square in order to mark them,
rather than with a cross.
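A sketch of the end-point search follows. The exact element of Figure 4 is not reproduced here, so the 3×3 element below is a plausible stand-in: a foreground pixel whose only required foreground neighbour lies directly below it, with the diagonal positions below left as don't cares. Run in all four 90° rotations and ORed together, it marks both ends of a line:

```python
import numpy as np

def hit_and_miss(img, se):
    """3x3 hit-and-miss: 1 = foreground, 0 = background, -1 = don't care."""
    h, w = img.shape
    care = se != -1
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            if np.array_equal(win[care], se[care]):
                out[y, x] = 1
    return out

# Plausible end-point element: the centre pixel's only foreground
# neighbour is the pixel directly below it (diagonals below it are
# don't cares, so the line may turn).
end_se = np.array([[ 0, 0,  0],
                   [ 0, 1,  0],
                   [-1, 1, -1]])

img = np.zeros((7, 5), dtype=int)
img[1:6, 2] = 1                       # a vertical one-pixel-wide line

ends = np.zeros_like(img)
for k in range(4):                    # four passes, one per rotation
    ends |= hit_and_miss(img, np.rot90(end_se, k))
```

On the vertical line this fires exactly twice, once at each end; interior line pixels fail every rotation because they have foreground on two opposite sides.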
The successful use of the hit-and-miss transform relies on being able to think of a relatively small set of
binary patterns that capture all the possible variations and orientations of a feature that is to be located. For
features larger than a few pixels across this is often not feasible.
Exercises
1. How can the hit-and-miss transform be used to perform erosion?
2. How can the hit-and-miss transform, together with the NOT operation, be used to perform dilation?
3. What is the smallest number of different structuring elements that you would need to use to locate
all foreground points in an image which have at least one foreground neighbour, using the hit-and-
miss transform? What do the structuring elements look like?
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, p
528.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol 1, Addison-Wesley Publishing Company,
1992, Chap 5, pp 168-173.
Thinning
Common Names: Thinning
Brief Description
Thinning is a morphological operation that is used to remove selected foreground pixels from binary
images, somewhat like erosion or opening. It can be used for several applications, but is particularly useful
for skeletonization. In this mode it is commonly used to tidy up the output of edge detectors by reducing all
lines to single pixel thickness. Thinning is normally only applied to binary images, and produces another
binary image as output.
The thinning operation is related to the hit-and-miss transform, and so it is helpful to have an
understanding of that operator before reading on.
How It Works
Like other morphological operators, the behaviour of the thinning operation is determined by a structuring
element. The binary structuring elements used for thinning are of the extended type described under the hit-
and-miss transform (i.e. they can contain both ones and zeros).
The thinning operation is related to the hit-and-miss transform and can be expressed quite simply in terms
of it. The thinning of an image I by a structuring element J is:
thin(I, J) = I − hit-and-miss(I, J)
where the subtraction is logical: a pixel is retained only if it is set in I and not set in the hit-and-miss output.
In everyday terms, the thinning operation is calculated by translating the origin of the structuring element
to each possible pixel position in the image, and at each such position comparing it with the underlying
image pixels. If the foreground and background pixels in the structuring element exactly match foreground
and background pixels in the image, then the image pixel underneath the origin of the structuring element
is set to background (zero). Otherwise it is left unchanged. Note that the structuring element must always
have a one at its origin if it is to have any effect.
The choice of structuring element determines under what situations a foreground pixel will be set to
background, and hence it determines the application for the thinning operation.
We have described the effects of a single pass of a thinning operation over the image. In fact, the operator
is normally applied repeatedly until it causes no further changes to the image (i.e. until convergence).
Alternatively, in some applications, e.g. pruning, the operations may only be applied for a limited number
of iterations.
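The per-pass rule above translates directly into code: compute the hit-and-miss matches, delete them from the image, and repeat until nothing changes. The sketch below is illustrative only; it uses a single element that matches pixels whose four nearest neighbours are all foreground, so iterating it strips a solid region down to its 4-connected boundary (the boundary-finding use of thinning described later in this worksheet):

```python
import numpy as np

def hit_and_miss(img, se):
    """3x3 hit-and-miss: 1 = foreground, 0 = background, -1 = don't care."""
    h, w = img.shape
    care = se != -1
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            if np.array_equal(win[care], se[care]):
                out[y, x] = 1
    return out

def thin(img, se):
    """One parallel thinning pass: delete every matched pixel."""
    return img * (1 - hit_and_miss(img, se))

# Element matching interior pixels (all four nearest neighbours
# foreground); thinning with it leaves the 4-connected boundary.
interior_se = np.array([[-1, 1, -1],
                        [ 1, 1,  1],
                        [-1, 1, -1]])

img = np.zeros((7, 7), dtype=int)
img[1:6, 1:6] = 1          # a solid 5x5 square

cur = img
while True:                # iterate until convergence
    nxt = thin(cur, interior_se)
    if np.array_equal(nxt, cur):
        break
    cur = nxt
# cur is now the one-pixel-wide boundary ring of the square
</n```

Because thinning is parallel (all matches are found on the current image before any pixel is deleted), the entire interior disappears in a single pass here, and the second pass confirms convergence.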
Thinning is the dual of thickening, i.e. thickening the foreground is equivalent to thinning the background.
Consider all pixels on the boundaries of foreground regions (i.e. foreground points that have
at least one background neighbour). Delete any such point that has more than one foreground
neighbour, as long as doing so does not locally disconnect (i.e. split into two) the region
containing that pixel. Iterate until convergence.
This procedure erodes away the boundaries of foreground objects as much as possible, but does not affect
pixels at the ends of lines.
This effect can be achieved using morphological thinning by iterating until convergence with the
structuring elements shown in Figure 1, and all their 90° rotations (4×2 = 8 structuring elements in total).
In fact what we are doing here is determining the octagonal skeleton of a binary shape --- the set of points
that lie at the centers of octagons that fit entirely inside the shape, and which touch the boundary of the
shape at at least two points. See the section on skeletonization for more details on skeletons and on other
ways of computing them. Note that this skeletonization method is guaranteed to produce a connected skeleton.
Figure 2 shows the result of this thinning operation on a simple binary image.
Note that skeletons produced by this method often contain undesirable short spurs produced by small
irregularities in the boundary of the original object. These spurs can be removed by a process called
pruning, which is in fact just another sort of thinning. The structuring element for this operation is shown
in Figure 3, along with some other common structuring elements.
Figure 3 Some applications of thinning. 1 simply finds the boundary of a binary object, i.e.
it deletes any foreground points that don't have at least one neighbouring background point.
Note that the detected boundary is 4-connected. 2 does the same thing but produces an 8-
connected boundary. 3a and 3b are used for pruning. At each thinning iteration, each
element must be used in each of its four 90° rotations. Pruning is normally carried out for
only a limited number of iterations to remove short spurs, since pruning until convergence
will actually remove all pixels except those that form closed loops.
Note that many implementations of thinning have a particular structuring element `hardwired' into them
(usually the skeletonization structuring elements), and so the user does not need to be concerned about
selecting one.
is the result of applying the Sobel operator to . Note that the detected boundaries of the
object are several pixels thick.
We first threshold the image at a greylevel value of 60 producing in order to obtain a binary
image.
Then, iterating the thinning algorithm until convergence, we get . The detected lines have all been
reduced to a single pixel width. Note however that there are still one or two `spurs' present, which can be
removed using pruning.
is the result of pruning (using thinning) for five iterations. The spurs are now almost entirely
gone.
Thinning is often used in combination with other morphological operators for extracting a simple
representation of regions. A common example is the automated recognition of hand-written characters. In
this case, morphological operators are used as pre-processing to obtain the shape of the characters which
then can be used for the recognition. We illustrate a simple example using , which shows a Japanese
character. Note that this and the following images were zoomed by a factor of 4 for a better display. Hence,
a 4×4 pixel square here corresponds to 1 pixel during the processing. Since we want to work on binary
images, we start off by thresholding the image at a value of 180, obtaining . A simple way to obtain
the skeleton of the character is to thin the image with the masks shown in Figure 4 until convergence. The
result is shown in .
The character is now reduced to a single pixel wide line. However, the line is broken at some locations,
which might cause problems during the recognition process. To improve the situation we can first dilate
the image to connect the lines before thinning it. Dilating the image twice with a 3×3 square mask yields
, then the result of the thinning is . The corresponding images for three dilations are and
. Although the line is now connected the process also had negative effects on the skeleton: we obtain
spurs on the end points of the lines and the skeleton changes its shape at high curvature locations.
Therefore, we try to prune the spurs by thinning the image using the masks shown in Figure 4.
Figure 4 Shows the masks used in the character recognition example. 1 shows the mask
used in combination with thinning to obtain the skeleton. 2 was used in combination with
thinning to prune the skeleton and with the hit-and-miss operator to find the end points of the
skeleton. Each mask was used in each of its 45° rotations.
Pruning the thinned image which was obtained after 2 dilations yields using two iterations for each
orientation of the mask. For the example obtained after 3 dilations we get using 4 iterations of
pruning. The spurs have now disappeared, however, the pruning has also suppressed pixels at the end of
correct lines. If we want to restore these parts of the image, we can combine the dilation operator with a
logical AND operator. First, we need to know the end points of the skeleton so that we know where to start
the dilation. We find these by applying a hit-and-miss operator using the mask shown in Figure 4. The end
points of the latter of the two pruned images are shown in . Now, we dilate this image using a 3×3
mask. ANDing it with the thinned, but not pruned image prevents the dilation from spreading out in all
directions, hence it limits the dilation along the original character. This process is known as conditional
dilation. After repeating this procedure 5 times, we obtain . Although one of the parasitic branches
disappeared, the ones appearing close to the end of the lines remain.
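Conditional dilation as used above can be sketched as follows (a simplified illustration with a hypothetical seed and mask, not the character images of this example): dilate the seed image, then AND with the mask after every step, so that the growth can only creep along pixels that are set in the mask.

```python
import numpy as np

def dilate(img):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = img[max(0, y - 1):y + 2, max(0, x - 1):x + 2].max()
    return out

def conditional_dilate(seed, mask, iterations):
    """Dilate `seed`, ANDing with `mask` after each step so the
    result can only grow along pixels that are set in `mask`."""
    out = seed
    for _ in range(iterations):
        out = dilate(out) & mask
    return out

# Hypothetical data: the mask is a short horizontal line, the seed is
# its left end point.
mask = np.zeros((5, 7), dtype=int)
mask[2, 1:6] = 1
seed = np.zeros_like(mask)
seed[2, 1] = 1

grown = conditional_dilate(seed, mask, 2)
# after two steps the seed has crept two pixels along the line
```

Each iteration advances the region by at most one pixel along the mask, which is why the number of iterations controls how much of the original structure is restored.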
Our final step is to OR this image with the pruning output thus obtaining . This simple example
illustrates that we can successfully apply a variety of morphological operators to obtain information about
the shape of a character. However, in a real world application, more sophisticated algorithms and masks
would be necessary to get good results.
And is the effect of skeletonization by thinning. The result is a lot less clear than before. Compare
this with the results obtained using the Canny operator.
Exercises
1. What is the difference between the thinned lines obtained from the slightly different skeleton masks in Figure
1 and Figure 4?
2. The conditional dilation in the character recognition example `followed' the original character not
only towards the initial end of the line but also backwards. Hence it also might restore unwanted
spurs which were located in this direction. Can you think of a way to avoid that using a second
condition?
3. Can you think of any situation in the character recognition example, in which the pruning mask
shown in Figure 4 might cause problems?
4. Find the boundaries of using morphological edge detection. First threshold the image, then
apply thinning using the mask shown in Figure 3. Compare the result with which was
obtained using the Sobel operator and morphological post-processing (see above).
5. Compare and contrast the effect of the Canny operator with the combined effect of Sobel operator
plus thinning and pruning.
6. If an edge detector has produced long lines in its output that are approximately x pixels thick, what
is the longest spurious spur that you could expect to see after thinning to a single
pixel thickness? Test your estimate out on some real images.
7. Hence, approximately how many iterations of pruning should be applied to remove spurious spurs
from lines that were thinned down from a thickness of x pixels?
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp
518 - 548.
E. Davies Machine Vision: Theory, Algorithms and Practicalities Academic Press, 1990, pp 149 - 161.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol 1, Addison-Wesley Publishing Company,
1992, Chap 5, pp 168-173.
Thickening
Common Names: Thickening
Brief Description
Thickening is a morphological operation that is used to grow selected regions of foreground pixels in
binary images, somewhat like dilation or closing. It has several applications, including determining the
approximate convex hull of a shape, and determining the skeleton by zone of influence. Thickening is
normally only applied to binary images, and produces another binary image as output.
The thickening operation is related to the hit-and-miss transform, and so it is helpful to have an
understanding of that operator before reading on.
How It Works
Like other morphological operators, the behaviour of the thickening operation is determined by a
structuring element. The binary structuring elements used for thickening are of the extended type
described under the hit-and-miss transform (i.e. they can contain both ones and zeros).
The thickening operation is related to the hit-and-miss transform and can be expressed quite simply in
terms of it. The thickening of an image I by a structuring element J is:
thicken(I, J) = I ∪ hit-and-miss(I, J)
Thus the thickened image consists of the original image plus any additional foreground pixels switched
on by the hit-and-miss transform.
In everyday terms, the thickening operation is calculated by translating the origin of the structuring
element to each possible pixel position in the image, and at each such position comparing it with the
underlying image pixels. If the foreground and background pixels in the structuring element exactly
match foreground and background pixels in the image, then the image pixel underneath the origin of the
structuring element is set to foreground (one). Otherwise it is left unchanged. Note that the structuring
element must always have a zero or a blank at its origin if it is to have any effect.
The choice of structuring element determines under what situations a background pixel will be set to
foreground, and hence it determines the application for the thickening operation.
We have described the effects of a single pass of a thickening operation over the image. In fact, the
operator is normally applied repeatedly until it causes no further changes to the image (i.e. until
convergence). Alternatively, in some applications, the operations may only be applied for a limited
number of iterations.
Thickening is the dual of thinning, i.e. thinning the foreground is equivalent to thickening the
background. In fact, in most cases thickening is performed by thinning the background.
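A single thickening pass can be sketched directly (illustrative only; the gap-bridging element below is our own example, not one from this worksheet). Note the 0 at the element's origin, as required:

```python
import numpy as np

def hit_and_miss(img, se):
    """3x3 hit-and-miss: 1 = foreground, 0 = background, -1 = don't care."""
    h, w = img.shape
    care = se != -1
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            if np.array_equal(win[care], se[care]):
                out[y, x] = 1
    return out

def thicken(img, se):
    """One thickening pass: OR the hit-and-miss matches into the image."""
    return img | hit_and_miss(img, se)

# Example element: a background pixel with foreground on both
# horizontal neighbours; thickening with it bridges one-pixel gaps.
bridge_se = np.array([[-1, -1, -1],
                      [ 1,  0,  1],
                      [-1, -1, -1]])

img = np.zeros((3, 5), dtype=int)
img[1, 1] = img[1, 3] = 1       # two pixels with a one-pixel gap
out = thicken(img, bridge_se)   # the gap at (1, 2) is filled
```

Since the element requires background at its origin, thickening can only ever switch pixels on, never off, which is exactly the dual of the thinning behaviour.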
The convex hull of a binary shape can be visualized quite easily by imagining stretching an elastic band
around the shape. The elastic band will follow the convex contours of the shape, but will `bridge' the
concave contours. The resulting shape will have no concavities and contains the original shape. Where
an image contains multiple disconnected shapes, the convex hull algorithm will determine the convex
hull of each shape, but will not connect disconnected shapes, unless their convex hulls happen to overlap
(e.g. two interlocked `U'-shapes).
An approximate convex hull can be computed using thickening with the structuring elements shown in
Figure 1. The convex hull computed using this method is actually a `45° convex hull' approximation, in
which the boundaries of the convex hull must have orientations that are multiples of 45°. Note that this
computation can be very slow.
Figure 1 Structuring elements for determining the convex hull using thickening. During
each iteration of the thickening, each element should be used in turn, and then in each of
their 90° rotations, giving 8 effective structuring elements in total. The thickening is
continued until no further changes occur, at which point the convex hull is complete.
is the result of applying the 45° convex hull algorithm described above. This process took a
considerable amount of time --- over 100 thickening passes with each of the eight structuring elements!
Another application of thickening is to determine the skeleton by zone of influence, or SKIZ. The SKIZ
is a skeletal structure that divides an image into regions, each of which contains just one of the distinct
objects in the image. The boundaries are drawn such that all points within a particular boundary are
closer to the binary object contained within that boundary than to any other. As with normal skeletons,
various possible distance metrics can be used. The SKIZ is also sometimes called the Voronoi diagram.
One method of calculating the SKIZ is to first determine the skeleton of the background, and then prune
this until convergence to remove all branches except those forming closed loops, or that intersect the
boundary. Both of these concepts are described (applied to foreground objects) under thinning. Since
thickening is the dual of thinning, we can accomplish the same thing using thickening. The structuring
elements used in the two processes are shown in Figure 2.
Figure 2 Structuring elements used in determining the SKIZ. 1a and 1b are used to
perform the skeletonization of the background. Note that these elements are just the duals
of the corresponding skeletonization by thinning elements. On each thickening iteration,
each element is used in turn, and in each of its 90° rotations. Thickening is continued until
convergence. When this is finished, structuring elements 2a and 2b are used in similar
fashion to prune the skeleton until convergence and leave behind the SKIZ.
We illustrate the SKIZ using the same starting image as for the convex hull.
shows the image after the skeleton of the background has been found.
And is the same image after pruning until convergence. This is the SKIZ of the original image.
Since the SKIZ considers each foreground pixel as an object to which it assigns a zone of influence, it is
rather sensitive to noise. If we, for example, add some `salt noise' to the above image, we obtain .
The SKIZ of that image is given by . Now, we not only have a zone of influence for each of the
crosses, but also for each of the noise points.
Since thickening is the dual to thinning, it can be applied for the same range of tasks as thinning. Which
operator is used depends on the polarity of the image, i.e. if the object is represented in black and the
background is white, the thickening operator thins the object.
Exercises
1. What would the convex hull look like if you used the structuring element shown in Figure 3?
Determine the convex hull of using this mask and compare it with the result obtained with
the structuring element shown in Figure 1.
3. Can you think of (or find out about) any uses for the SKIZ?
4. Use thickening and other morphological operators (e.g. erosion and opening) to process .
Reduce all lines to a single pixel width and try to obtain their maximum length.
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp
518 - 548.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol 1, Addison-Wesley Publishing Company,
1992, Chap 5, pp 168-173.
Skeletonization/Medial Axis Transform
Common Names: Skeletonization, Medial Axis Transform
Brief Description
Skeletonization is a process for reducing foreground regions in a binary image to a skeletal remnant that
largely preserves the extent and connectivity of the original region while throwing away most of the
original foreground pixels. To see how this works, imagine that the foreground regions in the input
binary image are made of some uniform slow-burning material. Light fires simultaneously at all points
along the boundary of this region and watch the fire move into the interior. At points where fire fronts
traveling from two different boundaries meet, the fire extinguishes itself, and the points at which
this happens form the so-called `quench line'. This line is the skeleton. Under this definition it is clear
that thinning produces a sort of skeleton.
Another way to think about the skeleton is as the loci of centers of bi-tangent circles that fit entirely
within the foreground region being considered. Figure 1 illustrates this for a rectangular shape.
The terms medial axis transform (MAT) and skeletonization are often used interchangeably but we will
distinguish between them slightly. The skeleton is simply a binary image marking the skeleton pixels.
The MAT on the other hand is a greylevel image where each point on the skeleton has an intensity
which represents its distance to a boundary in the original object.
How It Works
The skeleton/MAT can be produced in two main ways. The first is to use some kind of morphological
thinning that successively erodes away pixels from the boundary (while preserving the end points of line
segments) until no more thinning is possible at which point what is left approximates the skeleton. The
alternative method is to first calculate the distance transform of the image. The skeleton then lies along
the singularities (i.e. creases or curvature discontinuities) in the distance transform. This latter approach
is more suited to calculating the MAT since the MAT is the same as the distance transform but with all
points off the skeleton suppressed to zero.
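The distance-transform route can be sketched as follows (a minimal illustration using the city-block metric; practical implementations often use chamfer or Euclidean approximations instead). Two raster scans suffice: a forward pass propagating distances from above and the left, and a backward pass from below and the right. The MAT is then this transform with all points off the ridges suppressed to zero.

```python
import numpy as np

def cityblock_distance(img):
    """Two-pass city-block distance transform of a binary image:
    each foreground pixel gets its 4-connected distance to the
    nearest background pixel."""
    h, w = img.shape
    inf = h + w                      # larger than any possible distance
    dt = np.where(img > 0, inf, 0)
    for y in range(h):               # forward pass: look up and left
        for x in range(w):
            if dt[y, x]:
                if y > 0:
                    dt[y, x] = min(dt[y, x], dt[y - 1, x] + 1)
                if x > 0:
                    dt[y, x] = min(dt[y, x], dt[y, x - 1] + 1)
    for y in range(h - 1, -1, -1):   # backward pass: look down and right
        for x in range(w - 1, -1, -1):
            if dt[y, x]:
                if y < h - 1:
                    dt[y, x] = min(dt[y, x], dt[y + 1, x] + 1)
                if x < w - 1:
                    dt[y, x] = min(dt[y, x], dt[y, x + 1] + 1)
    return dt

img = np.zeros((7, 7), dtype=int)
img[1:6, 1:6] = 1                    # a solid 5x5 square
dt = cityblock_distance(img)
# dt rises from 1 at the square's edge to 3 at its centre; the
# centre, where dt peaks, lies on the ridge that forms the MAT
```

For this square the transform peaks at the single centre pixel, which is exactly where the ridge, and hence the skeleton, of the shape lies.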
Note: The MAT is often described as being the `locus of local maxima' on the distance transform. This is
not really true in any normal sense of the phrase `local maximum'. If the distance transform is displayed
as a 3-D surface plot with the third dimension representing the greyvalue, the MAT can be imagined as
the ridges on the 3-D surface.
The skeleton is useful because it provides a simple and compact representation of a shape that preserves
many of the topological and size characteristics of the original shape. Thus, for instance, we can get a
rough idea of the length of a shape by considering just the end points of the skeleton and finding the
maximally separated pair of end points on the skeleton. Similarly, we can distinguish many qualitatively
different shapes from one another on the basis of how many `triple points' there are, i.e. points where at
least three branches of the skeleton meet.
In addition to this, the MAT (not the pure skeleton) has the property that it can be used to exactly
reconstruct the original shape if necessary.
As with thinning, slight irregularities in a boundary will lead to spurious spurs in the final image which
may interfere with recognition processes based on the topological properties of the skeleton. Despurring
or pruning can be carried out to remove spurs of less than a certain length but this is not always effective
since small perturbations in the boundary of an image can lead to large spurs in the skeleton.
Note that some implementations of skeletonization algorithms produce skeletons that are not guaranteed
to be continuous, even if the shape they are derived from is. This is due to the fact that the algorithms
must of necessity run on a discrete grid. The MAT is actually the locus of slope discontinuities in the
distance transform.
Here are some example skeletons and MATs produced from simple shapes. Note that the MATs have
been contrast stretched in order to make them more visible.
The skeleton and the MAT are often very sensitive to small changes in the object. If we, for example, use an
algorithm which does not guarantee a connected skeleton, it yields . Sometimes this sensitivity might
be useful. Often, however, we need to extract the binary image from a greyscale image. In these cases, it
is often difficult to obtain the ideal shape of the object so that the skeleton becomes rather complex. We
illustrate this using . To obtain a binary image we threshold the image at a value of 100, thus
obtaining . The skeleton of the binary image, shown in , is much more complex than the
one we would obtain from the ideal shape of the telephone receiver. This example shows that simple
thresholding is often not sufficient to produce a useful binary image. Some further processing might be
necessary before skeletonizing the image.
The skeleton is also very sensitive to noise. To illustrate this we add some `pepper noise' to the above
rectangle, thus obtaining . As can be seen in , the corresponding skeleton connects each
noise point to the skeleton obtained from the noise free image.
Common Variants
It is also possible to skeletonize the background as opposed to the foreground of an image. This idea is
closely related to the dual of the distance transform mentioned in the thickening worksheet. This
skeleton is often called the SKIZ (SKeleton by Influence Zones).
Exercises
1. What would the skeleton of a perfect circular disk look like?
2. Why does the skeleton of look so strange? Can you say anything general about the effect
of holes in a shape on the skeleton of that shape?
3. Try to improve the binary image of the telephone receiver so that its skeleton becomes less
complex and better represents the shape of the receiver.
4. How can the MAT be used to reconstruct the original shape of the region it was derived from?
References
D. Ballard and C. Brown Computer Vision, Prentice-Hall, 1982, Chap 8.
E. Davies Machine Vision: Theory, Algorithms and Practicalities Academic Press, 1990, pp 149 - 161.
R. Haralick and L. Shapiro Computer and Robot Vision, Vol. 1, Addison-Wesley Publishing
Company, 1992, Chap 5.
Local Information
General advice about the local HIPR installation is available here
©1994 Bob Fisher, Simon Perkins, Ashley Walker and Erik Wolfart
Department of Artificial Intelligence
University of Edinburgh
UK
Introduction
The operators included in this section are those whose purpose is to identify meaningful image features
on the basis of distributions of pixel greylevels. The two categories of operators included here are:
Edge Pixel Detectors - that assign a value to a pixel in proportion to the likelihood that the pixel is
part of an image edge (i.e. a pixel which is on the boundary between two regions of different intensity
values).
Line Pixel Detectors - that assign a value to a pixel in proportion to the likelihood that the pixel is
part of an image line (i.e. a dark narrow region bounded on both sides by lighter regions, or vice versa).
Detectors for other features can be defined, such as circular arc detectors in intensity images (or even
more general detectors, as in the generalized Hough transform), or planar point detectors in range
images, etc.
Note that the operators in this section merely identify pixels likely to be part of such a structure. To
actually extract the structure from the image it is then necessary to group together image pixels (which
are usually adjacent).
Feature Detectors - Roberts Cross Edge Detector
Brief Description
The Roberts Cross operator performs a simple, quick to compute, 2-D spatial gradient measurement on
an image. It thus highlights regions of high spatial gradient which often correspond to edges. In its most
common usage, the input to the operator is a greyscale image, as is the output. Pixel values at each point
in the output represent the estimated absolute magnitude of the spatial gradient of the input image at that
point.
How It Works
In theory, the operator consists of a pair of 2×2 convolution masks as shown in Figure 1. One mask is
simply the other rotated by 90°. This is very similar to the Sobel operator.
These masks are designed to respond maximally to edges running at 45° to the pixel grid, one mask for
each of the two perpendicular orientations. The masks can be applied separately to the input image, to
produce separate measurements of the gradient component in each orientation (call these Gx and Gy).
These can then be combined together to find the absolute magnitude of the gradient at each point and the
orientation of that gradient. The gradient magnitude is given by:
|G| = sqrt(Gx^2 + Gy^2)
although typically an approximate magnitude is computed using:
|G| = |Gx| + |Gy|
which is much faster to compute.
The angle of orientation of the edge giving rise to the spatial gradient (relative to the pixel grid
orientation) is given by:
θ = arctan(Gy/Gx) - 3π/4
In this case, orientation 0 is taken to mean that the direction of maximum contrast from black to white
runs from left to right on the image, and other angles are measured anti-clockwise from this.
Often, the absolute magnitude is the only output the user sees --- the two components of the gradient are
conveniently computed and added in a single pass over the input image using the pseudo-convolution
operator shown in Figure 2.
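The single-pass |Gx| + |Gy| computation can be sketched with NumPy array slicing. This is an illustrative helper, not HIPR code; it assumes a 2-D float greyscale image, and the output is one pixel smaller in each direction because the 2×2 masks have no natural centre.

```python
import numpy as np

def roberts_cross(img):
    """Approximate Roberts cross gradient magnitude |Gx| + |Gy|.

    The two 2x2 masks respond to edges at +/-45 degrees:
        Gx = [[+1,  0],      Gy = [[ 0, +1],
              [ 0, -1]]            [-1,  0]]
    """
    img = img.astype(float)
    gx = img[:-1, :-1] - img[1:, 1:]   # difference along one diagonal
    gy = img[:-1, 1:] - img[1:, :-1]   # difference along the other
    return np.abs(gx) + np.abs(gy)
```

Applied to a vertical step edge, the operator responds in a single column of the output, with a magnitude of twice the step height (one contribution from each diagonal difference).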
is the corresponding output from the Roberts cross operator. The gradient magnitudes output by
the operator have been multiplied by a factor of 5 to make the image clearer. Note the spurious bright
dots on the image which demonstrate that the operator is susceptible to noise. Also note that only the
strongest edges have been detected with any reliability.
is the result of thresholding the Roberts cross output at a pixel value of 80.
We can also apply the Roberts cross operator to detect depth discontinuity edges in range images.
is a range image in which the distance from the sensor to the object is encoded in the intensity value of
the image. Applying the Roberts cross yields . The operator produced a line with high intensity
values along the boundary of the object. On the other hand, intensity changes originating from depth
discontinuities within the object are not high enough to produce a visible line. However, if we threshold the
image at a value of 20, all depth discontinuities in the object produce an edge in the image, as can be
seen in .
The operator's sensitivity to noise can be demonstrated if we add noise to the above range image. is
the result of adding Gaussian noise with a standard deviation of 8, is the corresponding output of
the Roberts cross operator. The difference from the previous image becomes visible if we again threshold
the image at a value of 20, as can be seen in . Now, we not only detect edges corresponding to real
depth discontinuities, but also some noise points. We can show that the Roberts cross operator is more
sensitive to noise than, for example, the Sobel operator if we apply the Sobel operator to the same noisy
image. In that case, we can find a threshold which removes most of the noise pixels while keeping all
edges of the object. is the result of applying a Sobel edge detector to the above noisy image and
thresholding the output at a value of 150.
The previous example contained very sharp intensity changes, which enabled us (in the noise-free case)
to detect the edges very well. is a range image where the intensity values change much more slowly.
Hence, the edges in the resulting Roberts cross image, , are rather faint. Since the intensity of
many edge pixels in this image is very low it is not possible to entirely separate the edges from the noise.
This can be seen in , which is the result of thresholding the image at a value of 30.
The effects of the shape of the edge detection mask on the edge image can be illustrated using .
Applying the Roberts cross operator yields . Due to the different width and orientation of the lines
in the image, the response in the edge image varies significantly. Since the intensity steps between
foreground and background are constant in all patterns of the original image, this shows that the Roberts
Cross operator responds differently to different frequencies and orientations.
If the pixel value type being used only supports a small range of integers (e.g. 8-bit integer images), then
it is possible for the gradient magnitude calculations to overflow the maximum allowed pixel value. In
this case it is common to simply set those pixel values to the maximum allowed value. In order to avoid
this happening, image types that support a greater range of pixel values, e.g. floating point images, can
be used.
There is a slight ambiguity in the output of the Roberts operator as to which pixel in the output
corresponds to which pixel in the input, since technically the operator measures the gradient intensity at
the point where four pixels meet. This means that the gradient image will be shifted by half a pixel in
both x and y grid directions.
Exercises
1. Why does the Roberts cross' small mask size make it very sensitive to noise in the image?
2. Apply the Roberts cross operator to . Can you obtain an edge image that contains only
lines corresponding to the contours of the object? Compare with the results obtained with the
Sobel and Canny operators.
3. Compare the result of applying the Roberts cross operator to with the one of using the
Sobel operator.
4. Compare the performance of the Roberts cross with the Sobel operator in terms of noise
rejection, edge detection and speed.
5. Under what situations might you choose to use the Roberts cross rather than the Sobel? And
under what conditions would you avoid it?
References
R. Boyle and R. Thomas Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp
50 - 51.
E. Davies Machine Vision: Theory, Algorithms and Practicalities Academic Press, 1990, Chap 5.
L. Roberts Machine Perception of 3-D Solids, Optical and Electro-optical Information Processing, MIT
Press 1965.
Feature Detectors - Sobel Edge Detector
Brief Description
The Sobel operator performs a 2-D spatial gradient measurement on an image and so emphasizes regions
of high spatial gradient that correspond to edges. Typically it is used to find the approximate absolute
gradient magnitude at each point in an input greyscale image.
How It Works
In theory at least, the operator consists of a pair of 3×3 convolution masks as shown in Figure 1. One
mask is simply the other rotated by 90°. This is very similar to the Roberts Cross operator.
These masks are designed to respond maximally to edges running vertically and horizontally relative to
the pixel grid, one mask for each of the two perpendicular orientations. The masks can be applied
separately to the input image, to produce separate measurements of the gradient component in each
orientation (call these Gx and Gy). These can then be combined together to find the absolute magnitude
of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:
|G| = sqrt(Gx^2 + Gy^2)
although typically an approximate magnitude is computed using:
|G| = |Gx| + |Gy|
which is much faster to compute.
The angle of orientation of the edge (relative to the pixel grid) giving rise to the spatial gradient is given
by:
θ = arctan(Gy/Gx)
In this case, orientation 0 is taken to mean that the direction of maximum contrast from black to white
runs from left to right on the image, and other angles are measured anti-clockwise from this.
Often, this absolute magnitude is the only output the user sees --- the two components of the gradient are
conveniently computed and added in a single pass over the input image using the pseudo-convolution
operator shown in Figure 2.
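The Sobel computation can be sketched with an explicit 3×3 window over the interior pixels. This is an illustrative sketch, not HIPR code: it uses correlation rather than true convolution (the sign of each component flips, but the magnitude is unaffected), leaves border pixels at zero, and a real implementation would use an optimized convolution routine.

```python
import numpy as np

def sobel(img):
    """Sobel gradient: |Gx| + |Gy| magnitude and orientation (radians)."""
    img = img.astype(float)
    h, w = img.shape
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], float)   # responds to vertical edges
    ky = kx.T                            # responds to horizontal edges
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(win * kx)
            gy[y, x] = np.sum(win * ky)
    mag = np.abs(gx) + np.abs(gy)
    theta = np.arctan2(gy, gx)
    return mag, theta
```

On a vertical step edge the response is two pixels wide, which illustrates the smoothing effect of the 3×3 masks mentioned below.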
As with the Roberts Cross operator, output values from the operator can easily overflow the maximum
allowed pixel value for image types that only support smallish integer pixel values (e.g. 8-bit integer
images). When this happens the standard practice is to simply set overflowing output pixels to the
maximum allowed value. The problem can be avoided by using an image type that supports pixel values
with a larger range.
Natural edges in images often lead to lines in the output image that are several pixels wide due to the
smoothing effect of the Sobel operator. Some thinning may be desirable to counter this. Failing that,
some sort of hysteresis ridge tracking could be used as in the Canny operator.
shows the results of applying the Sobel operator to . Compare this with the equivalent
Roberts cross output . Note that the spurious noise that afflicted the Roberts cross output image is
still present in this image, but its intensity relative to the genuine lines has been reduced, and so there is
a good chance that we can get rid of this entirely by thresholding. Also notice that the lines
corresponding to edges have become thicker compared with the Roberts cross output due to the
increased smoothing of the Sobel operator.
shows a simpler scene containing just a single flat dark object against a lighter background.
Applying the Sobel operator produces . Note that the lighting has been carefully set up to ensure
that the edges of the object are nice and sharp and free of shadows.
The Sobel edge detector can also be applied to range images like . The corresponding edge image is
. All edges in the image have been detected and can be nicely separated from the background by thresholding.
Although the Sobel operator is not as sensitive to noise as the Roberts Cross operator, it still amplifies
high frequencies. is the result of adding Gaussian noise with a standard deviation of 15 to the
original image. Applying the Sobel operator yields and thresholding the result at a value of 150
produces . We can see that the noise has increased during the edge detection and it is not possible
anymore to find a threshold which removes all noise pixels and at the same time retains the edges of the
objects.
The object in the previous example contains sharp edges and its surface is rather smooth. Therefore, we
could (in the noise free case) easily detect the boundary of the object without getting any erroneous
pixels. A more difficult task is to detect the boundary of , because it contains many fine depth
variations (i.e. resulting in intensity changes in the image) on its surface. Applying the Sobel operator
straightforwardly yields . We can see that the intensity of many pixels on the surface is as high as
along the actual edges. One reason is that the output of many edge pixels is greater than the maximum
pixel value and therefore they are `cut off' at 255. To avoid this overflow we scale the range image by a
factor of 0.25 prior to the edge detection and then normalize the output, as can be seen in . Although
the result improved significantly, we still cannot find a threshold so that a closed line along the boundary
remains and all the noise disappears. Compare this image with the results obtained with the Canny edge
detector.
Common Variants
A related operator is the Prewitt gradient edge detector (not to be confused with the Prewitt compass
edge detector). This works in a very similar way to the Sobel operator but uses slightly different masks,
as shown in Figure 3. These masks produce similar results to the Sobel masks, but the response is not as
isotropic.
Exercises
1. Experiment with thresholding the example images to see if noise can be eliminated while still
retaining the important edges.
2. How does the Sobel operator compare with the Roberts cross operator in terms of noise rejection,
edge detection and speed?
3. How well are the edges located using the Sobel operator? Why is this?
4. Apply the Sobel operator to and see if you can use thresholding to detect the edges of the
object without obtaining noise. Compare the results with the one obtained with the Roberts cross
operator.
5. Under what conditions would you want to use the Sobel rather than the Roberts cross operator?
And when would you not want to use it?
References
R. Gonzalez and R. Woods Digital Image Processing, Addison Wesley, 1992, pp 414 - 428.
R. Boyle and R. Thomas Computer Vision: A First Course, Blackwell Scientific Publications, 1988, pp
48 - 50.
E. Davies Machine Vision: Theory, Algorithms and Practicalities Academic Press, 1990, Chap 5.
Feature Detectors - Canny Edge Detector
Brief Description
The Canny operator was designed to be an optimal edge detector (according to particular criteria ---
there are other detectors around that also claim to be optimal with respect to slightly different criteria). It
takes as input a grey scale image, and produces as output an image showing the positions of tracked
intensity discontinuities.
How It Works
The Canny operator works in a multi-stage process. First of all the image is smoothed by Gaussian
convolution. Then a simple 2-D first derivative operator (somewhat like the Roberts Cross) is applied to
the smoothed image to highlight regions of the image with high first spatial derivatives. Edges give rise
to ridges in the gradient magnitude image. The algorithm then tracks along the top of these ridges and
sets to zero all pixels that are not actually on the ridge top so as to give a thin line in the output, a
process known as non-maximal suppression. The tracking process exhibits hysteresis controlled by two
thresholds: T1 and T2 with T1 > T2. Tracking can only begin at a point on a ridge higher than T1.
Tracking then continues in both directions out from that point until the height of the ridge falls below
T2. This hysteresis helps to ensure that noisy edges are not broken up into multiple edge fragments.
Usually, the upper tracking threshold can be set quite high, and the lower threshold quite low for good
results. Setting the lower threshold too high will cause noisy edges to break up. Setting the upper
threshold too low increases the number of spurious and undesirable edge fragments appearing in the
output.
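The hysteresis tracking stage can be sketched in isolation. This is an illustrative fragment, not the full Canny operator: it assumes the input is a gradient magnitude image that has already been non-maximum suppressed, seeds tracking at pixels above T1, and grows edges through 8-connected neighbours above T2.

```python
import numpy as np
from collections import deque

def hysteresis(mag, t1, t2):
    """Hysteresis edge tracking with upper threshold t1 > lower t2."""
    assert t1 > t2
    h, w = mag.shape
    edges = np.zeros((h, w), bool)
    # Seed tracking at every ridge pixel above the upper threshold.
    seeds = deque(zip(*np.where(mag > t1)))
    for y, x in seeds:
        edges[y, x] = True
    # Breadth-first growth through neighbours above the lower threshold.
    while seeds:
        y, x = seeds.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w
                        and not edges[ny, nx] and mag[ny, nx] > t2):
                    edges[ny, nx] = True
                    seeds.append((ny, nx))
    return edges
```

Note how a weak ridge segment connected to a strong seed is kept, while an isolated weak pixel (above T2 but never reached from a seed) is discarded; this is exactly why noisy edges are not broken into fragments.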
One problem with the basic Canny operator is to do with Y-junctions i.e. places where three ridges meet
in the gradient magnitude image. Such junctions can occur where an edge is partially occluded by
another object. The tracker will treat two of the ridges as a single line segment, and the third one as a
line that approaches, but doesn't quite connect to, that line segment.
We use the image to demonstrate the effect of the Canny operator on a natural scene.
is obtained using a Gaussian mask with standard deviation 1.0 and upper and lower thresholds of
255 and 1 respectively. Most of the major edges are detected and lots of detail has been picked out well ---
note that this may be too much detail for subsequent processing. The `Y-Junction effect' mentioned
above can be seen at the bottom left corner of the mirror.
is obtained using the same mask size and upper threshold, but with the lower threshold increased
to 220. The edges have become more broken up than in the previous image, which is likely to be bad for
subsequent processing. Also the vertical edges on the wall have not been detected along their full length.
is obtained by lowering the upper threshold to 128. The lower threshold is kept at 1 and the
Gaussian standard deviation remains at 1.0. Many more faint edges are detected along with some short
`noisy' fragments. Notice that the detail in the clown's hair is now picked out.
is obtained with the same thresholds as the previous image, but the Gaussian used has a standard
deviation of 2.0. Much of the detail on the wall is no longer detected, but most of the strong edges
remain. The edges also tend to be smoother and less noisy.
Edges in artificial scenes are often sharper and less complex than those in natural scenes, and this
generally improves the performance of any edge detector.
The Gaussian smoothing in the Canny edge detector fulfills two purposes: first, it can be used to control
the amount of detail which appears in the edge image and second, it can be used to suppress noise.
To demonstrate how the Canny operator performs on noisy images we use , which contains
Gaussian noise with a standard deviation of 15. Neither the Roberts cross nor the Sobel operator are able
to detect the edges of the object while removing all the noise in the image. Applying the Canny operator
using a standard deviation of 1.0 yields . All the edges have been detected and almost all of the
noise has been removed. For comparison, is the result of applying the Sobel operator and
thresholding the output at a value of 150.
We use to demonstrate how to control the details contained in the resulting edge image. is
the result of applying the Canny edge detector using a standard deviation of 1.0 and an upper and lower
threshold of 255 and 1, respectively. This image contains many details, however, for an automated
recognition task we might be interested to obtain only lines which correspond to the boundaries of the
objects. If we increase the standard deviation for the Gaussian smoothing to 1.8, the Canny operator
yields . Now, the edges corresponding to the unevenness of the surface have disappeared from the
image, but some edges corresponding to changes in the surface orientation remain. Although these edges
are `weaker' than the boundaries of the objects the resulting pixel values are the same, due to the
saturation of the image. Hence, if we scale down the image before the edge detection, we can use the
upper threshold of the edge tracker to remove the weaker edges. is the result of first scaling the
image by 0.25 and then applying the Canny operator using a standard deviation of 1.8 and an upper
and lower threshold of 200 and 1, respectively. The image shows the desired result that all the
boundaries of the objects have been detected whereas all other edges have been removed.
Although the Canny edge detector allows us to find the intensity discontinuities in an image, it is not
guaranteed that these discontinuities correspond to actual edges of the object. This is illustrated using
. We obtain by using a standard deviation of 1.0 and an upper and lower threshold of 255 and
1, respectively. In this case, some edges of the object do not appear in the image and many edges in the
image originate only from reflections on the object. It is a demanding task for an automated system to
interpret this image. We try to improve the edge image by decreasing the upper threshold to 150, as can
be seen in . We now obtain most of the edges of the object, but we also increase the amount of
noise. The result of further decreasing the upper threshold to 100 and increasing the standard deviation
to 2 is shown in .
Common Variants
The problem with Y-junctions mentioned above can be solved by including a model of such junctions in
the ridge tracker. This will ensure that no spurious gaps are generated at these junctions.
Exercises
1. Adjust the parameters of the Canny operator so that you can detect the edges of while
removing all of the noise.
2. What effect does increasing the Gaussian mask size have on the magnitudes of the gradient
maxima at edges? What change does this imply has to be made to the tracker thresholds when the
mask size is increased?
3. It is sometimes easier to evaluate edge detector performance after thresholding the edge detector
output at some low grey scale value (e.g. 1) so that all detected edges are marked by bright white
pixels. Try this out on the third and fourth example images of the clown mentioned above.
Comment on the differences between the two images.
4. How does the Canny operator compare with the Roberts and Sobel edge detectors in terms of
speed? What do you think is the slowest stage of the process?
5. How does the Canny operator compare in terms of noise rejection and edge detection with other
operators such as the Roberts and Sobel operators?
6. How does the Canny operator compare with other edge detectors on simple artificial 2-D scenes?
And on more complicated natural scenes?
7. Under what situations might you choose to use the Canny operator rather than the Roberts or
Sobel operators? Under what situations would you definitely not choose it?
References
R. Boyle and R. Thomas Computer Vision: A First Course, Blackwell Scientific Publications, 1988, p
52.
J. Canny A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and
Machine Intelligence, Vol 8, No. 6, Nov 1986.
E. Davies Machine Vision: Theory, Algorithms and Practicalities Academic Press, 1990, Chap 5.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992,
Chap 4.
Feature Detectors - Compass Edge Detector
Brief Description
Compass Edge Detection is an alternative approach to the differential gradient edge detection (see the
Roberts Cross and Sobel operators). The operation usually outputs two images, one estimating the local
edge gradient magnitude and one estimating the edge orientation of the input image.
How It Works
When using compass edge detection the image is convolved with a set of (in general 8) convolution
masks, each of which is sensitive to edges in a different orientation. For each pixel the local edge
gradient magnitude is estimated with the maximum response of all 8 masks at this pixel location:
|G| = max( |Gi| : i = 1 to n )
where Gi is the response of mask i at the particular pixel position and n is the number of
convolution masks. The local edge orientation is estimated with the orientation of the mask which yields
the maximum response.
Various masks can be used for this operation; for the following discussion we will use the Prewitt mask.
Two templates out of the set of 8 are shown in Figure 1:
The whole set of 8 masks is produced by taking one of the masks and rotating its coefficients circularly.
Each of the resulting masks is sensitive to a different edge orientation ranging from 0° to 315° in steps of
45°, where 0° corresponds to a vertical edge.
The maximum response |G| for each pixel gives rise to the value of the corresponding pixel in the output
magnitude image. The values for the output orientation image lie between 1 and 8, depending on which
of the 8 masks produced the maximum response.
This edge detection method is also called edge template matching, because a set of edge templates is
matched to the image, each representing an edge in a certain orientation. The edge magnitude and
orientation of a pixel is then determined by the template which matches the local area of the pixel the
best.
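The template-matching scheme can be sketched in NumPy. This is an illustrative sketch, not HIPR code: it builds all 8 Prewitt compass masks by circularly rotating the outer ring of coefficients, then takes, per interior pixel, the maximum response as the magnitude and the index of the winning mask as the orientation label (border pixels are left at zero; a real implementation would use an optimized convolution routine).

```python
import numpy as np

# One Prewitt compass template; the other seven are produced by
# circularly shifting its eight outer coefficients.
BASE = np.array([[-1,  1,  1],
                 [-1, -2,  1],
                 [-1,  1,  1]], float)

# Clockwise positions of the outer ring of a 3x3 mask.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def compass_masks():
    ring = [BASE[p] for p in RING]
    masks = []
    for k in range(8):                       # one mask per 45-degree step
        m = BASE.copy()
        for i, p in enumerate(RING):
            m[p] = ring[(i - k) % 8]
        masks.append(m)
    return masks

def compass_edges(img):
    """Return (magnitude, orientation label 1..8) per interior pixel."""
    img = img.astype(float)
    h, w = img.shape
    masks = compass_masks()
    resp = np.zeros((8, h, w))
    for i, m in enumerate(masks):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                resp[i, y, x] = np.sum(img[y - 1:y + 2, x - 1:x + 2] * m)
    mag = resp.max(axis=0)                   # maximum template response
    label = resp.argmax(axis=0) + 1          # which template matched best
    return mag, label
```

Because every mask's coefficients sum to zero, a uniform region produces zero response from all templates, which is why the orientation label there is essentially arbitrary.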
The compass edge detector is an appropriate way to estimate the magnitude and orientation of an edge.
Whereas differential gradient edge detection needs a rather time-consuming calculation to estimate the
orientation from the magnitudes in x- and y-direction, the compass edge detection obtains the orientation
directly from the mask with the maximum response. The compass operator is limited to (here) 8 possible
orientations; however, experience shows that most direct orientation estimates are not much more
accurate.
On the other hand, the compass operator needs (here) 8 convolutions for each pixel, whereas the
gradient operator needs only 2, one mask being sensitive to edges in the vertical direction and one to the
horizontal direction.
The result for the edge magnitude image is very similar with both methods, provided the same
convolving mask is used.
If we apply the Prewitt Compass Operator to we get two output images. shows the local
edge magnitude for each pixel. We can't see much in this image, because the response of the Prewitt
mask is too small. Applying histogram equalization to this image yields . The result is similar to
, which was processed with the Sobel differential gradient edge detector and histogram equalized.
The edges in the image can be rather thick, depending on the size of the convolving mask used. To
remove this unwanted effect some further processing (e.g. thinning) might be necessary.
is the greylevel orientation image which was contrast stretched for a better display. That means
the image contains 8 greylevel values between 0 and 255, each of them corresponding to an edge
orientation. shows the orientation image as a colour labeled image, containing 8 colours, each
corresponding to one edge orientation.
The orientation of strong edges is shown very clearly, as for example at the vertical stripes of the wall
paper. On a uniform background without a noticeable image gradient, on the other hand, it is ambiguous
which of the 8 masks will yield the maximum response. Therefore a uniform area results in a random
distribution of the 8 orientation values.
A simple example of the orientation image is obtained if we apply the Compass Operator to .
Each straight edge of the square yields a line of constant colour (or greylevel). The circular hole in the
middle, on the other hand, contains all 8 orientations and therefore is segmented in 8 parts, each of them
having a different colour. Again, the image is displayed as a normalized greylevel image and as a colour labeled image.
is an image containing many edges with gradually changing orientation. Applying the Prewitt
compass operator yields for the edge magnitude and for the edge orientation. Note that, due to
the distortion of the image, all posts along the railing in the lower left corner have a slightly different
orientation. However, the operator classifies them in only 3 different classes, since it assigns the same
orientation label to edges when the orientation varies within 45°.
Another image suitable for edge detection is . The corresponding output of the compass edge
detector is and for the magnitude and orientation, respectively. Like the previous image, this
image contains little noise and most of the resulting edges correspond to boundaries of objects. Again,
we can see that most of the roughly vertical books were assigned the same orientation label, although the
orientation varies by some amount.
We demonstrate the influence of noise on the compass operator by adding Gaussian noise with a
standard deviation of 15 to the above image. shows the noisy image. The Prewitt compass edge
detector yields for the edge magnitude and for the edge orientation. Both images contain a
large amount of noise and most areas in the orientation image consist of a random distribution of the 8
possible values.
Common Variants
As already mentioned earlier, there are various masks which can be used for Compass Edge Detection.
The most common ones are shown in Figure 2:
Figure 2 Some examples for the most common compass edge detecting masks, each
example showing two masks out of the set of eight.
For every template, the set of all eight masks is obtained by shifting the coefficients of the mask
circularly.
The results of using different templates are similar; the main difference is the scale of the
magnitude image. The advantage of the Sobel and Robinson masks is that only 4 out of the 8 magnitude
values need to be calculated. Since each mask is the negative of the mask rotated by 180°, each of
the remaining four values can be generated by inverting the result of the opposite mask.
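This saving can be checked directly. In the sketch below (helper names are ours, not HIPR's), rotating the outer ring of the Sobel compass mask by four 45° steps (i.e. 180°) yields the negated mask, so its response to any window is just the negation of the opposite mask's response. Note that this holds for the Sobel and Robinson templates, whose centre coefficient is zero or whose ring is antisymmetric, but not for the Prewitt or Kirsch templates.

```python
import numpy as np

# Clockwise positions of the outer ring of a 3x3 mask.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def rotate(mask, k):
    """Rotate the eight outer coefficients circularly by k 45-degree steps."""
    out = mask.copy()
    ring = [mask[p] for p in RING]
    for i, p in enumerate(RING):
        out[p] = ring[(i - k) % 8]
    return out

sobel0 = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], float)
sobel4 = rotate(sobel0, 4)   # the mask rotated by 180 degrees
```

Here `np.array_equal(sobel4, -sobel0)` holds, so computing four responses and negating them gives the other four.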
Exercises
1. Compare the performance of the different masks by applying them to the above image of the staircase.
2. Compare the magnitude edge image of the book shelf with and without noise. Investigate if you
can find a threshold which retains all important edges but removes the noise.
3. Produce an image containing 8 edge orientations from (e.g. by rotating the image about
45° and blending it with the original). Then apply the compass edge operator to the resulting
image and examine the edge orientation image. Do the same with 12 different edge orientations.
4. Take the orientation image obtained in exercise 2 and mask out the pixels not corresponding to a
strong edge using the thresholded edge magnitude image as a mask.
References
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp 101 - 110.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, p
199.
Feature Detectors - Zero Crossing Detector
Brief Description
The zero crossing detector looks for places in the Laplacian of an image where the value of the
Laplacian passes through zero --- i.e. points where the Laplacian changes sign. Such points often occur
at `edges' in images --- i.e. points where the intensity of the image changes rapidly, but they also occur at
places that are not as easy to associate with edges. It is best to think of the zero crossing detector as
some sort of feature detector rather than as a specific edge detector. Zero crossings always lie on closed
contours and so the output from the zero crossing detector is usually a binary image with single pixel
thickness lines showing the positions of the zero crossing points.
The starting point for the zero crossing detector is an image which has been filtered using the Laplacian
of Gaussian filter. The zero crossings that result are strongly influenced by the size of the Gaussian used
for the smoothing stage of this operator. As the smoothing is increased then fewer and fewer zero
crossing contours will be found, and those that do remain will correspond to features of larger and larger
scale in the image.
How It Works
The core of the zero crossing detector is the Laplacian of Gaussian filter and so a knowledge of that
operator is assumed here. As described there, `edges' in images give rise to zero crossings in the LoG
output. For instance, Figure 1 shows the response of a 1-D LoG filter to a step edge in the image.
Figure 1 Response of 1-D LoG filter to a step edge. The left hand graph shows a 1-D
image, 200 pixels long, containing a step edge. The right hand graph shows the response
of a 1-D LoG filter with Gaussian standard deviation 3 pixels.
However, zero crossings also occur at any place where the image intensity gradient starts increasing or
starts decreasing, and this may happen at places that are not obviously edges. Often zero crossings are
found in regions of very low gradient where the intensity gradient wobbles up and down around zero.
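The 1-D behaviour shown in Figure 1 can be reproduced with a short sketch. This is a minimal illustration, assuming NumPy and SciPy are available; `gaussian_laplace` computes the LoG response, and the crossing is located by looking for sign changes between neighbouring samples:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# 1-D image, 200 pixels long, with a step edge at pixel 100,
# mirroring Figure 1; sigma = 3 follows the caption.
signal = np.zeros(200)
signal[100:] = 100.0

# LoG response: second derivative of the Gaussian-smoothed signal.
log_response = gaussian_laplace(signal, sigma=3.0)

# Zero crossings: indices where consecutive samples change sign.
signs = np.sign(log_response)
crossings = np.where(signs[:-1] * signs[1:] < 0)[0]
```

The response is positive on the low side of the step and negative on the high side, so a sign change (the zero crossing) appears next to pixel 100.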
Once the image has been LoG filtered, it only remains to detect the zero crossings. This can be done in
several ways.
The simplest is to threshold the LoG output at zero, to produce a binary image where the
boundaries between foreground and background regions represent the locations of zero crossing points.
These boundaries can then be easily detected and marked in a single pass, e.g. using some morphological
operator. For instance, to locate all boundary points, we simply have to mark each foreground point that
has at least one background neighbour.
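The procedure just described can be sketched as follows. This is an illustration only, assuming NumPy and SciPy; the function name `zero_crossings` and the use of binary erosion to mark foreground points with a background neighbour are our own choices, not part of any standard package:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace, binary_erosion

def zero_crossings(image, sigma=1.0):
    """Mark zero crossings of the LoG of `image` as a binary image.

    Thresholds the LoG output at zero, then marks every foreground
    pixel that has at least one background neighbour -- a foreground
    pixel that disappears under erosion with a 3x3 structuring element.
    """
    log = gaussian_laplace(image.astype(float), sigma=sigma)
    foreground = log > 0
    # border_value=1 stops the image border itself being marked.
    interior = binary_erosion(foreground, structure=np.ones((3, 3)),
                              border_value=1)
    return foreground & ~interior

# Example on a synthetic image with a vertical step edge.
img = np.zeros((50, 50))
img[:, 25:] = 200.0
edges = zero_crossings(img, sigma=2.0)
```

Running this on the synthetic step produces a thin vertical contour next to the edge and nothing in the flat regions.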
The problem with this technique is that it will tend to bias the location of the zero crossing edge to either
the light side of the edge or the dark side, depending upon whether we choose to look for
the edges of foreground regions or for the edges of background regions.
A better technique is to consider points on both sides of the threshold boundary, and choose the one with
the lowest absolute magnitude of the Laplacian, which will hopefully be closest to the zero crossing.
Since the zero crossings generally fall in between two pixels in the LoG filtered image, an alternative
output representation is an image grid which is spatially shifted half a pixel across and half a pixel down
relative to the original image. Such a representation is known as a dual lattice. This does not actually
localize the zero crossing any more accurately of course.
A more accurate approach is to perform some kind of interpolation to estimate the position of the zero
crossing to sub-pixel precision.
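In 1-D, the simplest such interpolation is linear: given two neighbouring LoG samples of opposite sign, the crossing lies where the straight line between them passes through zero. A minimal sketch (the helper name is our own):

```python
import numpy as np

def subpixel_zero_crossing(log_values, i):
    """Estimate the zero-crossing position between samples i and i+1
    of a LoG-filtered signal by linear interpolation.

    Assumes log_values[i] and log_values[i+1] have opposite signs.
    """
    a, b = float(log_values[i]), float(log_values[i + 1])
    return i + a / (a - b)

# Example: the response crosses zero 25% of the way from sample 10 to 11.
log_values = np.zeros(20)
log_values[10], log_values[11] = 1.0, -3.0
x0 = subpixel_zero_crossing(log_values, 10)
```

The same idea extends to 2-D by interpolating along the horizontal or vertical direction of each sign change.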
We illustrate this effect using which contains detail at a number of different scales.
is the result of applying a LoG filter with Gaussian standard deviation 1.0. Note that in this and in
the following LoG output images, the true output contains negative pixel values. For display purposes
the greylevels have been offset so that displayed greylevel 128 corresponds to an actual value of zero,
and rescaled to make the image variation clearer. Since we are only interested in zero crossings this
rescaling is unimportant.
shows the zero crossings from this image. Note the large number of minor features detected,
which are mostly due to noise or very faint detail. This smoothing corresponds to a fine `scale'.
is the result of applying a LoG filter with Gaussian standard deviation 2.0.
And shows the zero crossings. Note that there are far fewer detected crossings, and that those that
remain are largely due to recognisable edges in the image. The thin vertical stripes on the wall for
example are clearly visible.
Finally, is the output from a LoG filter with Gaussian standard deviation 3.0. This corresponds to
quite a coarse `scale'.
is the zero crossings in this image. Note how only the strongest contours remain due to the heavy
smoothing. In particular, note how the thin vertical stripes on the wall no longer give rise to many zero
crossings.
All edges detected by the zero crossing detector are in the form of closed curves in the same way that
contour lines on a map are always closed. The only exception to this is where the curve goes off the edge
of the image.
Since the LoG filter is calculating a second derivative of the image it is quite susceptible to noise,
particularly if the standard deviation of the smoothing Gaussian is small. Thus it is common to see lots
of spurious edges detected away from any obvious edges. One solution to this is to increase the
smoothing of the Gaussian to preserve only strong edges. Another is to look at the gradient of the LoG at
the zero crossing (i.e. the third derivative of the original image) and only keep zero crossings where this
is above a certain threshold. This will tend to retain only the stronger edges, but it is sensitive to noise
since the third derivative will greatly amplify any high frequency noise in the image.
is similar to the image obtained with a standard deviation of 1.0, except that the zero crossing
detector has been told to ignore zero crossings of shallow slope (in fact it ignores zero crossings where
the pixel value difference across the crossing in the LoG output is less than 40). As a result, fewer
spurious zero crossings have been detected. Note that in this case, the zero crossings do not necessarily
form closed contours.
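The slope-thresholded variant just described can be sketched as follows. This is an illustrative implementation assuming NumPy and SciPy; the function name is our own, only horizontal and vertical neighbour pairs are checked, and the default of 40 follows the example in the text:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def strong_zero_crossings(image, sigma=1.0, min_slope=40.0):
    """Zero crossings of the LoG, keeping only those where the pixel
    value difference across the crossing is at least `min_slope`.
    The result need not form closed contours."""
    log = gaussian_laplace(image.astype(float), sigma=sigma)
    out = np.zeros(log.shape, dtype=bool)
    # Horizontal sign changes with a sufficiently steep slope.
    h = (log[:, :-1] * log[:, 1:] < 0) & \
        (np.abs(log[:, :-1] - log[:, 1:]) >= min_slope)
    out[:, :-1] |= h
    # Vertical sign changes.
    v = (log[:-1, :] * log[1:, :] < 0) & \
        (np.abs(log[:-1, :] - log[1:, :]) >= min_slope)
    out[:-1, :] |= v
    return out

# A strong step edge survives the slope threshold...
img = np.zeros((40, 40))
img[:, 20:] = 255.0
strong = strong_zero_crossings(img, sigma=1.0, min_slope=40.0)
# ...but an absurdly high threshold rejects everything.
weak = strong_zero_crossings(img, sigma=1.0, min_slope=1e9)
```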
Marr (Marr 1982) has suggested that human visual systems use zero crossing detectors based on LoG
filters at several different scales (Gaussian widths).
Exercises
1. Compare the output from the zero crossing edge detector with that from the Roberts cross, Sobel
and Canny edge detectors, for edge detection, noise rejection and edge localization.
2. Take a simple image containing step edges such as , and see what happens to the
locations of zero crossings as the level of smoothing is increased. Do they keep the same
position?
3. Try to develop an algorithm which can work out which side (positive or negative) of a
particular discrete zero crossing is closer to the genuine zero crossing, and hence which should be
marked as part of the zero crossing contour. Think about various possible 3×3 neighbourhoods.
4. Think of an interpolation method which would allow you to estimate the zero crossing location
between two pixels to sub-pixel precision.
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, p
442.
Line detection
Common Names: Line detection
Brief Description
While edges (i.e. boundaries between regions with relatively distinct greylevels) are by far the most
common type of discontinuity in an image, instances of thin lines in an image occur frequently enough
that it is useful to have a separate mechanism for detecting them. Here we present a convolution based
technique which produces a gradient image description of the thin lines in an input image. Note that the
Hough transform can also be used to detect lines; however, in that case, the output is a parametric
description of the lines in an image.
How It Works
The line detection operator consists of a convolution mask tuned to detect the presence of lines of a
particular width n, at a particular orientation θ. Figure 1 shows a collection of four such masks, which
each respond to lines of single pixel width at the particular orientation shown.
Figure 1 Four line detection masks which respond maximally to horizontal, vertical, and
oblique (+45 and -45 degree) single pixel wide lines.
If R_i denotes the response of mask i, we can apply each of these masks across an image and, for any
particular point, if |R_i| > |R_j| for all j ≠ i, that point is more likely to contain a line whose
orientation (and width) corresponds to that of mask i. One usually thresholds R_i to eliminate weak lines
corresponding to edges and other features with intensity gradients which have a different scale than the
desired line width. In order to find complete lines, one must join together line fragments, e.g., with an
edge tracking operator.
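The masks of Figure 1 and the strongest-response-then-threshold rule can be sketched as follows. This is an illustration assuming NumPy and SciPy; the mask values are the standard single-pixel-width line masks, and the function name `detect_lines` is our own:

```python
import numpy as np
from scipy.ndimage import convolve

# The four single-pixel-wide line detection masks of Figure 1.
MASKS = {
    'horizontal': np.array([[-1, -1, -1],
                            [ 2,  2,  2],
                            [-1, -1, -1]]),
    'vertical':   np.array([[-1,  2, -1],
                            [-1,  2, -1],
                            [-1,  2, -1]]),
    '+45':        np.array([[-1, -1,  2],
                            [-1,  2, -1],
                            [ 2, -1, -1]]),
    '-45':        np.array([[ 2, -1, -1],
                            [-1,  2, -1],
                            [-1, -1,  2]]),
}

def detect_lines(image, threshold):
    """Convolve with each mask, keep the strongest response at each
    pixel, then threshold away the weak responses."""
    responses = np.stack([convolve(image.astype(float), m)
                          for m in MASKS.values()])
    strongest = responses.max(axis=0)
    return strongest * (strongest >= threshold)

# A single-pixel-wide horizontal line on a dark background.
img = np.zeros((9, 9))
img[4, :] = 100.0
result = detect_lines(img, threshold=300.0)
```

On this input the horizontal mask responds with 2 × 100 × 3 = 600 along the line, while the other masks respond with zero there, so only the line row survives the threshold.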
To illustrate line detection, we start with the artificial image , which contains thick line segments
running horizontally, vertically and obliquely across the image. The result of applying the line detection
operator, using the horizontal convolution mask shown in Figure 1.a, is . (Note this gradient image
has been normalized for display.) There are two points of interest to note here.
1. Notice that, because of the way that the oblique lines (and some `vertical' ends of the horizontal bars)
are represented on a square pixel grid, e.g. shows a zoomed region containing both
features, the horizontal line detector responds to more than just high spatial intensity horizontal lines.
2. On an image such as this one - where the lines to be detected are wider than the mask (i.e. the
image lines are five pixels wide, while the mask is tuned for a single pixel width) - the line
detector acts like an edge detector: the edges of the lines are found, rather than the lines
themselves.
This latter fact might cause us to naively think that the image which gave rise to contained a series
of parallel lines rather than single thick ones. However, if we compare this result to that obtained by
applying the line detection mask to an image containing lines of a single pixel width, we find some
consistent differences. For example, we can skeletonize the original (so as to obtain a
representation of the original wherein most lines are a single pixel width), apply the horizontal line
detector , and then threshold the result . If we then threshold the original line detected image
at the same pixel value, we obtain the null image . Thus, the values corresponding to the true,
single pixel lines found in the skeletonized version are stronger than those values corresponding to
edges. Also, if we examine a cropped and zoomed version of the line detected raw image and the
skeletonized line detected image we see that the single pixel width lines are distinguished by a
region of minimal response on either side of the maximal response values coincident with the pixel
location of a line. One can use this signature to distinguish lines from edges.
The result of line detecting (and then normalizing) the skeletonized version of this image with single
pixel width convolution masks of different orientations are for a vertical mask, for the oblique 45
degree line and for the oblique 135 degree line. The thresholded versions are , , and
, respectively. We can add these together to produce a reasonably faithful binary representation of the lines in the original image.
It is instructive to compare the two operators under more realistic circumstances, e.g. with the natural
image . After converting this to a greyscale image and applying the Canny
operator, we obtain . Applying the line detector yields . We can improve this result
by using a trick employed by the Canny operator. By smoothing the image before line detecting, we
obtain the cleaner result . However, even with this preprocessing, the line detector still gives a
poor result compared to the edge detector. This is true because there are few single pixel width lines in
this image and therefore the detector is responding to the other high spatial frequency image features
(i.e. edges, thick lines and noise). (Note that in the previous example, the image contained the feature that
the mask was tuned for and therefore we were able to threshold away the weaker mask response to
edges.) We could improve this result by increasing the width of the mask or geometrically scaling the
image.
Exercises
1. Consider the basic image . We can investigate the scale of features in the image by
applying line detection masks of different widths. For example, after convolving with a single
pixel horizontal line detecting mask we discover that only the striped shirt of the bank robber
contains single pixel width lines. The normalized result is shown in and after
thresholding (at a value of 254), we obtain . a) Perform the same analysis on the image
using different width masks to extract the different features (e.g. roof, windows, doors,
etc.). Threshold your result so that the final images contain a binary description of just the feature of interest.
2. Investigate a line detection algorithm which might extract the tail feathers of the peacock in
. You will most likely need to apply some smoothing as a first step and may then want to
apply several different masks and add the results together. Compare your final result with an edge
detection algorithm, e.g. Roberts cross, Sobel, Compass and/or Canny edge detector.
References
D. Vernon Machine Vision, Prentice-Hall, 1991, Chap 5.
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp
415 - 416.
Convolution
Convolution is a simple mathematical operation which is fundamental to many common image
processing operators. Convolution provides a way of `multiplying together' two arrays of numbers,
generally of different sizes, but of the same dimensionality, to produce a third array of numbers of the
same dimensionality. This can be used in image processing to implement operators whose output pixel
values are simple linear combinations of certain input pixel values.
In an image processing context, one of the input arrays is normally just a greylevel image. The second
array is usually much smaller, and is also two dimensional (although it may be just a single pixel thick),
and is known as the kernel. Figure 1 shows an example image and kernel that we will use to illustrate
convolution.
Figure 1 An example small image (left) and kernel (right) for illustrating convolution.
The labels within each grid square are used to identify each square.
The convolution is performed by sliding the kernel over the image, generally starting at the top left
corner, so as to move the kernel through all the positions where the kernel fits entirely within the
boundaries of the image. (Note that implementations differ in what they do at the edges of images as
explained below.) Each kernel position corresponds to a single output pixel, the value of which is
calculated by multiplying together the kernel value and the underlying image pixel value for each of the
cells in the kernel, and then adding all these numbers together.
So in our example, the value of the bottom right pixel in the output image is found by placing the kernel
over the bottom right kernel-sized region of the image, multiplying each kernel value by the underlying
pixel value and summing. In general, if the input image is I and the kernel is K, the output pixel O(i,j)
is given by:

O(i,j) = Σ_{k=1..m} Σ_{l=1..n} I(i+k-1, j+l-1) K(k,l)

If the image has M rows and N columns, and the kernel has m rows and n columns, then the
output image will have M-m+1 rows and N-n+1 columns.
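The sliding-window description above can be transcribed directly into code. A minimal sketch, assuming NumPy; the function name `convolve_valid` is our own, and the kernel flip (which makes no difference for symmetric kernels) follows the mathematical definition of convolution:

```python
import numpy as np

def convolve_valid(image, kernel):
    """2-D convolution restricted to positions where the kernel fits
    entirely inside the image, so an M x N image and an m x n kernel
    give an (M - m + 1) x (N - n + 1) output."""
    M, N = image.shape
    m, n = kernel.shape
    # True convolution flips the kernel; for symmetric kernels
    # (such as the mean filter below) this makes no difference.
    flipped = kernel[::-1, ::-1]
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply kernel values by the underlying pixels and sum.
            out[i, j] = np.sum(image[i:i + m, j:j + n] * flipped)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0           # 3x3 mean filter
result = convolve_valid(image, kernel)   # 2x2 output, as M-m+1 = 2
```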
Note that many implementations of convolution produce a larger output image than this because they
relax the constraint that the kernel can only be moved to positions where it fits entirely within the image.
Instead, these implementations typically slide the kernel to all positions where just the top left corner of
the kernel is within the image. Therefore the kernel `overlaps' the image on the bottom and right edges.
One advantage of this approach is that the output image is the same size as the input image.
Unfortunately, in order to calculate the output pixel values for the bottom and right edges of the image,
it is necessary to invent input pixel values for places where the kernel extends off the end of the image.
Typically pixel values of zero are chosen for regions outside the true image, but this can often distort the
output image at these places. Therefore in general if you are using a convolution implementation that
does this, it is better to clip the image to remove these spurious regions. Removing n-1 pixels from the
right hand side and m-1 pixels from the bottom will fix things.
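The same-size variant and the clipping fix can be sketched as follows (an illustration assuming NumPy; the function name `convolve_same_topleft` is our own). The kernel's top-left corner visits every image position, zeros are invented where the kernel overlaps the bottom and right edges, and the distorted border is then clipped away:

```python
import numpy as np

def convolve_same_topleft(image, kernel):
    """Slide the kernel to every position where its top-left corner is
    inside the image, padding with zeros where the kernel overlaps the
    bottom and right edges; the output is the same size as the input."""
    M, N = image.shape
    m, n = kernel.shape
    padded = np.zeros((M + m - 1, N + n - 1))
    padded[:M, :N] = image          # zeros invented beyond the image
    flipped = kernel[::-1, ::-1]
    out = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            out[i, j] = np.sum(padded[i:i + m, j:j + n] * flipped)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
full = convolve_same_topleft(image, kernel)
# Removing m-1 rows from the bottom and n-1 columns from the right
# discards the region distorted by the invented zero values.
clipped = full[:-(kernel.shape[0] - 1), :-(kernel.shape[1] - 1)]
```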
Convolution can be used to implement many different operators, particularly spatial filters and feature
detectors. Examples include Gaussian smoothing and the Sobel edge detector.
Kernel
A kernel is a (usually) smallish matrix of numbers that is used in image convolutions. Different sized
kernels containing different patterns of numbers give rise to different results under convolution. For
instance, Figure 1 shows a 3×3 kernel that implements a mean filter.
The word `kernel' is also commonly used as a synonym for `structuring element', which is a similar
object used in mathematical morphology. A structuring element differs from a kernel in that it also has a
specified origin. This sense of the word `kernel' is not used in HIPR.
Image Synthesis - Noise Generation
Noise Generation
Common Names: Noise Generation
Brief Description
Noise is any random fluctuation which has to be dealt with in every system processing real
signals. It is not part of the ideal signal and may be caused by a wide range of sources, e.g.
variations in the detector sensitivity, environmental variations, the discrete nature of radiation,
transmission or quantization errors, etc. It is also possible to treat irrelevant scene details as if they were
image noise (e.g. surface reflectance textures). The characteristics of noise depend on its source, as
does the operator which best reduces its effects.
Many image processing packages contain operators to artificially add noise to an image. Deliberately
corrupting an image with noise allows us to test the resistance of an image processing operator to noise
and assess the performance of various noise filters.
How It Works
Noise can generally be grouped into two classes: image independent noise and data dependent noise.
Image independent noise can often be described by an additive noise model, where the recorded image
f(i,j) is the sum of the true image s(i,j) and the noise n(i,j):

f(i,j) = s(i,j) + n(i,j)

The noise n(i,j) is often zero-mean and described by its variance σ_n^2. The impact of the noise on the
image is often described by the signal to noise ratio (SNR), which is given by

SNR = σ_s / σ_n = sqrt( σ_f^2 / σ_n^2 - 1 )

where σ_s^2 and σ_f^2 are the variances of the true image and the recorded image, respectively.
In many cases, additive noise is evenly distributed over the frequency domain (i.e. white noise), whereas
an image contains mostly low frequency information. Hence, the noise is dominant for high frequencies
and its effects can be reduced using some kind of lowpass filter. This can be done either with a
frequency filter or with a spatial filter. (Often a spatial filter is preferable, as it is computationally less
expensive than a frequency filter.)
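The additive model and the SNR definition above can be illustrated numerically. A sketch assuming NumPy, the additive model f = s + n, a known noise standard deviation, and the relation σ_f^2 = σ_s^2 + σ_n^2 for independent noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# A synthetic "true" image s(i,j) and zero-mean Gaussian noise n(i,j).
s = rng.uniform(0.0, 255.0, size=(128, 128))
sigma_n = 20.0
n = rng.normal(0.0, sigma_n, size=s.shape)
f = s + n                      # recorded image: f(i,j) = s(i,j) + n(i,j)

snr = s.std() / sigma_n        # SNR = sigma_s / sigma_n

# The same quantity estimated from the recorded image alone, using
# sigma_f^2 = sigma_s^2 + sigma_n^2 (independent, additive noise).
snr_est = np.sqrt(f.var() / sigma_n**2 - 1.0)
```

The two SNR values agree closely, confirming that the signal variance can be recovered from the recorded image when the noise variance is known.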
In the second case of data dependent noise, (e.g. arising when monochromatic radiation is scattered
from a surface whose roughness is of the order of a wavelength, causing wave interference which results
in image speckle), it can be possible to model noise with a multiplicative, or non-linear, model. These
models are mathematically more complicated, hence, if possible, the noise is assumed to be data
independent.
Detector Noise
One kind of noise which occurs in all recorded images to a certain extent is detector noise. This kind of
noise is due to the discrete nature of radiation, i.e. the fact that each imaging system is recording an
image by counting photons. Allowing some assumptions (which are valid for many applications) this
noise can be modeled with an independent, additive model - where the noise n(i,j) has a zero-mean
Gaussian distribution described by its standard deviation σ, or variance σ^2. (The 1-D Gaussian
distribution has the form shown in Figure 1.) This means that each pixel in the noisy image is the sum of
the true pixel value and a random, Gaussian distributed noise value.
Another common form of noise is data drop-out noise (commonly referred to as intensity spikes, speckle
or salt and pepper noise). Here, the noise is caused by errors in the data transmission. The corrupted
pixels are either set to the maximum value (which looks like snow in the image) or have single bits
flipped over. In some cases, single pixels are set alternatively to zero or to the maximum value, giving
the image a `salt and pepper' like appearance. Unaffected pixels always remain unchanged. The noise is
usually quantified by the percentage of pixels which are corrupted.
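A simple salt-and-pepper generator of the kind just described can be sketched as follows (an illustration assuming NumPy; the function name and the 50/50 split between salt and pepper are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def salt_and_pepper(image, p):
    """Corrupt a fraction p of the pixels, setting each corrupted pixel
    to 0 or to the maximum value (255) with equal probability; the
    remaining pixels are left unchanged."""
    out = image.copy()
    corrupted = rng.random(image.shape) < p
    salt = rng.random(image.shape) < 0.5
    out[corrupted & salt] = 255
    out[corrupted & ~salt] = 0
    return out

img = np.full((100, 100), 128, dtype=np.uint8)
noisy = salt_and_pepper(img, p=0.02)
frac = np.mean(noisy != img)   # fraction of corrupted pixels, about 2%
```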
Gaussian Noise
We will begin by considering additive noise with a Gaussian distribution. If we add Gaussian noise with
a standard deviation of σ = 8, we obtain the image . Increasing σ yields and for σ = 13 and 20.
Gaussian noise can be reduced using a spatial filter. However, it must be kept in mind that when
smoothing an image, we not only reduce the noise but also the fine-scaled image detail, because it
also corresponds to the high frequencies that the filter blocks. The most effective basic spatial filtering
techniques for noise removal include mean filtering, median filtering and Gaussian smoothing. The
Crimmins Speckle Removal filter can also produce good noise removal. More sophisticated algorithms which utilize
statistical properties of the image and/or noise fields exist for noise removal. For example, adaptive
smoothing algorithms may be defined which adjust the filter response according to local variations in the
statistical properties of the data.
In the following examples, images have been corrupted with various kinds and amounts of drop-out
noise. In , pixels have been set to 0 or 255 with probability p=1%. In pixel bits were flipped
with p=3%, and in 5% of the pixels (whose locations are chosen at random) are set to the
maximum value, producing the snowy appearance.
For this kind of noise, conventional lowpass filtering, e.g. mean filtering or Gaussian smoothing, is
relatively unsuccessful because the corrupted pixel value can vary significantly from the original, and
therefore the mean can be significantly different from the true value. A median filter removes drop-
out noise more efficiently and at the same time preserves the edges and small details in the image better.
Conservative smoothing can be used to obtain a result which preserves a great deal of high frequency
detail, but is only effective at reducing low levels of noise.
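The difference between mean and median filtering on drop-out noise can be demonstrated numerically. A sketch assuming NumPy and SciPy, using a constant image so that any deviation from the true value is filter error:

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

rng = np.random.default_rng(2)

# A constant image with 5% of pixels set to the maximum value,
# i.e. drop-out noise with a snowy appearance.
true = np.full((64, 64), 100.0)
noisy = true.copy()
hits = rng.random(true.shape) < 0.05
noisy[hits] = 255.0

mean_filtered = uniform_filter(noisy, size=3)    # 3x3 mean filter
median_filtered = median_filter(noisy, size=3)   # 3x3 median filter

mean_err = np.abs(mean_filtered - true).mean()
median_err = np.abs(median_filtered - true).mean()
```

The mean filter smears each corrupted value over its 3×3 neighbourhood, while the median rejects isolated outliers almost completely, so the median error is far smaller.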
Exercises
1. is a binary chess board image with 2% of drop-out noise. Which operator yields the best
results in removing the noise?
2. is the same image corrupted with Gaussian noise with a variance of 180. Is the operator
used in Exercise 1 still the most appropriate? Compare the best results obtained from both noisy
images.
3. Compare the images achieved by median and mean filtering with the result you
obtain by applying a frequency lowpass filter to the image. How does the mean filter relate to the
frequency filter? Compare the computational costs of mean, median and frequency filtering.
References
R. Gonzalez and R. Woods Digital Image Processing, Addison-Wesley Publishing Company, 1992, pp 187 - 213.
A. Jain Fundamentals of Digital Image Processing, Prentice Hall, 1989, pp 244 - 253, 273 - 275.
E. Davies Machine Vision: Theory, Algorithms and Practicalities, Academic Press, 1990, pp 29 - 30, 40 - 47, 493.