Exploring Image Segmentation Techniques: Watershed Algorithm using OpenCV

Jaskaran Bhatia

Jul 15, 2023

Image segmentation, a fundamental aspect of computer vision, has changed dramatically over the years. This is the first of a series of three blog posts covering three distinct approaches to image segmentation: the classical Watershed Algorithm with OpenCV, the deep learning-based UNet model implemented with PyTorch, and state-of-the-art (SOTA) image segmentation models. This part focuses on the Watershed Algorithm and its implementation using OpenCV. In the next part, we will train a UNet model on a Human Segmentation Dataset, demonstrating the power and applicability of deep learning-based techniques.

Image segmentation involves partitioning an image into various segments or regions, each containing a set of pixels. The ultimate aim is to simplify or modify the representation of an image into something more meaningful, consequently making it easier to analyze. These techniques have been widely adopted in a multitude of applications ranging from object identification within images to medical imaging diagnostics.

Among traditional image segmentation methods, the Watershed Algorithm holds a significant place. It treats an image as a topographic landscape and separates objects by finding ‘catchment basins’ and the ‘watershed ridge lines’ between them. Put simply, any grayscale image can be viewed as a topographic surface where high intensity denotes peaks and hills and low intensity denotes valleys.

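Despite being conceptually simple and effective, the Watershed Algorithm can lead to over-segmentation, where a single object is split into many segments. This typically happens because noise and small intensity fluctuations create many spurious local minima, and each minimum can grow into its own region. Tuning the algorithm's parameters and adding pre-processing steps, such as the noise removal and marker selection used in the pipeline below, helps reduce this.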
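To see where over-segmentation comes from, here is a minimal NumPy sketch (an illustration, not part of the original pipeline) that counts local minima in a synthetic landscape with two valleys. In a watershed without markers, each local minimum would seed its own basin, so the count is a rough proxy for the number of segments. `count_local_minima` and `box_blur` are hypothetical helpers written for this example, and the 7x7 box blur simply stands in for whatever smoothing you might choose.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_local_minima(surface: np.ndarray) -> int:
    """Count interior pixels strictly lower than all 8 neighbours (a proxy for basins)."""
    padded = np.pad(surface, 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3)).reshape(*surface.shape, 9)
    neighbours = np.delete(windows, 4, axis=-1)  # drop the centre pixel itself
    return int((surface[..., None] < neighbours).all(axis=-1).sum())

def box_blur(img: np.ndarray, k: int = 7) -> np.ndarray:
    """Mean filter over a k x k window (edges padded by replication)."""
    padded = np.pad(img, k // 2, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))

# Synthetic "topographic" image: a plateau with two smooth valleys (think: two dark coins)
yy, xx = np.mgrid[0:64, 0:64]
landscape = (1.0
             - np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / 100.0)
             - np.exp(-((yy - 44) ** 2 + (xx - 44) ** 2) / 100.0))

rng = np.random.default_rng(0)
noisy = landscape + rng.normal(0.0, 0.05, landscape.shape)

clean_minima = count_local_minima(landscape)           # the two valley floors
noisy_minima = count_local_minima(noisy)               # many spurious minima from noise
smoothed_minima = count_local_minima(box_blur(noisy))  # smoothing removes many of them
print(clean_minima, noisy_minima, smoothed_minima)
```

Smoothing lowers the count, but the pipeline in this post goes further: it hands cv2.watershed explicit markers, so only the chosen seeds can grow into regions.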

1. Thresholding: Thresholding is the first step in locating the objects in the image. After converting the image to grayscale, we apply Otsu's thresholding, which picks the threshold value automatically, to obtain a binary image that separates the foreground (the objects to be segmented) from the background.

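import cv2
import numpy as np
import matplotlib.pyplot as plt
def imshow(title, image):  # assumed display helper; the snippets call imshow() without defining it
    plt.imshow(image if image.ndim == 2 else cv2.cvtColor(image, cv2.COLOR_BGR2RGB), cmap="gray")
    plt.title(title); plt.axis("off"); plt.show()
# Load image
img = cv2.imread('water_coins.jpg')
imshow("Original image", img)
# Grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Otsu picks the threshold (the 0 is ignored); _INV makes pixels <= threshold white (255)
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
imshow("Thresholded", thresh)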
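With cv2.THRESH_OTSU, cv2.threshold ignores the threshold we pass (0), chooses one itself, and returns it as ret. As a rough illustration of what that choice means, here is a minimal NumPy sketch of Otsu's criterion: try every threshold and keep the one that maximizes the between-class variance of the two resulting pixel groups. `otsu_threshold` is a hypothetical helper written for this example, not an OpenCV function, and OpenCV's internal implementation may differ in details such as tie-breaking.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return t in [0, 255] maximizing between-class variance of 'pixels <= t' vs '> t'."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)                 # fraction of pixels <= t
    w1 = 1.0 - w0                        # fraction of pixels > t
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # sigma_b^2 = w0 * w1 * (mu0 - mu1)^2, rewritten as (total_mean*w0 - cum_mean)^2 / (w0*w1)
    denom = w0 * w1
    sigma_b = np.zeros(256)
    valid = denom > 1e-12                # skip thresholds that leave one class empty
    sigma_b[valid] = (total_mean * w0[valid] - cum_mean[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))

# Synthetic bimodal "image": dark objects around 60, bright background around 180
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 10, 5000)])
gray = np.clip(values, 0, 255).astype(np.uint8).reshape(100, 100)

t = otsu_threshold(gray)
# Mimic THRESH_BINARY_INV: pixels <= t (the dark objects) become white (255)
binary_inv = np.where(gray > t, 0, 255).astype(np.uint8)
print(t, int((binary_inv == 255).sum()))
```

The chosen t falls in the gap between the two intensity clusters, so roughly the 5000 dark pixels end up white, which is exactly the foreground/background split the watershed pipeline needs.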

2. Opening (Erosion followed by Dilation): Next, we apply an opening, i.e., an erosion followed by a dilation, primarily to remove noise. Erosion removes small specks of white noise, but it also shrinks our objects; the dilation that follows restores the surviving objects to roughly their original size, while the removed noise does not come back.

Let’s take a closer look at erosion and dilation:

Erosion: This operation erodes away the boundaries of the foreground object. A small kernel (the structuring element) slides over the image; if any pixel in the region under the kernel is black, the pixel at the center of the kernel is set to black. This makes erosion effective at removing small white noise.
Dilation: Dilation, performed after erosion here, is essentially the opposite operation. It adds pixels to the boundaries of objects: if any pixel in the region under the kernel is white, the pixel at the center of the kernel is set to white.
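The two rules above translate almost literally into NumPy. This sketch is for intuition only (in practice you would call cv2.erode, cv2.dilate, or cv2.morphologyEx); `erode` and `dilate` are hypothetical helpers that assume a binary uint8 image with values 0 and 255 and a square k x k kernel of ones. Padding with 255 for erosion and with 0 for dilation keeps the image border from affecting the result.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def erode(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Center pixel becomes 0 if ANY pixel under the k x k kernel is 0."""
    padded = np.pad(binary, k // 2, constant_values=255)
    windows = sliding_window_view(padded, (k, k))
    return np.where(windows.min(axis=(-2, -1)) == 255, 255, 0).astype(np.uint8)

def dilate(binary: np.ndarray, k: int = 3) -> np.ndarray:
    """Center pixel becomes 255 if ANY pixel under the k x k kernel is 255."""
    padded = np.pad(binary, k // 2, constant_values=0)
    windows = sliding_window_view(padded, (k, k))
    return np.where(windows.max(axis=(-2, -1)) == 255, 255, 0).astype(np.uint8)

# A 5x5 "object" plus a single-pixel speck of white noise
img = np.zeros((12, 12), np.uint8)
img[2:7, 2:7] = 255
img[9, 9] = 255

eroded = erode(img)      # object shrinks to 3x3, speck disappears
opened = dilate(eroded)  # object grows back to 5x5, speck stays gone

expected = np.zeros_like(img)
expected[2:7, 2:7] = 255
print(np.array_equal(opened, expected))  # True
```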

# noise removal
kernel = np.ones((3, 3), np.uint8)  # 3x3 square structuring element
# MORPH_OPEN with iterations=2: erode twice, then dilate twice
opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)

Let’s break it down:

Creating the kernel: np.ones((3,3), np.uint8) creates a 3x3 matrix whose elements are all 1. It serves as the 'structuring element' for the morphological operation. Structuring elements can have other shapes (cv2.getStructuringElement can build rectangular, elliptical, or cross-shaped ones), but here we use a square.
Applying the opening operation: cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2) applies the opening. 'thresh' is the binary image obtained from thresholding, cv2.MORPH_OPEN selects the opening operation, and 'kernel' is our structuring element. Note that 'iterations=2' does not repeat the whole opening twice: OpenCV applies the erosion twice and then the dilation twice.

3. Dilation for Background Identification: In this step, we dilate the noise-free result of the previous step. Dilation grows the objects beyond their true boundaries, so any pixel that is still black after dilation is almost certainly background. This “sure background” region supports the subsequent steps of the Watershed Algorithm, where we aim to identify distinct segments/objects.

# sure background area: pixels still black after dilation are certainly background
sure_bg = cv2.dilate(opening, kernel, iterations=3)

4. Distance Transformation: Next, we apply a distance transform to find regions that are very likely to be foreground. Here’s the code for this step:

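# Finding sure foreground area
# Euclidean (DIST_L2) distance from each white pixel to the nearest black pixel, 5x5 mask
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
# Keep pixels farther than 70% of the maximum distance (the final 0 is cv2.THRESH_BINARY)
ret, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)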

In this step, we’re doing two things:

Applying Distance Transform: cv2.distanceTransform(opening, cv2.DIST_L2, 5) computes, for every white (non-zero) pixel of the binary image, the distance to the nearest black (zero) pixel; cv2.DIST_L2 selects the Euclidean distance, and 5 is the size of the mask used to approximate it. Pixels deep inside an object get large values while pixels near its edge get small ones, which is why the transform helps us find regions that are likely to be foreground.
Thresholding the Distance Transform: We then threshold the transformed image to get the sure foreground region. In cv2.threshold(dist_transform, 0.7*dist_transform.max(), 255, 0), the second argument sets the threshold at 70% of the maximum distance, and the last argument, 0, is cv2.THRESH_BINARY. Pixels whose distance exceeds this threshold are set to 255 and form the sure foreground.
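To build intuition for why thresholding the distance transform separates touching objects, here is a brute-force NumPy sketch of the exact Euclidean distance transform applied to two overlapping disks (a synthetic stand-in for two touching coins). `distance_transform_bruteforce` is a hypothetical helper that is only practical for tiny images; cv2.distanceTransform with cv2.DIST_L2 and a 5x5 mask computes a fast approximation of the same quantity (according to the OpenCV documentation, passing cv2.DIST_MASK_PRECISE instead of 5 gives exact distances).

```python
import numpy as np

def distance_transform_bruteforce(binary: np.ndarray) -> np.ndarray:
    """Exact Euclidean distance from each non-zero pixel to the nearest zero pixel."""
    zeros = np.argwhere(binary == 0).astype(np.float64)  # (row, col) of background pixels
    dist = np.zeros(binary.shape, dtype=np.float64)
    for y, x in np.argwhere(binary != 0):
        dist[y, x] = np.sqrt(((zeros - (y, x)) ** 2).sum(axis=1)).min()
    return dist

# Two overlapping disks of radius 15 whose centres are 28 px apart
yy, xx = np.mgrid[0:40, 0:70]
mask = (((yy - 20) ** 2 + (xx - 20) ** 2 <= 225) |
        ((yy - 20) ** 2 + (xx - 48) ** 2 <= 225)).astype(np.uint8) * 255

dist = distance_transform_bruteforce(mask)
sure_fg = dist > 0.7 * dist.max()  # same 70% rule as in the post

# Both centres survive, but the narrow "neck" between the disks does not,
# so the single white blob turns into two separate seeds
print(sure_fg[20, 20], sure_fg[20, 48], sure_fg[20, 34])
```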

5. Identify unknown regions: Next, we identify the unknown region, i.e., the region that is neither sure foreground nor sure background. We first convert the sure foreground (sure_fg), which cv2.threshold returned as a floating-point image, into an unsigned 8-bit integer image so that it matches the sure background (sure_bg). We then subtract the sure foreground from the sure background to get the unknown region. The unknown region is key for the Watershed Algorithm because it contains the transition zones between distinct objects, and between objects and the background.

# Finding unknown region
sure_fg = np.uint8(sure_fg)  # threshold() returned float32; convert to match sure_bg (uint8)
unknown = cv2.subtract(sure_bg, sure_fg)
imshow("SureFG", sure_fg)
imshow("SureBG", sure_bg)
imshow("unknown", unknown)

sure_fg (Sure Foreground): The centers of the coins, rather than the whole coins (because of the distance transform and the subsequent thresholding), are identified as the sure foreground.
sure_bg (Sure Background): This image is the dilated coin mask: its white pixels cover the coins plus a margin around them, and its black pixels are the sure background, i.e., areas far enough from every coin that they cannot belong to one.
unknown (Unknown Region): These pixels belong to neither the sure foreground nor the sure background. They lie near the edges of the coins, where the pipeline cannot yet decide whether a pixel is foreground (coin) or background (area around the coins).

6. Label the sure_bg, sure_fg and unknown regions: This involves creating a marker image in which each of these regions gets an integer label. Here’s the code snippet for this step:

# Marker labelling
# cv2.connectedComponents gives each blob in sure_fg its own label (1, 2, ...); everything else is 0
ret, markers = cv2.connectedComponents(sure_fg)
# Add one to all labels so that sure background is not 0, but 1
markers = markers + 1
# Now, mark the region of unknown with zero
markers[unknown == 255] = 0

cv2.connectedComponents labels the background of sure_fg as 0 and each coin center as 1, 2, 3, and so on. Since cv2.watershed interprets 0 as “unknown”, we add 1 to all labels so that the sure background becomes 1 and the sure foreground blobs are labeled starting from 2. Finally, the unknown pixels are set to 0, leaving them for the watershed to decide.

7. Apply the Watershed Algorithm: Next, we apply cv2.watershed() to the original image and the marker image (the labeled regions found in the previous steps).

markers = cv2.watershed(img, markers)
# Boundary pixels are labelled -1; paint them blue ([255, 0, 0] in OpenCV's BGR order)
img[markers == -1] = [255, 0, 0]
imshow("img", img)

The cv2.watershed() function modifies the marker image (markers) in place and also returns it. Boundaries between regions are marked with -1, the background keeps label 1, and each object keeps its own label from 2 upward. The pixels we could not classify as background or foreground (the unknown region) are resolved by the watershed algorithm: each is assigned either to the background or to one of the objects, producing a clear boundary between the objects and the background.

How does the Watershed Algorithm work?

The concepts of “flooding” and “dam construction” in the Watershed Algorithm are a metaphorical way of describing how the algorithm works:

Flooding: The “flooding” process refers to the expansion of each labeled region (the markers) across the image. Here the image gradient plays the role of topographic elevation: pixels where intensity changes sharply act as peaks and ridges, and pixels in smooth areas act as valleys. Flooding starts in the valleys, and pixels are processed in order of increasing elevation, so low-lying areas fill first. Each pixel receives the label of the marker whose “flood” reaches it first. A pixel reached by floods from two different markers is given neither label; it becomes part of a boundary, as described next.
Dam Construction: As flooding continues, the floodwaters from different markers (representing different regions in the image) eventually meet. Where they do, a “dam” is built. In algorithmic terms, building a dam means creating a boundary in the marker image; OpenCV assigns boundary pixels the special label -1. Dams form where floodwaters from different markers meet, which is typically where intensity changes rapidly, i.e., at the boundaries between regions in the image.
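The flooding and dam-building rules can be sketched in a few lines of Python with a priority queue. This is a simplified, illustrative version of marker-controlled flooding, not OpenCV's actual implementation (which differs in details): `flood_from_markers` is a hypothetical function that uses 4-connected neighbors and takes elevation directly from the input array. Pixels are processed from the lowest elevation upward, each takes the label of the single flood that reached it, and a pixel touched by two different floods becomes a dam (-1).

```python
import heapq
import numpy as np

def flood_from_markers(elevation: np.ndarray, markers: np.ndarray) -> np.ndarray:
    """Simplified marker-controlled flooding: 0 = unlabelled, >0 = seed label, -1 = dam."""
    labels = markers.astype(np.int32).copy()
    h, w = labels.shape
    heap, counter = [], 0
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))

    def push_neighbours(y, x):
        nonlocal counter
        for dy, dx in neighbours:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == 0:
                labels[ny, nx] = -2  # mark as queued so it is pushed only once
                # Lowest elevation first; the counter keeps equal elevations in FIFO order
                heapq.heappush(heap, (elevation[ny, nx], counter, ny, nx))
                counter += 1

    for y, x in zip(*np.nonzero(labels > 0)):
        push_neighbours(y, x)

    while heap:
        _, _, y, x = heapq.heappop(heap)
        # Labels of the already-flooded neighbours of this pixel
        seen = {labels[y + dy, x + dx] for dy, dx in neighbours
                if 0 <= y + dy < h and 0 <= x + dx < w and labels[y + dy, x + dx] > 0}
        if len(seen) == 1:
            labels[y, x] = seen.pop()  # only one flood reached it: join that basin
            push_neighbours(y, x)
        else:
            labels[y, x] = -1          # two floods meet here: build a dam
    return labels

# One row keeps the example readable: two valleys separated by a ridge of height 5
elevation = np.array([[1, 0, 1, 2, 5, 2, 1, 0, 1]], dtype=float)
seeds = np.array([[0, 1, 0, 0, 0, 0, 0, 2, 0]])
print(flood_from_markers(elevation, seeds))  # [[ 1  1  1  1 -1  2  2  2  2]]
```

Both floods rise through their own valleys first; the ridge pixel is reached last, from both sides at once, so it becomes the dam.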

After applying the Watershed Algorithm, our marker image (which initially labeled the sure foreground, the sure background, and the unknown region) labels every pixel: 1 for the background, 2 and up for the individual coins, and -1 for the boundaries between regions. We have effectively segmented the image into distinct objects (coins) and the background.

The Watershed Algorithm offers an intuitive and efficient approach to image segmentation, allowing meaningful regions to be extracted from complex images, and OpenCV's Python API makes it quick to implement. Although its basic form can suffer from over-segmentation, appropriate preprocessing and parameter tuning can effectively counter this, making it a powerful tool for image analysis. As always, the choice of segmentation technique depends on the specific requirements and constraints of your project.

We will explore more image segmentation techniques in the upcoming posts :).