What is Image Segmentation: The Basics and Key Techniques
For a human being, it is easy to look at a selfie and identify the face in the image. However, for a machine, it’s not so easy to identify the face of a person while separating it from the rest of the image (the background). If we wanted to train an ML system to recognize the face of a person in an image, we would need to train it with image segmentation.
Today we will take a close look at image segmentation, all of its major aspects, as well as the techniques used to perform this type of image annotation. Let’s start by getting an understanding of what image segmentation is.
What is Image Segmentation?
Image segmentation is the process of dividing a digital image into subgroups called segments, thereby reducing the overall complexity of the image and enabling the analysis and processing of each segment. More specifically, image segmentation assigns labels to pixels to identify objects, people, and other important elements.
One of the common use cases for image segmentation is object detection. Instead of processing an entire image, researchers first use an image segmentation algorithm to find objects of interest in the image. The object detector can then operate on a bounding box defined by the algorithm. This reduces inference time while also improving accuracy.
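As a rough illustration of the step from a segmentation result to a bounding box, here is a minimal Python sketch. It assumes the segmentation step has already produced a binary mask (1 for object pixels, 0 for everything else). The function names `mask_bounding_box` and `crop` and the toy mask are hypothetical, and a real pipeline would typically work on arrays from an image library instead of nested lists.

```python
def mask_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the nonzero pixels, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # the mask contains no object pixels
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Cut out the part of the image covered by the bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 5x6 binary mask: 1 marks pixels that segmentation assigned to the object.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

box = mask_bounding_box(mask)   # (1, 1, 3, 3)
patch = crop(mask, box)         # the 3x3 area a detector would then look at
print(box, patch)
```

Here the detector would only need to look at 9 of the 30 pixels, which is the kind of saving the paragraph above describes.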
The Various Stages of Image Segmentation
Image segmentation involves taking an image as input and generating an output, which is a mask or a matrix whose elements specify the object class or instance to which each pixel belongs. Many high-level image features, or heuristics, such as edges and histograms, can be useful for image segmentation. These features are the basis for standard image segmentation algorithms, which often rely on techniques such as clustering.
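To make the idea of a mask more concrete, the short Python sketch below shows one possible way to represent it: a matrix with the same height and width as the image, holding one class id per pixel. The class ids, the `CLASS_NAMES` mapping and the `pixels_per_class` helper are made up for this example.

```python
from collections import Counter

# Hypothetical class ids, chosen only for this example.
CLASS_NAMES = {0: "background", 1: "face", 2: "hair"}

# A 4x5 segmentation mask: one class id per pixel of an imaginary 4x5 image.
mask = [
    [0, 2, 2, 2, 0],
    [0, 2, 1, 2, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]


def pixels_per_class(mask):
    """Count how many pixels the mask assigns to each class."""
    counts = Counter(value for row in mask for value in row)
    return {CLASS_NAMES[class_id]: n for class_id, n in sorted(counts.items())}


summary = pixels_per_class(mask)
print(summary)  # {'background': 10, 'face': 5, 'hair': 5}
```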
Various neural network designs and implementations exist that are suitable for image segmentation. They usually contain the following basic components:
- An encoder – a series of layers that extract image features using progressively deeper and narrower filters. The encoder can be pre-trained on a similar task, such as image recognition, which allows it to leverage its existing knowledge to perform segmentation tasks.
- A decoder – a series of layers that gradually convert the encoder’s output into a segmentation mask matching the input image’s pixel resolution.
- Skip connections – multiple long-range connections between layers that allow the model to identify features at different scales, which enhances model accuracy.
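The way these three components fit together can be sketched in a few lines of Python. This is a toy under strong assumptions, not a neural network: there are no learned filters, max pooling stands in for the encoder, simple value repetition stands in for the decoder, and the skip connection just pairs each coarse value with the original fine value. All function names are hypothetical. The point is only to show how resolution shrinks, grows back, and is combined.

```python
def max_pool_2x2(feature):
    """Encoder stand-in: halve height and width, keeping the strongest value.

    Assumes the height and width are even.
    """
    return [
        [max(feature[r][c], feature[r][c + 1],
             feature[r + 1][c], feature[r + 1][c + 1])
         for c in range(0, len(feature[0]), 2)]
        for r in range(0, len(feature), 2)
    ]


def upsample_2x(feature):
    """Decoder stand-in: double height and width by repeating each value."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def merge_skip(decoded, skipped):
    """Skip connection stand-in: pair each coarse value with the fine value."""
    return [[(d, s) for d, s in zip(d_row, s_row)]
            for d_row, s_row in zip(decoded, skipped)]


image = [
    [0, 1, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 0, 2],
    [0, 0, 3, 0],
]

encoded = max_pool_2x2(image)        # 2x2: coarse summary of the image
decoded = upsample_2x(encoded)       # 4x4 again, but blocky
merged = merge_skip(decoded, image)  # 4x4, coarse and fine values side by side
print(encoded)  # [[9, 0], [0, 3]]
```

After upsampling, the decoded map is back at the input resolution but has lost fine detail. Pairing it with the original values is what gives later layers access to both scales.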
Now that we have covered the basic components, let’s take a look at how the data annotation process can be done:
- Manual image segmentation – this requires human data annotators to manually prepare training datasets with labeling, semantic segmentation and other methods.
- Automated segmentation – a machine learning algorithm is able to perform some segmentation tasks, but usually requires some data validation work to make sure everything was done correctly.
Types of Image Segmentation
Image segmentation can be done in several different ways. Below you will find some of the most common techniques:
- Semantic image segmentation – this assigns every pixel in an image to a semantic class, such as person or background, without distinguishing between separate objects of the same class.
- Instance segmentation – this technique classifies pixels by the individual object instance they belong to, so two objects of the same class receive separate labels.
- Panoptic segmentation – a newer technique than the two above, often described as a combination of semantic and instance segmentation. It predicts the class of every pixel while also separating every instance of each object in the image.
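The difference between the three outputs is easier to see on a toy example. The Python sketch below starts from a semantic mask and derives instance ids by treating each connected blob of the same class as one object. That is a simplifying assumption made only for this illustration: touching objects would be merged into one instance, and real instance or panoptic segmentation is usually produced by a trained model. The class ids and the `label_instances` helper are hypothetical.

```python
from collections import deque


def label_instances(semantic, target_class):
    """Give each 4-connected blob of target_class its own id (1, 2, ...); 0 elsewhere."""
    height, width = len(semantic), len(semantic[0])
    instance = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic[r][c] != target_class or instance[r][c]:
                continue
            # Found an unlabeled pixel of the class: flood-fill its blob.
            next_id += 1
            instance[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic[ny][nx] == target_class
                            and not instance[ny][nx]):
                        instance[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance


# Semantic mask: 0 = background, 1 = person (hypothetical class ids).
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]

# Instance mask: the two people get different ids.
instance = label_instances(semantic, target_class=1)

# Panoptic view: every pixel carries a (class id, instance id) pair.
panoptic = [[(cls, inst) for cls, inst in zip(s_row, i_row)]
            for s_row, i_row in zip(semantic, instance)]
print(instance)
```

In the semantic mask both people share the label 1, in the instance mask they are 1 and 2, and the panoptic view keeps both pieces of information for every pixel, including the background.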
What Image Segmentation Techniques are Used to Annotate Data?
Here are some common image segmentation techniques:
- Edge-Based Segmentation. This is a popular image processing technique that identifies the edges of various objects in a given image. It helps to locate the features of associated objects in the image using information from the edges. Edge detection helps remove redundant information from the image, thereby reducing its size and facilitating analysis.
- Threshold-Based Segmentation. This is a simple image segmentation method where the pixels are divided based on their intensity relative to a given value or threshold. It is useful for segmenting objects with higher intensity than other objects or backgrounds.
- Region-Based Segmentation. This technique involves dividing an image into regions with similar characteristics. Every region is a group of pixels, which the algorithm locates through a seed point. Once the algorithm has found the seed points, it can grow regions by adding more pixels, or shrink them and merge them with other regions.
- Cluster-Based Segmentation. Clustering algorithms are unsupervised classification algorithms that help identify hidden information in images. They enhance human vision by isolating clusters, shadings, and structures. The images are divided into clusters of pixels with similar characteristics, separating data elements and grouping similar elements into clusters.
- Watershed Segmentation. A watershed is a transformation defined on a grayscale image. The algorithm treats the image like a topographic map, with pixel brightness determining the height. Watershed segmentation detects the lines that form ridges and basins, and marks the areas between the watershed lines. It divides the image according to pixel height, grouping pixels with the same gray value.
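Three of the techniques above (threshold-based, edge-based and region-based) are simple enough to sketch in plain Python on a toy grayscale image. The code below is a simplified illustration, not a production method: the function names, the threshold, the `min_jump` value and the `tolerance` are all assumptions chosen to suit this tiny image. Cluster-based and watershed segmentation are left out for brevity, as they are usually run through an image processing library.

```python
from collections import deque


def threshold(image, t):
    """Threshold-based: 1 where the intensity exceeds t, else 0."""
    return [[1 if v > t else 0 for v in row] for row in image]


def edges(image, min_jump):
    """Edge-based: mark pixels whose right or lower neighbor differs by >= min_jump."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < w else 0
            down = abs(image[r][c] - image[r + 1][c]) if r + 1 < h else 0
            out[r][c] = 1 if max(right, down) >= min_jump else 0
    return out


def grow_region(image, seed, tolerance):
    """Region-based: collect 4-connected pixels within tolerance of the seed value."""
    h, w = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(image[nr][nc] - seed_value) <= tolerance):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


# Toy grayscale image: a bright 2x2 object on a dark background.
image = [
    [10, 12, 11, 10],
    [11, 200, 205, 12],
    [10, 198, 202, 11],
    [12, 10, 11, 10],
]

binary = threshold(image, t=100)                          # bright object vs. background
edge_map = edges(image, min_jump=100)                     # pixels next to a large jump
bright = grow_region(image, seed=(1, 1), tolerance=10)    # region grown from one seed
print(binary)
print(sorted(bright))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

On this image, thresholding and region growing both recover the same four bright pixels, while the edge map marks the pixels along the boundary where the intensity changes sharply.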
Applications of Image Segmentation
Image segmentation is one of the most important types of image annotation. There are many different applications of image segmentation. Some of the most popular ones are described below:
- Facial recognition. Facial recognition is a very popular security feature found in smartphones and security cameras. Image segmentation helps AI-powered cameras identify the unique features of each person’s face, so that only that particular person can access a phone or system.
- Image-based search. Search engines, such as Google and Bing, offer image-based search capabilities that rely on image segmentation techniques to identify objects in a given image and compare their findings with the relevant images they find to give you search results.
- License plate identification. Many traffic lights and cameras use license plate identification to issue fines and help with searches. This technology allows a traffic system to recognize a car and retrieve its ownership-related information. It uses image segmentation to separate a license plate and its information from the other objects in view. This technology has simplified the fining process considerably for governments.
Mindy Support’s Customer Cases
Mindy Support has extensive experience delivering image segmentation projects of varying complexity for various industries. Some of the most interesting ones are listed below:
Semantic segmentation for a clothing store
A midsize online retailer was looking to boost their sales with AR try-on of their clothing. They needed to annotate images with semantic segmentation for the system to better model the bounds of the clothing item and human skin, thereby producing a better fit. We assembled a team of 45 data annotators who annotated 200,000 images in one month allowing the client to keep the project on schedule. It was very important that all of the boundaries were properly annotated so we needed to implement an automated QA process. This allowed us to maintain a quality score of 98%.
Image annotation for a client in the automotive industry
A US client in the automotive industry asked us to detect and annotate facial features in images. There were 9 labels. We had to annotate the head with bounding boxes, the pupils and irises of both eyes with circles, and the eyelids with polylines. Frames were provided in sequences, captured while a person moved their eyes from one side to the other. There were images of people with and without glasses, with transparent and tinted lenses. Some extra effort was required to annotate objects accurately. The client required >98% annotation quality. We built the annotation process in several phases with extra approvals and quality checks.
Object detection and classification of interior objects
Our client needed to train a machine learning system to detect and classify various interior objects (table, chair, kitchen cabinet, wardrobe, vase, etc.). There was a large list of object classes (100+) with minor characteristic differences, and it was difficult to define the boundaries of each object since they were occluded by other objects. Given the large number of objects per image, we needed to be very focused and check each image carefully so as not to miss a single object. We also faced challenges in determining the functional purpose of some objects. We carried out the project by preparing the videos and expanding the text materials. We also included an additional stage in the workflow to check the quality of the annotations and object detection.
Mindy Support’s Experience With Image Segmentation
Mindy Support is a global provider of image annotation services and other types of data annotation. We are trusted by several Fortune 500 and GAFAM companies, as well as innovative startups. With nine years of experience under our belt and offices and representatives in Cyprus, Poland, Romania, The Netherlands, India, and Ukraine, Mindy Support’s team now stands strong with 2000+ professionals helping companies with their most advanced image data annotation challenges.