Get Data Annotation Right the First Time
Category: AI insights
Published date: 10.06.2020
Read time: 6 min
While data annotation can often be tedious and time-consuming, it is very important that everything is done correctly so that the product can function as intended. Companies do not want their in-house teams to spend time annotating data, since those teams need to focus on core business functions and such low-level tasks are not a good use of their time. There needs to be a strategy in place that clearly stipulates how you will manage the data annotation process. We have assembled some key questions you need to answer before you begin a data annotation project to make sure everything is done right the first time.
How Sensitive is the Data?
If your dataset includes personally identifiable information (PII), then your data can be classified as sensitive. While PII usually refers to items such as credit card numbers and social security numbers, as far as data annotation is concerned it also includes people’s faces, which can be used to track or identify them. Since there are many privacy regulations governing such data, it is always a good idea to be extra careful when working with this information.
If you are working with personally identifiable information, there are three options to choose from:
- In-house annotation – While this keeps all of the data from leaving your company, as mentioned above it is not a good use of your developers’ time, and it can also be very expensive. Your engineers, developers, data scientists, and other people involved in creating the product are paid a high salary for expertise in their given field, not for annotating data.
- Outsourcing annotation – Outsourcing can be more advantageous than the previous option, but you have to choose your service provider carefully and think through all possible risks beforehand. You need to find a company that has its own employees, is GDPR compliant, and has ISO certifications.
- Blurring the images – You can blur the faces to “de-identify” all of the people in the image, so that the data is no longer sensitive. This opens up possibilities like crowdsourcing. However, you then run the risk of poorly training the machine learning algorithms, since it will be difficult for the system to differentiate people’s actual faces from the blurred ones.
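To make the blurring option more concrete, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows and a face box that has already been found by a detector or a human reviewer. The function name `blur_region`, the box format, and the `radius` parameter are hypothetical choices for illustration, not part of any particular tool.

```python
def blur_region(image, box, radius=1):
    """Return a copy of `image` with a simple box blur applied inside `box`.

    image:  list of rows, each a list of grayscale values (0-255)
    box:    (x_min, y_min, x_max, y_max), with the max values exclusive
    radius: how many neighboring pixels to average in each direction
    """
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    blurred = [row[:] for row in image]  # leave the original untouched

    for y in range(max(0, y_min), min(height, y_max)):
        for x in range(max(0, x_min), min(width, x_max)):
            total = 0
            count = 0
            # Average the pixel with its neighbors, clipped to the image edges.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += image[ny][nx]
                    count += 1
            blurred[y][x] = total // count
    return blurred


# A tiny 4x4 "image" with one bright pixel standing in for a face.
image = [
    [0, 0, 0, 0],
    [0, 90, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
face_box = (0, 0, 3, 3)  # hypothetical face location
deidentified = blur_region(image, face_box)
```

In practice a larger radius or a stronger method would likely be needed, and the blurred copy is what would be handed to external annotators while the original stays in-house.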
What Can Be Automated?
While a lot of companies are still looking to hire human data annotators, machines have become advanced enough to perform some tasks themselves. This includes things like data labeling, where they simply need to identify the given items in the image, such as cars, traffic lights, and pedestrians. They can also perform 2D box annotation, which simply involves superimposing a rectangle around each object you are looking to identify.
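As an illustration of what a 2D box annotation might look like as data, the sketch below stores a label plus the pixel coordinates of the rectangle and runs a basic sanity check. The class name `BoxAnnotation` and its field names are assumptions made for this example; real annotation tools use their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAnnotation:
    """One labeled rectangle on an image, in pixel coordinates."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def is_valid(self, image_width, image_height):
        # The box must have positive size and lie fully inside the image.
        return (0 <= self.x_min < self.x_max <= image_width
                and 0 <= self.y_min < self.y_max <= image_height)

    def area(self):
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


car = BoxAnnotation("car", 10, 20, 110, 80)
flipped = BoxAnnotation("pedestrian", 50, 50, 40, 90)  # x_min > x_max

print(car.is_valid(640, 480), car.area())
print(flipped.is_valid(640, 480))
```

A check like `is_valid` is the kind of simple rule that can be applied automatically to every box, whether it was drawn by a person or proposed by a model.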
Having said this, there are more advanced forms of data annotation, such as semantic segmentation, which links every pixel in the image with a class label. This form of data annotation is common when creating custom applications. For example, there are some products designed for the agriculture industry that use drones equipped with computer vision cameras to take aerial images of the fields. The AI system analyzes all of the images and can identify single stalks of grain that are damaged by pests or otherwise not growing properly.
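A semantic segmentation annotation can be pictured as a grid the same size as the image, where every cell holds a class label. The sketch below uses a tiny made-up mask for the agriculture example; the class names and IDs in `CLASS_NAMES` and the function `class_fractions` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical class IDs for an aerial field image.
CLASS_NAMES = {0: "soil", 1: "healthy_crop", 2: "damaged_crop"}


def class_fractions(mask):
    """Return the share of pixels assigned to each class in a label mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return {CLASS_NAMES[class_id]: n / total for class_id, n in counts.items()}


# One class label per pixel: a 2x4 mask standing in for a full image.
mask = [
    [0, 1, 1, 1],
    [0, 1, 2, 1],
]
fractions = class_fractions(mask)
print(fractions)
```

Because every pixel needs a label, this kind of annotation takes far more effort per image than drawing a box, which is part of why it is harder to automate.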
How Will You Motivate Your Data Annotators?
Data annotation is very repetitive work, and after you have annotated hundreds of images it is easy to lose concentration and motivation. To solve this problem, outsourcing providers offer monetary incentives as well as prizes that employees can earn for working hard and delivering quality work. At Mindy Support, we also provide career advancement opportunities for our best-performing team members. In fact, many of our middle managers today started out as data annotators and worked their way up to become team leads.
How Will Accuracy Be Ensured?
It takes a lot of time and money to redo tasks, which is why it is very important to get everything right the first time. There needs to be a quality assurance process in place to identify poorly annotated images before they derail the entire project. Some data annotation tools offer automated QA processes, but if such an option is not available, QA has to be done manually to the best of your ability. If you choose to outsource your data annotation, be sure to ask about the QA measures the service provider has in place and for data on how accurately they annotate.
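One simple way to put a number on accuracy is to compare an annotator's labels against a small set of items that have already been reviewed and agreed on. The sketch below assumes such a reviewed "gold" set exists; the function names and the image IDs are hypothetical, and real QA processes usually involve more than a single score.

```python
def annotation_accuracy(labels, gold):
    """Share of items where the annotator's label matches the reviewed label.

    Only items present in both dictionaries are compared.
    """
    shared = labels.keys() & gold.keys()
    if not shared:
        raise ValueError("no overlapping items to compare")
    correct = sum(1 for item in shared if labels[item] == gold[item])
    return correct / len(shared)


def mismatched_items(labels, gold):
    """Item IDs that disagree with the reviewed label and need a second look."""
    shared = labels.keys() & gold.keys()
    return sorted(item for item in shared if labels[item] != gold[item])


# Hypothetical reviewed labels and one annotator's submissions.
gold = {"img1": "car", "img2": "pedestrian", "img3": "traffic light", "img4": "car"}
labels = {"img1": "car", "img2": "car", "img3": "traffic light",
          "img4": "car", "img5": "car"}

accuracy = annotation_accuracy(labels, gold)
to_review = mismatched_items(labels, gold)
print(accuracy, to_review)
```

Tracking a score like this per annotator or per batch makes it possible to catch problems early instead of discovering them after the whole dataset has been labeled.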
Choose a Company That Has the Necessary Experience and Expertise
Mindy Support can assemble even the most sizable team within a short timeframe to get even the most complex data annotation projects done on time. We have rigorous security measures in place to prevent data leaks and a sophisticated QA process to get the job done right the first time around. Contact us today to find out why so many companies trust us with their data annotation projects.
Posted by Il’ya Dudkin