Audio Annotation Services

Audio annotation plays an important role in the development of chatbots, virtual assistants and other NLP technology. Mindy Support provides comprehensive audio annotation services covering all of the various annotation types listed below.

Types of Audio Annotation Services We Provide

Sound Labeling

In sound labeling, data annotators are given a recording and must isolate each sound of interest and label it. These can be, for example, certain keywords or the sound of a specific musical instrument.
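
As a rough illustration, the result of a sound labeling task can be thought of as a list of time-stamped segments, each carrying one label. The sketch below shows one way to represent that in Python. The `LabeledSegment` class, its field names and the sample values are assumptions made for illustration; they do not describe Mindy Support's actual tooling or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSegment:
    """One annotated stretch of a recording (hypothetical format)."""
    start_s: float  # segment start, in seconds
    end_s: float    # segment end, in seconds
    label: str      # e.g. a keyword or an instrument name

    def __post_init__(self):
        # Reject empty or reversed segments early.
        if self.end_s <= self.start_s:
            raise ValueError("segment must end after it starts")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def total_duration_by_label(segments):
    """Sum the labeled time, in seconds, for each label in a recording."""
    totals = {}
    for seg in segments:
        totals[seg.label] = totals.get(seg.label, 0.0) + seg.duration_s
    return totals


# Made-up annotations for a single recording.
segments = [
    LabeledSegment(0.0, 2.5, "guitar"),
    LabeledSegment(2.0, 3.0, "keyword"),
    LabeledSegment(5.0, 6.5, "guitar"),
]
totals = total_duration_by_label(segments)
```

Keeping start and end times alongside each label lets segments overlap, which matters when a keyword is spoken over an instrument.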

Event Tracking

Event tracking evaluates the performance of sound event detection systems in multisource conditions similar to everyday life, where sound sources are rarely heard in isolation. In this task, there is no control over the number of overlapping sound events at any given time, in either the training or the testing audio data.
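
To make the idea of overlapping sound events concrete, the sketch below counts how many annotated events are active at the same moment in a recording. It is a minimal example under assumed conventions: each event is an `(onset_s, offset_s, label)` tuple, and both the `max_overlap` function and the sample events are hypothetical rather than part of any specific detection system.

```python
def max_overlap(events):
    """Return the largest number of events active at the same moment.

    `events` is a list of (onset_s, offset_s, label) tuples.
    """
    boundaries = []
    for onset, offset, _label in events:
        boundaries.append((onset, 1))    # an event starts
        boundaries.append((offset, -1))  # an event ends
    # Sort by time. At equal times, ends (-1) sort before starts (+1),
    # so back-to-back events are not counted as overlapping.
    boundaries.sort()
    active = peak = 0
    for _time, delta in boundaries:
        active += delta
        peak = max(peak, active)
    return peak


# Made-up annotations for a street recording.
events = [
    (0.0, 4.0, "traffic"),
    (1.0, 2.0, "car horn"),
    (1.5, 3.0, "speech"),
    (4.0, 5.0, "siren"),
]
peak = max_overlap(events)
```

In this example the traffic noise, the car horn and the speech are all active between 1.5 and 2.0 seconds, so `peak` is 3.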

Speech to Text Transcription

Speech to text transcription is an important part of creating NLP technology. It involves taking recorded speech and transcribing it to text while carefully labeling both the words and the sounds that the speaker produces. Using the correct punctuation is also important.

Audio Classification

Audio classification involves listening to and analyzing audio recordings. Using this data, machines are able to differentiate between sounds and voice commands. This type of audio annotation is important in the development of virtual assistants, automatic speech recognition and text to speech systems. There are several types of audio classification:

Types of Audio Classification

Acoustic Data Classification
Environmental Sound Classification
Music Classification
Natural Language Utterance Classification

Acoustic Data Classification

This form of data annotation involves identifying where a sound was recorded. The data annotators need to differentiate between environments such as homes, schools and cafes. This is useful for maintaining sound libraries for audio multimedia and for creating monitoring systems.

Environmental Sound Classification

As the name implies, the data annotators categorize sounds according to the environment they are typical of. For example, certain sounds are specific to cities, such as construction, car horns and sirens. This is useful for creating security systems that can identify the sounds of break-ins, and for predictive maintenance.

Music Classification

Many attributes can be classified here, such as genre, instruments played and ensemble type. This type of annotation is useful for organizing music libraries and improving user recommendations.

Natural Language Utterance Classification

This type of annotation requires classifying fine details of human speech, such as dialect and semantics. This is important because it is what allows chatbots and virtual assistants to better understand human speech.

Multi Label Audio Annotation Tasks

Multi label audio annotation is the process of applying multiple labels to identify overlapping sound sources in temporally complex urban soundscapes. There are three main types of tasks:

Binary-labeling

The data annotators determine whether a single suggested sound-source class is present in the recording. This task type provides both positive and negative labels explicitly.

One-stage multi-labeling

The data annotators are presented with a list of class labels and they need to select all the sound-source classes present in the audio.

Two-stage hierarchical multi-labeling

In the first stage, the audio is presented to the data annotators alongside a list of superclass labels and they need to identify the sounds. In the second stage, the same audio is given to a different data annotator. This task type provides positive labels explicitly and negative labels implicitly.
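
The difference between explicit and implicit labels can be sketched in a few lines of Python. The example below covers only the binary and one-stage task types, since the second stage of the hierarchical task is not fully described here. The function names and sample class names are hypothetical, and treating unselected classes in one-stage multi-labeling as implicit negatives is an assumption borrowed from the description of the two-stage task.

```python
def from_binary(suggested_class, present):
    """Binary-labeling: one class with an explicit yes/no answer."""
    return {suggested_class: present}


def from_multi_label(all_classes, selected):
    """One-stage multi-labeling (sketch).

    Selected classes are explicit positives. Every class left
    unselected is treated as an implicit negative (an assumption).
    """
    unknown = set(selected) - set(all_classes)
    if unknown:
        raise ValueError(f"labels not in the class list: {sorted(unknown)}")
    return {cls: (cls in selected) for cls in all_classes}


# Made-up class list for an urban recording.
classes = ["siren", "dog bark", "engine"]

binary_result = from_binary("siren", False)
multi_result = from_multi_label(classes, ["dog bark"])
```

Here `binary_result` records an explicit negative for "siren", while in `multi_result` the negatives for "siren" and "engine" are only inferred from the annotator not selecting them.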

Our Clients

Media and Entertainment

For this project we needed to identify all the sounds heard in a video and mark them in a list. Most of the sounds came from musical instruments, and it was very difficult to distinguish between instruments belonging to the same group (for example, plucked strings). The audio also contained sounds of nature, animals, human speech and emotions. There were over 750 labels in total.

Technique: Sound Labeling
Size: 100 000 audio files
Completion time: 18 days
Team: 10 FTE
Quality: >95%

Security

Our job was to classify each audio file according to the sounds heard on the recording. It was necessary to distinguish voices (female, male, child), emotions (crying, screaming, laughing), sounds of nature (rain, wind, thunder) and city sounds (car horn, traffic noise). There were over 50 labels in total.

Technique: Audio Classification
Size: 80 000 audio files
Completion time: 10 days
Team: 6 FTE
Quality: >95%

Information Technologies

Our client is working on an application that helps people with speech difficulties express themselves using their own voice. In this project, annotators mark the categories that the sounds and speech in the audio belong to. The recordings were made by different users of the platform, each with their own manner of speaking and their own speech difficulties, so annotators must take into account the speech characteristics of the individual user in each recording they work with.

Technique: Sound Labeling
Size: > 2 000 000 audio files
Completion time: ongoing
Team: 5 FTE
Quality: >98%

Automotive

This is a sophisticated audio annotation task in which we must identify the timestamps of, and label, human speech as well as all the background noise inside the vehicle, such as radio, laughing, shouting, singing, animals and even silence. Some sounds also had to be categorized by their level of violence and by the atmosphere they create (positive, negative, neutral). As part of the project, we needed to mark up to 8 audio tracks, with only 1 group of sounds marked on each track.

Technique: Event Tracking
Size: 2 500 audio files
Completion time: 15 days
Team: 45 FTE
Quality: >99%

Information Technologies

As part of the project, it was necessary to transcribe speech from audio recordings into text with a high level of literacy and correct punctuation. The audio recordings were in 4 languages: German, French, Italian and English. The recordings featured speakers with different dialects, which complicated the work of the annotators.

Technique: Speech to Text
Size: 800 hours of audio
Completion time: 60 days
Team: 21 FTE
Quality: >98%

Why Choose Us

2000+ people
GDPR compliance
ISO 27001:2013 certification
ISO 9001:2015 certification

Let’s Expand with Mindy!

We have a minimum threshold for starting any new project: 735 productive man-hours per month (equivalent to 5 graphic annotators working on the task monthly).