Providing Comprehensive Optical Character Recognition in 14 Languages

  • Text annotation languages
  • Client Profile:

    Industry: IT & Software
    Location: USA
    Size: 5,000+

  • Company Bio:

    The client develops Optical Character Recognition (OCR) technology widely used in different apps and software to detect, transcribe, and translate text after.

Business Problem

The client was working on a project with subprojects in various languages: German, French, Spanish, Swedish, English, Ukrainian, Russian, and Turkish. The company was looking for a service provider who could annotate the text in images and detect the various types of text in the datasets (both with Latin and Cyrillic alphabets). The accuracy of the annotation work was required to be 99+%. 

Why Mindy Support

The client selected Mindy Support as we had more than four years of experience carrying out OCR projects. Our European operational centers with our in-house teams were one of the advantages since we already had data annotators who were proficient in several European languages. Also, Mindy Support had proven expertise in quality management and was able to deal with projects that require an exceptional quality level, so we were ready to commit to 99+% accuracy. In addition, the client was looking for a vendor that could quickly scale the team to a few hundred full-time employees (FTEs) without sacrificing the quality. We were a good fit for this requirement since Mindy Support had one of the largest in-house data annotation teams in Europe and sufficient recruiting processes in place to scale up fast and with a high number of qualified candidates.  

Solutions Provided

After we assembled the needed personnel for the project, we divided the team into three units since this was a complex project. The three units corresponded with the three stages of the project: 

  • Data annotation team 
  • Quality validation team 
  • Quality assurance team

We were able to ramp up the team of annotators to 350 FTEs in 1.5 months without sacrificing the quality of the annotation work. The dataset was divided into various languages, and each language project had a dedicated team working on OCR in their respective language. The annotators needed to label the images and transcribe them into alphabetical characters, which were either Latin or Cyrillic. Once all of the annotation work was completed, our QA team worked closely with the client to make sure all of the QA metrics were met. We provided the client with detailed reports and quality metrics to maintain a high level of transparency and constantly keep the client updated on how the project is progressing. Since our QA team worked independently from the data annotation and quality validation teams, it essentially served as the client’s internal QA team, allowing the customer to save costs since the company did not need to search and train its own QA team for the project.  

Thanks to our diligence and ability to build effective workflows between teams, we were able to reach the needed 99%+ quality level in 99.5% of the cases. Throughout the duration of the project, the client was very satisfied with our progress, results, and ability to meet deadlines, and the project turned into long-term successful cooperation with additional projects in 14 languages in total.

Results Delivered to the Client

  • Close and effective collaboration over the course of 4 years
  • Provided OCR annotation in 14 languages such as Portuguese, Polish, Dutch, Norwegian, Danish, Italian, German, French, Spanish, Swedish, English, Ukrainian, Russian, and Turkish.
  • Comprehensive solution for the entire project lifecycle: data annotation, data validation, and QA
  • 99%+ quality score