Enhancing a Global Tech Giant’s AI in 9 Languages: Mindy Support’s Success
Client Profile:
Industry: Information technology
Location: USA
Size: 160,000+ Employees
Company Bio:
A global tech giant in smart devices and AI solutions. With user-friendly interfaces and sleek design, its devices have transformed how people communicate, work, and create. By connecting its product ecosystem and emphasizing privacy and sustainability, the company maintains its lead in consumer tech.
Services Provided
LLM Training Services, Response Evaluation and Validation
Project Overview
This project involved evaluating the quality and safety of responses generated by a digital assistant in response to various user inputs. Specifically, our analysts focused on the following languages: Danish, Dutch (Netherlands), Dutch (Belgium), Ukrainian, Polish, Russian, Turkish, Norwegian, and Swedish. Our team of data analysts assessed the assistant's replies to ensure they were contextually appropriate, factually correct, clearly worded, and free from harmful content.
Business Problem
Users interact with digital assistants on a daily basis for many different reasons – such as research, requesting tasks like drafting text or writing code, or simply engaging in casual conversation. As a result, most user inputs tend to be informal and conversational, often including slang, idiomatic expressions, or incomplete thoughts. Much like in real-life conversations, users may respond to the assistant’s reply with comments or follow-up questions. Although digital assistants can mimic natural dialogue quite well, they still face certain limitations. One of the key challenges is evaluating the accuracy and safety of their own responses.
This is where the client needed a trusted LLM services provider to evaluate the answers generated by the assistant and ensure these answers were contextually appropriate, factually correct, clearly worded, and free from harmful content. The client was looking for skilled analysts to train the LLM system in the following languages: Danish, Dutch (Netherlands), Dutch (Belgium), Ukrainian, Polish, Russian, Turkish, Norwegian, and Swedish.
Why Mindy Support
Mindy Support has proven to be a reliable LLM services provider for the client, with a collaboration dating back to 2014. Over the course of a decade, we have built a strong, trusted relationship with their team and presented several successful case studies demonstrating our expertise and global reach. These examples validate our capability to recruit, train, and deploy qualified personnel across multiple languages and geographic regions.
Solutions Delivered to the Client
Over the course of the project, we provided comprehensive evaluation services to support the development and refinement of the client's conversational AI system. Our multilingual team of certified analysts assessed user requests and AI-generated responses across the following languages and locales: Danish, Dutch (Netherlands), Dutch (Belgium), Ukrainian, Polish, Russian, Turkish, Norwegian, and Swedish. Services included analyzing user intent; evaluating the accuracy, relevance, clarity, and safety of assistant responses; and identifying any potentially harmful or inappropriate content. We also conducted comparative assessments between multiple system outputs and provided detailed feedback to guide continuous improvement.
Each response was analysed for:
- Relevance to the user’s prompt
- Accuracy with regard to facts and information
- Concise and clear communication
- Safety, i.e. the response does not contain harmful, offensive, or inappropriate content
Our quality assurance team carefully reviewed all of the guidelines shared by the client, which ran to around 300 pages, and created an efficient internal training program. Our process for analyzing each response was as follows:
1. First, we carefully analyzed user requests to ensure a clear understanding, taking into account any contextual information provided.
2. We then determined whether we had the necessary expertise to evaluate the responses accurately, especially in cases that required specialized knowledge or where visual or formatting issues could interfere with the ability to rate the task. In some instances, we were also tasked with assessing the potential harmfulness of the user's request itself.
3. In this step, we evaluated at least two responses, providing individual ratings based on specific criteria. When multiple responses were presented, analysts compared them in pairs and indicated which one they preferred.
4. Finally, our analysts left a detailed comment justifying their preference or explaining why the responses were rated equally, ensuring transparency in our evaluation process.
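The rating and pairwise-comparison steps above can be sketched as a minimal data model. All names, the criteria labels, and the 1–5 scale below are illustrative assumptions for this sketch, not the client's actual annotation schema.

```python
from dataclasses import dataclass

# Assumed evaluation criteria, mirroring the checklist above.
CRITERIA = ("relevance", "accuracy", "clarity", "safety")


@dataclass
class ResponseRating:
    """An individual rating for one assistant response (step 3)."""
    response_id: str
    scores: dict  # criterion -> score on an assumed 1-5 scale

    def overall(self) -> float:
        # Simple mean across criteria; a real rubric may weight them.
        return sum(self.scores.values()) / len(self.scores)


@dataclass
class PairwiseJudgment:
    """A pairwise comparison with a justifying comment (steps 3-4)."""
    a: ResponseRating
    b: ResponseRating
    comment: str = ""

    def preferred(self) -> str:
        # "tie" when overall scores match, mirroring "rated equally" above.
        if self.a.overall() == self.b.overall():
            return "tie"
        return (self.a.response_id
                if self.a.overall() > self.b.overall()
                else self.b.response_id)


# Example: two candidate responses to the same user prompt.
r1 = ResponseRating("resp-1", dict(zip(CRITERIA, (5, 4, 4, 5))))
r2 = ResponseRating("resp-2", dict(zip(CRITERIA, (3, 4, 4, 5))))
judgment = PairwiseJudgment(r1, r2, comment="resp-1 answers more directly")
print(judgment.preferred())  # prints "resp-1"
```

Keeping the per-criterion scores alongside the pairwise preference is what lets a justifying comment be checked against the numbers, supporting the transparency goal described above.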
Over the last four months, our analysts annotated more than 235,000 tasks, including four high-priority batches that were completed with exceptionally high accuracy.
Key Results
- Annotated 235,000+ tasks in 4 months.
- Achieved >95% QA approval rate.
- Enhanced AI’s conversational accuracy across 9 languages.
GET A QUOTE FOR YOUR PROJECT
We have a minimum threshold for starting any new project: 735 productive man-hours per month (equivalent to five annotators working on the task monthly).