Achieving Global Diversity: How We Collected Over 1 Million Images for Facial Recognition
Client Profile
Industry: Information Technology
Location: USA
Size: 10,000+ employees
Company Bio
Our client is one of the largest and most influential technology companies in the United States, renowned globally for its innovation and leadership in the tech industry. Based in Silicon Valley, the client has grown into a dominant force in several sectors, including software development, cloud computing, hardware, artificial intelligence (AI), and digital services.
Project Overview
Mindy Support helped the client carry out a large-scale data collection project aimed at enhancing their facial recognition technology. The client required a diverse dataset of over 1 million images to improve the accuracy and performance of their AI model. To meet this challenge, we set an ambitious goal of recruiting 100,000 participants, each contributing multiple images, to build a dataset of 1 million unique images. Through targeted recruitment, streamlined data collection processes, and effective participant engagement, we achieved this goal and provided the client with a robust, diverse image dataset for their project.
Business Problem
The client was focused on refining a cutting-edge facial recognition technology and required a large, diverse image dataset to train their AI models effectively. To achieve the necessary accuracy, they needed at least 1 million unique images representing a wide array of cultures, ethnicities, and demographic backgrounds. The dataset had to vary across parameters such as gender, age, skin color, and facial features, so that the model would perform reliably across diverse populations and avoid the biases that arise from underrepresentation.
To meet these requirements, the client set specific guidelines for data collection. Each participant was asked to provide 100 photos capturing various angles and expressions. There were no strict criteria for lighting, time of day, or the type of device used to take the photos, which allowed for natural variation in the data. Participants accessed a custom-built application via a unique link. Once inside, they filled out their personal information, including name, gender, age, country of origin, nationality, and contact details. They could then upload and manage their photo submissions, with the option to delete any images they did not wish to include, as long as the final submission contained at least 100 photos. Once a participant submitted their profile and images, the data was added to the dataset used to train the facial recognition algorithm.
Why Mindy Support
The client chose Mindy Support for this ambitious project because of our global presence and proven expertise in large-scale data collection. We operate in over 25 countries and have a diverse, skilled team capable of managing projects across different cultures and languages. Our international reach allowed us to source a wide variety of participants, ensuring that the dataset met the client’s requirements for diversity in gender, age, and ethnicity. Our track record of delivering high-quality results on time, our full compliance with the laws of the regions where data is collected, and our practice of obtaining prior consent from all project participants made Mindy Support the ideal partner to help the client achieve their goals in refining facial recognition technology.
Solutions Delivered to the Client
We set an ambitious goal of collecting over 1 million images from 100,000 participants located across different countries and continents, to meet the client’s requirements for a diverse and representative dataset. The most challenging aspect of this project was achieving both the large number of participants and the required diversity in age, gender, and skin color. Recruiting participants aged 45 and older proved particularly difficult, as people in this age group were often less familiar with the digital tools needed to upload 100 photos efficiently. Achieving the specified diversity in skin color was also challenging, especially in India and other Asian countries, because the wide range of skin tones in these regions had to be represented accurately.
Our success in overcoming these challenges was largely due to our well-established network across the 25+ countries where we operate. This network allowed us to recruit a broad spectrum of participants and ensure adequate representation from each demographic group, including those that are typically harder to engage. We also implemented technical solutions that helped assess and categorize participants against the required criteria, such as skin color, which was critical for meeting the client’s expectations. By carefully grouping participants into the relevant demographic categories and leveraging our global operations, we collected a diverse, high-quality dataset and achieved the project’s ambitious goals.
In the end, we successfully completed the project, recruiting more than 100,000 participants and surpassing the client’s goal with over 1 million images submitted. Our team’s expertise in large-scale recruitment, combined with our global network, allowed us to deliver a diverse and representative dataset that met the client’s strict requirements for facial recognition training. The project was completed on time and to the client’s high standards, reinforcing our position as a trusted partner in data collection. We anticipate further data collection and annotation requests from the client as they continue to enhance their technology.
Key Results
- 100,000 participants recruited
- 1 million images submitted to the client
- Activated all of our recruiting resources in 25+ countries