How Facebook Uses AI to Moderate Content
Facebook currently has around 2.7 billion users, who post 350 million photos every day, or roughly 243,000 uploads every minute. There is a similar volume of other types of posts, such as article shares and memes. Although many of these posts are fun, informative, or commercial in nature, many others spread false news or involve cyberbullying and other malicious activity. Facebook therefore needs to monitor and filter out such activity while still allowing everything its rules permit. To do this, it is increasingly relying on AI to moderate user-posted content. Let’s take a look at how this technology works and the types of data annotation required to create it.
How Does the New Technology Work?
Content that may violate Facebook’s terms and conditions, such as hate speech, misinformation, or spam, is flagged either by users or by machine learning algorithms. If a case is a clear violation, the post is taken down right away; other cases are sent to human moderators for additional review. There are about 15,000 such moderators around the world, and until recently they reviewed flagged content in more or less chronological order.
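The routing described above can be sketched in a few lines of Python. This is only an illustration, not Facebook's actual system: the `FlaggedPost` fields, the `confidence` score, and the `CLEAR_VIOLATION_THRESHOLD` cutoff are all assumptions made up for the example, since the real criteria have not been published.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REMOVE = "remove"              # taken down right away
    HUMAN_REVIEW = "human_review"  # sent to a human moderator


@dataclass
class FlaggedPost:
    post_id: str
    label: str         # e.g. "hate_speech", "misinformation", "spam"
    source: str        # "user_report" or "classifier"
    confidence: float  # hypothetical score in [0, 1] that this is a violation


# Hypothetical cutoff for what counts as a "clear violation".
CLEAR_VIOLATION_THRESHOLD = 0.95


def route(post: FlaggedPost) -> Action:
    """Remove clear violations immediately; send everything else to review."""
    if post.confidence >= CLEAR_VIOLATION_THRESHOLD:
        return Action.REMOVE
    return Action.HUMAN_REVIEW


if __name__ == "__main__":
    posts = [
        FlaggedPost("p1", "spam", "classifier", 0.99),
        FlaggedPost("p2", "bullying", "user_report", 0.60),
    ]
    for post in posts:
        print(post.post_id, route(post).value)
```

In this sketch, anything the system is not highly confident about falls through to a human, which matches the idea that only clear-cut cases are handled automatically.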
Recently, Facebook adjusted its machine learning algorithms to prioritize flagged posts by importance rather than by arrival time. The company says it will try to deal first with the posts that do the most harm, such as content involving terrorism, child exploitation, or self-harm. Posts that have been marked as spam can wait in line since they are less urgent. Facebook did not specify the criteria it uses to decide which posts get priority, but it says it will continue working on the algorithm that sorts the existing queue of posts flagged for violations.
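One way to picture this kind of harm-first ordering is a priority queue. The sketch below uses Python's standard `heapq` module; the `SEVERITY` scores and the `ReviewQueue` class are hypothetical, because Facebook has not disclosed how it actually ranks posts. Posts with equal severity are served in arrival order, which preserves the older chronological behavior as a tiebreaker.

```python
import heapq
import itertools

# Hypothetical severity scores; higher means more urgent.
SEVERITY = {
    "terrorism": 3,
    "child_exploitation": 3,
    "self_harm": 3,
    "hate_speech": 2,
    "spam": 0,
}
DEFAULT_SEVERITY = 1  # assumed value for labels not listed above


class ReviewQueue:
    """Queue of flagged posts, most harmful first, oldest first on ties."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # increasing arrival counter

    def push(self, post_id: str, label: str) -> None:
        severity = SEVERITY.get(label, DEFAULT_SEVERITY)
        # heapq is a min-heap, so severity is negated to pop the highest first.
        heapq.heappush(self._heap, (-severity, next(self._arrival), post_id))

    def pop(self) -> str:
        _, _, post_id = heapq.heappop(self._heap)
        return post_id

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = ReviewQueue()
    queue.push("p1", "spam")
    queue.push("p2", "self_harm")
    queue.push("p3", "hate_speech")
    queue.push("p4", "terrorism")
    while queue:
        print(queue.pop())  # p2, p4, p3, p1
```

The spam post was flagged first but is reviewed last, while the two most severe posts are handled in the order they arrived.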
How are Such Machine Learning Algorithms Trained?
To make sure that the machine learning algorithms correctly identify high-priority posts, training datasets need to be prepared and labeled with the attributes the machines need to recognize. For example, if a post contains an image with a caption, the system needs to identify elements such as hate speech and its associated symbols, weapons, and drugs. It also needs to understand the context of the image and text, since not every image containing alcohol, for example, violates Facebook’s policies.
To prepare such datasets, human data annotators need to label all of the relevant information in an image or text. We can only imagine the amount of data needed to train algorithms that can go through all 350 million photos uploaded every day and make sure they do not violate any policies. Training image recognition models requires extensive image annotation, using methods such as tagging, polygon annotation, and semantic segmentation. And this is only for images. Text also has to be handled, which requires things like optical character recognition (OCR) for text embedded in images, as well as classification and categorization of text-based posts.
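To make the annotation work more concrete, here is a rough sketch of what a single labeled training example might look like. The `PolygonRegion` and `AnnotatedPost` structures and all of their field names are invented for illustration; real annotation formats vary from project to project, and nothing here reflects Facebook's internal schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PolygonRegion:
    """One polygon annotation: an outlined object in the image."""
    label: str
    points: List[Tuple[float, float]]  # (x, y) vertices in pixel coordinates

    def is_valid(self) -> bool:
        # A polygon needs at least three vertices.
        return len(self.points) >= 3


@dataclass
class AnnotatedPost:
    """A hypothetical labeled example combining image and text annotations."""
    post_id: str
    image_tags: List[str] = field(default_factory=list)         # tagging
    regions: List[PolygonRegion] = field(default_factory=list)  # polygon annotation
    ocr_text: str = ""             # text read from the image via OCR
    caption: str = ""              # text written by the user
    violates_policy: bool = False  # final label assigned by the annotator


# Made-up example: an image containing alcohol that does NOT violate policy,
# because the context (a restaurant promotion) is benign.
example = AnnotatedPost(
    post_id="example-001",
    image_tags=["alcohol", "restaurant"],
    regions=[
        PolygonRegion("bottle", [(10.0, 20.0), (40.0, 20.0), (40.0, 90.0), (10.0, 90.0)]),
    ],
    ocr_text="Happy hour 5-7pm",
    caption="Join us this Friday!",
    violates_policy=False,
)

if __name__ == "__main__":
    print(example.image_tags, all(r.is_valid() for r in example.regions))
```

Keeping the object-level labels separate from the final `violates_policy` decision reflects the point made earlier: detecting alcohol in an image is not the same as deciding that the post breaks the rules.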
Human Moderators are Still Needed
Even though machine learning has made many strides over the past couple of years, we are still far from being able to completely replace human moderators with machines. Facebook’s goal in using AI and machine learning is to help human workers get through all of the content faster and with fewer mistakes. There are also many gray areas that machines still cannot reliably understand, such as misinformation, bullying, and harassment. This is why Facebook will continue to use AI to augment the capabilities of human workers instead of completely replacing them.
Mindy Support Is Helping Companies Create New AI Products
Your product may require far less data than Facebook’s algorithms, but we can still provide you with comprehensive data annotation services. We have an extensive track record of successfully implementing data annotation projects of varying complexity. Since we are one of the largest BPO providers in Eastern Europe, we will be able to assemble a team for you quickly, regardless of the complexity or size of your project.