A defining trait of artificial intelligence is that the effectiveness of an algorithm depends on the data used to train it. This problem surfaced with face recognition algorithms: they performed well on photos of white men, but the same algorithms did much worse on photos of women with dark skin. That is why IBM has prepared a new photo database for training this type of algorithm.
The problem was noticed and publicized by Joy Buolamwini of the M.I.T. Media Lab. She decided to check how well artificial intelligence algorithms recognize people in photographs. The photos used in the experiment were well lit and of good quality, and the algorithms' task was to determine the gender of the person pictured. It turned out that algorithms from IBM, Microsoft and Face++ had no trouble with white men: the error rate was only 1%. For women of the same complexion the error rose to 7%. For dark-skinned men, 12% of the decisions were wrong, and for dark-skinned women the error climbed to 35%. The conclusion from this study was very simple.
Artificial intelligence can be “racist” if it is trained on poorly selected data
This problem is well known to people who work with machine learning on a daily basis: training data that is unevenly distributed across groups skews the resulting models. The tested face recognition algorithms performed best on white men because white men dominated the images used to train them. That is why IBM released a new database a few days ago containing over a million photos of different people, taking care that the photographs cover a range of skin tones and include people from many countries. A second database contains photos of 36,000 people and can be used to test already-trained algorithms; here, too, IBM made sure the data was appropriately diversified.
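The effect described above can be illustrated with a toy sketch (this is not IBM's or Buolamwini's code, and the data is entirely synthetic): a nearest-centroid classifier trained on imbalanced two-group data. The majority group dominates the training set, so the learned centroids fit it well, and the underrepresented group — whose feature distribution is slightly shifted — gets classified less accurately.

```python
# Toy illustration of training-data imbalance (synthetic data, hypothetical
# setup): a nearest-centroid "gender" classifier learned mostly from one
# group performs worse on an underrepresented, slightly shifted group.
import numpy as np

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Two classes as 2-D feature clusters; `shift` moves this group's
    # clusters away from the majority group's distribution.
    x0 = rng.normal([0.0 + shift, 0.0], 0.5, (n, 2))  # class 0
    x1 = rng.normal([2.0 + shift, 2.0], 0.5, (n, 2))  # class 1
    X = np.vstack([x0, x1])
    y = np.array([0] * n + [1] * n)
    return X, y

# Imbalanced training set: 500 samples per class from group A, 20 from B.
Xa, ya = make_group(500, shift=0.0)
Xb, yb = make_group(20, shift=1.5)
X = np.vstack([Xa, Xb])
y = np.concatenate([ya, yb])

# "Training": per-class centroids, dominated by group A's samples.
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def accuracy(Xt, yt):
    # Predict the class of the nearest centroid.
    dists = np.linalg.norm(Xt[:, None, :] - centroids[None, :, :], axis=2)
    return float((np.argmin(dists, axis=1) == yt).mean())

# Evaluate on fresh, equally sized samples from each group.
Xa_t, ya_t = make_group(200, shift=0.0)
Xb_t, yb_t = make_group(200, shift=1.5)
print(f"group A accuracy: {accuracy(Xa_t, ya_t):.2f}")
print(f"group B accuracy: {accuracy(Xb_t, yb_t):.2f}")
```

Rebalancing the training set, as IBM's new database aims to do, narrows exactly this kind of gap: with equal representation, the centroids would sit between the two groups instead of being pulled toward the majority.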