Police lab wants your happy childhood pictures to train AI to detect child abuse
Like the Hotdog, Not Hotdog app but more Kidnapped, Not Kidnapped
Updated Australia's federal police and Monash University are asking netizens to send in snaps of their younger selves to train a machine-learning algorithm to spot child abuse in photographs.
Researchers are looking to collect images of people aged 17 and under in safe scenarios; they don't want any nudity, even if it's a relatively innocuous picture like a child taking a bath. The crowdsourcing campaign, dubbed My Pictures Matter, is open to those aged 18 and above, who can consent to having their photographs used for research purposes.
All the images will be amassed into a dataset managed by Monash academics in an attempt to train an AI model to tell the difference between a minor in a normal environment and an exploitative, unsafe situation. The software could, in theory, help law enforcement automatically and rapidly pinpoint child sex abuse material (aka CSAM) among the thousands upon thousands of photographs under investigation, so that human analysts don't have to inspect every single snap.
Australian Federal Police's leading senior constable Janis Dalins said the resulting AI could potentially help identify victims and flag up unlawful material not previously known to officers.
"In 2021, the AFP-led Australian Centre to Counter Child Exploitation received more than 33,000 reports of online child exploitation and each report can contain large volumes of images and videos of children being sexually assaulted or exploited for the gratification of offenders," he said this week.
Dalins is also the co-director of AiLECS Lab, the research collaboration between academics at Monash's Faculty of Information Technology and the AFP that is running the My Pictures Matter project.
"Reviewing this horrific material can be a slow process and the constant exposure can cause significant psychological distress to investigators," he added. "AiLECS Lab's initiatives will support police officers and the children we are trying to protect; and researchers have thought of an innovative way to ethically develop the technology behind such initiatives."
The easiest way to compile a large dataset of pictures is to scrape the open internet. But, as some of the latest AI models – such as OpenAI's DALL·E 2 and Google's Imagen – have shown, the quality of this data is difficult to control. Biased or inappropriate images can creep into the dataset, making the models problematic and potentially less effective.
Instead, the team at AiLECS believe their crowdsourcing campaign provides an easier and more ethical way to collect photographs of children. "To develop AI that can identify exploitative images, we need a very large number of children's photographs in everyday 'safe' contexts that can train and evaluate the AI models intended to combat child exploitation," Campbell Wilson, co-director of AiLECS and an associate professor at Monash University, said.
"But sourcing these images from the internet is problematic when there is no way of knowing if the children in those pictures have actually consented for their photographs to be uploaded or used for research. By obtaining photographs from adults, through informed consent, we are trying to build technologies that are ethically accountable and transparent."
People only need to send in their personal photographs and an email address as part of the campaign. Nina Lewis, a project lead and research fellow at the lab, said it wasn't going to log any other types of personal information. The email addresses will be stored in a separate database, we're told.
"The images and related data will not include any identifying information, ensuring that images used by researchers cannot reveal any personal information about the people who are depicted," she said. Participants will be given updates at each stage of the project, and can ask to remove their images from the dataset if they want.
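For the curious, here is one way that kind of separation could look in code. This is a minimal sketch only, not the lab's actual system: the `ConsentRegistry` class, its method names, and the use of random IDs are all our own assumptions for illustration. It keeps email addresses in one store and image records in another, linked only by an opaque identifier, and supports withdrawal on request.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ConsentRegistry:
    """Hypothetical store keeping contact details apart from image records."""

    # Contact store: opaque ID -> email address (kept apart from the images).
    contacts: dict[str, str] = field(default_factory=dict)
    # Research store: opaque ID -> image file names, with no personal data.
    images: dict[str, list[str]] = field(default_factory=dict)

    def submit(self, email: str, image_names: list[str]) -> str:
        """Register a submission and return a random, non-identifying ID."""
        participant_id = secrets.token_hex(8)
        self.contacts[participant_id] = email
        self.images[participant_id] = list(image_names)
        return participant_id

    def research_view(self) -> dict[str, list[str]]:
        """What researchers would see: IDs and image names, never emails."""
        return {pid: list(names) for pid, names in self.images.items()}

    def withdraw(self, participant_id: str) -> bool:
        """Delete a participant's images and contact record on request."""
        had_images = self.images.pop(participant_id, None) is not None
        self.contacts.pop(participant_id, None)
        return had_images
```

In this sketch, calling `submit()` hands back an ID the participant could later pass to `withdraw()`, while anything built on `research_view()` never touches an email address. A real deployment would also need access controls, audit logs, and deletion from backups, none of which are shown here.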
The project's aims are noble and not technically impossible, but they are highly ambitious given the challenges facing image-recognition systems, such as bias and adversarial attacks, among other limitations. We can't wait to see the results.
The Register has asked Monash University for further details. ®
Updated to add on June 6
Monash's Dr Lewis has been in touch with some more details. She told us the aim is to build a dataset of 100,000 unique images to train the AI model.
"We'll be using the photos as training and testing data for new and existing algorithms that identify and classify 'safe' images of children," she added. "We'll also be researching how those technologies can be applied to make assessments on whether digital files contain 'unsafe' imagery of children.
"The My Pictures Matter project is not training AI on images of children in unsafe situations. We're investigating the opposite scenario: how to create ethically sourced and consentful datasets for use in machine learning to help tackle the growing volume of child abuse imagery being generated and distributed through online platforms."
Responding to some of your comments raising concerns about the ability of machine-learning systems, Dr Lewis added: "We recognize that automated tools need to be more than blunt instruments, and that, for example, the presence of a high proportion of skin tone in a visual image does not of itself indicate abuse."
For those worried about privacy safeguards on the data, Dr Lewis pointed to the "data handling" section on mypicturesmatter.org after clicking on "Let's go," which states:
* Photos and any other information you provide will be stored by the AiLECS Lab using Monash University IT infrastructure and/or secure cloud services with servers located in Australia. The dataset will not be hosted in any ‘open’ repositories, however a description of the dataset may be visible in public data registries.
* Access will be restricted to authorised members of the research team. Other researchers may only be granted access to images conditional on the approval of formal ethics processes, where you have given permission. You can update your data sharing preferences at any time by emailing us at firstname.lastname@example.org.
* Research data will be kept for a minimum of 5 years after completion of any projects that utilise the dataset. Records documenting consent will be kept until the research dataset has been deleted.
She also stressed that the images collected for the project will be held and used by the university, and not the cops directly.
"This is not a police dataset, and will not be held or managed by the AFP," Dr Lewis told us. "This research is being undertaken by Monash University, with formal human research ethics clearance for how data is collected, used, and managed."