The challenge
Perceptual hashing
Harmful and illegal-to-possess data are continuously seized and generated. For example, online child abuse and exploitation investigations routinely produce investigator-labelled corpora of more than 100,000 images or videos. These datasets are inherently offensive, psychologically harmful to view, and illegal to transmit or possess.
In a child exploitation investigation, law enforcement officers have only a couple of days to review a large volume of material before they must report to court in pursuit of a conviction.
Previously, officers had to view thousands of images themselves, comparing photo files by eye to identify similarities. In 2018, a method known as "perceptual hashing" was introduced. It uses algorithms to compare the visual content of images and produce a compact digital fingerprint (a hash) for each one, so that known material and near-duplicates can be identified without relying on exact file matches.
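To illustrate the idea, here is a minimal sketch of average hashing (aHash), one of the simplest perceptual hashes. It is not necessarily the algorithm investigators or Data Airlock use; production systems rely on more robust hashes. The sketch assumes each image has already been decoded into a grayscale 2D list of 0-255 values, and the match threshold of 10 differing bits is an illustrative assumption.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels (0-255), row-major


def average_hash(pixels: Image, hash_size: int = 8) -> int:
    """aHash: shrink to hash_size x hash_size by block averaging, then emit
    one bit per cell: 1 if the cell is brighter than the mean of all cells."""
    height, width = len(pixels), len(pixels[0])
    if height < hash_size or width < hash_size:
        raise ValueError("image must be at least hash_size x hash_size")
    cells = []
    for by in range(hash_size):
        y0, y1 = by * height // hash_size, (by + 1) * height // hash_size
        for bx in range(hash_size):
            x0, x1 = bx * width // hash_size, (bx + 1) * width // hash_size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:  # first cell becomes the most significant bit
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_probable_match(a: int, b: int, threshold: int = 10) -> bool:
    # threshold=10 is an assumed value for illustration, not a recommendation.
    return hamming_distance(a, b) <= threshold


if __name__ == "__main__":
    # Synthetic 64x64 horizontal gradient; a real pipeline would decode image files.
    original = [[2 * x for x in range(64)] for _ in range(64)]
    brighter = [[p + 50 for p in row] for row in original]
    inverted = [[255 - p for p in row] for row in original]

    h = average_hash(original)
    print(format(h, "016x"))                             # 0f0f0f0f0f0f0f0f
    print(hamming_distance(h, average_hash(brighter)))   # 0
    print(hamming_distance(h, average_hash(inverted)))   # 64
    print(is_probable_match(h, average_hash(brighter)))  # True
```

Because each bit only records whether a region is brighter than the image's own average, a uniform brightness change leaves the hash unchanged, while an inverted image differs in every bit. Near-duplicates are found by comparing Hamming distances rather than requiring identical hashes.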
Our response
Using AI and ML to scan images
The Data Airlock platform uses Artificial Intelligence (AI) and Machine Learning (ML) to scan through and filter confronting images faster than the previous methods, whilst also keeping analytics secure and restricted.
Data Airlock focuses on three key principles: protecting people from data, protecting data from people, and analysing sensitive data in a safe and secure manner.
The results
Developing new algorithms to scan sensitive data
The design uses a Model-to-Data (MTD) paradigm, which lets researchers deliver new algorithms against sensitive data without being exposed to it. Information is kept in secure vaults, and only manually vetted algorithms are permitted to operate on the data, inside isolated environments called airlocks.
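The Model-to-Data flow can be sketched as follows. This is a minimal illustration under stated assumptions, not Data Airlock's implementation: the class names `Vault` and `Airlock`, the rule that an algorithm is identified by the SHA-256 digest of its source, and the convention that submitted code defines a `run(records)` function are all hypothetical. Python's `exec` is used only to show the control flow; it provides no isolation, which in a real airlock would come from a separate sandboxed environment.

```python
import hashlib
from typing import Any, Callable, Dict, List, Set


def digest(source: str) -> str:
    """Identify an algorithm by the SHA-256 digest of its exact source text."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class Vault:
    """Holds sensitive records; researchers are never given a reference to it."""

    def __init__(self, records: List[Any]) -> None:
        self._records = list(records)

    def records_for_airlock(self) -> List[Any]:
        # Only the airlock calls this; returns a shallow copy of the record list.
        return list(self._records)


class Airlock:
    """Runs only custodian-vetted algorithms against vault data (hypothetical)."""

    def __init__(self, vault: Vault) -> None:
        self._vault = vault
        self._vetted: Set[str] = set()

    def vet(self, source: str) -> None:
        # Called by the data custodian after manually reviewing the source.
        self._vetted.add(digest(source))

    def execute(self, source: str) -> Any:
        if digest(source) not in self._vetted:
            raise PermissionError("algorithm has not been vetted")
        namespace: Dict[str, Any] = {}
        # exec is NOT a sandbox; it stands in for launching an isolated environment.
        exec(source, namespace)
        run: Callable[[List[Any]], Any] = namespace["run"]
        # The raw result goes to output review, not straight to the researcher.
        return run(self._vault.records_for_airlock())


if __name__ == "__main__":
    vault = Vault([{"label": "a"}, {"label": "b"}, {"label": "a"}])
    airlock = Airlock(vault)
    algorithm = "def run(records):\n    return sum(1 for r in records if r['label'] == 'a')\n"
    airlock.vet(algorithm)
    print(airlock.execute(algorithm))  # 2
```

Keying approval to a digest means that any edit to a vetted algorithm, however small, produces a new digest and must be reviewed again before it can touch the data.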
Full analytical capability is achieved while data custodians remain in absolute control. Researchers receive progress updates during execution, and vetted outputs on completion for evaluation and action. Data Airlock's architecture also allows trusted third parties to host the system securely.
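The execution-time updates and vetted outputs described above can be modelled as a small state machine. In this hypothetical sketch (the `Job` and `Status` names and their states are assumptions, not part of Data Airlock), researchers can read progress messages while a job runs, but the job's output stays inside the airlock until a data custodian approves or rejects it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class Status(Enum):
    RUNNING = "running"
    AWAITING_REVIEW = "awaiting_review"
    RELEASED = "released"
    REJECTED = "rejected"


@dataclass
class Job:
    """Hypothetical airlock job: progress is visible, output is gated."""
    job_id: str
    status: Status = Status.RUNNING
    updates: List[str] = field(default_factory=list)  # researcher-visible messages
    released_output: Optional[Any] = None  # set only after custodian approval
    _pending_output: Optional[Any] = field(default=None, repr=False)  # held in airlock

    def report_progress(self, message: str) -> None:
        self.updates.append(message)

    def finish(self, output: Any) -> None:
        # Output is parked for review rather than returned to the researcher.
        self._pending_output = output
        self.status = Status.AWAITING_REVIEW

    def review(self, approve: bool) -> None:
        """Custodian decision: release the pending output or discard it."""
        if self.status is not Status.AWAITING_REVIEW:
            raise RuntimeError("job is not awaiting review")
        if approve:
            self.released_output = self._pending_output
            self.status = Status.RELEASED
        else:
            self.status = Status.REJECTED
        self._pending_output = None


if __name__ == "__main__":
    job = Job("job-1")
    job.report_progress("processed 50% of items")
    job.finish({"matches": 3})
    print(job.status, job.released_output)  # Status.AWAITING_REVIEW None
    job.review(approve=True)
    print(job.status, job.released_output)  # Status.RELEASED {'matches': 3}
```

Separating `finish` from `review` keeps the release decision with the custodian: the researcher's view of a job is limited to its status, its progress messages, and whatever output has been explicitly approved.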
Since its inception, this project has attracted attention from a range of government agencies, placing Data61 at the forefront of this area with the potential for future revenue streams.
Data Airlock will help law enforcement agencies draw on talent from the public to make law enforcement more efficient and accurate, thereby helping to remove predators from society more swiftly.