MLaaS: Preventing API-Driven Model Theft With Variational Autoencoders




Machine Learning As a Service (MLaaS) commoditizes the fruits of expensive research and model training via APIs that give customers access to insights from the system. Though the reasoning of the system is inevitably revealed to some extent through these transactions, the core model architecture, the weights that define the utility of the model, and the specific training data that made it useful are jealously guarded for several reasons.

Firstly, the framework is likely to have drawn on a number of free and open source software (FOSS) repositories, which potential rivals could trivially exploit in the same way, so disclosing the architecture would make it easy for them to assemble a comparable system. Secondly, in many cases the trained weights can represent 95% or more of what makes the model better than its rivals at interpreting its training data, and arguably constitute the core value of an expensive investment, both in research hours and in high-scale, well-resourced training on industry-grade GPUs.

In addition, the mix of proprietary and public-facing data in the model's training set is a potentially incendiary matter. Where the data is 'original' work obtained through costly methods, an API user who can infer its structure or content through permitted requests could effectively reconstruct the value of that work. They could do this either by understanding the schema of the data (allowing for practical reproduction) or by reproducing the weights that orchestrate its features, which could yield an 'empty' but effective architecture into which new material could then be usefully processed.

Data Laundering

Further, the way data is abstracted in a model's latent space during training effectively 'launders' it into generalized functions, making it difficult for copyright holders to establish whether their original work has been assimilated into a model without permission.

The current laissez-faire climate around this practice is likely to give way to increasingly heavy regulation over the next 5-10 years. The EU's draft AI regulations already contain strictures on data provenance, along with a putative transparency framework that would make it difficult for data-gathering companies to circumvent domain regulations on web scraping for research purposes. Other governments, including the US, are now committing to similar regulatory frameworks over the long term.

As the machine learning field evolves from a proof-of-concept culture into a viable commercial ecosystem, companies whose models are found to have infringed data restrictions, even in much earlier iterations of their products, could find themselves legally exposed.

Therefore, the risk of data sources being inferred through API calls relates not just to industrial espionage via model inversion and other methods, but arguably also to emerging forensic methods for IP protection that may bear down on companies once the 'wild west' era of machine learning research draws to a close.

API-Driven Exfiltration as a Means to Develop Adversarial Attacks

Some machine learning frameworks continually update their training data and algorithms, rather than deriving a definitive, long-term single model from a large corpus of historical data (as with GPT-3, for instance). These include traffic-information systems and services in other sectors where real-time data is critical to their ongoing value.

If a model's logic or data weighting can be 'mapped' by systematically polling it through its API, that knowledge can potentially be turned against the system in the form of adversarial attacks: maliciously crafted data can be left in the wild, in places where the target system is likely to ingest it, or introduced by infiltrating the system's data procurement routines through other means.

Therefore, measures against API-centric mapping also have implications for the security of machine learning models.

Preventing API-Driven Exfiltration

A number of research initiatives in recent years have proposed methods to prevent the inference of model architecture and specific source data via API calls. The latest of these is outlined in a preprint collaboration between researchers from the Indian Institute of Science in Bangalore and Nference, an AI-driven software platform based in Cambridge, Massachusetts.

Entitled Stateful Detection of Model Extraction Attacks, the research proposes a system called VarDetect, for which preliminary code has been made available at GitHub.

Running server-side, VarDetect continuously monitors user queries to an API, looking for three distinct patterns of repetitive model extraction attack. The researchers report that VarDetect is the first defense mechanism of its kind to hold up against all three types. It can also withstand attackers who, aware of a defense mechanism, try to defeat it by concealing their attack patterns with pauses, or by increasing the volume of queries to obscure the requests that are building a map of the model.
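'Stateful' here means the defense judges the sequence of requests an account has made rather than each request in isolation. The following is not VarDetect's published code but a hypothetical, minimal sketch of that idea: a per-user rolling window of queries that is periodically handed to a detector. `StatefulQueryMonitor`, `toy_detector` and the window and check-interval values are illustrative assumptions.

```python
from collections import defaultdict, deque


class StatefulQueryMonitor:
    """Hypothetical server-side monitor that keeps a rolling history per user."""

    def __init__(self, detector, window=100, check_every=10):
        self.detector = detector          # callable: list of queries -> bool
        self.window = window              # number of recent queries kept per user
        self.check_every = check_every    # run the detector every N queries
        self.history = defaultdict(lambda: deque(maxlen=self.window))
        self.counts = defaultdict(int)
        self.flagged = set()

    def observe(self, user_id, query):
        """Record a query; return True if the user is (now) flagged."""
        self.history[user_id].append(query)
        self.counts[user_id] += 1
        if self.counts[user_id] % self.check_every == 0:
            if self.detector(list(self.history[user_id])):
                self.flagged.add(user_id)
        return user_id in self.flagged


def toy_detector(queries):
    # Hypothetical rule: flag if most queries fall outside the benign range [0, 1].
    outside = sum(1 for q in queries if not 0.0 <= q <= 1.0)
    return outside > len(queries) / 2


monitor = StatefulQueryMonitor(toy_detector, window=20, check_every=5)
for _ in range(5):
    monitor.observe("alice", 0.5)
for _ in range(5):
    monitor.observe("mallory", 7.0)
print(monitor.flagged)
```

Because the history in this sketch is kept per account and is not tied to wall-clock time, inserting pauses between queries does not clear it.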

The VarDetect architecture. Source:


VarDetect uses Variational Autoencoders (VAEs) to create what is effectively a heuristic-style evaluative probe for incoming requests. Unlike previous methods, the system is trained on the defender's own proprietary data, removing the need for access to attacker data, which earlier approaches required and which is unlikely to be available in practice.
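For reference, the standard VAE objective is given below. A VAE pairs an encoder q_phi(z | x) with a decoder p_theta(x | z) and is trained to maximize the evidence lower bound (ELBO); with a diagonal Gaussian encoder and a standard normal prior, the KL term has a closed form, and sampling uses the reparameterization trick on the last line. This is the generic formulation only; VarDetect's exact loss and architecture are set out in the paper and may differ.

```latex
\mathcal{L}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)

D_{\mathrm{KL}}\left(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\right) = \frac{1}{2}\sum_{j=1}^{d}\left(\mu_j^2 + \sigma_j^2 - \log \sigma_j^2 - 1\right)

z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
```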

The training data for the project's custom model is derived from three publicly documented approaches: the work developed in 2016 by the Swiss Federal Institute of Technology and Cornell Tech; adding noise to 'problem domain' data, as first demonstrated in the 2017 PRADA paper from Finland; and crawling public-facing images, inspired by the 2020 ActiveThief research from the Indian Institute of Science.
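To make the three query styles concrete, the toy generators below produce each kind of input as flat feature vectors scaled to [0, 1]: uniformly random inputs (our assumption for the 2016-style queries), noisy copies of genuine problem-domain samples (the PRADA-style approach), and natural samples drawn from an unrelated public pool (standing in for ActiveThief's crawled images). The function names, the vector representation and the noise level are hypothetical simplifications, not the researchers' data-generation code.

```python
import random


def uniform_random_queries(n, dim, rng):
    """Style 1: inputs sampled uniformly at random from the input space."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def perturbed_domain_queries(seed_samples, n, noise_scale, rng):
    """Style 2: real problem-domain samples plus Gaussian noise, clipped to [0, 1]."""
    out = []
    for _ in range(n):
        base = rng.choice(seed_samples)
        out.append([min(1.0, max(0.0, v + rng.gauss(0.0, noise_scale))) for v in base])
    return out


def public_pool_queries(public_pool, n, rng):
    """Style 3: natural inputs drawn without replacement from an unrelated public pool."""
    return rng.sample(public_pool, n)


rng = random.Random(0)
benign = [[0.2, 0.4], [0.6, 0.8]]
pool = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
q1 = uniform_random_queries(4, 2, rng)
q2 = perturbed_domain_queries(benign, 4, 0.05, rng)
q3 = public_pool_queries(pool, 2, rng)
```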

A comparison of benign and 'malignant' data samples across the five datasets used in VarDetect.


Incoming query distributions whose characteristics match those of the onboard dataset are flagged as extraction signals.
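One way to decide whether a user's queries 'match' a reference distribution is to compare them in the VAE's latent space with a two-sample statistic. The sketch below assumes each query has already been encoded to a latent vector, and uses the maximum mean discrepancy (MMD) with a Gaussian kernel to flag a batch that sits closer to an attack-style reference set than to a benign one. The choice of MMD, the kernel parameter `gamma` and the nearest-reference rule are assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two latent vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def looks_like_extraction(user_latents, benign_ref, attack_ref, gamma=1.0):
    """Flag a batch whose latent distribution is nearer the attack reference."""
    to_attack = mmd_squared(user_latents, attack_ref, gamma)
    to_benign = mmd_squared(user_latents, benign_ref, gamma)
    return to_attack < to_benign


# Hypothetical 2-D latent codes for the two reference sets and two users.
benign_ref = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0]]
attack_ref = [[3.0, 3.1], [3.1, 2.9], [2.9, 3.0]]
normal_user = [[0.05, 0.05], [0.0, -0.05]]
scraper = [[3.0, 3.0], [2.95, 3.05]]
print(looks_like_extraction(normal_user, benign_ref, attack_ref))
print(looks_like_extraction(scraper, benign_ref, attack_ref))
```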

The researchers concede that ordinary request patterns from benign end users can trigger false positives, blocking normal usage. To address this, signals confirmed as 'safe' can subsequently be added to the VarDetect dataset and incorporated into the algorithm through a rolling training schedule, according to the preferences of the host system.
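A rolling schedule of this kind could be organized roughly as follows: samples the operator confirms as benign are appended to a bounded reference pool, and a retraining callback fires once enough new material has accumulated. `BenignReferencePool`, `capacity`, `retrain_after` and the `retrain` callback are hypothetical names standing in for whatever preferences a host system actually exposes.

```python
from collections import deque


class BenignReferencePool:
    """Hypothetical rolling store of samples the operator has confirmed as benign."""

    def __init__(self, initial, capacity=1000, retrain_after=50, retrain=None):
        self.samples = deque(initial, maxlen=capacity)  # oldest samples drop off first
        self.retrain_after = retrain_after              # new samples needed per retrain
        self.retrain = retrain or (lambda samples: None)  # e.g. refit the detector
        self.pending = 0
        self.retrain_count = 0

    def confirm_benign(self, batch):
        """Add operator-approved samples; retrain when the schedule is due."""
        self.samples.extend(batch)
        self.pending += len(batch)
        if self.pending >= self.retrain_after:
            self.retrain(list(self.samples))
            self.retrain_count += 1
            self.pending = 0


pool = BenignReferencePool(initial=[[0.0, 0.0]], capacity=5, retrain_after=3)
pool.confirm_benign([[0.1, 0.1], [0.2, 0.2]])  # 2 pending: no retrain yet
pool.confirm_benign([[0.3, 0.3]])              # 3 pending: retrain fires
print(pool.retrain_count, len(pool.samples))
```

Bounding the pool means the oldest samples age out, which keeps the reference close to current usage; the trade-off is that rarely seen but legitimate patterns may eventually be forgotten.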