Artificial Intelligence

What is Differential Privacy?

Published November 29, 2022

Updated December 9, 2022

Alex McFarland

We are living through the era of big data, which has focused even more attention on the topic of data privacy. Humans produce an incredible amount of data each second, and companies use this data for a wide range of applications. With data being stored and shared at an unprecedented pace, stronger privacy protection techniques are needed.

Differential privacy is one such approach to protecting personal data, and it offers stronger guarantees than many traditional methods. It can be defined as a system for publicly sharing information about a dataset by describing patterns of groups within the dataset while withholding information about the individuals in the dataset.

Differential privacy enables researchers and database analysts to obtain valuable information from databases without divulging personally identifiable information about the individuals in them. This is critical, as many databases contain a variety of personal information.

Another way of looking at differential privacy is that it creates anonymous data by injecting noise into the datasets. The introduced noise helps protect privacy while still being limited enough so analysts can reliably use the data.

Imagine two near-identical datasets: one with your personal information and one without it. With differential privacy, you can ensure that the probability that a statistical query will produce a given result is nearly the same regardless of which dataset it is performed on.
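This guarantee is usually written as an inequality. In the standard formulation, a randomized mechanism M satisfies ε-differential privacy if, for every pair of datasets D and D' that differ in one individual's record, and for every set of possible outputs S, the following holds. The symbols M, D, D' and S are notation introduced here for illustration:

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{M}(D') \in S]
```

When ε is close to zero, the factor e^ε is close to 1, so the two probabilities are almost equal and the output reveals very little about whether any one person's record was included.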

How Does Differential Privacy Work?

Differential privacy works by introducing a privacy loss or privacy budget parameter, often denoted as epsilon (ε). This parameter controls how much noise or randomness is added: a smaller ε means more noise and stronger privacy, while a larger ε means less noise and more accurate results.
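One common way to turn ε into a concrete amount of noise is the Laplace mechanism, which adds noise with scale equal to the query's sensitivity divided by ε. The article does not name a specific mechanism, so the following is only a minimal sketch under that assumption; the function names noisy_count and laplace_noise are hypothetical, and the query is assumed to be a simple count, where one person changes the result by at most 1.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return the count plus Laplace noise of scale sensitivity / epsilon.

    Hypothetical helper: assumes a counting query, where adding or removing
    one person changes the answer by at most `sensitivity` (here 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Smaller epsilon -> larger noise scale -> answers further from 1000.
    for eps in (0.1, 1.0, 10.0):
        print(f"epsilon={eps}: {noisy_count(1000, eps, rng=rng):.2f}")
```

Running the sketch with several values of ε gives a feel for the trade-off: at ε = 0.1 the noise scale is 10, while at ε = 10 it is only 0.1.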

For example, imagine you have a column in the dataset with “Yes”/“No” answers from individuals.

Now, suppose you flip a coin for every individual:

Heads: the answer is left as is.
Tails: you flip a second time, recording the answer as “Yes” if heads and “No” if tails, regardless of the real answer.

By using this process, you add randomness to the data. With a large amount of data and knowledge of how the noise was added, aggregate measurements can still be estimated accurately. The privacy comes from allowing every single individual to plausibly deny their real answer thanks to the randomization process.

While this is a simplistic example of differential privacy, it provides a basic understanding. In real-world applications, the algorithms are more complex.
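The coin-flip procedure above can be simulated in a few lines. This is a sketch rather than a production implementation: the function names and the simulated population (100,000 people, about 30% of whom truly answer "Yes") are assumptions made for illustration. Because a reported "Yes" occurs with probability 0.5 × p + 0.25, where p is the true "Yes" rate, the analyst can invert that formula to recover an estimate of p.

```python
import random


def randomized_response(true_answer, rng=random):
    """Apply the two-coin procedure to one boolean answer (True = "Yes")."""
    if rng.random() < 0.5:        # first flip is heads: keep the real answer
        return true_answer
    return rng.random() < 0.5     # second flip: heads -> "Yes", tails -> "No"


def estimate_yes_rate(reported):
    """Undo the known bias: P(report "Yes") = 0.5 * p + 0.25."""
    observed = sum(reported) / len(reported)
    return 2.0 * (observed - 0.25)


rng = random.Random(7)
# Hypothetical population: roughly 30% true "Yes" answers.
truth = [rng.random() < 0.30 for _ in range(100_000)]
reported = [randomized_response(answer, rng) for answer in truth]

print(f"true rate: {sum(truth) / len(truth):.3f}")
print(f"estimate:  {estimate_yes_rate(reported):.3f}")
```

In this scheme each person reports their real answer with probability 3/4 and the opposite answer with probability 1/4, so the ratio between the two is 3 and the procedure corresponds to ε = ln 3, or roughly 1.1.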

It’s also important to note that differential privacy can be implemented locally, where the noise is added to each individual’s data before it is sent to a central database, or globally, where a trusted curator collects the raw data from individuals and adds the noise afterward, typically to the results of queries on that data.
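The difference between the two models is mainly about where the noise is added and who gets to see raw data. The sketch below contrasts them for a simple "Yes" count. It assumes the coin-flip procedure for the local model and Laplace noise on the final count for the global model; both choices, and the names local_count and global_count, are illustrative assumptions rather than a prescribed design.

```python
import math
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def local_count(answers, rng):
    """Local model: each person randomizes their own answer before sending it."""
    reports = [
        a if rng.random() < 0.5 else (rng.random() < 0.5)
        for a in answers
    ]
    # The collector only ever sees `reports`, then corrects the known bias
    # (each report is "Yes" with probability 0.5 * p + 0.25).
    return 2.0 * (sum(reports) - 0.25 * len(reports))


def global_count(answers, epsilon, rng):
    """Global model: a trusted curator holds raw answers and perturbs the result."""
    return sum(answers) + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(1)
answers = [rng.random() < 0.4 for _ in range(10_000)]  # hypothetical raw data
epsilon = math.log(3)  # matches the privacy level of the coin-flip procedure

print("true count:  ", sum(answers))
print("local model: ", round(local_count(answers, rng)))
print("global model:", round(global_count(answers, epsilon, rng)))
```

At the same ε, the local estimate is typically further from the true count because every single response carries its own noise. In exchange, no party ever has to be trusted with the raw answers.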

Examples of Differential Privacy

Differential privacy is applied across a wide range of applications like recommendation systems, social networks, and location-based services.

Here are some examples of how big companies rely on differential privacy:

Apple uses the method to gather anonymous usage insights from devices like iPhones and Macs.
Facebook uses differential privacy to collect behavioral data that can be used for targeted advertising campaigns.
Amazon relies on the technique to gain insights into personalized shopping preferences while hiding sensitive information.

Apple has been especially transparent about its use of differential privacy to gain insight into users while preserving their privacy.

“Apple has adopted and further developed a technique known in the academic world as local differential privacy to do something really exciting: gain insight into what many Apple users are doing, while helping to preserve the privacy of individual users. It is a technique that enables Apple to learn about the user community without learning about individuals in the community. Differential privacy transforms the information shared with Apple before it ever leaves the user’s device such that Apple can never reproduce the true data.”

– Apple’s Differential Privacy Overview

Applications of Differential Privacy

Since we live in this era of big data, there are many data breaches that threaten governments, organizations, and companies. At the same time, today’s machine learning applications rely on learning techniques that require large amounts of training data, often coming from individuals. Research institutions also use and share data with confidential information. Improper disclosure of this data in any way can cause many problems for both the individual and the organization, and in severe cases, it can lead to civil liability.

Formal privacy models like differential privacy help address these problems. They are used to protect personal information, real-time location, and more.

By using differential privacy, companies can analyze large amounts of sensitive data for research or business without compromising the privacy of the individuals in it. Research institutions can also develop specific differential privacy technologies to automate privacy processes in cloud-sharing communities, which are becoming increasingly popular.

Why Use Differential Privacy?

Differential privacy offers a few main properties that make it an excellent framework for analyzing private data while ensuring privacy:

Quantification of Privacy Loss: Differential privacy mechanisms and algorithms can measure privacy loss, which allows different techniques to be compared with one another.
Composition: Since you can quantify privacy loss, you can also analyze and control it over multiple computations, enabling the development of different algorithms.
Group Privacy: Besides the individual level, differential privacy enables you to analyze and control privacy loss among larger groups.
Secure in Post-Processing: Differential privacy cannot be weakened by post-processing. For example, a data analyst can’t compute a function of the output of a differentially private algorithm and make it less differentially private.
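Two of these properties have compact standard statements for a mechanism M that satisfies ε-differential privacy. The notation (M, D, D'', S, f, k) is introduced here for illustration and does not come from the article. The post-processing statement assumes that f works only on the released output and has no further access to the private data.

```latex
% Group privacy: D and D'' differ in the records of k individuals
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{k\varepsilon} \cdot \Pr[\mathcal{M}(D'') \in S]

% Post-processing: f is any function, possibly randomized, of the output
\mathcal{M} \text{ is } \varepsilon\text{-DP}
\;\Longrightarrow\;
f \circ \mathcal{M} \text{ is } \varepsilon\text{-DP}
```

The first line shows that protection degrades gradually with group size rather than vanishing: a group of k people is covered at level kε.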

Benefits of Differential Privacy

As we mentioned earlier, differential privacy is better than many traditional privacy techniques. For example, it treats all information as potentially identifying, which removes the difficult task of working out which elements of the data could identify someone. It is also resistant to privacy attacks based on auxiliary information, preventing attacks that can be carried out on de-identified data.

One of the greatest benefits of differential privacy is that it is compositional, meaning you can bound the total privacy loss of conducting two differentially private analyses over the same data. In the simplest case, this is done by summing up the individual privacy losses for the two analyses.
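In practice, this summing rule is what makes a "privacy budget" workable: each analysis is charged its ε, and further analyses are refused once the total is used up. The class below is a minimal sketch of that bookkeeping under basic sequential composition. The name PrivacyBudget and its charge method are hypothetical and do not refer to any particular library.

```python
class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition.

    Hypothetical helper: total loss is taken to be the sum of the
    epsilons of every analysis run on the same data.
    """

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one analysis; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject
        # a charge that exactly uses up the remaining budget.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.total - self.spent  # remaining budget


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.4)  # first analysis
budget.charge(0.5)  # second analysis; 0.9 spent in total
```

A third analysis with ε = 0.2 would be rejected here, since 0.9 + 0.2 exceeds the total budget of 1.0.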

While differential privacy is a new tool and can be difficult to achieve outside research communities, easy-to-implement solutions for data privacy are becoming more accessible. In the near future, we should see an increasing number of these solutions available to a wider public.
