Differential privacy definition
Differential privacy is a mathematical framework for protecting individual information in statistical analysis and data sharing. It accomplishes this by adding controlled “noise” (randomness) to query results, so that the output reveals almost nothing about whether any specific individual's data is included in the dataset.
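Formally, a randomized algorithm M satisfies ε-differential privacy if, for every pair of datasets D and D′ that differ in one individual's record, and for every set of possible outputs S:

$$\Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]$$

A smaller ε means the output distributions on D and D′ are closer together, so the result reveals less about any single individual.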
See also: sensitive information
How differential privacy works
1. Data is collected from individual contributors in a way that protects their privacy.
2. A sensitivity analysis is performed on the queries that will be applied to the data. Sensitivity measures how much the output of a query can change when a single individual's data is added or removed.
3. Based on the sensitivity and the available privacy budget, a random noise generator adds a calibrated amount of noise to the query results. The privacy budget represents the total amount of privacy loss that can be “spent” across multiple queries; each answered query consumes part of it. Too little noise weakens the privacy guarantee, while too much jeopardizes the accuracy of the results.
4. Executing the queries with the added noise yields differentially private results. These results can be shared with external parties or used for further analysis without compromising the privacy of the individual contributors (see the sketch after this list).
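As a concrete illustration of steps 2–4, below is a minimal sketch of the Laplace mechanism, one standard way to achieve ε-differential privacy for numeric queries. The dataset, function name, and parameter values here are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private answer via the Laplace mechanism.

    Noise is drawn from Laplace(0, sensitivity / epsilon): the larger the
    sensitivity or the smaller the privacy budget epsilon, the more noise.
    """
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(loc=0.0, scale=scale)

# Example: a counting query ("how many records satisfy a condition?").
# Adding or removing one individual changes a count by at most 1,
# so its sensitivity is 1 (step 2).
ages = np.array([34, 45, 29, 62, 51, 38])   # illustrative dataset
true_count = int(np.sum(ages >= 40))        # true answer: 3

epsilon = 0.5  # privacy budget spent on this single query (step 3)
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=epsilon)

# The noisy result can be released without revealing whether any one
# individual's record is in the dataset (step 4).
print(f"true count: {true_count}, private count: {private_count:.1f}")
```

When several queries are answered this way, their ε values add up under basic composition, which is why the overall privacy budget has to be tracked and divided across queries.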
Common differential privacy use cases
- Collecting sensitive data for research.
- Conducting data analysis on aggregated datasets.
- Ensuring privacy in machine learning models.