Data perturbation

Data perturbation definition

Data perturbation is the intentional modification of sensitive information in a dataset to protect the privacy of individuals while preserving as much of the data's analytical value as possible. Typically, it adds noise or other controlled random changes to the original data, making it harder for unauthorized parties to extract sensitive information about specific individuals.
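The following is a minimal Python sketch of additive noise, the most basic form of perturbation. It assumes the numerical data is held in a plain list. The function name `add_gaussian_noise`, its `scale` parameter, and the salary figures are all hypothetical. Larger `scale` values give stronger privacy but lower utility.

```python
# Illustrative sketch: the helper name and sample data are hypothetical.
import random


def add_gaussian_noise(values, scale, seed=None):
    """Return a copy of `values` with zero-mean Gaussian noise added to each entry.

    `scale` is the standard deviation of the noise: larger values hide
    individual records better but distort the data more.
    """
    rng = random.Random(seed)  # dedicated generator so results are reproducible
    return [v + rng.gauss(0.0, scale) for v in values]


salaries = [52_000, 61_500, 48_200, 75_000, 58_300]
noisy = add_gaussian_noise(salaries, scale=1_000, seed=42)
print([round(x) for x in noisy])
```

Because the noise has a mean of zero, aggregates such as the average stay approximately correct over many records, while each individual value is blurred.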

Data perturbation is often used in the context of differential privacy, a privacy-preserving framework that provides strong, mathematically defined privacy guarantees for individuals in a dataset.
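In differential privacy, numerical query results are commonly perturbed with the Laplace mechanism. This mechanism adds noise drawn from a Laplace distribution with scale b = Δf / ε, where Δf is the query's sensitivity and ε is the privacy budget. The Python sketch below applies it to a count query, which has a sensitivity of 1. Python's standard library has no Laplace sampler, so the sketch draws one as the difference of two independent exponential samples. The function names and the sample ages are hypothetical. A naive floating-point sampler like this one is meant for illustration and is not hardened for production use.

```python
# Illustrative sketch: function names and sample data are hypothetical.
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential samples with rate 1/scale
    follows a Laplace distribution centred at 0 with the given scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, seed=None):
    """Release a noisy count of records matching `predicate` (epsilon > 0).

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


ages = [23, 35, 41, 29, 62, 57, 33, 48]
print(private_count(ages, lambda a: a >= 40, epsilon=0.5, seed=7))
```

A smaller ε produces more noise and therefore stronger privacy. A very large ε returns something close to the true count, which is 4 in this example.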

See also: differential privacy, sensitive information, data analytics

Common data perturbation methods

  • Adding random values to numerical data to introduce noise. The degree of perturbation can be controlled to balance privacy and utility.
  • Adding random noise drawn from the Laplace distribution to numerical data (the Laplace mechanism, commonly used in differential privacy).
  • Shuffling or permuting the values of categorical data to obscure the association between individuals and their categorical attributes.
  • Modifying temporal data such as timestamps, for example by introducing random time shifts or adding noise to the time values.
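  • Swapping the values of an attribute between different records in the dataset, so that a record's values no longer all belong to the same individual while the overall distribution of that attribute is preserved.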
  • Replacing words with synonyms or changing the word order to modify text data. This helps protect the privacy of textual information while preserving its general meaning.
  • Rounding or binning numerical values, for example replacing an exact age with an age range. This hides exact values but also reduces the precision of the data.
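The following Python sketch shows the shuffling and swapping methods from the list above. It assumes records are stored as a list of dicts. The helper names `shuffle_column` and `swap_values` and the sample patient records are hypothetical.

```python
# Illustrative sketch: helper names and sample records are hypothetical.
import random


def shuffle_column(rows, column, seed=None):
    """Randomly permute one categorical column across all rows.

    The multiset of values is unchanged, but the link between each row and
    its original value is broken.
    """
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


def swap_values(rows, column, n_swaps, seed=None):
    """Swap `column` between `n_swaps` randomly chosen pairs of records (needs 2+ rows)."""
    rng = random.Random(seed)
    result = [dict(row) for row in rows]  # shallow copies; the input stays intact
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(result)), 2)
        result[i][column], result[j][column] = result[j][column], result[i][column]
    return result


patients = [
    {"id": 1, "age": 34, "diagnosis": "asthma"},
    {"id": 2, "age": 51, "diagnosis": "diabetes"},
    {"id": 3, "age": 29, "diagnosis": "migraine"},
    {"id": 4, "age": 62, "diagnosis": "hypertension"},
]
print(shuffle_column(patients, "diagnosis", seed=3))
print(swap_values(patients, "age", n_swaps=1, seed=3))
```

Shuffling permutes the whole column at once. Swapping touches only a chosen number of record pairs, which gives finer control over how much of the data changes.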
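The temporal and rounding/binning methods from the list above can be sketched in Python as well. The sketch assumes timestamps are `datetime` objects and that binned data is integer-valued. The helper names `jitter_timestamps`, `shift_all`, `round_to`, and `bin_value` and the sample events are hypothetical. The sketch shows two ways of handling time:

- Independent jitter adds noise to each timestamp separately.
- A single shared offset hides the true dates but keeps the intervals between events intact.

```python
# Illustrative sketch: helper names and sample events are hypothetical.
import random
from datetime import datetime, timedelta


def jitter_timestamps(timestamps, max_shift_hours, seed=None):
    """Add independent random noise of up to +/- max_shift_hours to each timestamp."""
    rng = random.Random(seed)
    return [
        ts + timedelta(hours=rng.uniform(-max_shift_hours, max_shift_hours))
        for ts in timestamps
    ]


def shift_all(timestamps, max_shift_days, seed=None):
    """Shift every timestamp by the same random number of whole days.

    The true dates are hidden, but the intervals between events are preserved.
    """
    rng = random.Random(seed)
    offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
    return [ts + offset for ts in timestamps]


def round_to(value, step):
    """Round a number to the nearest multiple of `step` (halves round to even)."""
    return step * round(value / step)


def bin_value(value, width):
    """Replace a number with a range label such as '30-39' (assumes integer data)."""
    low = int(value // width) * width
    return f"{low}-{low + width - 1}"


events = [datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 4, 17, 30)]
print(jitter_timestamps(events, max_shift_hours=2, seed=5))
print(shift_all(events, max_shift_days=30, seed=5))
print(round_to(52_345, 1_000))  # 52000
print(bin_value(34, 10))        # 30-39
```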