Data poisoning definition
Data poisoning is an attack in which an adversary manipulates or degrades a machine learning model by injecting corrupted or misleading samples into its training data. The goal is to deliberately bias the model so that its predictions become wrong or attacker-controlled.
During a data poisoning attack, the attacker inserts carefully crafted data samples into the training set. These samples exploit weaknesses in the learning algorithm, causing it to produce incorrect or biased results. Often the attacker's ultimate goal is to control how the model behaves when it is later given a specific input.
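A minimal sketch of the idea, using a hypothetical label-flipping attack on synthetic data (scikit-learn is assumed): flipping even a modest fraction of training labels degrades the resulting model's test accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def poison_labels(y, fraction, rng):
    """Flip the labels of a random fraction of training samples."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = 1 - y[idx]  # binary labels: 0 <-> 1
    return y

for fraction in (0.0, 0.1, 0.3):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, poison_labels(y_train, fraction, rng))
    acc = clf.score(X_test, y_test)
    print(f"{fraction:.0%} poisoned labels -> test accuracy {acc:.3f}")
```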
See also: machine data, machine learning
Protection from data poisoning
- Robust data filtering. Filter training datasets to identify and remove poisoned samples before training. This can involve outlier detection, anomaly detection, or other statistical techniques that flag suspicious data points (a filtering sketch follows this list).
- Adversarial training. Expose the model to intentionally crafted or perturbed samples during training so that it learns to tolerate them, reducing the impact of data poisoning (illustrated in the adversarial training sketch below).
- Input validation and monitoring. Validate inputs at deployment time to detect anomalous or malicious data, and monitor the model's predictions in real time; sudden shifts or biased behavior can indicate an ongoing data poisoning attack (see the monitoring sketch below).
- Model transparency and explainability. Use models or techniques that explain the model's predictions, making bias or unexpected behavior caused by poisoned data easier to spot (see the explainability sketch below).
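A minimal filtering sketch, assuming scikit-learn and an isolation forest as the outlier detector; the `contamination` rate (the expected fraction of poisoned points) is an assumption the defender must estimate, and other statistical detectors could be substituted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def filter_training_data(X, y, contamination=0.05, random_state=0):
    """Drop training points the isolation forest flags as outliers."""
    detector = IsolationForest(contamination=contamination,
                               random_state=random_state)
    keep = detector.fit_predict(X) == 1  # 1 = inlier, -1 = suspected outlier
    return X[keep], y[keep]

# Example: 950 legitimate points plus 50 injected outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 5)), rng.normal(6, 1, (50, 5))])
y = np.zeros(len(X))
X_clean, y_clean = filter_training_data(X, y)
print(f"kept {len(X_clean)} of {len(X)} samples")
```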
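One possible sketch of adversarial training for a linear model, assuming scikit-learn: each round perturbs the inputs in the direction that increases the loss (an FGSM-style step) and retrains on the union of clean and perturbed samples with their true labels. The step size `eps` and the number of rounds are illustrative choices, not prescribed values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
eps = 0.1  # perturbation budget (assumed)

clf = LogisticRegression(max_iter=1000).fit(X, y)
for _ in range(3):
    # Gradient of the logistic loss w.r.t. the inputs is (p - y) * w.
    p = clf.predict_proba(X)[:, 1]
    grad = (p - y)[:, None] * clf.coef_[0][None, :]
    X_adv = X + eps * np.sign(grad)   # worst-case perturbation of each input
    X_aug = np.vstack([X, X_adv])     # train on clean + adversarial samples
    y_aug = np.concatenate([y, y])    # true labels are kept for both copies
    clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```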
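A minimal monitoring sketch with two assumed checks: a per-feature z-score test against training statistics for input validation, and a rolling positive-prediction rate compared against a baseline for drift detection. The thresholds and window size are illustrative.

```python
from collections import deque
import numpy as np

class InputMonitor:
    """Flags out-of-range inputs and drift in the model's prediction rate."""

    def __init__(self, X_train, z_threshold=4.0, window=500):
        self.mean = X_train.mean(axis=0)
        self.std = X_train.std(axis=0) + 1e-12   # avoid division by zero
        self.z_threshold = z_threshold
        self.recent = deque(maxlen=window)       # rolling window of predictions

    def validate(self, x):
        """Accept an input only if every feature lies within z_threshold
        standard deviations of its training mean."""
        z = np.abs((x - self.mean) / self.std)
        return bool((z < self.z_threshold).all())

    def drift_alert(self, prediction, baseline_rate, tolerance=0.15):
        """Record a binary prediction and alert if the rolling positive
        rate drifts away from the expected baseline."""
        self.recent.append(prediction)
        return abs(np.mean(self.recent) - baseline_rate) > tolerance

# Usage (hypothetical model and input stream):
#   monitor = InputMonitor(X_train)
#   if monitor.validate(x) and not monitor.drift_alert(pred, baseline_rate=0.5):
#       ...serve the prediction as usual...
```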
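A brief explainability sketch using permutation importance from scikit-learn; comparing these scores across model versions can reveal a feature whose influence changed suspiciously after retraining on new data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# How much does shuffling each feature hurt accuracy? A sudden jump in a
# feature's importance between model versions warrants investigation.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```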