Adversarial attacks definition
Adversarial attacks are malicious techniques designed to trick artificial intelligence (AI) and machine learning (ML) systems into making mistakes. Attackers do this by making subtle, carefully planned changes to the data they feed into the AI model. These small modifications can cause the system to produce completely wrong results.
These attacks are becoming a serious concern in AI security, especially in critical applications like facial recognition, autonomous vehicles, medical diagnosis systems, and cybersecurity tools where accuracy is essential.
See also: adversarial machine learning
Types of adversarial attacks
- Evasion attacks. An attacker introduces malicious input to an already trained model at inference time to cause it to make incorrect predictions. The changes to the input data are often minimal but specifically crafted to mislead the model (a minimal example is sketched after this list).
- Poisoning attacks. In this type of attack, the adversary contaminates the training data used to build the ML model. By injecting malicious data points into the training set, attackers can manipulate the model's learning process, causing it to learn incorrect patterns and make biased or erroneous predictions when deployed (see the label-flipping sketch below).
- Model extraction attacks. These attacks aim to replicate or steal an ML model by querying it repeatedly and observing its outputs. The attacker then uses this information to train a substitute model that mimics the behavior of the original, potentially bypassing its security measures (see the extraction sketch below).
- Byzantine attacks. In distributed or federated learning, malicious participants submit corrupted model updates or gradients in an attempt to poison the shared global model. This can degrade the model's performance or introduce backdoors (see the aggregation sketch below).
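
The classic evasion example is the fast gradient sign method (FGSM), which nudges an input in the direction that most increases the model's loss. Below is a minimal sketch in PyTorch; the toy linear classifier, the random input, and the epsilon value of 0.03 are illustrative stand-ins rather than any particular deployed system.

```python
import torch
import torch.nn as nn

def fgsm_perturb(model, x, label, epsilon=0.03):
    """Return an evasion example: x plus a small step along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), label)
    loss.backward()
    # The perturbation is tiny per feature but chosen to push the prediction toward an error
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Toy usage: an untrained linear classifier on a random 28x28 "image" (hypothetical setup)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
label = torch.tensor([3])
x_adv = fgsm_perturb(model, x, label)
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))
```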
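
A simple way to see the effect of poisoning is label flipping: the attacker corrupts a fraction of the training labels before the model is fit. The sketch below uses scikit-learn with a synthetic dataset; the 30% flip rate and the logistic regression model are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Attacker flips the labels on 30% of the training set before training happens
rng = np.random.default_rng(0)
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.3 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]

clean_acc = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
poisoned_acc = LogisticRegression(max_iter=1000).fit(X_train, poisoned).score(X_test, y_test)
print(f"clean: {clean_acc:.2f}  poisoned: {poisoned_acc:.2f}")
```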
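
Model extraction can be approximated with black-box query access alone: the attacker labels a pile of self-generated inputs using the victim's predictions and trains a substitute on those answers. In the sketch below, the decision-tree "victim", the Gaussian query distribution, and the query counts are hypothetical placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=10, random_state=1)
victim = DecisionTreeClassifier(random_state=1).fit(X, y)  # deployed model; internals assumed hidden

# The attacker only sends queries and records the victim's predicted labels
queries = np.random.default_rng(1).normal(size=(5000, 10))
stolen_labels = victim.predict(queries)

substitute = LogisticRegression(max_iter=1000).fit(queries, stolen_labels)

# Agreement with the victim on fresh queries is a rough measure of how faithful the copy is
fresh = np.random.default_rng(2).normal(size=(1000, 10))
print("agreement:", (substitute.predict(fresh) == victim.predict(fresh)).mean())
```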
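
Byzantine behavior is easiest to see at the aggregation step of federated averaging, where a single corrupted update can dominate a plain mean. The NumPy sketch below uses made-up two-dimensional updates; the coordinate-wise median is shown only as one common robust alternative, not as a complete defense.

```python
import numpy as np

# Three honest client updates and one corrupted (Byzantine) update, values invented for illustration
honest_updates = [np.array([0.9, 1.1]), np.array([1.0, 1.0]), np.array([1.1, 0.9])]
byzantine_update = np.array([100.0, -100.0])

all_updates = honest_updates + [byzantine_update]
mean_agg = np.mean(all_updates, axis=0)      # naive averaging is pulled far off course
median_agg = np.median(all_updates, axis=0)  # coordinate-wise median resists the outlier
print("mean:", mean_agg, "median:", median_agg)
```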