
AI safety

AI safety definition

AI safety is a multidisciplinary field that aims to ensure AI systems operate reliably and beneficially, in alignment with human values and intentions. It is especially important to the development of AI systems in fields like healthcare, finance, and content moderation, where a failure can significantly affect people's lives. AI safety is proactive, surfacing biases and other issues that are easily overlooked during AI development.

The core objectives of AI safety cover:

  • Alignment: Ensuring AI systems pursue objectives aligned with human values and intentions.
  • Robustness: Building systems that perform reliably across different contexts, including unexpected or adversarial situations.
  • Interpretability: Making AI decision-making processes understandable and transparent to humans.
  • Control: Maintaining meaningful human oversight and the ability to intervene in systems or shut them down when necessary (a minimal sketch of this pattern follows the list).
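
The control objective is often implemented as a human-in-the-loop gate: the system acts autonomously only when it is confident, and escalates to a human reviewer otherwise. The Python sketch below is purely illustrative; the model call, the threshold, and names such as predict, ModelDecision, and CONFIDENCE_THRESHOLD are hypothetical stand-ins, not a specific product's API.

    # Illustrative sketch of the "control" objective: a human-in-the-loop
    # gate that lets the model act on its own only above a confidence
    # threshold. All names here are hypothetical stand-ins.
    from dataclasses import dataclass

    CONFIDENCE_THRESHOLD = 0.90  # assumed policy: below this, defer to a human

    @dataclass
    class ModelDecision:
        label: str
        confidence: float

    def predict(text: str) -> ModelDecision:
        # Stand-in for a real model call; returns a canned low-confidence answer.
        return ModelDecision(label="approve", confidence=0.72)

    def decide_with_oversight(text: str) -> str:
        decision = predict(text)
        if decision.confidence >= CONFIDENCE_THRESHOLD:
            return decision.label  # automated path
        # Control in practice: escalate rather than act on an uncertain output.
        return f"escalated for human review (confidence={decision.confidence:.2f})"

    print(decide_with_oversight("loan application #1234"))
    # -> escalated for human review (confidence=0.72)

The design choice to make escalation the default path, rather than an exception, is what preserves meaningful oversight: the system must earn autonomy case by case instead of a human having to catch its mistakes after the fact.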

See also: AI TRiSM, cognitive technology, responsible AI

Key concerns

AI safety practices exist to avoid several concerning scenarios in the development and use of AI:

  • Unintended consequences: AI systems optimized for narrow goals may achieve them in harmful or unexpected ways.
  • Misalignment: Advanced AI might pursue objectives that do not align with human welfare.
  • Accidents: AI systems may cause errors in high-stakes domains like healthcare, transportation, and infrastructure due to biases or erroneous data input.
  • Security vulnerabilities: AI systems may be manipulated or hacked by malicious entities (see the robustness sketch after this list).
  • Existential risks: Highly advanced AI systems could pose long-term risks.
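
One practical safeguard against manipulation and accidents is to test that a model's output stays stable under small, meaning-preserving input changes. The Python sketch below is a hypothetical example, not a standard tool: classify() and perturb() are stand-ins for a real model and a real perturbation strategy.

    # Hypothetical robustness check: a classifier's prediction should not
    # flip under trivial, semantics-preserving input perturbations.
    import random

    def classify(text: str) -> str:
        # Stand-in model: flags text mentioning "attack" as unsafe.
        return "unsafe" if "attack" in text.lower() else "safe"

    def perturb(text: str) -> str:
        # Trivial perturbation: randomly flip character case.
        return "".join(c.upper() if random.random() < 0.5 else c.lower() for c in text)

    def is_robust(text: str, trials: int = 20) -> bool:
        baseline = classify(text)
        # Pass only if every perturbed variant receives the same label.
        return all(classify(perturb(text)) == baseline for _ in range(trials))

    print(is_robust("Plan the attack at dawn"))  # True: case changes don't flip the label

In real deployments the perturbations would be domain-specific (typos, paraphrases, adversarial noise), but the principle is the same: a model whose decisions flip on irrelevant input changes is both an accident risk and an easier target for attackers.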