· A Safety Assurance Framework,

requiring developers to make a high confidence safety case prior to deploying models whose capabilities exceed specified thresholds. Post deployment monitoring should also be a key component of assurance for highly capable AI systems as they become more widely adopted. These safety assurances should be subject to independent audits.
Principle: IDAIS-Venice, Sept 5, 2024

Published by IDAIS (International Dialogues on AI Safety)

Related Principles

3. Security and Safety

AI systems should be safe and sufficiently secure against malicious attacks. Safety refers to ensuring the safety of developers, deployers, and users of AI systems by conducting impact or risk assessments and ensuring that known risks have been identified and mitigated. A risk prevention approach should be adopted, and precautions should be put in place so that humans can intervene to prevent harm, or the system can safely disengage itself in the event an AI system makes unsafe decisions autonomous vehicles that cause injury to pedestrians are an illustration of this. Ensuring that AI systems are safe is essential to fostering public trust in AI. Safety of the public and the users of AI systems should be of utmost priority in the decision making process of AI systems and risks should be assessed and mitigated to the best extent possible. Before deploying AI systems, deployers should conduct risk assessments and relevant testing or certification and implement the appropriate level of human intervention to prevent harm when unsafe decisions take place. The risks, limitations, and safeguards of the use of AI should be made known to the user. For example, in AI enabled autonomous vehicles, developers and deployers should put in place mechanisms for the human driver to easily resume manual driving whenever they wish. Security refers to ensuring the cybersecurity of AI systems, which includes mechanisms against malicious attacks specific to AI such as data poisoning, model inversion, the tampering of datasets, byzantine attacks in federated learning5, as well as other attacks designed to reverse engineer personal data used to train the AI. Deployers of AI systems should work with developers to put in place technical security measures like robust authentication mechanisms and encryption. Just like any other software, deployers should also implement safeguards to protect AI systems against cyberattacks, data security attacks, and other digital security risks. These may include ensuring regular software updates to AI systems and proper access management for critical or sensitive systems. Deployers should also develop incident response plans to safeguard AI systems from the above attacks. It is also important for deployers to make a minimum list of security testing (e.g. vulnerability assessment and penetration testing) and other applicable security testing tools. Some other important considerations also include: a. Business continuity plan b. Disaster recovery plan c. Zero day attacks d. IoT devices

Published by ASEAN in ASEAN Guide on AI Governance and Ethics, 2024

Reliability and safety

Throughout their lifecycle, AI systems should reliably operate in accordance with their intended purpose. This principle aims to ensure that AI systems reliably operate in accordance with their intended purpose throughout their lifecycle. This includes ensuring AI systems are reliable, accurate and reproducible as appropriate. AI systems should not pose unreasonable safety risks, and should adopt safety measures that are proportionate to the magnitude of potential risks. AI systems should be monitored and tested to ensure they continue to meet their intended purpose, and any identified problems should be addressed with ongoing risk management as appropriate. Responsibility should be clearly and appropriately identified, for ensuring that an AI system is robust and safe.

Published by Department of Industry, Innovation and Science, Australian Government in AI Ethics Principles, Nov 7, 2019

· Safety Assurance Framework

Frontier AI developers must demonstrate to domestic authorities that the systems they develop or deploy will not cross red lines such as those defined in the IDAIS Beijing consensus statement. To implement this, we need to build further scientific consensus on risks and red lines. Additionally, we should set early warning thresholds: levels of model capabilities indicating that a model may cross or come close to crossing a red line. This approach builds on and harmonizes the existing patchwork of voluntary commitments such as responsible scaling policies. Models whose capabilities fall below early warning thresholds require only limited testing and evaluation, while more rigorous assurance mechanisms are needed for advanced AI systems exceeding these early warning thresholds. Although testing can alert us to risks, it only gives us a coarse grained understanding of a model. This is insufficient to provide safety guarantees for advanced AI systems. Developers should submit a high confidence safety case, i.e., a quantitative analysis that would convince the scientific community that their system design is safe, as is common practice in other safety critical engineering disciplines. Additionally, safety cases for sufficiently advanced systems should discuss organizational processes, including incentives and accountability structures, to favor safety. Pre deployment testing, evaluation and assurance are not sufficient. Advanced AI systems may increasingly engage in complex multi agent interactions with other AI systems and users. This interaction may lead to emergent risks that are difficult to predict. Post deployment monitoring is a critical part of an overall assurance framework, and could include continuous automated assessment of model behavior, centralized AI incident tracking databases, and reporting of the integration of AI in critical systems. Further assurance should be provided by automated run time checks, such as by verifying that the assumptions of a safety case continue to hold and safely shutting down a model if operated in an out of scope environment. States have a key role to play in ensuring safety assurance happens. States should mandate that developers conduct regular testing for concerning capabilities, with transparency provided through independent pre deployment audits by third parties granted sufficient access to developers’ staff, systems and records necessary to verify the developer’s claims. Additionally, for models exceeding early warning thresholds, states could require that independent experts approve a developer’s safety case prior to further training or deployment. Moreover, states can help institute ethical norms for AI engineering, for example by stipulating that engineers have an individual duty to protect the public interest similar to those held by medical or legal professionals. Finally, states will also need to build governance processes to ensure adequate post deployment monitoring. While there may be variations in Safety Assurance Frameworks required nationally, states should collaborate to achieve mutual recognition and commensurability of frameworks.

Published by IDAIS (International Dialogues on AI Safety) in IDAIS-Venice, Sept 5, 2024

· Build and Validate:

1 To develop a sound and functional AI system that is both reliable and safe, the AI system’s technical construct should be accompanied by a comprehensive methodology to test the quality of the predictive data based systems and models according to standard policies and protocols. 2 To ensure the technical robustness of an AI system rigorous testing, validation, and re assessment as well as the integration of adequate mechanisms of oversight and controls into its development is required. System integration test sign off should be done with relevant stakeholders to minimize risks and liability. 3 Automated AI systems involving scenarios where decisions are understood to have an impact that is irreversible or difficult to reverse or may involve life and death decisions should trigger human oversight and final determination. Furthermore, AI systems should not be used for social scoring or mass surveillance purposes.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

Fifth principle: Reliability

AI enabled systems must be demonstrably reliable, robust and secure. The MOD’s AI enabled systems must be suitably reliable; they must fulfil their intended design and deployment criteria and perform as expected, within acceptable performance parameters. Those parameters must be regularly reviewed and tested for reliability to be assured on an ongoing basis, particularly as AI enabled systems learn and evolve over time, or are deployed in new contexts. Given Defence’s unique operational context and the challenges of the information environment, this principle also requires AI enabled systems to be secure, and a robust approach to cybersecurity, data protection and privacy. MOD personnel working with or alongside AI enabled systems can build trust in those systems by ensuring that they have a suitable level of understanding of the performance and parameters of those systems, as articulated in the principle of understanding.

Published by The Ministry of Defence (MOD), United Kingdom in Ethical Principles for AI in Defence, Jun 15, 2022