Safe and Ethical AI (SEA) Platform Network · Linking Artificial Intelligence Principles (LAIP)

· Safety Assurance Framework

Frontier AI developers must demonstrate to domestic authorities that the systems they develop or deploy will not cross red lines such as those defined in the IDAIS Beijing consensus statement. To implement this, we need to build further scientific consensus on risks and red lines. Additionally, we should set early warning thresholds: levels of model capabilities indicating that a model may cross or come close to crossing a red line. This approach builds on and harmonizes the existing patchwork of voluntary commitments such as responsible scaling policies. Models whose capabilities fall below early warning thresholds require only limited testing and evaluation, while more rigorous assurance mechanisms are needed for advanced AI systems exceeding these early warning thresholds. Although testing can alert us to risks, it only gives us a coarse grained understanding of a model. This is insufficient to provide safety guarantees for advanced AI systems. Developers should submit a high confidence safety case, i.e., a quantitative analysis that would convince the scientific community that their system design is safe, as is common practice in other safety critical engineering disciplines. Additionally, safety cases for sufficiently advanced systems should discuss organizational processes, including incentives and accountability structures, to favor safety. Pre deployment testing, evaluation and assurance are not sufficient. Advanced AI systems may increasingly engage in complex multi agent interactions with other AI systems and users. This interaction may lead to emergent risks that are difficult to predict. Post deployment monitoring is a critical part of an overall assurance framework, and could include continuous automated assessment of model behavior, centralized AI incident tracking databases, and reporting of the integration of AI in critical systems. Further assurance should be provided by automated run time checks, such as by verifying that the assumptions of a safety case continue to hold and safely shutting down a model if operated in an out of scope environment. States have a key role to play in ensuring safety assurance happens. States should mandate that developers conduct regular testing for concerning capabilities, with transparency provided through independent pre deployment audits by third parties granted sufficient access to developers’ staff, systems and records necessary to verify the developer’s claims. Additionally, for models exceeding early warning thresholds, states could require that independent experts approve a developer’s safety case prior to further training or deployment. Moreover, states can help institute ethical norms for AI engineering, for example by stipulating that engineers have an individual duty to protect the public interest similar to those held by medical or legal professionals. Finally, states will also need to build governance processes to ensure adequate post deployment monitoring. While there may be variations in Safety Assurance Frameworks required nationally, states should collaborate to achieve mutual recognition and commensurability of frameworks.

Principle: IDAIS-Venice, Sept 5, 2024

Published by IDAIS (International Dialogues on AI Safety)

Related Principles

3. Security and Safety

AI systems should be safe and sufficiently secure against malicious attacks. Safety refers to ensuring the safety of developers, deployers, and users of AI systems by conducting impact or risk assessments and ensuring that known risks have been identified and mitigated. A risk prevention approach should be adopted, and precautions should be put in place so that humans can intervene to prevent harm, or the system can safely disengage itself in the event an AI system makes unsafe decisions autonomous vehicles that cause injury to pedestrians are an illustration of this. Ensuring that AI systems are safe is essential to fostering public trust in AI. Safety of the public and the users of AI systems should be of utmost priority in the decision making process of AI systems and risks should be assessed and mitigated to the best extent possible. Before deploying AI systems, deployers should conduct risk assessments and relevant testing or certification and implement the appropriate level of human intervention to prevent harm when unsafe decisions take place. The risks, limitations, and safeguards of the use of AI should be made known to the user. For example, in AI enabled autonomous vehicles, developers and deployers should put in place mechanisms for the human driver to easily resume manual driving whenever they wish. Security refers to ensuring the cybersecurity of AI systems, which includes mechanisms against malicious attacks specific to AI such as data poisoning, model inversion, the tampering of datasets, byzantine attacks in federated learning5, as well as other attacks designed to reverse engineer personal data used to train the AI. Deployers of AI systems should work with developers to put in place technical security measures like robust authentication mechanisms and encryption. Just like any other software, deployers should also implement safeguards to protect AI systems against cyberattacks, data security attacks, and other digital security risks. These may include ensuring regular software updates to AI systems and proper access management for critical or sensitive systems. Deployers should also develop incident response plans to safeguard AI systems from the above attacks. It is also important for deployers to make a minimum list of security testing (e.g. vulnerability assessment and penetration testing) and other applicable security testing tools. Some other important considerations also include: a. Business continuity plan b. Disaster recovery plan c. Zero day attacks d. IoT devices

Published by ASEAN in ASEAN Guide on AI Governance and Ethics, 2024

· Emergency Preparedness Agreements and Institutions

States should agree on technical and institutional measures required to prepare for advanced AI systems, regardless of their development timescale. To facilitate these agreements, we need an international body to bring together AI safety authorities, fostering dialogue and collaboration in the development and auditing of AI safety regulations across different jurisdictions. This body would ensure states adopt and implement a minimal set of effective safety preparedness measures, including model registration, disclosure, and tripwires. Over time, this body could also set standards for and commit to using verification methods to enforce domestic implementations of the Safety Assurance Framework. These methods can be mutually enforced through incentives and penalty mechanisms, such as conditioning access to markets on compliance with global standards. Experts and safety authorities should establish incident reporting and contingency plans, and regularly update the list of verified practices to reflect current scientific understanding. This body will be a critical initial coordination mechanism. In the long run, however, states will need to go further to ensure truly global governance of risks from advanced AI.

Published by IDAIS (International Dialogues on AI Safety) in IDAIS-Venice, Sept 5, 2024

· Build and Validate:

1 To develop a sound and functional AI system that is both reliable and safe, the AI system’s technical construct should be accompanied by a comprehensive methodology to test the quality of the predictive data based systems and models according to standard policies and protocols. 2 To ensure the technical robustness of an AI system rigorous testing, validation, and re assessment as well as the integration of adequate mechanisms of oversight and controls into its development is required. System integration test sign off should be done with relevant stakeholders to minimize risks and liability. 3 Automated AI systems involving scenarios where decisions are understood to have an impact that is irreversible or difficult to reverse or may involve life and death decisions should trigger human oversight and final determination. Furthermore, AI systems should not be used for social scoring or mass surveillance purposes.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

3 Ensure transparency, explainability and intelligibility

AI should be intelligible or understandable to developers, users and regulators. Two broad approaches to ensuring intelligibility are improving the transparency and explainability of AI technology. Transparency requires that sufficient information (described below) be published or documented before the design and deployment of an AI technology. Such information should facilitate meaningful public consultation and debate on how the AI technology is designed and how it should be used. Such information should continue to be published and documented regularly and in a timely manner after an AI technology is approved for use. Transparency will improve system quality and protect patient and public health safety. For instance, system evaluators require transparency in order to identify errors, and government regulators rely on transparency to conduct proper, effective oversight. It must be possible to audit an AI technology, including if something goes wrong. Transparency should include accurate information about the assumptions and limitations of the technology, operating protocols, the properties of the data (including methods of data collection, processing and labelling) and development of the algorithmic model. AI technologies should be explainable to the extent possible and according to the capacity of those to whom the explanation is directed. Data protection laws already create specific obligations of explainability for automated decision making. Those who might request or require an explanation should be well informed, and the educational information must be tailored to each population, including, for example, marginalized populations. Many AI technologies are complex, and the complexity might frustrate both the explainer and the person receiving the explanation. There is a possible trade off between full explainability of an algorithm (at the cost of accuracy) and improved accuracy (at the cost of explainability). All algorithms should be tested rigorously in the settings in which the technology will be used in order to ensure that it meets standards of safety and efficacy. The examination and validation should include the assumptions, operational protocols, data properties and output decisions of the AI technology. Tests and evaluations should be regular, transparent and of sufficient breadth to cover differences in the performance of the algorithm according to race, ethnicity, gender, age and other relevant human characteristics. There should be robust, independent oversight of such tests and evaluation to ensure that they are conducted safely and effectively. Health care institutions, health systems and public health agencies should regularly publish information about how decisions have been made for adoption of an AI technology and how the technology will be evaluated periodically, its uses, its known limitations and the role of decision making, which can facilitate external auditing and oversight.

Published by World Health Organization (WHO) in Key ethical principles for use of artificial intelligence for health, Jun 28, 2021

4 Foster responsibility and accountability

Humans require clear, transparent specification of the tasks that systems can perform and the conditions under which they can achieve the desired level of performance; this helps to ensure that health care providers can use an AI technology responsibly. Although AI technologies perform specific tasks, it is the responsibility of human stakeholders to ensure that they can perform those tasks and that they are used under appropriate conditions. Responsibility can be assured by application of “human warranty”, which implies evaluation by patients and clinicians in the development and deployment of AI technologies. In human warranty, regulatory principles are applied upstream and downstream of the algorithm by establishing points of human supervision. The critical points of supervision are identified by discussions among professionals, patients and designers. The goal is to ensure that the algorithm remains on a machine learning development path that is medically effective, can be interrogated and is ethically responsible; it involves active partnership with patients and the public, such as meaningful public consultation and debate (101). Ultimately, such work should be validated by regulatory agencies or other supervisory authorities. When something does go wrong in application of an AI technology, there should be accountability. Appropriate mechanisms should be adopted to ensure questioning by and redress for individuals and groups adversely affected by algorithmically informed decisions. This should include access to prompt, effective remedies and redress from governments and companies that deploy AI technologies for health care. Redress should include compensation, rehabilitation, restitution, sanctions where necessary and a guarantee of non repetition. The use of AI technologies in medicine requires attribution of responsibility within complex systems in which responsibility is distributed among numerous agents. When medical decisions by AI technologies harm individuals, responsibility and accountability processes should clearly identify the relative roles of manufacturers and clinical users in the harm. This is an evolving challenge and remains unsettled in the laws of most countries. Institutions have not only legal liability but also a duty to assume responsibility for decisions made by the algorithms they use, even if it is not feasible to explain in detail how the algorithms produce their results. To avoid diffusion of responsibility, in which “everybody’s problem becomes nobody’s responsibility”, a faultless responsibility model (“collective responsibility”), in which all the agents involved in the development and deployment of an AI technology are held responsible, can encourage all actors to act with integrity and minimize harm. In such a model, the actual intentions of each agent (or actor) or their ability to control an outcome are not considered.