3. We also recommend defining clear red lines that, if crossed, mandate immediate termination of an AI system — including all copies — through rapid and safe shutdown procedures. Governments should cooperate to instantiate and preserve this capacity. Moreover, prior to deployment as well as during training for the most advanced models, developers should demonstrate to regulators' satisfaction that their system(s) will not cross these red lines.
Principle: IDAIS-Oxford, Oct 31, 2023

Published by IDAIS (International Dialogues on AI Safety)

Related Principles

4. Human centricity

AI systems should respect human-centred values and pursue benefits for human society, including human beings' well-being, nutrition and happiness. It is key to ensure that people benefit from AI design, development and deployment while being protected from potential harms. AI systems should be used to promote human well-being and ensure benefit for all. Especially where AI systems are used to make decisions about humans or to aid them, it is imperative that these systems are designed with human benefit in mind and do not take advantage of vulnerable individuals.

Human centricity should be incorporated throughout the AI system lifecycle, from design through development and deployment. Actions must be taken to understand how users interact with the AI system, how it is perceived, and whether any negative outcomes arise from its outputs. One way deployers can do this is to test the AI system with a small group of internal users from varied backgrounds and demographics and incorporate their feedback into the AI system.

AI systems should not be used for malicious purposes or to sway or deceive users into making decisions that are not beneficial to them or to society. In this regard, developers and deployers (if developing or designing in-house) should also ensure that dark patterns are avoided. Dark patterns refer to design techniques that manipulate users and trick them into making decisions they would otherwise not have made. An example is the use of default options that do not consider the end user's interests, such as defaults for data sharing and for tracking the user's other online activities (a brief sketch of privacy-respecting defaults follows this entry).

As an extension of human centricity, it is also important to ensure that the adoption of AI systems and their deployment at scale do not unduly disrupt labour and job prospects without proper assessment. Deployers are encouraged to carry out impact assessments to ensure a systematic, stakeholder-based review and to consider how jobs can be redesigned to incorporate the use of AI. The Personal Data Protection Commission of Singapore's (PDPC) Guide on Job Redesign in the Age of AI provides useful guidance to assist organisations in considering the impact of AI on their employees and how work tasks can be redesigned to help employees embrace AI and move towards higher-value tasks.

Published by ASEAN in ASEAN Guide on AI Governance and Ethics, 2024
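
To make the dark-pattern point above concrete, here is a minimal Python sketch contrasting human-centric, opt-in defaults with the opt-out pattern the guide warns against. The ConsentSettings class, its field names and apply_user_choice are hypothetical illustrations, not part of the ASEAN guide.

```python
# Hypothetical sketch: privacy-respecting defaults for a user-facing AI feature.
# Class and field names are illustrative, not taken from any specific product.
from dataclasses import dataclass

@dataclass
class ConsentSettings:
    """User-facing data-sharing options for an AI-enabled feature.

    Dark-pattern variant (to avoid): defaulting these to True so that users
    share data unless they notice and opt out. Human-centric variant (shown):
    everything defaults to off and must be switched on by an explicit choice.
    """
    share_usage_data: bool = False        # opt-in, not opt-out
    track_cross_site_activity: bool = False
    personalise_with_history: bool = False

def apply_user_choice(settings: ConsentSettings, option: str, value: bool) -> ConsentSettings:
    """Record an explicit user decision; silence is never treated as consent."""
    if not hasattr(settings, option):
        raise ValueError(f"Unknown consent option: {option}")
    setattr(settings, option, value)
    return settings

if __name__ == "__main__":
    settings = ConsentSettings()                              # all sharing off by default
    apply_user_choice(settings, "share_usage_data", True)     # explicit opt-in only
    print(settings)
```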

· 8. Robustness

Trustworthy AI requires that algorithms are secure, reliable and robust enough to deal with errors or inconsistencies during the design, development, execution, deployment and use phases of the AI system, and to cope adequately with erroneous outcomes.

Reliability & Reproducibility. Trustworthiness requires that the accuracy of results can be confirmed and reproduced by independent evaluation. However, the complexity, non-determinism and opacity of many AI systems, together with sensitivity to training and model-building conditions, can make it difficult to reproduce results. There is now increased awareness within the AI research community that reproducibility is a critical requirement in the field. Reproducibility is essential to guarantee that results are consistent across different situations, computational frameworks and input data. A lack of reproducibility can lead to unintended discrimination in AI decisions.

Accuracy. Accuracy pertains to an AI's confidence and ability to classify information into the correct categories, or its ability to make correct predictions, recommendations or decisions based on data or models. An explicit and well-formed development and evaluation process can support, mitigate and correct unintended risks.

Resilience to Attack. AI systems, like all software systems, can include vulnerabilities that allow them to be exploited by adversaries. Hacking is an important case of intentional harm, by which the system will purposefully follow a different course of action than its original purpose. If an AI system is attacked, the data as well as the system's behaviour can be changed, leading the system to make different decisions, or causing it to shut down altogether. Systems and/or data can also become corrupted, by malicious intention or by exposure to unexpected situations. Poor governance, by which it becomes possible to intentionally or unintentionally tamper with the data, or to grant unauthorised entities access to the algorithms, can also result in discrimination, erroneous decisions, or even physical harm.

Fall-back plan. A secure AI has safeguards that enable a fall-back plan in case of problems with the AI system. In some cases this can mean that the AI system switches from a statistical to a rule-based procedure; in other cases it means that the system asks for a human operator before continuing the action (a brief sketch follows this entry).

Published by The European Commission’s High-Level Expert Group on Artificial Intelligence in Draft Ethics Guidelines for Trustworthy AI, Dec 18, 2018
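
The fall-back plan described in the Robustness entry above can be illustrated with a short sketch: if the statistical model errors out or is insufficiently confident, the system falls back to a rule-based procedure and, where needed, to a human operator. All names and thresholds here (decide, rule_based_decision, CONFIDENCE_THRESHOLD) are illustrative assumptions, not part of the HLEG guidelines.

```python
# Minimal sketch of a fall-back plan: use the statistical model only when it is
# available and confident; otherwise switch to a rule-based procedure or defer
# to a human operator.
from typing import Callable, Optional

CONFIDENCE_THRESHOLD = 0.90  # assumed minimum confidence to trust the model

def rule_based_decision(amount: float) -> str:
    # Conservative hand-written rule used when the model cannot be trusted.
    return "review" if amount > 1_000 else "approve"

def decide(amount: float,
           model: Optional[Callable[[float], tuple]] = None,
           human_review: Callable[[float], str] = lambda a: "escalated") -> str:
    """Return a decision, falling back when the statistical path fails."""
    if model is not None:
        try:
            label, confidence = model(amount)
            if confidence >= CONFIDENCE_THRESHOLD:
                return label
        except Exception:
            pass  # treat model errors like low confidence: fall back
    # Fall back: rule-based procedure first, human operator for edge cases.
    decision = rule_based_decision(amount)
    return decision if decision != "review" else human_review(amount)

if __name__ == "__main__":
    flaky_model = lambda a: ("approve", 0.55)    # low-confidence prediction
    print(decide(250.0, model=flaky_model))      # -> "approve" via the rule
    print(decide(5_000.0, model=flaky_model))    # -> "escalated" to a human
```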

· Safety Assurance Framework

Frontier AI developers must demonstrate to domestic authorities that the systems they develop or deploy will not cross red lines such as those defined in the IDAIS-Beijing consensus statement. To implement this, we need to build further scientific consensus on risks and red lines. Additionally, we should set early-warning thresholds: levels of model capability indicating that a model may cross or come close to crossing a red line. This approach builds on and harmonizes the existing patchwork of voluntary commitments such as responsible scaling policies. Models whose capabilities fall below early-warning thresholds require only limited testing and evaluation, while more rigorous assurance mechanisms are needed for advanced AI systems exceeding these thresholds.

Although testing can alert us to risks, it gives us only a coarse-grained understanding of a model. This is insufficient to provide safety guarantees for advanced AI systems. Developers should submit a high-confidence safety case, i.e., a quantitative analysis that would convince the scientific community that their system design is safe, as is common practice in other safety-critical engineering disciplines. Additionally, safety cases for sufficiently advanced systems should discuss organizational processes, including incentives and accountability structures, that favor safety.

Pre-deployment testing, evaluation and assurance are not sufficient. Advanced AI systems may increasingly engage in complex multi-agent interactions with other AI systems and users, and these interactions may lead to emergent risks that are difficult to predict. Post-deployment monitoring is a critical part of an overall assurance framework, and could include continuous automated assessment of model behavior, centralized AI incident tracking databases, and reporting of the integration of AI in critical systems. Further assurance should be provided by automated run-time checks, such as verifying that the assumptions of a safety case continue to hold and safely shutting down a model if it is operated in an out-of-scope environment (a brief sketch of such a check follows this entry).

States have a key role to play in ensuring that safety assurance happens. States should mandate that developers conduct regular testing for concerning capabilities, with transparency provided through independent pre-deployment audits by third parties granted sufficient access to the developers' staff, systems and records necessary to verify the developer's claims. Additionally, for models exceeding early-warning thresholds, states could require that independent experts approve a developer's safety case prior to further training or deployment. Moreover, states can help institute ethical norms for AI engineering, for example by stipulating that engineers have an individual duty to protect the public interest, similar to those held by medical or legal professionals. Finally, states will also need to build governance processes to ensure adequate post-deployment monitoring. While there may be variations in the safety assurance frameworks required nationally, states should collaborate to achieve mutual recognition and commensurability of frameworks.

Published by IDAIS (International Dialogues on AI Safety) in IDAIS-Venice, Sept 5, 2024
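
As one illustration of the automated run-time checks mentioned above, the sketch below re-checks a recorded safety case's assumptions on each request and halts the deployment when the system is operated out of scope. The SafetyCase fields, the RuntimeGuard class and the shutdown hook are hypothetical and are not drawn from the IDAIS-Venice statement.

```python
# Illustrative run-time guard: verify that the operating assumptions recorded
# in a safety case still hold before serving a request, and stop serving the
# model when they do not.
from dataclasses import dataclass, field

@dataclass
class SafetyCase:
    """Assumed operating envelope under which the system was assured safe."""
    allowed_domains: set = field(default_factory=lambda: {"customer_support"})
    max_tokens_per_request: int = 4_096
    tools_enabled: bool = False   # assured without autonomous tool use

class RuntimeGuard:
    def __init__(self, case: SafetyCase, shutdown) -> None:
        self.case = case
        self.shutdown = shutdown  # callable that safely halts the deployment

    def check(self, domain: str, requested_tokens: int, wants_tools: bool) -> bool:
        """Return True if the request stays inside the safety case."""
        in_scope = (
            domain in self.case.allowed_domains
            and requested_tokens <= self.case.max_tokens_per_request
            and (self.case.tools_enabled or not wants_tools)
        )
        if not in_scope:
            self.shutdown(reason=f"out-of-scope use detected in domain '{domain}'")
        return in_scope

if __name__ == "__main__":
    guard = RuntimeGuard(SafetyCase(), shutdown=lambda reason: print("HALT:", reason))
    guard.check("customer_support", 512, wants_tools=False)    # within scope
    guard.check("autonomous_trading", 512, wants_tools=True)   # triggers shutdown
```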

Responsible Deployment

Principle: The capacity of an AI agent to act autonomously, and to adapt its behavior over time without human direction, calls for significant safety checks before deployment, and ongoing monitoring.

Recommendations:

Humans must be in control: Any autonomous system must allow for a human to interrupt an activity or shut down the system (an "off switch"; a brief sketch follows this entry). There may also be a need to incorporate human checks on new decision-making strategies in AI system design, especially where the risk to human life and safety is great.

Make safety a priority: Any deployment of an autonomous system should be extensively tested beforehand to ensure the AI agent's safe interaction with its environment (digital or physical) and that it functions as intended. Autonomous systems should be monitored while in operation, and updated or corrected as needed.

Privacy is key: AI systems must be data responsible. They should use only what they need and delete it when it is no longer needed ("data minimization"). They should encrypt data in transit and at rest, and restrict access to authorized persons ("access control"). AI systems should only collect, use, share and store data in accordance with privacy and personal data laws and best practices.

Think before you act: Careful thought should be given to the instructions and data provided to AI systems. AI systems should not be trained with data that is biased, inaccurate, incomplete or misleading.

If they are connected, they must be secured: AI systems that are connected to the Internet should be secured not only for their own protection, but also to protect the Internet from malfunctioning or malware-infected AI systems that could become the next generation of botnets. High standards of device, system and network security should be applied.

Responsible disclosure: Security researchers acting in good faith should be able to responsibly test the security of AI systems without fear of prosecution or other legal action. At the same time, researchers and others who discover security vulnerabilities or other design flaws should responsibly disclose their findings to those who are in the best position to fix the problem.

Published by Internet Society, "Artificial Intelligence and Machine Learning: Policy Paper" in Guiding Principles and Recommendations, Apr 18, 2017
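
Below is a minimal sketch of the "off switch" recommendation above, assuming a simple agent loop: a human-settable event is checked before every action, so an operator can interrupt the system at any time. The InterruptibleAgent class and its step() placeholder are illustrative, not part of the Internet Society paper.

```python
# Minimal interruptible agent loop: a human-controlled off switch is checked
# before every action, after which the agent stops acting.
import threading
import time

class InterruptibleAgent:
    def __init__(self) -> None:
        self._stop = threading.Event()   # the human-controlled off switch

    def off_switch(self) -> None:
        """Called by a human operator (or a monitoring process) to halt the agent."""
        self._stop.set()

    def step(self, i: int) -> None:
        print(f"performing step {i}")    # placeholder for a real action

    def run(self, max_steps: int = 10) -> None:
        for i in range(max_steps):
            if self._stop.is_set():      # check the off switch before every action
                print("stopped by human operator")
                return
            self.step(i)
            time.sleep(0.1)

if __name__ == "__main__":
    agent = InterruptibleAgent()
    # Simulate a human operator flipping the off switch after half a second.
    threading.Timer(0.5, agent.off_switch).start()
    agent.run()
```

Checking the switch before every action, rather than only at start-up, is what makes the interruption reliable while the agent is running.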

Third principle: Understanding

AI-enabled systems, and their outputs, must be appropriately understood by relevant individuals, with mechanisms to enable this understanding made an explicit part of system design.

Effective and ethical decision-making in Defence, from the frontline of combat to back-office operations, is always underpinned by an appropriate understanding of context by those making decisions. Defence personnel must have an appropriate, context-specific understanding of the AI-enabled systems they operate and work alongside. This level of understanding will naturally differ depending on the knowledge required to act ethically in a given role and with a given system. It may include an understanding of the general characteristics, benefits and limitations of AI systems. It may require knowledge of a system's purposes and correct environment for use, including scenarios where a system should not be deployed or used. It may also demand an understanding of system performance and potential fail states. Our people must be suitably trained and competent to operate or understand these tools.

To enable this understanding, we must be able to verify that our AI-enabled systems work as intended. While the 'black box' nature of some machine learning systems means that they are difficult to fully explain, we must be able to audit either the systems or their outputs to a level that satisfies those who are duly and formally responsible and accountable (a brief sketch of output logging follows this entry). Mechanisms to interpret and understand our systems must be a crucial and explicit part of system design across the entire lifecycle.

This requirement for context-specific understanding based on technically understandable systems must also reach beyond the MOD, to commercial suppliers, allied forces and civilians. Whilst absolute transparency as to the workings of each AI-enabled system is neither desirable nor practicable, public consent and collaboration depend on context-specific shared understanding. What our systems do, how we intend to use them, and our processes for ensuring beneficial outcomes result from their use should be as transparent as possible, within the necessary constraints of the national security context.

Published by The Ministry of Defence (MOD), United Kingdom in Ethical Principles for AI in Defence, Jun 15, 2022
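
As one way to support the auditability of outputs discussed above, the sketch below logs each system output together with its context to an append-only record for later review. The record fields, log_decision function and audit_log path are assumptions for illustration and are not prescribed by the MOD principles.

```python
# Illustrative output audit trail: even when a model's internals cannot be
# fully explained, each decision can be logged with enough context for a
# later audit of the system's outputs.
import json
import time
from pathlib import Path

AUDIT_LOG = Path("audit_log.jsonl")   # hypothetical append-only audit trail

def log_decision(system_id: str, inputs: dict, output: str,
                 operator: str, model_version: str) -> None:
    """Append one auditable record per system output."""
    record = {
        "timestamp": time.time(),
        "system_id": system_id,
        "model_version": model_version,
        "operator": operator,          # who was responsible at the time of use
        "inputs": inputs,              # context the system acted on
        "output": output,              # what the system produced
    }
    with AUDIT_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    log_decision(
        system_id="logistics-planner",
        inputs={"request": "resupply route options"},
        output="route B recommended",
        operator="analyst-042",
        model_version="1.3.0",
    )
```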