· Measurement and Evaluation

We should develop comprehensive methods and techniques to operationalize these red lines before there is a meaningful risk of their being crossed. To ensure red-line testing regimes keep pace with rapid AI development, we should invest in red teaming and in automating model evaluation with appropriate human oversight. The onus should be on developers to convincingly demonstrate that red lines will not be crossed, such as through rigorous empirical evaluations, quantitative guarantees, or mathematical proofs.
Principle: IDAIS-Beijing, May 10, 2024

Published by IDAIS (International Dialogues on AI Safety)

Related Principles

· Governance

Comprehensive governance regimes are needed to ensure red lines are not breached by systems in development or deployment. We should immediately implement domestic registration for AI models and training runs above certain compute or capability thresholds. Registration should give governments visibility into the most advanced AI within their borders, along with levers to stem the distribution and operation of dangerous models. Domestic regulators ought to adopt globally aligned requirements to prevent crossing these red lines. Access to global markets should be conditioned on domestic regulations meeting these global standards, as determined by an international audit, effectively preventing the development and deployment of systems that breach red lines. We should take measures to prevent the proliferation of the most dangerous technologies while ensuring broad access to the benefits of AI. To achieve this, we should establish multilateral institutions and agreements to govern AGI development safely and inclusively, with enforcement mechanisms to ensure red lines are not crossed and benefits are shared broadly.

Published by IDAIS (International Dialogues on AI Safety) in IDAIS-Beijing, May 10, 2024

· Safety Assurance Framework

Frontier AI developers must demonstrate to domestic authorities that the systems they develop or deploy will not cross red lines such as those defined in the IDAIS-Beijing consensus statement. To implement this, we need to build further scientific consensus on risks and red lines. Additionally, we should set early warning thresholds: levels of model capabilities indicating that a model may cross or come close to crossing a red line. This approach builds on and harmonizes the existing patchwork of voluntary commitments, such as responsible scaling policies. Models whose capabilities fall below early warning thresholds require only limited testing and evaluation, while more rigorous assurance mechanisms are needed for advanced AI systems exceeding these early warning thresholds. Although testing can alert us to risks, it only gives us a coarse-grained understanding of a model. This is insufficient to provide safety guarantees for advanced AI systems. Developers should submit a high-confidence safety case, i.e., a quantitative analysis that would convince the scientific community that their system design is safe, as is common practice in other safety-critical engineering disciplines. Additionally, safety cases for sufficiently advanced systems should discuss organizational processes, including incentives and accountability structures, that favor safety. Pre-deployment testing, evaluation, and assurance are not sufficient. Advanced AI systems may increasingly engage in complex multi-agent interactions with other AI systems and users. This interaction may lead to emergent risks that are difficult to predict. Post-deployment monitoring is a critical part of an overall assurance framework, and could include continuous automated assessment of model behavior, centralized AI incident tracking databases, and reporting of the integration of AI in critical systems.
Further assurance should be provided by automated run-time checks, such as by verifying that the assumptions of a safety case continue to hold and safely shutting down a model if operated in an out-of-scope environment. States have a key role to play in ensuring safety assurance happens. States should mandate that developers conduct regular testing for concerning capabilities, with transparency provided through independent pre-deployment audits by third parties granted sufficient access to developers’ staff, systems, and records necessary to verify the developer’s claims. Additionally, for models exceeding early warning thresholds, states could require that independent experts approve a developer’s safety case prior to further training or deployment. Moreover, states can help institute ethical norms for AI engineering, for example by stipulating that engineers have an individual duty to protect the public interest, similar to those held by medical or legal professionals. Finally, states will also need to build governance processes to ensure adequate post-deployment monitoring. While there may be variations in the Safety Assurance Frameworks required nationally, states should collaborate to achieve mutual recognition and commensurability of frameworks.

Published by IDAIS (International Dialogues on AI Safety) in IDAIS-Venice, Sept 5, 2024
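The automated run-time checks described above can be illustrated with a minimal sketch. Everything here is hypothetical: the `SafetyCase` and `GuardedModel` names, the example assumptions (approved domain, request rate), and the thresholds are all invented for illustration, not drawn from any IDAIS specification.

```python
# Hypothetical sketch of an automated run-time check: before serving each
# request, verify that the assumptions recorded in a safety case still hold,
# and shut the model down if it is operating in an out-of-scope environment.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SafetyCase:
    # Each assumption maps a name to a predicate over the runtime environment.
    assumptions: dict = field(default_factory=dict)

    def violated(self, env: dict) -> list:
        """Return the names of all assumptions that fail in this environment."""
        return [name for name, holds in self.assumptions.items() if not holds(env)]

class GuardedModel:
    def __init__(self, model: Callable[[str], str], case: SafetyCase):
        self.model, self.case, self.running = model, case, True

    def __call__(self, prompt: str, env: dict) -> str:
        broken = self.case.violated(env)
        if broken:
            self.running = False  # safe shutdown on out-of-scope operation
            raise RuntimeError(f"safety-case assumptions violated: {broken}")
        return self.model(prompt)

# Illustrative safety case: deployment stays in an approved domain and below
# an agreed request rate (both thresholds are made up for this example).
case = SafetyCase(assumptions={
    "approved_domain": lambda env: env.get("domain") in {"customer_support"},
    "rate_limit": lambda env: env.get("requests_per_min", 0) <= 100,
})
guarded = GuardedModel(lambda p: p.upper(), case)
print(guarded("hello", {"domain": "customer_support", "requests_per_min": 10}))  # HELLO
```

In a real deployment the predicates would be checked continuously against telemetry rather than per request, but the structure is the same: the safety case's assumptions become executable guards.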

· Uphold high standards of scientific and technological excellence

Rebellion is committed to scientific excellence as we advance the development, testing, and deployment of artificial intelligence and other technologies. We believe AI should be interpretable and explainable to its human users. Because AI models evolve so quickly, we will continually seek out best practices as they develop in artificial intelligence and throughout the software and defense industries. We strive for scientific rigor such that all scientific investigations, research, and practices are conducted with the highest level of precision and accuracy. This includes strict adherence to protocols, accurate data collection and analysis, and careful interpretation of results. Our team communicates research findings and methodologies clearly and openly, in a manner that allows for the replication of results by independent researchers. Black-box systems are antithetical to these standards. We build our technology to be intuitive and explainable in simple terms. In addition, we ensure the safety and security of the research, development, and production environments.

Published by Rebelliondefense in AI Ethical Principles, January 2023

· Ensure fairness

We are fully determined to combat all types of reducible bias in data collection, derivation, and analysis. Our teams are trained to identify and challenge biases in our own decision-making and in the data we use to train and test our models. All data sets are evaluated for fairness, possible inclusion of sensitive data, and implicitly discriminatory collection models. We execute statistical tests to look for imbalanced and skewed datasets, and include methods to augment datasets to combat these statistical biases. We pressure-test our decisions by performing peer review of model design, execution, and outcomes; this includes peer review of model training and performance metrics. Before a model is graduated from one development stage to the next, a review is conducted with required acceptance criteria. This review includes in-sample and out-of-sample testing to mitigate the risk of the model overfitting to the training data and producing biased outcomes in production. We subscribe to the principles laid out in the Department of Defense’s AI ethical principles: that AI technologies should be responsible, equitable, traceable, reliable, and governable.

Published by Rebelliondefense in AI Ethical Principles, January 2023
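The kind of statistical test and dataset augmentation mentioned above can be sketched in a few lines. This is an illustrative example, not Rebellion's actual pipeline: the imbalance-ratio metric and naive random oversampling shown here are one common, simple approach among many.

```python
# Illustrative sketch: a simple check for class imbalance in a labelled
# dataset, plus naive random oversampling to augment minority classes.
import random
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def oversample(examples, labels, seed=0):
    """Duplicate minority-class examples at random until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {c: [x for x, y in zip(examples, labels) if y == c] for c in counts}
    out_x, out_y = [], []
    for c, xs in by_class.items():
        out_x += xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_y += [c] * target
    return out_x, out_y

labels = ["a"] * 90 + ["b"] * 10
print(imbalance_ratio(labels))  # 9.0
_, balanced = oversample(list(range(100)), labels)
print(imbalance_ratio(balanced))  # 1.0
```

In practice a formal test (e.g. a chi-square goodness-of-fit test against a uniform or expected class distribution) and more sophisticated augmentation would replace these minimal helpers, but the workflow is the same: measure the skew, then correct it before training.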

· Build and Validate:

1. To develop a sound and functional AI system that is both reliable and safe, the AI system’s technical construct should be accompanied by a comprehensive methodology to test the quality of predictive, data-based systems and models according to standard policies and protocols.
2. To ensure the technical robustness of an AI system, rigorous testing, validation, and re-assessment, as well as the integration of adequate oversight and control mechanisms into its development, are required. System integration test sign-off should be done with relevant stakeholders to minimize risks and liability.
3. Automated AI systems involving scenarios where decisions are understood to have an impact that is irreversible or difficult to reverse, or that may involve life-and-death decisions, should trigger human oversight and final determination. Furthermore, AI systems should not be used for social scoring or mass surveillance purposes.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
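The oversight trigger in point 3 of the Build and Validate principle can be sketched as a routing rule. This is a hypothetical illustration: the `Decision` fields and the routing labels are invented here, and a real system would need a far richer notion of impact than two boolean flags.

```python
# Hypothetical sketch of the oversight rule: any automated decision flagged as
# irreversible (or life-critical) is never executed automatically; it is
# escalated to a human for final determination.
from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    irreversible: bool = False
    life_critical: bool = False

def route(decision: Decision) -> str:
    """Return who makes the final determination for this decision."""
    if decision.irreversible or decision.life_critical:
        return "human_review"  # human oversight and final determination
    return "automated"         # routine, reversible decisions may proceed

print(route(Decision("suggest maintenance schedule")))            # automated
print(route(Decision("shut down power grid", irreversible=True))) # human_review
```

The design choice worth noting is that the check gates execution rather than merely logging: the system cannot act on a flagged decision without a human in the loop.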