Prepare Input Data:

1 Adequate steps should be taken to measure the quality, accuracy, suitability, and credibility of the data sample when dealing with the data sets of an AI model. This is essential to ensure that the AI system interprets the data accurately, that misleading measurements are consistently avoided, and that the system’s outcomes remain relevant to the purpose of the model.

2 It is crucial at the build and validate step to test how the system behaves under outlier events, extreme parameters, and similar conditions. Stress test data for extreme scenarios should therefore be prepared in this step.
Principle: AI Ethics Principles, Sept 14, 2022

Published by SDAIA
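
The stress test data called for in item 2 of this principle could be prepared along the following lines. This is a minimal sketch, not a method prescribed by the principle: the function name `stress_cases`, the row schema (a list of dicts mapping feature names to numbers), and the multiplier `k` are all assumptions made for illustration. It pushes every numeric feature `k` standard deviations below and above its observed mean and returns every combination of those extremes.

```python
import itertools
import statistics


def stress_cases(rows, k=4.0):
    """Build extreme-scenario rows from observed numeric data.

    rows: list of dicts mapping feature name -> number (hypothetical schema).
    k:    how many standard deviations beyond the mean to push each feature.
    """
    features = sorted(rows[0])
    extremes = {}
    for name in features:
        values = [row[name] for row in rows]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        # One low and one high extreme per feature.
        extremes[name] = (mean - k * std, mean + k * std)
    # Cartesian product: every combination of low/high extremes.
    combos = itertools.product(*(extremes[name] for name in features))
    return [dict(zip(features, combo)) for combo in combos]


if __name__ == "__main__":
    observed = [
        {"income": 10.0, "age": 20.0},
        {"income": 30.0, "age": 40.0},
    ]
    for case in stress_cases(observed):
        print(case)
```

The number of generated rows doubles with each feature, so for wide data sets a sampled subset of combinations may be more practical. Some generated values may fall outside the valid domain (a negative age, for example); whether to clip them or keep them as deliberately invalid inputs is a design choice for the team running the tests.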

Related Principles

· Plan and Design:

1 At the initial stages of setting out the purpose of the AI system, the design team shall collaborate to pinpoint the objectives and how to reach them in an efficient and optimized manner. Planning the design of the AI system is an essential stage to translate the system’s intended goals and outcomes. During this phase, it is important to implement a fairness-aware design that takes appropriate precautions across the AI system algorithm, processes, and mechanisms to prevent biases from having a discriminatory effect or leading to skewed and unwanted results or outcomes.

2 Fairness-aware design should start at the beginning of the AI System Lifecycle with a collaborative effort from technical and non-technical members to identify potential harm and benefits, affected individuals, and vulnerable groups, and to evaluate how they are impacted by the results and whether the impact is justifiable given the general purpose of the AI system.

3 A fairness assessment of the AI system is crucial, and the metrics should be selected at this stage of the AI System Lifecycle. The metrics should be chosen based on the algorithm type (rule-based, classification, regression, etc.), the effect of the decision (punitive, selective, etc.), and the harm and benefit on correctly and incorrectly predicted samples.

4 Sensitive personal data attributes relating to persons or groups which are systematically or historically disadvantaged should be identified and defined at this stage. The allowed threshold which makes the assessment fair or unfair should be defined. The fairness assessment metrics to be applied to sensitive features should be measured during future steps.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

· Prepare Input Data:

1 Following the best practice of responsible data acquisition, handling, classification, and management must be a priority to ensure that results and outcomes align with the AI system’s set goals and objectives. Sound data quality and procurement begin by ensuring the integrity of the data source and the accuracy of the data in representing all observations, to avoid systematically disadvantaging underrepresented groups or advantaging overrepresented ones. The quantity and quality of the data sets should be sufficient and accurate to serve the purpose of the system. The sample size of the data collected or procured has a significant impact on the accuracy and fairness of the outputs of a trained model.

2 Sensitive personal data attributes which are defined in the plan and design phase should not be included in the model data, so as not to reinforce the existing bias associated with them. Proxies of the sensitive features should also be analyzed and excluded from the input data. In some cases, this may not be possible due to the accuracy or objective of the AI system. In such cases, a justification for using the sensitive personal data attributes or their proxies should be provided.

3 Causality-based feature selection should be ensured. Selected features should be verified with business owners and non-technical teams.

4 Automated decision support technologies present major risks of bias and unwanted application at the deployment phase, so it is critical to set out mechanisms to prevent harmful and discriminatory results at this phase.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
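
One way to begin the proxy analysis described in item 2 of this principle is to flag features that correlate strongly with a sensitive attribute. The sketch below is an illustration under stated assumptions, not the method the principle mandates: the column-oriented layout, the function names `pearson` and `flag_proxies`, and the 0.8 cutoff are hypothetical, and Pearson correlation is only one of several possible association measures.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def flag_proxies(columns, sensitive, threshold=0.8):
    """Return names of features whose |correlation| with the sensitive
    column meets the (hypothetical) threshold."""
    target = columns[sensitive]
    return sorted(
        name
        for name, values in columns.items()
        if name != sensitive and abs(pearson(values, target)) >= threshold
    )


if __name__ == "__main__":
    data = {
        "gender": [0, 0, 1, 1, 0, 1],  # sensitive attribute, numerically encoded
        "height_cm": [160, 162, 180, 178, 165, 182],
        "tenure": [1, 5, 2, 4, 3, 3],
    }
    print(flag_proxies(data, "gender"))
```

A linear correlation check only catches linear, single-feature proxies. Combinations of features can jointly encode a sensitive attribute, so a flagged list like this is a starting point for review with business owners rather than a complete answer.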

· Build and Validate:

1 At the build and validate stage of the AI System Lifecycle, it is essential to take into consideration implementation fairness as a common theme when building, testing, and implementing the AI system. Model building and feature selection will require engineers and designers to be aware that the choices made about grouping or separating and including or excluding features, as well as more general judgments about the reliability and security of the total set of features, may have significant consequences for vulnerable or protected groups.

2 During the selection of the champion model, the fairness metric assessment should be considered. The champion model’s fairness metrics should be within the defined threshold for the sensitive features. The optimization approach for fairness and performance metrics should be clearly set throughout this phase. If the champion model does not pass the fairness assessment, a justification should be provided.

Deploy and Monitor:

1 Well-defined mechanisms and protocols should be set in place when deploying the AI system to measure the fairness and performance of the outcomes and how they impact individuals and communities. To ensure outcome fairness when analyzing the outcomes of the predictive model, it should be assessed whether the groups represented in the data sample receive benefits in equal or similar portions and whether the AI system disproportionately harms specific members based on demographic differences.

2 The predefined fairness metrics should be monitored in production. If there is any deviation from the allowed threshold, it should be investigated whether there is a need to renew the model.

3 The overall harm and benefit of the system should be quantified and materialized for the sensitive groups.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
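
The requirement that fairness metrics stay within a defined threshold, both when selecting the champion model and when monitoring in production, can be illustrated with a small check. The source does not name a specific metric, so this sketch assumes demographic parity difference (the gap between the highest and lowest positive-prediction rates across groups) purely as an example. The function names and the threshold values are hypothetical.

```python
def selection_rates(predictions, groups):
    """Share of positive (== 1) predictions per group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if pred == 1 else 0)
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(predictions, groups):
    """Difference between the highest and lowest group selection rates."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def within_threshold(predictions, groups, threshold):
    """True when the parity gap does not exceed the allowed threshold."""
    return demographic_parity_gap(predictions, groups) <= threshold


if __name__ == "__main__":
    preds = [1, 1, 0, 1, 1, 0, 0, 0]
    group = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(selection_rates(preds, group))
    print(within_threshold(preds, group, threshold=0.2))
```

The same function can be run on validation predictions during champion selection and on batches of production predictions afterwards; a `False` result would be the trigger for the investigation the principle describes. Other metrics, such as differences in error rates between groups, may suit punitive or selective decisions better and would need the true labels as an additional input.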

· Plan and Design:

1 When designing a transparent and trusted AI system, it is vital to ensure that stakeholders affected by AI systems are fully aware and informed of how outcomes are processed. They should further be given access to and an explanation of the rationale for decisions made by the AI technology in an understandable and contextual manner. Decisions should be traceable. AI system owners must define the level of transparency for different stakeholders on the technology based on data privacy, sensitivity, and authorization of the stakeholders.

2 The AI system should be designed to include an information section in the platform to give an overview of the AI model decisions as part of the overall transparency application of the technology. Information sharing as a sub-principle should be adhered to with end users and stakeholders of the AI system, either upon request or openly to the public, depending on the nature of the AI system and target market. A process mechanism should be established to log and address issues and complaints that arise, so that they can be resolved in a transparent and explainable manner.

Prepare Input Data:

1 The data sets and the processes that yield the AI system’s decision should be documented to the best possible standard to allow for traceability and an increase in transparency.

2 The data sets should be assessed in the context of their accuracy, suitability, validity, and source. This has a direct effect on the training and implementation of these systems, since the criteria for the data’s organization and structuring must be transparent and explainable in their acquisition and collection, adhering to data privacy regulations and intellectual property standards and controls.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

· Prepare Input Data:

1 An important aspect of the Accountability and Responsibility principle during the Prepare Input Data step in the AI System Lifecycle is data quality, as it affects the outcome of the AI model and the resulting decisions. It is, therefore, important to perform the necessary data quality checks, clean the data, and ensure the integrity of the data in order to get accurate results and capture the intended behavior in supervised and unsupervised models.

2 Data sets should be approved and signed off before development of the AI model commences. Furthermore, the data should be cleansed of societal biases. In parallel with the fairness principle, the sensitive features should not be included in the model data. In the event that sensitive features need to be included, the rationale or trade-off behind the decision for such inclusion should be clearly explained. The data preparation process and data quality checks should be documented and validated by responsible parties.

3 The documentation of the process is necessary for auditing and risk mitigation. Data must be properly acquired, classified, processed, and accessible to ease human intervention and control at later stages when needed.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
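
The data quality checks and their documentation, as described in this principle, could take a form like the sketch below. It is an illustrative assumption rather than a prescribed procedure: the function name `quality_report`, the row schema, and the specific checks (missing required fields, out-of-range values, exact duplicate rows) are hypothetical choices. The returned dictionary is meant to be stored alongside the data set so that responsible parties can review and sign it off.

```python
def quality_report(rows, required, ranges):
    """Summarize basic data quality issues in a list of dict rows.

    rows:     list of dicts (hypothetical schema, hashable values).
    required: field names that must be present and not None.
    ranges:   mapping of field name -> (low, high) inclusive bounds.
    """
    report = {"rows": len(rows), "missing": {}, "out_of_range": {}, "duplicates": 0}
    seen = set()
    for row in rows:
        # Exact duplicates: same fields with the same values.
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        for field in required:
            if row.get(field) is None:
                report["missing"][field] = report["missing"].get(field, 0) + 1
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                report["out_of_range"][field] = (
                    report["out_of_range"].get(field, 0) + 1
                )
    return report


if __name__ == "__main__":
    sample = [
        {"id": 1, "age": 30},
        {"id": 2, "age": None},
        {"id": 3, "age": 150},
        {"id": 1, "age": 30},
    ]
    print(quality_report(sample, required=["id", "age"], ranges={"age": (0, 120)}))
```

Keeping the report as plain data rather than printing warnings makes it straightforward to archive with a timestamp and approver name, which supports the auditing and sign-off steps the principle calls for.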