· Prepare Input Data:

1. The processes and policies that govern data management should be followed when categorizing and structuring the data that will feed into the AI system.
2. Data pertaining to social and environmental topics should be accessible to the public data infrastructure and must clearly articulate the social benefit of the data presented.
Principle: AI Ethics Principles, Sept 14, 2022

Published by SDAIA

Related Principles

· Prepare Input Data:

1. Following best practices for responsible data acquisition, handling, classification, and management must be a priority to ensure that results and outcomes align with the AI system’s set goals and objectives. Sound data quality and procurement begin by ensuring the integrity of the data source and the accuracy of the data in representing all observations, to avoid systematically disadvantaging underrepresented groups or advantaging overrepresented groups. The quantity and quality of the data sets should be sufficient and accurate to serve the purpose of the system. The sample size of the data collected or procured has a significant impact on the accuracy and fairness of the outputs of a trained model.
2. Sensitive personal data attributes, which are defined in the plan and design phase, should not be included in the model data, so as not to reinforce existing bias related to them. The proxies of the sensitive features should also be analyzed and excluded from the input data. In some cases, this may not be possible due to the accuracy requirements or objective of the AI system. In such cases, a justification for using the sensitive personal data attributes or their proxies should be provided.
3. Causality-based feature selection should be ensured. Selected features should be verified with business owners and non-technical teams.
4. Automated decision support technologies present major risks of bias and unwanted application at the deployment phase, so it is critical to set out mechanisms to prevent harmful and discriminatory results at this phase.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
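
As an illustration only (this sketch is not part of the SDAIA text), the exclusion of sensitive attributes and their proxies described above could be approximated by flagging any feature that correlates strongly with a sensitive attribute. The column names, the sample values, and the 0.8 threshold are all assumptions chosen for the example. Linear correlation is only one possible proxy signal, and it does not replace the causality-based feature selection and the review with business owners that the principle calls for.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def flag_proxies(columns, sensitive, threshold=0.8):
    """Names of non-sensitive columns whose absolute correlation with
    any sensitive column reaches the (assumed) threshold."""
    flagged = []
    for name, values in columns.items():
        if name in sensitive:
            continue
        if any(abs(pearson(values, columns[s])) >= threshold for s in sensitive):
            flagged.append(name)
    return flagged


def select_features(columns, sensitive, threshold=0.8):
    """Drop the sensitive columns and every flagged proxy."""
    proxies = set(flag_proxies(columns, sensitive, threshold))
    return {
        name: values
        for name, values in columns.items()
        if name not in sensitive and name not in proxies
    }


# Hypothetical data: one sensitive attribute and two candidate features.
columns = {
    "sensitive_group": [0, 1, 0, 1, 0, 1],
    "proxy_like_score": [160, 180, 162, 178, 158, 182],
    "tenure_months": [30, 32, 45, 44, 60, 61],
}
proxies = flag_proxies(columns, {"sensitive_group"})
model_inputs = select_features(columns, {"sensitive_group"})
```

Where a flagged feature has to stay in the model, the principle asks for a documented justification, so in practice the output of `flag_proxies` would feed a review step and not an automatic drop.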

· Plan and Design:

1. The planning and design of the AI system and its associated algorithm must be configured and modelled in a manner such that the privacy of individuals is protected, personal data is not misused or exploited, and the decision criteria of the automated technology are not based on personally identifying characteristics or information.
2. The use of personal information should be limited to that which is necessary for the proper functioning of the system. The design of AI systems resulting in the profiling of individuals or communities may only occur if approved by the Chief Compliance and Ethics Officer or Compliance Officer, or in compliance with a code of ethics and conduct developed by a national regulatory authority for the specific sector or industry.
3. The security and protection blueprint of the AI system, including the data to be processed and the algorithm to be used, should be aligned to best practices so that it can withstand cyberattacks and data breach attempts.
4. Privacy and security legal frameworks and standards should be followed and customized for the particular use case or organization.
5. An important aspect of privacy and security is data architecture; consequently, data classification and profiling should be planned to define the levels of protection and usage of personal data.
6. Security mechanisms for de-identification should be planned for the sensitive or personal data in the system. Furthermore, read, write, and update actions should be authorized only for the relevant groups.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
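
The requirement that read, write, and update actions be authorized for the relevant groups can be sketched as a deny-by-default permission table. This is a minimal illustration and not part of the SDAIA text: the group names and the `PERMISSIONS` mapping are hypothetical, and a real deployment would normally rely on the access controls of its database or identity platform.

```python
from enum import Flag, auto


class Action(Flag):
    READ = auto()
    WRITE = auto()
    UPDATE = auto()


# Hypothetical mapping from user group to the actions it may perform.
PERMISSIONS = {
    "data_engineers": Action.READ | Action.WRITE | Action.UPDATE,
    "data_scientists": Action.READ,
    "auditors": Action.READ,
}


def is_authorized(group, action):
    """Deny by default: a group missing from the table has no permissions."""
    granted = PERMISSIONS.get(group, Action(0))
    return action in granted


can_update = is_authorized("data_engineers", Action.UPDATE)
can_write = is_authorized("data_scientists", Action.WRITE)
```

Treating unknown groups as having no permissions keeps a gap in the table from silently granting access.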

· Prepare Input Data:

1. The procurement, management, and organization of data should uphold the legal frameworks and standards of data privacy. Data privacy and security protect information from a wide range of threats.
2. The confidentiality of data ensures that information is accessible only to those who are authorized to access it and that there are specific controls that manage the delegation of authority.
3. Designers and engineers of the AI system must exhibit the appropriate levels of integrity to safeguard the accuracy and completeness of information and processing methods, and to ensure that the privacy and security legal frameworks and standards are followed. They should also ensure that the availability and storage of data are protected through suitably secure database systems.
4. All processed data should be classified to ensure that it receives the appropriate level of protection in accordance with its sensitivity or security classification, and that AI system developers and owners are aware of the classification or sensitivity of the information they are handling and the associated requirements to keep it secure. All data shall be classified in terms of business requirements, criticality, and sensitivity in order to prevent unauthorized disclosure or modification. Data classification should be conducted in a contextual manner that does not result in the inference of personal information. Furthermore, de-identification mechanisms should be employed based on data classification as well as requirements relating to data protection laws.
5. Data backup and archiving actions should be taken in this stage to align with business continuity, disaster recovery, and risk mitigation policies.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
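
One way to read the requirement that de-identification follow data classification is to let each field's classification level decide how it is handled. The sketch below is an assumption-laden illustration and not part of the SDAIA text: the four level names, the field names, and the handling rules (drop, pseudonymize, keep) are all hypothetical. Keyed hashing with HMAC-SHA256 is pseudonymization and not anonymization, so whether it satisfies a given data protection law has to be assessed separately.

```python
import hashlib
import hmac

# Hypothetical classification of fields by sensitivity.
FIELD_CLASSIFICATION = {
    "national_id": "restricted",
    "email": "confidential",
    "city": "internal",
    "purchase_total": "public",
}


def pseudonymize(value, key):
    """Replace a value with a keyed SHA-256 digest (a stable pseudonym)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record, key):
    """Apply a handling rule per classification level.

    restricted   -> dropped
    confidential -> pseudonymized
    anything else -> kept as-is
    Fields with no classification are treated as restricted.
    """
    cleaned = {}
    for field, value in record.items():
        level = FIELD_CLASSIFICATION.get(field, "restricted")
        if level == "restricted":
            continue
        if level == "confidential":
            cleaned[field] = pseudonymize(str(value), key)
        else:
            cleaned[field] = value
    return cleaned


# Hypothetical record and key; a real key would come from a secrets store.
record = {
    "national_id": "1234567890",
    "email": "user@example.com",
    "city": "Riyadh",
    "purchase_total": 120.5,
    "free_text_note": "unclassified field",
}
safe_record = de_identify(record, key=b"example-key")
```

Because the same key and value always give the same digest, pseudonymized records can still be joined across tables without exposing the original value.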

· Plan and Design:

1. When designing a transparent and trusted AI system, it is vital to ensure that stakeholders affected by AI systems are fully aware and informed of how outcomes are processed. They should further be given access to, and an explanation of, the rationale for decisions made by the AI technology in an understandable and contextual manner. Decisions should be traceable. AI system owners must define the level of transparency for different stakeholders on the technology based on data privacy, sensitivity, and authorization of the stakeholders.
2. The AI system should be designed to include an information section in the platform that gives an overview of the AI model's decisions as part of the overall transparency application of the technology. Information sharing as a sub-principle should be adhered to with end users and stakeholders of the AI system, either upon request or openly to the public, depending on the nature of the AI system and target market. The model should establish a process mechanism to log and address issues and complaints that arise, so that they can be resolved in a transparent and explainable manner.

· Prepare Input Data:

1. The data sets and the processes that yield the AI system’s decision should be documented to the best possible standard to allow for traceability and an increase in transparency.
2. The data sets should be assessed in the context of their accuracy, suitability, validity, and source. This has a direct effect on the training and implementation of these systems, since the criteria for organizing and structuring the data must be transparent and explainable, and the acquisition and collection of the data must adhere to data privacy regulations and intellectual property standards and controls.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
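
Documenting data sets for traceability can be made concrete with a small provenance record stored alongside each data set. The following is a sketch under assumptions and not part of the SDAIA text: the record fields, the `DatasetRecord` name, and the example values are hypothetical. It records where the data came from and a SHA-256 fingerprint, so a later audit can confirm that the data used for training is the data that was documented.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    source: str
    collected_on: str   # ISO 8601 date, e.g. "2022-09-14"
    license_terms: str
    sha256: str         # fingerprint of the exact bytes documented


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the raw data set contents."""
    return hashlib.sha256(data).hexdigest()


def make_record(name, source, collected_on, license_terms, data):
    """Create a provenance record for a data set."""
    return DatasetRecord(name, source, collected_on, license_terms, fingerprint(data))


def verify(record, data):
    """True if the data still matches the documented fingerprint."""
    return record.sha256 == fingerprint(data)


def to_json(record):
    """Serialize the record so it can be stored next to the data set."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical data set contents and metadata.
raw = b"id,age\n1,30\n2,41\n"
record = make_record("customers_v1", "internal CRM export", "2022-09-14",
                     "internal use only", raw)
record_json = to_json(record)
```

Any change to the data, even a single byte, makes `verify` return `False`, which gives auditors a simple check that documentation and data have not drifted apart.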

· Prepare Input Data:

1. An important aspect of the Accountability and Responsibility principle during the Prepare Input Data step of the AI System Lifecycle is data quality, as it affects the outcome of the AI model and the resulting decisions. It is, therefore, important to perform the necessary data quality checks, clean the data, and ensure its integrity in order to get accurate results and capture the intended behavior in supervised and unsupervised models.
2. Data sets should be approved and signed off before development of the AI model begins. Furthermore, the data should be cleansed of societal biases. In line with the fairness principle, sensitive features should not be included in the model data. In the event that sensitive features need to be included, the rationale or trade-off behind the decision for such inclusion should be clearly explained. The data preparation process and data quality checks should be documented and validated by responsible parties.
3. The documentation of the process is necessary for auditing and risk mitigation. Data must be properly acquired, classified, processed, and made accessible to ease human intervention and control at later stages when needed.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022
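
The combination of data quality checks and sign-off before model development could be enforced with a simple gate, sketched below. This is an illustration under assumptions and not part of the SDAIA text: the two checks shown (missing required values and exact duplicate rows), the report layout, and the `approved_by` parameter are hypothetical, and a real pipeline would add further checks such as value ranges and label consistency.

```python
def quality_report(rows, required):
    """Count missing required values and exact duplicate rows.

    `rows` is a list of dicts with string keys and hashable values.
    """
    missing = {field: 0 for field in required}
    seen = set()
    duplicates = 0
    for row in rows:
        for field in required:
            if row.get(field) in (None, ""):
                missing[field] += 1
        # Order-independent identity of the row, used to spot duplicates.
        identity = tuple(sorted(row.items(), key=lambda item: item[0]))
        if identity in seen:
            duplicates += 1
        else:
            seen.add(identity)
    return {"rows": len(rows), "missing": missing, "duplicates": duplicates}


def ready_for_training(report, approved_by):
    """Gate: the data must pass the checks and carry a named approver."""
    clean = report["duplicates"] == 0 and all(
        count == 0 for count in report["missing"].values()
    )
    return clean and bool(approved_by)


# Hypothetical rows: one missing value and one exact duplicate.
rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 1, "age": 30},
]
report = quality_report(rows, required=["id", "age"])
```

Keeping the report as plain data means it can be stored with the sign-off, which supports the documentation and auditing the principle asks for.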