7. Data Quality Obligation.

Institutions must establish data provenance, and assure quality and relevance for the data input into algorithms. [Explanatory Memorandum] The Data Quality Principle follows from the preceding obligation.
Principle: Universal Guidelines for Artificial Intelligence, Oct 23, 2018

Published by The Public Voice coalition, established by Electronic Privacy Information Center (EPIC)

Related Principles

Data Quality Principle

The Data Quality Principle follows from the preceding obligation.

Published by Center for AI and Digital Policy in Universal Guidelines for AI, Oct, 2018

· Prepare Input Data:

1 Following the best practice of responsible data acquisition, handling, classification, and management must be a priority to ensure that results and outcomes align with the AI system’s set goals and objectives. Effective data quality soundness and procurement begin by ensuring the integrity of the data source and data accuracy in representing all observations to avoid the systematic disadvantaging of under represented or advantaging over represented groups. The quantity and quality of the data sets should be sufficient and accurate to serve the purpose of the system. The sample size of the data collected or procured has a significant impact on the accuracy and fairness of the outputs of a trained model. 2 Sensitive personal data attributes which are defined in the plan and design phase should not be included in the model data not to feed the existing bias on them. Also, the proxies of the sensitive features should be analyzed and not included in the input data. In some cases, this may not be possible due to the accuracy or objective of the AI system. In this case, the justification of the usage of the sensitive personal data attributes or their proxies should be provided. 3 Causality based feature selection should be ensured. Selected features should be verified with business owners and non technical teams. 4 Automated decision support technologies present major risks of bias and unwanted application at the deployment phase, so it is critical to set out mechanisms to prevent harmful and discriminatory results at this phase.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

· Prepare Input Data:

1 The exercise of data procurement, management, and organization should uphold the legal frameworks and standards of data privacy. Data privacy and security protect information from a wide range of threats. 2 The confidentiality of data ensures that information is accessible only to those who are authorized to access the information and that there are specific controls that manage the delegation of authority. 3 Designers and engineers of the AI system must exhibit the appropriate levels of integrity to safeguard the accuracy and completeness of information and processing methods to ensure that the privacy and security legal framework and standards are followed. They should also ensure that the availability and storage of data are protected through suitable security database systems. 4 All processed data should be classified to ensure that it receives the appropriate level of protection in accordance with its sensitivity or security classification and that AI system developers and owners are aware of the classification or sensitivity of the information they are handling and the associated requirements to keep it secure. All data shall be classified in terms of business requirements, criticality, and sensitivity in order to prevent unauthorized disclosure or modification. Data classification should be conducted in a contextual manner that does not result in the inference of personal information. Furthermore, de identification mechanisms should be employed based on data classification as well as requirements relating to data protection laws. 5 Data backups and archiving actions should be taken in this stage to align with business continuity, disaster recovery and risk mitigation policies.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

· Prepare Input Data:

1 The processes and policies that govern data management should be followed when preparing the categorization and structuring of data that will feed into the AI system. 2 The data pertaining to the social and environmental topics should be accessible to the public data infrastructure and must clearly articulate the social benefit of the data presented.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022

· Prepare Input Data:

1 An important aspect of the Accountability and Responsibility principle during Prepare Input Data step in the AI System Lifecycle is data quality as it affects the outcome of the AI model and decisions accordingly. It is, therefore, important to do necessary data quality checks, clean data and ensure the integrity of the data in order to get accurate results and capture intended behavior in supervised and unsupervised models. 2 Data sets should be approved and signed off before commencing with developing the AI model. Furthermore, the data should be cleansed from societal biases. In parallel with the fairness principle, the sensitive features should not be included in the model data. In the event that sensitive features need to be included, the rationale or trade off behind the decision for such inclusion should be clearly explained. The data preparation process and data quality checks should be documented and validated by responsible parties. 3 The documentation of the process is necessary for auditing and risk mitigation. Data must be properly acquired, classified, processed, and accessible to ease human intervention and control at later stages when needed.

Published by SDAIA in AI Ethics Principles, Sept 14, 2022