5. Data Provenance

A description of the way in which the training data was collected should be maintained by the builders of the algorithms, accompanied by an exploration of the potential biases induced by the human or algorithmic data gathering process. Public scrutiny of the data provides maximum opportunity for corrections. However, concerns over privacy, protecting trade secrets, or revelation of analytics that might allow malicious actors to game the system can justify restricting access to qualified and authorized individuals.
Principle: Principles for Algorithmic Transparency and Accountability, Jan 12, 2017

Published by ACM US Public Policy Council (USACM)

Related Principles

· 2. Data Governance

The quality of the data sets used is paramount for the performance of the trained machine learning solutions. Even if the data is handled in a privacy preserving way, there are requirements that have to be fulfilled in order to have high quality AI. The datasets gathered inevitably contain biases, and one has to be able to prune these away before engaging in training. This may also be done in the training itself by requiring a symmetric behaviour over known issues in the training set. In addition, it must be ensured that the proper division of the data which is being set into training, as well as validation and testing of those sets, is carefully conducted in order to achieve a realistic picture of the performance of the AI system. It must particularly be ensured that anonymisation of the data is done in a way that enables the division of the data into sets to make sure that a certain data – for instance, images from same persons – do not end up into both the training and test sets, as this would disqualify the latter. The integrity of the data gathering has to be ensured. Feeding malicious data into the system may change the behaviour of the AI solutions. This is especially important for self learning systems. It is therefore advisable to always keep record of the data that is fed to the AI systems. When data is gathered from human behaviour, it may contain misjudgement, errors and mistakes. In large enough data sets these will be diluted since correct actions usually overrun the errors, yet a trace of thereof remains in the data. To trust the data gathering process, it must be ensured that such data will not be used against the individuals who provided the data. Instead, the findings of bias should be used to look forward and lead to better processes and instructions – improving our decisions making and strengthening our institutions.

Published by The European Commission’s High-Level Expert Group on Artificial Intelligence in Draft Ethics Guidelines for Trustworthy AI, Dec 18, 2018

· 7. Respect for Privacy

Privacy and data protection must be guaranteed at all stages of the life cycle of the AI system. This includes all data provided by the user, but also all information generated about the user over the course of his or her interactions with the AI system (e.g. outputs that the AI system generated for specific users, how users responded to particular recommendations, etc.). Digital records of human behaviour can reveal highly sensitive data, not only in terms of preferences, but also regarding sexual orientation, age, gender, religious and political views. The person in control of such information could use this to his her advantage. Organisations must be mindful of how data is used and might impact users, and ensure full compliance with the GDPR as well as other applicable regulation dealing with privacy and data protection.

Published by The European Commission’s High-Level Expert Group on Artificial Intelligence in Draft Ethics Guidelines for Trustworthy AI, Dec 18, 2018

· 8. Robustness

Trustworthy AI requires that algorithms are secure, reliable as well as robust enough to deal with errors or inconsistencies during the design, development, execution, deployment and use phase of the AI system, and to adequately cope with erroneous outcomes. Reliability & Reproducibility. Trustworthiness requires that the accuracy of results can be confirmed and reproduced by independent evaluation. However, the complexity, non determinism and opacity of many AI systems, together with sensitivity to training model building conditions, can make it difficult to reproduce results. Currently there is an increased awareness within the AI research community that reproducibility is a critical requirement in the field. Reproducibility is essential to guarantee that results are consistent across different situations, computational frameworks and input data. The lack of reproducibility can lead to unintended discrimination in AI decisions. Accuracy. Accuracy pertains to an AI’s confidence and ability to correctly classify information into the correct categories, or its ability to make correct predictions, recommendations, or decisions based on data or models. An explicit and well formed development and evaluation process can support, mitigate and correct unintended risks. Resilience to Attack. AI systems, like all software systems, can include vulnerabilities that can allow them to be exploited by adversaries. Hacking is an important case of intentional harm, by which the system will purposefully follow a different course of action than its original purpose. If an AI system is attacked, the data as well as system behaviour can be changed, leading the system to make different decisions, or causing the system to shut down altogether. Systems and or data can also become corrupted, by malicious intention or by exposure to unexpected situations. Poor governance, by which it becomes possible to intentionally or unintentionally tamper with the data, or grant access to the algorithms to unauthorised entities, can also result in discrimination, erroneous decisions, or even physical harm. Fall back plan. A secure AI has safeguards that enable a fall back plan in case of problems with the AI system. In some cases this can mean that the AI system switches from statistical to rule based procedure, in other cases it means that the system asks for a human operator before continuing the action.

Published by The European Commission’s High-Level Expert Group on Artificial Intelligence in Draft Ethics Guidelines for Trustworthy AI, Dec 18, 2018

3. Artificial intelligence should not be used to diminish the data rights or privacy of individuals, families or communities.

Many of the hopes and the fears presently associated with AI are out of step with reality. The public and policymakers alike have a responsibility to understand the capabilities and limitations of this technology as it becomes an increasing part of our daily lives. This will require an awareness of when and where this technology is being deployed. Access to large quantities of data is one of the factors fuelling the current AI boom. The ways in which data is gathered and accessed need to be reconsidered, so that innovative companies, big and small, have fair and reasonable access to data, while citizens and consumers can also protect their privacy and personal agency in this changing world. Large companies which have control over vast quantities of data must be prevented from becoming overly powerful within this landscape. We call on the Government, with the Competition and Markets Authority, to review proactively the use and potential monopolisation of data by big technology companies operating in the UK.

Published by House of Lords, Select Committee on Artificial Intelligence in AI Code, Apr 16, 2018

6. Human Centricity and Well being

a. To aim for an equitable distribution of the benefits of data practices and avoid data practices that disproportionately disadvantage vulnerable groups. b. To aim to create the greatest possible benefit from the use of data and advanced modelling techniques. c. Engage in data practices that encourage the practice of virtues that contribute to human flourishing, human dignity and human autonomy. d. To give weight to the considered judgements of people or communities affected by data practices and to be aligned with the values and ethical principles of the people or communities affected. e. To make decisions that should cause no foreseeable harm to the individual, or should at least minimise such harm (in necessary circumstances, when weighed against the greater good). f. To allow users to maintain control over the data being used, the context such data is being used in and the ability to modify that use and context. g. To ensure that the overall well being of the user should be central to the AI system’s functionality.

Published by Personal Data Protection Commission (PDPC), Singapore in A compilation of existing AI ethical principles (Annex A), Jan 21, 2020