Artificial Intelligence Principles                                                                           Released: January 2022

The Association of Test Publishers (ATP), the international trade organization representing the testing/assessment industry, acknowledges the life-enhancing potential of Artificial Intelligence (AI) when utilized appropriately, while equally recognizing that the inappropriate application of AI to testing scenarios can result in bias or discriminatory effects on individual test takers.  While international regulation of AI is under active consideration, almost no final laws/regulations are yet in place to provide the industry with guiding benchmarks.  Nevertheless, to assist ATP members in achieving accountability in their utilization of AI systems in testing scenarios, the Association has developed five (5) principles for AI development and adoption, which, taken together, create a framework for constructive, responsible uses of AI systems.  Ultimately, the ATP encourages every testing organization to use AI systems responsibly, in an ethical and trustworthy manner.

The ATP acknowledges the excellent work of the World Health Organization (WHO), the Organisation for Economic Co-operation and Development (OECD), the European Commission, and the many other organizations, academic institutions, and nation states that are putting forth statements of principle and proposed regulations for the fair and equitable use of AI in process and practice.  The ATP intends to continue to update and maintain the testing industry’s position as new material becomes available and as an international regulatory consensus emerges.

To put these Principles into useful context for the testing industry, it is necessary to distinguish between AI and automated systems that do not rise to the level of AI.  As Director of Google Research, Dr. Peter Norvig, has noted, “AI is all about figuring out what to do when you don't know what to do. Regular programming is about writing instructions for the computer to do what you want it to do, when you do know what you want it to do.  AI is for when you don't.”[1]  Thus, traditional/conventional software that merely automates human decisions, especially by applying pre-determined rules for item development, test delivery, and scoring, should not generally be considered AI.[2]

Based on that key distinction, an AI system is one that perceives its environment and takes actions, through “learning, reasoning, or modeling” of data, that maximize its chances of success.[3]  Thus, AI adapts to unforeseen circumstances by evaluating potential actions; this ability to evaluate actions is what differentiates AI from conventional computing.  For example, if one is programming a car to drive from A to B, the "conventional computing way" is to program a fixed set of instructions (e.g., "turn left, go forward for five blocks, then turn left again to end up at the fourth house on the left"), which works for only one specific route from A to B.  By comparison, the "AI way" is to program actions (e.g., "turn left", "go forward", "turn right") along with a utility (e.g., "what is the distance to the destination?"), which allows the AI system to determine a route itself by analyzing whether each action will bring it closer to the destination; the same AI system may also enable these actions to be taken safely by programming “stop” and “slow down” functions so that the car avoids hitting other cars, pedestrians, or objects.
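
To make this contrast concrete, the following is a minimal sketch, in Python, of the utility-driven approach just described.  The grid coordinates, action set, and function names are illustrative assumptions for this example only, not part of any testing standard:

    # A sketch of the "AI way": the program is given actions and a utility
    # (distance to the destination), not a single pre-programmed route.

    DESTINATION = (4, 5)  # hypothetical grid coordinates of the destination

    ACTIONS = {
        "north": (0, 1),
        "south": (0, -1),
        "east": (1, 0),
        "west": (-1, 0),
    }

    def utility(position):
        """Negative Manhattan distance to the destination; higher is closer."""
        x, y = position
        dest_x, dest_y = DESTINATION
        return -(abs(dest_x - x) + abs(dest_y - y))

    def is_safe(position):
        """Placeholder for the 'stop'/'slow down' safety checks; assumes an open road."""
        return True

    def drive(start):
        """Greedily take whichever safe action best improves the utility."""
        position = start
        route = []
        while position != DESTINATION:
            candidates = []
            for name, (dx, dy) in ACTIONS.items():
                nxt = (position[0] + dx, position[1] + dy)
                if is_safe(nxt):
                    candidates.append((utility(nxt), name, nxt))
            # The system evaluates every available action and adapts the
            # route itself; no fixed route is ever programmed.
            best_utility, name, position = max(candidates)
            route.append(name)
        return route

    print(drive((0, 0)))  # prints one possible greedy route from (0, 0)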

Consequently, these Principles focus especially on machine learning (ML), a major category of AI algorithms, which enables an AI system to improve through experience.  The process of “learning” happens when the ML system re-evaluates its understanding of its environment by minimizing the discrepancies between its output and the data it knows as the “ground truth.”  For example, when considering the use of AI to implement an email spam filter, one would first train the ML system to extract facts from emails (e.g., the sender of the email, the number of recipients, the subject line, the content, attachments).  In application, the ML system would then find the optimal combinations of known facts that allow it to mark emails as spam in a way that produces results as similar as possible to the original labels in the training data.
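
The training-and-application sequence just described might look like the following sketch, assuming the widely used scikit-learn library is available; the tiny inline dataset, the choice of word counts as the extracted “facts,” and the logistic-regression model are all illustrative assumptions:

    # A minimal sketch of training a spam filter on labeled "ground truth".
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Ground truth: emails paired with their original labels (1 = spam).
    emails = [
        "win a free prize now",
        "meeting agenda for tomorrow",
        "free money claim your prize",
        "quarterly report attached",
    ]
    labels = [1, 0, 1, 0]

    # Extract facts (here, word counts) from each email.
    vectorizer = CountVectorizer()
    features = vectorizer.fit_transform(emails)

    # Learning: fit weights that minimize the discrepancy between the
    # model's output and the ground-truth labels.
    model = LogisticRegression()
    model.fit(features, labels)

    # In application, the trained model marks new emails.
    new_email = vectorizer.transform(["claim your free prize"])
    print(model.predict(new_email))  # expected: [1], i.e., flagged as spam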

On the other hand, automated decision-making is generally NOT AI, because the system is merely a computer program automating a human function or set of functions using a predetermined algorithm.  For example, in scoring a test, the automated system is built to use the scoring key exactly the same way a human scorer would use it; there is no learning, nor any adaptation of facts to reach the desired outcomes.
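
By way of contrast with the ML sketch above, the following illustrates such non-AI automated scoring; the scoring key and responses are hypothetical:

    # Automated scoring that is NOT AI: the program applies a predetermined
    # scoring key exactly as a human scorer would, with no learning.

    SCORING_KEY = {"Q1": "B", "Q2": "D", "Q3": "A"}

    def score(responses):
        """One point per response matching the key; no adaptation ever occurs."""
        return sum(1 for q, answer in responses.items() if SCORING_KEY.get(q) == answer)

    print(score({"Q1": "B", "Q2": "C", "Q3": "A"}))  # prints 2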

These high-level Principles are intended to encourage accountability and prudent self-governance by individual testing organizations.  They also serve to discourage the carte blanche deployment of AI systems and technologies that lack rigorous validation, offer vaguely defined metrics for detecting harmful effects, or provide insufficient documentation critical for third-party system audits and post-deployment monitoring.

With this definition, scope, and these illustrations in mind, then, the ATP AI guidance consists of the following Principles:

  • Transparency.  The use of AI techniques at any point in the test development, delivery, and administration/scoring lifecycle (including determining test taker integrity) must be openly disclosed to all stakeholders, including test takers, and, where appropriate, the AI techniques should be capable of inspection by knowledgeable third parties, whether for legislative, regulatory, or scientific purposes.[4]  In this context, a testing organization is accountable for the accuracy of its disclosure(s), and such transparent disclosure should be designed to explain the use of AI in order to establish trust with its customers/stakeholders, as well as to ensure compliance with applicable privacy laws/regulations.
  • Human-In-the-Loop (HIL).  The use of AI techniques for test development, administration/delivery, integrity, and scoring is a means to support the holistic evaluation of a scenario, not a replacement for trained, qualified, or licensed individuals in arriving at an outcome.  AI techniques are able to provide predictive insight and prescriptive actions (next best action, etc.), but should not operate in an autonomous fashion based on prescriptive conclusions; there should always be reliance on, or access to, human-in-the-loop input at one or more key points in the testing process as a means of challenging inaccuracy, ensuring fairness, and preventing bias/discrimination in the AI system.
  • Balanced Utilization.  A testing organization that includes AI techniques in its implementation process[5] should consider carefully whether there is evidence of bias/discrimination that must be ameliorated or minimized.  The organization should equally consider whether and to what extent it is possible to offset such evidence by adopting procedures that allow the test taker to “opt out,” and/or whether it is possible to provide an alternative method of testing or test delivery.  Similarly, the testing organization should consider whether and to what extent any AI techniques are truly data-driven, especially in using test takers’ personal data to reach decisions.  It is through such careful examination and balancing mechanisms that the testing organization is able to evaluate and manage any risks of using AI.
  • Fair and Unbiased.  AI techniques utilized in test processing (including development, delivery, and scoring) must be universally available to all parties, without discrimination in participation or in results.  Where a test is intentionally designed to measure some skill in a way that may be discriminatory[6], it would still be inappropriate for an AI system to amplify or introduce unintended bias or discrimination.  Critically, however, testing organizations should be clear that this AI Principle is not equivalent to the psychometric principles of validity, reliability, and fairness, i.e., the Standards for Educational and Psychological Testing (2014), applicable to every assessment.  While the terms sound or appear similar, and their effects on testing outcomes are both important, this Principle deals exclusively with AI systems and must not be confused or commingled with psychometrics.
  • Responsible Custodians.  A testing organization must act responsibly to assure that AI techniques utilized within the testing process operate as part of a holistic solution, in a manner that is documented through appropriate research to establish its fairness, and that the AI system is secure and auditable.  Moreover, the AI system’s consumption and retention of data, including its reliance on the personal data of test takers, must align with jurisdictional regulations and statutes.  Data retention and processing must be continuously assessed for risk, compliance, and the need for remediation.  Additionally, responsible use of an AI system by the testing organization requires remediation and re-documentation if bias/discrimination is detected, whether prior to use or subsequent to its introduction.  Responsibility also requires the developer of an AI system to cooperate with entities that implement the AI system, to ensure that both organizations have access to relevant information about the AI system when making decisions about its use.

CONCLUSION

The ATP urges every testing organization to integrate these Principles into the planning, development, and deployment phases of the technical lifecycle of AI in testing.  Aligning every AI system to these Principles will support a process that is reliable, replicable, and scalable across a variety of testing programs.  Equally important, reliance on these Principles will provide a testing organization with a foundation for complying with any future AI regulations that are eventually adopted.


[1]Talati, A. (2018, September 12). CS 6601 Artificial Intelligence. Retrieved from Subtitles To Transcripts: https://subtitlestotranscript.wordpress.com/2018/09/12/cs-6601-artificial-intelligence/

[2] Council of the European Union, Interinstitutional File: 2021/0106(COD), Doc. No. 8115/20, Presidency compromise text to Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain Union legislative acts (Nov. 29, 2021) (hereinafter the “Compromise Text”).

[3] Id.

[4] Generally, AI models and techniques can be implemented in two fashions: either deterministic, where the implementation does not change based on subsequent events, or dynamic, where the implementation continues to self-modify through subsequent events.  

[5] A test method that has been developed utilizing AI techniques but deployed in a predictive and deterministic fashion quite probably does not have an alternative method that can be implemented for an “opt out” scenario.  Moreover, if the AI system does not employ personal data, the privacy implications for test takers may be minimal or even non-existent.

[6] A test may be discriminatory by design.  For example, a test that depends on visual stimuli would not be available to a visually impaired individual, and no alternative may exist; by design, that testing stream is not accessible to that individual.