Notice & Comment

Abdicated Judgment: AI Tools and the Future of Reasoned Decision-Making in Federal Procurement, by Jessica Tillipman

This post is the ninth contribution to Notice & Comment’s symposium on AI and the APA. For other posts in the series, click here.

Federal agencies are rapidly expanding their use of artificial intelligence (AI) in government procurement. Much of the public discussion has centered on relatively narrow applications, such as tools that support market research or flag outdated contract clauses. When used to summarize or organize procurement-related information, these tools may pose manageable risks. More complex challenges arise when they extend into discretionary functions, including the core evaluative tasks that federal procurement doctrine presumes a human decision-maker will perform.

It is easy to imagine the next generation of AI procurement tools: systems that rate technical proposals, identify proposal “strengths” and “weaknesses,” and draft complete evaluation narratives, effectively doing the work of an evaluation team. Even if these systems are framed as advisory, they can shape the analysis that underpins the contract award (the “source selection decision”). Agencies are moving toward systems that could soon become the central driver of proposal evaluation, even when no one in the government can later reconstruct or defend why the awardee was chosen.

The lure of these tools is obvious. Many core procurement tasks are tedious and time-consuming, and agencies face severe capacity constraints. This pressure to adopt efficiency-enhancing tools is understandable, but these rapid changes are outpacing the legal framework that governs federal decision-making. The Administrative Procedure Act (APA) presupposes a human decision-maker who deliberates, understands the basis for the decision, and can explain that reasoning in the administrative record. The APA’s arbitrary-and-capricious standard also governs bid protest review of federal contract awards. See 28 U.S.C. § 1491(b)(4) (incorporating 5 U.S.C. § 706). Motor Vehicle Manufacturers Association v. State Farm Mutual Automobile Insurance Co., 463 U.S. 29 (1983) (State Farm), requires agencies to show that their decisions rest on a reasoned analysis of the facts before them.

AI-enabled evaluation tools complicate that requirement. When agencies rely on AI systems whose conclusions cannot be meaningfully examined because the systems are opaque, technically complex, or contractually shielded, the government may be able to show what the tool produced, but not why those outputs reflect the reasoned judgment the APA requires. 

An Award Decision No One Can Defend

Consider a Source Selection Authority (SSA)—the official responsible for the final award decision—whose AI “assistant” processes proposals, rates them, drafts narratives, and recommends an award. Understaffed and on a tight deadline, the SSA reviews the tool’s fifty-page report, finds no apparent errors, and adopts it as the agency’s evaluation and source selection decision.

When a disappointed offeror protests (challenges) a best-value tradeoff award at the U.S. Court of Federal Claims (COFC), the administrative record must reasonably explain how the agency weighed the technical factors against price, why the awardee’s proposal was determined to present the best value to the government, and how the evaluation followed the solicitation’s stated evaluation criteria. But when the agency relies on an opaque tool for evaluative judgments, it may be unable to provide an explanation consistent with the solicitation and supported by the administrative record. The AI tool’s architecture and training data are controlled by a vendor and often shielded by commercial licensing terms. The SSA can recite the tool’s ratings, but without a transparent basis for the underlying judgments, the SSA cannot show how the solicitation’s best-value tradeoff was performed. The agency could mitigate that problem if the SSA independently documents the tradeoff and ties it to the solicitation’s evaluation criteria. In practice, however, time pressure can turn AI decision support into decision replacement, leaving the record with conclusions rather than a documented tradeoff analysis. The result is a record that risks failing State Farm’s demand for a reasoned explanation and frustrates meaningful judicial review.

Independent Judgment and the Administrative Record

When disappointed offerors challenge contract awards at the COFC, the court reviews whether the agency engaged in reasoned decision-making under the APA’s arbitrary-and-capricious standard. See 5 U.S.C. § 706(2)(A); 28 U.S.C. § 1491(b)(4). This standard requires agencies to “articulate a satisfactory explanation for [their] action including a ‘rational connection between the facts found and the choice made.’” State Farm, 463 U.S. at 43 (internal quotation omitted). 

The Federal Acquisition Regulation (FAR)—the primary rulebook for federal procurements—outlines agency responsibilities during the source selection process. For negotiated procurements, FAR Part 15 details the SSA’s responsibilities in making a source selection decision. Specifically, FAR 15.308 requires documented, independent judgment:

While the SSA may use reports and analyses prepared by others, the source selection decision shall represent the SSA’s independent judgment. The source selection decision shall be documented, and the documentation shall include the rationale for any business judgments and tradeoffs made or relied on by the SSA.

This framework assumes that the SSA genuinely deliberates, that the SSA’s reasoning is both articulable and documented, and that a reviewing court can find that reasoning in the record.

Bid protest decisions repeatedly emphasize that agencies must do more than simply “parrot back” the strengths and weaknesses of competing proposals. See, e.g., Serco, Inc. v. United States, 81 Fed. Cl. 463, 497 (2008). Courts permit SSAs to adopt analyses prepared by evaluation teams when making a source selection decision, but the decision must include an independent analysis or rationale that demonstrates the exercise of independent judgment. Information Sciences Corp. v. United States, 75 Fed. Cl. 406, 410 (2007).

The Documentation Paradox

AI tools can produce exactly what procurement law generally requires: detailed narratives, comparative matrices, and adjectival ratings that can be readily incorporated into evaluation reports the SSA uses in the source selection decision. Used properly, these tools serve as drafting and organizational aids that help evaluators manage voluminous records and identify tradeoffs that manual review might otherwise overlook.

Nevertheless, the same tools can invert the logic of reasoned decision-making. When evaluators or SSAs treat model-generated summaries or ratings as the judgment itself, without understanding how the system weights the underlying facts, the record appears more extensive yet is less meaningful. The apparent precision of tables and ratings obscures the absence of human judgment about which differences matter under the solicitation’s evaluation criteria. The danger is not a thin record but one that includes analysis no one can defend. In procurement, the inability to defend the evaluation can lead to sustained bid protests that delay performance, increase costs, and impede an agency’s ability to fulfill its mission.

Structural Pressures on Reasoned Judgment

These risks do not arise in a vacuum. Federal acquisition teams face intense resource and schedule pressures, often managing multiple complex competitions with limited support. In that environment, tools that promise to “do the evaluation” can feel like a lifeline, yet the same features that make these tools appealing can exacerbate the underlying accountability problem. 

Modern AI systems are often opaque by design and offer no meaningful explanation of how their outputs are produced. Even with greater insight into how the tool reaches its conclusions, such as descriptions of training data and configuration details, many acquisition professionals would lack the expertise to translate that information into the “rational connection” State Farm requires.

Commercial licensing terms add another layer of opacity. Commercial agreements often restrict access to training data, model documentation, and internal logs, making it difficult for agencies to reconstruct how an AI tool reached its conclusions and to translate that information into solicitation-specific reasoning. As Professor Cary Coglianese observes, this creates “nested opacity,” in which contractual choices compound technological limitations and obscure accountability. Agencies could demand greater transparency but rarely press contractors to provide it, even when it is essential for legal compliance. In this environment, agencies face a serious risk that their awards will be vulnerable to challenge, particularly when disappointed offerors argue that the record does not reveal a defensible tradeoff rationale.

Resource constraints create similar legal tensions. The acquisition workforce is smaller, more junior, and less experienced than in prior generations. Few acquisition professionals have deep AI expertise. Moreover, these systems produce confident outputs that can encourage acquisition professionals to defer to automated recommendations without meaningful scrutiny. That deference reflects automation bias, defined as “the tendency for an individual to over-rely on an automated system.” Under severe time pressure and with a dramatically reduced workforce, it is far easier to accept seemingly polished evaluation narratives and ratings than to build a comparative analysis from scratch.

When opacity, complexity, and resource constraints converge, the exercise of independent judgment becomes increasingly unlikely. Agencies that deploy AI tools should therefore secure contractual rights that preserve the conditions for independent judgment: sufficient information to understand how the tool applies the solicitation’s evaluation criteria and the practical ability to question and, where necessary, override or disregard its outputs.

Securing those rights is increasingly difficult. The Trump administration’s AI Action Plan adopts a deregulatory posture that prioritizes speed and innovation over new governance obligations, while Executive Order 14271, Ensuring Commercial, Cost-Effective Solutions in Federal Contracts, instructs agencies to procure commercially available products and services to the maximum extent practicable. Although OMB Memorandum M-25-22, Driving Efficient Acquisition of Artificial Intelligence, professes a commitment to responsible AI procurement, that commitment is undermined by the federal government’s largest civilian buying agency entering into below-market, enterprise-wide agreements with leading AI firms. These agreements are based on standard commercial terms and conditions that carry significant risks, including loss-leader pricing that encourages agency buy-in, vendor lock-in, and insufficient governance. In this environment, agencies often lack the leverage needed to ensure transparency, secure data rights, and maintain ongoing monitoring authority necessary to implement AI governance. The result is a policy regime that channels agencies toward commercial AI tools without guaranteeing the safeguards required for reasoned decision-making.

A Proposed Doctrinal Framework

An agency is at serious risk of failing to satisfy State Farm and FAR 15.308 when its use of AI prevents it from documenting how it applied the stated evaluation criteria and why those evaluations support the award decision. AI tools may assist with summarizing or organizing proposal information; however, problems arise when they supplant rather than support independent reasoning. A reviewing court is most likely to deem an award defensible when the record reflects that the agency meaningfully applied the stated evaluation factors and their relative importance, identifies which aspects of each proposal drove the ratings, and explains the tradeoffs that justified selecting one offeror over another.

This is not a significant doctrinal shift. Agencies have always relied on evaluation teams whose internal reasoning the SSA does not independently reconstruct. State Farm does not require disclosure of every cognitive step; it requires only that the record provide a rational, documented explanation. See, e.g., ATSC Aviation, LLC v. United States, 141 Fed. Cl. 670, 701 (2019) (“The Selection Authority does not need to duplicate the extensive analysis performed by the Evaluation Board, nor document every detail that forms the basis of the selection decision.”).

The concern is not technical complexity alone but the combination of opacity, vendor control, and automation bias that distinguishes AI-assisted evaluation from traditional reliance on human experts. When agencies rely on evaluation teams, individual evaluators can be questioned about their reasoning, asked to clarify ambiguous judgments, and directed to reconsider conclusions that appear inconsistent with the solicitation criteria. Their judgments, however imperfect, remain accessible to the SSA and, if challenged, can be explained and defended. When key evaluative steps are embedded in a proprietary system governed by commercial licensing terms, the agency may lack both the contractual right and the technical capacity to examine how the system weighed proposal content against the stated evaluation factors. The risk is that the “rational connection” first appears in litigation, reverse-engineered to defend a result that no one inside the agency ever understood, let alone owned.

To be clear, this framework does not require courts to turn SSAs into technologists or to prohibit AI tools in source selections. Instead, it requires agencies to preserve the basic conditions for human judgment. The tool’s outputs should: (1) be presented as human-readable rationales, (2) align with the solicitation’s evaluation criteria, (3) link conclusions to the content of the proposals, and (4) enable evaluators to question, revise, supplement, or reject them in a way that reflects independent judgment rather than cosmetic editing. The source selection decision must still be based on the SSA’s independent, comparative judgment about whether and why an offeror’s technical capabilities justify any additional cost.

Vendors can build AI systems and license them under terms and conditions that support this level of transparency. What they cannot provide is the judgment itself. 

Jessica Tillipman is the Associate Dean for Government Procurement Law Studies at The George Washington University Law School.