Notice & Comment

Toward Minimum Administrative Law Standards for Agency Usage of AI, by Jordan Ascher & John Lewis

This post is the third contribution to Notice & Comment’s symposium on AI and the APA.

Much of the emerging thinking about the relationship between administrative law and generative artificial intelligence is premised, expressly or implicitly, on the assumption that AI systems might come to play a leading role in shaping and explaining administrative action. It is easy to see why. One of the principal features of the Administrative Procedure Act is that it forces policymakers to offer cogent, written accounts of their reasoning, a task that can often be complicated and time-consuming. But even at their current stage of development, large language models are capable of quickly generating high-verisimilitude prose, even on technical matters.

If the prospect of agencies setting machines loose to generate and justify regulatory proposals once seemed far-fetched, it no longer does. Over the summer, the Washington Post obtained a proposal by the U.S. DOGE Service to use AI to facilitate the rescission of half of all federal regulations by January 2026. DOGE touted that AI would revolutionize the rulemaking process, saving “93% of Man Hours” and “[a]utomat[ing]” research, writing, and analysis of public comments.

The fate of the proposal—and DOGE itself—is unknown. The White House equivocated at the time, and the avalanche of regulatory actions the presentation anticipated would be submitted to the Office of Information and Regulatory Affairs for review this fall has yet to materialize. And yet the administration continues to signal that it hopes to use artificial intelligence to accelerate the rulemaking process. More recent reporting, for instance, indicates that senior officials at the Department of Transportation are planning to use Google Gemini to draft proposed rules “in a matter of minutes or even seconds.”

These proposals bring within the realm of possibility a maximalist vision of AI’s role in the rulemaking process: an LLM identifying areas for regulatory action, deciding what action to take, offering justifications for that action, and rebutting public comments—all subject to only the most cursory human review and rubber-stamping.

As we at Governing for Impact have explained, agencies acting in this way would likely face significant legal obstacles. At the very least, rules produced by LLMs, even if judged on their own terms, might be particularly vulnerable to APA challenge. More fundamentally, the APA and the cases interpreting it are plausibly read to require certain forms of substantive human involvement in the rulemaking process, which would preclude agencies from entirely outsourcing their work to AI.

Reasoned Decisionmaking & Human Involvement

As readers of this forum know well, the APA directs courts to “set aside” agency action that is “arbitrary” and “capricious.” And LLMs are, at present, subject to limitations and prone to systemic errors. For instance, LLMs have been found to “hallucinate” false information, a problem that has persisted even as technology has advanced in other respects. They might act sycophantically, validating or agreeing with even objectively incorrect user prompts. It is well known that LLMs frequently replicate biases or errors present in their training data. LLMs also have limited “context windows,” a term that broadly refers to the amount of text or information they can consider at one time. They may thus struggle to accurately process long documents, a problem of particular concern in rulemaking, which often requires analyzing complicated and extensive agency records and lengthy agency publications.

These issues mirror the classic categories of errors that the Supreme Court identified as arbitrary and capricious in State Farm, such as when an agency “relie[s] on factors which Congress has not intended it to consider, entirely fail[s] to consider an important aspect of the problem, or offer[s] an explanation for its decision that runs counter to the evidence before the agency, or is so implausible that it could not be ascribed to a difference in view or the product of agency expertise.” An LLM incapable of faithfully reviewing an administrative record might well produce a result that misstates or disregards evidence. Prompting or design may lead an LLM to ignore or downplay important considerations. Hallucinated statements in regulatory documents may be unreasonable. Thus, to the extent agencies outsource their rulemaking to AI, they increase their exposure to arbitrary and capricious claims.

These reliability concerns highlight one way in which the APA requires substantive human involvement in the rulemaking process: an agency that wishes to rely on AI must explain and justify that methodological choice. This requirement is not new. Agencies have long been required to explain their methods when using mathematical models to inform rules. In that context, agencies bear an “affirmative burden” to “‘explain the assumptions and methodology used in preparing’ any model used and ‘provide a full analytical defense’ of any challenged aspects.” Agencies must also be “conscious of the limits of the model.” It follows that an agency using an LLM in rulemaking must reasonably explain, among other things, how it chose and developed its model, how it prompted the model and validated its outputs, and why it views those results as reliable. Rubber-stamping the output of a tool known to be prone to error without this explanation would be arbitrary and capricious.

That is not the only way in which administrative law might be understood to require substantive human involvement. Agencies must give their reasons for acting: “the grounds upon which the administrative agency acted [must] be clearly disclosed and adequately sustained.” Those reasons are arrived at through a deliberative process. The APA itself permits agencies to act only “[a]fter consideration of the relevant matter presented.” And the Supreme Court has read the APA to require agencies to “examine the relevant data,” “articulate a satisfactory explanation,” and base decisions “on a consideration of the relevant factors.” Congress, in imposing these requirements, expected that agencies would actually reason through policy problems, a process that the D.C. Circuit has said “ensures thoughtful consideration of the various issues raised.” An agency that simply rubber-stamps an LLM-chosen course of action (by, say, allocating a mere 12 minutes for a human to review an AI-generated deregulatory proposal, as DOGE proposed) has abdicated its duties to deliberate and explain. It certainly risks a finding that it has offered a “contrived” reason to support a predetermined course of action.

The APA’s notice-and-comment procedures, as well as its reasoned decisionmaking requirement, obligate agencies to consider and address significant public comments. Agencies, of course, have broad discretion in how they fulfill this task, but that discretion is not unlimited. For instance, the D.C. Circuit has said that “in informal rulemaking employing notice-and-comment procedures, dependence on severely skewed staff summaries may breach the decisionmaker’s statutory duty to accord ‘consideration’ to relevant comments submitted for the record by interested parties.” While the contours of this rule are far from clear, there is certainly the risk that an agency would violate it by tasking an LLM with responding to public comments and then rubber-stamping the resulting analysis. After all, LLMs at present may be prone to generating “severely skewed” assessments of and responses to comments. It might well violate the APA for an agency to rely on an LLM’s work without independent human effort to review and consider comments (and .01 seconds of review per comment, as called for by DOGE, might not be enough).

Finally, agency personnel likely have at least some duty not to prejudge rulemakings. As a matter of due process, a decisionmaker can be disqualified from an administrative process “when there has been a clear and convincing showing that the [official] has an unalterably closed mind on matters critical to the disposition of a proceeding.” That is a demanding standard that has (to our knowledge) never been met, and the Supreme Court has declined to read a general “open-mindedness test” into the APA. But the introduction of AI might change the analysis. Depending on how an AI system is prompted and the degree to which it exhibits sycophancy, it may be literally incapable of challenging the premises of its prompts, admitting that an adverse comment raises an important point, or flagging an idea for further consideration up the chain. In other words, whereas even opinionated humans “remain[] free, both in theory and reality, to change [their] mind[s],” an AI system’s “mind” can be closed definitively at the push of a button.

These requirements are consistent with how administrative law has long regulated agencies’ use of advanced technology in their decisionmaking. They reflect what the D.C. Circuit has described as a “safety valve[]” in the use of “sophisticated methodology”: “the insistence that ultimate responsibility for the policy decision remains with the agency rather than the computer.”

Caveats and Open Questions

This analysis is provisional. For one thing, these minimum standards do not necessarily reflect the full set of legal and policy safeguards that ought to govern agency AI usage. Substantive human involvement should be a floor, not a ceiling.

Moreover, the rubber-stamping model of AI-driven rulemaking, as arguably advanced by the DOGE and Department of Transportation proposals, may well prove unrealistic. Indeed, the former general counsel of DOGE has said in this publication that “agencies still need human lawyers and human policy experts to do substantial work after capitalizing on all available AI tools.” And while the EPA has announced plans to “use artificial intelligence to help sort and categorize public comments,” it maintains that “[h]uman supervision and input are still required throughout comment processing.” Such human involvement would change the analysis.

Nevertheless, the rubber-stamping model is useful as a device by which to tease out the basic principles that might govern agency use of AI. Whether and to what extent the minimum standards we outline apply with respect to less aggressive deployments of AI in rulemaking we leave for another day. That line-drawing question will likely be the focus of litigation, regulation, and perhaps legislation.

Jordan Ascher is Policy Counsel at Governing for Impact, where John Lewis is Deputy Legal Director.

A blog from the Yale Journal on Regulation and ABA Section of Administrative Law & Regulatory Practice.
