Notice & Comment

Ticking the Boxes: AI and the Notice-and-Comment Process, by Tara Aida

This post is the tenth contribution to Notice & Comment’s symposium on AI and the APA.  For other posts in the series, click here.

When DOGE announced its “AI Deregulation Decision Tool” last summer, concerns about widespread use of AI by the federal government suddenly seemed pressing.  DOGE claimed that, by automating the most labor-intensive aspects of the rulemaking process, the tool could help roll back 50% of all federal regulations, namely those allegedly not required by statute.

With DOGE having officially disbanded just a few months ago, it may be tempting to dismiss its summer announcement as nothing more than a pie-in-the-sky marketing ploy.  But taking the proposal seriously might be a useful way to explore how agencies should (or should not) use AI going forward.  That federal agencies will use AI in the rulemaking process in some way seems inevitable; indeed, recent reporting suggests that the Department of Transportation is already preparing to use AI to draft regulations.  Moreover, to many, AI tools might seem like an overdue solution to a longstanding problem: for years, academics have criticized the Administrative Procedure Act (APA) rulemaking process as being overly burdensome and ineffective.1  And more recently, critiques of “over-proceduralization” have even gained mainstream popularity thanks to books like Ezra Klein and Derek Thompson’s Abundance.

The general argument in favor of AI is familiar: if AI can automate the most menial, time-consuming aspects of the rulemaking process, then human decisionmakers at agencies should have more time to focus on the substance of regulations and get more done each year (whether issuing or revoking rules).  In other words, agencies should become more responsive to the fast-moving, complex industries they’re meant to regulate. 

But using AI to fulfill the APA’s procedural requirements might also fail to achieve the underlying goals of the statute.  For many federal agencies, Section 553 of the APA establishes the key procedural requirements for rulemaking.  At a high level, agencies must (1) provide notice of the proposed rule, (2) field public comments in response to the proposal, and (3) publish notice of the final rule adopted, which also responds to all significant or material comments submitted to the agency.  Importantly (and somewhat obviously), § 553 was drafted, and has largely been interpreted by courts, in a pre-ChatGPT world where only humans could carry out such requirements.  Thus, to the extent that § 553 attempts to create a process that can improve the substance of regulations, it does so with specifically human intelligence in mind.  If agencies over-rely on AI to carry out these procedural tasks, they threaten to undermine § 553’s goal of improving the quality of final rules.

Section 553’s tailoring to human intelligence can be seen, for example, in the requirement that agencies field comments on their proposed rules and provide meaningful, written responses to all significant ones.  Conceptually, this requirement aligns well with Socrates’s theory of human intelligence: that the best way to solve a difficult, knotty problem is to discuss it with others who may disagree, to pressure-test one’s beliefs, and to adjust them when flaws are exposed.  Concretely, this requirement also aligns with common sense: anyone who’s been a teacher has likely had the humbling experience of explaining a concept they thought they fully grasped, only to discover a gap in their own understanding mid-conversation.2

For these reasons, it’s at least plausible that the quality of agency rules should improve when humans carry out the work to comply with § 553.  But what if an agency instead relied heavily on AI to comply with § 553?  Suppose, for example, that DOGE had moved forward with its AI tool and that the U.S. Department of Housing and Urban Development (HUD) had used it to: (1) draft a notice of a proposed rule revoking a housing regulation, (2) analyze all public comments submitted in response, and (3) draft a notice announcing the final rule and responding to all major comments.  For good measure, and taking DOGE at its word, let’s also assume that a human reviewed each draft before publishing it to the Federal Register.  How effective would § 553 be at improving the final rule adopted by HUD?

From the public’s perspective, the rulemaking process might seem—if anything—even more transparent and responsive than usual.  For example, HUD might be able to provide more individualized responses to more comments than usual by using AI to draft its replies; or it might provide an exceptionally detailed notice of the proposed rulemaking.  Furthermore, it’s possible (perhaps even likely) that the agency’s heavy use of AI could go completely unnoticed, given the sophistication of generative large language models (LLMs) and their ability to mimic tone and style.  In this way, any additional detail provided in the notices would likely be attributed to greater attention and care from agency decisionmakers.

In reality, however, the process could be quite superficial.  While HUD may have given the impression of “perfect” compliance with § 553, it’s possible that no one at the agency read even the best comments in their entirety, truly reflected on them, conducted any follow-up investigation, or discussed their impact on the proposed rule with others.  Yes, at least one person must have reviewed the AI-drafted notice summarizing and responding to each comment.  But this seems a far cry from really engaging with the substance of the comments, as required by the APA (especially depending on the specific AI prompt used to generate the draft, see infra).  In sum, in a world where LLMs have suddenly made reading, summarizing, and drafting quite “cheap,” § 553 might miss the mark by treating the written word as a proxy for intellectual engagement.

APA critics might counter that even when humans carry out § 553’s procedural requirements, they don’t necessarily engage deeply with public critique.  Some scholars argue, for example, that agencies become extremely “dug in” and attached to their proposed rules early in the rulemaking process.  Having put in a significant amount of work long before public commenting begins, agencies can fall prey to the sunk-cost fallacy.  Consequently, even human decisionmakers may not sincerely consider public comments or adjust their regulations in response.

But even if this is true—which is somewhat unclear as an empirical matter3—there’s still reason to believe that extensive use of AI (as proposed in the HUD hypothetical) could make this problem far worse.  For one, even if an agency has a pre-determined conclusion or regulation in mind, the act of writing out a defense can create cognitive dissonance that leads to a change of heart.  In contrast, if AI is used to write even a first draft, this cognitive dissonance may be avoided, regardless of whether the agency specifically intended that result.

Even worse, an agency with the intention of ignoring public comments could use AI prompts to do so more effectively.  Imagine, for example, an agency prompting an AI specifically to “rebut and reject all critical comments” (rather than, more neutrally, to “respond to each comment”).  By embedding the conclusion into the prompt, agencies could deploy a nearly perfect closed-mindedness that is traditionally difficult to achieve.  A manager at HUD can always direct a lower-level employee to draft a response rebutting a comment; but there’s also always the chance that the employee develops qualms while drafting, especially in light of their own understanding of the rule and the broader regulatory context.  In contrast, generative AI tools do not have this broader context or moral sense to push against the “manager” they “work” for.  Indeed, many generative AIs have actually developed highly sycophantic tendencies.  Thus, even well-intentioned, neutral prompts could produce more closed-minded drafts than the equivalent work of human writers.
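
To make the contrast concrete, here is a minimal sketch in Python of how a conclusion can be baked into a prompt before the model has processed a single comment.  Everything here is hypothetical: `build_prompt` and `generate` are invented names, not any real agency tool or LLM API.

```python
# A minimal, hypothetical sketch (not any real agency tool) contrasting a
# neutral prompt with one that embeds the agency's conclusion up front.

def build_prompt(comment: str, neutral: bool) -> str:
    """Assemble the instruction sent to the model for one public comment."""
    if neutral:
        instruction = ("Summarize the public comment below and draft a "
                       "response that fairly weighs its criticisms of the "
                       "proposed rule.")
    else:
        # The conclusion is baked in before the model has "read" a word of
        # the comment: rejection is the premise, not the output.
        instruction = ("Draft a response that rebuts and rejects the "
                       "criticisms in the public comment below, defending "
                       "the proposed rule as written.")
    return f"{instruction}\n\nPublic comment:\n{comment}"

def generate(prompt: str) -> str:
    """Hypothetical LLM call; a real implementation would invoke an API."""
    raise NotImplementedError

if __name__ == "__main__":
    comment = "The proposed revocation overlooks the rule's safety benefits."
    print(build_prompt(comment, neutral=True))
    print()
    print(build_prompt(comment, neutral=False))
```

The point of the sketch is that the bias lives in the instruction itself; no amount of downstream human review of the polished draft will reveal it, which is one reason prompt disclosure (discussed below) matters.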

Finally, while these critiques presume that the existing notice-and-comment process has value, even staunch critics of § 553 may have reason to be cautious about AI.  As discussed above, APA critics argue that the procedural requirements of § 553 have become overly burdensome, preventing agencies from being nimble and responsive in their regulations.  But if agencies can suddenly comply with § 553 easily, thanks to AI, the argument that the APA’s procedural requirements must be reduced to free up agency resources becomes much weaker.  In this way, over-automation of the notice-and-comment process could actually have the effect of freezing the APA’s current procedural requirements, but with even fewer substantive benefits than currently exist (for all the reasons discussed above).  Even worse, if comments submitted to agencies are also increasingly drafted with AI, the entire process could devolve into a total pretense: machines “speaking” with other machines without any real human intermediation.

In sum, there are at least a few reasons to be skeptical about agencies relying heavily on AI to comply with § 553.  Looking forward, what are some practical takeaways? 

First, for the APA’s procedural requirements to actually improve final rules, they must align with the underlying intelligence responsible for developing those rules.  Today, humans develop regulations, and § 553 correctly assumes this is the case.  But if § 553 is to be more than window dressing, humans must also continue to be involved—in a fundamental, non-nominal manner—in carrying out those procedural requirements.  This doesn’t bar agencies from experimenting, even heavily, with AI, but it does require more than tagging a human reviewer onto the end of the drafting process.  In particular, agencies must recognize that the fastest process will not always be the process leading to the best rule.  Just because work is burdensome or detailed does not make it intellectually meaningless,4 and agencies will need to draw the line carefully when deciding what to delegate to AI. 

To be clear, however, this isn’t to say that procedure should drive substance: just because § 553 was built with human decisionmakers in mind doesn’t mean that humans should always be at the core of rulemaking.  Given the sophistication of certain models, it seems likely that AI will eventually surpass humans at making certain complex predictions, say, which pharmaceutical drugs should be banned for undue safety risks.  If agencies begin using AI not to automate procedural requirements but to aid in determining the substance of rules, then the APA’s procedural requirements should also evolve.  While fielding comments and responding to them might help humans get to their best “answer,” pressure-testing AI models and improving their outputs will likely look much different and more technical.  This might involve, for example, subjecting the underlying code to a “peer review” process conducted by computer scientists or requiring the agency to run certain benchmarking tests, as sketched below.  Historically, courts have played a significant role in helping the APA evolve and respond to developments in the administrative state.  As AI use expands, it may again be up to the courts to interpret § 553 in a way that responds to new ways of rulemaking.
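
To make the benchmarking idea slightly more concrete, here is a hedged sketch of the simplest possible version: scoring a model against held-out cases with known outcomes.  The data, the baseline, and the `predict` callable are all invented for illustration; a real benchmark would be far larger and domain-specific.

```python
# A toy sketch of what a "benchmarking test" requirement might involve:
# scoring a model's predictions against held-out cases with known outcomes.
# All names and data below are hypothetical.

from typing import Callable

def accuracy(predict: Callable[[str], bool],
             cases: list[tuple[str, bool]]) -> float:
    """Fraction of held-out cases the model labels correctly."""
    correct = sum(1 for inputs, label in cases if predict(inputs) == label)
    return correct / len(cases)

# Hypothetical held-out set: drug safety calls with known outcomes,
# echoing the pharmaceutical example in the text.
held_out = [
    ("drug A trial data ...", True),   # True = undue safety risk
    ("drug B trial data ...", False),
    ("drug C trial data ...", True),
]

never_flag = lambda _: False  # trivial baseline: never flags a risk
print(f"Baseline accuracy: {accuracy(never_flag, held_out):.2f}")
# A procedural rule could require a candidate model to beat published
# baselines by a stated margin before its outputs inform a final rule.
```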

Second, while regulating AI will be extremely complex, this discussion suggests at least one important and simple step: requiring agencies to disclose when they’re using AI to comply with the notice-and-comment process and perhaps even the specific prompts and models used.5  Without this, it may be very difficult for the public and the courts to identify when AI is being used, let alone to identify attendant issues and, eventually, solutions.
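
One can imagine such a disclosure taking the form of a structured record published alongside each notice.  The sketch below is purely illustrative; every field name is an assumption rather than a requirement drawn from any existing regulation.

```python
# A purely illustrative sketch of a structured AI-use disclosure that could
# accompany a published notice. All field names are hypothetical.

import json
from dataclasses import dataclass, field, asdict

@dataclass
class AIUseDisclosure:
    docket_id: str                                    # rulemaking docket number
    model: str                                        # model name and version
    tasks: list[str] = field(default_factory=list)    # what the AI was used for
    prompts: list[str] = field(default_factory=list)  # verbatim prompts used
    human_review: str = ""                            # human review performed

disclosure = AIUseDisclosure(
    docket_id="HUD-2025-0001",  # hypothetical docket
    model="example-llm-v1",
    tasks=["summarize public comments",
           "draft responses to significant comments"],
    prompts=["Summarize the public comment below and draft a response ..."],
    human_review="Each AI draft was reviewed and edited by agency staff.",
)

print(json.dumps(asdict(disclosure), indent=2))
```

Publishing the verbatim prompts would let courts and commenters check whether a conclusion was embedded in the instructions, as in the HUD hypothetical above.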

Tara Aida is a graduate of Harvard Law School and former editor of the Harvard Law Review. The views expressed here are her own and do not reflect the views of any employer or firm.

  1. See, e.g., Thomas O. McGarity, Some Thoughts on “Deossifying” the Rulemaking Process, 41 Duke L.J. 1385 (1992); Richard J. Pierce, Jr., Seven Ways to Deossify Agency Rulemaking, 47 Admin. L. Rev. 59 (1995). 
  2. This is also the idea behind the charming concept of rubber duck debugging, a software debugging technique in which an engineer “explains their code, line by line, to a rubber duck” to uncover their error.
  3. The fact that doctrines like the “logical outgrowth” test exist (a legal test that asks whether an agency’s final rule departs so far from the initial proposal that a new round of notice and comment is necessary) suggests that agencies do sometimes adjust rules in response to public comments.
  4. Take, for example, the task of briefing a legal opinion.  Companies like Quimbee and chatbots like ChatGPT provide instant access to briefs of most cases covered in law school.  But the act of “manually” briefing cases, though tedious, teaches students how to extract the core of a legal argument and analyze it, an important substantive skill.
  5. In fact, agencies are already subject to certain general disclosure obligations, and scholars have proposed ways these may be leveraged to uncover AI usage.