Notice & Comment

The Case for Suspending Grok’s Federal Deployment, by J.B. Branch

Across democratic governments and international institutions, a foundational premise of artificial intelligence governance has begun to crystallize into consensus: AI systems deployed at scale must be safe, controllable, and subject to meaningful oversight. This principle appears across legal and policy frameworks that otherwise differ in scope, enforcement mechanisms, and regulatory philosophy. The European Union’s Digital Services Act, the OECD’s AI Principles, UNESCO’s AI ethics recommendations, and emerging national AI safety regimes all converge on the same expectation: where AI systems create foreseeable and severe harms—particularly harms involving the sexual exploitation of children—continued deployment is incompatible with responsible governance and modern morality.

That consensus is now being tested by Grok, the artificial intelligence system developed by xAI. In recent weeks, Grok has generated nonconsensual, sexualized images of women and children at scale. These incidents have triggered regulatory investigations and formal inquiries in multiple jurisdictions, including the European Union, India, and Malaysia. European officials have publicly characterized the conduct as unlawful. British regulators have initiated urgent inquiries. Other governments have warned that Grok’s outputs may violate domestic criminal statutes and platform safety laws.

In the United States, the controversy has now triggered state-level law enforcement action as well. California Attorney General Rob Bonta has opened an investigation into xAI and issued a cease-and-desist demand. Arizona Attorney General Kris Mayes has likewise announced a formal investigation into whether Grok’s outputs violate state law, including prohibitions related to child sexual abuse imagery and nonconsensual intimate deepfakes. And in Texas, members of the state House sent a letter to Attorney General Ken Paxton, citing a state law enacted last year that criminalizes the creation of child pornography with AI.

Despite this mounting scrutiny, both domestic and international, Grok continues to be deployed across United States federal agencies. That fact raises a narrow but consequential question of law and policy: whether continued federal deployment is consistent with the safety, risk-management, and oversight standards that the federal government has articulated for its own use of artificial intelligence. This essay argues that it is not.

Under existing executive guidance, procurement authority, and administrative law principles, the federal government not only has the authority to suspend Grok’s deployment—it has an obligation to do so pending a full safety and compliance review. Failure to act risks undermining the credibility of federal AI governance, weakening international alignment around AI safety norms, and signaling that even the clearest red lines in technology governance are subject to exception when powerful vendors are involved. Moreover, it is critical for the federal government to understand that the nation’s reputational risk is not confined to whether any particular image is ultimately deemed illegal. The mere fact that Grok can be induced—through user prompting, manipulation, or adversarial methods—to generate sexualized depictions involving women and apparent minors is itself a governance failure that erodes confidence in the federal government’s AI risk controls and vendor due diligence.

I. Federal AI Governance as Administrative Law, Not Aspiration

Federal AI governance in the United States is often described as fragmented or underdeveloped. But that characterization obscures an important reality: while Congress has not enacted a comprehensive AI statute, the executive branch has constructed a functional governance framework through executive orders, Office of Management and Budget (OMB) guidance, procurement rules, and agency-level policies. These instruments collectively impose real constraints on how AI systems may be procured, deployed, and maintained within federal operations.

At the core of this framework is a set of recurring principles. AI systems used by federal agencies must be objective, reliable, safe, and subject to ongoing oversight. Agencies are instructed to assess risks prior to deployment, document mitigation strategies, and continuously evaluate performance after deployment. Critically, agencies are not permitted to treat approval as permanent. When new evidence reveals risks that cannot be adequately mitigated, agencies are expected to discontinue or suspend use.

These requirements reflect basic administrative law principles. Procurement decisions must account for material changes in risk. Continued reliance on a system whose behavior materially deviates from prior assumptions raises questions not only of policy wisdom, but of legal defensibility.

Grok’s recent conduct presents a direct test of whether these principles have operational force. A system that can be readily induced to generate sexualized imagery involving minors cannot reasonably be characterized as well controlled. Further, active investigations across multiple jurisdictions underscore that the model cannot plausibly be treated as low risk. And, under the administration’s own safety guidelines, a system whose developer has not demonstrated robust, publicly verifiable safety testing does not meet procurement standards.

This is not a question of whether Grok has redeeming features, whether it functions as an alternative to other LLMs, or whether competing AI models exhibit different flaws. It is a question of whether federal agencies may continue deploying a system once evidence demonstrates that it poses risks fundamentally incompatible with governing standards. Under the administration’s own AI policy framework, the answer should be unequivocal: no.

II. Foreseeability, Control, and Preventable Failure

The most striking feature of the latest Grok episode is its foreseeability. Months before the recent incidents, more than thirty public interest and civic technology organizations raised concerns about Grok’s safety architecture in a letter sent directly to the White House. AI safety experts warned that the model lacked safeguards proportionate to its capabilities, and leading experts have characterized xAI’s safety posture as “reckless” and “completely irresponsible.”

These warnings were reinforced by Elon Musk’s own conduct. In February 2025, he publicly encouraged users to “post your best unhinged NSFW Grok post,” a statement that normalized adversarial use and plausibly contributed to later abuses. Months later, after Grok generated content praising Adolf Hitler, Musk acknowledged that the model could be “too compliant to user prompts.” That concession, in retrospect, proved prescient given the system’s most recent failures.

Foreseeability matters in governance. It distinguishes unavoidable accidents from preventable failures. When a system’s risks are known, documented, and communicated, continued deployment without adequate mitigation becomes a governance failure rather than an unfortunate surprise. Musk has sought to narrow that accountability by claiming he is “not aware of any naked underage images generated by Grok,” emphasizing that Grok generates images only “according to user requests” and that unexpected results can occur through “adversarial hacking” of prompts. But that framing concedes the central problem rather than resolving it: a system whose safeguards can be predictably bypassed in high-risk contexts cannot be treated as “controlled” for purposes of federal deployment, even if the vendor disputes the precise contours of any one incident.

The ease with which Grok was prompted to produce sexualized imagery involving minors strongly suggests that the system was deployed without controls commensurate with its risks. That conclusion is reinforced by the absence of transparent, independently verifiable safety testing. xAI and X have asserted that new safeguards have been implemented to curb the creation of CSAM and nonconsensual “undressing” imagery, and Musk has publicly suggested that when unexpected results occur, the company “fix[es] the bug immediately.” Yet multiple independent tests and reports indicate the problem has not been fully resolved across Grok’s ecosystem, particularly on Grok’s stand-alone website and app, creating a patchwork of restrictions that does not reliably prevent sexualized outputs.

That discrepancy matters for AI governance. If the system was not actually fixed, the credibility of the vendor’s public assurances collapses. If it was “fixed” only partially, then the fix itself demonstrates that xAI’s safety standards remain insufficient for a system used in public administration.

Federal AI governance requires more than assurances. It requires sufficient evidence to support continued reliance on the system in public administration. Under the Trump Administration’s AI principles, agencies cannot ignore new information that undermines prior risk assessments. Once credible evidence emerges that a system may produce unlawful or harmful outputs, agencies must reassess whether continued use remains reasonable.

III. Multi-Jurisdictional Scrutiny and the Global Governance Context

The international response to Grok underscores why continued federal deployment is not a neutral act. Globally, sexualized imagery involving minors—whether real, manipulated, or AI-generated—is among the clearest red lines in technology governance. International human rights law, domestic criminal statutes, and platform safety regimes agree on this point. The AI generation of nonconsensual sexual imagery of children signals a breakdown in system design and challenges regulatory oversight.

California’s enforcement posture underscores the same point. Attorney General Bonta has not only opened an investigation, but issued a cease-and-desist demand directing xAI to halt the creation and distribution of fake sexualized images of minors, explicitly framing the creation of child sexual abuse material as criminal conduct and a violation of state civil law. Even as X has claimed it implemented safeguards, reporting indicates users could still generate sexualized “revealing clothing” outputs in practice, which suggests asserted mitigations have not produced consistent compliance.

For European regulators, the Grok episode may become a defining test of the Digital Services Act’s enforcement capacity. The DSA was designed to impose affirmative obligations on platforms to assess systemic risks, implement mitigation measures, and respond rapidly when harm emerges. If those obligations are meaningfully enforced in a case involving such egregious conduct, it will reinforce the DSA’s credibility as a binding legal regime. If enforcement falters, it risks reinforcing perceptions that even serious violations may escape consequences.

The United States cannot watch this process from the sidelines. Continued federal deployment of Grok while allied regulators investigate its legality risks undermining emerging international alignment around AI safety norms. Moreover, federal deployment carries expressive weight. When the U.S. government continues to rely on a system under international scrutiny, it implicitly communicates that the risks are tolerable. That message complicates diplomatic efforts to promote responsible AI governance abroad and weakens the moral authority of U.S. engagement in international technology forums.

IV. Federal Procurement Is Not Market Neutral

The federal government is not merely another customer in the AI marketplace; it is the largest technology buyer in the world. Federal procurement decisions shape norms, influence industry behavior, and signal what levels of risk are acceptable in public administration. When an AI system is deployed across federal agencies, it gains legitimacy that extends beyond its immediate use cases.

This reality is well recognized in procurement law. Federal agencies are expected to exercise heightened care when adopting technologies that may affect public rights, safety, or trust. Procurement decisions are not insulated from broader policy considerations, particularly when systems operate at scale and implicate fundamental legal norms.

Under existing OMB guidance, agencies possess clear authority to suspend or discontinue AI systems when new evidence reveals risks that cannot be adequately mitigated. That authority exists precisely because AI systems are dynamic. Their behavior evolves as models are updated, data changes, and new use cases emerge. Static approval is not acceptable and cannot substitute for continuous evaluation.

Yet despite Grok’s erratic behavior, unresolved questions about its safety testing, and active investigations abroad, the system remains in use across federal agencies. That internal contradiction has only sharpened in recent days. Defense Secretary Pete Hegseth recently announced that Grok will be brought inside the Defense Department’s networks, describing a near-term plan to operationalize leading AI models across “every unclassified and classified network” and to make “all appropriate data” available for AI use.

Expanding Grok’s footprint inside government systems, especially in the immediate wake of widely documented safety failures involving sexualized deepfakes, magnifies the stakes of any unresolved vulnerabilities and raises acute questions about whether federal risk controls are functioning properly. The government demands objective, reliable, and safe AI systems while tolerating deployment of a model that appears neither stable nor adequately constrained. Federal agencies have an independent obligation to safeguard Americans’ personal information, sensitive government data, and the nation’s national security. Where those obligations cannot be clearly satisfied, continued deployment is difficult to justify.

V. Suspension as Ordinary Risk Management

Suspending Grok’s federal deployment pending a full review would not require new legislation, emergency powers, or novel enforcement authority. It would represent a straightforward application of existing procurement and AI governance frameworks. In short, suspension is a routine risk-management tool.

This is especially true where the government is placing the system in environments that handle sensitive operational information: even if certain deployments are formally limited to “unclassified” networks, a model embroiled in active investigations over sexual exploitation outputs carries unavoidable institutional costs. Suspension would allow agencies to pause deployment while assessing whether a system complies with governing standards. It prevents further entrenchment while questions remain unresolved. And it preserves institutional flexibility in the face of evolving risks.

Importantly, suspension is not punitive. It does not presuppose wrongdoing or final conclusions. It reflects the reality that AI systems can fail in ways that were not fully anticipated at deployment. When those failures involve severe and foreseeable harms, pausing use is not extraordinary. It is responsible governance.

The Grok incidents illustrate why reversibility is essential. Approval at one moment cannot bind agencies indefinitely when new evidence emerges. Continuous oversight requires the willingness to act when safeguards fail.

VI. Anticipating Counterarguments

Some critics contend that halting Grok’s use would stifle innovation or weaken America’s position in global AI competition. That objection mistakes oversight and accountability for retreat. The issue is not whether the United States should pursue leadership in artificial intelligence; it already does, and it will continue to do so. The question is whether the federal government should continue deploying a specific system that has demonstrated severe governance failures and has a documented history of AI safety harms, including perpetuating conspiracy theories, Holocaust denial, antisemitic and racist attacks, and referring to itself as “MechaHitler.”

Others may frame suspension as censorship or ideological interference with a platform aligned with American conservatism. But federal procurement decisions are not expressions of viewpoint. They are assessments of risk, reliability, and compliance with governing standards. Agencies routinely discontinue vendors that fail to meet safety or performance requirements, and AI systems should not be exempt from that discipline. As for the latter point, protecting America’s children from online sexual exploitation should be an issue all Americans support. It is why the release of the Epstein files passed the U.S. House of Representatives 427-1.

Finally, some may argue that all large language models exhibit flaws and that singling out Grok is unfair. But governance is not comparative in that sense. The presence of risks elsewhere does not excuse failures here. When a system crosses clear boundaries, agencies must respond based on the facts before them in an objective manner.

VII. The Stakes for AI Governance

The Grok episode arrives at a formative moment for U.S. AI governance. Governments are still grappling with how to enforce AI regulations and keep them from becoming a collection of voluntary principles. How authorities respond when harms are severe will shape that trajectory. If governments hesitate even when AI systems cross the clearest red lines, including prohibitions on the sexual exploitation of children, the signal to technology companies will be unmistakable. Enforcement will be negotiable. Consequences will be avoidable. Governance will yield to convenience. That risk is magnified when a company’s chief executive has previously encouraged provocative and adversarial uses of the system, yet later expresses surprise when the model behaves in precisely the ways such encouragement invites.

A response that includes investigation, suspension, and accountability would send the opposite message. It would affirm that AI governance rests on law, not voluntary compliance, and that certain boundaries cannot be crossed regardless of a company’s prominence or political capital.

For now, the only defensible path is to pause Grok’s federal use until xAI can credibly demonstrate that the system meets baseline requirements for safety, accountability, and control. That step requires no new authority, only fidelity to the government’s own standards. At a time when trust in both public institutions and emerging technologies is already strained, failing to act would do lasting damage.

J.B. Branch is the Big Tech Accountability Advocate for Public Citizen, where he is an expert on artificial intelligence governance. He received his J.D. from Georgetown University Law Center and MPA from Harvard Kennedy School.