Part II: Open Weights, Open Risks: The Policy Vacuum around Unrestricted AI Weights
Shatakshi Shekhar, Shradhanjali Sarma
Open Weights, Balanced Futures: From Blind Spots to Practical Safeguards
In the first part of our post on Open Weights, we stood at the edge of a cliff, watching the release of powerful AI model weights into the open with no clear way to pull them back. In this post, we zoom in on the policy gap and examine why the world’s most prominent AI laws, from the EU AI Act to national security protocols, fail to fully account for open-weight risks, and what choices lie ahead for lawmakers, industry, and the public. These blind spots aren’t just academic oversights; rather, they shape the very conditions under which open-weight models will be deployed, replicated, and adapted worldwide. Without targeted interventions, each gap in oversight becomes an open door, inviting security risks, market distortions, and long-term harms that current frameworks were never designed to handle.
Policy Blind Spots in Current Regulatory Frameworks
Current regulatory frameworks, largely conceived for closed AI systems, exhibit several critical blind spots when applied to open-weight models, whose open distribution and easy modifiability those frameworks were never designed to accommodate.
Loss of Direct Developer Control: Once an LLM's weights are publicly distributed, the original developer irrevocably loses direct control over the model's subsequent use, modification, or deployment. This technical reality fundamentally undermines the efficacy of traditional security mitigations that rely on centralised control, such as API rate-limiting, hardware-based access restrictions like Trusted Execution Environments, and direct monitoring of downstream usage. Regulatory commitments focused on securing unreleased weights or on post-market monitoring, while pertinent for proprietary systems, do not adequately address the risks that emerge once open-weight models are in widespread circulation and under independent control.
Ease of Modification and Circumvention of Safeguards: Open access to model weights means that embedded safety features and alignment mechanisms, typically implemented through fine-tuning or Reinforcement Learning from Human Feedback (RLHF), can be trivially sidestepped or removed by adversaries. From a regulatory standpoint, imposing mandates on developers to prevent such modifications becomes technically infeasible for publicly released open-weight models, as these models can be run and altered on-premises without the original developer's oversight. This unaddressed technical reality creates a significant gap in liability and enforcement mechanisms, as the model's behaviour can diverge from its intended safe alignment post-release.
Challenges in Attribution and Traceability: The widespread and often anonymous deployment of open-weight models complicates efforts to attribute malicious activities back to a specific model or its original developer. Technical mitigations, such as watermarking content generated by an AI model, can often be removed or bypassed by adversaries once they possess the model weights, further obscuring the origin of harmful outputs. This difficulty in tracing and attributing misuse hinders effective legal and regulatory responses, making it arduous to hold specific entities accountable for harms perpetrated through modified open-weight LLMs.
Ambiguity of Open-Source Exemptions: Legislative frameworks, such as the EU AI Act, frequently exempt models released under free and open-source licenses from certain regulatory obligations. However, the precise definition of free and open-source remains contentious. A strict interpretation, aligned with the Open Source Initiative's (OSI) definition that requires full data release, could inadvertently burden many current open-weight models (e.g., Llama, Mistral) that do not meet this standard with full systemic-risk obligations, potentially stifling innovation. Conversely, a loose interpretation that treats any open-weight release as open-source could allow highly capable and risky models to escape necessary scrutiny, illustrating a critical definitional blind spot that compromises regulatory intent.
Conflation of Model-Level and System-Level Obligations: Current regulations often conflate obligations that are appropriate for the foundational model itself with those that should apply at the application or system level. For instance, mandating content watermarking at the model level for open-weight models is inherently difficult to enforce robustly, given that the watermarking mechanism can be removed or disabled by users. A more pragmatic and effective regulatory approach would focus on imposing such obligations at the application or system level, where control over deployment and usage context is more feasible, ensuring that the burden of compliance is placed where it can be technically met and effectively monitored.
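To make the model-level versus system-level distinction concrete, here is a minimal sketch, assuming a hypothetical Python deployment wrapper, of how a provenance obligation could be met at the application layer by the deployer rather than baked into the weights. The InferenceGateway class, deployment_id, and generate_fn names are illustrative assumptions, not drawn from any real framework or regulation.

```python
# Illustrative sketch only: a hypothetical application-layer gateway that
# attaches provenance metadata to generated text before it reaches users.
# Names (InferenceGateway, deployment_id, generate_fn) are assumptions for
# this example, not part of any real framework or legal requirement.
import hashlib
import json
import time
from dataclasses import dataclass, asdict
from typing import Callable


@dataclass
class ProvenanceRecord:
    deployment_id: str    # identifies the deployed application, not the base model
    model_name: str
    timestamp: float
    content_sha256: str


class InferenceGateway:
    """Wraps any text-generation callable and records provenance at the system level."""

    def __init__(self, generate_fn: Callable[[str], str], deployment_id: str, model_name: str):
        self.generate_fn = generate_fn
        self.deployment_id = deployment_id
        self.model_name = model_name
        self.audit_log: list[ProvenanceRecord] = []

    def generate(self, prompt: str) -> dict:
        text = self.generate_fn(prompt)
        record = ProvenanceRecord(
            deployment_id=self.deployment_id,
            model_name=self.model_name,
            timestamp=time.time(),
            content_sha256=hashlib.sha256(text.encode()).hexdigest(),
        )
        self.audit_log.append(record)  # retained by the deployer for audit purposes
        return {"text": text, "provenance": asdict(record)}


# Usage: wrap whichever open-weight model the application happens to serve.
gateway = InferenceGateway(lambda p: f"[model output for: {p}]",
                           deployment_id="clinic-chatbot-01",
                           model_name="example-open-weight-7b")
print(json.dumps(gateway.generate("Summarise the patient intake form."), indent=2))
```

Because the logging and hashing live in the deployer's serving layer rather than in the model weights, the obligation stays enforceable even if the underlying model has been fine-tuned, swapped, or stripped of any built-in watermark.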
Recommendations for Balancing Innovation and Safety
Addressing the complex risks introduced by open-weight LLMs requires a multi-faceted and nuanced approach that fosters responsible innovation while mitigating potential harms.
Controlled Release for High-Risk Capabilities: Rather than imposing blanket restrictions on entire open-weight models, regulatory efforts should concentrate on controlling specific, high-risk capabilities identified through rigorous and independent evaluation frameworks. This approach advocates for tiered access models in which core model weights remain accessible, but functionalities directly relevant to high-consequence misuse, such as offensive cyber operations, bioweapons development, or attacks on critical infrastructure, are subject to stricter access controls or licensing, potentially incorporating robust Know-Your-Customer (KYC) protocols. Furthermore, policies should allow for adaptive adjustments to compute thresholds based on demonstrated capabilities, ensuring that regulations do not inadvertently capture benign models or overlook highly efficient, dangerous ones.
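As a rough illustration of what such a tiered-access policy could look like in code, the sketch below maps hypothetical capability packages to access tiers and gates the most sensitive ones behind registration and KYC verification. The tier names, capability labels, and the may_download check are assumptions for this example, not a proposed standard.

```python
# Illustrative sketch only: a hypothetical tiered-access check in which core
# weights stay broadly available while capabilities flagged as high-risk by
# independent evaluation require registered, KYC-verified requesters.
from enum import Enum, auto


class AccessTier(Enum):
    OPEN = auto()        # base weights, freely downloadable
    GATED = auto()       # add-ons released behind simple registration
    RESTRICTED = auto()  # capabilities flagged as high-risk by independent evaluation


# Hypothetical mapping produced by an evaluation process, not a real registry.
CAPABILITY_TIERS = {
    "base_weights": AccessTier.OPEN,
    "code_generation": AccessTier.OPEN,
    "long_context_adapter": AccessTier.GATED,
    "exploit_development_adapter": AccessTier.RESTRICTED,
    "wet_lab_protocol_assistant": AccessTier.RESTRICTED,
}


def may_download(capability: str, is_registered: bool, is_kyc_verified: bool) -> bool:
    """Return True if the requester meets the access requirements for a capability."""
    # Unknown capabilities default to the strictest tier as a precaution.
    tier = CAPABILITY_TIERS.get(capability, AccessTier.RESTRICTED)
    if tier is AccessTier.OPEN:
        return True
    if tier is AccessTier.GATED:
        return is_registered
    return is_registered and is_kyc_verified


print(may_download("base_weights", is_registered=False, is_kyc_verified=False))               # True
print(may_download("exploit_development_adapter", is_registered=True, is_kyc_verified=False)) # False
```

The deliberately restrictive default for unlisted capabilities mirrors the precautionary stance the recommendation implies: openness is the norm, but anything not yet evaluated is treated as restricted until it has been assessed.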
Strengthen Public Evaluation Capacity and Collaborative Red Teaming: Significant investment is imperative for public institutions, such as national AI safety institutes, to develop and maintain standardised benchmarks for assessing LLM safety and security, building upon existing frameworks like MITRE’s OCCULT. This involves fostering public-private partnerships to conduct rigorous, pre-release red-teaming exercises specifically designed to identify and mitigate the cyber misuse potential of significant open-weight models. Enhancing public evaluation capacity underpins an independent, evidence-based regulatory approach and provides critical insights into the evolving threat landscape, enabling the development of responsive and effective countermeasures.
Develop Policy-Based Traceability and Implement Stateful Defenses: To counter the challenges of attribution and detect covert misuse, it is recommended to integrate policy-based logging and watermarking systems designed to persist and allow traceability of harmful outputs even after open model release. Complementing this, the development of stateful defenses is crucial; these mechanisms monitor and analyse sequences of user queries across multiple independent interactions, rather than merely isolated prompts, to detect patterns indicative of misuse campaigns like decomposition attacks. While no single defense is foolproof, combining traceable outputs with continuous, context-aware monitoring of user behaviour can significantly enhance the ability to identify and mitigate sophisticated, evasive adversarial strategies.
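The sketch below gives one minimal, hedged interpretation of such a stateful defense: a SessionMonitor that scores a session's recent query history against a toy topic lexicon, so a task decomposed into individually benign prompts can still raise a flag. The keywords, threshold, and class names are placeholder assumptions, not a production detection rule.

```python
# Illustrative sketch only: a minimal "stateful" monitor that evaluates a
# session's query history rather than each prompt in isolation, so that a
# decomposition attack spread across benign-looking requests can still be
# flagged. The topic lexicon and threshold are toy placeholders.
from collections import defaultdict, deque

RISK_TOPICS = {
    "recon": {"scan", "open ports", "enumerate"},
    "exploit": {"buffer overflow", "payload", "shellcode"},
    "evasion": {"disable logging", "clear event logs"},
}
ALERT_THRESHOLD = 3  # distinct risk topics within one session window


class SessionMonitor:
    def __init__(self, window: int = 20):
        # Keep a bounded history of recent prompts per session.
        self.histories = defaultdict(lambda: deque(maxlen=window))

    def observe(self, session_id: str, prompt: str) -> bool:
        """Record a prompt; return True if the session's cumulative pattern looks like a misuse campaign."""
        self.histories[session_id].append(prompt.lower())
        topics_hit = {
            topic
            for topic, keywords in RISK_TOPICS.items()
            for recent in self.histories[session_id]
            if any(keyword in recent for keyword in keywords)
        }
        return len(topics_hit) >= ALERT_THRESHOLD


monitor = SessionMonitor()
for prompt in ["How do I scan a network for open ports?",
               "Write shellcode for a buffer overflow.",
               "How can I disable logging on the target host?"]:
    flagged = monitor.observe("session-42", prompt)
print("session flagged:", flagged)  # True: three distinct risk topics in one session
```

A real deployment would replace the keyword lexicon with learned classifiers and tie alerts back to the policy-based logging described above; the point of the sketch is simply that the unit of analysis is the session, not the individual prompt.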
Prioritise Sectoral Regulation: Regulatory efforts should shift their primary focus from imposing excessive burdens on foundational model developers, especially those releasing open-weight models, towards regulating the deployment and application of AI systems in specific high-risk contexts. This means addressing harmful outputs and uses through sector-specific regulations, such as those governing finance, healthcare, or critical infrastructure, where the context of use directly dictates the potential for harm. Such an approach acknowledges the technical realities of open-weight distribution, where direct control over the model is relinquished, and places the regulatory emphasis on the responsible use of AI within defined operational boundaries.
Conclusion
The regulatory blind spots around open-weight LLMs are not abstract technicalities but practical vulnerabilities that shape how these systems will be used, misused, and governed worldwide. Bridging these gaps requires moving beyond one-size-fits-all approaches: policymakers must distinguish between model-level and system-level obligations, invest in independent evaluation capacity, and target controls where risks are greatest rather than where innovation is most visible. The central challenge is balance: ensuring that open-weight AI continues to drive collaboration and discovery without leaving society exposed to unmanageable security and accountability risks. How quickly regulators, industry, and the research community converge on this balance will determine whether openness remains a strength or becomes a liability in the next phase of AI governance.


