AI guardrails are not enough, and governance teams should understand why

While AI guardrails are useful for catching obviously troublesome outputs, they are not a comprehensive risk management strategy.

Contributors:
Andrew Burt
CEO
Luminos.AI
Nicholas Maietta
CIPP/US, CIPM, CIPT
VP of Legal Automation
Luminos.AI
Luke Posniewski
Associate of Legal Automation
Luminos.AI
As generative and agentic artificial intelligence moves rapidly into regulated domains, such as human resources, finance, legal services and health care, organizations are racing to make these systems "safe."
The most common solution is something called AI guardrails.
Legal, privacy or AI governance teams advising on AI risk have likely heard the term. But conversations about using guardrails often misunderstand their nature and what they can and cannot reliably protect against. That misunderstanding is creating real legal exposure.
More specifically, if an organization's data scientists rely only on guardrails to protect its AI from risk, the company is exposing itself to significant liability. Guardrails are simply not enough, and AI governance teams engaging in AI risk management need to understand why.
A brief overview of guardrails
Let's start with the basics. AI guardrails are filters placed around an AI system to prevent certain kinds of outputs. They exist around most off-the-shelf models like Claude or Gemini. Think of them as the AI equivalent of content moderation on social media. They are designed to catch obvious problems before they reach the end user.
In practice, guardrails usually include things like: basic keyword filters — blocking certain words or phrases; toxicity detection — flagging hate speech, threats or harassment; and prompt restrictions — preventing users from asking certain types of questions.
For example, if a user asks an AI system to generate something clearly illegal or explicitly harmful, a guardrail might block the response or replace it with a warning. This is why guardrails are so widely used: they are relatively easy to implement and can quickly address certain obvious risks.
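To make the pattern concrete, here is a minimal sketch of such a guardrail in Python. The banned terms, prompt patterns and function names are illustrative assumptions, not the internals of any particular product.

    import re

    # Illustrative stand-ins for a real blocklist and restricted-prompt patterns.
    BANNED_TERMS = {"banned-term-1", "banned-term-2"}
    RESTRICTED_PROMPTS = [re.compile(r"\bhow (do|can) i\b.*\billegal", re.IGNORECASE)]

    def passes_guardrails(prompt: str, response: str) -> bool:
        """Return True only if the exchange clears basic keyword and prompt checks."""
        # Prompt restriction: block certain question patterns before the model answers.
        if any(pattern.search(prompt) for pattern in RESTRICTED_PROMPTS):
            return False
        # Keyword filter: block responses containing any listed term.
        if BANNED_TERMS & set(response.lower().split()):
            return False
        # Production systems typically add a toxicity classifier at this point.
        return True

Checks like this are cheap enough to run on every request, which is a large part of why they are so widely deployed, and also why they stay shallow.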
Why data scientists rely on guardrails
Inside most organizations, AI systems are built by data scientists and engineers under significant time pressure. Guardrails play an important role in how these systems are deployed. Since it is extremely difficult to make an underlying AI model perfectly safe, teams often deploy a powerful general-purpose model and then add guardrails on top to "catch" bad outputs.
This approach is appealing because the model does not require retraining, guardrails can be updated independently, and one set of filters can easily apply to many use cases. For these reasons, guardrails are often treated as the primary safety mechanism, not a backup.
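The layering itself is easy to picture in code. In the hedged sketch below, the model is a black box and the guardrail is just a function wrapped around it; call_model is a placeholder for any hosted model API, not a real client, and the filter can be swapped or updated without touching the model.

    from typing import Callable

    # A guardrail here is any function that judges a (prompt, response) pair.
    Guardrail = Callable[[str, str], bool]

    def call_model(prompt: str) -> str:
        """Placeholder for a call to an off-the-shelf model."""
        raise NotImplementedError

    def guarded_completion(prompt: str, guardrail: Guardrail) -> str:
        """Wrap an unmodified model with an input-side and an output-side check."""
        if not guardrail(prompt, ""):        # screen the prompt first
            return "Sorry, I can't help with that request."
        response = call_model(prompt)
        if not guardrail(prompt, response):  # then screen the output
            return "Sorry, I can't help with that request."
        return response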
From a product and engineering perspective, this makes sense. From a legal perspective, it makes for a necessary, but wholly insufficient, approach to managing risk.
Guardrails are general
Guardrails are general and only address basic safety categories. Because they need to run with every use of the AI system, they generally rely on technologies that are computationally cheap and fast, like word filtering. Even more sophisticated mechanisms like denied topics or grounding checks are typically technically limited and only address a small part of the risk landscape.
Most importantly, traditional guardrails do not assess against legal risks and standards, which are nuanced and cannot, for example, be reduced to the mere presence of harmful words from a list.
That creates a fundamental limitation: guardrails can catch obvious issues, but they often miss contextual, subtle or domain-specific risks that give rise to legal liabilities.
For example, an AI system could: request sensitive or confidential information from a user or reveal that same or similar information to another user; provide inaccurate medical or financial guidance that sounds plausible; or unintentionally engage in discriminatory behavior against protected classes — all without using any "banned" words or triggering a filter.
To a guardrail, these outputs can look completely normal. But to AI governance teams, they are potential liabilities.
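A hedged illustration of the gap, using invented outputs: none of the responses below contains a blocklisted word, so a keyword-style check passes every one of them, even though each could create liability in a regulated context.

    # Invented outputs, each risky in context but free of any "banned" words.
    BANNED = {"hate-term", "threat-term"}  # stand-in for a real blocklist

    risky_outputs = [
        "Based on your symptoms, it is fine to double your dosage.",
        "Applicants from that postcode tend to default, so decline them.",
        "Please confirm your full account number so I can assist you.",
    ]

    for text in risky_outputs:
        flagged = bool(BANNED & set(text.lower().split()))
        print(flagged)  # False every time: the filter sees nothing wrong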
Why this matters
Legal risk rarely comes from obvious violations like explicit hate speech. It often comes from context. Advice from a chatbot might be misleading, leading to the same types of liabilities as if a human had given that advice. A decision made by an AI system may be biased in effect but not intent, again giving rise to a host of potential liabilities.
AI systems may also operate outside their intended scope, giving rise to contractual and other risks. The list of potential liabilities created by AI is long and only growing longer as adoption proliferates.
Guardrails are not designed to answer these questions. They are built to catch clear, predefined categories of negative content, not to evaluate whether an output creates legal risk.
That mismatch is where companies are increasingly overlooking significant legal exposure.
What AI governance teams should do
Guardrails alone are not a comprehensive risk management strategy. If guardrails are the primary safety mechanism, governance teams should not stop at "Do you have them?" They should ask their technical teams questions like: What risks do the guardrails not cover? How consistent are they in practice? What evidence exists that they work reliably? What happens when they fail?
Most importantly, legal teams should ask for clear documentation of the controls that exist beyond guardrails. This can help ensure that the inherent gaps in guardrails are identified and compensated for.
What robust risk controls look like
Legally defensible AI systems must go beyond surface-level filtering. This means AI systems must be thoroughly tested and monitored. The guardrails that technical teams are apt to over-rely on, in other words, need guardrails of their own.
One common next step beyond guardrails is human red teaming. Red teaming focuses on how AI responds to malicious input — in other words, when users deliberately try to break it. It is critical for identifying edge cases. But precisely because it focuses on those edges, red teaming misses most of the risks that surface in typical user interactions, which make up the vast majority of real-world use.
More comprehensive testing, evaluation and monitoring are necessary to determine how the model will behave when users interact with it as intended.
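In practice, that looks less like a filter and more like an evaluation harness: run the system across a suite of typical, in-scope prompts and score every output against domain-specific checks. The sketch below assumes hypothetical check functions; real evaluations would encode the applicable legal and policy standards, often with expert review.

    from typing import Callable

    def evaluate(model: Callable[[str], str],
                 test_prompts: list[str],
                 checks: dict[str, Callable[[str], bool]]) -> dict[str, float]:
        """Return each check's failure rate across a suite of typical prompts."""
        failures = dict.fromkeys(checks, 0)
        for prompt in test_prompts:
            output = model(prompt)
            for name, check in checks.items():
                if not check(output):  # a check returns True when the output is acceptable
                    failures[name] += 1
        return {name: count / len(test_prompts) for name, count in failures.items()}

Unlike a guardrail, a harness like this produces evidence governance teams can document: failure rates per risk category, measured on the kinds of interactions users actually have.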
AI guardrails are useful but limited. For governance teams advising on AI, the takeaway is straightforward: If an organization's safety strategy begins and ends with guardrails, it may not be managing risk — it may just be managing appearances.
And those are not the same thing.
