Technical Guide

AI Guardrails: Protecting AI Systems in Production

AI guardrails are mechanisms that control what AI systems are allowed to do. As AI agents become more autonomous and capable of executing real-world actions, guardrails become essential for preventing unintended consequences and maintaining operational safety.

Guardrails operate at different layers of the AI stack, from prompt filtering at the input layer to runtime controls at the execution layer. Understanding these layers helps organizations build comprehensive AI safety strategies that protect against the full spectrum of AI risks.

The Four Layers of AI Guardrails

AI guardrails exist at multiple layers, each addressing different types of risks. A comprehensive AI safety strategy requires protection at every layer, with runtime guardrails serving as the critical last line of defense.

Layer 1: Input Layer

Prompt Guardrails

Filter and validate text inputs to AI models. Detect prompt injection attempts, jailbreaks, and harmful content before they reach the language model. These guardrails analyze the semantic content of prompts to identify malicious patterns.

Layer 2: Data Layer

Input Validation

Validate structured data and parameters passed to AI systems. Ensure data types, ranges, formats, and schemas meet requirements before processing. This prevents malformed data from causing unexpected behavior.

Layer 3: Model Layer

Model Alignment

Safety mechanisms built into the model during training and fine-tuning. Reinforcement Learning from Human Feedback (RLHF) and constitutional AI techniques train models to refuse harmful requests and follow safety guidelines.

Layer 4: Execution Layer (Most Critical)

Runtime Guardrails

Control what AI systems can actually do at the moment of execution. Evaluate every action against policies before it reaches production systems. This is the last line of defense before real-world impact occurs.

Prompt Guardrails

Prompt guardrails protect against prompt injection attacks, jailbreak attempts, and harmful instructions before they reach the language model. They analyze incoming text for patterns that indicate malicious intent, such as attempts to override system instructions or extract sensitive information.

While essential, prompt guardrails have limitations. They operate on text patterns and cannot anticipate every possible harmful instruction. Sophisticated attacks may evade detection by using novel phrasing or context manipulation.
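To make the pattern-matching approach concrete, here is a minimal sketch of a prompt guardrail. The patterns and function name are illustrative assumptions, not any particular product's detection rules; real systems combine such heuristics with semantic classifiers.

```python
import re

# Illustrative patterns that commonly signal prompt-injection attempts.
# A production filter would use far more patterns plus ML-based classifiers.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (in )?developer mode",
    r"reveal your system prompt",
]

def looks_like_injection(prompt: str) -> bool:
    """Return True if the prompt matches a known injection pattern."""
    text = prompt.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)
```

As the paragraph above notes, novel phrasing evades this kind of check entirely, which is why pattern filtering alone is insufficient.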

Input Validation

Input validation ensures that structured data passed to AI systems conforms to expected schemas and constraints. This includes validating data types, checking value ranges, enforcing format requirements, and sanitizing inputs to prevent injection attacks.
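A minimal sketch of this kind of validation, using a hypothetical `send_email` tool call as the example (the field names and limits are assumptions for illustration):

```python
import re

def validate_params(params: dict) -> list[str]:
    """Validate parameters for a hypothetical 'send_email' tool call.

    Returns a list of validation errors; an empty list means the input passed.
    """
    errors = []
    # Format check: recipient must look like an email address.
    recipient = params.get("recipient", "")
    if not isinstance(recipient, str) or not re.fullmatch(
        r"[^@\s]+@[^@\s]+\.[^@\s]+", recipient
    ):
        errors.append("recipient must be a valid email address")
    # Type and range check: retries must be an integer in [0, 5].
    if not isinstance(params.get("retries"), int) or not 0 <= params["retries"] <= 5:
        errors.append("retries must be an integer between 0 and 5")
    return errors
```

Rejecting malformed input before the AI system processes it closes off a whole class of unexpected-behavior bugs.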

Model Alignment

Model alignment refers to safety mechanisms built into AI models during training. Techniques like Reinforcement Learning from Human Feedback (RLHF) and constitutional AI train models to refuse harmful requests and follow ethical guidelines. However, alignment is not perfect and can be circumvented.

Runtime Guardrails (The Critical Layer)

Runtime guardrails operate at the execution layer, intercepting actions before they reach production systems. Unlike other guardrails that focus on inputs or model behavior, runtime guardrails control what AI systems can actually do.

When an AI agent attempts to execute an action—such as sending an email, modifying a database, deploying code, or triggering a payment—runtime guardrails evaluate the action against policies and determine whether to allow, block, or require human approval.

Why Runtime Guardrails Are Critical

Prompt filtering and model alignment are important first lines of defense, but they cannot prevent all dangerous AI behavior. Once an AI system is connected to production tools—APIs, databases, payment systems, infrastructure—it can execute real-world actions that text-based guardrails cannot catch.

The Gap in Traditional Guardrails

Consider this scenario: an AI assistant receives a carefully crafted prompt that passes all text filters. The prompt instructs the AI to "clean up old user data," which sounds harmless. However, the AI interprets this as a command to delete production database records, an action with irreversible consequences.

Prompt guardrails cannot predict how the model will interpret instructions. Runtime guardrails catch the dangerous action before it executes.

Runtime guardrails serve as the last line of defense before AI actions reach real-world systems. They evaluate actions based on:

  • What action is being attempted (delete, send, deploy, pay)
  • Where the action targets (production vs staging, sensitive data vs public)
  • Who is requesting the action (agent identity and permissions)
  • Context of the action (time, frequency, related actions)

This approach ensures that even if prompt guardrails fail, model alignment is circumvented, or input validation is bypassed, dangerous actions are still caught and controlled at the execution layer.
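The four evaluation signals listed above can be combined into a risk score. The following sketch uses invented weights and field names purely to illustrate the idea; a real policy engine would use richer context and configurable scoring.

```python
from dataclasses import dataclass

@dataclass
class ActionRequest:
    action: str    # what is being attempted: "delete", "send", "deploy", "pay"
    target: str    # where it lands: "production" or "staging"
    agent_id: str  # who is requesting the action
    count: int = 1 # context: how many records/recipients are affected

# Assumed set of inherently high-impact verbs.
HIGH_RISK_ACTIONS = {"delete", "deploy", "pay"}

def risk_score(req: ActionRequest) -> int:
    """Toy risk score combining the what/where/who/context signals."""
    score = 0
    if req.action in HIGH_RISK_ACTIONS:
        score += 2  # dangerous verb
    if req.target == "production":
        score += 2  # production blast radius
    if req.count > 100:
        score += 1  # bulk operations carry extra risk
    return score
```

A score like this can then drive the allow/approve/block decision described in the next sections.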

How Runplane Implements Runtime Guardrails

Runplane operates as a runtime control plane that sits between AI systems and production tools. Every action an AI system attempts is intercepted, evaluated against policies, and either allowed, blocked, or queued for human approval.

    AI System (LangChain, CrewAI, Custom Agents)
                        ↓
    Runplane Runtime Guardrails (Policy Engine + Risk Evaluation)
                        ↓  decision: ALLOW / REQUIRE_APPROVAL / BLOCK
    Production Systems (APIs, Databases, Services)

The Runplane decision engine evaluates every action request and returns one of three decisions:

  • ALLOW — Action is permitted. The AI system proceeds with execution.
  • REQUIRE_APPROVAL — Action is paused pending human review. An operator must approve or deny before execution continues.
  • BLOCK — Action is denied. The AI system receives an error and cannot proceed.

This architecture ensures that no AI action reaches production without passing through runtime guardrails. Organizations define policies that specify which actions require controls, and Runplane enforces those policies in real time.
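The three-way decision can be sketched as a simple policy function. This is an illustration of the pattern, not Runplane's actual SDK or configuration language; the thresholds and rules are invented for the example.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    BLOCK = "block"

def decide(action: str, target: str, amount: float = 0.0) -> Decision:
    """Toy policy: block very large payments outright, route high-impact
    production actions to a human, and allow everything else."""
    if action == "pay" and amount > 10_000:
        return Decision.BLOCK
    if target == "production" and action in {"delete", "deploy", "pay"}:
        return Decision.REQUIRE_APPROVAL
    return Decision.ALLOW
```

In a real deployment the agent framework would call a function like this before every tool invocation, and a REQUIRE_APPROVAL result would suspend the action until an operator responds.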

Runtime Guardrails vs Prompt Guardrails

Understanding the difference between these two types of guardrails is essential for building comprehensive AI safety. They protect against different risks and operate at different layers of the AI stack.

Prompt Guardrails

  • Filter text inputs
  • Detect prompt injection
  • Block harmful instructions
  • Operate before model inference
  • Cannot see actual actions

Runtime Guardrails

  • Control actual actions
  • Evaluate execution requests
  • Enforce policies in real time
  • Operate at execution layer
  • See what AI attempts to do

Example: A prompt might pass text filters by asking the AI to "optimize database performance." The model interprets this as deleting old records and attempts to execute a bulk DELETE query. Prompt guardrails see nothing wrong. Runtime guardrails intercept the DELETE action and require approval before it executes.

Examples of AI Actions That Require Runtime Guardrails

AI agents connected to production systems can execute a wide range of high-impact actions. These actions require runtime guardrails because their consequences are often irreversible.

Sending bulk emails

Risk: Mass communication to thousands of recipients

Control: Require approval for messages exceeding threshold

Deleting database records

Risk: Irreversible data loss in production systems

Control: Block bulk deletes, audit all deletions

Deploying infrastructure

Risk: Production environment modifications

Control: Require human approval for production changes

Triggering payments

Risk: Financial transactions with real money

Control: Block above threshold, require approval for large amounts

Exporting sensitive data

Risk: Data exfiltration and compliance violations

Control: Block PII exports, audit all data access
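The controls listed above lend themselves to a declarative policy table. The sketch below is a hypothetical representation of such policies (not Runplane's actual configuration format); the action names and thresholds mirror the examples above.

```python
# Hypothetical policy table mirroring the risk/control pairs listed above.
POLICIES = {
    "send_bulk_email": {"approval_above_recipients": 500},
    "delete_records":  {"block_bulk": True, "audit": True},
    "deploy_infra":    {"require_approval_envs": ["production"]},
    "trigger_payment": {"block_above": 10_000, "approval_above": 1_000},
    "export_data":     {"block_if_pii": True, "audit": True},
}
```

Keeping policies declarative like this lets operators tighten or relax controls without touching agent code.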

AI Runtime Governance

Runtime guardrails are a core component of a broader discipline called AI runtime governance. This encompasses all the systems, policies, and processes that control AI behavior at execution time.

Runplane provides a complete AI runtime governance platform that includes:

Policy Enforcement

Define rules for allowed, blocked, and approval-required actions

Risk Evaluation

Contextual scoring based on action type, target, and environment

Human Approval Workflows

Route high-risk actions to human operators for review

AI Action Audit Logs

Immutable record of every action for compliance and investigation

Why AI Systems Need a Runtime Control Plane

Once AI agents are connected to real production tools—databases, APIs, payment systems, infrastructure—they require a dedicated control layer. This layer sits between the AI and external systems, governing what actions are permitted.

Without a runtime control plane, AI systems operate with unchecked access to production resources. A single misinterpreted instruction can lead to data loss, financial impact, or security breaches. A runtime control plane provides:

  • Separation of concerns — AI logic remains separate from access control
  • Defense in depth — Multiple layers of protection work together
  • Visibility — Complete audit trail of all AI actions
  • Control — Granular policies for different actions and contexts

Runplane: A Runtime Firewall for AI Agents

Runplane functions as a runtime firewall, inspecting every action an AI system attempts to execute. Just as a network firewall controls traffic, Runplane controls AI behavior: allowing safe actions, blocking dangerous ones, and escalating uncertain cases to human operators.

Ready to add runtime guardrails to your AI systems?

Runplane provides the runtime control plane your AI agents need. Start protecting production systems from uncontrolled AI behavior.