Critical Severity | Infrastructure Damage | January 15, 2024

Infrastructure Automation Deletes Production Database

An AI-powered DevOps assistant misinterpreted a cleanup request and executed a database deletion command against the production environment instead of staging.

System Type: DevOps Assistant

What Happened

A DevOps team deployed an AI assistant with permission to execute infrastructure commands, intending to streamline routine operations. A developer asked the assistant to 'clean up the old test databases' without specifying an environment. The AI interpreted 'test databases' as any database with 'test' in its name and executed deletion commands. Because of a legacy naming convention, however, the production database was named 'customer_data_test_backup'. The AI deleted it along with several legitimate test databases.
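The failure mode is easy to reproduce with a plain substring match. The sketch below assumes the assistant's selection behaved like a `"test" in name` check; apart from `customer_data_test_backup`, the database names are hypothetical.

```python
# Hypothetical inventory. Only customer_data_test_backup comes from the incident;
# the other names are illustrative.
DATABASES = [
    "integration_test_db",
    "load_test_2019",
    "customer_data_test_backup",  # actually production, named under a legacy convention
    "customer_data",
]


def select_test_databases(names: list[str]) -> list[str]:
    """Naive selection (assumed): treat any name containing 'test' as a test database."""
    return [name for name in names if "test" in name]


if __name__ == "__main__":
    print(select_test_databases(DATABASES))
    # ['integration_test_db', 'load_test_2019', 'customer_data_test_backup']
```

The name says nothing reliable about the environment, so the production database is selected alongside the genuine test databases.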

Root Cause

The AI was granted production-level permissions without safeguards. The request 'clean up the old test databases' was ambiguous, and the AI resolved that ambiguity on its own. No confirmation workflow existed for destructive operations. Environment detection relied on naming patterns rather than explicit environment tags.

Impact

8 hours of production downtime. Partial data loss for records created in the 6 hours between the last backup and the deletion. More than $500K in recovery costs, lost revenue, and customer compensation. Multiple SLA violations with enterprise customers.

Lessons Learned

1. Destructive operations require explicit confirmation workflows.
2. AI should never have direct access to production infrastructure.
3. Natural language commands are inherently ambiguous for critical operations.
4. Environment identification must use explicit metadata, not naming patterns.
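Lesson 4 can be illustrated by selecting on explicit metadata instead of names. This is a minimal sketch assuming each database carries a hypothetical `environment` tag; the tag key and the inventory are illustrative, not taken from the incident's actual tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    name: str
    # Explicit metadata. The "environment" key is a hypothetical tagging convention.
    tags: dict[str, str] = field(default_factory=dict)


def select_by_environment(dbs: list[Database], environment: str) -> list[Database]:
    """Select databases by their explicit environment tag, ignoring the name entirely.

    Untagged databases are never selected: missing metadata is treated as
    "unknown", not as "safe to delete".
    """
    return [db for db in dbs if db.tags.get("environment") == environment]


inventory = [
    Database("integration_test_db", {"environment": "test"}),
    Database("load_test_2019", {"environment": "test"}),
    Database("customer_data_test_backup", {"environment": "production"}),
    Database("scratch_test_tmp"),  # untagged, so it is excluded
]

if __name__ == "__main__":
    print([db.name for db in select_by_environment(inventory, "test")])
    # ['integration_test_db', 'load_test_2019']
```

With tag-based selection, 'test' appearing in `customer_data_test_backup` is irrelevant; only the `production` tag counts.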

Preventive Measures

Require explicit environment specification for all destructive commands
Implement mandatory human approval for production changes
Add dry-run requirements showing exactly what will be affected
Separate credentials and permissions by environment
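The four measures can be combined into a single guard around the destructive call. The sketch below is hypothetical: `delete_databases`, `KNOWN_ENVIRONMENTS`, and `ApprovalRequired` are illustrative names, and the injected `drop` callable stands in for a client that holds credentials for one environment only.

```python
from dataclasses import dataclass
from typing import Callable, Optional

KNOWN_ENVIRONMENTS = {"dev", "test", "staging", "production"}  # hypothetical set


class ApprovalRequired(Exception):
    """Raised when a production change lacks a named human approver."""


@dataclass
class DeletionPlan:
    environment: str
    targets: list[str]
    executed: bool = False


def delete_databases(
    targets: list[str],
    environment: str,
    drop: Callable[[str], None],  # assumed to be bound to this environment's credentials only
    *,
    dry_run: bool = True,
    approved_by: Optional[str] = None,
) -> DeletionPlan:
    # Explicit environment: there is no default, and unknown values are rejected.
    if environment not in KNOWN_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    plan = DeletionPlan(environment, list(targets))
    # Dry run is the default: return exactly what would be affected, change nothing.
    if dry_run:
        return plan
    # Production changes require a named human approver.
    if environment == "production" and not approved_by:
        raise ApprovalRequired(f"{len(plan.targets)} production database(s) need approval")
    for name in plan.targets:
        drop(name)
    plan.executed = True
    return plan
```

A caller first runs the dry run, shows `plan.targets` to a human, and only then repeats the call with `dry_run=False` (plus `approved_by` for production).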

How Runplane Would Handle This

Runplane would intercept the deletion command before execution. Policies would recognize it as a destructive operation affecting a production-tagged resource. The action would be blocked and routed to a senior DevOps engineer for approval, with a clear summary showing exactly which resources would be affected. The deletion would proceed only after explicit human confirmation.
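The interception step can be sketched as a policy function evaluated before any command runs. This is not Runplane's API; `Action`, `evaluate`, `Decision`, and the `DESTRUCTIVE_VERBS` set are hypothetical, and environment tags are assumed to be available as a name-to-environment mapping.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    BLOCK_PENDING_APPROVAL = "block_pending_approval"


DESTRUCTIVE_VERBS = {"drop", "delete", "truncate"}  # hypothetical policy input


@dataclass
class Action:
    verb: str
    resources: list[str]


def evaluate(action: Action, env_tags: dict[str, str]) -> tuple[Decision, str]:
    """Return a decision plus a human-readable summary for the approver."""
    production = [r for r in action.resources if env_tags.get(r) == "production"]
    if action.verb.lower() in DESTRUCTIVE_VERBS and production:
        summary = f"{action.verb.upper()} would affect production: {', '.join(production)}"
        return Decision.BLOCK_PENDING_APPROVAL, summary
    return Decision.ALLOW, f"{action.verb} on {len(action.resources)} resource(s)"


if __name__ == "__main__":
    tags = {"integration_test_db": "test", "customer_data_test_backup": "production"}
    decision, summary = evaluate(Action("drop", list(tags)), tags)
    print(decision.value, "|", summary)
```

Because the decision depends on tags rather than names, the request that caused this incident would have been held for approval with `customer_data_test_backup` listed in the summary.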
