High Severity | Autonomous Action Failure | February 28, 2024

Autonomous Agent Creates Infinite Cloud Resource Loop

An AI infrastructure agent entered a feedback loop where it continuously provisioned cloud resources to address perceived capacity issues, resulting in runaway costs.

System Type: Cloud Infrastructure AI

What Happened

An AI-powered infrastructure management system was deployed to automatically scale cloud resources based on demand. A monitoring glitch caused the system to receive inflated CPU utilization metrics. The AI interpreted this as a capacity emergency and began provisioning additional servers. As more servers came online, the faulty monitoring system reported even higher aggregate utilization (since it was now summing more servers). This created a feedback loop where the AI continuously provisioned resources to address a non-existent capacity problem.
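The loop described above can be reproduced with a small simulation. This is a minimal sketch, not the incident system's actual code: the function names, the 80% threshold, the fixed scale-up step, and the assumption that every server carries a small idle baseline (which is what makes a summed metric grow as servers are added) are all hypothetical.

```python
def reported_utilization(servers, workload_pct, idle_pct, buggy):
    """CPU utilization as the monitor reports it.

    workload_pct is the total work expressed in server-percent, and
    idle_pct is an assumed per-server idle baseline (hypothetical).
    """
    per_server = idle_pct + workload_pct / servers
    # The buggy monitor sums utilization across servers instead of averaging.
    return per_server * servers if buggy else per_server


def simulate(cycles, servers=4, workload_pct=120.0, idle_pct=5.0,
             threshold=80.0, step=2, buggy=True):
    """Return the server count before the first cycle and after each cycle."""
    history = [servers]
    for _ in range(cycles):
        if reported_utilization(servers, workload_pct, idle_pct, buggy) > threshold:
            servers += step  # the autoscaler reacts to the reported value
        history.append(servers)
    return history


if __name__ == "__main__":
    print("summed metric:  ", simulate(5))
    print("averaged metric:", simulate(5, buggy=False))
```

With the summed metric, every added server raises the reported value, so the threshold is never cleared and the fleet grows on every cycle. With the averaged metric, the same workload stays under the threshold and the fleet size does not change.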

Root Cause

The AI was given autonomous provisioning authority without rate limits or sanity checks. Monitoring data was trusted without validation against other signals. No maximum boundary existed for automatic provisioning decisions.

Impact

$180,000 in unexpected cloud charges over 14 hours. Thousands of unnecessary server instances provisioned. Cloud provider quotas exhausted, blocking legitimate provisioning. Manual cleanup required across multiple regions.

Lessons Learned

1. Autonomous resource provisioning needs absolute boundaries
2. Monitoring data should be validated against multiple sources
3. Feedback loops in AI automation can cause runaway effects
4. Cost anomaly detection should operate independently of the provisioning AI
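One way to act on the second lesson is to refuse to scale on any metric that independent sources do not agree on. The sketch below is illustrative only: the source names, the 15-point tolerance, and the choice to return `None` when the readings are untrustworthy are assumptions, not details from the incident.

```python
from statistics import median


def validated_cpu(readings, max_spread=15.0):
    """Return a trusted CPU percentage, or None if the sources disagree.

    readings maps a (hypothetical) source name to its reported CPU percent.
    """
    # Discard physically impossible values such as a summed 140%.
    values = [v for v in readings.values() if 0.0 <= v <= 100.0]
    if len(values) < 2:
        return None  # not enough independent confirmation
    if max(values) - min(values) > max_spread:
        return None  # sources disagree; do not scale on this signal
    return median(values)


if __name__ == "__main__":
    print(validated_cpu({"agent": 35.0, "hypervisor": 38.0, "load_balancer": 33.0}))
    print(validated_cpu({"agent": 140.0, "hypervisor": 38.0}))
```

A caller would treat `None` as "hold the current capacity and alert a human" instead of as a reason to provision.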

Preventive Measures

Set hard limits on resources that can be provisioned per time period
Require human approval when provisioning exceeds normal patterns
Validate monitoring signals against multiple independent sources
Implement cost-based circuit breakers independent of the AI
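A cost-based circuit breaker from the list above might look like the following. This is a hedged sketch: the class name, the hourly budget figure, and the manual `reset` step are hypothetical. The key property is that the breaker only looks at spend, never at the AI's metrics or reasoning.

```python
class CostCircuitBreaker:
    """Blocks provisioning once projected hourly spend exceeds a ceiling."""

    def __init__(self, max_hourly_usd):
        self.max_hourly_usd = max_hourly_usd
        self.open = False  # open means provisioning is blocked

    def check(self, current_hourly_usd, added_hourly_usd):
        """Return True if the request may proceed."""
        if self.open:
            return False  # stay blocked until a human resets the breaker
        if current_hourly_usd + added_hourly_usd > self.max_hourly_usd:
            self.open = True
            return False
        return True

    def reset(self):
        """Intended to be called by a human operator after review."""
        self.open = False


if __name__ == "__main__":
    breaker = CostCircuitBreaker(max_hourly_usd=2000.0)  # hypothetical budget
    print(breaker.check(1500.0, 400.0))  # within budget
    print(breaker.check(1900.0, 200.0))  # exceeds budget, trips the breaker
    print(breaker.check(100.0, 10.0))    # still blocked until reset
```

Keeping the breaker latched until a manual reset is a deliberate choice in this sketch: it prevents the provisioning loop from resuming automatically as soon as spend dips back under the ceiling.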

How Runplane Would Handle This

Runplane would evaluate each provisioning request against cumulative limits. For example, under a policy limit of 10 servers per hour, the AI's attempt to provision an 11th server within that hour would be blocked and escalated for human review. Cost-based policies could also trigger when projected spend exceeds thresholds, regardless of the AI's reasoning.
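The cumulative-limit check can be illustrated with a sliding-window counter. This is not Runplane's API or policy syntax, which the source does not show. The class, method names, and the "allow"/"escalate" return values are hypothetical, and only the limit of 10 per hour comes from the text above.

```python
from collections import deque


class ProvisioningPolicy:
    """Allows at most `limit` provisioning actions per sliding window."""

    def __init__(self, limit=10, window_seconds=3600.0):
        self.limit = limit
        self.window_seconds = window_seconds
        self._approved = deque()  # timestamps of approved actions

    def evaluate(self, now):
        """Return "allow" or "escalate" for a request made at time `now`."""
        # Forget approvals that have aged out of the window.
        while self._approved and now - self._approved[0] >= self.window_seconds:
            self._approved.popleft()
        if len(self._approved) >= self.limit:
            return "escalate"  # blocked and handed to a human reviewer
        self._approved.append(now)
        return "allow"


if __name__ == "__main__":
    policy = ProvisioningPolicy()
    # One request per minute: the first ten pass, the eleventh is escalated.
    for i in range(11):
        print(i + 1, policy.evaluate(now=i * 60.0))
```

Because the counter tracks approved actions and not the agent's stated reason for them, a faulty metric cannot argue its way past the limit.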
