Netflix Open Sources Cloud-testing Chaos Monkey

Derrick Harris | GigaOM | July 30, 2012

Netflix has a gift for anybody who needs to ensure their cloud-hosted applications keep running even if some of the virtual servers on which they’re running die. It’s called Chaos Monkey, but don’t worry: this monkey is very tameable and is now open source. The video rental and streaming giant is one of the world’s biggest consumers of cloud computing resources (it hosts the majority of its infrastructure on the Amazon Web Services cloud), and Netflix developed Chaos Monkey as a method for ensuring that its system is capable of healing itself or continuing to run should instances fail.

“Over the last year,” Netflix cloud engineers Cory Bennett and Ariel Tseitlin wrote in a blog post announcing the open source version, “Chaos Monkey has terminated over 65,000 instances running in our production and testing environments. Most of the time nobody notices, but we continue to find surprises caused by Chaos Monkey which allows us to isolate and resolve them so they don’t happen again.”

Anyone scared of releasing such a wild-sounding entity into their application infrastructure (or envious that they can’t do so because they don’t run on Amazon’s cloud) need not worry. As Bennett and Tseitlin explain, Chaos Monkey is configurable and “by default, runs on non-holiday weekdays between 9am and 3pm.” It’s also flexible enough to run on clouds other than AWS, they write...
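
To make that default schedule concrete, here is a minimal Python sketch of what a "non-holiday weekdays between 9am and 3pm" check could look like. It is an illustration only, not Netflix's code: the function name `in_default_window` and the caller-supplied `holidays` set are hypothetical, and treating 3pm as the exclusive end of the window is an assumption, since the quoted default does not say.

```python
from __future__ import annotations

from datetime import date, datetime, time


def in_default_window(now: datetime, holidays: frozenset[date] = frozenset()) -> bool:
    """Return True if `now` falls on a non-holiday weekday between 9am and 3pm.

    Hypothetical helper for illustration; `holidays` is supplied by the caller.
    """
    is_weekday = now.weekday() < 5            # Monday=0 ... Friday=4
    is_holiday = now.date() in holidays
    # Assumption: the window includes 9:00 and excludes 15:00.
    in_hours = time(9, 0) <= now.time() < time(15, 0)
    return is_weekday and not is_holiday and in_hours


if __name__ == "__main__":
    # Monday, July 30, 2012, mid-morning: inside the window.
    print(in_default_window(datetime(2012, 7, 30, 10, 0)))
    # Saturday, July 28, 2012: outside the window.
    print(in_default_window(datetime(2012, 7, 28, 10, 0)))
```

The idea behind such a window is that failures are injected only at times when engineers are likely to be at their desks to notice and respond.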