Colourbox and Skyfish service unavailable
We suffered a complete service outage on Colourbox and Skyfish. Both systems were unavailable for 3 hours (12:30 - 15:30) and continued to experience minor problems until 22:00.
The downtime was caused by two independent problems that unfortunately occurred at roughly the same time. First, an automatic security update left us unable to deploy new resources. Normally, this would mean that our services continue to function, but at reduced capacity. However, at the same time, we saw a sudden increase in traffic. Since we were unable to add new resources, the existing resources eventually reached maximum capacity and became unresponsive. We mitigated the issue by manually adding resources until the underlying problem was fixed.