Elevated API Errors
Incident Report for Balena.io
Postmortem

We apologize for the service disruption on the evening of 08 December, 2017. It was caused by a cascade of failures that began during a routine credential rotation in the production environment. Due to an unanticipated side effect of the rotation, we were forced to restart all services, and during the startup process the API experienced elevated load. This created a bottleneck in the database, and the resulting slow queries caused extremely high API response latency. Because the API was effectively unavailable, downstream resin.io services also experienced partial outages.

We'd like to stress that all devices remained safely online during this time, and that no customer code was affected unless it was using our API in the application container. We will be performing a thorough investigation into the root causes of this issue, and we have already applied mitigating improvements to avoid such an outage in the future.

Posted Dec 19, 2017 - 11:58 UTC

Resolved
This incident has been resolved.
Posted Dec 09, 2017 - 00:04 UTC
Investigating
We're experiencing an elevated level of API errors and are currently looking into the issue.
Posted Dec 08, 2017 - 23:37 UTC