As you probably know, reddit was down or degraded for the last 36 hours. Right now we are still a bit degraded, but we have enough servers to handle the weekend traffic (we think). We hope to be at full capacity by Monday.
We want to tell you why reddit was down.
In short, at around 1:15am PDT, Amazon had a failure of their EBS system, which is a data storage product they offer. This may sound familiar, because it was the same type of failure that took us down a month ago. This time, however, the failure was more widespread and affected a much larger portion of our servers (and not just ours; many other companies were affected as well). In particular, most of our database slaves were disabled by this outage. Being spread across multiple availability zones (data centers) did us no good in this case, because the outage hit several of those zones at once.
Since that last failure, we have been doing everything we can to move off the EBS product. We're about halfway there. All of our Cassandra nodes now use only local disk, and we hope to have all of Postgres on local disk soon.
We will continue to use Amazon's other services as we have been. They have some work to do on the EBS product, and they are aware of that and working on it. The other services that we use are still performing as expected.
That being said, if you work for another hosting platform and believe you can make a compelling offering, please contact us at email@example.com, and we'll get back to you in a few days.
The team and I have been up for the last two nights waiting for this issue to get fixed on the Amazon side so that we could bring the site back up as soon as possible. Because of this, we probably won't be around much to answer questions in the comments here, but feel free to talk amongst yourselves. 🙂
As always, thank you all for your continued support. And to whoever sent us a pizza, thank you! It was much appreciated.
To end on a high note, I'd just like to mention that we are making excellent progress on hiring some new developers to help us implement long-term fixes. We hope to have some exciting announcements in that area soon.