dcontesti
Community Champion

Fastly outage brings down major websites around the world - including the UK government site

6 Replies
JKWiniger
Community Champion

All I have heard on this is "The US-based company then confirmed it had found the issue." OK, so what was the issue? Wouldn't it be responsible to disclose the issue so others can make sure the same issue does not affect them? I mean, if the issue was found and resolved, what would be the problem with disclosing it, unless there is something that they are not saying?

 

Anyone know what the issue was?

 

John-

tmekelburg1
Community Champion

Summary of June 8 outage | Fastly

Fastly Says Internet Outage Was Caused By One Customer Changing A Setting : NPR

 

"We experienced a global outage due to an undiscovered software bug that surfaced on June 8 when it was triggered by a valid customer configuration change. We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal."

denbesten
Community Champion

Was just coming here to post the same update.

 

The interesting part to me is how this highlights a SaaS risk. Your fate is in the hands of not just the SaaS provider, but potentially their other customers as well:

 

“Early June 8, a[n unidentified] customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.” [cite]

I had long realized the risk, but there is nothing like a real-world example hitting mass media to forestall the naysayers.

tmekelburg1
Community Champion


@denbesten wrote:

I had long realized the risk, but there is nothing like a real-world example hitting mass media to forestall the naysayers.

And then what to do with that risk? Does it make sense for SaaS providers to mitigate by having a backup CDN provider that could be spun up quickly or is always available for emergency traffic bursting? Or accept the risk that sometimes these things will happen?

JKWiniger
Community Champion

I can't remember the exact details now, but didn't Netflix and a few others suffer just this type of outage before? The affected companies seemed to have built redundancy into their systems after it happened. They probably split the load between two CDN providers so that if one had an issue, the other handled the extra load. Wish I could remember the details...
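To make the "split the load between two CDN providers" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the provider names `cdn_a` and `cdn_b`, the 50/50 weights, and the `pick_cdn` helper are all hypothetical, and the health flags would have to come from some monitoring check that is not shown. It is not how any particular company actually does it.

```python
import random


def pick_cdn(weights, healthy, rng=random):
    """Pick a CDN by weight among the healthy providers.

    weights: dict of provider name -> relative share of traffic (hypothetical values)
    healthy: dict of provider name -> bool, supplied by an external health check
    Returns the chosen provider name, or None if no provider is healthy.
    """
    # Keep only providers that are marked healthy and carry a positive weight.
    candidates = {
        name: weight
        for name, weight in weights.items()
        if healthy.get(name, False) and weight > 0
    }
    if not candidates:
        return None
    names = list(candidates)
    # Weighted random choice; with one survivor it receives all the traffic.
    return rng.choices(names, weights=[candidates[n] for n in names], k=1)[0]


# Hypothetical setup: traffic normally split evenly across two providers.
weights = {"cdn_a": 50, "cdn_b": 50}

# Simulate cdn_a having an outage: every request should now go to cdn_b.
healthy = {"cdn_a": False, "cdn_b": True}
print(pick_cdn(weights, healthy))
```

The catch is the one already implied above: the surviving provider has to be able to handle the extra load, so the capacity has to be paid for even when nothing is wrong.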

 

John-

denbesten
Community Champion


@tmekelburg1 wrote:


And then what to do with that risk? Does it make sense for SaaS providers to mitigate by having a backup CDN provider that could be spun up quickly or is always available for emergency traffic bursting? Or accept the risk that sometimes these things will happen?



Yep, accepting and mitigating are two of the three approaches for managing risk. The important thing here is that when purchasing a cloud service, don't think of it as eliminating the risk that your own data center could fail; instead, think of it as replacing that risk with the risk that the provider's data center could fail. So one still needs to do the analysis and decide between paying for contingency or accepting losses upon failure.
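A rough sketch of that analysis, comparing the expected yearly loss from accepting the risk against the yearly cost of contingency. The function name, the parameters, and all the numbers below are made up for illustration, and a real assessment would have more inputs than a single probability and a single loss figure.

```python
def compare_options(outage_probability_per_year, loss_per_outage,
                    contingency_cost_per_year, residual_loss_fraction=0.0):
    """Compare accepting an outage risk with paying for contingency.

    All inputs are hypothetical estimates supplied by the analyst.
    residual_loss_fraction is the share of the loss still suffered
    even with the contingency in place (0.0 means fully covered).
    """
    # Expected yearly loss if the risk is simply accepted.
    accept_cost = outage_probability_per_year * loss_per_outage
    # Yearly cost of contingency plus whatever loss it fails to prevent.
    mitigate_cost = (contingency_cost_per_year
                     + accept_cost * residual_loss_fraction)
    decision = "mitigate" if mitigate_cost < accept_cost else "accept"
    return {"accept": accept_cost, "mitigate": mitigate_cost,
            "decision": decision}


# Hypothetical figures: a 10% yearly chance of a 1,000,000 loss,
# versus 50,000 per year for a standby provider.
print(compare_options(0.10, 1_000_000, 50_000)["decision"])

# Same loss and contingency cost, but only a 1% yearly chance.
print(compare_options(0.01, 1_000_000, 50_000)["decision"])
```

With these made-up inputs the first case favors paying for contingency and the second favors accepting the loss, which is the point: the answer depends on the estimates, not on whether the service is in the cloud.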