The story of yet another cloud services meltdown is in the news yet again. This time it is Azure, and while the outage started in a single region, it is now affecting some of the global services:
https://www.theregister.co.uk/2018/09/04/thunderstruck_azure_backout/
For all the undeniable benefits that cloud services provide, the problem, as I see it, lies in the very nature of automation at scale, as well as in our inability, as cloud customers, to effect the necessary changes once disaster strikes.
Of course, arguments could be made for multi-cloud implementations with third-party cloud brokers, but I am not buying those. We are not aware of all the dependencies these services carry. A case in point is the 2016 Dyn cyberattack, which affected broad swathes of cloud-based services.
We do not control, and frequently are not aware of, whose services our IaaS or SaaS providers use and rely on to run their infrastructure. Furthermore, if they change these or establish new dependencies, we are not likely to be notified.
So the real picture of the critical paths may differ widely from the one we have relied on for our design, implementation and operations.
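To make the hidden-dependency point concrete, here is a minimal sketch of how two seemingly independent providers can share a single upstream service. Every name in the map (cloud_a, cloud_b, dns_vendor_x and so on) is invented for illustration, and the assumption that we could obtain such a map at all is exactly what the paragraphs above call into question.

```python
from collections import deque

# Hypothetical dependency map: every name here is made up for illustration.
# In practice we rarely get to see this data for our providers.
DEPENDS_ON = {
    "cloud_a": ["dns_vendor_x", "cdn_vendor_y"],
    "cloud_b": ["auth_saas_z"],
    "auth_saas_z": ["dns_vendor_x"],  # the dependency nobody told us about
    "dns_vendor_x": [],
    "cdn_vendor_y": [],
}


def transitive_dependencies(service, graph):
    """Return every service reachable from `service` (breadth-first walk)."""
    seen = set()
    queue = deque(graph.get(service, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        queue.extend(graph.get(current, []))
    return seen


def shared_dependencies(providers, graph):
    """Return the dependencies common to all of the listed providers."""
    sets = [transitive_dependencies(p, graph) for p in providers]
    return set.intersection(*sets) if sets else set()


# A "multi-cloud" design across cloud_a and cloud_b still shares one upstream.
print(shared_dependencies(["cloud_a", "cloud_b"], DEPENDS_ON))
```

In this made-up example the two clouds look independent on paper, yet both end up relying on dns_vendor_x, one of them only indirectly through a SaaS it uses.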
I know that we do not and cannot control everything. Even in conventional infrastructures, we have to rely on ISPs, CAs, the global DNS infrastructure, registrars, etc., but we used to be able to keep the internal systems working, if not in the primary location, then in DR locations. We used to rely on P2P links with our peers for critical communications. This provided a degree of resiliency and self-sufficiency that is rapidly diminishing in the age of IaaS, PaaS, SaaS and SD-WAN.
The Internet, with all of its problems, used to offer an unprecedented degree of connectivity between widely different networks, whereas now it is drifting inexorably towards an InterCloud model.
The greater the concentration of resources, the likelier it is that the next vendor-specific issue will affect a disproportionately large chunk of global services.
Perhaps we should consider building conventional datacenters as DR solutions for cloud-based services, much like keeping printed books in archives in addition to their much more convenient digital copies.
Take this post as either a rant or a rumination.
I'm interested in your take on this subject.
Well, relying only on digital services that are always connected, that grow ever more intertwined and that allow outages to propagate risks a ‘great forgetting’, whether temporary or, to some degree, more permanent.
Limited outages are pretty common, mostly for networking reasons, but I suppose that if there were a big outage, with everyone’s HA/DR measures kicking in all at once, we could see a ‘borked everything’ scenario via cascading failure. It could happen during times of heightened tension, and it’s not outside the bounds of possibility that one side or another would give it a push in the right place.
The cloud business model is very much like Coke/Pepsi bottling (with different ‘secret recipes’, with love from MS/AWS): ultimately you’ll be drinking cold, delicious Equinix and similar from local providers, as technical persuasion and delivery are separated. But if it’s all sitting on SPOFs delivered by two vendors (or, charitably, three if we include Google), then it will all be susceptible. Maybe we need Tencent, Alibaba or Huawei. Or perhaps legislation in different regions to make it incumbent on the operator to prove that all the required dependencies are resolved in the event of failure (never easy).
Yeah, you really want some non-cloud-dependent compute. Perhaps in the future, when everything runs on cloud and we’ve no economic way back, you’ll be able to have some cloud delivered to you in the form of a stack on a sealed unit (emergency ICE, just add electricity) as insurance for use if all the connected services fail.