
After the Panic, Between Outages

Oct 4, 2013


These days, network and system outages make mainstream news headlines. When IT infrastructure goes down, so do all the sites and applications that depend on it, and a ripple of annoyance turns into frustration, disgruntlement and lost productivity. Meanwhile, engineers, PR officials, and service providers all scramble to contain the damage and fix the problem and its effects, often heading into full-on panic mode when repairs take too long or introduce new problems.

Once the dust settles, mitigation and remediation start, with the goal of preventing the same root cause from recurring and of introducing new processes that keep the response from degenerating into ad hoc reactions and eventual panic. The problem is that it's impossible to predict the future. It's always smart to prevent a known failure from happening again, but unknown emergencies are the norm, and no plan or process can fully address every new issue that pops up during an unpredicted incident. So what should be done after an emergency to increase the odds of a more successful response in the future, whatever may happen? Here are some tips:

Get some experienced Ops people. There are plenty of people who have gone through a decade of system errors and failures, and who have the experience and temperament to handle these stressful outages. The caveat is that not everyone is talented at everything, and here we are talking about people who can make good decisions with limited information - NOT software developers. Knowing code doesn't make you valuable during outages. Patton was a good general during wars, but he wasn't a peacetime leader. He excelled in the moment. During an outage, you need Pattons.

Responses should be in minutes, not hours. Make sure that you have the right infrastructure, availability and communications tools to be able to mobilize your teams immediately.

Don't just reboot. System restarts are a knee-jerk reaction. They are useful much of the time, but they encourage a "ctrl-alt-del" mentality that can crowd out other ways of initially addressing the problem.

Continuously flesh out the FMEA (failure mode and effects analysis) and use it to record history. While the specifics differ with each incident, there are general categories of causes and likely scenarios. For instance, broken master-slave connections have been around for years in a whole variety of situations. It's hard to predict the future, but this kind of issue will happen at some point. Even if you've never gone through it before, make sure you've done your homework and know what kinds of emergencies you'll encounter.

Be very careful of "improving" reliability. A lot of the time, the added complexity has the opposite effect.

Gather knowledge from beyond IT outages, especially from other disaster response fields. For example, plan to structure responses in a way that allows for flexibility in command and final decision-making. During an outage you need an incident commander whose word is final. It won't be the CEO or CTO or other big shot. It'll be a person on the ground.

Maintain excellence in your customer outreach. You win more by fixing something that goes wrong than by downplaying what went wrong.

The next outage will be different and it will be a surprise, but it can be handled positively. You need a plan and you need to be able to launch it quickly when the next outage happens. That plan is about people being able to make informed on-the-spot decisions via well-thought-out, flexible processes, not about technology.

-Dr. Sukh Grewal, CEO, Grey Wall Software LLC, developers of Veoci


