
After the Panic, Between Outages

Oct 4, 2013


These days, network and system outages make mainstream news headlines. When IT infrastructures go down, so do all the sites and applications that depend on them, and a ripple of annoyance turns into frustration, disgruntlement and lost productivity. Meanwhile, engineers, PR officials, and service providers all scramble to contain the damage and fix the problem and its effects, often heading into full-on panic mode when repairs take too long or introduce new problems.

Once the dust settles, mitigation and remediation start, with two goals: preventing the same root cause from recurring, and introducing new processes that keep the next response from degenerating into ad-hoc reactions and eventual panic. The problem is that it's impossible to predict the future. While it's always smart to prevent a known failure from happening again, unknown emergencies are the norm. No plan or process can fully address every new issue that pops up during an unpredicted incident.

So what should be done after an emergency to increase the odds of a more successful response in the future, whatever may happen? Here are some tips...

Get some experienced Ops people. There are plenty of people who have gone through a decade of system errors and failures, and who have the experience and temperament to handle these stressful outages. The caveat is that not everyone is talented at everything. Here we are talking about people who can make good decisions with limited information, NOT software developers. Knowing code doesn't make you valuable during outages. Patton was a good general during wars, but he wasn't a peacetime leader. He excelled in the moment. During an outage, you need Pattons.

Responses should be in minutes, not hours. Make sure you have the right infrastructure, availability and communications tools to mobilize your teams immediately.

Don't just reboot. System restarts are a knee-jerk reaction. They are useful much of the time, but they encourage a "ctrl-alt-del" mentality that can crowd out other considerations about how to initially address the problem.

Continuously flesh out the FMEA (failure mode and effects analysis) and use it to record history. While the specifics will differ with each incident, there are general categories of causes and likely scenarios. For instance, master-slave connection breaks have been around for years in a whole variety of situations. It's hard to predict the future, but this kind of issue will happen at some point. Even if you've never gone through it before, make sure you've done your homework and know what kinds of emergencies you're likely to encounter.

Be very careful of "improving" reliability. A lot of the time, additional complexity has the opposite effect.

Gather knowledge from beyond IT outages, especially from different disaster response fields. For example, plan to structure responses in a way that allows for flexibility in command and final decision-making. During an outage you need an incident commander, and what he or she says is the final word. It won't be the CEO or CTO or other big shot. It'll be a person on the ground.

Maintain excellence in your customer outreach. You win more by fixing something that goes wrong than by downplaying what went wrong.

The next outage will be different and it will be a surprise, but it can be handled positively. You need a plan and you need to be able to launch it quickly when the next outage happens. That plan is about people being able to make informed on-the-spot decisions via well-thought-out, flexible processes, not technology.

-Dr. Sukh Grewal, CEO, Grey Wall Software LLC, developers of Veoci
