Imagine the following: You work in the Infrastructure and Operations (I&O) department of a large retailer with a significant online e-commerce presence, and at noon today, a critical component of your infrastructure failed. While you scramble to find a solution, your company's Website that brings in tens of thousands of dollars a day is greeting all of your potential customers with an error message and the social networks are starting to buzz. But to make matters worse, today is not a normal day -- it's one of the highest volume days of the year.
This nightmarish scenario is an extreme example of downtime occurring at the worst possible moment for a business. But the truth of the matter is that there is never a good time for downtime, even planned downtime. As more and more employees become mobile and working-from-home policies relax, the concept of the 9-to-5 workday has eroded. Furthermore, as companies become more global, with employees, customers, partners, and suppliers spread across every time zone, it becomes increasingly difficult to schedule times when no one is affected.
Compounding the need for always-available services is the additional fact that your customers are rarely contained within the four walls of your organization. Today's IT departments are responsible for supporting two distinct sets of customers with different sets of needs: internal employees and external customers, partners, and suppliers. But these constituencies are more similar than you think: Just as your internal employees increasingly expect to perform their jobs anytime and anywhere, your external stakeholders share the same expectations in their ability to purchase, receive support, or access your data and systems. Forrester refers to this concept as the extended enterprise because a business function is rarely, if ever, a self-contained workflow within the infrastructure confines of the company.
There is no "easy button" when it comes to running always-on, always-available services; a blend of mature and stable processes, people, and, of course, technologies is required. For companies that have matured their approach to high availability and disaster recovery to the point where they are one and the same -- a concept that Forrester refers to as business technology resiliency -- it has taken years of refining policies, adapting responses to downtime, and securing the appropriate levels of investment.
While you can't transform your organization overnight into an always-on, always-available enterprise, these three initial steps will get you on the right path:
Step 1: Understand the Costs of Downtime of Critical Services
Securing investment in the capabilities required to run an always-on, always-available enterprise can be difficult, especially if you don't know your hourly cost of downtime. Because it is such a complex task, Forrester finds that the majority of companies have not calculated the cost of downtime for their critical services. Although trying to calculate the impact of an outage on reputation and customer retention can be a daunting task, just calculating revenue losses or productivity losses can be a worthwhile exercise.
Remember that not all outages are created equal: Timing and duration have a significant impact on the costs of downtime. In the original example, the outage was perfectly timed to impact the largest number of potential customers and thus have the largest business impact. What if this outage occurred at 3 a.m. ET instead of noon ET? Or what if it happened on a different day? Or, what if, instead of the Website being down for 4 hours straight on a single day, it was down for 30 minutes on eight different days? Shorter duration outages tend to be less disruptive than longer ones. All of this must be taken into account when calculating the impact of an outage.
Don't try to tackle the entire infrastructure all at once; break down your calculations on a service-by-service basis, starting with the most critical business services. Understanding the costs of downtime will guide the appropriate level of investment in downtime prevention for these services.
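As a rough illustration of a service-by-service calculation, the Python sketch below estimates only the direct costs of a single outage: lost revenue plus lost productivity. Every name and number in it is hypothetical (the Service fields, the demand_multiplier used to model timing, and the dollar figures); it is a starting point under those assumptions, not a Forrester model, and it deliberately ignores harder-to-measure effects such as reputation and customer retention.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical inputs for one business service (all values are assumptions)."""
    name: str
    hourly_revenue: float       # average revenue the service brings in per hour
    affected_staff: int         # employees who cannot work while it is down
    loaded_hourly_wage: float   # fully loaded cost of one employee-hour


def outage_cost(service: Service, hours: float, demand_multiplier: float = 1.0) -> float:
    """Estimate the direct cost of one outage.

    demand_multiplier scales revenue for timing: above 1.0 for a peak
    period, below 1.0 for a quiet one. The values used are illustrative.
    """
    lost_revenue = service.hourly_revenue * demand_multiplier * hours
    lost_productivity = service.affected_staff * service.loaded_hourly_wage * hours
    return lost_revenue + lost_productivity


# Made-up figures for a retail website.
web = Service("website", hourly_revenue=2000.0, affected_staff=20, loaded_hourly_wage=50.0)

peak = outage_cost(web, hours=4, demand_multiplier=3.0)       # noon on a peak day
overnight = outage_cost(web, hours=4, demand_multiplier=0.1)  # 3 a.m. on a normal day

print(f"4-hour outage at peak:   ${peak:,.0f}")
print(f"4-hour outage overnight: ${overnight:,.0f}")
```

With these invented inputs, the same four-hour outage costs several times more at peak than overnight, which is the timing effect described above. Running the function once per service gives a ranked list to start from.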
Step 2: Focus Availability on the End-to-End Service, Not on Infrastructure Components
Many companies rigorously track server uptime and storage uptime, but few succeed in tracking a single service's uptime end to end, that is, across every infrastructure and software component that works together to deliver that service. This, however, is the single most important thing that an IT department can track because it is the metric that gets closest to the actual customer experience. This is critical in "the age of the customer," where businesses compete and differentiate themselves on the experience of IT-enabled business processes and transactions more than ever.
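To see why component-level numbers can flatter the customer experience, consider a simplified model in which a service is up only when every component it depends on is up, and components fail independently of one another. Under those assumptions, end-to-end availability is the product of the component availabilities. The component names and figures in this Python sketch are invented for illustration; real services have redundancy and correlated failures that this model does not capture.

```python
from math import prod


def end_to_end_availability(component_availabilities):
    """Availability of a service that needs every component up at once.

    Assumes a simple serial dependency chain with independent failures.
    """
    return prod(component_availabilities)


# Hypothetical components behind one service, each with a respectable uptime.
components = {
    "load balancer": 0.9999,
    "web servers": 0.999,
    "application": 0.999,
    "database": 0.999,
    "storage": 0.9995,
}

service = end_to_end_availability(components.values())

print(f"Best component:  {max(components.values()):.2%}")
print(f"Worst component: {min(components.values()):.2%}")
print(f"End-to-end:      {service:.2%}")
```

In this model the end-to-end figure is always lower than that of the weakest component, so a dashboard full of healthy component metrics can coexist with a service that customers experience as less reliable.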
Step 3: Match Business Objectives to the Right Mix of Technologies
Once you've calculated your cost of downtime and shifted your focus to end-to-end availability, the next step is to select the right technologies to support your critical services. While there are many technologies that can support the always-on, always-available extended enterprise -- such as active-active architectures, rapid virtual machine rebooting, application and service monitoring solutions, or cloud-based services -- the difficult part is finding an approach that simultaneously supports your availability objectives and matches what the business is willing to pay to protect critical services. Many enterprises find it useful to group services or applications into tiers of criticality and assign standard recovery time objectives (RTOs) and recovery point objectives (RPOs) as well as service-level agreements (SLAs) for availability. Organizations can then map appropriate technologies to the tiers of criticality using the business requirement.
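One way to picture the tiering exercise is as a small lookup table that ties each tier to standard objectives and a candidate approach, plus a rule for placing a service in a tier. The Python sketch below is hypothetical throughout: the tier names, RTO and RPO values, availability targets, cost thresholds, and the pairing of approaches to tiers are assumptions chosen for illustration, and each organization would set its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One tier of criticality with its standard objectives (illustrative values)."""
    name: str
    rto_hours: float            # recovery time objective
    rpo_hours: float            # recovery point objective
    availability_target: float  # availability SLA
    approach: str               # candidate technology for this tier


TIERS = [
    Tier("tier 1", rto_hours=0.25, rpo_hours=0.0, availability_target=0.9999,
         approach="active-active architecture"),
    Tier("tier 2", rto_hours=4.0, rpo_hours=1.0, availability_target=0.999,
         approach="rapid virtual machine restart"),
    Tier("tier 3", rto_hours=24.0, rpo_hours=24.0, availability_target=0.99,
         approach="restore from backup"),
]


def assign_tier(hourly_downtime_cost: float) -> Tier:
    """Place a service in a tier by its hourly cost of downtime.

    The dollar thresholds are assumptions, not recommendations.
    """
    if hourly_downtime_cost >= 10_000:
        return TIERS[0]
    if hourly_downtime_cost >= 1_000:
        return TIERS[1]
    return TIERS[2]


# Hypothetical services and their estimated hourly cost of downtime.
for service, cost in [("website", 7000.0), ("order processing", 15000.0), ("intranet wiki", 200.0)]:
    tier = assign_tier(cost)
    print(f"{service}: {tier.name}, RTO {tier.rto_hours}h, RPO {tier.rpo_hours}h, {tier.approach}")
```

Driving the assignment from the cost of downtime keeps the spending on each tier tied to what an outage of that service would cost the business.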
100 Percent Uptime Is Virtually Impossible
In the end, the goal of the always-on, always-available enterprise is not 100 percent uptime; rather it is 100 percent service continuity for your most critical services. While there are many companies that have gotten very close, sustaining true 100 percent uptime for any extended period of time is virtually impossible -- there are too many things that can go wrong, from the infrastructure to the applications to natural disasters, human error, or even planned maintenance.
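It can help to translate an availability target into the downtime it still permits. The short Python sketch below does that arithmetic for a 365-day year; the function name and the list of targets are illustrative choices, not figures from Forrester.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a given availability target still permits over a period."""
    return (1.0 - availability) * period_hours


for target in (0.99, 0.999, 0.9999, 0.99999):
    hours = allowed_downtime_hours(target)
    print(f"{target:.3%} availability allows {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Even a 99.99 percent target leaves less than an hour of downtime per year, which is why the emphasis belongs on when that downtime falls and how quickly service is restored.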
Since some downtime is inevitable, it's important for you to shift your attitude from reacting to downtime toward proactive planning, good processes, and preventive efforts. You may not be able to achieve 100 percent uptime, but you can at least strive to make services available when your customers most need them and have rapid response measures in place to make sure services are brought back online as quickly as possible.
Rachel Dines is a senior analyst at Forrester Research, serving Infrastructure and Operations professionals. She will be speaking at Forrester's upcoming Infrastructure & Operations Forum, May 24-25, in Las Vegas.