California Emergency Management Agency

State Continuity Planning Program

DISCUSSION PAPER

Topic: Rating and Prioritizing an Organization's Functions for Continuity Planning

1.0 INTRODUCTION

One of the greatest challenges in continuity planning is deciding the level of responsiveness that an organization should adopt as the performance standard for its plan. If the plan is designed to respond too quickly, the cost of maintaining that response capability is excessive relative to the value at stake if a disruption occurs, and the plan may also be activated too often, which drives costs up further. If the plan responds too slowly, the very existence of the organization may be threatened (for non-government organizations), and in many cases the careers of its executives may be cut short abruptly. More specifically, how should an organization determine the response level that is appropriate for each function or major operation it supports, and how can it ensure consistency across all operations? These questions are not easy to answer, but this discussion paper offers some guiding principles and practical techniques for resolving some of these issues.

2.0 THE CONTEXT AND GOAL OF THE ANALYSIS

The context for addressing these questions is an organization contemplating how it can plan, in advance, to recover and resume its most time-critical and valuable operations should they be disrupted by any of a number of possible risk scenarios. Planning the emergency response in the immediate aftermath of a disruption (to protect and save lives, minimize personal injuries, and reduce damage to property) is a different planning activity and is not addressed directly here. Activation of a continuity plan typically begins after the proverbial smoke has cleared and the dust has settled, and senior executives have conducted a situation analysis or assessment.

This context also recognizes that continuity planning usually cannot address all of an organization's activities. Some activities must be judged more important than others, and some will not be covered by continuity plans because the values at risk do not justify the costs of coverage. If a disruption occurs, the design of a continuity plan anticipates that senior management and executives will form a senior activation team that:

- Assures itself that the most critical operations will be supported well by the activated continuity plan and, if not, decides on the fly how to recover these operations, probably by modifying the processes outlined in the plan;
- Initiates proactive oversight of all operations that are NOT covered by a continuity plan, so that disruptions to those functions do not distract the continuity plan team's efforts. In fact, the senior activation team may redirect resources from non-critical operations to those covered by a continuity plan to expedite their recovery; and
- Depending on the circumstances of the disruption, may decide to give higher priority to operations that were not originally considered critical and are not covered by a continuity plan. These efforts will necessarily be initiated on the fly, without any advance thought or preparation.

Another context is how senior executives prioritize the organization's efforts when a disruption occurs. The role of executives in a crisis is to direct the organization's resources as best they can, using the information at hand during the crisis. A continuity plan should inform these decisions by offering a priority order, i.e., the best joint assessment by management and staff of which operations are most important. The situation addressed here is continuity planning in advance of a crisis: which operations are important enough that a plan should be constructed beforehand to reduce their exposure to operating risks and assure their rapid recovery and resumption under a variety of possible disruption scenarios? This discussion begins with a simple technique for addressing this question and then considers some of the issues that complicate decision-making. A continuity planning project manager can decide how much simplicity or complexity is appropriate for the planning team.

3.0 A SIMPLE APPROACH

One method employed by many planners, at least in early cycles of plan development, is to adopt a 24-hour (one business day) recovery time objective, or RTO. The goal is to encourage all managers to examine their operations and identify those functions that will cause serious harm if they are disrupted for more than one business day. Such functions are labeled essential, a term that implies to continuity planners both high value-add and a time-critical performance requirement. In this labeling or classification exercise it helps to focus on one concept at a time: in this case, the harm created if the operation does not occur as planned for more than a day. The qualifier "as planned" is important, because most organizations do not operate 24 hours a day, and few operate on weekends. Most have no difficulty taking three-day weekends for holidays about ten times a year. Ceasing operation for 24 clock hours is therefore not the criterion for the essential category. The criterion is disruption of operations as they would normally occur.
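The "as planned" test can be made concrete by counting only the scheduled operating hours that a disruption actually removes. The sketch below assumes a hypothetical schedule of 08:00 to 17:00 on weekdays and treats one business day as one day's scheduled hours. The names `scheduled_hours_lost` and `exceeds_one_business_day` are illustrative only and are not part of any planning standard.

```python
from datetime import date, datetime, timedelta

# Hypothetical normal schedule, assumed for illustration only:
# Monday-Friday (weekday() 0-4), 08:00 to 17:00.
WORKDAYS = {0, 1, 2, 3, 4}
OPEN_HOUR, CLOSE_HOUR = 8, 17
HOURS_PER_BUSINESS_DAY = CLOSE_HOUR - OPEN_HOUR  # 9 scheduled hours


def scheduled_hours_lost(start: datetime, end: datetime,
                         holidays: frozenset[date] = frozenset()) -> int:
    """Count scheduled operating hours inside [start, end).

    Assumes start and end fall exactly on the hour. `holidays` holds dates on
    which the organization would not have operated anyway.
    """
    lost = 0
    t = start
    while t < end:
        if (t.weekday() in WORKDAYS
                and t.date() not in holidays
                and OPEN_HOUR <= t.hour < CLOSE_HOUR):
            lost += 1
        t += timedelta(hours=1)
    return lost


def exceeds_one_business_day(start: datetime, end: datetime,
                             holidays: frozenset[date] = frozenset()) -> bool:
    """True if the disruption removes more than one business day of planned work."""
    return scheduled_hours_lost(start, end, holidays) > HOURS_PER_BUSINESS_DAY


# 24 clock hours from Friday 17:00 to Saturday 17:00: no planned work is lost.
print(scheduled_hours_lost(datetime(2024, 1, 5, 17), datetime(2024, 1, 6, 17)))  # 0
# Tuesday 08:00 to Wednesday 17:00: two full business days are lost.
print(exceeds_one_business_day(datetime(2024, 1, 9, 8), datetime(2024, 1, 10, 17)))  # True
```

The hour-by-hour loop is deliberately simple. A real schedule with shifts or seasonal peaks would replace the fixed constants.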

4.0 MEASURING HARM

A second dimension of this simple labeling exercise is deciding how to measure harm. Again, an easy approach relies on management's judgment to assess harm in several alternative categories. For government organizations, these include:

- Increased risks or threats to public safety and security;
- Loss of trust or respect by the public;
- Increased risk of civil disobedience;
- Increased threats to the economic or social welfare of the public; and
- Increased stress or duress to individuals who are at risk, such as the elderly, children, the sick, or those who are incarcerated.
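Under the simple approach, a function is essential if management judges that disrupting it for more than one business day would cause serious harm in any of these categories. The sketch below records that judgment. The record layout and the example function names (taken from the examples used later in this paper) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Harm(Enum):
    """The five harm categories listed above for government organizations."""
    PUBLIC_SAFETY = auto()
    PUBLIC_TRUST = auto()
    CIVIL_DISOBEDIENCE = auto()
    ECONOMIC_SOCIAL_WELFARE = auto()
    AT_RISK_INDIVIDUALS = auto()


@dataclass
class FunctionAssessment:
    """Management's judgment about one function (hypothetical record layout)."""
    name: str
    # Categories in which management judges harm would be serious if the
    # function were disrupted from its normal schedule for over one business day.
    harms_if_down_over_one_day: set[Harm] = field(default_factory=set)

    @property
    def essential(self) -> bool:
        # Simple approach: serious harm in any single category is enough.
        return bool(self.harms_if_down_over_one_day)


assessments = [
    FunctionAssessment("Emergency response coordination",
                       {Harm.PUBLIC_SAFETY, Harm.PUBLIC_TRUST}),
    FunctionAssessment("Long-term planning"),
]
essential = [a.name for a in assessments if a.essential]
print(essential)  # ['Emergency response coordination']
```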

Note that all of these criteria focus on service to the public, that is, to external customers (presuming the planning organization is a government agency). Sometimes another criterion is added: concern for the welfare and morale of the organization's own staff. This is a potentially complicating factor, because it suggests that continuity planning should be done, sometimes at great expense, to ensure that no disruptions occur to the livelihoods of the staff (and of the vendors and contractors who support the government's operations). For simplicity, it is better to formulate plan requirements on the basis of those whom the organization serves.

A criterion that sharpens the assessment of potential harm, while still using the five categories listed above, is whether disruption of the operation causes work to become backlogged or lost. Backlogged work means that a disrupted service can eventually be delivered, so the harm is a matter of delay. Lost work consists of services that will never be delivered. This criterion is not always easy to apply when examining operations, so the judgment is not as simple as the approach proposed above suggests. For example, consider a service that issues weekly welfare payments and is disrupted for one week. If the check issued a week later is for double the amount, is harm avoided? If the recipient uses the check to pay for food, probably not; if it is used to pay rent, perhaps so.

5.0 INTER-AGENCY OR INTRA-AGENCY SERVICES: INDEPENDENT JUDGMENTS

When a continuity planning team canvasses its operations to identify those that are essential, it will encounter some operations whose customers or beneficiaries are other government operations. Examples are payroll, accounts payable, computer services (including e-mail), and communication and networking services. How should the team decide whether disrupting these operations for more than 24 hours would cause great harm?

The simple answer is to require the operations that serve the public directly to determine their RTOs first. Then, as the organizational units responsible for direct public contact identify the minimum resources needed to restore their operations, they will produce the recovery time criteria for the supporting functions, such as the ability to communicate via e-mail.

6.0 THE RISK OF RANK-ORDERING FUNCTIONS

Some continuity planning managers approach prioritization by asking representatives of the divisions or branches participating in the planning to rank-order the functions performed in their business units. In other words, each representative identifies a single function that is the most important, another that is the second most important, and so forth. The project planning team then seeks to develop recovery strategies for all the top-ranked functions. This approach can lead to poor analysis for several reasons. First, depending on the nature of their work within the organization, some operating units may have several functions that are time-critical and increase the use and value of other units' functions, while others may have none at all. For example, a division or branch responsible for emergency response and coordination will have many essential functions, whereas a division or branch responsible for long-term planning will probably have none. Second, the number of top-ranked functions may simply reflect the participation rate in the planning process. If six out of ten divisions participate, there will be six top-ranked functions. However, if one division is represented by individuals from three branches, that division alone will report three top-ranked functions. Third, rankings can sometimes reflect the biases of the participants rather than the nature of the goods or services in question. For these reasons, if a rank-ordering method is used, close coordination, analysis, and monitoring of internal participation will be required to ensure the validity of the resulting data.

7.0 CONTEMPLATING SOLUTIONS FOR RECOVERY STRATEGIES

After a list of essential functions has been developed, the next stage in the planning process requires a review of the operating risk environment by contemplating a few basic disruption scenarios. With these risks in mind, the planners seek to identify mitigation efforts that reduce or eliminate the risks or their consequences, and to devise recovery strategies for the risks that remain. Developing recovery strategies calls for ingenuity and creativity, because recovery strategies can take many different forms. Examining this activity in detail is beyond the scope of this discussion, but several outcomes of the effort to devise or identify recovery strategies can be anticipated:

- Some recovery strategies will be very easy to accept because they are highly effective, simple to implement during a disruption, and inexpensive to maintain;
- Some recovery strategies will be very expensive; in general, strategies that satisfy shorter RTOs cost much more than strategies that satisfy longer recovery time criteria;
- For some essential functions, no remotely feasible or acceptable recovery strategy can be identified that satisfies the RTO requirement; and
- Some recovery strategies actually bundle several functions and may even include some non-essential functions.
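These outcomes appear naturally when candidate strategies are matched against RTOs. The sketch below picks, for each essential function, the cheapest strategy that restores it within its RTO. It reports any function with no feasible strategy as a vulnerability. The `Strategy` fields, the function names, and all costs and times are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A candidate recovery strategy (all fields are illustrative assumptions)."""
    name: str
    recovery_hours: float   # time to restore service once activated
    annual_cost: float      # cost to maintain the capability
    covers: frozenset[str]  # functions restored; may bundle several


def select_strategies(rto_hours: dict[str, float], strategies: list[Strategy]
                      ) -> tuple[dict[str, Strategy], list[str]]:
    """Pick the cheapest RTO-compliant strategy per function; list the gaps."""
    choices: dict[str, Strategy] = {}
    vulnerabilities: list[str] = []
    for function, rto in rto_hours.items():
        feasible = [s for s in strategies
                    if function in s.covers and s.recovery_hours <= rto]
        if feasible:
            choices[function] = min(feasible, key=lambda s: s.annual_cost)
        else:
            vulnerabilities.append(function)  # no strategy meets this RTO
    return choices, vulnerabilities


strategies = [
    # Fast and expensive; also bundles a non-essential "records" function.
    Strategy("Hot site", 4, 250_000, frozenset({"payments", "case intake", "records"})),
    Strategy("Warm site", 48, 60_000, frozenset({"payments", "case intake"})),
]
rtos = {"payments": 24, "case intake": 72, "licensing": 24}
choices, gaps = select_strategies(rtos, strategies)
print({f: s.name for f, s in choices.items()})  # {'payments': 'Hot site', 'case intake': 'Warm site'}
print(gaps)                                     # ['licensing']
```

Selecting per function ignores the savings a bundled strategy might offer across several functions. That trade-off is left to the planners' judgment.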

The overall conclusion from these observations is that, in spite of the planners' best efforts to analyze the organization's need to resume some disrupted functions or operations more quickly than others, the results of the continuity plan are far from consistent. Some vulnerabilities will be evident, in the sense that recovery criteria are not met for some functions. Conversely, some functions will be recovered faster than required. An organization may deliver many services through field offices, a few of which are essential and require a recovery plan for the field offices. As a result, the other, non-essential services provided through those offices will also be recovered quickly.

8.0 MORE ANALYSIS OF FUNCTION REQUIREMENTS

For those functions that planners initially agreed were essential but for which no feasible recovery strategy could be identified, several additional steps might be taken to address the vulnerability:

- More analysis of the value the function adds can be performed, to quantify who is harmed and by how much. This analysis is called a business impact assessment or business impact analysis (BIA). It helps refine the understanding of how much might be spent to create an effective recovery strategy when the initial costs appear too high.
- One possible outcome of a BIA is a better understanding of how the public benefits from the services delivered, and it may be possible to identify subsets of consumers or beneficiaries who depend more or less on the service. In short, the scale of operations to be recovered may be reduced, with an associated reduction in the cost of the recovery strategy.
- The operations associated with an essential function may be re-engineered (often called business process improvement) so that the most time-critical and highest value-add activities can be resumed more easily or cost-effectively.

A BIA typically introduces several categories for measuring harm, or the costs associated with disruptions, and encourages more careful scrutiny of initial assessments. Some basic categories for measuring disruption costs include:

- The costs of transferring operations to another site and occupying that site while the original location is repaired (or a replacement is identified);
- The loss of revenues associated with services or goods that are not delivered;
- The costs of making amends to a customer constituency because services were disrupted;
- Additional production costs, such as hiring temporary workers to help complete backlogged work or paying employees or contractors for overtime; and
- The costs of long-term imbalances between production capacity and the demand for services. If the customer base does not return after service operations have been restored, the organization incurs costs for under-utilized capacity.
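A minimal sketch of how these categories might be tallied for an outage of a given length follows. The resulting total gives a rough ceiling on what a recovery strategy could be worth. Every field name and figure is a hypothetical assumption, and the simple linear cost model is an illustrative choice, not a BIA method prescribed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    """Hypothetical BIA inputs for one function; every figure is an assumption."""
    relocation_fixed: float      # one-time cost of moving to another site
    relocation_per_day: float    # cost of occupying the alternate site
    lost_revenue_per_day: float  # services or goods not delivered
    amends_fixed: float          # making amends to the customer constituency
    backlog_cost_per_day: float  # overtime/temporary staff per day of backlog
    idle_capacity_cost: float    # long-term cost if customers do not return


def disruption_cost(p: CostProfile, days_down: int, relocated: bool = True,
                    customers_return: bool = True) -> dict[str, float]:
    """Tally disruption costs by BIA category for an outage of days_down days."""
    costs = {
        "relocation": (p.relocation_fixed + p.relocation_per_day * days_down
                       if relocated else 0.0),
        "lost revenue": p.lost_revenue_per_day * days_down,
        "amends": p.amends_fixed if days_down > 0 else 0.0,
        "additional production": p.backlog_cost_per_day * days_down,
        "capacity imbalance": 0.0 if customers_return else p.idle_capacity_cost,
    }
    costs["total"] = sum(costs.values())
    return costs


profile = CostProfile(20_000.0, 1_500.0, 5_000.0, 10_000.0, 2_000.0, 50_000.0)
print(disruption_cost(profile, days_down=5)["total"])                          # 72500.0
print(disruption_cost(profile, days_down=5, customers_return=False)["total"])  # 122500.0
```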

A BIA may also reveal effects of disruptions that are positive. For example, revenue streams may not cease even though the customers who normally receive services are not receiving them. Or, if an organization's payment processes are disrupted, additional interest may accrue to operating reserves until the payment processes resume.

9.0 ANOTHER LOOK AT THE RTO

This discussion concludes with a look at the RTO criterion for establishing whether functions are essential for continuity planning purposes. As promised earlier, the initial focus was on simple approaches. Now consider some of the complications that can arise. If a 24-hour RTO is adopted, the implicit assumption is that functions which add value but are less time-critical than 24 hours will be recoverable without any pre-planning. One way or another, so the thinking goes, they will be recovered, whatever the cause of the disruption, without any advance preparation. Clearly, this reasoning can be flawed. Some disrupted functions may cause no harm for the first five, ten, or fifteen business days that they are down. Yet they cause great harm on the sixth, eleventh, or sixteenth day without operations, and they cannot be recovered within an acceptable time without advance preparation. In other words, the critical path for the fastest available recovery option is longer than the RTO for the function. This implies that the decision to include a value-adding function (one that causes great harm if disrupted) in a continuity plan cannot be made without considering recovery options, because adequate ad hoc recovery options may not exist regardless of the RTO value.
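The critical-path test above reduces to one subtraction: slack = RTO - activation delay - critical-path recovery time. The same arithmetic underlies the slack-time rule this section applies later, when plan activation is delayed. Negative slack means the function cannot be recovered in time without advance preparation. The sketch below is a minimal illustration. The class, the function names, and the example operations are hypothetical, and times can be hours or business days as long as they are used consistently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryOption:
    """Hypothetical summary of one operation's fastest recovery path."""
    name: str
    rto: float            # maximum tolerable disruption
    recovery_time: float  # critical path of the fastest option actually available


def slack(option: RecoveryOption, activation_delay: float = 0.0) -> float:
    """Time to spare before the RTO is exceeded; negative means it will be missed."""
    return option.rto - activation_delay - option.recovery_time


def needs_advance_plan(option: RecoveryOption, activation_delay: float = 0.0) -> bool:
    """True when the fastest available (ad hoc) recovery cannot meet the RTO."""
    return slack(option, activation_delay) < 0


def recovery_order(options: list[RecoveryOption],
                   activation_delay: float) -> list[RecoveryOption]:
    """Order disrupted operations so the one with the least slack comes first."""
    return sorted(options, key=lambda o: slack(o, activation_delay))


# Harmless for 5 business days, but ad hoc recovery takes 8: plan in advance.
print(needs_advance_plan(RecoveryOption("Monthly data processing", rto=5, recovery_time=8)))  # True

# Two operations with 36-hour RTOs, plan activated 6 hours after the disruption.
a = RecoveryOption("Operation A", rto=36, recovery_time=2)
b = RecoveryOption("Operation B", rto=36, recovery_time=24)
print([(o.name, slack(o, 6)) for o in recovery_order([a, b], 6)])
# [('Operation B', 6), ('Operation A', 28)]
```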

The situation just described, in which no ad hoc recovery option can meet the RTO, is more likely to arise with highly complex administrative processes, such as functions that involve extensive computer-based data processing. Transferring these operations from one site to another and re-establishing network communications with adequate input and output capacity is not a trivial task. One need only consider the months of planning that go into a normal office move. A continuity plan may call for a comparable move with only a few hours' advance notice.

A second consideration in using RTO values as a criterion for categorizing functions is that a continuity plan is rarely activated the moment a disruption occurs, yet the RTO is defined as the maximum tolerable time that an operation can be disrupted. In some cases the seriousness of the situation is not apparent for some time, and precious hours may be lost while the senior executive team waits for clarification. When a power outage occurs, for example, it can take considerable time to determine the cause and how long a remedy will take. The answer may not be forthcoming for hours, and it may change several times until, days later, power is restored reliably and a final answer is known. This view emphasizes two points:

1. During the design and development of a continuity plan, recovery strategies for essential functions that cause great harm when their RTOs are exceeded must accommodate the possibility that the decision to activate may not occur immediately after a disruption is detected.
2. If a disruption or crisis has occurred and the senior executive team is deciding how to prioritize recovery of disrupted operations, RTO values alone are not the best criterion. Some consideration should be given to the critical path requirements of the recovery strategies. Suppose two operations of comparable seriousness both have 36-hour RTOs, and the activation decision is made 6 hours after the disruption occurs. One operation requires 2 hours to restore and the other requires 24 hours, leaving slack times of 28 hours and 6 hours, respectively. The operation with the least slack time (the second one, with six hours) should receive the higher priority.

A third complicating consideration in categorizing functions for inclusion in a continuity plan is that some operations deliver value in cycles rather than steadily. An operation may have an essential, value-adding function that takes three days to perform but is performed only once a month. It is therefore essential for only three days a month. The typical convention in continuity planning is to define a function as essential if a disruption under the worst case imaginable can cause serious harm, i.e., a disruption that occurs during the three-day production period in this example.

10.0 CONTINUITY PLANNING: MANAGEMENT UNDER EXCEPTIONAL CIRCUMSTANCES

For many operations, especially those that routinely experience specific types of disruptions that threaten their value-adding functions, organizations have often already put in place readily implemented recovery strategies as a matter of good business practice. For example, computer server crashes that formerly caused much disruption are now handled routinely in many organizations. Yet the recovery strategies employed may address only a narrow range of disruption possibilities, and the organization remains vulnerable to loss of servers from more catastrophic events.

The goal of continuity planning is to identify those operations that add value and that, if disrupted from their normal schedule of performance in any of a variety of ways, can cause serious harm to the organization's customer base, the public at large, or the organization itself. If a disruption occurs, at one extreme of the executive's toolbox are standard operating procedures for recovery, and these circumstances need only a routine mention in a continuity plan. At another extreme are operations that are sufficiently robust, both in their operating structure and in the demands on their output, that recovery strategies can be formulated on the fly with adequate time for resumption, and no special provisions are necessary. Finally, there are operations that add value and have RTOs short enough that, if no provisions are made in advance for their recovery, executive management must operate in severe crisis-management mode. These are all candidates for a continuity plan, and the latter may receive high priority for funding of better recovery capabilities in out years.
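The three situations above can be summarized as a small decision rule for one operation under one disruption scenario. The sketch below is an illustration only: the `Treatment` labels paraphrase the text, and the parameter names (including `activation_delay`, which reflects the delayed-activation point in Section 9.0) are hypothetical.

```python
from enum import Enum


class Treatment(Enum):
    """The three situations described above (labels are paraphrases, not official terms)."""
    ROUTINE_SOP = "standard operating procedure; routine mention in the plan"
    AD_HOC_ADEQUATE = "robust enough to recover on the fly; no special provisions"
    ADVANCE_PROVISIONS = "short RTO, no adequate ad hoc path; high funding priority"


def classify(sop_covers_scenario: bool, ad_hoc_recovery_time: float, rto: float,
             activation_delay: float = 0.0) -> Treatment:
    """Place one operation, for one disruption scenario, into a treatment group.

    sop_covers_scenario should be True only if the existing procedure handles
    this scenario (e.g. a single server crash), not merely a narrower event.
    """
    if sop_covers_scenario:
        return Treatment.ROUTINE_SOP
    if activation_delay + ad_hoc_recovery_time <= rto:
        return Treatment.AD_HOC_ADEQUATE
    return Treatment.ADVANCE_PROVISIONS


# The server SOP covers a single crash, but not loss of the whole server room.
print(classify(True, ad_hoc_recovery_time=72, rto=24).name)   # ROUTINE_SOP
print(classify(False, ad_hoc_recovery_time=72, rto=24).name)  # ADVANCE_PROVISIONS
```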
