In my previous blog posting I defined incident response, described why it is so important, and then presented six of the most serious mistakes in incident response that people and organizations too often make. In this posting I'll cover the rest of these mistakes.
• Failure to systematically respond to incidents. Systematically responding to incidents is one of the most critical factors in successful incident response efforts because it involves keeping on track and following procedures based on proven methodologies. Too often, however, incident response efforts fail to develop a suitable methodology (such as the widely used PDCERF methodology that Russ Shumway and I describe in our book on incident response), or if they do, they do not sufficiently train incident response staff to follow procedures based on the methodology they use. A lack of detailed procedures, which forces incident response staff to rely on ad hoc judgment instead, is another potential cause. A variety of undesirable consequences, such as failing to perform critical steps, can result.
• Neglecting to adequately test procedures. Untested procedures are a time bomb waiting to go off. Inaccuracies and omissions need to be identified and corrected well before the procedures are used in real incidents. Additionally, testing procedures provides training for incident response team members, who need to know what each procedural step is and how to perform it proficiently under the many conditions and developments that occur during incidents. Without sufficient practice, team members are likely to respond to incidents more slowly and with more errors.
• Failing to work cooperatively with other groups/organizations. Incident response almost always requires cooperation among individuals, groups and organizations. Information security and HR staff need to cooperate, for example, if a data security breach involving HR servers or an insider attack has occurred. Information security and public relations staff members need to cooperate whenever an incident is so big and/or has so much impact that the press is likely to learn of it and report on it. Cooperation between information security staff and law enforcement must also sometimes occur. Lack of internal and external cooperation has several negative consequences: prolonged incidents (something that escalates the financial costs of incidents), unnecessary duplication of effort, unnecessary barriers that obstruct incident response efforts, and so on.
• Communication deficiencies. Communication is one of the most essential elements in incident response, but too often communication during incidents is deficient. Procedures defining what type of information must be communicated to whom need to be written, but in many cases incident response procedures do not address critical communication requirements. One of the most important parts of the incident response communication process is escalating critical information to executive-level management. Blindsiding management often results in very negative consequences. Interfacing with human resources, legal and public relations is also too often overlooked. Communication links between an incident response team and major stakeholders are frequently not established, either. Additionally, out-of-band communication channels must be planned and tested in case primary communication channels fail.
• Lack of top-level management support. All authority in organizations ultimately traces to top-level management. Lack of support at this level often leads to failure of an incident response effort, frequently because management does not understand the importance of incident response and therefore does not allocate sufficient financial and labor resources. Lack of understanding and support can result from a lack of communication with executive-level management, failure to incorporate top-level management's intent and direction into incident response requirements, policy and procedures, and failure to educate executive-level management concerning the business need for having an efficient incident response capability.
• Lack of technical expertise within an incident response team. Many of today's incidents are extremely complex. Malicious code written by dozens of software development experts and generously funded by governments around the world is no longer at all uncommon. Handling today's incidents requires extremely high levels of technical expertise, to the point that merely good technical expertise is insufficient. Lack of the necessary level of technical expertise results in a variety of negative outcomes, including failure to identify incidents; misdiagnosed incidents; incidents in which systems are not completely cleaned of malware, which unnecessarily prolongs them; incidents that are handled incorrectly, causing damage and disruption to proliferate; and more.
• Lack of forensics training. Computer forensics and incident response increasingly go hand in hand. Every member of an incident response team needs to have at least some forensics training, and a few members need to be experts in this area. A dearth of forensics skills results in problems such as missing or mishandled evidence, incomplete or inaccurate analysis of systems and data, legal woes such as having potential evidence thrown out of court, and more.
• Failing to integrate efforts sufficiently with business continuity/disaster recovery efforts. Incident response and business continuity/disaster recovery have much in common. Lack of integration can result in incidents being handled poorly. For example, in denial of service (DoS) attacks, having business continuity staff involved brings a certain set of skills and expertise in dealing with operational disruptions that incident response team members are not likely to possess. Lack of integration between incident response and business continuity will also create barriers that are likely to result in business continuity staff not being involved in handling DoS attacks or, if they are involved, becoming available only well after each incident has been detected. Without integration, unmitigated risk that falls between incident response and business continuity risk mitigation efforts is also likely to be present.
• Failing to determine, quantify and communicate benefits of incident response to executive-level management. Executive-level management tends to be results-oriented. If the head of an effort such as an incident response effort does not show tangible results (particularly in terms of cost savings), executive management is likely to view that effort as a failure. Developing, measuring and communicating incident response-related metrics are thus essential. A few key incident response metrics include the amount of financial loss per security breach, the latency in responding to each critical security breach, the amount of business-disruptive down time per security breach, and the number of hours of labor per type of security breach (low, medium and high impact). Each of these metrics should diminish in numerical value as the proficiency of an incident response effort grows over time.
• Failure to consider legal aspects of incident handling. Many legal considerations apply to incident response efforts. Failure to conform to legal requirements can cause considerable undesirable fallout, including violations of law that lead to fines or even jail terms. For example, making a forensics backup of a hard drive of a banking transaction system and leaving it on one's desk over the lunch hour is a violation of the Gramm-Leach-Bliley Act. Additionally, inattention to legal requirements can result in an inability to convict computer criminals. Examples include failing to collect critical evidence needed to convict perpetrators and tainting evidence such that it is ruled invalid in a court case.
• Failure to take a proactive stance. Incident response is by definition reactive in nature, but without a strong proactive focus, incident response efforts languish. Incident response managers need to think two or three years ahead in terms of what kind of training, software, and hardware will be needed and how incident response-related policies and procedures will need to be changed. Incident response team members need to anticipate attacks and incidents that are likely to occur in the future and how to deal with them. Without a proactive focus, gaps between actual and needed incident response capabilities are likely to grow.
• Failing to use automation to facilitate the incident response process. Lack of incident response automation is, unfortunately, commonplace in organizations around the world. Today's incidents are generally too complex to handle without automation. Reverse engineering and forensics efforts now typically require at least some degree of automation. Keeping track of and archiving what often amounts to volumes of information is a daunting task without automation. Keeping those who need to know in the loop in a timely manner is imperative; automated incident updates help considerably in this endeavor. Remembering what to do next and how to do it is also frequently a major challenge. Automation, such as the incident response facilitation that some of the best SIEM tools offer, has thus become a necessity.
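As an illustration of the metrics named in the bullet on communicating benefits to executive-level management, here is a minimal Python sketch of how they might be computed from incident records. Everything in it is an assumption made for illustration: the Incident record, its field names, the summarize function, the choice to treat "high impact" as "critical", and the sample figures are all hypothetical and do not come from any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative only."""
    impact: str                  # "low", "medium" or "high"
    detected: datetime           # when the breach was detected
    response_started: datetime   # when the team began responding
    financial_loss: float        # estimated loss, in any one currency
    downtime_hours: float        # business-disruptive down time
    labor_hours: float           # staff hours spent on the incident


def summarize(incidents):
    """Average the four metrics over one reporting period."""
    if not incidents:
        raise ValueError("no incidents to summarize")

    # Assumption: "critical" breaches are those rated high impact.
    critical_latencies = [
        (i.response_started - i.detected).total_seconds() / 3600
        for i in incidents
        if i.impact == "high"
    ]

    labor_by_impact = defaultdict(list)
    for i in incidents:
        labor_by_impact[i.impact].append(i.labor_hours)

    return {
        "mean_loss_per_breach": mean(i.financial_loss for i in incidents),
        "mean_critical_response_latency_hours": (
            mean(critical_latencies) if critical_latencies else None
        ),
        "mean_downtime_hours_per_breach": mean(
            i.downtime_hours for i in incidents
        ),
        "mean_labor_hours_by_impact": {
            impact: mean(hours) for impact, hours in labor_by_impact.items()
        },
    }


# Made-up sample data for one reporting period.
sample = [
    Incident("high", datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 13, 0),
             50000.0, 8.0, 40.0),
    Incident("low", datetime(2024, 2, 3, 14, 0), datetime(2024, 2, 3, 15, 0),
             1000.0, 0.0, 4.0),
]
report = summarize(sample)
print(report)
```

Running the same summary over successive reporting periods and comparing the results is one way to show management whether each figure is in fact diminishing as the effort matures.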
In many ways, proficiently responding to incidents is the ultimate information security challenge. As mentioned previously, today's incidents have become extremely sophisticated, and attackers are more persistent in their efforts now than ever before. The number of potential serious mistakes in incident handling is enormous; only a few of the worst ones have been presented here. Fortunately, most of the mistakes discussed in this blog entry are not all that difficult to correct. For example, communication deficiencies can be fixed by changing procedures and establishing (or improving) communication links with other groups and functions that need to be involved in the incident handling process. But above all, having a proactive focus is likely to improve an incident response effort more than anything else. In incident response, nothing is more important than planning and testing. A truly proactive incident response effort is unlikely to be characterized by the types of mistakes that have been discussed here.