How to become a cybersecurity RSO

Frederick Scholl August 21, 2019

What is an RSO? It is a “reliability seeking organization,” as described in Vanderbilt Professor Rangaraj Ramanujam’s book, Organizing for Reliability.

We tend to think of cybersecurity as black and white, breach or no breach. We often focus on architecture, threats and defenses. In fact, we should also be concerned with the reliability of the security program. Here we define reliability as including performance consistency and resiliency. “Fault tolerant” is another descriptive term.

Many types of organizations have already developed highly reliable business processes. Achieving such goals includes both strategy and execution. I contend that much can be learned from these organizations... and from venturing outside the security bubble.

Security = Reliability

This blog post was first inspired by a lecture I heard in Nashville by Professor Ramanujam. The post is all about learning from outside the security silo.

The lecture itself was hosted by the Nashville Association of Contingency Planners (ACP) Chapter; the topic was High Reliability Organizations. Today’s security agenda overlaps with that of groups such as ACP; think of ransomware attacks and DDoS attacks, to name two. I subsequently followed up by reading Ramanujam’s book.

This post is a condensed version of the lecture and the book, as I think it applies to information security.

What we want from a good security program is business reliability. We want to stop unplanned work whether from breaches of confidentiality, data integrity or availability.

With digital data and technology integrated into every business process, the reliability of those business processes is totally dependent on the reliability of the information security program.

Many professionals have focused on security architectures, including technology, people and processes.

Some estimate that over 2,500 security startups exist, with more jumping into the field every week. But few such startups are focused on achieving high-reliability security.

One exception is the category of “breach and attack simulation” tools that facilitate continuous testing of controls (Verodin, Cymulate, SafeBreach and others). This post argues that we need to focus more on building reliable security programs.

The starting point is the concept of an HRO, or High Reliability Organization. Such organizations are distinguished from RSOs (Reliability Seeking Organizations) and from average organizations that are not focused specifically on high reliability.

I believe most organizations today are not yet RSOs or HROs regarding cybersecurity. More on how to become a cybersecurity RSO or HRO later.

What is Reliability?

What is reliability in this context? That the system will not fail to do what is expected. There are two sides to the definition. First is the logic of anticipation. Have the appropriate controls and metrics been built in?

Second is the logic of resilience. Is the system capable of containing the results of a breach? And equally important, does the security program emerge from a breach stronger than before?

Unfortunately, breaches (and other failures) often lead to finger pointing, executive dismissals and everything else but real improvements.

What things are we looking for in a reliable security program?

First: performance consistency with low variance. Today’s focus is on periodic consistency based on annual or quarterly audits.

Second: intermediate events and near misses must be tracked. Too often, they are put on the bottom of the work queue for future investigation.

Third: resilience both after the breach event and before. Today’s definitions of security resilience emphasize responding to attacks only after they have hit the headlines.
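To make the first point concrete, here is a minimal sketch comparing a control measured every week with the same control sampled only at quarterly audits. The patch-compliance figures are invented for illustration, and names such as `weekly_compliance` and `consistency_report` are hypothetical, not taken from the book or the lecture.

```python
from statistics import mean, pstdev

# Hypothetical weekly patch-compliance percentages for one quarter
# (illustrative numbers only): compliance dips mid-quarter and
# recovers just before the audit.
quarter_pattern = [98, 97, 95, 91, 86, 80, 78, 83, 89, 94, 97, 98, 99]
weekly_compliance = quarter_pattern * 4  # four identical quarters, 52 weeks

# A quarterly audit sees only the last week of each quarter.
audit_samples = weekly_compliance[12::13]


def consistency_report(samples):
    """Summarize the average, the spread and the worst observed value."""
    return {
        "mean": mean(samples),
        "stdev": pstdev(samples),
        "worst": min(samples),
    }


audit_view = consistency_report(audit_samples)
weekly_view = consistency_report(weekly_compliance)
```

In this made-up data set the audit view shows a perfectly steady control, while the weekly view exposes a sizable spread and a much lower worst-case week. That gap is what periodic measurement hides.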

A key learning from HRO research is the importance of a systems approach to reliability. While popular accounts of breaches tend to blame “the operator,” “the admin,” “the outsourcer” or “the CISO,” real incidents have many causes.

This point is made very effectively in Josephine Wolff’s book, You’ll See This Message When It Is Too Late. Wolff presents a blow-by-blow analysis of recent security breaches, illustrating clearly that each has many causes.

Graeme Payne’s recent book, The New Era of Cybersecurity Breaches, explains exactly what happened in the 2017 Equifax breach. Conventional wisdom is that the company failed to patch an Apache Struts instance.

Payne’s book documents how a failure to promptly forward one email message led to this incident. A reliable, fault-tolerant security system would not be dependent on one human quickly forwarding any message.
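One way to picture that kind of fault tolerance is a notification that escalates on its own when nobody acknowledges it. The sketch below is only an illustration under assumed names (`Alert`, `escalate`, the role titles and the 24-hour step are all hypothetical); it does not describe any company's actual process.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Alert:
    summary: str
    # Ordered escalation chain; every role name is a hypothetical example.
    chain: List[str]
    acknowledged_by: Optional[str] = None
    notified: List[str] = field(default_factory=list)


def escalate(alert, hours_elapsed, hours_per_step=24):
    """Return everyone who should have been notified by now.

    While the alert is unacknowledged, one more person in the chain is
    notified for every `hours_per_step` that passes. Once someone
    acknowledges it, escalation stops.
    """
    if alert.acknowledged_by is not None:
        return alert.notified
    steps = min(len(alert.chain), 1 + int(hours_elapsed // hours_per_step))
    alert.notified = alert.chain[:steps]
    return alert.notified


alert = Alert(
    summary="Critical vulnerability advisory",
    chain=["application owner", "patch team lead", "security director"],
)
```

The design choice is that silence is treated as a failure signal: if the first recipient never forwards or acknowledges the message, the system widens the audience by itself.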

How to Build an HRO

Three processes are found to be successful in building HROs. None of these will be new to security practitioners. However, I hope that the evidence that these processes work (and how they work) in other contexts will move those processes higher on the list of priorities in the security community.

The three processes are continuous learning and improvement (Chapter 7 of Organizing for Reliability); compliance processes vs. risk-based processes; and managing for high reliability.

Continuous learning is a key component of building a high-reliability process. HROs make use of the Disaster Incubation Model (DIM), which describes the six steps leading to disasters or reliability failures. Think of this as the risk management equivalent of the kill chain.

The DIM includes six steps:

1. Starting point

2. Incubation period

3. Precipitating event

4. Onset

5. Rescue and salvage

6. Full cultural adjustment

Continuous learning ideally takes place in the “incubation period” before any disaster event occurs. Key ideas here are vicarious learning (ISACs and ISAOs; peer intelligence services like smarthive.io) and learning from small failures.

I would be curious to know whether Equifax’s patch management processes had experienced other gaps prior to the Apache Struts-related patch disaster.

Rangaraj Ramanujam in his book Organizing for Reliability states this key concept: "...because near misses are generated by the same conditions that lead to large failures, if organizational decision-makers could identify and correct hazardous conditions through experiencing and learning from near misses, they may be able to reduce the likelihood that their organizations would experience major failures in the future."

Enough said on this point.

Compliance vs. Risk

Compliance vs. risk is a topic often discussed by cybersecurity leaders. The prevailing opinion is that while compliance requirements can help obtain budgets, risk analysis is necessary to build a secure organization.

This attitude helps security professionals keep their jobs AND obtain funding! Given the vast amounts of work needed to become “compliant,” it is easy to be fooled into thinking your organization is effectively managing risk.

Many of the industries profiled in Ramanujam’s book are highly regulated, like healthcare, nuclear, airlines and others. These industries have accumulated many years of experience with, and analysis of, the effectiveness of regulations. What are the findings that we can apply to cybersecurity?

One is that a reliable system cannot be obtained by regulation. Regulators just do not have enough information. Government regulations also are too far behind the state of industry and end up being watered down in their creation.

The Capital One breach is a case in point. The banking industry is one of the most highly regulated, yet apparently, cloud-based third-party risks escaped the regulators’ purview.

Given that cybersecurity is regulated, what practices can we adopt from the experiences of HROs regarding compliance and regulation? The distinction between goal-focused regulation and error-focused regulation is an important concept.

Most compliance regimes focus on the former; i.e. meeting control objectives. However, organizations may benefit from enhancing their internal error-detection capabilities. Another applicable point relates to extended organizations.

In many cases managing the regulatory and reliability implications of the organization’s supply chain may be the biggest risk faced by the organization.

Managing for security has recently become a science. CISOs now present to the board and know not to be the department of “no” and to support business initiatives. But HROs and RSOs have been managing for high reliability for decades.

The “three lenses” view of organizations leads to three parallel paths toward building a cybersecurity HRO. If you fail to see through all three lenses, you will likely not achieve your goals.

The three lenses are:

1. Strategic design

2. Political

3. Cultural

Most CISOs with technical backgrounds will readily use the “strategic design” lens. This covers the organization of the CISO’s team and the interface with business operations and IS operations.

The second lens is the “political” lens. Some CISOs may be less able, or less inclined, to see organizational security through this lens. The objective here is to seek alliances to meet security goals. The third lens is the “cultural” lens.

One of the challenges faced by many security leaders is how to transition their organization toward a better security culture. Many CISOs might say, “Things would be great if we only had a better security culture.”

Some will say we need a big security breach, then we will see changes to a better culture. The experience of RSOs shows this not to be true. Disasters may or may not help.

Finger-pointing may be the only outcome. The 30-day security sprint ordered after the OPM breach is an example of a non-productive response to a cybersecurity disaster.

From Theory to Practice

What can we learn from RSOs and HROs about improving culture, and how can we apply it to cybersecurity culture? One idea is to maintain a library of breaches from your industry and use this information to guard against the small errors that will show up in the “incubation period.”

If you don’t know the causes of specific breaches, how will you set up an effective defense? "Those who cannot remember the past are condemned to repeat it," said George Santayana. I think he was referring to the practice of information security management.

This point is succinctly made by Roger Grimes in his book, A Data-Driven Computer Defense.
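A breach library does not need to be elaborate to be useful. As a minimal sketch, assume each entry records an incident and its contributing causes; counting how often each cause recurs then suggests where small errors deserve attention first. The entries below are placeholders rather than real incidents, and `breach_library` and `rank_causes` are hypothetical names.

```python
from collections import Counter

# Placeholder entries; a real library would be built from published
# post-mortems of breaches in your own industry.
breach_library = [
    {"incident": "Example A",
     "causes": ["unpatched software", "alert not acted on", "flat network"]},
    {"incident": "Example B",
     "causes": ["phishing", "unpatched software"]},
    {"incident": "Example C",
     "causes": ["misconfigured cloud storage", "phishing", "unpatched software"]},
]


def rank_causes(library):
    """Count in how many incidents each contributing cause appears."""
    counts = Counter(
        cause
        for entry in library
        for cause in set(entry["causes"])  # count each cause once per incident
    )
    return counts.most_common()


top_causes = rank_causes(breach_library)
```

With these placeholder entries, "unpatched software" ranks first because it appears in all three incidents. The ranking is only as good as the library behind it, which is the argument for maintaining one.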

Better employee training can also go a long way toward establishing a security culture. My “thumbs down” opinion on “awareness training” was expressed in Time to kill security awareness training.

Today we need to educate all employees toward a culture of risk management. More and more security attacks simply ride on the normal business process itself (phishing, BEC, credential stuffing), as opposed to specialized attacks on technology.

Micro-credentials for cybersecurity represent a new approach to helping all employees master the risk skills they need. The micro-credential is more focused than a full degree or a security certification like the CISSP.

Why is it valuable for information security? Simply because it can efficiently teach the user exactly what she needs to know about security and no more.

All security practitioners face the challenge of building a reliable program. Unfortunately, there is a lack of research on successful examples of making these transitions to a cybersecurity RSO or HRO.

Building a truly reliable and self-sustaining security program seems to be like starting a fire in the woods. All the security manager can do is provide kindling, wood, logs and air. A self-sustaining blaze starts when the exact right configuration is found.

To get that configuration, follow the management approaches outlined here... and be persistent.