Hype, Logical Fallacies, and SCADA Cyber Attacks

May 25, 2016

For a few years now I’ve spent some energy calling out hyped-up stories of threats to critical infrastructure and dispelling false stories about cyber attacks. There have been a lot of reasons for this, including trying to make sure that the investments we make in the community go against real threats and not fake ones; this helps ensure we identify the best solutions for our problems. One of the chief reasons, though, is that as an educator, both as an Adjunct Lecturer in the graduate program at Utica College and as a Certified Instructor at the SANS Institute, I have found the effects of false stories to be far reaching. Ideally, most hype will never make it into serious policy or security debates (unfortunately some does). But it does miseducate many individuals entering this field. It hampers their learning when their goals are often just to grow themselves and help others – and I take offense to that. In this blog I want to focus on a new article on the Huffington Post blog titled “The Growing Threat of Cyber-Attacks on Critical Infrastructure” by Daniel Wagner, CEO of Country Risk Solutions. I don’t want to focus on deriding the story, though; instead I want to use it to highlight a number of informal logical fallacies. Being critical of information presented as fact without supporting evidence is an essential skill for anyone in this field, and case studies such as this one are exceptionally useful for teaching what to avoid.

Misinformation

Mr. Wagner’s article starts off simply enough with the premise that cyber attacks are often unreported or under-reported, leading the public to not fully appreciate the scope of the threat. I believe this to be very true and a keen observation. However, the article then uses a series of case studies, each containing factual errors as well as conjecture stated as fact. Before examining the fallacies let’s look at one of the first claims, which pertains to the cyber attack on the Ukrainian power grid in December of 2015:

“It now seems clear, given the degree of sophistication of the intrusion, that the attackers could have rendered the system permanently inoperable.”

It is true that the attackers showed sophistication in their coordination and ability to carry out a well-planned operation. A full report on the attacker methodology can be found here. However, there is no proof that the attackers could have rendered that portion of the power grid permanently inoperable. In the two cases where an intentional cyber attack caused physical damage to the systems involved – the German steelworks attack and the Stuxnet attack on Natanz – both systems were recoverable. This type of physical damage is definitely concerning, but the attackers did not display the sophistication required to achieve that type of attack, and even if they had, there is no evidence to show that the system would be permanently inoperable. It is an improbable scenario and would need serious proof to support the claim.

Informal Logical Fallacies

The next claim though is the most egregious and contains a number of informal logical fallacies that we can use as educational material.

“The Ukraine example was hardly the first cyber-attack on a SCADA system. Perhaps the best known previous example occurred in 2003, though at the time it was publicly attributed to a downed power line, rather than a cyber-attack (the U.S. government had decided that the ‘public’ was not yet prepared to learn about such cyber-attacks). The Northeast (U.S.) blackout that year caused 11 deaths and an estimated $6 billion in economic damages, having disrupted power over a wide area for at least two days. Never before (or since) had a ‘downed power line’ apparently resulted in such a devastating impact.”

This claim led E&E News reporter Blake Sobczak to call out the article on Twitter, which brought it to my attention. I questioned the author (more on that below) but first let’s dissect this claim, as there are multiple fallacies here.

First, the author claims that the 2003 blackout was caused by a cyber attack. This is contrary to what is currently known about the outage and is contrary to the official findings of the investigators, which may be read here. What Daniel Wagner has done here is a great example of onus probandi, also known as the “burden of proof” fallacy. The claim being made is most certainly not common knowledge and is contrary to what is known about the event, so the claimer should provide proof. Yet the author does not, which puts the burden of finding proof on the reader – and more specifically on anyone who would disagree with the claim, including the authors of the official investigation report.

Second, Daniel Wagner states that the U.S. government knew the truth of the attack and decided that the public was not ready to learn about such attacks. He states this as a fact, again without proof, but there’s another type of fallacy that can apply here called the historian’s fallacy. In essence, Mr. Wagner obviously believes that a cyber attack was responsible for the 2003 blackouts. Therefore, it is absurd to him that the government would not also know, and therefore the only reasonable conclusion is that they hid it from the public. Even if Mr. Wagner were correct in his assessment, which he is not, he is applying his perspective and understanding today to the decision makers of the past. Or more simply stated, he is using what information he believes he has now to judge the government’s decision, which was made on information they likely did not have at the time.

Third, the next claim is a type of red herring fallacy known as the straw man fallacy, where an argument is misrepresented to make it easier to argue against. Mr. Wagner puts in quotes that a downed power line was responsible for the outage and notes that a downed line has never been the reason for such an impact before or since. The findings of the investigation into the blackouts did not conclude that the outages occurred simply due to a downed power line, though. The investigators put forth numerous findings which fell into four broad categories: inadequate system understanding, inadequate situational awareness, inadequate tree trimming, and inadequate diagnostic support amongst interconnected partners. Although trees were technically involved in one element, they were a single variable in a complex scenario that included mismanagement of a difficult situation. In addition, the “downed power lines” mentioned were high-voltage transmission lines far more important than the argument implies.

Mr. Wagner went on to use some other fallacies, such as the informal fallacy of false authority when he cited, incorrectly by the way, Dell’s 2015 Annual Security Report. He cited the report to state that cyber attacks against supervisory control and data acquisition (SCADA) systems doubled to more than 160,000 attacks in 2014. When this statistic came out it was immediately questioned. Although Dell is a good company with many areas of expertise, its expertise and insight into SCADA networks was called into question. Just because an authority is expert in one field such as IT security does not mean it is an expert in a different field such as SCADA security. There have only been a handful of known cyber attacks against critical infrastructure. The rest of the cases are often mislabeled as cyber attacks and number in the hundreds or thousands – not hundreds of thousands. Examples of realistic metrics are provided by more authoritative sources such as the ICS-CERT here.

Beyond his article though there was an interesting exchange on Twitter which I will paste below.


[Three embedded tweets omitted]

In the exchange we can see that Mr. Wagner makes the argument “what else could it have been? Seriously”. This is simultaneously a burden of proof fallacy, requiring Blake or me to provide evidence disproving his theory, as well as an argument from personal incredulity. An argument from personal incredulity is a type of informal fallacy where a person cannot imagine how a scenario or statement could be true and therefore believes it must be false. Mr. Wagner took my request for proof of his claim as absurd because he believed that there was no other rational explanation for the blackouts than a cyber attack.

I would link to the tweets directly but after my last question requesting proof Mr. Wagner blocked Blake and me.

Conclusion

Daniel Wagner is not the only person to write using informal fallacies. We all do it. The trick is to identify it and try to avoid it. I did not feel my request for proof led to a fruitful exchange with the author, but that does not make Mr. Wagner a bad person. Everyone has bad days. It’s also entirely his right not to continue our discussion. The most important thing here, though, is to understand that there are a lot of baseless claims that make it into mainstream media and misinform the discussion on topics such as SCADA and critical infrastructure security. Unsurprisingly, they often come from individuals without any experience in the field they are writing about. It is important to try to identify these claims and learn from them. One effective method is to look for fallacies and inconsistencies. Of course, always be careful not to be so focused on identifying fallacies that you dismiss the claim too hastily. That would be a great example of an argument from fallacy, also known as the fallacy fallacy, where you analyze an argument and, because it contains a fallacy, conclude it must be false. Mr. Wagner’s claims are not false because of how they were presented. The claims were not worth considering because of the lack of evidence; the fallacies just helped draw attention to that.

Detecting the S7 Worm and Similar Capabilities

May 8, 2016

I first posted the following on the SANS ICS blog here.

An article came out on May 5th titled “Daisy-chained research spells malware worm hell for power plants and other utilities” with the subtitle “World’s first PLC worm spreads like cancer”. Having been on the receiving end of sensationalized headlines before, I empathize with the authors of the research. Regardless of the headlines, the research, performed by Ralf Spenneberg, Maik Brüggemann, and Hendrik Schwartke, is quality work highlighting different methods of propagation and infection inside of control networks. However, before any alarmism sets in, let’s look at this research, what it might mean, and trivial methods to detect it.

The Worm

The researchers put out a great presentation and paper showing exactly what they did to create and test the worm. In short, the worm impacts the Siemens SIMATIC S7-1200v3, although with some modifications the method would not be limited to Siemens or this version of PLC, and it utilizes the S7 protocol to propagate without requiring the industrial PC. The authors put forth that the infection could begin by introducing an already manipulated PLC, or, if introduced through the PC, the worm could then run wild without further use of that system. After a PLC is infected it starts scanning the network on TCP port 102, the S7 protocol port, to identify other SIMATIC PLCs. It scans by initiating and closing connections at incrementing IP addresses until it finds its targets and infects them. The researchers put the diagram below in their paper to show the process.

PLC Infection Routine
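To make that propagation step concrete, below is a minimal Python sketch of what the scanning behavior looks like from the network’s perspective. This is not the worm itself (which runs as Structured Text on the PLC) and the subnet is hypothetical, but it produces the same observable pattern: sequential connection attempts to TCP port 102 across incrementing addresses.

import socket

SUBNET = "10.0.10."  # hypothetical control network segment

for host in range(1, 255):
    target = SUBNET + str(host)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(0.5)
    # connect_ex returns 0 when the port is open, i.e. a listening S7 endpoint
    if sock.connect_ex((target, 102)) == 0:
        print("Possible SIMATIC S7 device at " + target)
    sock.close()

It is exactly this pattern – many short-lived connection attempts to port 102 at addresses that mostly do not exist – that makes the worm stand out on an otherwise static control network.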

This research is valuable for a lot of purposes but I want to highlight two reasons I think it’s interesting. First, the worm was written in Structured Text and utilizes the PLC and S7 protocol against itself. The researchers use native commands and functionality which I believe will continually be a theme in ICS intrusions and attacks; there is a lot of value in reducing the reliance on custom code or malware when native functionality serves the purpose. The second reason is that all the communications it puts out should be easily identifiable by anyone watching their ICS network.

Detecting the Worm

The most difficult part of detecting threats in the ICS is getting access to the data. The era of dumb switches and a lackadaisical approach to logging needs to be stamped out, but the remnants of it pose a significant challenge. However, after getting access to the network data there are a lot of opportunities afforded to defenders. In the case of the S7 worm it would be trivial to detect for anyone familiar with their ICS network.

In a modification of the Purdue Model for ICS architecture, shown below, we can see that good architecture would require the control network, where the S7 devices would exist, to be segmented off from the Internet and outside connections. Connections to the supervisory level should come through operations support and DMZ networks. All of these are potentially very static, but the control devices level should be extremely static. That is, your PLC shouldn’t be tweeting anything or updating its Facebook status; it’s easier to spot changes here than in an enterprise business network.

purdue

The researchers’ worm requires the S7 protocol and it also requires the scanning of new IP addresses on TCP port 102. Even if the worm were modified to use very quick or very slow scan cycles, the scans should still be easy to detect. Many SIMATIC PLC environments will use PROFINET for device-to-device communications and rely mostly on S7 for configuration and interaction with the Totally Integrated Automation (TIA) portal. There should be a relatively predictable pattern to the PROFINET and S7 communications. It usually gives ICS networks a “heartbeat” that can be observed with the open source tool Wireshark. See below for a PROFINET network where the tool’s “IO Graph” feature (found under “Statistics” in the toolbar) has been selected. Changes to this observable heartbeat such as large spikes or dips in data should be investigated and would reveal increased scanning or command use in the environment.

Wireshark IO Data
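For those who would rather script the same heartbeat check, here is a minimal sketch, assuming the Scapy library is installed and a hypothetical capture file named control_network.pcap, that bins packets into one-second buckets and flags buckets that deviate sharply from the norm.

from statistics import mean, stdev
from scapy.all import rdpcap

packets = rdpcap("control_network.pcap")  # hypothetical capture file

# Count packets per one-second bucket
buckets = {}
for pkt in packets:
    second = int(pkt.time)
    buckets[second] = buckets.get(second, 0) + 1

rates = list(buckets.values())
avg, sd = mean(rates), stdev(rates)

# Large spikes or dips break the heartbeat and warrant investigation
for second in sorted(buckets):
    if abs(buckets[second] - avg) > 3 * sd:
        print("Anomalous volume at t=%d: %d packets/sec" % (second, buckets[second]))

The three-standard-deviation threshold is an arbitrary starting point; on a truly static control network even a much tighter threshold should hold.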

Likewise, again just using Wireshark, you could use the “Endpoints” feature (again found under “Statistics” in the toolbar) to identify which IP addresses and TCP or UDP ports are in use. This method easily identified the TCP port requests that the ICS scanner in Havex performed, and here it would be trivial to identify connection attempts to numerous IP addresses that do not exist on the PLC network segment. Below is a picture of the same PROFINET capture from above. Your network will look different, but ICS networks, especially at the control device level, are much smaller and more static than traditional IT networks. Take advantage of that.

Wireshark Endpoints
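The endpoint review can be scripted as well. Below is a minimal sketch, again assuming Scapy and the same hypothetical capture, that flags S7 (TCP port 102) connection attempts toward addresses that are not in a known device inventory; the inventory addresses are placeholders for your own.

from scapy.all import rdpcap, IP, TCP

KNOWN_DEVICES = {"10.0.10.5", "10.0.10.6", "10.0.10.7"}  # hypothetical PLC/HMI inventory

for pkt in rdpcap("control_network.pcap"):
    if IP in pkt and TCP in pkt and pkt[TCP].dport == 102:
        # A SYN without an ACK is a fresh connection attempt
        if pkt[TCP].flags & 0x02 and not pkt[TCP].flags & 0x10:
            if pkt[IP].dst not in KNOWN_DEVICES:
                print("S7 connection attempt to unknown host: %s -> %s"
                      % (pkt[IP].src, pkt[IP].dst))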

Do a Health Check

At every ICS on the planet defenders should, as a bare minimum, do health checks of the network. In networks with unmanaged infrastructure, put in a tap during a maintenance period and gather packet capture. Even if you are only able to do this once a year, you will at least be able to identify issues ranging from misconfigured devices to worms like the one in this research. Utilize free tools such as Wireshark just to be aware of what your ICS looks like. Learn what normal looks like and hunt for weird. You’ll be amazed at what you find.

Sustainable Success

You should make the case for managed infrastructure in your ICS and do port mirroring to gather packet captures continuously. These networks do not have a large amount of data and you can gather a lot of packet capture for a small investment in storage space. From there you can learn what the ICS should look like normally and write whitelist-styled intrusion detection system rules. There are many awesome professional tools that can help the security of your ICS, but for this example you do not need any “next generation” tools; simply using tools such as Snort, Bro, and FlowBAT inside of the Linux distribution SecurityOnion can return huge value for continuously monitoring your ICS.
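As an illustration of the whitelist style, a Snort rule along the following lines would alert on any S7 connection attempt originating outside the PLC segment. This is a sketch, not a drop-in rule: the network variable, address range, and SID are hypothetical and would need tuning to your environment.

ipvar PLC_NET 10.0.10.0/24

alert tcp !$PLC_NET any -> $PLC_NET 102 (msg:"S7 connection attempt from outside PLC segment"; flags:S; sid:1000001; rev:1;)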

Ideally, you will have someone (maybe you!) who can perform the tactics I teach in my ICS515 course, such as network security monitoring, to constantly hunt for threats in your ICS. But if you do not, you at least have the data to routinely go back and see if anything is wrong in the ICS, and if something does go wrong you have the data to determine the root cause and apply the appropriate mitigations. And if you ever call in an incident response team, just having this data available will significantly reduce the cost and time associated with the response effort.

One other thing the S7 worm highlights is the communications going on inside the S7 protocol. Many defenders simply do not know what is going on inside their ICS protocols. In fact, many ICS protocols are poorly implemented or understood, and it’s an area where much more research is needed. The community must do better at understanding the ICS protocols. Today, detecting changes is trivial through simple methods like Wireshark Endpoints. But the threats are evolving and defenders need to know what commands are being sent across their ICS network. Today, do health checks. Whenever possible, implement continuous monitoring and network security monitoring methodologies. Moving forward, look to deeply understand the ICS protocols and communications.

Closing Thoughts

The S7 worm is good research that highlights what can be done in the control network. But it’s not a worm from hell nor is it some unstoppable capability. The S7 worm is trivial to detect, but few are looking. The worm does however help raise awareness and for that the researchers deserve a lot of credit. For mitigations, ICS defenders should have copies of the logic running on their control devices digitally hashed and stored off of the network. Remediation with good backups is a much less difficult process than remediation without backups.
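As a minimal sketch of that backup recommendation, assuming exported logic files sit in a hypothetical plc_logic_backups directory, the following Python records a SHA-256 manifest that can be stored off-network and re-run later to verify that the logic has not been tampered with.

import hashlib
import pathlib

# Hash every exported PLC project/logic file and print a manifest line for each
for logic_file in sorted(pathlib.Path("plc_logic_backups").iterdir()):
    if logic_file.is_file():
        digest = hashlib.sha256(logic_file.read_bytes()).hexdigest()
        print("%s  %s" % (digest, logic_file.name))

# Keep the manifest off the network; re-hash and diff to detect tampering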

Regardless of whether this specific capability is realistic for your environment, the takeaways are the same: detecting these types of capabilities only requires a minimal amount of effort once data collection is done. There are a lot of mitigations and defenses you should be putting into place, but at the very least monitor your environment. If you are not collecting data from your control level you should be; when you move to monitoring the data you collect, you will be significantly contributing to the safety and reliability of the ICS.

 

Fourth Sample of ICS Tailored Malware Uncovered and the Potential Impact

April 25, 2016

I first posted this piece on the SANS ICS blog here.

 

I looked at the S4 Europe agenda which was sent out this morning by Dale Peterson and saw an interesting bullet: “Rob Caldwell of Mandiant will unveil some ICS malware in the wild that is doing some new and smarter things to attack ICS. We are working with Mandiant to provide a bit more info on this in early May without doing the big reveal until S4xEurope.”
For those of you that don’t know, S4 is a conference run by Dale Peterson and this is its European debut (the other versions are in Florida and Japan and are staples of the ICS security conference scene, always having hard-hitting and top notch presentations). Because S4 is a trusted conference and Dale a friend, I give more credibility to anything that comes out of it than I would to your typical conference. Add to that the fact that the Mandiant ICS team has a number of extremely credible voices (Rob Caldwell, Dan Scali, Chris Sistrunk, Mark Heard, etc.) and this is even more interesting and credible.

Let’s break down what we know and why this is potentially very important.

Background on ICS Tailored Malware
To date there have been exactly three ICS tailored malware families that are publicly known. The first was Stuxnet which contained modules to target the Siemens’ systems at the Natanz facility in Iran and cause physical damage to the P-1 centrifuges. Second, there was the Havex malware used in the Dragonfly campaign (aliases include Energetic Bear and Crouching Yeti) that had a module that specifically searched for ICS specific ports (such as 102, 502, and 44818) and later more importantly an OPC module. Lastly, there was the BlackEnergy 2 malware which contained exploits and versions for GE’s CIMPLICITY and Siemens’ SIMATIC environments.

Why Haven’t We Seen More?
Most of us understand that ICS environments make for great targets, especially for nation-state and APT styled actors. The potential for military posturing, political leverage, espionage, and even intellectual property theft makes them enticing targets. Yet the numbers simply do not seem to align with the fear that many folks have about these environments being targeted. The question always comes up: why don’t we see more ICS intrusions? I do not claim to know for sure, but my running hypothesis is that it boils down primarily to three areas:

1. We do not have a lot of visibility into ICS networks. Many of the threats that we are aware of we know about due to vendors releasing reports. These vendors traditionally have endpoint security solutions and anti-virus in the networks that report back information to them. This allows the vendors to “see” tens of thousands of networks and the threats targeting them. In ICS we do not have these same products at scale, and many are disconnected from the vendors (which is ok by the way and sometimes preferable). That, combined with a lack of understanding of how to monitor these environments safely and interact with them, creates a scenario where we don’t see much. Or in short, we aren’t looking.

2. Most malware families tend to be criminal in nature. APT styled malware is not as common in the larger security field. There simply isn’t as big of a motivation for criminals to make ICS specific malware families when ransomware, botnets, etc. work just as effectively in these environments anyway and they represent a smaller portion of the population. This is similar to the old Mac vs. Windows vs. Linux malware debate. One of the reasons we see more Windows malware is due to pure numbers and not because it’s less secure. There is more motive for criminals to write Windows based malware usually. For the APT styled actors, targeting ICS can be important for military and intelligence purposes but there isn’t as much motive to actually attack or bring down infrastructure outside of conflict scenarios; just to learn and position. I have my suspicions that there are a great number of ICS networks compromised with a large variety of ICS specific malware out there and we just haven’t seen the impacts to begin looking (see point #1).

3. ICS specific knowledge sets are rarer, making it more difficult to create well-crafted and tailored ICS modules. The typical “cyber team” for nation-states is pretty good at Windows based environments, but down in the lower ICS networks it requires system specific knowledge and engineering skills to truly craft something meaningful. This knowledge set is expanding, though, meaning we will definitely see more and more of these types of threats in the future.

Why is the Mandiant Discovery Potentially Important?
The claim that Mandiant has found a new ICS tailored piece of malware is important for a few reasons.

First, I have a good amount of respect for the Mandiant ICS team and if they say they’ve found something ICS specific I’ll still require proof when the time comes, but I’m more inclined to believe them. Knowing the team members, I’m confident they’ll release things like indicators of compromise (IoCs) and technical knowledge so that the community can independently verify the find. This is great because many times there are claims made, even by trusted companies, without any proof offered. My general stance is that no matter how trusted the company is, if there isn’t proof (for example the recent Verizon claim about the water hack) then it simply does not count. The community has been abused a lot with false claims and proof is required for such important topics.

Second, given that there have only been three ICS tailored malware families, to have a fourth is incredibly interesting both for research into the defense of ICS and for research into the threat landscape. Understanding how the intrusions took place, what the targets were, and extracting lessons learned will be very valuable to both the technical and cultural challenges in this space. It remains to be seen exactly what Mandiant means by “ICS specific”, although I have messaged some trusted contacts and have been told that the agenda point isn’t a misprint; Mandiant claims to have found tailored ICS malware and not just an ICS themed phishing email or something less significant. Although I never wish harm on anyone, from a threat and defense research perspective this is an amazing find.

Third, it bodes well for the ICS security industry as a whole to start making some more positive changes. There have been many ICS security companies around for years (security and incident response teams like LoftyPerch, independent consultants and contractors, red teams like Red Tiger Security, etc.) and even some dabbling by larger companies like Kaspersky and Trend Micro (who both have contributed amazing information on the ICS threat landscape). But the Mandiant ICS team in a way represents a first in the community. Mandiant, and its parent company FireEye, is a huge player in the security community. For years the Mandiant team itself has been widely respected for its incident response expertise. To have them come out and make a specific ICS team to focus on incident response was actually a big risk. It is common to see ICS products and services, but many of the startups struggle much more than the media and venture capitalists would let on. Mandiant’s ICS play was a hope that the market would respond. To have the team come out with a fourth specific ICS tailored malware family bodes very well for the risk they took, and with appropriate coverage that keeps down the hype, this could be very important for the industry and market writ large. Of course the customers always get a big vote in this area, but it could mean more folks waking up to the fact that yes, ICS represent a target, and yes, the security community can calmly and maturely approach the problem and add value (again, please no hype, wallpapers, and fancy logos for exploits and malware).

But Aren’t Squirrels More Damaging to the Grid?
I gave an interview to a journalist for a larger piece on squirrels and cyber threats with regards to the power grid and I believe it warrants a discussion in this piece’s context. The common joke in the community is that squirrels have done more damage to power grids than the US, China, Iran, Russia, UK, etc. combined. And it’s true. It is often stated by us in the industry to remind folks that the “OMG Cyber!” needs to calm down a bit and realize that infrastructure operators on a daily basis deal more with squirrels and Conficker than APT styled malware. However, we should not equate the probability of attacks with the importance of them. As an example, let’s consider the recent DHS and FBI report on the risk to the U.S. electrical infrastructure.

I have a lot of love and respect for many of the FBI and ICS-CERT personnel I’ve worked with. I can only describe most of them as extremely passionate and hard working. But the claim that the risk of a cyber attack against U.S. electrical infrastructure is low was upsetting to me because of how it comes across. On the heels of the cyber attack that impacted the Ukrainian power grid, the report seemed to downplay the risk to the U.S. community. It stood in direct contrast to Cyber Command’s Admiral Rogers, who stated that “It is only a matter of the when, not the if we’re going to see a nation-state, group, or actor engage in destructive behavior against critical infrastructure in the United States.” He was speaking specifically in the context of what happened in Ukraine and its importance. As the head of both the NSA and the U.S. military arm for cyber, it is appropriate for Admiral Rogers to have a good understanding of the foreign intelligence and foreign military threat landscape. For the DHS and FBI to contradict him, even if unintentionally, seems very misplaced given their expertise and mission; and this leads back to the squirrel comment.

It is not as important to think of probability with regards to destructive attacks and ICS focused intelligence operations. When a community hears of a “low probability” event they naturally prioritize it under other, more high probability events – for example, prioritizing squirrels over nation-state operations based on probability. The problem with doing that, though, is that the impact is so much more severe in this “lower probability” scenario that the nation must prioritize it for national security reasons. Telling the infrastructure operators – who, not the government, really defend the grid – to stay calm and carry on directly competes with that need, although the message should admittedly always avoid hype and alarmism. Mandiant coming out with the fourth variety of ICS tailored malware helps highlight this at a critical point in the debate both among infrastructure operators and policy makers.

Conclusion and What to Do
We won’t know exactly what the ICS tailored malware is, what it’s doing, or the technical details of it until Mandiant releases the information. It could be a dud or it could be extremely important (knowing the Mandiant team my bet is on extremely important, but let’s all remain patient for the details before claiming it to be so). However, infrastructure owners and operators do not need to wait for the technical details to be released. It is important to be doing industry best practices now, including things such as network security monitoring internal to the ICS. The other three samples of ICS tailored malware were all incredibly easy to identify by folks who were looking. Students in my SANS ICS515 ICS Active Defense and Incident Response class (shameless plug) all get hands-on with these threats and are often surprised at how easy they are to identify in logs and network traffic. The trick is simply to get access to the ICS and start looking. Or in other words: you too can succeed. Defense is doable. So do not feel you need to wait for the Mandiant report. It is potentially very important and the technical details will help hunt the threats, but you can look now, and maybe you’ll spot it, or something else, or at the very least you’ll get familiar with the networks you should be defending so that it’s easier to spot something in the future, whether it’s APT styled malware or just misconfigured devices. Either way – the most important ICS is your ICS and learning it will return huge value to you.

The Problems with Seeking and Avoiding True Attribution to Cyber Attacks

March 4, 2016

Attribution to cyber attacks means different things to different audiences. In some cases analysts only care about grouping multiple intrusions together to identify an adversary group or their campaign. This helps analysts identify and search for patterns. In this case analysts often use made-up names such as “Sandworm” just to group activity together. To others, attribution means determining the person, organization, or nation-state behind the successful intrusion or attack; this latter type of attribution I will refer to as true attribution. There are many issues with true attribution that I want to explore here. However, I also want to call attention to those who have pushed back on analysts exploring the motives behind an attack. When dealing with attribution, analysts should avoid the extremes: using true attribution inappropriately or being too hypersensitive to perform analysis and explore motives. Good analysts know when to seek true attribution and when to avoid it.

To explore these concepts I will look at true attribution at the tactical, operational, and strategic levels of threat intelligence. While these levels should not be seen as static categories, they will help shape the discussion. Tactical threat intelligence often deals with the folks who do the day-to-day security, such as performing incident response and hunting for threats in the environment; operational threat intelligence refers to the personnel who work to identify adversary campaigns and often focus on aspects such as information sharing and working through organizational knowledge gaps; and the strategic threat intelligence category I’ll use to refer to the personnel who sit at senior decision making levels, whether executives or board members at companies or national government officials and policy makers.

True Attribution at the Tactical Threat Intelligence Level

In my opinion, true attribution at the tactical threat intelligence level is only harmful to good security practices. Trying to identify who was responsible for the attack seems like a good idea to help shape security practices. As an example, an analyst who thinks that China is in their network might begin looking for intellectual property theft and try to shortcut their effort to identify the adversary. But think about that for a moment. Because our hypothetical analyst thought China was in the network, they have begun to look at the data in front of them differently. In this case, attribution has led our analyst to the land of cognitive bias. Cognitive biases are especially dangerous when performing analysis as they bias the way you think – and analysis leans so heavily on the human thought processes that it can lead us to inappropriate conclusions. Now, instead of keeping an open mind and searching for the threat in the network our analyst is falling prey to confirmation bias where the analyst is looking at the data differently based on their original hypothesis that China is in the network.

This raises the question, though: if the analyst has nothing else to go on, shouldn’t they look for the tactics, techniques, and procedures of China in the network as a starting place? In my opinion that is the role of those often funky sounding made-up campaign names or intrusion set names; this is what others sometimes call attribution, but not true attribution. An analyst who thinks they know what “China” looks like really only knows previously observed activity. If I tell you to think about what China would be doing in a network you might think intellectual property theft. If I tell you the threat is Russia you might think of cybercrime or military pre-positioning. If I say Iran maybe you think about data destruction. The problem is that this thought process is tied to previously observed activity, and it also rests on the assumption that the previous true attribution you’ve heard is correct. Even if we assume all the previous true attribution was correct, analysts have only ever heard of or seen some of the campaigns by adversaries. Russia has teams that are interested in intellectual property theft just as China has teams that are interested in military pre-positioning. We are biased in how we view nation-state attribution based on campaigns we have seen before, and it is difficult to take into consideration what is unknown. The better tactic is identifying patterns of activity such as “Sandworm” and thinking back to previously observed threats’ tactics, techniques, and procedures as a starting place for how we search the network for threats. Then tactical level threat intelligence analysts aren’t biased by true attribution but can use some element of attribution to learn from threats they’ve observed before while attempting to avoid cognitive biases.

True Attribution at the Operational Threat Intelligence Level

At the operational threat intelligence level the use of attribution needs to fit the audience. Operational level threat intelligence analysts should always attempt to serve as the bridge between the strategic level players and the tactical level analysts. When using the observations and information from the tactical level to translate to strategic level players, there can be a role for true attribution, which we will explore later. When translating the observations at the strategic and operational levels down to the tactical level, though, true attribution again becomes dangerous. The way threat intelligence is positioned should be determined by the audience consuming it.

Consider this: an operational level threat intelligence analyst has been asked to take the campaigns observed in the community and translate that information for the tactical level folks to use. The indicators of compromise and security recommendations that the tactical level personnel should use are independent of attribution. The security recommendations and fixes are based off of the observed threat to the systems and the vulnerabilities, not the attribution; or, said another way, if you have to patch a vulnerability you don’t patch it differently if the exploit was Chinese rather than Russian.

However, that same operational threat intelligence analyst has been asked to identify the threat landscape, the observed campaigns in the community that are relevant to the organization, and make recommendations for strategic level players that can influence organizational change. Here, the analyst may not be able to prove true attribution based off of observed adversary activity but it is in their best interest to identify patterns and motives to attacks. As an example, if there have been a number of campaigns recently that align with the motives of Chinese actors targeting the analysts’ company the recommendation from the operational level analyst to the strategic members might have them take into consideration how they interact with and do business with China. Here the analyst should use language to structure their assessment that the observed threats are Chinese based such as “high confidence”, “medium confidence”, and “low confidence.” Language such as “it is definitively China” should be avoided. Ultimately analysis is based on incomplete data sets (consider the difference between inductive and deductive reasoning) and the provided information is just an assessment.

At the operational level of threat intelligence analysts should be mindful of their audience and be open to putting forth good analysis based on observed activities, threats, and motives without being definitive on true attribution.

True Attribution at the Strategic Threat Intelligence Level

Strategic level audiences often heavily care about true attribution but not always with good reason. Government leaders and company executives want to know their threat landscape and how it might shape how they conduct business or policy. That is a good thing. However, strategic level players should be careful not to use true attribution when it’s not required.

As an example, if the organization is facing security challenges and is consistently having intellectual property stolen, it needs to look at the security culture of the organization and the resource investments needed to increase security and minimize risk. This inward look at the culture and security investments should usually be independent of true attribution. The tactical and operational level impacts are going to be the same whether the previous culprits were China, Iran, Russia, or the North Pole. However, if the organization is taking an outward approach to doing business or policy making, it may need to consider true attribution. Because true attribution is based on assessments and is rarely definitive, it should usually be approached as a continuum.

To look at true attribution especially for this level of threat intelligence I highly recommend two resources. First, a paper by Dr. Thomas Rid and (soon to be Dr. – congrats Ben!) Ben Buchanan titled Attributing Cyber Attacks. This paper will get you into the right mindset and understanding of attribution for the second paper I would recommend by Jason Healey titled Beyond Attribution. In Beyond Attribution, Jason Healey discusses the concept of responsibility as it applies to attribution. In short, a nation-state has responsibilities with regards to cyber operations especially if they might have been conducted from within its borders. At one side of the scale, a state can take an approach of prohibiting attacks and actually help other nations when an attack has begun. On the other side of the scale a state actually conducts the attack and integrates their attack with third-party proxies such as private companies for hire or hacktivists.

Analysts should be mindful of this spectrum of state responsibility, as Jason calls it, when considering true attribution and the nature of intelligence assessments. It is difficult to have true attribution and true attribution can be harmful to tactical level security. However, identifying motives in attacks and understanding the spectrum of state responsibility to attacks should be considered at the strategic level so that we are not so hypersensitive on the topic of attribution that every adversary gets to operate without consequence.

Case Study: Cyber Attack on the Ukrainian Power Grid

Let’s take these concepts and apply them to the cyber attack on the Ukrainian power grid. If you’re unfamiliar with the case you can read about it here. In this case, I have been very careful about my wording as I know there are multiple audiences that see my quotes in media or read my reports. On one hand, I teach a threat intelligence course and an ICS/SCADA active defense and incident response course at the SANS Institute. In this capacity most of my audience is tactical and operational level personnel. For those reasons I have often tried to reinforce that attribution in Ukraine doesn’t matter for them. Identifying indicators of compromise to hunt throughout the network, preparing the network to make it more defensible, and applying lessons learned from the Ukraine attack are all independent of true attribution. True attribution simply doesn’t matter for how we apply the lessons learned for security at those levels.

However, I also deal with strategic level players in my role in academia as a PhD student at King’s College London and as a Non-Resident National Cyber Security Fellow at New America, where I work with policy makers. For this audience, it is important for me to note that definitive true attribution has not been obtained in the Ukraine attack and may never be obtained. However, in considering Jason’s spectrum of state responsibility, we have to look at the attack, realize the potential motives and the larger geo-political setting, and analyze whether there are any courses of action strategic level personnel should take. I doubt the Russian government itself carried out the attack. However, the attack on the Ukrainian power grid did not fit any apparent financial motives, and the motives did align with various Russian based actors, whether those are private companies, hacktivists, or senior government officials. Therefore, it is my opinion and my analysis that strategic level players should look at the elements of attribution that link to Russian based teams and consider Jason’s spectrum of state responsibility. Even if Russia had nothing to do with the attack there should be an investigation into whether or not it occurred from within its borders. If the attack is state-ignored it sets a dangerous precedent. Senior policy makers in other nations should under no circumstance jump to blaming Russia for anything. However, they should look for international cooperation and potentially an investigation, as this is a first-of-its-kind cyber attack on civilian infrastructure that led to a power outage. There is a line between espionage and offense; that line was crossed in Ukraine and we must be careful of the precedent it sets.

Conclusion

In conclusion, true attribution is highly abused in the information security community today. Many organizations want true attribution but do not know how to use it appropriately and many private companies are quick to assign definitive attribution to attacks where they simply do not have the appropriate data to support their conclusions. True attribution makes media headlines and the motives for companies to engage in this activity are significant for that reason. Claims of true attribution do increase international tension; not as significantly as some would assume but they are individual data points to policy makers and national level leaders. However, being hypersensitive about true attribution enforces a culture in this field where nation-states can ignore responsibility such as investigating attacks or policing their borders as is normal in international law and policy in any other domain other than “cyber.” There is a balance to be struck. Knowing how to strike that balance and when to use attribution in the form of group names with no state ties or true attribution in the form of an evolving assessment will help the threat intelligence community move to a more mature point where tactical, operational, and strategic level players can all benefit.

 

*Edit 3/6/2016*

I had a good discussion with some colleagues around this post and wanted to add two points.

  • Richard Bejtlich had a really good blog post on the value of attribution and breaks it down in a number of useful ways. His blog post pre-dates mine but I failed to reference it the first time. It can be found here. I would recommend it as it’s a great read and doesn’t take long to work through.
  • Two peers, Mark and Tim, made a case for tactical level true attribution that I think is actually an interesting one to consider. I would argue that most tactical folks shouldn’t consider true attribution and that it’s highly abused and resource intensive with little value in the wider community today. That being said, Mark made the point that in a resource constrained environment it might be a useful factor in prioritization. As an example, if you have a lot of phishing emails or malware samples to look at and you need a place to start, true attribution could be of value as that starting point as long as you try to defeat any biases later on. The reason this could be of value (credit to Mark and Tim on this point) over just attribution of groups is: if you have data that is of use to specific countries (think F-35 fighter aircraft intellectual property being of value to China and Russia more so than Niger), using that information as a starting point for prioritizing your searches could be useful. This also touches on the topic of crown jewel analysis combined with threat intelligence; for anyone interested in that subject check here. This to me gets closer to the operational level than the tactical level, and I would expect operational folks to translate these concepts into a usable form for tactical level analysts instead of expecting them to start this process – but I can see the case for why this would be useful at the tactical level and would agree that it’s an interesting one to consider.
  • (Thanks to the peers that took the time to discuss their thoughts with me. Discussions like these help all of us explore our understanding of a topic and I always find my own learning process enhanced by them).

No, Norse is Not a Bellwether of the Threat Intel Industry but Does Hold Lessons Learned

January 30, 2016

Brian Krebs published an outstanding report today titled “Sources: Security Firm Norse Corp. Imploding” which has led to a number of blogs and social media rumblings about what this means for the cyber threat intelligence community. Some have already begun positioning this as the fall of threat intelligence. I not only disagree and believe this to be a mostly isolated case, but would position that, if anything, it is a sign of the community’s growing maturity. The purpose of this blog is to discuss why Norse’s potential and impending implosion holds some lessons learned for the industry but holds no prediction of negative things to come for the threat intelligence community as a whole.

Before elaborating on these points though, I want to start off with the much needed statement about the people at Norse. To anyone in the community that holds strong negative feelings for Norse (and you are not alone) please be conscious that many of the individuals working at Norse were professionals and very talented. Many of the negative feelings towards the company were likely based on the marketing efforts and mislabeling of the content and value of their product; not negativity towards the people that work there. I hope the former employees land softly at their next jobs and I would encourage companies looking to hire to think of these individuals without prejudice.

Norse was in many ways a good-looking company. It garnered national media attention through smart placement of its cyber attack map (yes, the pew pew cyber map analysts have mostly grown to hate – but it looked good in media). There were some key employees recruited who were well respected in the industry. And it raised tens of millions of dollars in investments to appear as an exciting California security startup. So now that the company is apparently imploding it does seem natural to think that this may be an indication of things to come for the threat intelligence industry and of a ripple effect in investments into this space. However, I would state this as wholly inaccurate, although there are some lessons learned here for both investors and security startups.

First, Norse Corp. may have garnered national level attention, but most of it was not actually good attention. Also, they billed themselves as a threat intelligence company when, in my opinion, they simply were not. Folks who are familiar with me, or read it in the Krebs report, will remember that I came out very publicly chastising their dangerous assessment that there were Iranian attacks on U.S. industrial control systems. The key reason they had a bad assessment is actually why Norse was always doomed to fail. The company was interpreting Internet scanning data against their high level sensors as attack intelligence. Most threat intelligence companies rely upon enriched data complemented with access to incident response data of actual intrusions, not scanning activity. Norse also held no verifiable industrial control system expertise but were quick to make assessments about these systems. Further, when they stated that there were attacks on control systems by Iran, what the data actually seemed to show were scans by Iranian IP addresses against systems trying to mimic industrial control systems. The effort by them and the think tank AEI to state that there should be policy considerations in the Iranian nuclear negotiations based off of this data is a great representation of what not to do in the industry. Simply put, they were interpreting data as intelligence. There is a huge difference between data, information, and intelligence, as I outlined here. While their product and Internet level scanning data were interesting and potentially very valuable for research, they were not threat intelligence. So while Norse may have billed themselves as significant players in the threat intelligence community, they were never really accepted by the community, or participating in it, by most leading analysts and companies. Therefore, they aren’t a bellwether of the threat intelligence industry or of other companies having trouble, simply because they weren’t really ever in “the industry.” The threat intelligence community can be fairly small and making strategic mistakes can have significant lasting impact. Trust is a huge part of the equation in this community.

Second, this case study of Norse holds great lessons learned. First, because trust is a significant part of doing intelligence work and of participating in this community, companies need to realize they are dependent on the ecosystem and are not living in a bubble. Formal and informal relationships, company partnerships, and information sharing can help companies succeed quickly. It is not a competitive landscape in the sense that companies should think of success as a finite item where one company’s success means less is available for others. Quite the opposite. As threat intelligence is used more appropriately throughout the industry it will continually open up the market. For example, threat intelligence is meant to make good security programs better or to help give important context and information to strategic level organization decision makers – it is not meant to replace bad security programs or act as a magical solution for security. Second, threat intelligence companies should be very careful in lining up their marketing efforts with an honest assessment of what the company’s product or services actually produce. This should apply to any security startup, but it is vital in the threat intelligence community. Whereas claims around general security can be difficult to interpret, there are definitive ways to look at company claims in intelligence and dismiss them completely as hype. This dismissal is hard to recover from. Finally, an important lesson learned here is for investors and Venture Capital firms to dig deep not only into what is being shown by the company but also into how it is perceived in the community. There are many “experts” in this community who’ve never held the appropriate positions or roles to ever have been put in a situation to speak with expertise about threat intelligence. As an example, one of my critiques of Norse was that their “intelligence report” on industrial control system attacks was not written by anyone with industrial control system expertise. Just as we would expect a Russian intelligence analyst to have an understanding of Russia or even speak Russian, the community and investors should demand that assessments are qualified by actual expertise, not just general “cyber” expertise.

Venture Capital firms invest in companies with the expectation of not getting an immediate return on investment. In an overly simplified stereotype most Venture Capital funds expect not to see their returns for five to seven years with events such as an IPO or company merger/acquisition. Following that logic, it is reasonable to believe that investments made five to seven years ago are starting to be looked at for their return on investment to the Venture Capital firms. The landscape for investment will likely become much more competitive. There will be lessons learned from investing in good-sounding but under-performing companies. Investors and industry analysts will demand more proof of claims, understand what hype looks like a bit better, and invest even more intelligently. This is a good thing for the industry. I doubt Norse will be the last company to fail in the threat intelligence industry but the industry and investments into it will likely continue to grow. The focus will be on smarter money.

 

 

Context for the Claim of a Cyber Attack on the Israeli Electric Grid

January 26, 2016

This blog was first posted on the SANS ICS blog here.

 

Dr. Yuval Steinitz, the Minister of National Infrastructure, Energy, and Water Resources, announced today at the CyberTech Conference in Tel Aviv that a “severe cyber attack” was ongoing against the Israel National Electric Authority. His statements were delivered during a closing session at the conference and noted that a number of computers at the Israeli electricity authorities had been taken offline the previous day to counter the incident.

There are few details that have been offered and thus it is far too early for any detailed analysis. However, this blog post attempts to add some clarity to the situation with context in how this type of behavior has been observed in the past.

First, Dr. Steinitz mentioned that computers had been taken offline. This choice by the defenders to take systems offline indicates a normal procedure in terms of incident response and malware containment. The intention of the incident responders cannot be known at this time, but this activity is consistent with standard procedures for cleaning malware off of infected systems and attempting to contain an infection so that it cannot spread to other systems. Taking systems offline is not preferable, but the fact that systems were removed from the network does not necessarily make the incident more severe. On the contrary, this indicates that incident responders were able to respond early enough with planned procedures to counter the incident prior to an impact.

Second, there have so far been no outages reported or any such impact of the “attack” quantified. It appears, from what has been reported so far, that the use of the term “cyber attack” here is very liberal. Malware infections in industrial control system (ICS) networks are not uncommon. Many of these environments use traditional information technology systems, such as Windows operating systems, to host applications such as human machine interfaces (HMIs) and data historians. These types of systems are as vulnerable as, if not more vulnerable than, traditional information technology systems, and malware infections are not novel. With regard to historical case studies, it is far more common for incidental malware to lead to system failures than targeted attacks. For example, the Slammer malware reportedly caused slowdowns in the Davis-Besse nuclear power plant’s networks and crashed a utility’s supervisory control and data acquisition (SCADA) network in 2003. However, in terms of targeted/intentional intrusions leading to outages we have only three validated public case studies: Stuxnet, the German steelworks facility, and the Ukrainian power grid. It is these targeted intrusions where an outage occurred that could be considered attacks. Oftentimes people unintentionally abuse the phrase “cyber attack” when it is more appropriate to classify the activity as adversary intrusions, compromises, or espionage activity. To understand what constitutes an actual attack it is helpful to read the ICS Cyber Kill Chain.
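
To make that distinction concrete, here is a minimal triage sketch of the logic described above. The function name and category labels are my own illustration, not an official taxonomy from the ICS Cyber Kill Chain paper, and the logic is deliberately simplified to the three questions that matter most: was there an intrusion, was there an impact, and was the activity targeted?

```python
# A simplified triage sketch for reported ICS activity. The function and
# category labels are illustrative only (my own, not an official taxonomy);
# they encode the distinction between intrusions and actual attacks.

def classify_ics_activity(targeted: bool, intrusion: bool,
                          caused_outage_or_damage: bool) -> str:
    """Return a conservative label for reported ICS activity."""
    if not intrusion:
        # e.g., probing, scanning, or theft from third-party networks
        return "reconnaissance / espionage activity (not an attack)"
    if not caused_outage_or_damage:
        # e.g., malware found on HMIs or data historians with no disruption
        return "intrusion / compromise (not an attack)"
    if not targeted:
        # e.g., Slammer incidentally crashing a SCADA network in 2003
        return "incidental impact from untargeted malware (not an attack)"
    # e.g., Stuxnet, the German steelworks, the Ukrainian power grid
    return "cyber attack (targeted intrusion causing an outage or damage)"

# The Israeli case as reported so far: malware found, no reported outage.
print(classify_ics_activity(targeted=False, intrusion=True,
                            caused_outage_or_damage=False))
```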

Third, there has been an increased focus on cyber security in Israel, both as it relates to the cyber security of national infrastructure and in the technology companies that are making Israel an enticing location for venture capital funding. In January, Israeli Prime Minister Benjamin Netanyahu gave a presentation to the World Economic Forum where the center of his discussion was cyber security. This was followed by a February announcement that the Cabinet in Israel approved a plan for a comprehensive national cyber defense authority. With the increased focus on cyber security it is entirely possible that Israel has taken a proactive approach to looking through its infrastructure networks to identify threats. In the course of this action it may have found malware that is either targeted or incidental in nature. In either case, from what is being reported right now it appears unlikely that this is an actual attack and more likely that it is the discovery of malware. However, it is important to watch for any developments in what is being reported.

Israel has threats that it must consider on a day-to-day basis. Critical infrastructure is constantly the focus of threats as well, although there is a lack of validated case studies to uncover the type of activity much of the community believes is going on in large quantities. However, reports of cyber attacks must be met with caution and demands for proof due to the technical and cultural challenges that face the ICS security community. Simply put, there is neither the quantity of expertise nor the type of data needed to validate and assess all of the true attacks on infrastructure while appropriately classifying lesser events. Given the current barriers present in the ICS community, claims of attacks should be watched diligently, taken seriously, but approached with caution and investigated fully.

Facts, Hype, and Takeaways from Reports on Iranian Activity Against the Power Grid and a Dam

December 21, 2015

This was first posted on the SANS ICS blog here.


Yesterday a report on Iranian activity focused on a small dam in New York was released by Danny Yadron at the Wall Street Journal. Today, Garance Burke and Jonathan Fahey at the Associated Press released a report on Iranian activity, linked to CYLANCE’s OpCleaver report, in which documents related to Calpine were stolen. So what’s the hype, what are the facts, and what are the takeaways? Let’s explore:

The Facts:

I’ve worked with both Danny and Garance before and have a high amount of respect for the effort they put into their reporting. Reporting on technical content can be very difficult and sometimes leads to inaccurate reports, especially when the topic of security is combined with control systems. Garance and Danny do their homework though. In that regard I instantly feel more positive about the articles. That’s also why I was willing to contribute a quote when Garance called for the story. I didn’t get to see the story in advance and didn’t know all that was going to be written, but I understood the type of data stolen related to Calpine: yes, it absolutely is something an adversary would want and defenders should protect, and she was correct in emphasizing that.

In the WSJ story there were named individuals in the town who were present for, and recalled, the FBI response to the activity. Additionally, there are unclassified reports from the FBI and ICS-CERT that can likely be correlated to the dam event. Both stories are credible in that the events occurred. Not all of the details are properly fleshed out for the ICS community, though, and there are a few areas that leave you wanting.

When looking at the ICS Cyber Kill Chain, where did the dam and Calpine cases fall? Neither of them was an attack. I’d put the dam activity under Reconnaissance in Stage 1, and I’d put the Calpine case under Act in Stage 1 but not in Calpine networks. This is important to note; the “Act” was the exfiltration of sensitive documents related to Calpine, but the intrusion was not in Calpine’s networks, it was in contractor networks. Or simply put, neither Calpine nor the dam was compromised. But both showed a focused effort by an adversary, possibly Iran (though attribution is always tricky), against infrastructure.
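
As a rough illustration of that mapping, the sketch below encodes a condensed paraphrase of the Stage 1 phases (simplified from the SANS ICS Cyber Kill Chain paper) and my assessment of where each case falls. The data structure and field names are purely illustrative:

```python
# Condensed Stage 1 phases of the ICS Cyber Kill Chain (my simplified
# paraphrase of the SANS paper) and a rough mapping of the two cases.
STAGE1_PHASES = [
    "Reconnaissance", "Weaponization/Targeting", "Delivery",
    "Exploit", "Install/Modify", "Command & Control", "Act",
]

# My assessment of the two cases; neither involved an ICS compromise.
cases = {
    "New York dam": {"phase": "Reconnaissance",  # targeted probing/queries
                     "ics_compromised": False},
    "Calpine":      {"phase": "Act",  # exfiltration from contractor networks
                     "ics_compromised": False},
}

for name, case in cases.items():
    assert case["phase"] in STAGE1_PHASES
    print(f"{name}: Stage 1 - {case['phase']}, "
          f"ICS compromised: {case['ics_compromised']}")
```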

Also, in the case of the dam, the WSJ report notes that U.S. authorities confused the dam with other similarly named dams. The cell modem the infrastructure used would likely have been issued in a manner that made the physical location difficult to determine. This may have confused the adversaries as well.

I’d highlight the following facts:

  • Both reports come from credible news organizations and reporters
  • The WSJ report on the dam is additionally credible with regards to the event having taken place (the details could always be wrong though). This is due to details that correlate with other unclassified reports, timing considerations, and a named source noting that the FBI did respond
  • The WSJ report identifies that the activity was “probing” but likely not scanning activity; the focused effort on queries and searches by the adversaries is more indicative of targeted Reconnaissance than random scans
  • The AP report is additionally credible given that named sources identified and provided samples of the stolen documents containing internal information, passwords, and system diagrams related to Calpine
  • The AP report identified that sensitive data about Calpine was stored on contractor networks and was not stolen from inside the ICS
  • Neither the dam nor Calpine was compromised. There was no intrusion into ICS networks nor were there any attacks.


The Hype:

To anyone in the ICS community these reports likely contain some cringe-worthy statements. This has been the discussion in various social media circles already, where ICS security professionals have taken offense to statements such as the AP report’s “cyberattackers had opened a pathway into the networks running the United States power grid.” The comments, which I agree with, are that there is no open pathway based on these stolen documents. With the WSJ report, the title states “Iranian Hackers Infiltrated New York Dam,” which obviously did not actually occur since there was no intrusion. These issues are consistent with news reports on this subject matter regardless of how good the journalists are. This is where I’m both empathetic and exhausted.

I’m empathetic because there are a lot of eyes on these reports and hands in the proverbial cookie jar. Very rarely do journalists get to choose their report’s title. Additionally, the reporters’ main audience is not the ICS community. It’s a more lay, non-technical audience. Any report that those of us in the community would come up with, focused only on measured facts, would likely be incredibly boring or entirely too technical for a lay audience. I’m exhausted because this type of activity is understandable but not excusable. If we continue to hype threats and accidentally miseducate the audience, people will pay attention to that. Outside our community there are folks who impact our community. Policy makers and the general public have a lot of impact on ICS. Journalists and news organizations need to do better for sure, but we should also take into consideration that they are trying to make something out of reports from a community that does not like sharing these types of events. Overall I felt positively about the articles, but I’d like to see the news reporting community as a whole do better with regards to ICS and security.

I’d highlight the following hype:

  • The WSJ’s title is misleading as there was no intrusion
  • The AP’s statements around the impact of this data loss are (in some places) misleading. It is valuable data but does not make the grid any more vulnerable today than it was before
  • Both reports provide very little evidence and rely on unnamed sources for the attribution to Iran; given the number of reports and correlation of events the case is stronger than usual but still not enough to truly validate that the Iranian government was responsible


The Takeaways:

If I’m going to highlight the flaws of the journalist community I’m certainly going to highlight our own community’s flaws. News organizations need to do better in general, but the ICS community needs to get better at identifying issues and being willing to share lessons learned. Some organizations are amazing at this, but as an overall community there’s work to be done. The identification of the loss of data related to Calpine only came from a member at CYLANCE identifying the sensitive documents on one of the adversary’s FTP servers. The number of documents stolen from multiple sites should have been detected by someone internal to those networks (such as Calpine’s contractor) rather than waiting for a third-party notification. Additionally, anyone in the ICS community who’s been here long enough can think of a couple of close calls and actual incidents that are not public. If the community cannot figure out a way to responsibly share case studies and lessons learned then we will have to accept people outside our community writing the narrative. It’s a hard task but we have to figure it out.

Defense is definitely getting better. ICS is not as vulnerable as people make it out to be. And defenders are taking a more proactive approach to security than ever before. But we as a community have some ground to cover. Taking an active defense approach to monitoring our networks, performing incident response, and sharing the non-sensitive details for the community to learn from is required for us to raise the bar and have the ICS security story be written by the ICS community. Journalists are going to tell the stories regardless. It’s up to us to identify and guide a proper narrative or else not complain about it.
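
As one small, concrete example of what that kind of proactive monitoring can look like, here is a minimal sketch that flags hosts in a packet capture talking to common ICS protocol ports that are not on a known-asset list. The capture file name and the allowlist are hypothetical, the port list is far from exhaustive, and the scapy library is assumed to be available; real monitoring is of course far more involved:

```python
# A minimal sketch of proactive ICS network monitoring: flag hosts in a
# packet capture that talk to common ICS protocol ports but are not on a
# known-asset allowlist. The capture file and allowlist are hypothetical;
# requires the scapy library (pip install scapy).
from scapy.all import rdpcap, IP, TCP

ICS_PORTS = {502: "Modbus/TCP", 20000: "DNP3", 44818: "EtherNet/IP"}
KNOWN_ASSETS = {"10.0.10.5", "10.0.10.6"}  # hypothetical engineering hosts

packets = rdpcap("plant_network.pcap")  # hypothetical capture file
for pkt in packets:
    if IP in pkt and TCP in pkt and pkt[TCP].dport in ICS_PORTS:
        src = pkt[IP].src
        if src not in KNOWN_ASSETS:
            print(f"Unknown host {src} -> {pkt[IP].dst}:{pkt[TCP].dport} "
                  f"({ICS_PORTS[pkt[TCP].dport]})")
```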

Additionally, the stories both highlight a focused effort by foreign adversaries targeting infrastructure. They also highlight sensitive ICS data being stored on non-ICS networks. This reinforces the need to bridge the IT/OT gap and have ICS and IT professionals work more closely together. The one thing I’ll push back on a bit from the ICS perspective is the comment from Calpine that the stolen data, diagrams, and passwords were old and thus pose no threat. Calpine may be an industry leader in this area, but ICS diagrams, passwords, and data do not change that quickly at all and can be useful even when old. This type of information is definitely useful to an adversary for reconnaissance and learning purposes, but no, it is not a threat capable of bringing down the grid or Calpine’s facilities.

I’d highlight the following takeaways:

  • The cultural and technical barriers to identifying incidents, responding to them, and sharing lessons learned need to be reduced in the community so the proper narrative can be written and security can be elevated
  • The IT/OT gap is a divide that must be bridged if for no other reason than the fact that all the sensitive information about an ICS does not reside solely on the ICS networks; IT networks, including contractor networks, can reveal data about the ICS that we do not want adversaries to have
  • The data from the AP story would be useful to adversaries but should not be overvalued. The biggest takeaway is a focused effort by adversaries to learn about infrastructure and target it
  • The power grid and infrastructure such as dams are not as easy to impact as folks like to make it sound, but adversaries are getting smarter and focusing harder on this challenge; defenders too must get smarter and focus on the threat to keep the opportunity to damage infrastructure out of the hands of malicious actors

Minimum Viable Products are Dubious in Critical Infrastructure

December 4, 2015

Minimum Viable Products in the critical infrastructure community are increasingly just mislabeled BETA tests; that needs to be communicated correctly.

The concept of a Minimum Viable Product (MVP) is catching on across the startup industry. The idea of the MVP is tied closely to The Lean Startup model created by Eric Ries in 2011 and has very sound principles focused on maximizing the return on investment and feedback from creating new products. Eric defines the MVP as the “version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least effort.” This reinforces the entrepreneurial spirit and need for innovation, combined with getting customer feedback about a new technology without having to develop a perfect plan or product first. An MVP is also meant to be sold to customers so that revenue is generated. In short: be open to testing things out publicly earlier, pivot based off of technical and market feedback, and earn some money to raise the valuation of the company and entice investors.

Personally, I believe the lean startup model as a whole is smart. I use some aspects of the model as CEO of Dragos Security. However, I chose not to use the concept of an MVP. Minimum Viable Products are dubious in critical infrastructure. I state this understanding that the notion of getting the product out the door and gaining feedback to guide its development is a great idea. And when I say critical infrastructure I’m focusing heavily on the industrial control system (ICS) portion of the community (energy, water, manufacturing, etc.). The problem I have, though, is that I have observed a number of startups in the critical infrastructure space taking advantage of their customers, albeit unintentionally, when they push out MVPs. This is a bold claim, I won’t point fingers, and I don’t want to come across as arrogant. But I want to make it very clear: the critical infrastructure community deals with human lives; mistakes, resource drains, and misguided expectations impact the mission.

My observations of the startups abusing the MVP concept:

  • Bold claims are made about the new technologies seemingly out of a need to differentiate against larger and more well established companies
  • Technologies are increasingly deployed earlier in the development cycle because the startups do not want to have to invest in the industry-specific hardware or software to test the technology
  • The correct customers that should be taking part in the feedback process are pushed aside in favor of easier-to-get customers because successes are needed as badly as cash; there is pressure to validate the company’s vision to entice or satisfy Angel or Seed investors
  • The fact that the technology is an MVP, is lightly (if at all) tested, and will very likely change in features or even purpose is not communicated to customers, in an apparent attempt to get a jump start on the long acquisition cycles in critical infrastructure and bypass discussions of business risk
  • Customers are more heavily relied upon for feedback, or even development, costing them time and resources, often due to the startups’ lack of ICS expertise; a startup may have some specific or general ICS knowledge but rarely does it have depth in all of the markets it is tackling (electric, natural gas, oil, water, etc.) even though it wants to market and sell to those industries

What is the impact of all this? Customers are taking bigger risks, in terms of time, untested technologies, changing technologies, and over-hyped features, than they recognize. If the technology does not succeed, if the startup pivots, or if the customers burn out on the process, all that has been accomplished is significant mistrust among critical infrastructure stakeholders and a diminished desire to “innovate” with startups. And all of this is occurring on potentially sensitive networks and infrastructure with the potential to impact safety or the environment.

My recommendations to startups: if you are going to deploy technologies into critical infrastructure early in the development cycle, make sure the risks are accurately conveyed and ensure that the customer knows they are part of a learning process for your technology and company. This invites instant push-back: “If we communicate this as a type of test or a learning process they will likely not trust our company or technology and choose to go with other more established products and companies. We are trying to help. We are innovators.” And to my straw man here, I empathize greatly. Change is needed in this space and innovation is required. We must do better, especially with regards to security. But even if we ignore the statistics around the number of failed technologies and startups, statistics that would stress why many should never actually touch an ICS environment, I can comfortably state that the community is not as rigid as folks think. The critical infrastructure community, especially in ICS, gets cast in a weird light by many outside the community. My experience shows that the critical infrastructure community is just as innovative as, and I would argue more innovative than, any other industry, but they are much more careful to try to understand the potential impact and risks…as they should be.

My experience in a new technology startup: when the Dragos Security team was developing our CyberLens software we needed to test it out. Hardware was expensive and we could not afford to build out networks for every type of vendor’s ICS hardware and network communications. Although we have a lot of ICS knowledge on the team, we were all keenly aware that we are not experts in every aspect of every ICS industry we wanted to sell to. Customer feedback was (and still is) vital. To add to this, we were pressed because we were competing with larger, more established companies and technologies on a very limited budget. So, instead of trying to sell an MVP we simply launched a BETA instead; the BETA lasted over twelve months. How did we accomplish this? We spent $0 on marketing and sales and focused entirely on staying lean and on developing and validating our technology. We made contacts in the community, educated them on what we wanted to do, advised them on where the technology was tested and safe to deploy, and refused to charge our BETA participants for our time or product since they were greatly helping us and keeping our costs down. In turn we offered them discounts for when our product launched and offered some of our time to educate them in matters where we did have expertise. This created strong relationships with our BETA participants that carried over when we launched our product, with many joining us as customers. We even found new customers at launch through referrals from BETA participants vouching for our company. Or more simply stated: we were overly honest and upfront, avoided hype and buzzwords, and brought value, so we were seen as fellow team members and not snake oil salesmen. I recommend more startups take this approach even under pressure and when it is difficult to differentiate in the market.

My conclusion: the MVP model in its intended form is not a bad model. In many communities it is an especially smart model. And just because a company is using an MVP route in this space does not mean it is abusing anyone or falling into the pitfalls I listed above. But, as a whole, in the critical infrastructure community it is a process that is more often abused than used correctly, and that is damaging in the long term. Customers are not cash cows and guinea pigs; they are investors in your vision and partners. Startups should still push out technologies early rather than waiting to create a perfect product without the right feedback, but they should call these early pushes what they are. It is not a Minimum Viable Product; it is a BETA test of core features. Customers should not be asked to spend limited budgets on top of their time and feedback for it, nor should they be misled as to what part of the process they are helping with. You will find the community is more likely to help when they know you are being upfront, even with understandable shortcomings.