This blog is a continuation of a fantastic discussion with Richard Bejtlich. He responded to a question online, I blogged about it here, and then he responded here and, in response to another question I posed, here. In this blog I’ll reply to his replies. The main point of this blog for me, though, is not to debate but to document my own experiences and look briefly at the evolution of terms and schools of thought as they apply to our cybersecurity field (cybersecurity vs. infosec is actually a great example of terms that have evolved yet to many still have two very different meanings and to some are interchangeable).
TL;DR Version: Richard and I seem to disagree on what an indicator is (reasonably so, as his definition is perfectly supportable; I just have a different view based on different experiences), which I didn’t previously realize was impacting how both of us viewed threat hunting. It seems that we’ve come up under different “schools of thought.” “Hunting” can therefore be leveraged in many different ways, but you should strive to document and put structure to it if possible to really get the most value out of it.
Historical Context Matters But We Shouldn’t be Bound To It
In Richard’s blog he masterfully brings in some historical context of hunting’s origins (few people bring in history the way Richard does, but more should), or at least what was first documented (much of which was done by him), in reference to the U.S. Air Force and National Security Agency. He also masterfully gives credit (another thing we should all strive to do more) to folks like Tony Sager, and describes where his journey took him from the Air Force CERT to GE-CIRT, where his use of the term was solidified even more with the wonderful analysts he cites.
What hunting was in the U.S. Air Force in the mid-2000s and what it was at GE-CIRT shortly after doesn’t have to define the term for the rest of the field. It’s an amazing contribution, but not definitive. I think that’s, on face value, a frustrating topic. If I was at the GE-CIRT, as an example, and I really fleshed out a term, my natural reaction would be to push against people using it differently (Richard is not doing that, I’m making a different point), but in reality terms, use-cases, and our field’s knowledge in general are constantly growing. There’s actually a logical fallacy called Appeal to Tradition which essentially states that something is what it is because it’s always been that way. But what’s interesting here is that “hunting” was something numerous groups did. None documented it as well or arguably had as much impact as folks like Tony Sager, Richard, NSA VAO, AF-CERT, and the GE-CIRT, but their experience is not less valid.
As an example, in my own experience in the National Security Agency one of my jobs was the Discovery Lead for one of the NSA Threat Operations Centers (NTOC) (would have been close proximity to Tony’s group but operating independently). Prior to this I had no knowledge of the term “hunting” and was not exposed to the schools of thought that were being developed at GE-CIRT or, interestingly enough, the use of the term and practice that Tony Sager and VAO were pursuing. I was part of a small team that established a new NTOC office at a field site and we were tasked with finding the “unknown unknowns”. This was admittedly very vague guidance but it was explained to us that the role of our NTOC office was explicitly finding the cyber threats that weren’t already documented and being tracked. Find what’s evading our insight to reveal collection gaps. We called this “Discovery” which, on a day-to-day basis, became known as “hunting.” The line between hunting and cyber threat intelligence, though, was very blurred for us because of our requirements; I would note that hunting was one way we went about satisfying our cyber threat intelligence requirements by identifying previously unknown intrusions (hunting) that we would then analyze (CTI). What we effectively were doing was taking an intelligence requirement, collecting against it through hunting, analyzing the intrusions we observed, producing insights, and distributing them to others as blogs, defensive mitigations and new detections, and reports. We used models such as the newly developed Diamond Model (prior to the paper’s publication) under the tutelage of Sergio Caltagirone, and if I remember correctly we were the first team, or one of the first, to do so outside of his, where he, Andrew Pendergast, and Chris Betz created the model.
The interesting thing to the discussion here is that “hunting” was something that developed for my team without, to my knowledge, external prompt, although I imagine there was cross-pollination from Sergio, Andrew, and Chris with the various teams they interacted with. For us, Discovery and our use of the term “hunting” was always about “finding new threats” but the main value in doing that was in identifying new defensive recommendations and gaps in collection even though we also found plenty of new threats to track. (For anyone curious I ended up choosing industrial control systems as the focus for my Discovery team since the collection would be so different and thus maybe the threats would be too; it turns out they were. This was a defining moment for me in ICS cyber security and a lot of what I had to develop in knowledge there at the NTOC helped in my journey and greatly inspired my work today at Dragos). Interestingly though one of the ways we found new threats was in the application of adversary tactics, techniques, and procedures as analytics/patterns instead of specific indicators. This aspect seems to distance Richard and me further, which I’ll cover in the next section. But to close out the topic on the value you get out of hunting…
Richard acknowledges that you can identify visibility and resistance gaps as part of hunting, but it’s not the reason to go hunting.
I agree with this, but would say that identifying visibility or resistance gaps is a derivative benefit of hunting, not the reason to go hunting. Hunting IMO is an operation to detect adversaries. If you find a visibility or resistance gap, that is a bonus.
— Richard Bejtlich (@taosecurity) November 23, 2018
In response to that I would say it may not be the reason that he and others hunt but it was always the reason my team hunted early in my career and it shaped how I view hunting now. His school of thought is simply different. I would also wonder aloud if a term needs to change when the actions are the exact same but the value propositions you’re aiming for are numbered differently. If you do exactly the same things and use exactly the same approach but you hope to find threats instead of collection gaps should you rename your effort? I’m not sure on that yet but my initial thoughts would say no. The “school of thought” for hunting and how to achieve it was very specific for Richard and me; I assume there are others in other organizations and parts of the world who have similar experiences that make their schools of thought much different. Richard and I are obviously both heavily influenced by a U.S. Department of Defense flavoring. I’d be interested if anyone else had their own journeys to document around the same time period.
We Were Bound To Disagree
It is clear that Richard and I disagree on the term hunting and how it’s used but the important part is why we disagree. I actually have no issues whatsoever with how Richard is defining the term and its background for his uses; his experiences are unique and many including myself have benefited from them. I don’t think there is a right or wrong to this discussion, just a friendly exploration to flesh out the topic for ourselves (or at least mine) and others. If we go back to the original points though it was clear Richard and I were bound to disagree and I didn’t see it initially.
I referenced but then moved past a point early in my other blog: Richard described all threat detection as falling into two categories, either “matching” or “hunting”. I thought it was an interesting discussion, which I slightly disagreed with, but I hurried past it to get to the core debate. I noted that everything is “Matching” or “Unknowns” but that hunting is not contained in one or the other; it can be across both. If you are matching indicators you are not hunting in my opinion, which I think Richard would agree with. If you are matching adversary behaviors, though, you might be, depending on how you’re doing it (if the behavior is already a defined detection then no, but if it’s a pattern you’re searching out to find new activity that isn’t defined, then yes). Hurrying past that point was my error; in reality the disagreement is rooted there. It is a subtle but important point because I didn’t realize (although to his credit he’s definitely written on it before) that Richard considers adversary TTPs to be a sub-class of indicators of compromise (IOCs). I asked him that question after his first blog and he was kind enough to answer it here.
Again, Richard is great to identify and credit influential folks such as David Bianco and Chris Sanders, both of whom I consider friends and who have written things that have definitely helped me further my thoughts too. He references David’s Pyramid of Pain (an extremely useful model by the way) where it very clearly calls out TTPs as a type of Indicator. I’m going to disagree again, surprise surprise, but this is where everything “came together” for me in that the disagreement is in the evolution of terms and schools of thought and nothing more. David’s use of the term indicator is mostly in keeping with another foundational work by Eric Hutchins, Mike Cloppert, and Rohan Amin titled Intelligence-Driven Computer Network Defense Informed by Analysis of Adversary Campaigns and Intrusion Kill Chains. Yes, the kill chain paper. Here they define indicators to be atomic, computed, or behavioral. Behavioral indicators here are defined as those indicators that are made up of atomic and computed indicators to describe adversary activity; the example given in the paper is “the intruder would initially used a backdoor which generated network traffic matching [regular expression] at the rate of [some frequency] to [some IP address] and then replace it with one matching the MD5 hash [value] once access was established.” Where David’s use of the term seems to be in disagreement is “behavioral” which on its face would speak to TTPs but in reality TTPs can be described and leveraged now without any atomic or computed indicators to accompany them. My second point in the next paragraph will dive deeper into that.
Why is this such an important point to this discussion? For two main reasons.
- First, terms and schools of thought evolve. “Indicator” today is almost exclusively associated with indicator feeds and the atomic and computed form of indicators. Some still use behavioral indicators and talk about them quite a bit to great value. But for the majority of the industry if you say “indicator” they’ve come to learn about them, use them, and see value propositions defined by atomic and computed values. Security professionals using indicator that way are not wrong. We even see another “school of thought” coming up which is based around MITRE’s ATT&CK; quite a few of their use-cases would fit nicely into mine by using ATT&CK as one framework for guiding threat hunts. In one of their papers they specifically note the focus on tactics and techniques (TT in the TTP) for them is important to go beyond “traditional indicators of compromise.” Here it would seem they are not using TTPs as a form of IOC either.
- Second, what you can do today for detection was not necessarily possible for most teams as little as a decade ago. The cybersecurity community has evolved quickly and made many more advancements than we often credit ourselves with. One such advancement is in the form of analytics. Analytics are effectively questions you ask of the data. Analytics have been around a long time in many forms but with the advancement of the community and of computing power we can now use analytics at larger scale and in a distributed fashion (run the analytics at the edge where the data exists instead of pulling all the data back and then running analytics across it). One type of analytic, which I wrote about and referenced in the last blog when I mentioned the four types of detection paper, is the threat analytic. Threat analytics effectively are adversary behaviors, i.e. TTPs or tradecraft (different things by the way). But they are not behavioral indicators in the way Hutchins, Cloppert, and Amin identified them. They don’t include any atomic or computed indicators; post detection there will be indicators but you don’t define the indicators ahead of the detection. The entire analytic might say “alert any time a file is dropped to a system, opens up a network port, generates network traffic to an external IP address, and then downloads an additional file once communications are established”. This analytic would get to the example given in the kill chain paper but without knowing the hash or IP address of the backdoor or anything about the adversary leveraging the behavior. This is done through analytics engines, which have been around for a while but are more common and accessible than ever before. When the analytic is defined it is “matching” to Richard’s point.
But when I’m leveraging the TTP or tradecraft outside of the analytic to go search for and find new threats (numerous threats can leverage identical TTPs, so you’re searching for “unknown” threats using “known” tradecraft) I’m not matching any indicators and instead am using an intelligence-driven hypothesis to identify new threats. That’s EXACTLY what we did at the NTOC site I was at and we called that hunting.
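To make the second point a bit more concrete, here is a minimal Python sketch of the kind of behavior-only analytic quoted above, next to a plain indicator match for contrast. Everything in it is an assumption for illustration: the `Event` fields, the event kind names, the 300 second window, the internal network ranges, and the one-entry indicator feed are all hypothetical and are not taken from any particular analytics engine or from the papers mentioned here. The thing to notice is that the analytic names no hash and no IP address ahead of time; it only describes an ordered chain of behavior tied to a dropped file.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, List, Optional, Tuple

# Hypothetical internal ranges; anything outside them is treated as external.
INTERNAL_NETS = [ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

# Ordered behavior chain from the example analytic (event names are made up).
STAGES = ("file_create", "port_open", "net_connect", "file_download")


@dataclass(frozen=True)
class Event:
    host: str
    time: float                     # seconds, any consistent clock
    kind: str                       # one of STAGES
    file: Optional[str] = None      # file written or downloaded
    process: Optional[str] = None   # executable performing the action
    dest_ip: Optional[str] = None   # destination of a net_connect


def is_external(ip: str) -> bool:
    addr = ip_address(ip)
    return not any(addr in net for net in INTERNAL_NETS)


def behavior_analytic(events: Iterable[Event], window: float = 300.0) -> List[Tuple[str, str]]:
    """Return (host, dropped_file) pairs that complete the whole chain in order."""
    hits = []
    progress = {}  # (host, dropped file) -> (index of next expected stage, drop time)
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "file_create" and ev.file:
            progress[(ev.host, ev.file)] = (1, ev.time)
            continue
        key = (ev.host, ev.process)
        if key not in progress:
            continue
        stage, started = progress[key]
        if ev.time - started > window:
            del progress[key]       # chain took too long; stop tracking it
            continue
        if ev.kind != STAGES[stage]:
            continue
        if ev.kind == "net_connect" and not (ev.dest_ip and is_external(ev.dest_ip)):
            continue                # internal traffic does not advance the chain
        stage += 1
        if stage == len(STAGES):
            hits.append(key)
            del progress[key]
        else:
            progress[key] = (stage, started)
    return hits


# Contrast: classic indicator matching needs the value known ahead of time.
KNOWN_BAD_IPS = {"198.51.100.23"}   # hypothetical feed entry


def indicator_match(events: Iterable[Event]) -> List[Tuple[str, Optional[str]]]:
    return [(e.host, e.process) for e in events if e.dest_ip in KNOWN_BAD_IPS]


sample_events = [
    Event("ws1", 0, "file_create", file="/tmp/a.bin"),
    Event("ws1", 5, "port_open", process="/tmp/a.bin"),
    Event("ws1", 9, "net_connect", process="/tmp/a.bin", dest_ip="203.0.113.7"),
    Event("ws1", 12, "file_download", process="/tmp/a.bin", file="/tmp/b.bin"),
    Event("ws2", 1, "file_create", file="/tmp/x.bin"),
    Event("ws2", 3, "net_connect", process="/tmp/x.bin", dest_ip="10.0.0.5"),
]
```

With the sample data, `behavior_analytic(sample_events)` flags the dropped file on `ws1`, while `indicator_match(sample_events)` returns nothing because the destination address was never in the feed. Once a chain like this is codified and left running it is “matching”; taking the same tradecraft as a hypothesis and searching for activity that no codified detection covers yet is the part described here as hunting.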
As an aside, I think my second point is even how Richard is doing his hunting to some degree. In his blog he gives an example where you could tell an analyst to go look at HTTP user agents because they are being leveraged by the adversary for malicious network communications, but not tell them which user agents to look for. That, at a high level, is a tactic of the adversary, which would classify as a component of a TTP and not be an indicator. It is not some anomaly that is occurring to filter through but a hypothesis generated from some insight the defender had, such as seeing adversaries do it before (intelligence-driven) or based on their own expertise in how the protocol should work (domain expertise). I don’t want to argue points for Richard though, so maybe I’m not interpreting it correctly, but I think if we spent more time (preferably over beer) on this we’d probably root out more commonalities in approaches.
All of that is a long-winded way of saying: Richard and I fundamentally seem to disagree on what an indicator is, which is having a flow-down effect I didn’t previously realize on how we view and define threat hunting. We would still end up debating back and forth on the ordering of the value propositions but both agree that the value propositions all exist (find threats, identify gaps, develop new detections, etc.), which is another reason everyone should be hunting regardless of how you order the value you get out of it.
This has been fantastic. I have thoroughly enjoyed this back and forth, and thank you again Richard for being such a gracious person to explore it all with me while challenging me to document my experiences in this blog.