Under the Hood of an MDR Company: an Exercise in Innovation [Video] – Security Boulevard
Running a successful EDR platform and MDR service is a never-ending pursuit to stay one step ahead of hackers. As threat actors find creative ways to circumvent our customers' defenses, we are constantly innovating to ensure we can detect and respond to security incidents quickly. Our founders, Chris Gerritz and Russ Morris, joined me (Kelly Giles, Marketing Director, nice to meet you) in this interview to answer some of my questions about exactly how they manage to continuously sniff out ransomware attacks.
The Early Years: Before Starting a Managed Detection and Response Company
Kelly: I’m here with our two co-founders, Chris Gerritz and Russ Morris. They’re going to answer some of our questions about how we got started and how we do what we do. So, first off: how did each of you get into security? Why don’t you go first, Chris?
Chris Gerritz, Co-founder and VP of Threat Intelligence
Chris: I got into security pretty early. Going through high school, I would hack my friends whenever we got together. I was a big nerd growing up; back then that was a bad thing, and today it’s a good thing. But ultimately, joining the Air Force was my big professional entry into it. In 2003, I joined the military out of high school and went into secure communications, server administration, and things like that. Those were always passions of mine. After college, I went back into the Air Force and ended up joining the cybersecurity teams in San Antonio, Texas, and that’s where I really got deep into incident response, hacking, and hunting.
Russ Morris, Co-founder and CTO
Kelly: Russ, how did you get into security?
Russ: Similar to Chris, I started pretty early. My first interest was programming; I was interested in how computers worked. That slowly morphed into how computers break and how to take advantage of them, which was really fun. I didn’t get into too much trouble. Eventually I found my way into the Air Force doing server administration. I got a great opportunity to work with the Air Force red team doing a whole bunch of penetration testing against giant enterprise networks, which was a good experience and a whole lot of fun. Eventually, I found myself in San Antonio working on the defensive side. Chris and I met through work and eventually decided to marry our red team and blue team experience into what is now Infocyte.
Memory Inspection Innovation
Kelly: You have a patented memory inspection technique that I’ve heard quite a bit about. There’s even a plaque on our wall over in the Austin office. Can you tell us a little bit more about that?
Russ: Sure. Pretty early on, first while we were working for the Air Force and then when we set out to build Infocyte, we realized we needed to be able to look for bad things happening in memory that weren’t necessarily resident on disk. That was a trick we learned doing a lot of red teaming: running only in memory was a very easy way to avoid most of the detection and prevention software out there.
So we started with some of the widely known and widely available techniques; tools like Volatility are good examples. Then we started thinking: there are bad guys out there who also know about these tools, and they’re really good at evading them. So how can we find them? Instead of using the off-the-shelf techniques other people use, we came up with our own and implemented it.
We essentially go out and we look for things that are running inside of a process and executing from a place that they shouldn’t be. It’s really effective; we find all kinds of bad guys.
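The general idea Russ describes, code executing from a place it shouldn't, can be illustrated with a well-known heuristic from public memory forensics: flag memory regions that are executable but not backed by any file on disk, which is how injected shellcode and reflectively loaded modules often appear. This is a minimal Python sketch under that assumption, not Infocyte's patented technique; the `MemoryRegion` record and its fields are hypothetical stand-ins for data an agent would gather from the operating system.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class MemoryRegion:
    """Hypothetical description of one region in a process's address space.

    A real agent would populate this from OS facilities such as
    VirtualQueryEx on Windows or /proc/<pid>/maps on Linux.
    """
    base: int
    size: int
    protection: str              # e.g. "r-x", "rwx", "rw-"
    backing_path: Optional[str]  # file the region is mapped from; None if private


def find_unbacked_executable(regions: Iterable[MemoryRegion]) -> List[MemoryRegion]:
    """Return regions the CPU may execute that have no backing file on disk."""
    return [r for r in regions if "x" in r.protection and r.backing_path is None]


# Made-up example regions for one process.
regions = [
    MemoryRegion(0x7FF600000000, 0x20000, "r-x", r"C:\Windows\System32\notepad.exe"),
    MemoryRegion(0x01A00000, 0x10000, "rw-", None),  # heap: not executable
    MemoryRegion(0x02B00000, 0x40000, "rwx", None),  # executable, no file: suspicious
]

for region in find_unbacked_executable(regions):
    print(hex(region.base), region.protection)
```

Just-in-time compilers (for example, .NET or browser JavaScript engines) also create unbacked executable memory, so in practice a heuristic like this needs a baseline of what is normal for each process.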
Bringing Sophisticated Techniques to the Masses
Chris: Russ is underselling it a little bit. Traditionally, this kind of technique is only used when you’re doing memory forensics on one host at a time. There’s a tool set called Volatility that’s the premier open source toolkit for memory forensics. It has some great techniques for taking apart what’s running on a computer, identifying what’s embedded in memory but not on disk, and extracting it.
What we applied, and what Russ ultimately improved upon, was a methodology for doing that at scale and very quickly: choosing what not to do and how to avoid impacting the performance of the system, so that we can do it in near real time. That whole process of identifying what’s normal and what’s not in memory is what we’ve been working on for the past half decade.
Success with Memory Analysis
Kelly: Do we have any success stories or case studies about how that’s helped us to prevent attacks?
Chris: It’s hard to pick just one, because memory analysis is our most successful technique. When we sit behind an antivirus solution like Microsoft Defender or Cylance, it stops a lot of the more basic attacks that rely on executing files. What ends up getting through, even on the most advanced security solutions out there, are memory-resident-only implants. Those are what we find and what we concentrate on.
Almost every incident we’ve encountered, especially over the last four years, has involved these implants. The most widely used attack framework is Cobalt Strike, and it relies heavily on memory techniques that we identify very easily.
Staying Light on the Client’s Host
Kelly: How do we collect all that information without bogging down the host of a network?
Russ: There are a lot of tools out there that want to inspect every single thing that happens on the system. Chris and I spent a lot of time working in very constrained environments where we were not allowed to impact the system at all, or only very minimally. We decided that was a feature we really wanted to bring to the commercial detection world.
We’re very targeted with the information we grab. We don’t try to grab everything, dump it into a data warehouse, and analyze it later. We know exactly what we need to determine what’s good and what’s bad. Our primary focus is: how can we be the most effective while gathering the least amount of information?
Chris: There are also a lot of things in memory that are completely benign and always will be. Knowing what’s normal helps us focus on what we need. For example, you may have a server with 128 gigabytes of RAM, but not every piece of that is executable content, or memory the CPU is actually executing. If we focus on where malware can be, especially active malware, there are only so many places to look. That focus lets us stay light on a system: we don’t have to do a full memory dump or anything else that would impact the system’s performance.
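To illustrate how small the executable slice of an address space usually is, the sketch below parses Linux `/proc/<pid>/maps`-style text (fields: address range, permissions, offset, device, inode, optional path) and counts how many mapped bytes are executable. It is a hedged Python example, not how Infocyte's agent works; the sample map is made up, and Chris's example is a Windows server, where an agent would query regions through Windows APIs instead.

```python
from typing import Optional, Tuple

# Made-up sample in the /proc/<pid>/maps format.
SAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon
00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon
00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
7f6a1c000000-7f6a1c400000 rw-p 00000000 00:00 0
7fff1a2b3000-7fff1a2d4000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps_line(line: str) -> Tuple[int, int, str, Optional[str]]:
    """Split one maps line into (start, end, perms, path-or-None)."""
    # Fields: address perms offset dev inode [pathname]; maxsplit keeps
    # pathnames containing spaces intact.
    parts = line.split(maxsplit=5)
    start_s, end_s = parts[0].split("-")
    path = parts[5].strip() if len(parts) == 6 else None
    return int(start_s, 16), int(end_s, 16), parts[1], path


def count_executable_bytes(maps_text: str) -> Tuple[int, int]:
    """Return (executable bytes, total mapped bytes) for a maps listing."""
    executable = total = 0
    for line in maps_text.strip().splitlines():
        start, end, perms, _ = parse_maps_line(line)
        total += end - start
        if "x" in perms:
            executable += end - start
    return executable, total


exe, total = count_executable_bytes(SAMPLE_MAPS)
print(f"{exe} of {total} mapped bytes are executable ({exe / total:.1%})")
```

Only the executable regions are candidates for running code, which is why restricting inspection to them keeps the per-host cost low.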
Behavioral Analytics
Kelly: Can you talk to me about behavioral monitoring?
Chris: That’s an exciting area that has developed over the last few years, especially with the move to EDR, which really kicked off with heavily funded companies setting out to reinvent the endpoint. When we talk about legacy antivirus, we mean solutions that look at files that are about to execute and ask, “Is this a good or bad file?” based on reputation or on signatures of files we’ve seen before.
Say a file is “known bad” because it’s going around infecting everybody: we won’t allow it to ever execute on a system. As attackers started bypassing that kind of protection, they began using the same tools and behaviors that administrators already use. If they can steal an admin account, they don’t need malware; they can just use the scripting frameworks that already exist on every computer. The fight became less about whether a file is good or bad and more about whether a behavior, action, or command being executed is good or bad.
That shift by attackers is what created behavioral analytics. As an EDR, we collect endpoint telemetry on what is occurring and which commands are running. So it’s not just “this file is executing, it’s PowerShell, which is good,” or the normal applications on the system; it’s about how they’re being used. There are administrative utilities on every server. A hacker who gets in can funnel commands to those existing utilities, and the way they use them can match a fairly common adversary technique.
Chris: An organization called MITRE has developed a framework called ATT&CK that catalogs these behaviors as known adversary techniques. We put a lot of R&D into making sure we can identify those techniques and correlate the actions we’re seeing to them. We also research new techniques as they come out and help contribute to the framework.
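A simplified picture of mapping observed commands to ATT&CK techniques: the hypothetical rule table below pairs regular expressions over process command lines with real ATT&CK technique IDs (T1033 System Owner/User Discovery, T1059.001 PowerShell, T1490 Inhibit System Recovery). Production rules would look at far more context than a command line; this is only a sketch of the mapping step.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    technique_id: str      # MITRE ATT&CK technique ID
    name: str              # technique name, for analyst display
    pattern: "re.Pattern"  # illustrative command-line pattern (hypothetical rule)


RULES = [
    Rule("T1033", "System Owner/User Discovery",
         re.compile(r"\bwhoami(\.exe)?\b", re.IGNORECASE)),
    Rule("T1059.001", "Command and Scripting Interpreter: PowerShell",
         re.compile(r"\bpowershell(\.exe)?\b.*\s-(enc|encodedcommand)\b", re.IGNORECASE)),
    Rule("T1490", "Inhibit System Recovery",
         re.compile(r"\bvssadmin(\.exe)?\s+delete\s+shadows\b", re.IGNORECASE)),
]


def map_to_attack(command_line: str) -> List[str]:
    """Return the ATT&CK technique IDs whose rule matches the command line."""
    return [rule.technique_id for rule in RULES if rule.pattern.search(command_line)]


print(map_to_attack(r"C:\Windows\System32\whoami.exe /all"))
print(map_to_attack("powershell.exe -nop -enc SQBFAFgA"))
```

A match here only says which technique a command resembles; whether it is malicious is a separate question, as the discussion of context below makes clear.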
Russ: Back to the question about performance and impact on systems. We collect a lot of this raw data on the endpoint, but we actually execute our behavioral rules in the cloud. We’re able to take the knowledge we gain across our entire customer base, figure out the best set of rules to catch the most bad guys, and run them at scale in the cloud. So we’re not burdening the endpoints by forcing them to run thousands of rules.
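Russ's split between a lightweight agent and cloud-side analytics can be sketched as two halves: the endpoint only serializes a minimal event, and the cloud decodes batches from many hosts and evaluates every rule there, counting hits per rule across the customer base. The event fields, rule names, `endpoint_event`, and `evaluate_batch` are all hypothetical.

```python
import json
from collections import Counter
from typing import Callable, Dict, List


def endpoint_event(host: str, parent: str, image: str, cmdline: str) -> str:
    """Agent side: serialize a minimal event. No rule evaluation happens here."""
    return json.dumps({"host": host, "parent": parent, "image": image, "cmdline": cmdline})


# Cloud side: a rule is a named predicate over a decoded event (hypothetical shape).
CloudRule = Callable[[Dict[str, str]], bool]

CLOUD_RULES: Dict[str, CloudRule] = {
    # Office application spawning a shell.
    "office_spawns_shell": lambda e: (
        e["parent"].lower() in {"winword.exe", "excel.exe"}
        and e["image"].lower() in {"cmd.exe", "powershell.exe"}
    ),
    # Deleting Volume Shadow Copies, a step ransomware often takes before encrypting.
    "shadow_copy_delete": lambda e: "delete shadows" in e["cmdline"].lower(),
}


def evaluate_batch(raw_events: List[str]) -> Counter:
    """Decode events from many hosts and count hits per rule."""
    hits: Counter = Counter()
    for raw in raw_events:
        event = json.loads(raw)
        for name, rule in CLOUD_RULES.items():
            if rule(event):
                hits[name] += 1
    return hits


batch = [
    endpoint_event("ws-01", "WINWORD.EXE", "powershell.exe", "powershell -nop"),
    endpoint_event("srv-02", "services.exe", "vssadmin.exe", "vssadmin delete shadows /all /quiet"),
    endpoint_event("ws-03", "explorer.exe", "notepad.exe", "notepad.exe report.txt"),
]
print(evaluate_batch(batch))
```

Because the agent never evaluates rules, adding or tuning a rule only changes `CLOUD_RULES` on the server side.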
A Managed Detection and Response Company’s Battle: Behavioral Monitoring vs. Ransomware
Kelly: Advanced threats like ransomware are all over the news. How does behavioral monitoring help us to detect and respond to those threats?
Chris: Some people conflate the two, but protection and controls that keep things out of your network are one thing, and they should be prioritized. A well-formed network should be able to stop things from executing that aren’t allowed to execute. But in cases where something is allowed to execute, monitoring is really the only way to determine, “Is this legitimate user doing something malicious?” because they do have administrator privileges.
If a hacker steals that administrator account and launches ransomware, then from a behavioral standpoint, without good enough analytics, it’s hard to tell the difference. Is someone encrypting a file because they’re supposed to be encrypting a file? Or is this an administrator account that got taken over and is encrypting every file on the file system? How do you differentiate between the two?
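One signal that helps separate "an admin encrypting a file" from "an account encrypting every file" is volume over time. This minimal sketch flags a process that rewrites many distinct files within a short window; the class name, window length, and threshold are illustrative assumptions, and a real detector would combine this with other context rather than rely on rate alone.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class EncryptionBurstDetector:
    """Flag a process that rewrites many distinct files in a short window.

    window_s and threshold are illustrative assumptions, not tuned values.
    """

    def __init__(self, window_s: float = 10.0, threshold: int = 100) -> None:
        self.window_s = window_s
        self.threshold = threshold
        # Per-process queue of (timestamp, file path) write events.
        self._writes: Dict[int, Deque[Tuple[float, str]]] = defaultdict(deque)

    def observe(self, pid: int, timestamp: float, path: str) -> bool:
        """Record a file write; return True once the process crosses the threshold."""
        writes = self._writes[pid]
        writes.append((timestamp, path))
        # Drop writes that have fallen out of the time window.
        while writes and timestamp - writes[0][0] > self.window_s:
            writes.popleft()
        distinct_files = {p for _, p in writes}
        return len(distinct_files) >= self.threshold


detector = EncryptionBurstDetector(window_s=10.0, threshold=5)
flags = [detector.observe(4242, float(t), f"C:\\Users\\alice\\doc{t}.docx") for t in range(5)]
print(flags)
```

Rewriting one archive repeatedly never trips the detector, while touching many different files in seconds does, which mirrors the distinction Chris draws.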
Behaviors Themselves are not “Always Bad” – Context is Key
Chris: When we talk about behaviors, one of the key things to understand is that even if a behavior matches an adversary technique, it isn’t necessarily always bad, because administrators legitimately use the same techniques. That’s one of the challenges, and it’s why monitoring is so important. It’s important to have humans behind the console, and managed detection and response is probably the most in-demand service right now. Behaviors need some sort of filter where you ask, “Is this an allowed technique or not?” If a behavior were always bad, it would just be a vulnerability that we would close and turn off.
We have to have discernment. And that discernment happens at scale through our analytics engine that keeps getting tuned. So we can take the billions of events that do happen and funnel them down to a set that one person can go and triage for an entire customer set.
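The "is this an allowed technique or not?" filter can be sketched as an allowlist of approved (user, technique) pairs applied before anything reaches an analyst. The `Detection` type, `triage_queue`, and the example pairs are hypothetical; a real engine would also weigh host, time, and parent process.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    host: str
    user: str
    technique_id: str  # ATT&CK technique the behavior matched


def triage_queue(detections: Iterable[Detection],
                 allowed: FrozenSet[Tuple[str, str]]) -> List[Detection]:
    """Drop detections whose (user, technique) pair the customer has approved."""
    return [d for d in detections if (d.user, d.technique_id) not in allowed]


# Hypothetical: the backup admin is expected to manage shadow copies.
approved = frozenset({("CORP\\backup_admin", "T1490")})
detections = [
    Detection("srv-01", "CORP\\backup_admin", "T1490"),
    Detection("ws-07", "CORP\\jdoe", "T1490"),
    Detection("ws-07", "CORP\\jdoe", "T1033"),
]
for d in triage_queue(detections, approved):
    print(d)
```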
Correlating Behaviors of Various Priority Levels to Create a Cohesive Story
Kelly: I’ve heard you say before that it’s not just about a single activity. There might be one suspicious thing that happens, or each individual piece might make sense on its own, but looking at all of them together in context starts to tell a different story. How do you do that?
Chris: The idea of correlation has existed for a while in the SIEM world. You used to have a proxy, a firewall, an antivirus, and other solutions deployed, and they would funnel their data into a SIEM (security information and event management system), which was supposed to correlate these events.
So if you saw the same connection coming from a host through a firewall and through a proxy, those events could be correlated together. That was the idea at the network layer, at least, because traffic goes through multiple network devices. On the endpoint, correlation is different. With EDR technology and behavioral monitoring, we can identify individual behaviors, and each of them could be an admin activity of low suspicion. In most attacks, we see a series of low- and medium-suspicion actions, and there might be only one highly suspicious action that only hackers perform.
The problem is that if I only alert on the high-confidence detections that don’t have any false positives, I’m going to miss all the activity that shows up as “low, low, low, low, low.” Correlation helps us elevate low-priority actions that would otherwise be lost in the noise.
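A toy version of the correlation Chris describes: give each behavior a weight by severity, sum the weights per host within a sliding time window, and escalate when the sum crosses a threshold, so a run of low-severity events can surface even though none would alert on its own. The weights, the one-hour window, the threshold, and the `correlate` function are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

WEIGHTS = {"low": 1, "medium": 3, "high": 10}  # illustrative severity weights
ESCALATE_AT = 10                               # illustrative escalation threshold


@dataclass(frozen=True)
class Behavior:
    host: str
    timestamp: float  # seconds
    severity: str     # "low", "medium", or "high"


def correlate(behaviors: List[Behavior], window_s: float = 3600.0) -> Dict[str, int]:
    """Return {host: peak windowed score} for hosts whose score reaches ESCALATE_AT."""
    by_host: Dict[str, List[Behavior]] = defaultdict(list)
    for b in behaviors:
        by_host[b.host].append(b)

    escalated: Dict[str, int] = {}
    for host, items in by_host.items():
        items.sort(key=lambda b: b.timestamp)
        start, score, peak = 0, 0, 0
        for b in items:
            score += WEIGHTS[b.severity]
            # Slide the window start forward so it spans at most window_s seconds.
            while b.timestamp - items[start].timestamp > window_s:
                score -= WEIGHTS[items[start].severity]
                start += 1
            peak = max(peak, score)
        if peak >= ESCALATE_AT:
            escalated[host] = peak
    return escalated


# Host A: four lows and two mediums within five minutes.
events = [Behavior("A", t, s) for t, s in
          [(0, "low"), (60, "low"), (120, "medium"), (180, "low"), (240, "medium"), (300, "low")]]
# Host B: the same low-severity event once a day; never correlates.
events += [Behavior("B", d * 86400.0, "low") for d in range(5)]
print(correlate(events))
```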
Understanding a Hacker’s First Steps
Chris: As an example, one of the first actions almost every hacker takes on a system is running the whoami (“who am I?”) command. It might not be that exact command, since there are a few different ways to get that information, but what they want to know is, “What user account did I take over? Did I get an admin account or a regular user account?”
So they run whoami. Any admin will do that too when they log into a system. If the command is run from a protected system process that doesn’t normally run it, we flag it as high priority. Otherwise, it’s low priority and we won’t surface it, even though it’s still happening.
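The whoami example translates into a context-dependent priority: the same command is low priority from an interactive shell and high priority when a service process that normally never runs it launches it. In this sketch the set of unusual parent processes and the `whoami_priority` function are illustrative assumptions (for instance, `w3wp.exe`, the IIS worker process, spawning whoami is a common sign of a web shell), not Infocyte's actual rule.

```python
import ntpath
from typing import Optional

# Illustrative set of service processes that would rarely, if ever, launch whoami.
UNUSUAL_PARENTS = frozenset({"w3wp.exe", "sqlservr.exe", "spoolsv.exe", "lsass.exe"})


def whoami_priority(image_path: str, parent_path: str) -> Optional[str]:
    """Return "high" or "low" for whoami executions, or None for other processes."""
    # ntpath handles Windows-style paths regardless of the OS running this code.
    image = ntpath.basename(image_path).lower()
    parent = ntpath.basename(parent_path).lower()
    if image != "whoami.exe":
        return None
    return "high" if parent in UNUSUAL_PARENTS else "low"


print(whoami_priority(r"C:\Windows\System32\whoami.exe",
                      r"C:\Windows\System32\inetsrv\w3wp.exe"))
print(whoami_priority(r"C:\Windows\System32\whoami.exe",
                      r"C:\Windows\System32\cmd.exe"))
```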
Correlation lets us say: I know a behavior they’re going to perform, and it isn’t suspicious in every case. If I can correlate other suspicious activity that happens in sequence with these low-priority, high-noise events, I can combine them. Then we have one high-priority incident, a highly suspicious action or set of actions, surfaced as one alert that rolls up multiple activities.
That whole concept is something we’ve been working on a lot, and different parts of the community really demand that capability. So you’re going to see a lot more of it in Infocyte, where we combine alerts. Here in January, I think we’re releasing a big update with alert consolidation, where an action that happened on 200 systems from one person is consolidated into one alert.
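Consolidating "an action that happened on 200 systems from one person" into one alert amounts to grouping alerts by who did what and collecting the affected hosts. A minimal sketch with hypothetical `Alert` and `ConsolidatedAlert` types and a hypothetical `consolidate` function; a real system would also bound the grouping by time.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Alert:
    host: str
    user: str
    action: str


@dataclass
class ConsolidatedAlert:
    user: str
    action: str
    hosts: Set[str]


def consolidate(alerts: Iterable[Alert]) -> List[ConsolidatedAlert]:
    """Group alerts by (user, action) and collect every host they touched."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for alert in alerts:
        groups[(alert.user, alert.action)].add(alert.host)
    return [ConsolidatedAlert(user, action, hosts) for (user, action), hosts in groups.items()]


alerts = [Alert(f"host-{i}", "CORP\\svc_backup", "vssadmin delete shadows") for i in range(200)]
alerts.append(Alert("ws-07", "CORP\\jdoe", "whoami"))
for c in consolidate(alerts):
    print(c.user, c.action, len(c.hosts))
```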
Managed Detection and Response: 24/7 Monitoring
Kelly: One of the ways we’re able to help our clients is our Command level service, or MDR, where we are monitoring their network 24×7. How do we do that, is it just you two and you just never sleep?
Chris: No, I’ve given up on that life. I used to work 80 hours a week, but then I got married! So no, we have a team. We try to bake our brains into the analytics engine and make it as powerful as possible so that we get the number of events down. It’s a combination of automation through our analytics and our cloud threat intelligence resources, which funnels events down to something a team can process.
So we do have a team at Infocyte that helps monitor this. For EDR use cases and behavioral monitoring, it does require a certain level of knowledge on how adversaries actually act when they get on the network and what’s normal on a network. And so it’s very hard to actually adopt behavioral monitoring or continuous monitoring on the network if you don’t have that skill set on hand. Few if any of our customers have 24/7 capability at that level.
Some Partners Have the Necessary Expertise
Chris: There are some power users on the Infocyte platform who do incident response and may have a SOC themselves. We have partners that have a SOC (security operations center), but ultimately, understanding how this looks in our platform, with our telemetry, with our rules, and how we map to MITRE requires some particular knowledge. We provide that by augmenting the teams we support through our security operations center. So we’re an augmentation team: we help digest what we’re seeing on our platform and provide recommendations on how to analyze it and respond.
Responding at Scale
Kelly: How do we respond at scale across a network?
Chris: Russ has built some great capabilities into our agent. We’ve got a full endpoint API that lets us go in and delete files, kill processes, and so on, and there are different options for initiating those commands. When alerts come into our alert portal, we can right-click on an alert and say, “Hey, I don’t like that file. I don’t like that event. Let’s go do something to that host.” The process we teach might start by collecting evidence from that host so we have additional context on what occurred: a deep triage that pulls all the event logs (not just what we’ve already collected), followed by host isolation.
We also have the ability to isolate a host blindly. You can isolate first, then collect the evidence and decide what to do next, because once a host is isolated it can only communicate with Infocyte and any whitelisted IPs you’ve set. That process at least contains the threat. That first phase, scoping how many systems are compromised and containing the threat, is really where Infocyte comes in. Then you can start breathing and ask the other questions: how did they get in, and how do we clean this up?
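The isolation rule described above (an isolated host may only talk to the Infocyte platform and customer-whitelisted IPs) reduces to an egress allowlist check. This sketch uses Python's standard `ipaddress` module; the `IsolationPolicy` class and the documentation-range addresses standing in for the platform and the whitelist are hypothetical, and an actual agent would enforce the decision in the host firewall rather than in application code.

```python
import ipaddress
from typing import Iterable


class IsolationPolicy:
    """Egress allowlist applied while a host is isolated (illustrative only)."""

    def __init__(self, platform_networks: Iterable[str], customer_allowlist: Iterable[str]) -> None:
        self.allowed = [
            ipaddress.ip_network(n) for n in [*platform_networks, *customer_allowlist]
        ]

    def permits(self, destination: str) -> bool:
        """True if an isolated host may still connect to this destination IP."""
        addr = ipaddress.ip_address(destination)
        return any(addr in net for net in self.allowed)


# Documentation-range addresses stand in for the platform and a customer whitelist.
policy = IsolationPolicy(["203.0.113.0/24"], ["198.51.100.10/32"])
print(policy.permits("203.0.113.5"), policy.permits("8.8.8.8"))
```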
Attacks are Not Instantaneous – With Early Detection, we can Contain the Threat
Chris: That containment is the priority in the first hour. If attackers are getting into my network and starting to take it over, how do I stop them from crypto locking me? They might be setting up for a 2 a.m. hit. If you can detect the early stages of an attack, there’s plenty of time to stop them, because often these hackers don’t even know what they’ve got when they first get into a network. They don’t know how much ransom to ask for.
They need to know those things before they act. These days you’ll never see a ransomware attack that is instantaneous; there’s always some lag. Back in the day, attackers used to crypto lock systems instantly, but it never hit enough systems to make a difference. So now there’s time between the entry vector and the moment they crypto lock you: time to monitor, time to contain the threat, and time to figure out how they got in and plug that hole before they take over the network. We’ve done that multiple times for customers.
Try Out the Platform (for Free)
If you think the platform might be a good fit for you, I have good news. I would love to invite you to try out our Community Edition for free. You can deploy it in just a few minutes.