Airo AV
Parent/child relationships are one of the simplest and most effective ways to detect malicious activity at the host level. On Unix, there are multiple ways to create a process, and each one behaves differently at the operating system level. These days, most host-based endpoint technologies let you view process trees and write detections based on them. However, many security analysts never take the time to learn the fundamentals of how processes are spawned. This is unfortunate, because even after a malicious process tree has been identified, some of the data within it can be overlooked or misunderstood.
The majority of programming languages that exist today have built-in functions for process creation, but high-level languages wrap up a lot of what is actually happening at the low level. Perhaps those who understand processes best are the dedicated coders still writing C using the fork and exec system calls. With Apple’s new Endpoint Security Framework being all the hype, you’re likely to see a number of tools telling you that something forked or exec’ed. So with that in mind, let’s get started by looking at some of the basics of macOS process creation and how they affect threat hunting.
If you’re using a tool that allows you to collect processes as they’re created you’ve probably stumbled upon a process tree that looks like this:
The confusing part of the above process tree is that, logically, we expect the command sh -c whoami to spawn whoami as its child. Instead, whoami ends up being created as a sibling process. The goal of this post is to solve the mystery of this confusing behavior.
Fork
One of the most basic ways to create a process is the fork system call, but keep reading: from a threat hunting perspective, it might not be doing exactly what you think it’s doing. Take a look at the man page (also known as using your man fork):
$ man fork

FORK(2)                   BSD System Calls Manual                  FORK(2)

NAME
     fork -- create a new process

SYNOPSIS
     #include <unistd.h>

     pid_t
     fork(void);

DESCRIPTION
     fork() causes creation of a new process.  The new process (child
     process) is an exact copy of the calling process (parent process)
     except for the following:

           o   The child process has a unique process ID.

           o   The child process has a different parent process ID (i.e.,
               the process ID of the parent process).

           o   The child process has its own copy of the parent's
               descriptors.  These descriptors reference the same
               underlying objects, so that, for instance, file pointers
               in file objects are shared between the child and the
               parent, so that an lseek(2) on a descriptor in the child
               process can affect a subsequent read or write by the
               parent.  This descriptor copying is also used by the shell
               to establish standard input and output for newly created
               processes as well as to set up pipes.

           o   The child processes resource utilizations are set to 0;
               see setrlimit(2).
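The man page’s point about shared descriptors is easy to skim past, so here is a minimal sketch of it. The function name shared_offset_after_fork is hypothetical, and the sketch assumes a POSIX system where tmpfile() can create a scratch file. The child writes three bytes through its inherited copy of a descriptor; because both copies refer to the same open file, the parent’s file offset moves too:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the parent's file offset after a forked child has written
   3 bytes to an inherited descriptor, or -1 on error. */
long shared_offset_after_fork(void) {
    FILE *f = tmpfile();              /* anonymous scratch file */
    if (f == NULL) return -1;
    int fd = fileno(f);

    pid_t pid = fork();
    if (pid == -1) { fclose(f); return -1; }
    if (pid == 0) {
        /* Child: write through its copy of the descriptor. The copy refers
           to the same open file, so this also moves the parent's offset. */
        ssize_t n = write(fd, "abc", 3);
        _exit(n == 3 ? 0 : 1);        /* _exit: skip stdio cleanup in the child */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        fclose(f);
        return -1;
    }
    off_t pos = lseek(fd, 0, SEEK_CUR);  /* the parent never wrote anything */
    fclose(f);
    return (long)pos;
}
```

Calling shared_offset_after_fork() should return 3, even though the parent itself never wrote a byte.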
The most important note is right here in the description:
"fork() causes creation of a new process. The new process (child process) is an exact copy of the calling process (parent process)..."
Let’s quickly put together some code that runs the fork function so we can see what this looks like from a process tree perspective:
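/* iFork.c */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {

    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");          /* no child process was created */
        return 1;
    }

    if (pid == 0) {
        /* fork() returned 0: this is the child */
        printf("Hello from the child. PID -> %d : PPID -> %d\n", getpid(), getppid());
    } else {
        /* fork() returned the child's PID: this is the parent */
        printf("Hello from the parent. PID -> %d\n", getpid());
        wait(NULL);              /* wait for the child to finish */
    }

    return 0;
}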
Next we compile the program and run it:
$ gcc iFork.c -o iFork
$ ./iFork
Hello from the parent. PID -> 50281
Hello from the child. PID -> 50282 : PPID -> 50281
Let’s break down what’s happening here. When our program starts, we call the fork() function. Notice that we don’t specify a path to any binary we want to execute. This is because fork is not designed to execute a new binary; it is designed to create an exact copy of the process that is already running. Some of you are probably looking at the output and wondering how in the world we got two lines of output when the only two print statements are in opposite branches of the if/else statement. Remember, when we called fork we cloned this process, so from that point on there are two copies of the program running: the original (the parent) and the clone (the child). The child doesn’t start over from the top of main(); both copies continue from the point where fork() returns, and each one takes a different branch of the if/else.
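To make the “clone” idea concrete, here is a small sketch (fork_copy_demo is a made-up name for illustration) showing that the child gets its own copy of the parent’s memory and resumes right after fork() instead of at the top of the program. It uses the same pid == 0 check as iFork.c. The child changes its copy of a variable and reports the new value through its exit status, while the parent’s copy stays untouched:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once. The child adds 5 to its copy of a local counter and reports
   the result through its exit status. Returns the child's view of the
   counter and stores the parent's view in *parent_view; -1 on error. */
int fork_copy_demo(int *parent_view) {
    int counter = 10;
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Child: execution resumes here, right after fork() returned 0.
           It does not start over at the top of main(). */
        counter += 5;                 /* changes the child's copy only */
        _exit(counter);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    *parent_view = counter;           /* the parent's copy is still 10 */
    return WEXITSTATUS(status);       /* the child's copy was 15 */
}
```

With these values the function returns 15 (the child’s view) and stores 10 (the parent’s view) in *parent_view.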
So if this process is a clone, how does it know whether it’s the child or the parent? fork() returns 0 in the child, returns the child’s PID in the parent, and returns -1 in the parent if no child could be created. Finally, notice that we called the wait() function inside the parent branch. wait() suspends the parent until the child finishes executing, which ensures the parent doesn’t exit while the child is still running and lets the parent collect (reap) the child’s exit status. It’s worth noting that the order of the two lines of output isn’t guaranteed: depending on how the scheduler runs the two processes, the child may print before the parent or after it.
On my system, the above code caused the following process tree to be created:
Right now you might be wondering what on earth this has to do with threat hunting. We’ll get into some additional reasons why it’s important to understand fork() later, but for now, when you encounter a program that appears to spawn a copy of itself, you have a basic understanding of what is happening. It’s also important to note that programs sometimes fork inside loops, which leads to massive process trees full of what appear to be duplicate processes. For this reason, a lot of endpoint security solutions don’t collect or display forks. In reality, a fork on its own doesn’t tell us threat hunters much. It’s almost as if the process opened a new thread (but it didn’t!). We’ll talk a bit more about forks later in this post because they are still important, but for now, let’s take a look at another way to execute processes.
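Here is a sketch of the fork-in-a-loop pattern, assuming each child only needs to check who its parent is and then exit (fork_in_loop is a hypothetical name). Every iteration produces another sibling child of the same parent, which is exactly the kind of noise that leads tools to hide forks. Note that each child must exit immediately; otherwise it would keep running the loop and fork children of its own:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks n children in a loop. Each child checks that its parent is the
   process that forked it and exits with 0 (yes) or 1 (no). The parent
   reaps every child it managed to create and returns how many passed the
   check (fewer than n if fork() failed partway), or -1 if wait() fails. */
int fork_in_loop(int n) {
    pid_t parent = getpid();
    int spawned = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1) break;         /* out of processes: stop forking */
        if (pid == 0) {
            /* Child: fork() returned 0. Exit right away so the child does
               not continue the loop and fork grandchildren. */
            _exit(getppid() == parent ? 0 : 1);
        }
        spawned++;                    /* parent: pid holds the new child's PID */
    }
    int ok = 0, status;
    for (int i = 0; i < spawned; i++) {
        if (wait(&status) == -1) return -1;   /* reap any one child */
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) ok++;
    }
    return ok;
}
```

fork_in_loop(5) should return 5: five sibling children, each of which saw the original process as its parent.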
Exec
There are a number of functions within the exec family (execl(), execv(), execvp(), execve() and friends) that allow you to load a new process image. However, if you read the documentation you’ll notice that what’s happening here is very different from fork. With exec, the current process image is overwritten by a new one; no new PID is created. As you can imagine, this can get very confusing when analyzing a process tree, because a process that used to exist has now been overwritten by whatever program the author decided to execute.
Let’s take a look at how this works. All exec functions are pretty similar, so we’ll demonstrate with the execvp() function and use it to run the dash shell. I’ve chosen dash for a couple of reasons: mainly because nothing else on my system uses it, so it’s easy to spot in a process tree, and also because it’s a shell, so it stays open until we manually close it. That gives us all the time we want to dig into the resulting process tree. We will call this code dash_wrapper.c:
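/* dash_wrapper.c */
#include <stdio.h>
#include <unistd.h>

int main(void) {

    char *argv[] = { "dash", NULL };
    execvp(argv[0], argv);

    /* execvp() only returns if it failed (for example, dash is not installed) */
    perror("execvp");
    return 1;
}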
Next, we’ll compile this into a binary called dash_wrapper:
gcc dash_wrapper.c -o dash_wrapper
If we run our newly compiled dash_wrapper executable from the terminal and take a look at the resulting process tree, we see:
Here you can see that even though the program we executed was called dash_wrapper, there is no such program in the process tree. Since I’m inside the terminal, it seems logical that zsh would execute dash_wrapper, and dash_wrapper would then go on to execute dash. Instead, what happened is that zsh executed dash_wrapper as pid 303, and then dash_wrapper used execvp() to run the dash shell, resulting in dash overtaking the process image of pid 303.
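You can check the “same PID, new image” behavior without a process monitor. The sketch below (exec_keeps_pid is a hypothetical name, and it assumes /bin/sh supports POSIX arithmetic expansion) forks a child that execs /bin/sh, has the shell exit with its own PID ($$) modulo 256, and compares that with the PID fork() handed the parent. The modulo is only there because an exit status carries just 8 bits:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that execs /bin/sh. The shell exits with ($$ % 256), and
   the parent compares that with the PID fork() returned.
   Returns 1 if they match, 0 if not, -1 on error. */
int exec_keeps_pid(void) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Child: replace this process image with sh. No new PID is made. */
        execl("/bin/sh", "sh", "-c", "exit $(($$ % 256))", (char *)NULL);
        _exit(127);                   /* reached only if execl() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    /* Exit statuses only carry 8 bits, so compare the low byte of the PID. */
    return WEXITSTATUS(status) == (pid % 256);
}
```

A return value of 1 means the shell was running under the very PID that fork() created: exec replaced the image, not the process.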
So why is this relevant? Because if you’re using a tool that records processes in real time, you’re bound to eventually see two processes created around the same time that share the same process ID and the same parent process. From a developer’s perspective, note that a successful exec never returns: once you’ve called it in this manner, execution cannot come back to your original program. As soon as exec runs, the new program takes over and the old one disappears (exec only returns, with -1, if it fails). This gets us a step closer to understanding the sh -c whoami scenario described in the first section of this post.
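The “exec only returns on failure” rule is easy to demonstrate. In this sketch (exec_probe is a hypothetical name, and the 127 exit code for “not found” simply follows the common shell convention), the code after execvp() only ever runs when execvp() fails:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs prog (looked up on PATH, no arguments) in a forked child and
   returns the child's exit status, or -1 on error. If execvp() succeeds,
   the _exit() call below it never runs. */
int exec_probe(const char *prog) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        char *const argv[] = { (char *)prog, NULL };
        execvp(prog, argv);
        /* Only reached on failure: execvp() returned -1 and set errno. */
        _exit(errno == ENOENT ? 127 : 126);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

exec_probe("true") should return 0, because true replaced the child and exited normally, while a program name that doesn’t exist on PATH should return 127.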
Fork + Exec
So at this point we’ve covered two separate process creation functions: fork(), which creates a new process by cloning the one that is currently running, and exec(), which runs a program by overwriting the current process image. This brings us to the most common approach to process creation, which is a combination of both fork and exec. As mentioned above, running exec does not allow a developer (or malware author) to return control to the original program. This is a problem for malware, because it often uses built-in commands to collect recon data on the system. For example, if malware execs the uname -a command and then wants to parse the output to get the kernel version, it won’t be able to: once the malware execs uname -a, the malware is gone. So instead, the malware author will first fork (duplicate) the malware process and then exec the uname -a program within that fork, so that the duplicated process is overtaken by the uname command. The original malware process keeps running the whole time, and once uname writes its output and terminates, the malware can read that output (over a pipe, in the example below) and carry on. This is all functionality that we take for granted nowadays thanks to high-level programming languages that do it all for us. C code to perform the fork and exec of uname would look something like the following (thank you, Stack Overflow):
/* getUname.c */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define die(e) do { fprintf(stderr, "%s\n", e); exit(EXIT_FAILURE); } while (0)

int main() {
    char output[4096];

    // Create a pipe so we can get the output of the uname command
    int link[2];
    if (pipe(link) == -1)
        die("Pipe failure");

    // Fork this process
    pid_t pid;
    if ((pid = fork()) == -1)
        die("Fork failed");

    if (pid == 0) {
        // If fork() returned 0 we are now running in the child process
        // Point the child's stdout at the write end of the pipe, then exec uname
        dup2(link[1], STDOUT_FILENO);
        close(link[0]);
        close(link[1]);
        char *argv[] = { "uname", "-a", NULL };
        execvp(argv[0], argv);
        die("Exec failed");  // only reached if execvp() fails

    } else {
        // Parent: close the unused write end, then read uname's output
        close(link[1]);
        ssize_t nbytes = read(link[0], output, sizeof(output));
        if (nbytes < 0)
            die("Read failed");

        // Print the output of the uname command to the terminal
        printf("%.*s", (int)nbytes, output);
        wait(NULL);

        /* Do whatever else you want to do with the uname output here */
    }
    return 0;
}
If you follow the comments you should get a good idea of what’s going on here. In the above code, we first fork the current process, which creates a clone of this process with a new PID. We then call exec inside that forked process, which means the forked process is taken over by a new process image (uname in our case). We can compile this code to an executable called getUname like so:
gcc getUname.c -o getUname
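One caveat with getUname.c: a single read() is not guaranteed to return all of a child’s output, and a child that writes more than the pipe can buffer will block until the parent reads. A slightly more defensive version of the same fork, pipe, dup2 and exec pattern keeps reading until end-of-file. This is a sketch under those assumptions, and capture_output is a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (found via PATH) in a forked child with its stdout sent
   into a pipe. Reads until EOF into buf (NUL-terminated, truncated to
   cap - 1 bytes) and returns the child's exit status, or -1 on error. */
int capture_output(char *const argv[], char *buf, size_t cap) {
    int link[2];
    if (cap == 0 || pipe(link) == -1) return -1;

    pid_t pid = fork();
    if (pid == -1) { close(link[0]); close(link[1]); return -1; }
    if (pid == 0) {
        dup2(link[1], STDOUT_FILENO);  /* child's stdout -> pipe write end */
        close(link[0]);
        close(link[1]);
        execvp(argv[0], argv);
        _exit(127);                    /* only reached if execvp() failed */
    }

    close(link[1]);  /* parent keeps only the read end, so read() sees EOF */
    size_t used = 0;
    char scratch[256];
    ssize_t n;
    /* Loop to EOF, and keep draining even once buf is full so the child
       never blocks on a full pipe. */
    while ((n = read(link[0], scratch, sizeof scratch)) > 0) {
        for (ssize_t i = 0; i < n && used < cap - 1; i++)
            buf[used++] = scratch[i];
    }
    buf[used] = '\0';
    close(link[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

For example, capturing echo hello into a 64-byte buffer should leave "hello\n" in it and return echo’s exit status of 0.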
If you are using a tool to capture processes as they run, you should see that running getUname creates a tree that looks like this:
Ah, at last. A process tree that simply makes sense. If only they could all work this way. This tells a clear story: the getUname executable was run, and when it ran, it executed the uname executable. We can reasonably assume that getUname needs the output of the uname command and that’s why it chose to run it.
Obviously, the code we compiled is not malicious. It’s just an example of how malware performing recon might look. In fact, malware often creates many different child processes in this same manner. Multiple executables that are already on the system are often executed to gather data about the system because malware authors don’t want to reinvent the wheel when writing code.
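Since recon code tends to repeat the same fork, exec and wait sequence for every command it runs, it’s worth seeing how little code that takes. This is a minimal sketch (run_command and run_all are hypothetical names) that runs each command as its own child and waits for it, so the caller keeps control afterwards. In a process tree, every call would show up as another child of the calling process, just like uname under getUname:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec argv[0] (via PATH) in the child, wait in the parent.
   Returns the child's exit status, or -1 if it did not exit normally. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);        /* the child becomes the command */
        _exit(127);                   /* only reached if execvp() failed */
    }
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR) return -1;   /* retry only if a signal interrupted us */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Runs each command in turn and returns how many exited with status 0. */
int run_all(char *const *cmds[], size_t n) {
    int ok = 0;
    for (size_t i = 0; i < n; i++) {
        if (run_command(cmds[i]) == 0) ok++;
    }
    return ok;
}
```

Running true, sh -c 'exit 3' and true again through run_all() should report 2 successes, and the parent is still around afterwards to count them.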
So at last, this brings us back to the question I opened this blog post with. If you’re collecting process creation in real time, what in the world is up with a process tree that looks like this?
The short answer is this: sh -c whoami exec’s twice without forking, and that’s why we get three different process names running under a single PID. For those who want the long explanation, hold on to your butts…
For those who like to be hands-on, I’ll first show that it’s very easy to reproduce this process tree in C by using the system() function. system() is a quick and dirty way to run a program. It’s perfect for when we want to execute a command and don’t care about its output. It accomplishes this by running the command through sh -c (as you can see above).
#include <stdlib.h>

int main(void) {
    /* system() hands the string to "sh -c", so this runs: sh -c /usr/bin/whoami */
    system("/usr/bin/whoami");
    return 0;
}
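Conceptually, system() boils down to a fork, an exec of /bin/sh -c, and a wait. Here is a simplified stand-in under that assumption (simple_system is a hypothetical name). The real system() also ignores SIGINT and SIGQUIT and blocks SIGCHLD while it waits, and it returns the raw wait status rather than the exit code, so treat this as an illustration, not a replacement:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Simplified system(): fork, exec "/bin/sh -c cmd" in the child, wait in
   the parent. Returns the shell's exit code, or -1 on error. No signal
   handling, unlike the real system(). */
int simple_system(const char *cmd) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* argv[0] is "sh": this is the "sh -c <command>" process that shows
           up as a child of the calling program. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                   /* only reached if execl() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

simple_system("exit 5") should return 5, and simple_system("true") should return 0.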
The system() API uses fork and exec to create the sh -c whoami process, which is why we see it as a child process of some_program. But why do we see bash and whoami as child processes of some_program instead of the sh -c whoami command? To answer this question, we must first take a peek at the sh man page (also known as shmaning).
$ man sh

SH(1)                   BSD General Commands Manual                  SH(1)

NAME
     sh -- POSIX-compliant command interpreter

SYNOPSIS
     sh [options]

DESCRIPTION
     sh is a POSIX-compliant command interpreter (shell).  It is
     implemented by re-execing as either bash(1), dash(1), or zsh(1) as
     determined by the symbolic link located at /private/var/select/sh.
     If /private/var/select/sh does not exist or does not point to a valid
     shell, sh will use one of the supported shells.
     ...
Apparently, to qualify as a POSIX-compliant command interpreter, you only need the ability to pass arguments to another shell, because by the looks of it that’s all sh does! It does this by exec’ing whichever shell the symbolic link at /private/var/select/sh points to. (A quick, interesting tidbit: on 10.15.5, this symbolic link points at bash by default, even if you’ve set your default shell to something else.) Anyway, we’ve now discovered the next piece of the puzzle. The sh shell is designed to take the arguments supplied to it and then turn around and pass those exact same arguments to the selected shell (bash, by default) using exec().
If you scroll back up and take a quick look at the command line used for the bash process, you’ll see that it is sh -c whoami. You might be wondering how it’s possible for bash’s first argument to be “sh.” That’s a great question. As a threat hunter it’s strange to see, but in reality the first argument (argv[0]) is just a string chosen by whichever program calls exec, and it doesn’t have to match the name of the binary being run; a program written in C can put anything it likes there. As it turns out, bash looks at this first argument, and invoking it with the name sh is a special way to run the bash shell. If you look at the behemoth that is the bash man page (Bash Man! The lesser-known Marvel superhero), you’ll see what I’m talking about under the “INVOCATION” section.
BASH(1)                                                                BASH(1)

NAME
       bash - GNU Bourne-Again SHell

SYNOPSIS
       bash [options] [file]

COPYRIGHT
       Bash is Copyright (C) 1989-2005 by the Free Software Foundation, Inc.

DESCRIPTION
       Bash is an sh-compatible command language interpreter that executes
       commands read from the standard input or from a file.  Bash also
       incorporates useful features from the Korn and C shells (ksh and
       csh).
       ...

INVOCATION
       ...
       If bash is invoked with the name sh, it tries to mimic the startup
       behavior of historical versions of sh as closely as possible, while
       conforming to the POSIX standard as well.
       ...
Did you catch that?
"If bash is invoked with the name sh, it tries to mimic the startup behavior of historical versions of sh as closely as possible."
If you find that confusing, you’re not alone. It’s fairly vague. However, since we see this bash instance created with sh as its first argument, I think we can assume it means that when bash is executed with sh as its first argument, it behaves a bit differently. (A little further into the same section, the man page adds that when invoked as sh, bash enters POSIX mode after reading its startup files.)
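To see for yourself that argv[0] is whatever the caller says it is, the sketch below execs /bin/sh under a made-up name and asks the shell to check its own $0. The name argv0_is_arbitrary is hypothetical, and the sketch assumes /bin/sh sets $0 from argv[0] when run with -c and no extra operands, as bash and dash do; the name passed in must not contain a single quote:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Execs /bin/sh with argv[0] set to name and runs: test "$0" = 'name'.
   Returns 0 if the shell saw name as $0, nonzero if not, -1 on error. */
int argv0_is_arbitrary(const char *name) {
    char script[256];
    int len = snprintf(script, sizeof script, "test \"$0\" = '%s'", name);
    if (len < 0 || (size_t)len >= sizeof script) return -1;  /* name too long */

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* The binary is /bin/sh, but argv[0] is the caller's made-up name. */
        execl("/bin/sh", name, "-c", script, (char *)NULL);
        _exit(127);                   /* only reached if execl() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

argv0_is_arbitrary("not-really-sh") should return 0: the shell believes its name is not-really-sh even though the binary on disk is /bin/sh. The same freedom is what lets system() and macOS’s sh hand bash the name sh.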
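This finally leads to bash exec’ing the whoami command. Because the -c string is a single simple command, bash doesn’t bother forking a child for it; as an optimization, it simply execs whoami in place. Altogether, if we look at the actions that have occurred in a non-tree format, this is the order of events we see: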
Notice here that pid 304 exec’ed twice without any forks. In other words, this PID has been associated with three different process images. Ah, the wonderful world of Unix.
If we take all of these events and arrange them in a process tree format we get:
And there you have it. The extremely long-winded answer to the question you were asking…or maybe you weren’t asking? Regardless, you can rest assured that your computer is behaving as expected when you see such events.
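If you’d like to reproduce one PID cycling through several images on your own machine, here is a minimal sketch (exec_chain_keeps_pid is a hypothetical name). It uses the shell’s exec builtin to force the second exec explicitly instead of relying on bash’s single-command behavior, so the child goes from a copy of the caller, to sh, to a second sh, all without another fork:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks once; the child then execs twice without forking again:
   image 1 is the fork()ed copy, image 2 is sh, image 3 is the second sh
   (started by the exec builtin), which exits with ($$ % 256).
   Returns 1 if that matches the PID fork() returned, 0 if not, -1 on error. */
int exec_chain_keeps_pid(void) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* The single quotes stop the first sh from expanding $$; the second
           sh expands it to its own PID. */
        execl("/bin/sh", "sh", "-c",
              "exec /bin/sh -c 'exit $(($$ % 256))'", (char *)NULL);
        _exit(127);                   /* only reached if execl() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status) == (pid % 256);   /* exit codes carry 8 bits */
}
```

A return value of 1 means the final shell was still running under the PID that fork() originally returned, the same pattern as sh, bash and whoami sharing one PID.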
Some security solutions will try to display this data in a format that makes more sense to the standard user. As stated before, a lot of solutions already get rid of forks and just try to show you when different items exec.
Not everybody cares about exactly what’s going on under the hood, so long as threat analysts are provided a tool that’s useful (and we appreciate it). However, with the release of the Apple Endpoint Security Framework, we’re bound to see some new tools that show us exactly what’s going on with processes on our systems. ProcessMonitor by Objective-See and Crescendo by FireEye are two great examples of this already. The future looks promising in terms of Mac security tools, and you’ve probably picked up by now that I think understanding process creation on macOS is one of the most critical skills for a Mac threat hunter!
Happy (Threat) Hunting! 🏹 👾