Seccomp stands for secure computing mode, and seccomp-bpf is its filter mode. It is a simple yet effective sandboxing tool introduced in Linux kernel 3.5. It allows the user to attach a system call filter to a process and all its descendants, thus reducing the attack surface of the kernel. Seccomp filters are expressed in Berkeley Packet Filter (BPF) format.
In this article we build a whitelist seccomp filter and attach it to a user program using the Firejail sandbox. Throughout the article we use the Transmission BitTorrent client as an example.
We start by extracting a list of syscalls the program uses, build the filter and run the program in Firejail. As new syscalls are discovered during testing, the filter is updated. When everything looks fine, we integrate the filter into a security profile suitable for Firejail. These are the steps:
Linux has several tools for listing syscalls. The easiest one to use seems to be strace (apt-get install strace). We start transmission-gtk under strace with the -qcf options (quiet, count, follow child processes).
We exercise the program for about five minutes: go through some menus, start and stop a download, and so on.
As we close the program, strace prints the syscall list on the terminal:
We copy and paste the strace output into a text editor and clean it up. We extract a comma-separated list without any blanks, something like:
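The cleanup can also be scripted instead of done by hand. The following is a minimal sketch, assuming the usual table layout printed by strace -c, with the syscall name in the last column. The function name and the sample table are made up for illustration; the numbers are not real measurements.

```python
def syscalls_from_strace_summary(summary):
    """Return a comma-separated syscall list from `strace -c` summary text."""
    names = []
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Data rows start with a percentage such as "60.00". The header row
        # ("% time ...") and the dashed separators do not parse as a float.
        try:
            float(fields[0])
        except ValueError:
            continue
        name = fields[-1]
        if name == "total":  # summary row at the bottom of the table
            continue
        if name not in names:
            names.append(name)
    return ",".join(names)


# Made-up sample in the general shape of a `strace -c` table.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000600           6       100           poll
 30.00    0.000300           3       100        10 futex
 10.00    0.000100           5        20           nanosleep
------ ----------- ----------- --------- --------- ----------------
100.00    0.001000                   220        10 total
"""

print(syscalls_from_strace_summary(SAMPLE))
```

The result contains no blanks, so it can be used directly as the argument of the seccomp.keep option.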
We use the --seccomp.keep option to start Firejail, and --shell=none to run the program directly without the extra syscalls required by a shell:
It looks ugly at this point: a kilometer-long command line that doesn’t even work. For some reason strace missed some syscalls. Time to bring in the system logger.
If we get errors in the terminal, we just add the missing syscall to the list and try again. But this is not always the case. Most of the time the Linux kernel will just kill the process and send audit messages to syslog. For this reason, we keep another terminal open monitoring syslog:
The log entry tells us exactly which system call number got the program killed, syscall=201 in the example above. To associate the number with a name, we use firejail as follows:
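Since the command line gets long, it can help to assemble it from a list rather than type it. Below is a small sketch, assuming the --seccomp.keep=LIST and --shell=none spellings described earlier (check the man page of your Firejail version). The helper name firejail_argv is hypothetical, and the four syscalls are only a short stand-in for the full list.

```python
import shlex


def firejail_argv(syscalls, program):
    """Build the firejail argument vector for a whitelist seccomp filter."""
    for name in syscalls:
        # The list must be comma-separated without any blanks.
        if not name or any(ch.isspace() or ch == "," for ch in name):
            raise ValueError(f"bad syscall name: {name!r}")
    keep = "--seccomp.keep=" + ",".join(syscalls)
    return ["firejail", keep, "--shell=none", program]


argv = firejail_argv(["poll", "select", "nanosleep", "futex"], "transmission-gtk")
# Print the command for inspection; pass argv to subprocess.run() to execute it.
print(shlex.join(argv))
```

Keeping the syscalls in a list makes it easy to append new names as testing uncovers them.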
We keep on adding syscalls to the list as they are reported and try again. To get Transmission working we ended up adding pwrite64,time,exit,exit_group on top of what strace reported – not too bad!
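The update loop described above (read the syscall number from the log, look up its name, add it to the list) can be sketched in a few lines. This is an illustration under stated assumptions: the log lines are placeholders that contain only the syscall=N field, the lookup table is a four-entry excerpt of the x86_64 syscall numbering (other architectures differ), and both function names are made up.

```python
import re

# Excerpt of the x86_64 syscall table; numbers differ on other architectures.
X86_64_NAMES = {18: "pwrite64", 60: "exit", 201: "time", 231: "exit_group"}
SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)\b")


def syscall_numbers(log_lines):
    """Yield the syscall number from every line that has a syscall=N field."""
    for line in log_lines:
        match = SYSCALL_FIELD.search(line)
        if match:
            yield int(match.group(1))


def extend_filter(current, names):
    """Append names to a comma-separated list, skipping duplicates."""
    result = [n for n in current.split(",") if n]
    for name in names:
        if name not in result:
            result.append(name)
    return ",".join(result)


# Placeholder lines: real log entries carry many more fields.
SAMPLE_LOG = [
    "<other fields> syscall=201 <other fields>",
    "a line without the field",
    "<other fields> syscall=231 <other fields>",
    "<other fields> syscall=201 <other fields>",
]

found = [X86_64_NAMES[n] for n in syscall_numbers(SAMPLE_LOG) if n in X86_64_NAMES]
print(extend_filter("poll,select,nanosleep,futex", found))
```

In practice the kernel kills the program at the first blocked syscall, so each test run usually contributes one new name.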
Firejail installs security profiles for several popular programs in the /etc/firejail directory. The profiles define a manicured filesystem with most directories mounted read-only, and several files and directories blanked in $HOME, mainly files holding passwords and encryption keys.
The Transmission BitTorrent client is supported, and its profile also defines a default seccomp blacklist filter. We want to upgrade this filter to the whitelist filter we have just built. For this, we go into the ~/.config/firejail directory and copy the default Transmission profile there:
We add a “shell none” line, and we replace “seccomp” with “seccomp.keep poll,select,nanosleep,futex,epoll_wait,fadvise64,[…]”. The result looks like this:
The command “caps.drop all” in the security profile above disables all capabilities. Linux capabilities split the privileges of root into separate units. The feature is similar in purpose to seccomp, but its checks sit deep inside the kernel rather than at the syscall boundary.
The command “protocol unix,inet,inet6” is the protocol filter. Only UNIX sockets, IPv4 and IPv6 are allowed. The protocol filter is also built as a seccomp filter.
Between seccomp, capabilities and protocols, more than half of the kernel code is out of reach for the sandboxed program.
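The profile edit described earlier (adding “shell none” and replacing “seccomp” with “seccomp.keep …”) is mechanical enough to script. The sketch below works on the profile text as a string. The three-line sample is a cut-down stand-in built from the commands discussed in this article, not the actual Transmission profile, and upgrade_profile is a hypothetical helper name.

```python
def upgrade_profile(profile_text, syscalls):
    """Swap the bare `seccomp` line for `seccomp.keep ...` and add `shell none`."""
    keep_line = "seccomp.keep " + ",".join(syscalls)
    lines = []
    for line in profile_text.splitlines():
        # Only a line that is exactly "seccomp" enables the default filter;
        # every other line is copied through unchanged.
        lines.append(keep_line if line.strip() == "seccomp" else line)
    if keep_line not in lines:
        # No bare "seccomp" line was found, so add the whitelist at the end.
        lines.append(keep_line)
    if "shell none" not in lines:
        lines.append("shell none")
    return "\n".join(lines) + "\n"


# Cut-down stand-in for a profile, for illustration only.
SAMPLE_PROFILE = "caps.drop all\nprotocol unix,inet,inet6\nseccomp\n"

print(upgrade_profile(SAMPLE_PROFILE, ["poll", "select", "nanosleep", "futex"]), end="")
```

Writing the result to a copy of the profile in ~/.config/firejail leaves the system-wide file in /etc/firejail untouched.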
Firejail chooses the profile automatically, based on the name of the executable. To run Transmission with all security features enabled, the command is:
Whitelist seccomp filters are easy to build, yet they need lots of testing. The filters are not portable. For example, this filter built on Debian Wheezy will not work on Ubuntu 14.04. The exact list of syscalls depends on the kernel running the system, the version of the program and all the libraries the program links in.
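When moving a filter between systems, comparing the two syscall lists shows quickly where they diverge. The following is a small sketch. Both lists are hypothetical examples and do not reflect real results from any distribution.

```python
def compare_filters(first, second):
    """Return (only_in_first, only_in_second) for two comma-separated lists."""
    a = {name for name in first.split(",") if name}
    b = {name for name in second.split(",") if name}
    return sorted(a - b), sorted(b - a)


# Hypothetical lists built on two different systems.
SYSTEM_A = "poll,select,nanosleep,futex,time"
SYSTEM_B = "poll,select,nanosleep,futex,epoll_wait"

only_a, only_b = compare_filters(SYSTEM_A, SYSTEM_B)
print("only on system A:", ",".join(only_a))
print("only on system B:", ",".join(only_b))
```

Any syscall that appears on only one side is a candidate for breaking the program when the filter is carried over unchanged.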