Seccomp Guide

Introduction

Seccomp stands for secure computing mode. Its filter mode, seccomp-bpf, is a simple yet effective sandboxing tool introduced in Linux kernel 3.5. It lets the user attach a system call filter to a process and all its descendants, thus reducing the attack surface of the kernel. The filters are expressed in Berkeley Packet Filter (BPF) format.

In this article we build a whitelist seccomp filter and attach it to a user program using the Firejail sandbox. Throughout the article we use the Transmission BitTorrent client as an example.

We start by extracting a list of the syscalls the program uses, build the filter and run the program in Firejail. As new syscalls are discovered during testing, the filter is updated. When everything looks fine, we integrate the filter into a security profile suitable for Firejail. The sections below walk through these steps in order.

Syscalls

Linux has several tools for listing syscalls. The easiest one to use seems to be strace (apt-get install strace). We start transmission-gtk under strace with the -qcf options: -q (quiet), -c (count the calls per syscall) and -f (follow child processes).

$ strace -qcf transmission-gtk

We play for about 5 minutes with the program, go through some menus, start and stop a download etc.

transmission-gtk BitTorrent client

As we close the program, strace prints the syscall list on the terminal:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.93    3.095527         247     12512           poll
 19.64    1.416000        2975       476           select
 13.65    0.984000        3046       323           nanosleep
 12.09    0.871552         389      2239       330 futex
 11.47    0.827229          77     10680           epoll_wait
  0.08    0.005779          66        88           fadvise64
  0.06    0.004253           4      1043       193 read
  0.06    0.004000           3      1529         3 lstat
  0.00    0.000344           0      2254      1761 stat
[...]
  0.00    0.000000           0         1           fallocate
  0.00    0.000000           0        24           eventfd2
  0.00    0.000000           0         1           inotify_init1
------ ----------- ----------- --------- --------- ----------------
100.00    7.210150                 95061     23256 total

Firejail

We paste the strace output into a text editor and clean it up, keeping only the last column. The result is a comma-separated list without any blanks, something like:

poll,select,nanosleep,futex,epoll_wait,fadvise64,read,lstat,stat,[...]
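
The manual cleanup can also be scripted. What follows is a minimal sketch, not part of strace or Firejail: the function name syscalls_from_strace is made up for this example, and it assumes the summary has the layout shown above, with a numeric first column and the syscall name in the last column.

```shell
# Hypothetical helper (not part of strace or Firejail): read an
# "strace -c" summary on stdin and print the syscall names as a
# comma-separated list without blanks.
syscalls_from_strace() {
    awk '
        # Data rows start with the numeric "% time" column. This skips
        # the header, the dashed rulers and any unrelated output.
        # The final "total" row is numeric too, so drop it by name.
        $1 ~ /^[0-9]+\.[0-9]+$/ && $NF != "total" {
            printf "%s%s", sep, $NF
            sep = ","
        }
        END { print "" }
    '
}
```

Assuming the summary was saved to a file, for instance with strace's -o option (strace -qcf -o transmission.strace transmission-gtk, where the file name is arbitrary), the list would be produced with: syscalls_from_strace < transmission.strace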

We start Firejail with the --seccomp.keep option, and with --shell=none to run the program directly, without the extra syscalls required by a shell:

$ firejail --shell=none --seccomp.keep=poll,select,[...] transmission-gtk

It looks ugly at this point: a kilometer-long command line that doesn’t even work. For some reason strace missed a few syscalls. Time to bring in the system logger.

Syslog

If we get errors in the terminal, we just add the missing syscall to the list and try again. This is not always the case, though: most of the time the Linux kernel just kills the process and sends an audit message to syslog. For this reason, we keep another terminal open to monitor syslog:

$ sudo tail -f /var/log/syslog

[Screenshot: seccomp audit message in syslog]

The log entry tells us exactly which system call number got the program killed, syscall=201 in our case. To associate the number with a name, we use firejail as follows:

$ firejail --debug-syscalls | grep 201
201 - time
$
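
A plain grep 201 also matches any line that merely contains those digits, and picking the number out of the log by eye gets tedious. Both lookups can be tightened with small helpers. This is only a sketch under assumptions: the function names are invented for this example, the log lines are assumed to carry a syscall=NUMBER field as in the entry discussed above, and the table is assumed to have the "number - name" layout printed above.

```shell
# Hypothetical helpers, not part of Firejail.

# Read log lines on stdin and print every distinct syscall number found
# in a "syscall=NUMBER" field, one per line, in numeric order.
blocked_syscall_numbers() {
    sed -n 's/.*syscall=\([0-9][0-9]*\).*/\1/p' | sort -n -u
}

# Read a "NUMBER - NAME" table on stdin (the layout shown in the
# "firejail --debug-syscalls" example) and print the name whose
# number equals $1 exactly.
syscall_name() {
    awk -v n="$1" '$1 == n { print $NF }'
}
```

Possible usage: sudo tail -n 200 /var/log/syslog | blocked_syscall_numbers, followed by firejail --debug-syscalls | syscall_name 201 for each number reported.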

We keep on adding syscalls to the list as they are reported and try again. To get Transmission working we ended up adding pwrite64,time,exit,exit_group on top of what strace reported – not too bad!
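
Re-typing the growing list by hand invites duplicates and typos. As a sketch, a small helper (the name merge_syscalls is hypothetical) can append the newly reported syscalls to the existing list, keeping the original order and skipping names that are already present.

```shell
# Hypothetical helper: merge any number of comma-separated syscall
# lists into one, keeping first-seen order and dropping duplicates.
merge_syscalls() {
    printf '%s\n' "$@" | tr ',' '\n' |
        awk '
            # NF skips empty entries; seen[] remembers printed names.
            NF && !seen[$0]++ {
                printf "%s%s", sep, $0
                sep = ","
            }
            END { print "" }
        '
}
```

For example, merge_syscalls "poll,select,nanosleep" "pwrite64,time,exit,exit_group" prints poll,select,nanosleep,pwrite64,time,exit,exit_group.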

Security profiles

Firejail installs security profiles for several popular programs in the /etc/firejail directory. The profiles define a manicured filesystem with most directories mounted read-only, and with several files and directories in $HOME blanked, mainly those holding passwords and encryption keys.

The Transmission BitTorrent client is supported, and its profile also defines a default seccomp blacklist filter. We want to upgrade this filter to the whitelist filter we have just built. To do so, we go into the ~/.config/firejail directory and copy the default Transmission profile there:

$ cd ~/.config/firejail
$ cp /etc/firejail/transmission-gtk.profile .
$ vim transmission-gtk.profile

We add a “shell none” line, and we replace “seccomp” with “seccomp.keep poll,select,nanosleep,futex,epoll_wait,fadvise64,[…]”. The result looks like this:

$ cd ~/.config/firejail
$ cat transmission-gtk.profile
# transmission-gtk profile
include /etc/firejail/disable-mgmt.inc
include /etc/firejail/disable-secret.inc
include /etc/firejail/disable-common.inc
include /etc/firejail/disable-devel.inc
blacklist ${HOME}/.pki/nssdb
blacklist ${HOME}/.lastpass
blacklist ${HOME}/.keepassx
blacklist ${HOME}/.password-store
blacklist ${HOME}/.wine
caps.drop all
protocol unix,inet,inet6
netfilter
noroot
tracelog
shell none
seccomp.keep poll,select,nanosleep,futex,epoll_wait,fadvise64,read,lstat,stat,epoll_ctl,sendto,readv,recvfrom,ioctl,write,inotify_add_watch,writev,socket,getdents,mprotect,mmap,open,close,fstat,lseek,munmap,brk,rt_sigaction,rt_sigprocmask,access,pipe,madvise,connect,sendmsg,recvmsg,bind,listen,getsockname,getpeername,socketpair,setsockopt,getsockopt,clone,execve,uname,fcntl,ftruncate,rename,mkdir,rmdir,unlink,readlink,umask,getrlimit,getrusage,times,getuid,getgid,geteuid,getegid,getresuid,getresgid,statfs,fstatfs,prctl,arch_prctl,epoll_create,set_tid_address,clock_getres,inotify_rm_watch,set_robust_list,fallocate,eventfd2,inotify_init1,pwrite64,time,exit,exit_group
$
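
The same edit can be scripted instead of done in vim. The sketch below is an illustration under stated assumptions: the function name use_whitelist is invented, and the default profile is assumed to contain a line consisting of the single word seccomp, as described above.

```shell
# Hypothetical helper: read a Firejail profile on stdin and write it to
# stdout with the bare "seccomp" line replaced by "shell none" plus a
# "seccomp.keep" line built from the comma-separated list in $1.
use_whitelist() {
    awk -v keep="$1" '
        $0 == "seccomp" {
            print "shell none"
            print "seccomp.keep " keep
            next
        }
        { print }
    '
}
```

With the syscall list held in a shell variable (SYSCALLS here is an arbitrary name), a possible invocation is: use_whitelist "$SYSCALLS" < /etc/firejail/transmission-gtk.profile > ~/.config/firejail/transmission-gtk.profile. All other lines of the profile pass through unchanged.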

The “caps.drop all” command in the security profile above drops all capabilities. Linux capabilities split the privileges of the root user into separate units that are checked deep inside the kernel. Like seccomp, dropping them limits what a compromised process can do.

The command “protocol unix,inet,inet6” is the protocol filter. Only UNIX socket, IPv4 and IPv6 protocols are allowed. The protocol filter is also built as a seccomp filter.

Between seccomp, capabilities and the protocol filter, more than half of the kernel code is no longer reachable from the sandboxed program.

Firejail chooses the profile automatically, based on the name of the executable. To run Transmission with all security features enabled, the command is:

$ firejail transmission-gtk

transmission-gtk started in Firejail using the profile file

Conclusion

Whitelist seccomp filters are easy to build, yet they need lots of testing. The filters are not portable: for example, this filter, built on Debian Wheezy, will not work on Ubuntu 14.04. The exact list of syscalls depends on the kernel running the system, the version of the program and all the libraries the program links in.
