Traditional UNIX implementations distinguish between two categories of processes: privileged and unprivileged. Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the effective user and group IDs (UID/GID) and the supplementary group list.
With the introduction of capabilities in Linux kernel 2.2, this has changed. Capabilities (POSIX 1003.1e) are designed to split the root privilege into a set of distinct privileges that can be independently enabled or disabled. They are used to restrict what a process running as root can do in the system. For instance, it is possible to deny filesystem mount operations, deny kernel module loading, prevent packet spoofing by denying access to raw sockets, or deny altering attributes in the file system.
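To make the idea of independent privileges concrete, here is a minimal sketch that tests single bits of a capability mask. It assumes the hexadecimal format the kernel prints in the CapEff and CapBnd fields of /proc/<pid>/status and the capability numbers from linux/capability.h. The helper name `has_cap` and the sample mask are illustrative, not part of Firejail.

```shell
# has_cap MASK BIT: succeed if capability number BIT is set in hex MASK.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Capability numbers as defined in linux/capability.h
CAP_NET_BIND_SERVICE=10
CAP_NET_RAW=13
CAP_SYS_MODULE=16
CAP_SYS_ADMIN=21

# Illustrative mask with only bits 0, 6, 7 and 10 set
# (CAP_CHOWN, CAP_SETGID, CAP_SETUID, CAP_NET_BIND_SERVICE).
mask=00000000000004c1

for bit in $CAP_NET_BIND_SERVICE $CAP_NET_RAW $CAP_SYS_MODULE $CAP_SYS_ADMIN; do
  if has_cap "$mask" "$bit"; then
    echo "capability $bit: allowed"
  else
    echo "capability $bit: denied"
  fi
done
```

With this mask, a process could still bind to low ports but could not open raw sockets, load kernel modules, or mount filesystems.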
In this article we describe the Linux capabilities feature of the Firejail security sandbox. Firejail allows the user to start programs with a specified set of capabilities. The set is applied to all processes running inside the sandbox, restricting what those processes can do and somewhat reducing the attack surface of the kernel.
Building a Whitelist Capabilities Set
We start with a simple nginx web server example, and we use the --caps.keep option to configure the allowed set of capabilities for the server processes. The set is expressed as a comma-separated list of names. For a list of all capabilities available on your system, run man 7 capabilities or firejail --debug-caps:
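The two commands are shown as comments in the sketch below, since their output depends on the installed kernel and Firejail version. The helper `cap_count` is a hypothetical addition: it assumes the kernel exposes the highest capability number in /proc/sys/kernel/cap_last_cap and derives the number of capabilities from it.

```shell
# Commands named in the text (run them on a system with Firejail installed):
#   man 7 capabilities
#   firejail --debug-caps

# cap_count [FILE]: print how many capabilities the kernel knows about.
# FILE holds the highest capability number; numbering starts at 0.
cap_count() {
  last=$(cat "${1:-/proc/sys/kernel/cap_last_cap}") || return 1
  echo $(( last + 1 ))
}

# Only query the kernel if the file is available on this system.
if [ -r /proc/sys/kernel/cap_last_cap ]; then
  cap_count
fi
```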
For our nginx server we can tell right off the bat that we need at least the following: CAP_SETUID, CAP_SETGID and CAP_NET_BIND_SERVICE. We need the first two because the server changes the user and group IDs of the worker processes from root to a generic unprivileged user. We need CAP_NET_BIND_SERVICE to bind to TCP port 80. We start the sandbox with this whitelist, and add more capabilities as required. First try:
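A minimal sketch of that first attempt follows. It assumes Firejail accepts lowercase capability names without the CAP_ prefix in --caps.keep. The helper `caps_keep_opt` is hypothetical, and the server start command is left as a placeholder because it depends on the distribution. The script prints the command line instead of running it.

```shell
# Join capability names into the comma-separated value for --caps.keep.
# The function body runs in a subshell so the IFS change stays local.
caps_keep_opt() (
  IFS=,
  printf '%s%s\n' '--caps.keep=' "$*"
)

# First-try whitelist from the text: drop root privileges, bind to port 80.
opt=$(caps_keep_opt setuid setgid net_bind_service)

# Dry run: show the command; replace the placeholder and run it as root.
echo "firejail $opt <server start command>"
```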
The server tries to change the ownership of /var/lib/nginx/body, and failing to do so, shuts down. We need to add CAP_CHOWN to our whitelist:
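The refinement step can be sketched as a small helper that appends a capability to an existing list unless it is already there. The name `add_cap` is hypothetical, and the server start command remains a placeholder.

```shell
# add_cap LIST NAME: print LIST with NAME appended, skipping duplicates.
add_cap() {
  case ",$1," in
    *",$2,"*) printf '%s\n' "$1" ;;      # already whitelisted
    *)        printf '%s\n' "$1,$2" ;;   # append the new capability
  esac
}

caps=setuid,setgid,net_bind_service
caps=$(add_cap "$caps" chown)

# Dry run: show the second attempt.
echo "firejail --caps.keep=$caps <server start command>"
```

Repeating this loop (start the server, read the error, add the missing capability) is how the whitelist converges.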
With this modification, the server is running. In the same way, we can build a capabilities list for any of the servers we use every day.
These are whitelist examples for some common servers. For increased security, we also enable the default seccomp filter:
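As a hedged illustration of the pattern, the sketch below combines --seccomp with the nginx whitelist derived above, the only set this text works out step by step. The wrapper `sandbox_cmd` is hypothetical and only prints the command line; the start command is a placeholder.

```shell
# sandbox_cmd CAPS COMMAND...: print a Firejail command line that enables
# the default seccomp filter and keeps only the listed capabilities.
sandbox_cmd() {
  caps=$1
  shift
  echo "firejail --seccomp --caps.keep=$caps $*"
}

sandbox_cmd chown,net_bind_service,setgid,setuid '<nginx start command>'
```

Whitelists for other servers would be built with the same trial-and-error procedure rather than copied from this sketch.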
Notice how the ISC DHCP server doesn’t require CAP_SETUID and CAP_SETGID, because it doesn’t drop root privileges. The server runs strictly as root. In this case, capabilities and/or seccomp are the only ways to restrict the server.
No capabilities are needed for running unprivileged user programs. Full permission checking is in effect inside the kernel for these processes. However, an attacker who gains control of a user process can raise its privileges by running setuid (SUID) programs or by exploiting the kernel directly. Once the process becomes root, it has all the capabilities available to the root user.
Firejail mitigates this case by dropping all capabilities from the inheritable capabilities set in profile files:
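A minimal sketch of such a profile, written to a temporary file so the example stays self-contained. It assumes the profile directive `caps.drop all` and the `--profile=` command-line option; the file location and the program name are placeholders.

```shell
# Write a minimal, hypothetical profile to a temporary file.
profile=$(mktemp)
cat > "$profile" <<'EOF'
# drop every capability for processes in the sandbox
caps.drop all
EOF

# Dry run: print the command that would load the profile.
echo "firejail --profile=$profile <program>"
```

The command-line equivalent for a one-off run would be `firejail --caps.drop=all <program>`.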
Linux capabilities are a simple yet very effective method to restrict processes running as root. The Firejail security sandbox can apply the same whitelist or blacklist filter to all processes in the sandbox.
Building whitelist filters is easy, and is usually based on the errors reported as the program starts. There are about 35 capabilities available in recent Linux kernels, and most servers need only a few of them. On kernels 3.5 or newer, capabilities can be used in conjunction with seccomp filters for increased security.