1  static int do_readlinkat(int dfd, const char __user *pathname, char __user *buf, int bufsiz)
2  {
3      ...
4      error = security_inode_readlink(path.dentry);
5      if (!error) {
6          touch_atime(&path);
7          error = vfs_readlink(path.dentry, buf, bufsiz);
8      }
9      ...
10 }
(a) Kernel LSM usage in system call readlinkat. vfs_readlink (Line 7) is protected by security_inode_readlink (Line 4). Both pathname and buf (Line 1 and Line 7) are user controllable.
1  int ksys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg)
2  {
3      ...
4      error = security_file_ioctl(f.file, cmd, arg);
5      if (!error)
6          error = do_vfs_ioctl(f.file, fd, cmd, arg);
7      ...
8  }
9
10 int xfs_readlink_by_handle(struct file *parfilp, xfs_fsop_handlereq_t *hreq)
11 {
12     ...
13     error = vfs_readlink(dentry, hreq->ohandle, olen);
14     ...
15 }
(b) Kernel LSM usage in system call ioctl. It calls security_file_ioctl (Line 4) to protect do_vfs_ioctl (Line 6). hreq->ohandle and olen are also user controllable.
Figure 2: LSM check errors discovered by PeX.
only CAP_SYS_RAWIO, resulting in a missing permission check for CAP_SYS_ADMIN. In particular, PeX detects this bug as an inconsistent permission check because the two paths disagree with each other, and further investigation shows that one is redundant and the other is missing.
3.2 LSM Permission Check Errors
The example of an LSM permission check error concerns how LSM hooks are instrumented for two different system calls, readlinkat and ioctl.
Figure 2a shows the LSM usage in the readlinkat system call. On its call path, vfs_readlink (Line 7) is protected by the LSM hook security_inode_readlink (Line 4) so that an LSM-based MAC mechanism, such as SELinux or AppArmor, can be realized to allow or deny the vfs_readlink operation.
Figure 2b presents two sub-functions for the system call ioctl. Similar to the above case, ioctl calls ksys_ioctl, which includes its own LSM hook security_file_ioctl (Line 4) before do_vfs_ioctl (Line 6). This is a proper design, and there is no problem so far. However, it turns out that there is a path from do_vfs_ioctl to xfs_readlink_by_handle (Line 10), which eventually calls the same privileged function vfs_readlink (see Line 7 in Figure 2a and Line 13 in Figure 2b). While this function is protected by the security_inode_readlink LSM hook in readlinkat, that is not the case for the path to the function going through xfs_readlink_by_handle. The problem is that SELinux maintains separate ‘allow’ rules for read and ioctl. With the missing LSM security_inode_readlink check, a user with only
1 struct file_operations {
2     ...
3     ssize_t (*read_iter) (struct kiocb *, struct iov_iter *);
4     ssize_t (*write_iter) (struct kiocb *, struct iov_iter *);
5     ...
6 }
(a) The Virtual File System (VFS) kernel interface.
User space:
    write(fd, buffer, count)
    syscall(1, fd, buffer, count)
Kernel space:
    syscall dispatcher
    SyS_write(fd, buffer, count)
    vfs_write(fd.file, buffer, count, fd.pos)
    file->f_op->write_iter(kio, iter);

const struct file_operations ext4_file_operations
{
    ...
    .read_iter = ext4_file_read_iter,
    .write_iter = ext4_file_write_iter,
    ...
}

const struct file_operations nfs_file_operations
{
    ...
    .read_iter = nfs_file_read,
    .write_iter = nfs_file_write,
    ...
}
(b) VFS indirect calls in Linux kernel.
Figure 3: Indirect call examples via the VFS kernel interface.
the ‘ioctl allow rule’ may exploit the ioctl system call to trigger the vfs_readlink operation, which should only be permitted by the different ‘read allow rule’.
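The two call paths can be mimicked in a small user-space model, which may help make the gap concrete. The following is only an illustrative sketch, not kernel code: struct policy, readlink_calls, and every model_* function are hypothetical stand-ins introduced here, and each boolean flag merely approximates one SELinux-style ‘allow’ rule.

```c
#include <stdbool.h>

/* Hypothetical policy: each flag stands in for one SELinux-style allow rule. */
struct policy {
    bool allow_read;   /* would permit readlink-style operations */
    bool allow_ioctl;  /* would permit ioctl operations */
};

/* Counts how many times the modeled privileged function actually ran. */
static int readlink_calls;

/* Stand-in for vfs_readlink: the privileged function performs no check itself. */
static int model_vfs_readlink(void)
{
    readlink_calls++;
    return 0;
}

/* Stand-ins for the two LSM hooks: return 0 to allow, -1 to deny. */
static int model_security_inode_readlink(const struct policy *p)
{
    return p->allow_read ? 0 : -1;
}

static int model_security_file_ioctl(const struct policy *p)
{
    return p->allow_ioctl ? 0 : -1;
}

/* Path 1 (as in Figure 2a): the readlink hook guards the privileged function. */
static int model_readlinkat(const struct policy *p)
{
    int error = model_security_inode_readlink(p);
    if (!error)
        error = model_vfs_readlink();
    return error;
}

/* Stand-in for xfs_readlink_by_handle: reaches the privileged function
 * without consulting the readlink hook. */
static int model_readlink_by_handle(void)
{
    return model_vfs_readlink();
}

/* Path 2 (as in Figure 2b): only the ioctl hook is consulted on the way down. */
static int model_ioctl(const struct policy *p)
{
    int error = model_security_file_ioctl(p);
    if (!error)
        error = model_readlink_by_handle();
    return error;
}
```

In this model, a policy with allow_ioctl set and allow_read cleared is denied by model_readlinkat but still reaches model_vfs_readlink through model_ioctl, which mirrors the missing-check scenario described above.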
The above two Capability and LSM examples show how challenging it is to ensure correct permission checks. Kernel developers have no tools to rely on to determine whether a particular function should be protected by a permission check and, if so, which permission checks should be used.
4 Challenges
This section discusses two critical challenges in designing a static analysis for detecting permission errors in the Linux kernel.
4.1 Indirect Call Analysis in Kernel
The first challenge lies in the frequent use of indirect calls in the Linux kernel and the difficulty of statically analyzing them in a scalable and precise manner. To achieve a modular design, the kernel provides a diverse set of abstraction layers that specify common interfaces to different concrete implementations. For example, the Virtual File System (VFS) [12] abstracts a file system, thereby providing a unified and transparent way to access local (e.g., ext4) and network (e.g., nfs) storage devices. Under this kernel programming paradigm, an abstraction layer defines an interface as a set of indirect function pointers, while a concrete module initializes these pointers with its own implementations. For example, as shown
in Figure 3a, VFS abstracts all file system operations in a ker-