From: Alban Crequy <alban@kinvolk.io>
Date: Sun, 30 Sep 2018 15:54:31 +0200
Subject: Re: [PATCH 0/3] namei: implement various scoping AT_* flags
To: cyphar@cyphar.com
Cc: jlayton@kernel.org, bfields@fieldses.org, Alexander Viro,
 arnd@arndb.de, shuah@kernel.org, dhowells@redhat.com, luto@kernel.org,
 christian@brauner.io, "Eric W . Biederman", tycho@tycho.ws, LKML,
 linux-fsdevel, linux-arch@vger.kernel.org,
 linux-kselftest@vger.kernel.org, dev, Linux Containers,
 jsafrane@redhat.com, msau@google.com
In-Reply-To: <20180929103453.12025-1-cyphar@cyphar.com>
References: <20180929103453.12025-1-cyphar@cyphar.com>

On Sat, Sep 29, 2018 at 12:35 PM Aleksa Sarai wrote:
>
> The need for some sort of control over VFS's path resolution (to avoid
> malicious paths resulting in inadvertent breakouts) has been a very
> long-standing desire of many userspace applications.
> This patchset is a revival of Al Viro's old AT_NO_JUMPS[1] patchset
> with a few additions.
>
> The most obvious change is that AT_NO_JUMPS has been split as discussed
> in the original thread, along with a further split of AT_NO_PROCLINKS
> which means that each individual property of AT_NO_JUMPS is now a
> separate flag:
>
>   * Path-based escapes from the starting-point using "/" or ".." are
>     blocked by AT_BENEATH.
>   * Mountpoint crossings are blocked by AT_XDEV.
>   * /proc/$pid/fd/$fd resolution is blocked by AT_NO_PROCLINKS (more
>     correctly it actually blocks any user of nd_jump_link() because it
>     allows out-of-VFS path resolution manipulation).
>
> AT_NO_JUMPS is now effectively (AT_BENEATH|AT_XDEV|AT_NO_PROCLINKS). At
> Linus' suggestion in the original thread, I've also implemented
> AT_NO_SYMLINKS which just denies _all_ symlink resolution (including
> "proclink" resolution). It seems quite useful to me.
> An additional improvement was made to AT_XDEV. The original AT_NO_JUMPS
> path didn't consider "/tmp/.." as a mountpoint crossing -- this patch
> blocks this as well (feel free to ask me to remove it if you feel this
> is not sane).
>
> Currently I've only enabled these for openat(2) and the stat(2) family.
> I would hope we could enable it for basically every *at(2) syscall --
> but many of them appear to not have a @flags argument and thus we'll
> need to add several new syscalls to do this. I'm more than happy to send
> those patches, but I'd prefer to know that this preliminary work is
> acceptable before doing a bunch of copy-paste to add new sets of *at(2)
> syscalls.

What do you think of an equivalent AT_NO_SYMLINKS flag for mount()?
I guess that would have made the fix for CVE-2017-1002101 in Kubernetes
easier to write:
https://kubernetes.io/blog/2018/04/04/fixing-subpath-volume-vulnerability/

> One additional feature I've implemented is AT_THIS_ROOT (I imagine this
> is probably going to be more contentious than the refresh of
> AT_NO_JUMPS, so I've included it in a separate patch). The patch itself
> describes my reasoning, but the shortened version of the premise is that
> container runtimes need to have a way to resolve paths within a
> potentially malicious rootfs. Container runtimes currently do this in
> userspace[2] which has implicit race conditions that are not resolvable
> in userspace (or use fork+exec+chroot and SCM_RIGHTS passing which is
> inefficient). AT_THIS_ROOT allows for per-call chroot-like semantics for
> path resolution, which would be invaluable for us -- and the
> implementation is basically identical to AT_BENEATH (except that we
> don't return errors when someone actually hits the root).
>
> I've added some selftests for this, but it's not clear to me whether
> they should live here or in xfstests (as far as I can tell there are no
> other VFS tests in selftests, while there are some tests that look like
> generic VFS tests in xfstests). If you'd prefer them to be included in
> xfstests, let me know.
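
As a rough illustration of the difference between AT_BENEATH and
AT_THIS_ROOT described above, here is a minimal userspace model in
Python. It is only a sketch and not the patch's implementation:
resolve() and EscapeError are hypothetical names, the model only looks
at the path string, and it assumes that a ".." which stays below the
starting point is allowed (the actual patch may be stricter). Symlinks,
mountpoints and concurrent renames are ignored, which is precisely the
part that cannot be handled safely in userspace.

```python
class EscapeError(Exception):
    """Raised when a path would leave the starting point (hypothetical name)."""


def resolve(path, mode):
    """Lexically resolve `path` relative to a starting directory.

    mode="beneath":   model of AT_BENEATH, escaping is an error.
    mode="this_root": model of AT_THIS_ROOT, the starting directory
                      acts as "/" and ".." at the root stays at the root.

    Assumption: purely string-based; no symlinks, mounts or races.
    """
    if mode not in ("beneath", "this_root"):
        raise ValueError("unknown mode: %r" % (mode,))

    # An absolute path is an escape for "beneath"; for "this_root" the
    # leading "/" simply refers to the starting directory itself.
    if path.startswith("/") and mode == "beneath":
        raise EscapeError("absolute path escapes the starting point")

    stack = []  # components below the starting directory
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if stack:
                stack.pop()
            elif mode == "beneath":
                raise EscapeError("'..' escapes the starting point")
            # this_root: ".." at the root is a no-op, like in a chroot
            continue
        stack.append(part)

    return "/".join(stack) or "."
```

With this model, resolve("a/../../etc/passwd", "this_root") gives
"etc/passwd", while the same path in "beneath" mode raises EscapeError.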
>
> [1]: https://lore.kernel.org/patchwork/patch/784221/
> [2]: https://github.com/cyphar/filepath-securejoin
>
> Aleksa Sarai (3):
>   namei: implement O_BENEATH-style AT_* flags
>   namei: implement AT_THIS_ROOT chroot-like path resolution
>   selftests: vfs: add AT_* path resolution tests
>
>  fs/fcntl.c                                   |   2 +-
>  fs/namei.c                                   | 158 ++++++++++++------
>  fs/open.c                                    |  10 ++
>  fs/stat.c                                    |  15 +-
>  include/linux/fcntl.h                        |   3 +-
>  include/linux/namei.h                        |   8 +
>  include/uapi/asm-generic/fcntl.h             |  20 +++
>  include/uapi/linux/fcntl.h                   |  10 ++
>  tools/testing/selftests/Makefile             |   1 +
>  tools/testing/selftests/vfs/.gitignore       |   1 +
>  tools/testing/selftests/vfs/Makefile         |  13 ++
>  tools/testing/selftests/vfs/at_flags.h       |  40 +++++
>  tools/testing/selftests/vfs/common.sh        |  37 ++++
>  .../selftests/vfs/tests/0001_at_beneath.sh   |  72 ++++++++
>  .../selftests/vfs/tests/0002_at_xdev.sh      |  54 ++++++
>  .../vfs/tests/0003_at_no_proclinks.sh        |  50 ++++++
>  .../vfs/tests/0004_at_no_symlinks.sh         |  49 ++++++
>  .../selftests/vfs/tests/0005_at_this_root.sh |  66 ++++++++
>  tools/testing/selftests/vfs/vfs_helper.c     | 154 +++++++++++++++++
>  19 files changed, 707 insertions(+), 56 deletions(-)
>  create mode 100644 tools/testing/selftests/vfs/.gitignore
>  create mode 100644 tools/testing/selftests/vfs/Makefile
>  create mode 100644 tools/testing/selftests/vfs/at_flags.h
>  create mode 100644 tools/testing/selftests/vfs/common.sh
>  create mode 100755 tools/testing/selftests/vfs/tests/0001_at_beneath.sh
>  create mode 100755 tools/testing/selftests/vfs/tests/0002_at_xdev.sh
>  create mode 100755 tools/testing/selftests/vfs/tests/0003_at_no_proclinks.sh
>  create mode 100755 tools/testing/selftests/vfs/tests/0004_at_no_symlinks.sh
>  create mode 100755 tools/testing/selftests/vfs/tests/0005_at_this_root.sh
>  create mode 100644 tools/testing/selftests/vfs/vfs_helper.c
>
> --
> 2.19.0