* [PATCH v5 0/2] Control over userfaultfd kernel-fault handling
@ 2020-10-11 6:24 Lokesh Gidra
  2020-10-11 6:24 ` [PATCH v5 1/2] Add UFFD_USER_MODE_ONLY Lokesh Gidra
  2020-10-11 6:24 ` [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob Lokesh Gidra
  0 siblings, 2 replies; 6+ messages in thread
From: Lokesh Gidra @ 2020-10-11 6:24 UTC (permalink / raw)
  To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
	Sebastian Andrzej Siewior, Andrew Morton
  Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
	Daniel Colascione, Joel Fernandes (Google), linux-fsdevel,
	linux-kernel, linux-doc, kaleshsingh, calin, surenb, nnk, jeffv,
	kernel-team, Mike Rapoport, Shaohua Li, Jerome Glisse,
	Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
	Vlastimil Babka, Iurii Zaikin, Luis Chamberlain

This patch series is split from [1]. The other series enables SELinux
support for userfaultfd file descriptors so that its creation and
movement can be controlled.

It has been demonstrated on various occasions that suspending kernel
code execution for an arbitrary amount of time at any access to
userspace memory (copy_from_user()/copy_to_user()/...) can be exploited
to change the intended behavior of the kernel. For instance, handling
page faults in kernel-mode using userfaultfd has been exploited in [2, 3].
Likewise, FUSE, which is similar to userfaultfd in this respect, has been
exploited in [4, 5] for similar outcome.

This small patch series adds a new flag to userfaultfd(2) that allows
callers to give up the ability to handle kernel-mode faults with the
resulting UFFD file object. It then adds a 'user-mode only' option to
the unprivileged_userfaultfd sysctl knob to require unprivileged
callers to use this new flag.

The purpose of this new interface is to decrease the chance of an
unprivileged userfaultfd user taking advantage of userfaultfd to
enhance security vulnerabilities by lengthening the race window in
kernel code.
[1] https://lore.kernel.org/lkml/20200211225547.235083-1-dancol@google.com/
[2] https://duasynt.com/blog/linux-kernel-heap-spray
[3] https://duasynt.com/blog/cve-2016-6187-heap-off-by-one-exploit
[4] https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
[5] https://bugs.chromium.org/p/project-zero/issues/detail?id=808

Changes since v4:

  - Added warning when bailing out from handling kernel fault.

Changes since v3:

  - Modified the meaning of value '0' of unprivileged_userfaultfd
    sysctl knob. Setting this knob to '0' now allows unprivileged users
    to use userfaultfd, but can handle page faults in user-mode only.
  - The default value of unprivileged_userfaultfd sysctl knob is changed
    to '0'.

Changes since v2:

  - Removed 'uffd_flags' and directly used 'UFFD_USER_MODE_ONLY' in
    userfaultfd().

Changes since v1:

  - Added external references to the threats from allowing unprivileged
    users to handle page faults from kernel-mode.
  - Removed the new sysctl knob restricting handling of page
    faults from kernel-mode, and added an option for the same
    in the existing 'unprivileged_userfaultfd' knob.

Lokesh Gidra (2):
  Add UFFD_USER_MODE_ONLY
  Add user-mode only option to unprivileged_userfaultfd sysctl knob

 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        | 16 +++++++++++++---
 include/uapi/linux/userfaultfd.h        |  9 +++++++++
 3 files changed, 32 insertions(+), 8 deletions(-)

-- 
2.28.0.1011.ga647a8990f-goog

^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v5 1/2] Add UFFD_USER_MODE_ONLY
  2020-10-11 6:24 [PATCH v5 0/2] Control over userfaultfd kernel-fault handling Lokesh Gidra
@ 2020-10-11 6:24 ` Lokesh Gidra
  2020-10-24 2:08   ` Andrea Arcangeli
  2020-10-11 6:24 ` [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob Lokesh Gidra
  1 sibling, 1 reply; 6+ messages in thread
From: Lokesh Gidra @ 2020-10-11 6:24 UTC (permalink / raw)
  To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
	Sebastian Andrzej Siewior, Andrew Morton
  Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
	Daniel Colascione, Joel Fernandes (Google), linux-fsdevel,
	linux-kernel, linux-doc, kaleshsingh, calin, surenb, nnk, jeffv,
	kernel-team, Mike Rapoport, Shaohua Li, Jerome Glisse,
	Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
	Vlastimil Babka, Iurii Zaikin, Luis Chamberlain,
	Daniel Colascione

userfaultfd handles page faults from both user and kernel code.
Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
the resulting userfaultfd object refuse to handle faults from kernel
mode, treating these faults as if SIGBUS were always raised, causing
the kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione <dancol@google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
---
 fs/userfaultfd.c                 | 10 +++++++++-
 include/uapi/linux/userfaultfd.h |  9 +++++++++
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0e4a3837da52..bd229f06d4e9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -405,6 +405,13 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY) {
+		printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
+			"sysctl knob to 1 if kernel faults must be handled "
+			"without obtaining CAP_SYS_PTRACE capability\n");
+		goto out;
+	}
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -1975,10 +1982,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUG_ON(!current->mm);
 
 	/* Check the UFFD_* constants for consistency. */
+	BUILD_BUG_ON(UFFD_USER_MODE_ONLY & UFFD_SHARED_FCNTL_FLAGS);
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index e7e98bde221f..5f2d88212f7c 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -257,4 +257,13 @@ struct uffdio_writeprotect {
 	__u64 mode;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 1
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.28.0.1011.ga647a8990f-goog

^ permalink raw reply related [flat|nested] 6+ messages in thread
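
The two checks patch 1/2 touches can be tried out without building a kernel. What follows is only a userspace model in C, not kernel code: model_flags_valid() and model_fault_is_delivered() are hypothetical names, MODEL_SHARED_FCNTL_FLAGS is an assumed stand-in for the kernel's UFFD_SHARED_FCNTL_FLAGS, the fault_from_user parameter stands in for the kernel-internal FAULT_FLAG_USER bit, and the fallback value 1 for UFFD_USER_MODE_ONLY is copied from the uapi hunk in the patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* make sure <fcntl.h> exposes O_CLOEXEC */
#endif
#include <assert.h>           /* static_assert (C11) */
#include <fcntl.h>            /* O_CLOEXEC, O_NONBLOCK */
#include <stdbool.h>

/* Value copied from the uapi hunk of the patch; only a fallback for
 * toolchains whose headers predate the flag. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/* Assumption: models the kernel's UFFD_SHARED_FCNTL_FLAGS, which the
 * patch context shows to be built from O_CLOEXEC and O_NONBLOCK. */
#define MODEL_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)

/* Userspace analogue of the BUILD_BUG_ON() the patch adds: the new flag
 * must not collide with the fcntl-style flags. */
static_assert((UFFD_USER_MODE_ONLY & MODEL_SHARED_FCNTL_FLAGS) == 0,
	      "UFFD_USER_MODE_ONLY overlaps the shared fcntl flags");

/* Models the flag check in userfaultfd(2): any bit outside the accepted
 * set is what makes the syscall return -EINVAL. */
static bool model_flags_valid(int flags)
{
	return (flags & ~(MODEL_SHARED_FCNTL_FLAGS | UFFD_USER_MODE_ONLY)) == 0;
}

/* Models the early bail-out added to handle_userfault(): a fault raised
 * from kernel mode is not handed to a user-mode-only context. */
static bool model_fault_is_delivered(int ctx_flags, bool fault_from_user)
{
	if (!fault_from_user && (ctx_flags & UFFD_USER_MODE_ONLY))
		return false;
	return true;
}
```

In this model, a context created with UFFD_USER_MODE_ONLY still receives user-mode faults, while a kernel-mode fault on it is refused; a context created without the flag receives both.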
* Re: [PATCH v5 1/2] Add UFFD_USER_MODE_ONLY
  2020-10-11 6:24 ` [PATCH v5 1/2] Add UFFD_USER_MODE_ONLY Lokesh Gidra
@ 2020-10-24 2:08   ` Andrea Arcangeli
  0 siblings, 0 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2020-10-24 2:08 UTC (permalink / raw)
  To: Lokesh Gidra
  Cc: Kees Cook, Jonathan Corbet, Peter Xu, Sebastian Andrzej Siewior,
	Andrew Morton, Alexander Viro, Stephen Smalley, Eric Biggers,
	Daniel Colascione, Joel Fernandes (Google), linux-fsdevel,
	linux-kernel, linux-doc, kaleshsingh, calin, surenb, nnk, jeffv,
	kernel-team, Mike Rapoport, Shaohua Li, Jerome Glisse,
	Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
	Vlastimil Babka, Iurii Zaikin, Luis Chamberlain,
	Daniel Colascione

On Sat, Oct 10, 2020 at 11:24:55PM -0700, Lokesh Gidra wrote:
> userfaultfd handles page faults from both user and kernel code.
> Add a new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes
> the resulting userfaultfd object refuse to handle faults from kernel
> mode, treating these faults as if SIGBUS were always raised, causing
> the kernel code to fail with EFAULT.
>
> A future patch adds a knob allowing administrators to give some
> processes the ability to create userfaultfd file objects only if they
> pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
> will exploit userfaultfd's ability to delay kernel page faults to open
> timing windows for future exploits.
>
> Signed-off-by: Daniel Colascione <dancol@google.com>
> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

^ permalink raw reply [flat|nested] 6+ messages in thread
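
From the caller's side, the flag described in the commit message is just one more bit in the single argument of userfaultfd(2). The sketch below assumes a Linux toolchain whose <sys/syscall.h> defines SYS_userfaultfd; open_uffd_user_mode_only() is a hypothetical helper name, and the fallback value 1 for UFFD_USER_MODE_ONLY is taken from patch 1/2. Error handling is left to the caller.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* syscall(), O_CLOEXEC */
#endif
#include <errno.h>
#include <fcntl.h>            /* O_CLOEXEC, O_NONBLOCK */
#include <sys/syscall.h>      /* SYS_userfaultfd */
#include <unistd.h>           /* syscall(), close() */

/* Value taken from patch 1/2; only a fallback for older uapi headers. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Hypothetical helper: ask for a userfaultfd that gives up kernel-mode
 * fault handling. Returns a file descriptor, or -1 with errno set.
 * On a kernel without this series the flag is expected to be rejected
 * with EINVAL by the existing flag check.
 */
static int open_uffd_user_mode_only(void)
{
	return (int)syscall(SYS_userfaultfd,
			    O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
}
```

A caller that gets -1 back can inspect errno to tell an unsupported flag (EINVAL) apart from a permission problem (EPERM) before deciding whether to fall back.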
* [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
  2020-10-11 6:24 [PATCH v5 0/2] Control over userfaultfd kernel-fault handling Lokesh Gidra
  2020-10-11 6:24 ` [PATCH v5 1/2] Add UFFD_USER_MODE_ONLY Lokesh Gidra
@ 2020-10-11 6:24 ` Lokesh Gidra
  2020-10-24 2:48   ` Andrea Arcangeli
  1 sibling, 1 reply; 6+ messages in thread
From: Lokesh Gidra @ 2020-10-11 6:24 UTC (permalink / raw)
  To: Kees Cook, Jonathan Corbet, Peter Xu, Andrea Arcangeli,
	Sebastian Andrzej Siewior, Andrew Morton
  Cc: Alexander Viro, Stephen Smalley, Eric Biggers, Lokesh Gidra,
	Daniel Colascione, Joel Fernandes (Google), linux-fsdevel,
	linux-kernel, linux-doc, kaleshsingh, calin, surenb, nnk, jeffv,
	kernel-team, Mike Rapoport, Shaohua Li, Jerome Glisse,
	Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
	Vlastimil Babka, Iurii Zaikin, Luis Chamberlain

With this change, when the knob is set to 0, it allows unprivileged
users to call userfaultfd, like when it is set to 1, but with the
restriction that page faults from only user-mode can be handled.
In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
EPERM.

This enables administrators to reduce the likelihood that
an attacker with access to userfaultfd can delay faulting kernel
code to widen timing windows for other exploits.

The default value of this knob is changed to 0. This is required for
correct functioning of pipe mutex. However, this will fail postcopy
live migration, which will be unnoticeable to the VM guests. To avoid
this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf. For
more details, refer to Andrea's reply [1].

[1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/

Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 15 ++++++++++-----
 fs/userfaultfd.c                        |  6 ++++--
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 4b9d2e8e9142..4263d38c3c21 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -871,12 +871,17 @@ file-backed pages is less than the high watermark in a zone.
 unprivileged_userfaultfd
 ========================
 
-This flag controls whether unprivileged users can use the userfaultfd
-system calls. Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+This flag controls the mode in which unprivileged users can use the
+userfaultfd system calls. Set this to 0 to restrict unprivileged users
+to handle page faults in user mode only. In this case, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to
+succeed. Prohibiting use of userfaultfd for handling faults from kernel
+mode may make certain vulnerabilities more difficult to exploit.
 
-The default value is 1.
+Set this to 1 to allow unprivileged users to use the userfaultfd system
+calls without any restrictions.
+
+The default value is 0.
 
 
 user_reserve_kbytes
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index bd229f06d4e9..0f8a975db3be 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -28,7 +28,7 @@
 #include <linux/security.h>
 #include <linux/hugetlb.h>
 
-int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd __read_mostly;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1976,7 +1976,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (!sysctl_unprivileged_userfaultfd &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
+	    !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
-- 
2.28.0.1011.ga647a8990f-goog

^ permalink raw reply related [flat|nested] 6+ messages in thread
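
The creation-time check changed by patch 2/2 reduces to a small truth table over three inputs. The following is a userspace model of that condition only, assuming nothing beyond what the hunk shows: model_userfaultfd_permission() and model_selftest() are hypothetical names, has_cap_sys_ptrace stands in for the kernel's capable(CAP_SYS_PTRACE), and the fallback value 1 for UFFD_USER_MODE_ONLY comes from patch 1/2.

```c
#include <assert.h>           /* assert(), used by the self-test below */
#include <errno.h>            /* EPERM */
#include <stdbool.h>

/* Value taken from patch 1/2; only a fallback for older uapi headers. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * Models the permission check at the top of userfaultfd(2) after this
 * patch. Returns 0 if creation may proceed, -EPERM otherwise.
 *   sysctl_unprivileged: value of vm.unprivileged_userfaultfd (0 or 1)
 *   flags:               flags argument passed to the syscall
 *   has_cap_sys_ptrace:  stand-in for capable(CAP_SYS_PTRACE)
 */
static int model_userfaultfd_permission(int sysctl_unprivileged, int flags,
					bool has_cap_sys_ptrace)
{
	if (!sysctl_unprivileged &&
	    (flags & UFFD_USER_MODE_ONLY) == 0 &&
	    !has_cap_sys_ptrace)
		return -EPERM;
	return 0;
}

/* Walks the four interesting rows of the truth table. */
static void model_selftest(void)
{
	/* knob 0, no flag, no capability: refused */
	assert(model_userfaultfd_permission(0, 0, false) == -EPERM);
	/* knob 0, user-mode-only flag: allowed without the capability */
	assert(model_userfaultfd_permission(0, UFFD_USER_MODE_ONLY, false) == 0);
	/* knob 0, no flag, but CAP_SYS_PTRACE: allowed */
	assert(model_userfaultfd_permission(0, 0, true) == 0);
	/* knob 1: unrestricted, as before the patch */
	assert(model_userfaultfd_permission(1, 0, false) == 0);
}
```

The only refused combination is knob 0 with neither the flag nor the capability, which matches the EPERM case described in the commit message.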
* Re: [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
  2020-10-11 6:24 ` [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob Lokesh Gidra
@ 2020-10-24 2:48   ` Andrea Arcangeli
  2020-10-24 4:08     ` Lokesh Gidra
  0 siblings, 1 reply; 6+ messages in thread
From: Andrea Arcangeli @ 2020-10-24 2:48 UTC (permalink / raw)
  To: Lokesh Gidra
  Cc: Kees Cook, Jonathan Corbet, Peter Xu, Sebastian Andrzej Siewior,
	Andrew Morton, Alexander Viro, Stephen Smalley, Eric Biggers,
	Daniel Colascione, Joel Fernandes (Google), linux-fsdevel,
	linux-kernel, linux-doc, kaleshsingh, calin, surenb, nnk, jeffv,
	kernel-team, Mike Rapoport, Shaohua Li, Jerome Glisse,
	Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
	Vlastimil Babka, Iurii Zaikin, Luis Chamberlain

Hello everyone,

On Sat, Oct 10, 2020 at 11:24:56PM -0700, Lokesh Gidra wrote:
> With this change, when the knob is set to 0, it allows unprivileged
> users to call userfaultfd, like when it is set to 1, but with the
> restriction that page faults from only user-mode can be handled.
> In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
> must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
> EPERM.
>
> This enables administrators to reduce the likelihood that
> an attacker with access to userfaultfd can delay faulting kernel
> code to widen timing windows for other exploits.
>
> The default value of this knob is changed to 0. This is required for
> correct functioning of pipe mutex. However, this will fail postcopy
> live migration, which will be unnoticeable to the VM guests. To avoid
> this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf. For
> more details, refer to Andrea's reply [1].
>
> [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/
>
> Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>

Nobody commented so it seems everyone is on board with this change to
synchronize the kernel default with the post-boot Android default.

The email in the link above was pretty long, so the below would be a
summary that could be added to the commit header:

==

The main reason this change is desirable as in the short term is that
the Android userland will behave as with the sysctl set to zero. So
without this commit, any Linux binary using userfaultfd to manage its
memory would behave differently if run within the Android userland.

==

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

BTW, this is still a minor nitpick, but a printk_once of the 1/2 could
be added before the return -EPERM too, that's actually what I meant
when I suggested to add a printk_once :), however the printk_once you
added can turn out to be useful too for devs converting code to use
bounce buffers, so it's fine too, just it could go under DEBUG_VM and
to be ratelimited (similarly to the "FAULT_FLAG_ALLOW_RETRY missing
%x\n" printk).

Thanks,
Andrea

^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v5 2/2] Add user-mode only option to unprivileged_userfaultfd sysctl knob
  2020-10-24 2:48   ` Andrea Arcangeli
@ 2020-10-24 4:08     ` Lokesh Gidra
  0 siblings, 0 replies; 6+ messages in thread
From: Lokesh Gidra @ 2020-10-24 4:08 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Kees Cook, Jonathan Corbet, Peter Xu, Sebastian Andrzej Siewior,
	Andrew Morton, Alexander Viro, Stephen Smalley, Eric Biggers,
	Daniel Colascione, Joel Fernandes (Google), Linux FS Devel,
	linux-kernel, linux-doc, Kalesh Singh, Calin Juravle,
	Suren Baghdasaryan, Nick Kralevich, Jeffrey Vander Stoep,
	Cc: Android Kernel, Mike Rapoport, Shaohua Li, Jerome Glisse,
	Mauro Carvalho Chehab, Johannes Weiner, Mel Gorman, Nitin Gupta,
	Vlastimil Babka, Iurii Zaikin, Luis Chamberlain

On Fri, Oct 23, 2020 at 7:48 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> Hello everyone,
>
> On Sat, Oct 10, 2020 at 11:24:56PM -0700, Lokesh Gidra wrote:
> > With this change, when the knob is set to 0, it allows unprivileged
> > users to call userfaultfd, like when it is set to 1, but with the
> > restriction that page faults from only user-mode can be handled.
> > In this mode, an unprivileged user (without CAP_SYS_PTRACE capability)
> > must pass UFFD_USER_MODE_ONLY to userfaultfd or the API will fail with
> > EPERM.
> >
> > This enables administrators to reduce the likelihood that
> > an attacker with access to userfaultfd can delay faulting kernel
> > code to widen timing windows for other exploits.
> >
> > The default value of this knob is changed to 0. This is required for
> > correct functioning of pipe mutex. However, this will fail postcopy
> > live migration, which will be unnoticeable to the VM guests. To avoid
> > this, set 'vm.unprivileged_userfaultfd = 1' in /etc/sysctl.conf. For
> > more details, refer to Andrea's reply [1].
> >
> > [1] https://lore.kernel.org/lkml/20200904033438.GI9411@redhat.com/
> >
> > Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
>
> Nobody commented so it seems everyone is on board with this change to
> synchronize the kernel default with the post-boot Android default.
>
> The email in the link above was pretty long, so the below would be a
> summary that could be added to the commit header:
>
> ==
>
> The main reason this change is desirable as in the short term is that
> the Android userland will behave as with the sysctl set to zero. So
> without this commit, any Linux binary using userfaultfd to manage its
> memory would behave differently if run within the Android userland.
>
> ==

Sure. I'll add it in the next revision.
>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
>
Thanks so much for the review. I hope it's ok to add your
'reviewed-by' in the next revision?
>
> BTW, this is still a minor nitpick, but a printk_once of the 1/2 could
> be added before the return -EPERM too, that's actually what I meant
> when I suggested to add a printk_once :), however the printk_once you
> added can turn out to be useful too for devs converting code to use
> bounce buffers, so it's fine too, just it could go under DEBUG_VM and
> to be ratelimited (similarly to the "FAULT_FLAG_ALLOW_RETRY missing
> %x\n" printk).

I'll move the printk_once from 1/2 to this patch, as you suggested.
>
> Thanks,
> Andrea
>

^ permalink raw reply [flat|nested] 6+ messages in thread