linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
@ 2019-03-11  9:36 Peter Xu
  2019-03-11  9:36 ` [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd Peter Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-11  9:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paolo Bonzini, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner, peterx,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Mike Kravetz, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

Hi,

(The idea comes from Andrea, and following discussions with Mike and
 other people)

This patchset introduces a new sysctl flag to allow the admin to
forbid users from using userfaultfd:

  $ cat /proc/sys/vm/unprivileged_userfaultfd
  [disabled] enabled kvm

  - When set to "disabled", all unprivileged users are forbidden to
    use userfaultfd syscalls.

  - When set to "enabled", all users are allowed to use userfaultfd
    syscalls.

  - When set to "kvm", all unprivileged users are forbidden to use the
    userfaultfd syscalls, except the user who has permission to open
    /dev/kvm.

This new flag can add one more layer of security to reduce the attack
surface of the kernel by abusing userfaultfd.  Here we grant the
thread userfaultfd permission by checking against CAP_SYS_PTRACE
capability.  By default, the value is "disabled" which is the most
strict policy.  Distributions can have their own perferred value.

The "kvm" entry is a bit special here only to make sure that existing
users like QEMU/KVM won't break by this newly introduced flag.  What
we need to do is simply set the "unprivileged_userfaultfd" flag to
"kvm" here to automatically grant userfaultfd permission for processes
like QEMU/KVM without extra code to tweak these flags in the admin
code.

Patch 1:  The interface patch to introduce the flag

Patch 2:  The KVM related changes to detect opening of /dev/kvm

Patch 3:  Apply the flag to userfaultfd syscalls

All comments would be greatly welcomed.  Thanks,

Peter Xu (3):
  userfaultfd/sysctl: introduce unprivileged_userfaultfd
  kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag
  userfaultfd: apply unprivileged_userfaultfd check

 fs/userfaultfd.c               | 121 +++++++++++++++++++++++++++++++++
 include/linux/sched/coredump.h |   1 +
 include/linux/userfaultfd_k.h  |   5 ++
 init/Kconfig                   |  11 +++
 kernel/sysctl.c                |  11 +++
 virt/kvm/kvm_main.c            |   7 ++
 6 files changed, 156 insertions(+)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd
  2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
@ 2019-03-11  9:36 ` Peter Xu
  2019-03-12  6:58   ` Mike Rapoport
  2019-03-11  9:37 ` [PATCH 2/3] kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag Peter Xu
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2019-03-11  9:36 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paolo Bonzini, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner, peterx,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Mike Kravetz, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

Introduce a new sysctl called "vm.unprivileged_userfaultfd" that can
be used to decide whether userfaultfd syscalls are allowed by
unprivileged users.  It'll allow three modes:

  - disabled: disallow unprivileged users to use uffd

  - enabled:  allow unprivileged users to use uffd

  - kvm:      allow unprivileged users to use uffd only if the user
              had enough permission to open /dev/kvm (this option only
              exists if the kernel turned on KVM).

This patch only introduce the new interface but not yet applied it to
the userfaultfd syscalls, which will be done in the follow up patch.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c              | 96 +++++++++++++++++++++++++++++++++++
 include/linux/userfaultfd_k.h |  5 ++
 init/Kconfig                  | 11 ++++
 kernel/sysctl.c               | 11 ++++
 4 files changed, 123 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 89800fc7dc9d..c2188464555a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -29,6 +29,8 @@
 #include <linux/ioctl.h>
 #include <linux/security.h>
 #include <linux/hugetlb.h>
+#include <linux/sysctl.h>
+#include <linux/string.h>
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -93,6 +95,95 @@ struct userfaultfd_wake_range {
 	unsigned long len;
 };
 
+enum unprivileged_userfaultfd {
+	/* Disallow unprivileged users to use userfaultfd syscalls */
+	UFFD_UNPRIV_DISABLED = 0,
+	/* Allow unprivileged users to use userfaultfd syscalls */
+	UFFD_UNPRIV_ENABLED,
+#if IS_ENABLED(CONFIG_KVM)
+	/*
+	 * Allow unprivileged users to use userfaultfd syscalls only
+	 * if the user had enough permission to open /dev/kvm
+	 */
+	UFFD_UNPRIV_KVM,
+#endif
+	UFFD_UNPRIV_NUM,
+};
+
+static int unprivileged_userfaultfd __read_mostly;
+static const char *unprivileged_userfaultfd_str[UFFD_UNPRIV_NUM] = {
+	"disabled", "enabled",
+#if IS_ENABLED(CONFIG_KVM)
+	"kvm",
+#endif
+};
+
+static int unprivileged_uffd_parse(char *buf, size_t size)
+{
+	int i;
+
+	for (i = 0; i < UFFD_UNPRIV_NUM; i++) {
+		if (!strncmp(unprivileged_userfaultfd_str[i], buf, size)) {
+			unprivileged_userfaultfd = i;
+			return 0;
+		}
+	}
+
+	return -EFAULT;
+}
+
+static void unprivileged_uffd_dump(char *buf, size_t size)
+{
+	int i;
+
+	*buf = 0x00;
+	for (i = 0; i < UFFD_UNPRIV_NUM; i++) {
+		if (i == unprivileged_userfaultfd)
+			strncat(buf, "[", size - strlen(buf));
+		strncat(buf, unprivileged_userfaultfd_str[i],
+			size - strlen(buf));
+		if (i == unprivileged_userfaultfd)
+			strncat(buf, "]", size - strlen(buf));
+		strncat(buf, " ", size - strlen(buf));
+	}
+
+}
+
+int proc_unprivileged_userfaultfd(struct ctl_table *table, int write,
+				  void __user *buffer, size_t *lenp,
+				  loff_t *ppos)
+{
+	struct ctl_table tmp_table = { .maxlen = 0 };
+	int ret;
+
+	if (write) {
+		tmp_table.maxlen = UFFD_UNPRIV_STRLEN;
+		tmp_table.data = kmalloc(UFFD_UNPRIV_STRLEN, GFP_KERNEL);
+
+		ret = proc_dostring(&tmp_table, write, buffer, lenp, ppos);
+		if (ret)
+			goto out;
+
+		ret = unprivileged_uffd_parse(tmp_table.data,
+					      UFFD_UNPRIV_STRLEN);
+	} else {
+		/* Leave space for "[]" */
+		int len = UFFD_UNPRIV_STRLEN * UFFD_UNPRIV_NUM + 2;
+
+		tmp_table.maxlen = len;
+		tmp_table.data = kmalloc(len, GFP_KERNEL);
+
+		unprivileged_uffd_dump(tmp_table.data, len);
+
+		ret = proc_dostring(&tmp_table, write, buffer, lenp, ppos);
+	}
+
+out:
+	if (tmp_table.data)
+		kfree(tmp_table.data);
+	return ret;
+}
+
 static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
 				     int wake_flags, void *key)
 {
@@ -1955,6 +2046,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 
 static int __init userfaultfd_init(void)
 {
+	char unpriv_uffd[UFFD_UNPRIV_STRLEN] =
+	    CONFIG_USERFAULTFD_UNPRIVILEGED_DEFAULT;
+
+	unprivileged_uffd_parse(unpriv_uffd, sizeof(unpriv_uffd));
+
 	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
 						sizeof(struct userfaultfd_ctx),
 						0,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 37c9eba75c98..f53bc02ccffc 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -28,6 +28,11 @@
 #define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
 
+#define UFFD_UNPRIV_STRLEN 16
+int proc_unprivileged_userfaultfd(struct ctl_table *table, int write,
+				  void __user *buffer, size_t *lenp,
+				  loff_t *ppos);
+
 extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
diff --git a/init/Kconfig b/init/Kconfig
index c9386a365eea..d90caa4fed17 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1512,6 +1512,17 @@ config USERFAULTFD
 	  Enable the userfaultfd() system call that allows to intercept and
 	  handle page faults in userland.
 
+config USERFAULTFD_UNPRIVILEGED_DEFAULT
+        string "Default behavior for unprivileged userfault syscalls"
+        depends on USERFAULTFD
+        default "disabled"
+        help
+          Set this to "enabled" to allow userfaultfd syscalls from
+          unprivileged users.  Set this to "disabled" to forbid
+          userfaultfd syscalls from unprivileged users.  Set this to
+          "kvm" to forbid unpriviledged users but still allow users
+          who had enough permission to open /dev/kvm.
+
 config ARCH_HAS_MEMBARRIER_CALLBACKS
 	bool
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 7578e21a711b..5dc9f3d283dd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -96,6 +96,9 @@
 #ifdef CONFIG_LOCKUP_DETECTOR
 #include <linux/nmi.h>
 #endif
+#ifdef CONFIG_USERFAULTFD
+#include <linux/userfaultfd_k.h>
+#endif
 
 #if defined(CONFIG_SYSCTL)
 
@@ -1704,6 +1707,14 @@ static struct ctl_table vm_table[] = {
 		.extra1		= (void *)&mmap_rnd_compat_bits_min,
 		.extra2		= (void *)&mmap_rnd_compat_bits_max,
 	},
+#endif
+#ifdef CONFIG_USERFAULTFD
+	{
+		.procname	= "unprivileged_userfaultfd",
+		.maxlen		= UFFD_UNPRIV_STRLEN,
+		.mode		= 0644,
+		.proc_handler	= proc_unprivileged_userfaultfd,
+	},
 #endif
 	{ }
 };
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/3] kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag
  2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
  2019-03-11  9:36 ` [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd Peter Xu
@ 2019-03-11  9:37 ` Peter Xu
  2019-03-11  9:37 ` [PATCH 3/3] userfaultfd: apply unprivileged_userfaultfd check Peter Xu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-11  9:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paolo Bonzini, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner, peterx,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Mike Kravetz, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

Introduce a new MMF_USERFAULTFD_ALLOW flag and tag it upon the process
memory address space as long as the process opened the /dev/kvm once.
It'll be dropped automatically when fork() by MMF_INIT_TASK to reset
the userfaultfd permission.

Detecting the flag gives us a chance to open the green light for kvm
upon using userfaultfd when we want to make sure all the existing kvm
users will still be able to run their userspace programs without being
affected by the new unprivileged userfaultfd switch.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/sched/coredump.h | 1 +
 virt/kvm/kvm_main.c            | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index ecdc6542070f..9f6e71182892 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -72,6 +72,7 @@ static inline int get_dumpable(struct mm_struct *mm)
 #define MMF_DISABLE_THP		24	/* disable THP for all VMAs */
 #define MMF_OOM_VICTIM		25	/* mm is the oom victim */
 #define MMF_OOM_REAP_QUEUED	26	/* mm was queued for oom_reaper */
+#define MMF_USERFAULTFD_ALLOW	27	/* allow userfaultfd syscall */
 #define MMF_DISABLE_THP_MASK	(1 << MMF_DISABLE_THP)
 
 #define MMF_INIT_MASK		(MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d237d3350a99..079f6ac00c36 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3403,7 +3403,14 @@ static long kvm_dev_ioctl(struct file *filp,
 	return r;
 }
 
+static int kvm_dev_open(struct inode *inode, struct file *file)
+{
+	set_bit(MMF_USERFAULTFD_ALLOW, &current->mm->flags);
+	return 0;
+}
+
 static struct file_operations kvm_chardev_ops = {
+	.open		= kvm_dev_open,
 	.unlocked_ioctl = kvm_dev_ioctl,
 	.llseek		= noop_llseek,
 	KVM_COMPAT(kvm_dev_ioctl),
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/3] userfaultfd: apply unprivileged_userfaultfd check
  2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
  2019-03-11  9:36 ` [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd Peter Xu
  2019-03-11  9:37 ` [PATCH 2/3] kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag Peter Xu
@ 2019-03-11  9:37 ` Peter Xu
  2019-03-11  9:58   ` Peter Xu
  2019-03-12  7:01 ` [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Mike Rapoport
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2019-03-11  9:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paolo Bonzini, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner, peterx,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Mike Kravetz, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

Apply the unprivileged_userfaultfd check when doing userfaultfd
syscall.  We didn't check it in other paths of userfaultfd (e.g., the
ioctl() path) because we don't want to drag down the fast path of
userfaultfd, as suggested by Andrea.

Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Suggested-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 fs/userfaultfd.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c2188464555a..effdcfc88629 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -951,6 +951,28 @@ void userfaultfd_unmap_complete(struct mm_struct *mm, struct list_head *uf)
 	}
 }
 
+/* Whether current process allows to use userfaultfd syscalls */
+static bool userfaultfd_allowed(void)
+{
+	bool allowed = false;
+
+	switch (unprivileged_userfaultfd) {
+	case UFFD_UNPRIV_ENABLED:
+		allowed = true;
+		break;
+	case UFFD_UNPRIV_KVM:
+		allowed = !!test_bit(MMF_USERFAULTFD_ALLOW,
+				     &current->mm->flags);
+		/* Fall through */
+	case UFFD_UNPRIV_DISABLED:
+		allowed = allowed || ns_capable(current_user_ns(),
+						CAP_SYS_PTRACE);
+		break;
+	}
+
+	return allowed;
+}
+
 static int userfaultfd_release(struct inode *inode, struct file *file)
 {
 	struct userfaultfd_ctx *ctx = file->private_data;
@@ -2018,6 +2040,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
 
+	if (!userfaultfd_allowed())
+		return -EPERM;
+
 	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
 		return -EINVAL;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/3] userfaultfd: apply unprivileged_userfaultfd check
  2019-03-11  9:37 ` [PATCH 3/3] userfaultfd: apply unprivileged_userfaultfd check Peter Xu
@ 2019-03-11  9:58   ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-11  9:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Paolo Bonzini, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Mike Kravetz, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On Mon, Mar 11, 2019 at 05:37:01PM +0800, Peter Xu wrote:
> Apply the unprivileged_userfaultfd check when doing userfaultfd
> syscall.  We didn't check it in other paths of userfaultfd (e.g., the
> ioctl() path) because we don't want to drag down the fast path of
> userfaultfd, as suggested by Andrea.
> 
> Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
> Suggested-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  fs/userfaultfd.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index c2188464555a..effdcfc88629 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -951,6 +951,28 @@ void userfaultfd_unmap_complete(struct mm_struct *mm, struct list_head *uf)
>  	}
>  }
>  
> +/* Whether current process allows to use userfaultfd syscalls */
> +static bool userfaultfd_allowed(void)
> +{
> +	bool allowed = false;
> +
> +	switch (unprivileged_userfaultfd) {
> +	case UFFD_UNPRIV_ENABLED:
> +		allowed = true;
> +		break;
> +	case UFFD_UNPRIV_KVM:
> +		allowed = !!test_bit(MMF_USERFAULTFD_ALLOW,
> +				     &current->mm->flags);
> +		/* Fall through */

Sorry I should squash this in otherwise compilation of !CONFIG_KVM
will break:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index effdcfc88629..1b3fa5935643 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -960,10 +960,12 @@ static bool userfaultfd_allowed(void)
        case UFFD_UNPRIV_ENABLED:
                allowed = true;
                break;
+#if IS_ENABLED(CONFIG_KVM)
        case UFFD_UNPRIV_KVM:
                allowed = !!test_bit(MMF_USERFAULTFD_ALLOW,
                                     &current->mm->flags);
                /* Fall through */
+#endif
        case UFFD_UNPRIV_DISABLED:
                allowed = allowed || ns_capable(current_user_ns(),
                                                CAP_SYS_PTRACE);

Will wait for more comments before I repost.  Sorry for the noise.

Regards,

-- 
Peter Xu

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd
  2019-03-11  9:36 ` [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd Peter Xu
@ 2019-03-12  6:58   ` Mike Rapoport
  2019-03-12 12:26     ` Peter Xu
  0 siblings, 1 reply; 28+ messages in thread
From: Mike Rapoport @ 2019-03-12  6:58 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On Mon, Mar 11, 2019 at 05:36:59PM +0800, Peter Xu wrote:
> Introduce a new sysctl called "vm.unprivileged_userfaultfd" that can
> be used to decide whether userfaultfd syscalls are allowed by
> unprivileged users.  It'll allow three modes:
> 
>   - disabled: disallow unprivileged users to use uffd
> 
>   - enabled:  allow unprivileged users to use uffd
> 
>   - kvm:      allow unprivileged users to use uffd only if the user
>               had enough permission to open /dev/kvm (this option only
>               exists if the kernel turned on KVM).
> 
> This patch only introduce the new interface but not yet applied it to
> the userfaultfd syscalls, which will be done in the follow up patch.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  fs/userfaultfd.c              | 96 +++++++++++++++++++++++++++++++++++
>  include/linux/userfaultfd_k.h |  5 ++
>  init/Kconfig                  | 11 ++++
>  kernel/sysctl.c               | 11 ++++
>  4 files changed, 123 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 89800fc7dc9d..c2188464555a 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -29,6 +29,8 @@
>  #include <linux/ioctl.h>
>  #include <linux/security.h>
>  #include <linux/hugetlb.h>
> +#include <linux/sysctl.h>
> +#include <linux/string.h>
> 
>  static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
> 
> @@ -93,6 +95,95 @@ struct userfaultfd_wake_range {
>  	unsigned long len;
>  };
> 
> +enum unprivileged_userfaultfd {
> +	/* Disallow unprivileged users to use userfaultfd syscalls */
> +	UFFD_UNPRIV_DISABLED = 0,
> +	/* Allow unprivileged users to use userfaultfd syscalls */
> +	UFFD_UNPRIV_ENABLED,
> +#if IS_ENABLED(CONFIG_KVM)
> +	/*
> +	 * Allow unprivileged users to use userfaultfd syscalls only
> +	 * if the user had enough permission to open /dev/kvm
> +	 */
> +	UFFD_UNPRIV_KVM,
> +#endif
> +	UFFD_UNPRIV_NUM,
> +};
> +
> +static int unprivileged_userfaultfd __read_mostly;
> +static const char *unprivileged_userfaultfd_str[UFFD_UNPRIV_NUM] = {
> +	"disabled", "enabled",
> +#if IS_ENABLED(CONFIG_KVM)
> +	"kvm",
> +#endif
> +};
> +
> +static int unprivileged_uffd_parse(char *buf, size_t size)
> +{
> +	int i;
> +
> +	for (i = 0; i < UFFD_UNPRIV_NUM; i++) {
> +		if (!strncmp(unprivileged_userfaultfd_str[i], buf, size)) {
> +			unprivileged_userfaultfd = i;
> +			return 0;
> +		}
> +	}
> +
> +	return -EFAULT;
> +}
> +
> +static void unprivileged_uffd_dump(char *buf, size_t size)
> +{
> +	int i;
> +
> +	*buf = 0x00;
> +	for (i = 0; i < UFFD_UNPRIV_NUM; i++) {
> +		if (i == unprivileged_userfaultfd)
> +			strncat(buf, "[", size - strlen(buf));
> +		strncat(buf, unprivileged_userfaultfd_str[i],
> +			size - strlen(buf));
> +		if (i == unprivileged_userfaultfd)
> +			strncat(buf, "]", size - strlen(buf));
> +		strncat(buf, " ", size - strlen(buf));
> +	}
> +
> +}
> +
> +int proc_unprivileged_userfaultfd(struct ctl_table *table, int write,
> +				  void __user *buffer, size_t *lenp,
> +				  loff_t *ppos)
> +{
> +	struct ctl_table tmp_table = { .maxlen = 0 };
> +	int ret;
> +
> +	if (write) {
> +		tmp_table.maxlen = UFFD_UNPRIV_STRLEN;
> +		tmp_table.data = kmalloc(UFFD_UNPRIV_STRLEN, GFP_KERNEL);
> +
> +		ret = proc_dostring(&tmp_table, write, buffer, lenp, ppos);
> +		if (ret)
> +			goto out;
> +
> +		ret = unprivileged_uffd_parse(tmp_table.data,
> +					      UFFD_UNPRIV_STRLEN);
> +	} else {
> +		/* Leave space for "[]" */
> +		int len = UFFD_UNPRIV_STRLEN * UFFD_UNPRIV_NUM + 2;
> +
> +		tmp_table.maxlen = len;
> +		tmp_table.data = kmalloc(len, GFP_KERNEL);
> +
> +		unprivileged_uffd_dump(tmp_table.data, len);
> +
> +		ret = proc_dostring(&tmp_table, write, buffer, lenp, ppos);
> +	}
> +
> +out:
> +	if (tmp_table.data)
> +		kfree(tmp_table.data);
> +	return ret;
> +}
> +
>  static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
>  				     int wake_flags, void *key)
>  {
> @@ -1955,6 +2046,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
> 
>  static int __init userfaultfd_init(void)
>  {
> +	char unpriv_uffd[UFFD_UNPRIV_STRLEN] =
> +	    CONFIG_USERFAULTFD_UNPRIVILEGED_DEFAULT;
> +
> +	unprivileged_uffd_parse(unpriv_uffd, sizeof(unpriv_uffd));
> +
>  	userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
>  						sizeof(struct userfaultfd_ctx),
>  						0,
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 37c9eba75c98..f53bc02ccffc 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -28,6 +28,11 @@
>  #define UFFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
>  #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
> 
> +#define UFFD_UNPRIV_STRLEN 16
> +int proc_unprivileged_userfaultfd(struct ctl_table *table, int write,
> +				  void __user *buffer, size_t *lenp,
> +				  loff_t *ppos);
> +
>  extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
> 
>  extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
> diff --git a/init/Kconfig b/init/Kconfig
> index c9386a365eea..d90caa4fed17 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1512,6 +1512,17 @@ config USERFAULTFD
>  	  Enable the userfaultfd() system call that allows to intercept and
>  	  handle page faults in userland.
> 
> +config USERFAULTFD_UNPRIVILEGED_DEFAULT
> +        string "Default behavior for unprivileged userfault syscalls"
> +        depends on USERFAULTFD
> +        default "disabled"
> +        help
> +          Set this to "enabled" to allow userfaultfd syscalls from
> +          unprivileged users.  Set this to "disabled" to forbid
> +          userfaultfd syscalls from unprivileged users.  Set this to
> +          "kvm" to forbid unpriviledged users but still allow users
> +          who had enough permission to open /dev/kvm.

I'd phrase it a bit differently:

This option controls privilege level required to execute userfaultfd
system call.

Set this to "enabled" to allow userfaultfd system call from unprivileged
users. 
Set this to "disabled" to allow userfaultfd system call only for users who
have ptrace capability.
Set this to "kvm" to restrict userfaultfd system call usage to users with
permissions to open "/dev/kvm".
 
> +
>  config ARCH_HAS_MEMBARRIER_CALLBACKS
>  	bool
> 
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 7578e21a711b..5dc9f3d283dd 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -96,6 +96,9 @@
>  #ifdef CONFIG_LOCKUP_DETECTOR
>  #include <linux/nmi.h>
>  #endif
> +#ifdef CONFIG_USERFAULTFD
> +#include <linux/userfaultfd_k.h>
> +#endif
> 
>  #if defined(CONFIG_SYSCTL)
> 
> @@ -1704,6 +1707,14 @@ static struct ctl_table vm_table[] = {
>  		.extra1		= (void *)&mmap_rnd_compat_bits_min,
>  		.extra2		= (void *)&mmap_rnd_compat_bits_max,
>  	},
> +#endif
> +#ifdef CONFIG_USERFAULTFD
> +	{
> +		.procname	= "unprivileged_userfaultfd",
> +		.maxlen		= UFFD_UNPRIV_STRLEN,
> +		.mode		= 0644,
> +		.proc_handler	= proc_unprivileged_userfaultfd,
> +	},
>  #endif
>  	{ }
>  };
> -- 
> 2.17.1
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
                   ` (2 preceding siblings ...)
  2019-03-11  9:37 ` [PATCH 3/3] userfaultfd: apply unprivileged_userfaultfd check Peter Xu
@ 2019-03-12  7:01 ` Mike Rapoport
  2019-03-12 12:29   ` Peter Xu
  2019-03-12  7:49 ` Kirill A. Shutemov
  2019-03-12 19:59 ` Mike Kravetz
  5 siblings, 1 reply; 28+ messages in thread
From: Mike Rapoport @ 2019-03-12  7:01 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

Hi Peter,

On Mon, Mar 11, 2019 at 05:36:58PM +0800, Peter Xu wrote:
> Hi,
> 
> (The idea comes from Andrea, and following discussions with Mike and
>  other people)
> 
> This patchset introduces a new sysctl flag to allow the admin to
> forbid users from using userfaultfd:
> 
>   $ cat /proc/sys/vm/unprivileged_userfaultfd
>   [disabled] enabled kvm
> 
>   - When set to "disabled", all unprivileged users are forbidden to
>     use userfaultfd syscalls.
> 
>   - When set to "enabled", all users are allowed to use userfaultfd
>     syscalls.
> 
>   - When set to "kvm", all unprivileged users are forbidden to use the
>     userfaultfd syscalls, except the user who has permission to open
>     /dev/kvm.
> 
> This new flag can add one more layer of security to reduce the attack
> surface of the kernel by abusing userfaultfd.  Here we grant the
> thread userfaultfd permission by checking against CAP_SYS_PTRACE
> capability.  By default, the value is "disabled" which is the most
> strict policy.  Distributions can have their own perferred value.
> 
> The "kvm" entry is a bit special here only to make sure that existing
> users like QEMU/KVM won't break by this newly introduced flag.  What
> we need to do is simply set the "unprivileged_userfaultfd" flag to
> "kvm" here to automatically grant userfaultfd permission for processes
> like QEMU/KVM without extra code to tweak these flags in the admin
> code.
> 
> Patch 1:  The interface patch to introduce the flag
> 
> Patch 2:  The KVM related changes to detect opening of /dev/kvm
> 
> Patch 3:  Apply the flag to userfaultfd syscalls
 
I'd appreciate to see "Patch 4: documentation update" ;-)
It'd be also great to update the man pages after this is merged.

Except for the comment to patch 1, feel free to add

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

> All comments would be greatly welcomed.  Thanks,
> 
> Peter Xu (3):
>   userfaultfd/sysctl: introduce unprivileged_userfaultfd
>   kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag
>   userfaultfd: apply unprivileged_userfaultfd check
> 
>  fs/userfaultfd.c               | 121 +++++++++++++++++++++++++++++++++
>  include/linux/sched/coredump.h |   1 +
>  include/linux/userfaultfd_k.h  |   5 ++
>  init/Kconfig                   |  11 +++
>  kernel/sysctl.c                |  11 +++
>  virt/kvm/kvm_main.c            |   7 ++
>  6 files changed, 156 insertions(+)
> 
> -- 
> 2.17.1
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
                   ` (3 preceding siblings ...)
  2019-03-12  7:01 ` [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Mike Rapoport
@ 2019-03-12  7:49 ` Kirill A. Shutemov
  2019-03-12 12:43   ` Peter Xu
  2019-03-12 19:59 ` Mike Kravetz
  5 siblings, 1 reply; 28+ messages in thread
From: Kirill A. Shutemov @ 2019-03-12  7:49 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton, linux-api

On Mon, Mar 11, 2019 at 05:36:58PM +0800, Peter Xu wrote:
> Hi,
> 
> (The idea comes from Andrea, and following discussions with Mike and
>  other people)
> 
> This patchset introduces a new sysctl flag to allow the admin to
> forbid users from using userfaultfd:
> 
>   $ cat /proc/sys/vm/unprivileged_userfaultfd
>   [disabled] enabled kvm

CC linux-api@

This is unusual way to return current value for sysctl. Does it work fine
with sysctl tool?

Have you considered to place the switch into /sys/kernel/mm instead?
I doubt it's the last tunable for userfaultfd. Maybe we should have an
directory for it under /sys/kernel/mm?

>   - When set to "disabled", all unprivileged users are forbidden to
>     use userfaultfd syscalls.
> 
>   - When set to "enabled", all users are allowed to use userfaultfd
>     syscalls.
> 
>   - When set to "kvm", all unprivileged users are forbidden to use the
>     userfaultfd syscalls, except the user who has permission to open
>     /dev/kvm.
> 
> This new flag can add one more layer of security to reduce the attack
> surface of the kernel by abusing userfaultfd.  Here we grant the
> thread userfaultfd permission by checking against CAP_SYS_PTRACE
> capability.  By default, the value is "disabled" which is the most
> strict policy.  Distributions can have their own perferred value.
> 
> The "kvm" entry is a bit special here only to make sure that existing
> users like QEMU/KVM won't break by this newly introduced flag.  What
> we need to do is simply set the "unprivileged_userfaultfd" flag to
> "kvm" here to automatically grant userfaultfd permission for processes
> like QEMU/KVM without extra code to tweak these flags in the admin
> code.
> 
> Patch 1:  The interface patch to introduce the flag
> 
> Patch 2:  The KVM related changes to detect opening of /dev/kvm
> 
> Patch 3:  Apply the flag to userfaultfd syscalls
> 
> All comments would be greatly welcomed.  Thanks,
> 
> Peter Xu (3):
>   userfaultfd/sysctl: introduce unprivileged_userfaultfd
>   kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag
>   userfaultfd: apply unprivileged_userfaultfd check
> 
>  fs/userfaultfd.c               | 121 +++++++++++++++++++++++++++++++++
>  include/linux/sched/coredump.h |   1 +
>  include/linux/userfaultfd_k.h  |   5 ++
>  init/Kconfig                   |  11 +++
>  kernel/sysctl.c                |  11 +++
>  virt/kvm/kvm_main.c            |   7 ++
>  6 files changed, 156 insertions(+)
> 
> -- 
> 2.17.1
> 

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd
  2019-03-12  6:58   ` Mike Rapoport
@ 2019-03-12 12:26     ` Peter Xu
  2019-03-12 13:53       ` Mike Rapoport
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2019-03-12 12:26 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On Tue, Mar 12, 2019 at 08:58:30AM +0200, Mike Rapoport wrote:

[...]

> > +config USERFAULTFD_UNPRIVILEGED_DEFAULT
> > +        string "Default behavior for unprivileged userfault syscalls"
> > +        depends on USERFAULTFD
> > +        default "disabled"
> > +        help
> > +          Set this to "enabled" to allow userfaultfd syscalls from
> > +          unprivileged users.  Set this to "disabled" to forbid
> > +          userfaultfd syscalls from unprivileged users.  Set this to
> > +          "kvm" to forbid unpriviledged users but still allow users
> > +          who had enough permission to open /dev/kvm.
> 
> I'd phrase it a bit differently:
> 
> This option controls privilege level required to execute userfaultfd
                      ^
                      +---- add " the default"?

> system call.
> 
> Set this to "enabled" to allow userfaultfd system call from unprivileged
> users. 
> Set this to "disabled" to allow userfaultfd system call only for users who
> have ptrace capability.
> Set this to "kvm" to restrict userfaultfd system call usage to users with
                                                                      ^
                         add " who have ptrace capability, or" -------+

> permissions to open "/dev/kvm".

I think your version is better than mine, but I'd like to confirm
about above two extra changes before I squash them into the patch. :)

Thanks!

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-12  7:01 ` [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Mike Rapoport
@ 2019-03-12 12:29   ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-12 12:29 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On Tue, Mar 12, 2019 at 09:01:47AM +0200, Mike Rapoport wrote:
> Hi Peter,
> 
> On Mon, Mar 11, 2019 at 05:36:58PM +0800, Peter Xu wrote:
> > Hi,
> > 
> > (The idea comes from Andrea, and following discussions with Mike and
> >  other people)
> > 
> > This patchset introduces a new sysctl flag to allow the admin to
> > forbid users from using userfaultfd:
> > 
> >   $ cat /proc/sys/vm/unprivileged_userfaultfd
> >   [disabled] enabled kvm
> > 
> >   - When set to "disabled", all unprivileged users are forbidden to
> >     use userfaultfd syscalls.
> > 
> >   - When set to "enabled", all users are allowed to use userfaultfd
> >     syscalls.
> > 
> >   - When set to "kvm", all unprivileged users are forbidden to use the
> >     userfaultfd syscalls, except the user who has permission to open
> >     /dev/kvm.
> > 
> > This new flag can add one more layer of security to reduce the attack
> > surface of the kernel by abusing userfaultfd.  Here we grant the
> > thread userfaultfd permission by checking against CAP_SYS_PTRACE
> > capability.  By default, the value is "disabled" which is the most
> > strict policy.  Distributions can have their own perferred value.
> > 
> > The "kvm" entry is a bit special here only to make sure that existing
> > users like QEMU/KVM won't break by this newly introduced flag.  What
> > we need to do is simply set the "unprivileged_userfaultfd" flag to
> > "kvm" here to automatically grant userfaultfd permission for processes
> > like QEMU/KVM without extra code to tweak these flags in the admin
> > code.
> > 
> > Patch 1:  The interface patch to introduce the flag
> > 
> > Patch 2:  The KVM related changes to detect opening of /dev/kvm
> > 
> > Patch 3:  Apply the flag to userfaultfd syscalls
>  
> I'd appreciate to see "Patch 4: documentation update" ;-)
> It'd be also great to update the man pages after this is merged.

Oops, sorry!  I should have remembered that.

> 
> Except for the comment to patch 1, feel free to add
> 
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

Thanks Mike!  I'll take it for 2/3 until I got confirmation from you
on patch 1.

Regards,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-12  7:49 ` Kirill A. Shutemov
@ 2019-03-12 12:43   ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-12 12:43 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton, linux-api

Hi, Kirill,

On Tue, Mar 12, 2019 at 10:49:51AM +0300, Kirill A. Shutemov wrote:
> On Mon, Mar 11, 2019 at 05:36:58PM +0800, Peter Xu wrote:
> > Hi,
> > 
> > (The idea comes from Andrea, and following discussions with Mike and
> >  other people)
> > 
> > This patchset introduces a new sysctl flag to allow the admin to
> > forbid users from using userfaultfd:
> > 
> >   $ cat /proc/sys/vm/unprivileged_userfaultfd
> >   [disabled] enabled kvm
> 
> CC linux-api@
> 
> This is unusual way to return current value for sysctl. Does it work fine
> with sysctl tool?

It can work, though it displays the same as "cat":

$ sysctl vm.unprivileged_userfaultfd
vm.unprivileged_userfaultfd = disabled enabled [kvm] 

> 
> Have you considered to place the switch into /sys/kernel/mm instead?
> I doubt it's the last tunable for userfaultfd. Maybe we should have an
> directory for it under /sys/kernel/mm?

I haven't thought about sysfs, if that's preferred I can consider to
switch to that.  And yes I think creating a directory should be a good
idea.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd
  2019-03-12 12:26     ` Peter Xu
@ 2019-03-12 13:53       ` Mike Rapoport
  0 siblings, 0 replies; 28+ messages in thread
From: Mike Rapoport @ 2019-03-12 13:53 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Kravetz, Andrea Arcangeli,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On Tue, Mar 12, 2019 at 08:26:33PM +0800, Peter Xu wrote:
> On Tue, Mar 12, 2019 at 08:58:30AM +0200, Mike Rapoport wrote:
> 
> [...]
> 
> > > +config USERFAULTFD_UNPRIVILEGED_DEFAULT
> > > +        string "Default behavior for unprivileged userfault syscalls"
> > > +        depends on USERFAULTFD
> > > +        default "disabled"
> > > +        help
> > > +          Set this to "enabled" to allow userfaultfd syscalls from
> > > +          unprivileged users.  Set this to "disabled" to forbid
> > > +          userfaultfd syscalls from unprivileged users.  Set this to
> > > +          "kvm" to forbid unpriviledged users but still allow users
> > > +          who had enough permission to open /dev/kvm.
> > 
> > I'd phrase it a bit differently:
> > 
> > This option controls privilege level required to execute userfaultfd
>                       ^
>                       +---- add " the default"?
> 
> > system call.
> > 
> > Set this to "enabled" to allow userfaultfd system call from unprivileged
> > users. 
> > Set this to "disabled" to allow userfaultfd system call only for users who
> > have ptrace capability.
> > Set this to "kvm" to restrict userfaultfd system call usage to users with
>                                                                       ^
>                          add " who have ptrace capability, or" -------+
> 
> > permissions to open "/dev/kvm".
> 
> I think your version is better than mine, but I'd like to confirm
> about above two extra changes before I squash them into the patch. :)

I like your changes.
 
> Thanks!
> 
> -- 
> Peter Xu
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
                   ` (4 preceding siblings ...)
  2019-03-12  7:49 ` Kirill A. Shutemov
@ 2019-03-12 19:59 ` Mike Kravetz
  2019-03-13  6:00   ` Peter Xu
  5 siblings, 1 reply; 28+ messages in thread
From: Mike Kravetz @ 2019-03-12 19:59 UTC (permalink / raw)
  To: Peter Xu, linux-kernel
  Cc: Paolo Bonzini, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Andrea Arcangeli, Mike Rapoport, Kees Cook,
	Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On 3/11/19 2:36 AM, Peter Xu wrote:
> 
> The "kvm" entry is a bit special here only to make sure that existing
> users like QEMU/KVM won't break by this newly introduced flag.  What
> we need to do is simply set the "unprivileged_userfaultfd" flag to
> "kvm" here to automatically grant userfaultfd permission for processes
> like QEMU/KVM without extra code to tweak these flags in the admin
> code.

Another user is Oracle DB, specifically with hugetlbfs.  For them, we would
like to add a special case like kvm described above.  The admin controls
who can have access to hugetlbfs, so I think adding code to the open
routine as in patch 2 of this series would seem to work.

However, I can imagine more special cases being added for other users.  And,
once you have more than one special case then you may want to combine them.
For example, kvm and hugetlbfs together.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-12 19:59 ` Mike Kravetz
@ 2019-03-13  6:00   ` Peter Xu
  2019-03-13  8:22     ` Paolo Bonzini
  2019-03-13 17:50     ` Mike Kravetz
  0 siblings, 2 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-13  6:00 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote:
> On 3/11/19 2:36 AM, Peter Xu wrote:
> > 
> > The "kvm" entry is a bit special here only to make sure that existing
> > users like QEMU/KVM won't break by this newly introduced flag.  What
> > we need to do is simply set the "unprivileged_userfaultfd" flag to
> > "kvm" here to automatically grant userfaultfd permission for processes
> > like QEMU/KVM without extra code to tweak these flags in the admin
> > code.
> 
> Another user is Oracle DB, specifically with hugetlbfs.  For them, we would
> like to add a special case like kvm described above.  The admin controls
> who can have access to hugetlbfs, so I think adding code to the open
> routine as in patch 2 of this series would seem to work.

Yes I think if there's an explicit and safe place we can hook for
hugetlbfs then we can do the similar trick as KVM case.  Though I
noticed that we can not only create hugetlbfs files under the
mountpoint (which the admin can control), but also using some other
ways.  The question (of me... sorry if it's a silly one!) is whether
all other ways to use hugetlbfs is still under control of the admin.
One I know of is memfd_create() which seems to be doable even as
unprivileged users.  If so, should we only limit the uffd privilege to
those hugetlbfs users who use the mountpoint directly?

Another question is about fork() of privileged processes - for KVM we
only grant privilege for the exact process that opened the /dev/kvm
node, and the privilege will be lost for any forked childrens.  Is
that the same thing for OracleDB/Hugetlbfs?

> 
> However, I can imagine more special cases being added for other users.  And,
> once you have more than one special case then you may want to combine them.
> For example, kvm and hugetlbfs together.

It looks fine to me if we're using MMF_USERFAULTFD_ALLOW flag upon
mm_struct, since that seems to be a very general flag that can be used
by anything we want to grant privilege for, not only KVM?

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13  6:00   ` Peter Xu
@ 2019-03-13  8:22     ` Paolo Bonzini
  2019-03-13 18:52       ` Andrea Arcangeli
  2019-03-13 17:50     ` Mike Kravetz
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2019-03-13  8:22 UTC (permalink / raw)
  To: Peter Xu, Mike Kravetz
  Cc: linux-kernel, Hugh Dickins, Luis Chamberlain, Maxime Coquelin,
	kvm, Jerome Glisse, Pavel Emelyanov, Johannes Weiner,
	Martin Cracauer, Denis Plotnikov, linux-mm, Marty McFadden,
	Maya Gokhale, Andrea Arcangeli, Mike Rapoport, Kees Cook,
	Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On 13/03/19 07:00, Peter Xu wrote:
>> However, I can imagine more special cases being added for other users.  And,
>> once you have more than one special case then you may want to combine them.
>> For example, kvm and hugetlbfs together.
> It looks fine to me if we're using MMF_USERFAULTFD_ALLOW flag upon
> mm_struct, since that seems to be a very general flag that can be used
> by anything we want to grant privilege for, not only KVM?

Perhaps you can remove the fork() limitation, and add a new suboption to
prctl(PR_SET_MM) that sets/resets MMF_USERFAULTFD_ALLOW.  If somebody
wants to forbid unprivileged userfaultfd and use KVM, they'll have to
use libvirt or some other privileged management tool.

We could also add support for this prctl to systemd, and then one could
do "systemd-run -pAllowUserfaultfd=yes COMMAND".

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13  6:00   ` Peter Xu
  2019-03-13  8:22     ` Paolo Bonzini
@ 2019-03-13 17:50     ` Mike Kravetz
  2019-03-15  8:26       ` Peter Xu
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Kravetz @ 2019-03-13 17:50 UTC (permalink / raw)
  To: Peter Xu
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On 3/12/19 11:00 PM, Peter Xu wrote:
> On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote:
>> On 3/11/19 2:36 AM, Peter Xu wrote:
>>>
>>> The "kvm" entry is a bit special here only to make sure that existing
>>> users like QEMU/KVM won't break by this newly introduced flag.  What
>>> we need to do is simply set the "unprivileged_userfaultfd" flag to
>>> "kvm" here to automatically grant userfaultfd permission for processes
>>> like QEMU/KVM without extra code to tweak these flags in the admin
>>> code.
>>
>> Another user is Oracle DB, specifically with hugetlbfs.  For them, we would
>> like to add a special case like kvm described above.  The admin controls
>> who can have access to hugetlbfs, so I think adding code to the open
>> routine as in patch 2 of this series would seem to work.
> 
> Yes I think if there's an explicit and safe place we can hook for
> hugetlbfs then we can do the similar trick as KVM case.  Though I
> noticed that we can not only create hugetlbfs files under the
> mountpoint (which the admin can control), but also using some other
> ways.  The question (of me... sorry if it's a silly one!) is whether
> all other ways to use hugetlbfs is still under control of the admin.
> One I know of is memfd_create() which seems to be doable even as
> unprivileged users.  If so, should we only limit the uffd privilege to
> those hugetlbfs users who use the mountpoint directly?

Wow!  I did not realize that apps which specify mmap(MAP_HUGETLB) do not
need any special privilege to use huge pages.  Honestly, I am not sure if
that was by design or a bug.  The memfd_create code is based on the MAP_HUGETLB
code and also does not need any special privilege.  Not to sidetrack this
discussion, but people on Cc may know if this is a bug or by design.  My
opinion is that huge pages are a limited resource and should be under control.
One needs to be a member of a special group (or root) to access via System V
interfaces.

The DB use case only does mmap of files in an explicitly mounted filesystem.
So, limiting it in that manner would work for them.

> Another question is about fork() of privileged processes - for KVM we
> only grant privilege for the exact process that opened the /dev/kvm
> node, and the privilege will be lost for any forked childrens.  Is
> that the same thing for OracleDB/Hugetlbfs?

I need to confirm with the DB people, but it is my understanding that the
exact process which does the open/mmap will be the one using userfaultfd.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13  8:22     ` Paolo Bonzini
@ 2019-03-13 18:52       ` Andrea Arcangeli
  2019-03-13 19:12         ` Paolo Bonzini
  2019-03-13 20:01         ` Mike Kravetz
  0 siblings, 2 replies; 28+ messages in thread
From: Andrea Arcangeli @ 2019-03-13 18:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, Mike Kravetz, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

Hello,

On Wed, Mar 13, 2019 at 09:22:31AM +0100, Paolo Bonzini wrote:
> On 13/03/19 07:00, Peter Xu wrote:
> >> However, I can imagine more special cases being added for other users.  And,
> >> once you have more than one special case then you may want to combine them.
> >> For example, kvm and hugetlbfs together.
> > It looks fine to me if we're using MMF_USERFAULTFD_ALLOW flag upon
> > mm_struct, since that seems to be a very general flag that can be used
> > by anything we want to grant privilege for, not only KVM?
> 
> Perhaps you can remove the fork() limitation, and add a new suboption to
> prctl(PR_SET_MM) that sets/resets MMF_USERFAULTFD_ALLOW.  If somebody
> wants to forbid unprivileged userfaultfd and use KVM, they'll have to
> use libvirt or some other privileged management tool.
> 
> We could also add support for this prctl to systemd, and then one could
> do "systemd-run -pAllowUserfaultfd=yes COMMAND".

systemd can already implement -pAllowUserfaultfd=no with seccomp if it
wants. It can also implement -yes if by default turns off userfaultfd
like firejail -seccomp would do.

If the end goal is to implement the filtering with an userland policy
instead of a kernel policy, seccomp enabled for all services sounds
reasonable. It's very unlikely you'll block only userfaultfd, firejail
-seccomp by default blocks dozen of syscalls that are unnecessary
99.9% of the time.

This is not about implementing an userland flexible policy, it's just
a simple kernel policy, to use until userland disables the kernel
policy to takeover with seccomp across the board.

I wouldn't like this too be too complicated because this is already
theoretically overlapping 100% with seccomp.

hugetlbfs is more complicated to detect, because even if you inherit
it from fork(), the services that mounts the fs may be in a different
container than the one that Oracle that uses userfaultfd later on down
the road from a different context. And I don't think it would be ok to
allow running userfaultfd just because you can open a file in an
hugetlbfs file system. With /dev/kvm it's a bit different, that's
chmod o-r by default.. no luser should be able to open it.

Unless somebody suggests a consistent way to make hugetlbfs "just
work" (like we could achieve clean with CRIU and KVM), I think Oracle
will need a one liner change in the Oracle setup to echo into that
file in addition of running the hugetlbfs mount.

Note that DPDK host bridge process will also need a one liner change
to do a dummy open/close of /dev/kvm to unblock the syscall.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 18:52       ` Andrea Arcangeli
@ 2019-03-13 19:12         ` Paolo Bonzini
  2019-03-13 23:44           ` Andrea Arcangeli
  2019-03-13 20:01         ` Mike Kravetz
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2019-03-13 19:12 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Xu, Mike Kravetz, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton


> On Wed, Mar 13, 2019 at 09:22:31AM +0100, Paolo Bonzini wrote:
> Unless somebody suggests a consistent way to make hugetlbfs "just
> work" (like we could achieve clean with CRIU and KVM), I think Oracle
> will need a one liner change in the Oracle setup to echo into that
> file in addition of running the hugetlbfs mount.

Hi Andrea, can you explain more in detail the risks of enabling
userfaultfd for unprivileged users?

Paolo

> Note that DPDK host bridge process will also need a one liner change
> to do a dummy open/close of /dev/kvm to unblock the syscall.
> 
> Thanks,
> Andrea
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 18:52       ` Andrea Arcangeli
  2019-03-13 19:12         ` Paolo Bonzini
@ 2019-03-13 20:01         ` Mike Kravetz
  2019-03-13 23:55           ` Andrea Arcangeli
  1 sibling, 1 reply; 28+ messages in thread
From: Mike Kravetz @ 2019-03-13 20:01 UTC (permalink / raw)
  To: Andrea Arcangeli, Paolo Bonzini
  Cc: Peter Xu, linux-kernel, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Mike Rapoport, Kees Cook,
	Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On 3/13/19 11:52 AM, Andrea Arcangeli wrote:
> 
> hugetlbfs is more complicated to detect, because even if you inherit
> it from fork(), the services that mounts the fs may be in a different
> container than the one that Oracle that uses userfaultfd later on down
> the road from a different context. And I don't think it would be ok to
> allow running userfaultfd just because you can open a file in an
> hugetlbfs file system. With /dev/kvm it's a bit different, that's
> chmod o-r by default.. no luser should be able to open it.
> 
> Unless somebody suggests a consistent way to make hugetlbfs "just
> work" (like we could achieve clean with CRIU and KVM), I think Oracle
> will need a one liner change in the Oracle setup to echo into that
> file in addition of running the hugetlbfs mount.

I think you are suggesting the DB setup process enable uffd for all users.
Correct?

This may be too simple, and I don't really like group access, but how about
just defining a uffd group?  If you are in the group you can make uffd
system calls.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 19:12         ` Paolo Bonzini
@ 2019-03-13 23:44           ` Andrea Arcangeli
  2019-03-14 10:58             ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Andrea Arcangeli @ 2019-03-13 23:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, Mike Kravetz, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

Hi Paolo,

On Wed, Mar 13, 2019 at 03:12:28PM -0400, Paolo Bonzini wrote:
> 
> > On Wed, Mar 13, 2019 at 09:22:31AM +0100, Paolo Bonzini wrote:
> > Unless somebody suggests a consistent way to make hugetlbfs "just
> > work" (like we could achieve clean with CRIU and KVM), I think Oracle
> > will need a one liner change in the Oracle setup to echo into that
> > file in addition of running the hugetlbfs mount.
> 
> Hi Andrea, can you explain more in detail the risks of enabling
> userfaultfd for unprivileged users?

There's no more risk than in allowing mremap (direct memory overwrite
after underflow) or O_DIRECT (dirtycow) for unprivileged users.

If there was risk in allowing userfaultfd for unprivileged users,
CAP_SYS_PTRACE would be mandatory and there wouldn't be an option to
allow userfaultfd to all unprivileged users.

This is just an hardening policy in kernel (for those that don't run
everything under seccomp) that may even be removed later.

Unlike mremap and O_DIRECT, because we've only an handful of
(important) applications using userfaultfd so far, we can do like the
bpf syscall:

SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
{
	union bpf_attr attr = {};
	int err;

	if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
		return -EPERM;

Except we picked CAP_SYS_PTRACE because CRIU already has to run with
such capability for other reasons.

So this is intended as the "sysctl_unprivileged_bpf_disabled" trick
applied to the bpf syscall, also applied to the userfaultfd syscall,
nothing more nothing less.

Then I thought we can add a tristate so an open of /dev/kvm would also
allow the syscall to make things more user friendly because
unprivileged containers ideally should have writable mounts done with
nodev and no matter the privilege they shouldn't ever get an hold on
the KVM driver (and those who do, like kubevirt, will then just work).

There has been one complaint because userfaultfd can also be used to
facilitate exploiting use after free bugs in other kernel code:

https://cyseclabs.com/blog/linux-kernel-heap-spray

This isn't a bug in userfaultfd, the only bug there is some kernel
code doing an use after free in a reproducible place, userfaultfd only
allows to stop copy-user where the exploits like copy-user to be
stopped.

This isn't particularly concerning, because you can achieve the same
objective with FUSE. In fact even if you set CONFIG_USERFAULTFD=n in
the kernel config and CONFIG_FUSE_FS=n, a security bug like that can
still be exploited eventually. It's just less reproducible if you
can't stop copy-user.

Restricting userfaultfd to privileged processes, won't make such
kernel bug become harmless, it'll just require more attempts to
exploit as far as I can tell. To put things in prospective, the
exploits for the most serious security bugs like mremap missing
underflow check, dirtycow or the missing stack_guard_gap would not
get any facilitation by userfaultfd.

I also thought we were randomizing all slab heaps by now to avoid
issues like above, I don't know if the randomized slab freelist oreder
CONFIG_SLAB_FREELIST_RANDOM and also the pointer with
CONFIG_SLAB_FREELIST_HARDENED precisely to avoid the exploits like
above. It's not like you can disable those two options even if you set
CONFIG_USERFAULTFD=n. I wonder if in that blog post the exploit was
set on a kernel with those two options enabled.

In any case not allowing non privileged processes to run the
userfaultfd syscall will provide some hardening feature also against
such kind of concern.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 20:01         ` Mike Kravetz
@ 2019-03-13 23:55           ` Andrea Arcangeli
  2019-03-14  3:32             ` Mike Kravetz
  0 siblings, 1 reply; 28+ messages in thread
From: Andrea Arcangeli @ 2019-03-13 23:55 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Paolo Bonzini, Peter Xu, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On Wed, Mar 13, 2019 at 01:01:40PM -0700, Mike Kravetz wrote:
> On 3/13/19 11:52 AM, Andrea Arcangeli wrote:
> > 
> > hugetlbfs is more complicated to detect, because even if you inherit
> > it from fork(), the services that mounts the fs may be in a different
> > container than the one that Oracle that uses userfaultfd later on down
> > the road from a different context. And I don't think it would be ok to
> > allow running userfaultfd just because you can open a file in an
> > hugetlbfs file system. With /dev/kvm it's a bit different, that's
> > chmod o-r by default.. no luser should be able to open it.
> > 
> > Unless somebody suggests a consistent way to make hugetlbfs "just
> > work" (like we could achieve clean with CRIU and KVM), I think Oracle
> > will need a one liner change in the Oracle setup to echo into that
> > file in addition of running the hugetlbfs mount.
> 
> I think you are suggesting the DB setup process enable uffd for all users.
> Correct?

Yes. In addition of the hugetlbfs setup, various apps requires to also
increase fs.inotify.max_user_watches or file-max and other tweaks,
this would be one of those tweaks.

> This may be too simple, and I don't really like group access, but how about
> just defining a uffd group?  If you are in the group you can make uffd
> system calls.

Everything is possible, I'm just afraid it gets too complex.

So you suggest to echo a gid into the file?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 23:55           ` Andrea Arcangeli
@ 2019-03-14  3:32             ` Mike Kravetz
  0 siblings, 0 replies; 28+ messages in thread
From: Mike Kravetz @ 2019-03-14  3:32 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Paolo Bonzini, Peter Xu, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On 3/13/19 4:55 PM, Andrea Arcangeli wrote:
> On Wed, Mar 13, 2019 at 01:01:40PM -0700, Mike Kravetz wrote:
>> On 3/13/19 11:52 AM, Andrea Arcangeli wrote:
>>> Unless somebody suggests a consistent way to make hugetlbfs "just
>>> work" (like we could achieve clean with CRIU and KVM), I think Oracle
>>> will need a one liner change in the Oracle setup to echo into that
>>> file in addition of running the hugetlbfs mount.
>>
>> I think you are suggesting the DB setup process enable uffd for all users.
>> Correct?
> 
> Yes. In addition of the hugetlbfs setup, various apps requires to also
> increase fs.inotify.max_user_watches or file-max and other tweaks,
> this would be one of those tweaks.

Yes, I agree.
It is just that unprivileged_userfaultfd disabled would likely to be the
default set by distros.  Or, perhaps 'kvm'?  Then, if you want to run the
DB, the admin (or the DB) will need to set it to enabled.  And, this results
in it being enabled for everyone.  I think I understand the scope of any
security exposure this would cause from a technical perspective.  However,
I can imagine people being concerned about enabling everywhere if this is
not the default setting.

If it is OK to disable everywhere, why not just use disable for the kvm
use case as well? :)

>> This may be too simple, and I don't really like group access, but how about
>> just defining a uffd group?  If you are in the group you can make uffd
>> system calls.
> 
> Everything is possible, I'm just afraid it gets too complex.
> 
> So you suggest to echo a gid into the file?

That is what I was thinking.  But, I was mostly thinking of that because
Peter's earlier comment made me go and check hugetlbfs code.  There is a
sysctl_hugetlb_shm_group variable that does this, even though it is mostly
unused in the hugetlbfs code.

I know the kvm dev open scheme works for kvm.  Was just trying to think
of something more general that would work for hugetlbfs/DB or other use
cases.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 23:44           ` Andrea Arcangeli
@ 2019-03-14 10:58             ` Paolo Bonzini
  2019-03-14 15:23               ` Alexei Starovoitov
  2019-03-14 16:16               ` Andrea Arcangeli
  0 siblings, 2 replies; 28+ messages in thread
From: Paolo Bonzini @ 2019-03-14 10:58 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Peter Xu, Mike Kravetz, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On 14/03/19 00:44, Andrea Arcangeli wrote:
> Then I thought we can add a tristate so an open of /dev/kvm would also
> allow the syscall to make things more user friendly because
> unprivileged containers ideally should have writable mounts done with
> nodev and no matter the privilege they shouldn't ever get an hold on
> the KVM driver (and those who do, like kubevirt, will then just work).

I wouldn't even bother with the KVM special case.  Containers can use
seccomp if they want a fine-grained policy.

(Actually I wouldn't bother with the knob at all; the attack surface of
userfaultfd is infinitesimal compared to the BPF JIT...).

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-14 10:58             ` Paolo Bonzini
@ 2019-03-14 15:23               ` Alexei Starovoitov
  2019-03-14 16:00                 ` Paolo Bonzini
  2019-03-14 16:16               ` Andrea Arcangeli
  1 sibling, 1 reply; 28+ messages in thread
From: Alexei Starovoitov @ 2019-03-14 15:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrea Arcangeli, Peter Xu, Mike Kravetz, LKML, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	Linux-Fsdevel, Dr . David Alan Gilbert, Andrew Morton,
	Daniel Borkmann

On Thu, Mar 14, 2019 at 4:00 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 14/03/19 00:44, Andrea Arcangeli wrote:
> > Then I thought we can add a tristate so an open of /dev/kvm would also
> > allow the syscall to make things more user friendly because
> > unprivileged containers ideally should have writable mounts done with
> > nodev and no matter the privilege they shouldn't ever get an hold on
> > the KVM driver (and those who do, like kubevirt, will then just work).
>
> I wouldn't even bother with the KVM special case.  Containers can use
> seccomp if they want a fine-grained policy.
>
> (Actually I wouldn't bother with the knob at all; the attack surface of
> userfaultfd is infinitesimal compared to the BPF JIT...).

please name _one_ BPF JIT bug that affected unprivileged user space.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-14 15:23               ` Alexei Starovoitov
@ 2019-03-14 16:00                 ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2019-03-14 16:00 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrea Arcangeli, Peter Xu, Mike Kravetz, LKML, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	Linux-Fsdevel, Dr . David Alan Gilbert, Andrew Morton,
	Daniel Borkmann

On 14/03/19 16:23, Alexei Starovoitov wrote:
> On Thu, Mar 14, 2019 at 4:00 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 14/03/19 00:44, Andrea Arcangeli wrote:
>>> Then I thought we can add a tristate so an open of /dev/kvm would also
>>> allow the syscall to make things more user friendly because
>>> unprivileged containers ideally should have writable mounts done with
>>> nodev and no matter the privilege they shouldn't ever get an hold on
>>> the KVM driver (and those who do, like kubevirt, will then just work).
>>
>> I wouldn't even bother with the KVM special case.  Containers can use
>> seccomp if they want a fine-grained policy.
>>
>> (Actually I wouldn't bother with the knob at all; the attack surface of
>> userfaultfd is infinitesimal compared to the BPF JIT...).
> 
> please name _one_ BPF JIT bug that affected unprivileged user space.

I didn't say there were any bugs, I talked about attack surface.  The
potential impact would obviously be much bigger and, even leaving the
JIT aside, the userspace API is much more complex.

All this is just about paranoia, not about past experience.

Paolo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-14 10:58             ` Paolo Bonzini
  2019-03-14 15:23               ` Alexei Starovoitov
@ 2019-03-14 16:16               ` Andrea Arcangeli
  2019-03-15 16:09                 ` Kees Cook
  1 sibling, 1 reply; 28+ messages in thread
From: Andrea Arcangeli @ 2019-03-14 16:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Xu, Mike Kravetz, linux-kernel, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, kvm, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, linux-mm, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Kees Cook, Mel Gorman, Kirill A . Shutemov,
	linux-fsdevel, Dr . David Alan Gilbert, Andrew Morton

On Thu, Mar 14, 2019 at 11:58:15AM +0100, Paolo Bonzini wrote:
> On 14/03/19 00:44, Andrea Arcangeli wrote:
> > Then I thought we can add a tristate so an open of /dev/kvm would also
> > allow the syscall to make things more user friendly because
> > unprivileged containers ideally should have writable mounts done with
> > nodev and no matter the privilege they shouldn't ever get an hold on
> > the KVM driver (and those who do, like kubevirt, will then just work).
> 
> I wouldn't even bother with the KVM special case.  Containers can use
> seccomp if they want a fine-grained policy.

We can have a single boolean 0|1 and stick to a simpler sysctl and no
gid and if you want to use userfaultfd you need to enable it for all
users. I agree seccomp already provides more than enough granularity
to do more finegrined choices.

So this will be for who's paranoid and prefers to disable userfaultfd
as a whole as an hardening feature like the bpf sysctl allows: it will
allow to block uffd syscall without having to rebuild the kernel with
CONFIG_USERFAULTFD=n in environments where seccomp cannot be easily
enabled (i.e. without requiring userland changes).

That's very fine with me, but then it wasn't me complaining in the
first place. Kees?

If the above is ok, we can implement it as a static key, not that the
syscall itself is particularly performance critical but it'll be
simple enough as a boolean (only the ioctl are performance critical
but those are unaffected).

The blog post about UAF is not particularly interesting in my view,
unless both of the following points are true 1) it can be also proven
that the very same two UAF bugs, cannot be exploited by other means
(as far as I can tell it can be exploited by other means regardless of
userfaultfd) and 2) the slab randomization was actually enabled (99%
of the time in all POC all randomization features like kalsr are
incidentally disabled first to facilitate publishing papers and blog
posts, but those are really the features intended to reduce the
reproduciblity of exploits against UAF bugs, not disabling userfaultfd
which only provides a minor advantage, and unlike in PoC environments,
we enable those slab randomization in production 100% of the time
whenever they're available in the kernel).

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-13 17:50     ` Mike Kravetz
@ 2019-03-15  8:26       ` Peter Xu
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2019-03-15  8:26 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, Paolo Bonzini, Hugh Dickins, Luis Chamberlain,
	Maxime Coquelin, kvm, Jerome Glisse, Pavel Emelyanov,
	Johannes Weiner, Martin Cracauer, Denis Plotnikov, linux-mm,
	Marty McFadden, Maya Gokhale, Andrea Arcangeli, Mike Rapoport,
	Kees Cook, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On Wed, Mar 13, 2019 at 10:50:48AM -0700, Mike Kravetz wrote:
> On 3/12/19 11:00 PM, Peter Xu wrote:
> > On Tue, Mar 12, 2019 at 12:59:34PM -0700, Mike Kravetz wrote:
> >> On 3/11/19 2:36 AM, Peter Xu wrote:
> >>>
> >>> The "kvm" entry is a bit special here only to make sure that existing
> >>> users like QEMU/KVM won't break by this newly introduced flag.  What
> >>> we need to do is simply set the "unprivileged_userfaultfd" flag to
> >>> "kvm" here to automatically grant userfaultfd permission for processes
> >>> like QEMU/KVM without extra code to tweak these flags in the admin
> >>> code.
> >>
> >> Another user is Oracle DB, specifically with hugetlbfs.  For them, we would
> >> like to add a special case like kvm described above.  The admin controls
> >> who can have access to hugetlbfs, so I think adding code to the open
> >> routine as in patch 2 of this series would seem to work.
> > 
> > Yes I think if there's an explicit and safe place we can hook for
> > hugetlbfs then we can do the similar trick as KVM case.  Though I
> > noticed that we can not only create hugetlbfs files under the
> > mountpoint (which the admin can control), but also using some other
> > ways.  The question (of me... sorry if it's a silly one!) is whether
> > all other ways to use hugetlbfs is still under control of the admin.
> > One I know of is memfd_create() which seems to be doable even as
> > unprivileged users.  If so, should we only limit the uffd privilege to
> > those hugetlbfs users who use the mountpoint directly?
> 
> Wow!  I did not realize that apps which specify mmap(MAP_HUGETLB) do not
> need any special privilege to use huge pages.  Honestly, I am not sure if
> that was by design or a bug.  The memfd_create code is based on the MAP_HUGETLB
> code and also does not need any special privilege.  Not to sidetrack this
> discussion, but people on Cc may know if this is a bug or by design.  My
> opinion is that huge pages are a limited resource and should be under control.
> One needs to be a member of a special group (or root) to access via System V
> interfaces.

Yeah I completely agree that huge pages should need some special
care...

> 
> The DB use case only does mmap of files in an explicitly mounted filesystem.
> So, limiting it in that manner would work for them.
> 
> > Another question is about fork() of privileged processes - for KVM we
> > only grant privilege for the exact process that opened the /dev/kvm
> > node, and the privilege will be lost for any forked childrens.  Is
> > that the same thing for OracleDB/Hugetlbfs?
> 
> I need to confirm with the DB people, but it is my understanding that the
> exact process which does the open/mmap will be the one using userfaultfd.

It'll be nice if these can be confirmed and if above proposal could
still be an alternative for us (grant privilege for processes who do
mknod() upon the hugetlbfs mountpoint; drop privilege when fork as
usual), since IMHO it is still the simplest approach comparing to what
we've discussed in the other threads...

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/3] userfaultfd: allow to forbid unprivileged users
  2019-03-14 16:16               ` Andrea Arcangeli
@ 2019-03-15 16:09                 ` Kees Cook
  0 siblings, 0 replies; 28+ messages in thread
From: Kees Cook @ 2019-03-15 16:09 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Paolo Bonzini, Peter Xu, Mike Kravetz, LKML, Hugh Dickins,
	Luis Chamberlain, Maxime Coquelin, KVM, Jerome Glisse,
	Pavel Emelyanov, Johannes Weiner, Martin Cracauer,
	Denis Plotnikov, Linux-MM, Marty McFadden, Maya Gokhale,
	Mike Rapoport, Mel Gorman, Kirill A . Shutemov, linux-fsdevel,
	Dr . David Alan Gilbert, Andrew Morton

On Thu, Mar 14, 2019 at 9:16 AM Andrea Arcangeli <aarcange@redhat.com> wrote:
> So this will be for who's paranoid and prefers to disable userfaultfd
> as a whole as an hardening feature like the bpf sysctl allows: it will
> allow to block uffd syscall without having to rebuild the kernel with
> CONFIG_USERFAULTFD=n in environments where seccomp cannot be easily
> enabled (i.e. without requiring userland changes).
>
> That's very fine with me, but then it wasn't me complaining in the
> first place. Kees?

I'm fine with a boolean. I just wanted to find a way to disable at
runtime (so distro users had it available to them).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2019-03-15 16:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-11  9:36 [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Peter Xu
2019-03-11  9:36 ` [PATCH 1/3] userfaultfd/sysctl: introduce unprivileged_userfaultfd Peter Xu
2019-03-12  6:58   ` Mike Rapoport
2019-03-12 12:26     ` Peter Xu
2019-03-12 13:53       ` Mike Rapoport
2019-03-11  9:37 ` [PATCH 2/3] kvm/mm: introduce MMF_USERFAULTFD_ALLOW flag Peter Xu
2019-03-11  9:37 ` [PATCH 3/3] userfaultfd: apply unprivileged_userfaultfd check Peter Xu
2019-03-11  9:58   ` Peter Xu
2019-03-12  7:01 ` [PATCH 0/3] userfaultfd: allow to forbid unprivileged users Mike Rapoport
2019-03-12 12:29   ` Peter Xu
2019-03-12  7:49 ` Kirill A. Shutemov
2019-03-12 12:43   ` Peter Xu
2019-03-12 19:59 ` Mike Kravetz
2019-03-13  6:00   ` Peter Xu
2019-03-13  8:22     ` Paolo Bonzini
2019-03-13 18:52       ` Andrea Arcangeli
2019-03-13 19:12         ` Paolo Bonzini
2019-03-13 23:44           ` Andrea Arcangeli
2019-03-14 10:58             ` Paolo Bonzini
2019-03-14 15:23               ` Alexei Starovoitov
2019-03-14 16:00                 ` Paolo Bonzini
2019-03-14 16:16               ` Andrea Arcangeli
2019-03-15 16:09                 ` Kees Cook
2019-03-13 20:01         ` Mike Kravetz
2019-03-13 23:55           ` Andrea Arcangeli
2019-03-14  3:32             ` Mike Kravetz
2019-03-13 17:50     ` Mike Kravetz
2019-03-15  8:26       ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).