LKML Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/7] Harden userfaultfd
@ 2019-10-12 19:15 Daniel Colascione
  2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
                   ` (8 more replies)
  0 siblings, 9 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:15 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

Userfaultfd in unprivileged contexts could be potentially very
useful. We'd like to harden userfaultfd to make such unprivileged use
less risky. This patch series allows SELinux to manage userfaultfd
file descriptors (via a new flag, for compatibility with existing
code) and allows administrators to limit userfaultfd to servicing
user-mode faults, increasing the difficulty of using userfaultfd in
exploit chains invoking delaying kernel faults.

A new anon_inodes interface allows callers to opt into SELinux
management of anonymous file objects. In this mode, anon_inodes
creates new ephemeral inodes for anonymous file objects instead of
reusing a singleton dummy inode. A new LSM hook gives security modules
an opportunity to configure and veto these ephemeral inodes.

Existing anon_inodes users must opt into the new functionality.

Daniel Colascione (7):
  Add a new flags-accepting interface for anonymous inodes
  Add a concept of a "secure" anonymous file
  Add a UFFD_SECURE flag to the userfaultfd API.
  Teach SELinux about a new userfaultfd class
  Let userfaultfd opt out of handling kernel-mode faults
  Allow users to require UFFD_SECURE
  Add a new sysctl for limiting userfaultfd to user mode faults

 Documentation/admin-guide/sysctl/vm.rst | 19 +++++-
 fs/anon_inodes.c                        | 89 +++++++++++++++++--------
 fs/userfaultfd.c                        | 47 +++++++++++--
 include/linux/anon_inodes.h             | 27 ++++++--
 include/linux/lsm_hooks.h               |  8 +++
 include/linux/security.h                |  2 +
 include/linux/userfaultfd_k.h           |  3 +
 include/uapi/linux/userfaultfd.h        | 14 ++++
 kernel/sysctl.c                         |  9 +++
 security/security.c                     |  8 +++
 security/selinux/hooks.c                | 68 +++++++++++++++++++
 security/selinux/include/classmap.h     |  2 +
 12 files changed, 256 insertions(+), 40 deletions(-)

-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
@ 2019-10-12 19:15 ` Daniel Colascione
  2019-10-14  4:26   ` kbuild test robot
                     ` (2 more replies)
  2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:15 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

Add functions forwarding from the old names to the new ones so we
don't need to change any callers.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 fs/anon_inodes.c            | 62 ++++++++++++++++++++++---------------
 include/linux/anon_inodes.h | 27 +++++++++++++---
 2 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index 89714308c25b..caa36019afca 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -56,60 +56,71 @@ static struct file_system_type anon_inode_fs_type = {
 };
 
 /**
- * anon_inode_getfile - creates a new file instance by hooking it up to an
- *                      anonymous inode, and a dentry that describe the "class"
- *                      of the file
+ * anon_inode_getfile2 - creates a new file instance by hooking it up to
+ *                       an anonymous inode, and a dentry that describe
+ *                       the "class" of the file
  *
  * @name:    [in]    name of the "class" of the new file
  * @fops:    [in]    file operations for the new file
  * @priv:    [in]    private data for the new file (will be file's private_data)
- * @flags:   [in]    flags
+ * @flags:   [in]    flags for the file
+ * @anon_inode_flags: [in] flags for anon_inode*
  *
- * Creates a new file by hooking it on a single inode. This is useful for files
+ * Creates a new file by hooking it on an unspecified inode. This is useful for files
  * that do not need to have a full-fledged inode in order to operate correctly.
  * All the files created with anon_inode_getfile() will share a single inode,
  * hence saving memory and avoiding code duplication for the file/inode/dentry
  * setup.  Returns the newly created file* or an error pointer.
+ *
+ * anon_inode_flags must be zero.
  */
-struct file *anon_inode_getfile(const char *name,
-				const struct file_operations *fops,
-				void *priv, int flags)
+struct file *anon_inode_getfile2(const char *name,
+				 const struct file_operations *fops,
+				 void *priv, int flags, int anon_inode_flags)
 {
+	struct inode *inode;
 	struct file *file;
 
-	if (IS_ERR(anon_inode_inode))
-		return ERR_PTR(-ENODEV);
-
-	if (fops->owner && !try_module_get(fops->owner))
-		return ERR_PTR(-ENOENT);
+	if (anon_inode_flags)
+		return ERR_PTR(-EINVAL);
 
+	inode =	anon_inode_inode;
+	if (IS_ERR(inode))
+		return ERR_PTR(-ENODEV);
 	/*
-	 * We know the anon_inode inode count is always greater than zero,
-	 * so ihold() is safe.
+	 * We know the anon_inode inode count is always
+	 * greater than zero, so ihold() is safe.
 	 */
-	ihold(anon_inode_inode);
-	file = alloc_file_pseudo(anon_inode_inode, anon_inode_mnt, name,
+	ihold(inode);
+
+	if (fops->owner && !try_module_get(fops->owner)) {
+		file = ERR_PTR(-ENOENT);
+		goto err;
+	}
+
+	file = alloc_file_pseudo(inode, anon_inode_mnt, name,
 				 flags & (O_ACCMODE | O_NONBLOCK), fops);
 	if (IS_ERR(file))
 		goto err;
 
-	file->f_mapping = anon_inode_inode->i_mapping;
+	file->f_mapping = inode->i_mapping;
 
 	file->private_data = priv;
 
 	return file;
 
 err:
-	iput(anon_inode_inode);
+	iput(inode);
 	module_put(fops->owner);
 	return file;
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfile);
+EXPORT_SYMBOL_GPL(anon_inode_getfile2);
 
 /**
- * anon_inode_getfd - creates a new file instance by hooking it up to an
- *                    anonymous inode, and a dentry that describe the "class"
- *                    of the file
+ * anon_inode_getfd2 - creates a new file instance by hooking it up to an
+ *                     anonymous inode, and a dentry that describe the "class"
+ *                     of the file
  *
  * @name:    [in]    name of the "class" of the new file
  * @fops:    [in]    file operations for the new file
@@ -122,8 +133,8 @@ EXPORT_SYMBOL_GPL(anon_inode_getfile);
  * hence saving memory and avoiding code duplication for the file/inode/dentry
  * setup.  Returns new descriptor or an error code.
  */
-int anon_inode_getfd(const char *name, const struct file_operations *fops,
-		     void *priv, int flags)
+int anon_inode_getfd2(const char *name, const struct file_operations *fops,
+		      void *priv, int flags, int anon_inode_flags)
 {
 	int error, fd;
 	struct file *file;
@@ -133,7 +144,7 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
 		return error;
 	fd = error;
 
-	file = anon_inode_getfile(name, fops, priv, flags);
+	file = anon_inode_getfile2(name, fops, priv, flags, anon_inode_flags);
 	if (IS_ERR(file)) {
 		error = PTR_ERR(file);
 		goto err_put_unused_fd;
@@ -147,6 +158,7 @@ int anon_inode_getfd(const char *name, const struct file_operations *fops,
 	return error;
 }
 EXPORT_SYMBOL_GPL(anon_inode_getfd);
+EXPORT_SYMBOL_GPL(anon_inode_getfd2);
 
 static int __init anon_inode_init(void)
 {
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index d0d7d96261ad..10699462dcb1 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -11,11 +11,28 @@
 
 struct file_operations;
 
-struct file *anon_inode_getfile(const char *name,
-				const struct file_operations *fops,
-				void *priv, int flags);
-int anon_inode_getfd(const char *name, const struct file_operations *fops,
-		     void *priv, int flags);
+#define ANON_INODE_SECURE 1
+
+struct file *anon_inode_getfile2(const char *name,
+				 const struct file_operations *fops,
+				 void *priv, int flags, int anon_inode_flags);
+int anon_inode_getfd2(const char *name, const struct file_operations *fops,
+		      void *priv, int flags, int anon_inode_flags);
+
+static inline int anon_inode_getfd(const char *name,
+				   const struct file_operations *fops,
+				   void *priv, int flags)
+{
+	return anon_inode_getfd2(name, fops, priv, flags, 0);
+}
+
+static inline struct file *anon_inode_getfile(const char *name,
+					      const struct file_operations *fops,
+					      void *priv, int flags)
+{
+	return anon_inode_getfile2(name, fops, priv, flags, 0);
+}
+
 
 #endif /* _LINUX_ANON_INODES_H */
 
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 2/7] Add a concept of a "secure" anonymous file
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
  2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
@ 2019-10-12 19:15 ` Daniel Colascione
  2019-10-14  3:01   ` kbuild test robot
  2019-10-15  8:08   ` Christoph Hellwig
  2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:15 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

A secure anonymous file is one we hooked up to its own inode (as
opposed to the shared inode we use for non-secure anonymous files). A
new selinux hook gives security modules a chance to initialize, label,
and veto the creation of these secure anonymous files. Security
modules had limit ability to interact with non-secure anonymous files
due to all of these files sharing a single inode.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 fs/anon_inodes.c          | 45 ++++++++++++++++++++++++++++++---------
 include/linux/lsm_hooks.h |  8 +++++++
 include/linux/security.h  |  2 ++
 security/security.c       |  8 +++++++
 4 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index caa36019afca..d68d76523ad3 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -55,6 +55,23 @@ static struct file_system_type anon_inode_fs_type = {
 	.kill_sb	= kill_anon_super,
 };
 
+struct inode *anon_inode_make_secure_inode(const char *name,
+					   const struct file_operations *fops)
+{
+	struct inode *inode;
+	int error;
+	inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		return ERR_PTR(PTR_ERR(inode));
+	inode->i_flags &= ~S_PRIVATE;
+	error =	security_inode_init_security_anon(inode, name, fops);
+	if (error) {
+		iput(inode);
+		return ERR_PTR(error);
+	}
+	return inode;
+}
+
 /**
  * anon_inode_getfile2 - creates a new file instance by hooking it up to
  *                       an anonymous inode, and a dentry that describe
@@ -72,7 +89,9 @@ static struct file_system_type anon_inode_fs_type = {
  * hence saving memory and avoiding code duplication for the file/inode/dentry
  * setup.  Returns the newly created file* or an error pointer.
  *
- * anon_inode_flags must be zero.
+ * If anon_inode_flags contains ANON_INODE_SECURE, create a new inode
+ * and enable security checks for it. Otherwise, attach a new file to
+ * a singleton placeholder inode with security checks disabled.
  */
 struct file *anon_inode_getfile2(const char *name,
 				 const struct file_operations *fops,
@@ -81,17 +100,23 @@ struct file *anon_inode_getfile2(const char *name,
 	struct inode *inode;
 	struct file *file;
 
-	if (anon_inode_flags)
+	if (anon_inode_flags & ~ANON_INODE_SECURE)
 		return ERR_PTR(-EINVAL);
 
-	inode =	anon_inode_inode;
-	if (IS_ERR(inode))
-		return ERR_PTR(-ENODEV);
-	/*
-	 * We know the anon_inode inode count is always
-	 * greater than zero, so ihold() is safe.
-	 */
-	ihold(inode);
+	if (anon_inode_flags & ANON_INODE_SECURE) {
+		inode =	anon_inode_make_secure_inode(name, fops);
+		if (IS_ERR(inode))
+			return ERR_PTR(PTR_ERR(inode));
+	} else {
+		inode =	anon_inode_inode;
+		if (IS_ERR(inode))
+			return ERR_PTR(-ENODEV);
+		/*
+		 * We know the anon_inode inode count is always
+		 * greater than zero, so ihold() is safe.
+		 */
+		ihold(inode);
+	}
 
 	if (fops->owner && !try_module_get(fops->owner)) {
 		file = ERR_PTR(-ENOENT);
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index a3763247547c..3744ce9e9172 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -215,6 +215,10 @@
  *	Returns 0 if @name and @value have been successfully set,
  *	-EOPNOTSUPP if no security attribute is needed, or
  *	-ENOMEM on memory allocation failure.
+ * @inode_init_security_anon:
+ *      Set up a secure anonymous inode.
+ *	Returns 0 on success. Returns -EPERM if	the security module denies
+ *	the creation of this inode.
  * @inode_create:
  *	Check permission to create a regular file.
  *	@dir contains inode structure of the parent of the new file.
@@ -1552,6 +1556,9 @@ union security_list_options {
 					const struct qstr *qstr,
 					const char **name, void **value,
 					size_t *len);
+	int (*inode_init_security_anon)(struct inode *inode,
+					const char *name,
+					const struct file_operations *fops);
 	int (*inode_create)(struct inode *dir, struct dentry *dentry,
 				umode_t mode);
 	int (*inode_link)(struct dentry *old_dentry, struct inode *dir,
@@ -1876,6 +1883,7 @@ struct security_hook_heads {
 	struct hlist_head inode_alloc_security;
 	struct hlist_head inode_free_security;
 	struct hlist_head inode_init_security;
+	struct hlist_head inode_init_security_anon;
 	struct hlist_head inode_create;
 	struct hlist_head inode_link;
 	struct hlist_head inode_unlink;
diff --git a/include/linux/security.h b/include/linux/security.h
index a8d59d612d27..5b6f7e0de577 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -315,6 +315,8 @@ void security_inode_free(struct inode *inode);
 int security_inode_init_security(struct inode *inode, struct inode *dir,
 				 const struct qstr *qstr,
 				 initxattrs initxattrs, void *fs_data);
+int security_inode_init_security_anon(struct inode *inode, const char *name,
+				      const struct file_operations *fops);
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 				     const struct qstr *qstr, const char **name,
 				     void **value, size_t *len);
diff --git a/security/security.c b/security/security.c
index 1bc000f834e2..c87695f66413 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1001,6 +1001,14 @@ int security_inode_init_security(struct inode *inode, struct inode *dir,
 }
 EXPORT_SYMBOL(security_inode_init_security);
 
+int
+security_inode_init_security_anon(struct inode *inode,
+				  const char *name,
+				  const struct file_operations *fops)
+{
+	return call_int_hook(inode_init_security_anon, 0, inode, name, fops);
+}
+
 int security_old_inode_init_security(struct inode *inode, struct inode *dir,
 				     const struct qstr *qstr, const char **name,
 				     void **value, size_t *len)
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
  2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
  2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
@ 2019-10-12 19:15 ` Daniel Colascione
  2019-10-12 23:10   ` Andy Lutomirski
  2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:15 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

The new secure flag makes userfaultfd use a new "secure" anonymous
file object instead of the default one, letting security modules
supervise userfaultfd use.

Requiring that users pass a new flag lets us avoid changing the
semantics for existing callers.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 fs/userfaultfd.c                 | 28 +++++++++++++++++++++++++---
 include/uapi/linux/userfaultfd.h |  8 ++++++++
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f9fd18670e22..29f920fb236e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1022,6 +1022,13 @@ static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 {
 	int fd;
 
+	/*
+	 * Using a secure-mode UFFD to monitor forks isn't supported
+	 * right now.
+	 */
+	if (new->flags & UFFD_SECURE)
+		return -EOPNOTSUPP;
+
 	fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
 			      O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
 	if (fd < 0)
@@ -1841,6 +1848,18 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 		ret = -EINVAL;
 		goto out;
 	}
+	if ((ctx->flags & UFFD_SECURE) &&
+	    (features & UFFD_FEATURE_EVENT_FORK)) {
+		/*
+		 * We don't support UFFD_FEATURE_EVENT_FORK on a
+		 * secure-mode UFFD: doing so would need us to
+		 * construct the new file object in the context of the
+		 * fork child, and it's not worth it right now.
+		 */
+		ret = -EINVAL;
+		goto out;
+	}
+
 	/* report all available features and ioctls to userland */
 	uffdio_api.features = UFFD_API_FEATURES;
 	uffdio_api.ioctls = UFFD_API_IOCTLS;
@@ -1942,6 +1961,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
+	static const int uffd_flags = UFFD_SECURE;
 
 	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
 		return -EPERM;
@@ -1951,8 +1971,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	/* Check the UFFD_* constants for consistency.  */
 	BUILD_BUG_ON(UFFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(UFFD_NONBLOCK != O_NONBLOCK);
+	BUILD_BUG_ON(UFFD_SHARED_FCNTL_FLAGS & uffd_flags);
 
-	if (flags & ~UFFD_SHARED_FCNTL_FLAGS)
+	if (flags & ~(UFFD_SHARED_FCNTL_FLAGS | uffd_flags))
 		return -EINVAL;
 
 	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
@@ -1969,8 +1990,9 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	/* prevent the mm struct to be freed */
 	mmgrab(ctx->mm);
 
-	fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, ctx,
-			      O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS));
+	fd = anon_inode_getfd2("[userfaultfd]", &userfaultfd_fops, ctx,
+			       O_RDWR | (flags & UFFD_SHARED_FCNTL_FLAGS),
+			       ((flags & UFFD_SECURE) ? ANON_INODE_SECURE : 0));
 	if (fd < 0) {
 		mmdrop(ctx->mm);
 		kmem_cache_free(userfaultfd_ctx_cachep, ctx);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 48f1a7c2f1f0..12d7d40d7f25 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -231,4 +231,12 @@ struct uffdio_zeropage {
 	__s64 zeropage;
 };
 
+/*
+ * Flags for the userfaultfd(2) system call itself.
+ */
+
+/*
+ * Create a userfaultfd with MAC security checks enabled.
+ */
+#define UFFD_SECURE 1
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 4/7] Teach SELinux about a new userfaultfd class
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
                   ` (2 preceding siblings ...)
  2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione
@ 2019-10-12 19:15 ` Daniel Colascione
  2019-10-12 23:08   ` Andy Lutomirski
  2019-10-12 19:16 ` [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults Daniel Colascione
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:15 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

Use the secure anonymous inode LSM hook we just added to let SELinux
policy place restrictions on userfaultfd use. The create operation
applies to processes creating new instances of these file objects;
transfer between processes is covered by restrictions on read, write,
and ioctl access already checked inside selinux_file_receive.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 fs/userfaultfd.c                    |  4 +-
 include/linux/userfaultfd_k.h       |  2 +
 security/selinux/hooks.c            | 68 +++++++++++++++++++++++++++++
 security/selinux/include/classmap.h |  2 +
 4 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 29f920fb236e..1123089c3d55 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1014,8 +1014,6 @@ static __poll_t userfaultfd_poll(struct file *file, poll_table *wait)
 	}
 }
 
-static const struct file_operations userfaultfd_fops;
-
 static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
 				  struct userfaultfd_ctx *new,
 				  struct uffd_msg *msg)
@@ -1934,7 +1932,7 @@ static void userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
 }
 #endif
 
-static const struct file_operations userfaultfd_fops = {
+const struct file_operations userfaultfd_fops = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo	= userfaultfd_show_fdinfo,
 #endif
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index ac9d71e24b81..549c8b0cca52 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -30,6 +30,8 @@
 
 extern int sysctl_unprivileged_userfaultfd;
 
+extern const struct file_operations userfaultfd_fops;
+
 extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason);
 
 extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start,
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 9625b99e677f..0b3a36cbfbdc 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -92,6 +92,10 @@
 #include <linux/fsnotify.h>
 #include <linux/fanotify.h>
 
+#ifdef CONFIG_USERFAULTFD
+#include <linux/userfaultfd_k.h>
+#endif
+
 #include "avc.h"
 #include "objsec.h"
 #include "netif.h"
@@ -2943,6 +2947,69 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
 	return 0;
 }
 
+static int selinux_inode_init_security_anon(struct inode *inode,
+					    const char *name,
+					    const struct file_operations *fops)
+{
+	const struct task_security_struct *tsec = selinux_cred(current_cred());
+	struct common_audit_data ad;
+	struct inode_security_struct *isec;
+
+	if (unlikely(IS_PRIVATE(inode)))
+		return 0;
+
+	/*
+	 * We shouldn't be creating secure anonymous inodes before LSM
+	 * initialization completes.
+	 */
+	if (unlikely(!selinux_state.initialized))
+		return -EBUSY;
+
+	isec = selinux_inode(inode);
+
+	/*
+	 * We only get here once per ephemeral inode.  The inode has
+	 * been initialized via inode_alloc_security but is otherwise
+	 * untouched, so check that the state is as
+	 * inode_alloc_security left it.
+	 */
+	BUG_ON(isec->initialized != LABEL_INVALID);
+	BUG_ON(isec->sclass != SECCLASS_FILE);
+
+#ifdef CONFIG_USERFAULTFD
+	if (fops == &userfaultfd_fops)
+		isec->sclass = SECCLASS_UFFD;
+#endif
+
+	if (isec->sclass == SECCLASS_FILE) {
+		printk(KERN_WARNING "refusing to create secure anonymous inode "
+		       "of unknown type");
+		return -EOPNOTSUPP;
+	}
+	/*
+	 * Always give secure anonymous inodes the sid of the
+	 * creating task.
+	 */
+
+	isec->sid = tsec->sid;
+	isec->initialized = LABEL_INITIALIZED;
+
+	/*
+	 * Now that we've initialized security, check whether we're
+	 * allowed to actually create this type of anonymous inode.
+	 */
+
+	ad.type = LSM_AUDIT_DATA_INODE;
+	ad.u.inode = inode;
+
+	return avc_has_perm(&selinux_state,
+			    tsec->sid,
+			    isec->sid,
+			    isec->sclass,
+			    FILE__CREATE,
+			    &ad);
+}
+
 static int selinux_inode_create(struct inode *dir, struct dentry *dentry, umode_t mode)
 {
 	return may_create(dir, dentry, SECCLASS_FILE);
@@ -6840,6 +6907,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(inode_alloc_security, selinux_inode_alloc_security),
 	LSM_HOOK_INIT(inode_free_security, selinux_inode_free_security),
 	LSM_HOOK_INIT(inode_init_security, selinux_inode_init_security),
+	LSM_HOOK_INIT(inode_init_security_anon, selinux_inode_init_security_anon),
 	LSM_HOOK_INIT(inode_create, selinux_inode_create),
 	LSM_HOOK_INIT(inode_link, selinux_inode_link),
 	LSM_HOOK_INIT(inode_unlink, selinux_inode_unlink),
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index 32e9b03be3dd..41bc5da78048 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -244,6 +244,8 @@ struct security_class_mapping secclass_map[] = {
 	  {"map_create", "map_read", "map_write", "prog_load", "prog_run"} },
 	{ "xdp_socket",
 	  { COMMON_SOCK_PERMS, NULL } },
+	{ "uffd",
+	  { COMMON_FILE_PERMS, NULL } },
 	{ NULL }
   };
 
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
                   ` (3 preceding siblings ...)
  2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione
@ 2019-10-12 19:16 ` Daniel Colascione
  2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:16 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

userfaultfd handles page faults from both user and kernel code.  Add a
new UFFD_USER_MODE_ONLY flag for userfaultfd(2) that makes the
resulting userfaultfd object refuse to handle faults from kernel mode,
treating these faults as if SIGBUS were always raised, causing the
kernel code to fail with EFAULT.

A future patch adds a knob allowing administrators to give some
processes the ability to create userfaultfd file objects only if they
pass UFFD_USER_MODE_ONLY, reducing the likelihood that these processes
will exploit userfaultfd's ability to delay kernel page faults to open
timing windows for future exploits.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 fs/userfaultfd.c                 | 5 ++++-
 include/uapi/linux/userfaultfd.h | 6 ++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 1123089c3d55..986d23b2cd33 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -389,6 +389,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	if (ctx->features & UFFD_FEATURE_SIGBUS)
 		goto out;
+	if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
+	    ctx->flags & UFFD_USER_MODE_ONLY)
+		goto out;
 
 	/*
 	 * If it's already released don't get it. This avoids to loop
@@ -1959,7 +1962,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 {
 	struct userfaultfd_ctx *ctx;
 	int fd;
-	static const int uffd_flags = UFFD_SECURE;
+	static const int uffd_flags = UFFD_SECURE | UFFD_USER_MODE_ONLY;
 
 	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
 		return -EPERM;
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 12d7d40d7f25..eadd1497e7b5 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -239,4 +239,10 @@ struct uffdio_zeropage {
  * Create a userfaultfd with MAC security checks enabled.
  */
 #define UFFD_SECURE 1
+
+/*
+ * Create a userfaultfd that can handle page faults only in user mode.
+ */
+#define UFFD_USER_MODE_ONLY 2
+
 #endif /* _LINUX_USERFAULTFD_H */
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 6/7] Allow users to require UFFD_SECURE
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
                   ` (4 preceding siblings ...)
  2019-10-12 19:16 ` [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults Daniel Colascione
@ 2019-10-12 19:16 ` Daniel Colascione
  2019-10-12 23:12   ` Andy Lutomirski
  2019-10-12 19:16 ` [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults Daniel Colascione
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:16 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

This change adds 2 as an allowable value for
unprivileged_userfaultfd. (Previously, this sysctl could be either 0
or 1.) When unprivileged_userfaultfd is 2, users with CAP_SYS_PTRACE
may create userfaultfd with or without UFFD_SECURE, but users without
CAP_SYS_PTRACE must pass UFFD_SECURE to userfaultfd in order for the
system call to succeed, effectively forcing them to opt into
additional security checks.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 6 ++++--
 fs/userfaultfd.c                        | 4 +++-
 kernel/sysctl.c                         | 2 +-
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 64aeee1009ca..6664eec7bd35 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -842,8 +842,10 @@ unprivileged_userfaultfd
 
 This flag controls whether unprivileged users can use the userfaultfd
 system calls.  Set this to 1 to allow unprivileged users to use the
-userfaultfd system calls, or set this to 0 to restrict userfaultfd to only
-privileged users (with SYS_CAP_PTRACE capability).
+userfaultfd system calls, or set this to 0 to restrict userfaultfd to
+only privileged users (with SYS_CAP_PTRACE capability).  If set to 2,
+unprivileged (non-SYS_CAP_PTRACE) users may use userfaultfd only if
+they pass the UFFD_SECURE, enabling MAC security checks.
 
 The default value is 1.
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 986d23b2cd33..aaed9347973e 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1963,8 +1963,10 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 	static const int uffd_flags = UFFD_SECURE | UFFD_USER_MODE_ONLY;
+	bool need_cap_check = sysctl_unprivileged_userfaultfd == 0 ||
+		(sysctl_unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE));
 
-	if (!sysctl_unprivileged_userfaultfd && !capable(CAP_SYS_PTRACE))
+	if (need_cap_check && !capable(CAP_SYS_PTRACE))
 		return -EPERM;
 
 	BUG_ON(!current->mm);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 00fcea236eba..fc98d5df344e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1738,7 +1738,7 @@ static struct ctl_table vm_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
-		.extra2		= SYSCTL_ONE,
+		.extra2		= &two,
 	},
 #endif
 	{ }
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
                   ` (5 preceding siblings ...)
  2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione
@ 2019-10-12 19:16 ` Daniel Colascione
  2019-10-16  0:02 ` [PATCH 0/7] Harden userfaultfd James Morris
  2019-11-15 15:09 ` Stephen Smalley
  8 siblings, 0 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-12 19:16 UTC (permalink / raw)
  To: linux-api, linux-kernel, lokeshgidra, dancol, nnk; +Cc: nosh, timmurray

Add a new sysctl knob unprivileged_userfaultfd_user_mode_only.
This sysctl can be set to either zero or one. When zero (the default)
the system lets all users call userfaultfd with or without
UFFD_USER_MODE_ONLY, modulo other access controls. When
unprivileged_userfaultfd_user_mode_only is set to one, users without
CAP_SYS_PTRACE must pass UFFD_USER_MODE_ONLY to userfaultfd or the API
will fail with EPERM. This facility allows administrators to reduce
the likelihood that an attacker with access to userfaultfd can delay
faulting kernel code to widen timing windows for other exploits.

Signed-off-by: Daniel Colascione <dancol@google.com>
---
 Documentation/admin-guide/sysctl/vm.rst | 13 +++++++++++++
 fs/userfaultfd.c                        | 12 ++++++++++--
 include/linux/userfaultfd_k.h           |  1 +
 kernel/sysctl.c                         |  9 +++++++++
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 6664eec7bd35..330fd82b3f4e 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -849,6 +849,19 @@ they pass the UFFD_SECURE, enabling MAC security checks.
 
 The default value is 1.
 
+unprivileged_userfaultfd_user_mode_only
+========================================
+
+This flag controls whether unprivileged users can use the userfaultfd
+system calls to handle page faults in kernel mode.  If set to zero,
+userfaultfd works with or without UFFD_USER_MODE_ONLY, modulo
+unprivileged_userfaultfd above.  If set to one, users without
+SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd
+to succeed.  Prohibiting use of userfaultfd for handling faults from
+kernel mode may make certain vulnerabilities more difficult
+to exploit.
+
+The default value is 0.
 
 user_reserve_kbytes
 ===================
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aaed9347973e..02addd425ab7 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -29,6 +29,7 @@
 #include <linux/hugetlb.h>
 
 int sysctl_unprivileged_userfaultfd __read_mostly = 1;
+int sysctl_unprivileged_userfaultfd_user_mode_only __read_mostly = 0;
 
 static struct kmem_cache *userfaultfd_ctx_cachep __read_mostly;
 
@@ -1963,8 +1964,15 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 	struct userfaultfd_ctx *ctx;
 	int fd;
 	static const int uffd_flags = UFFD_SECURE | UFFD_USER_MODE_ONLY;
-	bool need_cap_check = sysctl_unprivileged_userfaultfd == 0 ||
-		(sysctl_unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE));
+	bool need_cap_check = false;
+
+	if (sysctl_unprivileged_userfaultfd == 0 ||
+	    (sysctl_unprivileged_userfaultfd == 2 && !(flags & UFFD_SECURE)))
+		need_cap_check = true;
+
+	if (sysctl_unprivileged_userfaultfd_user_mode_only &&
+	    (flags & UFFD_USER_MODE_ONLY) == 0)
+		need_cap_check = true;
 
 	if (need_cap_check && !capable(CAP_SYS_PTRACE))
 		return -EPERM;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 549c8b0cca52..efe14abb2dc8 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -29,6 +29,7 @@
 #define UFFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS)
 
 extern int sysctl_unprivileged_userfaultfd;
+extern int sysctl_unprivileged_userfaultfd_user_mode_only;
 
 extern const struct file_operations userfaultfd_fops;
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index fc98d5df344e..4f296676c0ac 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1740,6 +1740,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
+	{
+		.procname	= "unprivileged_userfaultfd_user_mode_only",
+		.data		= &sysctl_unprivileged_userfaultfd_user_mode_only,
+		.maxlen		= sizeof(sysctl_unprivileged_userfaultfd_user_mode_only),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
 #endif
 	{ }
 };
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
  2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione
@ 2019-10-12 23:08   ` Andy Lutomirski
  2019-10-13  0:11     ` Daniel Colascione
  0 siblings, 1 reply; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-12 23:08 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Linux API, LKML, lokeshgidra, Nick Kralevich, nosh, Tim Murray

On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
>
> Use the secure anonymous inode LSM hook we just added to let SELinux
> policy place restrictions on userfaultfd use. The create operation
> applies to processes creating new instances of these file objects;
> transfer between processes is covered by restrictions on read, write,
> and ioctl access already checked inside selinux_file_receive.

This is great, and I suspect we'll want it for things like SGX, too.
But the current design seems like it will make it essentially
impossible for SELinux to reference an anon_inode class whose
file_operations are in a module, and moving file_operations out of a
module would be nasty.

Could this instead be keyed off a new struct anon_inode_class, an
enum, or even just a string?

--Andy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione
@ 2019-10-12 23:10   ` Andy Lutomirski
  2019-10-13  0:51     ` Daniel Colascione
  0 siblings, 1 reply; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-12 23:10 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Linux API, LKML, lokeshgidra, Nick Kralevich, nosh, Tim Murray

On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
>
> The new secure flag makes userfaultfd use a new "secure" anonymous
> file object instead of the default one, letting security modules
> supervise userfaultfd use.
>
> Requiring that users pass a new flag lets us avoid changing the
> semantics for existing callers.

Is there any good reason not to make this be the default?

The only downside I can see is that it would increase the memory usage
of userfaultfd(), but that doesn't seem like such a big deal.  A
lighter-weight alternative would be to have a single inode shared by
all userfaultfd instances, which would require a somewhat different
internal anon_inode API.

In any event, I don't think that "make me visible to SELinux" should
be a choice that user code makes.

--Andy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 6/7] Allow users to require UFFD_SECURE
  2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione
@ 2019-10-12 23:12   ` Andy Lutomirski
  0 siblings, 0 replies; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-12 23:12 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Linux API, LKML, lokeshgidra, Nick Kralevich, nosh, Tim Murray

On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
>
> This change adds 2 as an allowable value for
> unprivileged_userfaultfd. (Previously, this sysctl could be either 0
> or 1.) When unprivileged_userfaultfd is 2, users with CAP_SYS_PTRACE
> may create userfaultfd with or without UFFD_SECURE, but users without
> CAP_SYS_PTRACE must pass UFFD_SECURE to userfaultfd in order for the
> system call to succeed, effectively forcing them to opt into
> additional security checks.

This patch can go away entirely if you make UFFD_SECURE automatic.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
  2019-10-12 23:08   ` Andy Lutomirski
@ 2019-10-13  0:11     ` Daniel Colascione
  2019-10-13  0:46       ` Andy Lutomirski
  0 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-13  0:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray

On Sat, Oct 12, 2019 at 4:09 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> >
> > Use the secure anonymous inode LSM hook we just added to let SELinux
> > policy place restrictions on userfaultfd use. The create operation
> > applies to processes creating new instances of these file objects;
> > transfer between processes is covered by restrictions on read, write,
> > and ioctl access already checked inside selinux_file_receive.
>
> This is great, and I suspect we'll want it for things like SGX, too.
> But the current design seems like it will make it essentially
> impossible for SELinux to reference an anon_inode class whose
> file_operations are in a module, and moving file_operations out of a
> module would be nasty.
>
> Could this instead be keyed off a new struct anon_inode_class, an
> enum, or even just a string?

The new LSM hook already receives the string that callers pass to the
anon_inode APIs; modules can look at that instead of the fops if they
want. The reason to pass both the name and the fops through the hook
is to allow LSMs to match using fops comparison (which seems less
prone to breakage) when possible and rely on string matching when it
isn't.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/7] Teach SELinux about a new userfaultfd class
  2019-10-13  0:11     ` Daniel Colascione
@ 2019-10-13  0:46       ` Andy Lutomirski
  0 siblings, 0 replies; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-13  0:46 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Andy Lutomirski, Linux API, LKML, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray

On Sat, Oct 12, 2019 at 5:12 PM Daniel Colascione <dancol@google.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:09 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > >
> > > Use the secure anonymous inode LSM hook we just added to let SELinux
> > > policy place restrictions on userfaultfd use. The create operation
> > > applies to processes creating new instances of these file objects;
> > > transfer between processes is covered by restrictions on read, write,
> > > and ioctl access already checked inside selinux_file_receive.
> >
> > This is great, and I suspect we'll want it for things like SGX, too.
> > But the current design seems like it will make it essentially
> > impossible for SELinux to reference an anon_inode class whose
> > file_operations are in a module, and moving file_operations out of a
> > module would be nasty.
> >
> > Could this instead be keyed off a new struct anon_inode_class, an
> > enum, or even just a string?
>
> The new LSM hook already receives the string that callers pass to the
> anon_inode APIs; modules can look at that instead of the fops if they
> want. The reason to pass both the name and the fops through the hook
> is to allow LSMs to match using fops comparison (which seems less
> prone to breakage) when possible and rely on string matching when it
> isn't.

I suppose that whoever makes the first module that wants to use this
mechanism can have the fun task of reworking it.  There's nothing
user-visible here that would make it hard to change in the future.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-12 23:10   ` Andy Lutomirski
@ 2019-10-13  0:51     ` Daniel Colascione
  2019-10-13  1:14       ` Andy Lutomirski
  0 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-13  0:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray

On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> >
> > The new secure flag makes userfaultfd use a new "secure" anonymous
> > file object instead of the default one, letting security modules
> > supervise userfaultfd use.
> >
> > Requiring that users pass a new flag lets us avoid changing the
> > semantics for existing callers.
>
> Is there any good reason not to make this be the default?
>
>
> The only downside I can see is that it would increase the memory usage
> of userfaultfd(), but that doesn't seem like such a big deal.  A
> lighter-weight alternative would be to have a single inode shared by
> all userfaultfd instances, which would require a somewhat different
> internal anon_inode API.

I'd also prefer to just make SELinux use mandatory, but there's a
nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
better way to deal with it.

Right now, when a process with a UFFD-managed VMA using
UFFD_EVENT_FORK forks, we make a new userfaultfd_ctx out of thin air
and enqueue it on the message queue for the parent process. When we
dequeue that context, we get to resolve_userfault_fork, which makes up
a new UFFD file object out of thin air in the context of the reading
process. Following normal SELinux rules, the SID attached to that new
file object would be the task SID of the process *reading* the fork
event, not the SID of the new fork child. That seems wrong, because
the label we give to the UFFD should correspond to the label of the
process that UFFD controls.

To try to solve this problem, we can move the file object creation to
the fork child and enqueue the file object itself instead of just the
userfaultfd_ctx, treating the dequeue as a file-descriptor-receive
operation just like a recvmsg of an AF_UNIX socket with SCM_RIGHTS.
(This approach seems more elegant anyway, since it reflects what's
actually going on.) The trouble the early-file-object-creation
approach is that the fork child may not be allowed to create UFFD file
objects on its own and an LSM can't tell the difference between
UFFD_EVENT_FORK handling creating the file object and the fork child
just calling userfaultfd(), meaning an LSM could veto the creation of
the file object for the fork event. We can't just create a
non-ANON_INODE_SECURE file object instead: that would defeat the whole
purpose of supervising UFFD using SELinux.

But maybe we can go further: let's separate authentication and
authorization, as we do in other LSM hooks. Let's split my
inode_init_security_anon into two hooks, inode_init_security_anon and
inode_create_anon. We'd define the former to just initialize the file
object's security information --- in the SELinux case, figuring out
its class and SID --- and define the latter to answer the yes/no
question of whether a particular anonymous inode creation should be
allowed. Normally, anon_inode_getfile2() would just call both hooks.
We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
or something, that would tell anon_inode_getfile2() to skip calling
the authorization hook, effectively making the creation always
succeed. We can then make the UFFD code pass
ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
fork child while creating UFFD_EVENT_FORK messages.

Granted, UFFD fork processing doesn't actually occur in the fork
child, but in copy_mm, in the parent --- but the right thing should
happen anyway, right?

I'm open to suggestions. In the meantime, I figured we'd just define a
UFFD_SECURE and make it incompatible with UFFD_EVENT_FORK.

> In any event, I don't think that "make me visible to SELinux" should
> be a choice that user code makes.

Right. The new unprivileged_userfaultfd setting is ugly, but it at
least removes the ability of unprivileged users to opt out of SELinux
supervision.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-13  0:51     ` Daniel Colascione
@ 2019-10-13  1:14       ` Andy Lutomirski
  2019-10-13  1:38         ` Daniel Colascione
                           ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-13  1:14 UTC (permalink / raw)
  To: Daniel Colascione, Linus Torvalds, Jann Horn, Andrea Arcangeli,
	Pavel Emelyanov
  Cc: Andy Lutomirski, Linux API, LKML, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray

[adding more people because this is going to be an ABI break, sigh]

On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
>
> On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > >
> > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > file object instead of the default one, letting security modules
> > > supervise userfaultfd use.
> > >
> > > Requiring that users pass a new flag lets us avoid changing the
> > > semantics for existing callers.
> >
> > Is there any good reason not to make this be the default?
> >
> >
> > The only downside I can see is that it would increase the memory usage
> > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > lighter-weight alternative would be to have a single inode shared by
> > all userfaultfd instances, which would require a somewhat different
> > internal anon_inode API.
>
> I'd also prefer to just make SELinux use mandatory, but there's a
> nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> better way to deal with it.

...

> But maybe we can go further: let's separate authentication and
> authorization, as we do in other LSM hooks. Let's split my
> inode_init_security_anon into two hooks, inode_init_security_anon and
> inode_create_anon. We'd define the former to just initialize the file
> object's security information --- in the SELinux case, figuring out
> its class and SID --- and define the latter to answer the yes/no
> question of whether a particular anonymous inode creation should be
> allowed. Normally, anon_inode_getfile2() would just call both hooks.
> We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> or something, that would tell anon_inode_getfile2() to skip calling
> the authorization hook, effectively making the creation always
> succeed. We can then make the UFFD code pass
> ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> fork child while creating UFFD_EVENT_FORK messages.

That sounds like an improvement.  Or maybe just teach SELinux that
this particular fd creation is actually making an anon_inode that is a
child of an existing anon inode and that the context should be copied
or whatever SELinux wants to do.  Like this, maybe:

static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
                                  struct userfaultfd_ctx *new,
                                  struct uffd_msg *msg)
{
        int fd;

Change this:

        fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
                              O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));

to something like:

      fd = anon_inode_make_child_fd(..., ctx->inode, ...);

where ctx->inode is the one context's inode.

*** HOWEVER *** !!!

Now that you've pointed this mechanism out, it is utterly and
completely broken and should be removed from the kernel outright or at
least severely restricted.  A .read implementation MUST NOT ACT ON THE
CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
as stdin to a setuid program.

So I think the right solution might be to attempt to *remove*
UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
use UFFD_FEATURE_EVENT_FORK) and print a warning (once) when
UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
deprecation period, just remove it.  If it's genuinely useful, it
needs an entirely new API based on ioctl() or a syscall.  Or even
recvmsg() :)

And UFFD_SECURE should just become automatic, since you don't have a
problem any more. :-p

--Andy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-13  1:14       ` Andy Lutomirski
@ 2019-10-13  1:38         ` Daniel Colascione
  2019-10-14 16:04         ` Jann Horn
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-13  1:38 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Jann Horn, Andrea Arcangeli, Pavel Emelyanov,
	Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray

On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
>
..
>
> > But maybe we can go further: let's separate authentication and
> > authorization, as we do in other LSM hooks. Let's split my
> > inode_init_security_anon into two hooks, inode_init_security_anon and
> > inode_create_anon. We'd define the former to just initialize the file
> > object's security information --- in the SELinux case, figuring out
> > its class and SID --- and define the latter to answer the yes/no
> > question of whether a particular anonymous inode creation should be
> > allowed. Normally, anon_inode_getfile2() would just call both hooks.
> > We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> > or something, that would tell anon_inode_getfile2() to skip calling
> > the authorization hook, effectively making the creation always
> > succeed. We can then make the UFFD code pass
> > ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> > fork child while creating UFFD_EVENT_FORK messages.
>
> That sounds like an improvement.  Or maybe just teach SELinux that
> this particular fd creation is actually making an anon_inode that is a
> child of an existing anon inode and that the context should be copied
> or whatever SELinux wants to do.  Like this, maybe:
>
> static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
>                                   struct userfaultfd_ctx *new,
>                                   struct uffd_msg *msg)
> {
>         int fd;
>
> Change this:
>
>         fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
>                               O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
>
> to something like:
>
>       fd = anon_inode_make_child_fd(..., ctx->inode, ...);
>
> where ctx->inode is the one context's inode.

Yeah. I figured we could just add a special-purpose hook for this
case. Having a special hook for this one case feels ugly though, and
at copy_mm time, we don't have a PID for the new child yet --- I don't
know whether LSMs would care about that. But maybe this is one of
those "doctor, it hurts when I do this!" situations and this child
process difficulty is just a hint that some other design might work
better.

> Now that you've pointed this mechanism out, it is utterly and
> completely broken and should be removed from the kernel outright or at
> least severely restricted.  A .read implementation MUST NOT ACT ON THE
> CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.
>
> So I think the right solution might be to attempt to *remove*
> UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
> creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> use UFFD_FEATURE_EVENT_FORK) and print a warning (once) when
> UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
> deprecation period, just remove it.  If it's genuinely useful, it
> needs an entirely new API based on ioctl() or a syscall.  Or even
> recvmsg() :)

IMHO, userfaultfd should have been a datagram socket from the start.
As you point out, it's a good fit for the UFFD protocol, which
involves FD passing and a fixed message size.

> And UFFD_SECURE should just become automatic, since you don't have a
> problem any more. :-p

Agreed. I'll wait to hear what everyone else has to say.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/7] Add a concept of a "secure" anonymous file
  2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
@ 2019-10-14  3:01   ` kbuild test robot
  2019-10-15  8:08   ` Christoph Hellwig
  1 sibling, 0 replies; 44+ messages in thread
From: kbuild test robot @ 2019-10-14  3:01 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: kbuild-all, linux-api, linux-kernel, lokeshgidra, dancol, nnk,
	nosh, timmurray

[-- Attachment #1: Type: text/plain, Size: 1984 bytes --]

Hi Daniel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[cannot apply to v5.4-rc2 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Daniel-Colascione/Harden-userfaultfd/20191014-102741
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.4.0-13) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/anon_inodes.c: In function 'anon_inode_make_secure_inode':
>> fs/anon_inodes.c:67:10: error: implicit declaration of function 'security_inode_init_security_anon'; did you mean 'security_inode_init_security'? [-Werror=implicit-function-declaration]
     error = security_inode_init_security_anon(inode, name, fops);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             security_inode_init_security
   cc1: some warnings being treated as errors

vim +67 fs/anon_inodes.c

    57	
    58	struct inode *anon_inode_make_secure_inode(const char *name,
    59						   const struct file_operations *fops)
    60	{
    61		struct inode *inode;
    62		int error;
    63		inode = alloc_anon_inode(anon_inode_mnt->mnt_sb);
    64		if (IS_ERR(inode))
    65			return ERR_PTR(PTR_ERR(inode));
    66		inode->i_flags &= ~S_PRIVATE;
  > 67		error =	security_inode_init_security_anon(inode, name, fops);
    68		if (error) {
    69			iput(inode);
    70			return ERR_PTR(error);
    71		}
    72		return inode;
    73	}
    74	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7207 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
  2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
@ 2019-10-14  4:26   ` kbuild test robot
  2019-10-14 15:38   ` Jann Horn
  2019-10-15  8:08   ` Christoph Hellwig
  2 siblings, 0 replies; 44+ messages in thread
From: kbuild test robot @ 2019-10-14  4:26 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: kbuild-all, linux-api, linux-kernel, lokeshgidra, dancol, nnk,
	nosh, timmurray

[-- Attachment #1: Type: text/plain, Size: 18565 bytes --]

Hi Daniel,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.4-rc3 next-20191011]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Daniel-Colascione/Harden-userfaultfd/20191014-102741
reproduce: make htmldocs

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   Warning: The Sphinx 'sphinx_rtd_theme' HTML theme was not found. Make sure you have the theme installed to produce pretty HTML output. Falling back to the default theme.
   WARNING: dot(1) not found, for better output quality install graphviz from http://www.graphviz.org
   WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
   Error: Cannot open file drivers/dma-buf/reservation.c
   Error: Cannot open file drivers/dma-buf/reservation.c
   Error: Cannot open file drivers/dma-buf/reservation.c
   Error: Cannot open file include/linux/reservation.h
   Error: Cannot open file include/linux/reservation.h
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'quotactl' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'quota_on' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_free_mnt_opts' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_eat_lsm_opts' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_kern_mount' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_show_options' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'sb_add_mnt_opt' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'd_instantiate' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'getprocattr' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'setprocattr' not described in 'security_list_options'
   include/linux/lsm_hooks.h:1822: warning: Function parameter or member 'locked_down' not described in 'security_list_options'
   include/linux/regulator/machine.h:196: warning: Function parameter or member 'max_uV_step' not described in 'regulation_constraints'
   include/linux/regulator/driver.h:223: warning: Function parameter or member 'resume' not described in 'regulator_ops'
   fs/fs-writeback.c:918: warning: Excess function parameter 'nr_pages' description in 'cgroup_writeback_by_id'
>> fs/anon_inodes.c:139: warning: Function parameter or member 'anon_inode_flags' not described in 'anon_inode_getfd2'
   fs/direct-io.c:258: warning: Excess function parameter 'offset' description in 'dio_complete'
   fs/libfs.c:501: warning: Excess function parameter 'available' description in 'simple_write_end'
   fs/posix_acl.c:647: warning: Function parameter or member 'inode' not described in 'posix_acl_update_mode'
   fs/posix_acl.c:647: warning: Function parameter or member 'mode_p' not described in 'posix_acl_update_mode'
   fs/posix_acl.c:647: warning: Function parameter or member 'acl' not described in 'posix_acl_update_mode'
   include/linux/spi/spi.h:190: warning: Function parameter or member 'driver_override' not described in 'spi_device'
   drivers/usb/typec/bus.c:1: warning: 'typec_altmode_unregister_driver' not found
   drivers/usb/typec/bus.c:1: warning: 'typec_altmode_register_driver' not found
   drivers/usb/typec/class.c:1: warning: 'typec_altmode_register_notifier' not found
   drivers/usb/typec/class.c:1: warning: 'typec_altmode_unregister_notifier' not found
   include/linux/w1.h:277: warning: Function parameter or member 'of_match_table' not described in 'w1_family'
   drivers/gpio/gpiolib-of.c:92: warning: Excess function parameter 'dev' description in 'of_gpio_need_valid_mask'
   include/linux/i2c.h:337: warning: Function parameter or member 'init_irq' not described in 'i2c_client'
   kernel/dma/coherent.c:1: warning: no structured comments found
   include/linux/input/sparse-keymap.h:43: warning: Function parameter or member 'sw' not described in 'key_entry'
   include/linux/skbuff.h:888: warning: Function parameter or member 'dev_scratch' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'list' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'ip_defrag_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'skb_mstamp_ns' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member '__cloned_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'head_frag' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member '__pkt_type_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'encapsulation' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'encap_hdr_csum' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'csum_valid' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member '__pkt_vlan_present_offset' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'vlan_present' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'csum_complete_sw' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'csum_level' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'inner_protocol_type' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'remcsum_offload' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'sender_cpu' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'reserved_tailroom' not described in 'sk_buff'
   include/linux/skbuff.h:888: warning: Function parameter or member 'inner_ipproto' not described in 'sk_buff'
   include/net/sock.h:233: warning: Function parameter or member 'skc_addrpair' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_portpair' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_ipv6only' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_net_refcnt' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_v6_daddr' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_v6_rcv_saddr' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_cookie' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_listener' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_tw_dr' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_rcv_wnd' not described in 'sock_common'
   include/net/sock.h:233: warning: Function parameter or member 'skc_tw_rcv_nxt' not described in 'sock_common'
   include/net/sock.h:515: warning: Function parameter or member 'sk_rx_skb_cache' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_wq_raw' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'tcp_rtx_queue' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_tx_skb_cache' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_route_forced_caps' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_txtime_report_errors' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_validate_xmit_skb' not described in 'sock'
   include/net/sock.h:515: warning: Function parameter or member 'sk_bpf_storage' not described in 'sock'
   include/net/sock.h:2439: warning: Function parameter or member 'tcp_rx_skb_cache_key' not described in 'DECLARE_STATIC_KEY_FALSE'
   include/net/sock.h:2439: warning: Excess function parameter 'sk' description in 'DECLARE_STATIC_KEY_FALSE'
   include/net/sock.h:2439: warning: Excess function parameter 'skb' description in 'DECLARE_STATIC_KEY_FALSE'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'gso_partial_features' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'l3mdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xfrmdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'tlsdev_ops' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'name_assign_type' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'ieee802154_ptr' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'mpls_ptr' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xdp_prog' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'gro_flush_timeout' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'nf_hooks_ingress' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member '____cacheline_aligned_in_smp' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'qdisc_hash' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xps_cpus_map' not described in 'net_device'
   include/linux/netdevice.h:2053: warning: Function parameter or member 'xps_rxqs_map' not described in 'net_device'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(advertising' not described in 'phylink_link_state'
   include/linux/phylink.h:56: warning: Function parameter or member '__ETHTOOL_DECLARE_LINK_MODE_MASK(lp_advertising' not described in 'phylink_link_state'
   drivers/net/phy/phylink.c:595: warning: Function parameter or member 'config' not described in 'phylink_create'
   drivers/net/phy/phylink.c:595: warning: Excess function parameter 'ndev' description in 'phylink_create'
   lib/genalloc.c:1: warning: 'gen_pool_add_virt' not found
   lib/genalloc.c:1: warning: 'gen_pool_alloc' not found
   lib/genalloc.c:1: warning: 'gen_pool_free' not found
   lib/genalloc.c:1: warning: 'gen_pool_alloc_algo' not found
   include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal'
   include/linux/rculist.h:374: warning: Excess function parameter 'cond' description in 'list_for_each_entry_rcu'
   include/linux/rculist.h:651: warning: Excess function parameter 'cond' description in 'hlist_for_each_entry_rcu'
   mm/util.c:1: warning: 'get_user_pages_fast' not found
   mm/slab.c:4215: warning: Function parameter or member 'objp' not described in '__ksize'
   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c:335: warning: Excess function parameter 'dev' description in 'amdgpu_gem_prime_export'
   drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c:336: warning: Excess function parameter 'dev' description in 'amdgpu_gem_prime_export'
   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c:142: warning: Function parameter or member 'blockable' not described in 'amdgpu_mn_read_lock'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:347: warning: cannot understand function prototype: 'struct amdgpu_vm_pt_cursor '
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:348: warning: cannot understand function prototype: 'struct amdgpu_vm_pt_cursor '
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:494: warning: Function parameter or member 'start' not described in 'amdgpu_vm_pt_first_dfs'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'adev' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'vm' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'start' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'cursor' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:546: warning: Function parameter or member 'entry' not described in 'for_each_amdgpu_vm_pt_dfs_safe'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:821: warning: Function parameter or member 'level' not described in 'amdgpu_vm_bo_param'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'params' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'bo' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'level' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'pe' not described in 'amdgpu_vm_update_flags'
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1283: warning: Function parameter or member 'addr' not described in 'amdgpu_vm_update_flags'

vim +139 fs/anon_inodes.c

562787a5c32ccd Davide Libenzi    2009-09-22  119  
562787a5c32ccd Davide Libenzi    2009-09-22  120  /**
428e297f7ee416 Daniel Colascione 2019-10-12  121   * anon_inode_getfd2 - creates a new file instance by hooking it up to an
562787a5c32ccd Davide Libenzi    2009-09-22  122   *                     anonymous inode, and a dentry that describe the "class"
562787a5c32ccd Davide Libenzi    2009-09-22  123   *                     of the file
562787a5c32ccd Davide Libenzi    2009-09-22  124   *
562787a5c32ccd Davide Libenzi    2009-09-22  125   * @name:    [in]    name of the "class" of the new file
562787a5c32ccd Davide Libenzi    2009-09-22  126   * @fops:    [in]    file operations for the new file
562787a5c32ccd Davide Libenzi    2009-09-22  127   * @priv:    [in]    private data for the new file (will be file's private_data)
562787a5c32ccd Davide Libenzi    2009-09-22  128   * @flags:   [in]    flags
562787a5c32ccd Davide Libenzi    2009-09-22  129   *
562787a5c32ccd Davide Libenzi    2009-09-22  130   * Creates a new file by hooking it on a single inode. This is useful for files
562787a5c32ccd Davide Libenzi    2009-09-22  131   * that do not need to have a full-fledged inode in order to operate correctly.
562787a5c32ccd Davide Libenzi    2009-09-22  132   * All the files created with anon_inode_getfd() will share a single inode,
562787a5c32ccd Davide Libenzi    2009-09-22  133   * hence saving memory and avoiding code duplication for the file/inode/dentry
562787a5c32ccd Davide Libenzi    2009-09-22  134   * setup.  Returns new descriptor or an error code.
562787a5c32ccd Davide Libenzi    2009-09-22  135   */
428e297f7ee416 Daniel Colascione 2019-10-12  136  int anon_inode_getfd2(const char *name, const struct file_operations *fops,
428e297f7ee416 Daniel Colascione 2019-10-12  137  		      void *priv, int flags, int anon_inode_flags)
562787a5c32ccd Davide Libenzi    2009-09-22  138  {
562787a5c32ccd Davide Libenzi    2009-09-22 @139  	int error, fd;
562787a5c32ccd Davide Libenzi    2009-09-22  140  	struct file *file;
562787a5c32ccd Davide Libenzi    2009-09-22  141  
562787a5c32ccd Davide Libenzi    2009-09-22  142  	error = get_unused_fd_flags(flags);
562787a5c32ccd Davide Libenzi    2009-09-22  143  	if (error < 0)
562787a5c32ccd Davide Libenzi    2009-09-22  144  		return error;
562787a5c32ccd Davide Libenzi    2009-09-22  145  	fd = error;
562787a5c32ccd Davide Libenzi    2009-09-22  146  
428e297f7ee416 Daniel Colascione 2019-10-12  147  	file = anon_inode_getfile2(name, fops, priv, flags, anon_inode_flags);
562787a5c32ccd Davide Libenzi    2009-09-22  148  	if (IS_ERR(file)) {
562787a5c32ccd Davide Libenzi    2009-09-22  149  		error = PTR_ERR(file);
562787a5c32ccd Davide Libenzi    2009-09-22  150  		goto err_put_unused_fd;
562787a5c32ccd Davide Libenzi    2009-09-22  151  	}
5dc8bf8132d59c Davide Libenzi    2007-05-10  152  	fd_install(fd, file);
5dc8bf8132d59c Davide Libenzi    2007-05-10  153  
2030a42cecd4dd Al Viro           2008-02-23  154  	return fd;
5dc8bf8132d59c Davide Libenzi    2007-05-10  155  
5dc8bf8132d59c Davide Libenzi    2007-05-10  156  err_put_unused_fd:
5dc8bf8132d59c Davide Libenzi    2007-05-10  157  	put_unused_fd(fd);
5dc8bf8132d59c Davide Libenzi    2007-05-10  158  	return error;
5dc8bf8132d59c Davide Libenzi    2007-05-10  159  }
d6d281684913da Avi Kivity        2007-06-28  160  EXPORT_SYMBOL_GPL(anon_inode_getfd);
428e297f7ee416 Daniel Colascione 2019-10-12  161  EXPORT_SYMBOL_GPL(anon_inode_getfd2);
5dc8bf8132d59c Davide Libenzi    2007-05-10  162  

:::::: The code at line 139 was first introduced by commit
:::::: 562787a5c32ccdf182de27793a83a9f2ee86cd77 anonfd: split interface into file creation and install

:::::: TO: Davide Libenzi <davidel@xmailserver.org>
:::::: CC: Linus Torvalds <torvalds@linux-foundation.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7279 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
  2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
  2019-10-14  4:26   ` kbuild test robot
@ 2019-10-14 15:38   ` Jann Horn
  2019-10-14 18:15     ` Daniel Colascione
  2019-10-15  8:08   ` Christoph Hellwig
  2 siblings, 1 reply; 44+ messages in thread
From: Jann Horn @ 2019-10-14 15:38 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Linux API, kernel list, Lokesh Gidra, Nick Kralevich, nosh, Tim Murray

On Sat, Oct 12, 2019 at 9:16 PM Daniel Colascione <dancol@google.com> wrote:
> Add functions forwarding from the old names to the new ones so we
> don't need to change any callers.

This patch does more than the commit message says; it also refactors
the body of the function. (I would've moved that refactoring over into
patch 2, but I guess this works, too.)

[...]
> -struct file *anon_inode_getfile(const char *name,
> -                               const struct file_operations *fops,
> -                               void *priv, int flags)
> +struct file *anon_inode_getfile2(const char *name,
> +                                const struct file_operations *fops,
> +                                void *priv, int flags, int anon_inode_flags)

(AFAIK, normal kernel style is to slap a "__" prefix in front of the
function name instead of appending a digit, but I guess it doesn't
really matter.)

>  {
> +       struct inode *inode;
>         struct file *file;
>
> -       if (IS_ERR(anon_inode_inode))
> -               return ERR_PTR(-ENODEV);
> -
> -       if (fops->owner && !try_module_get(fops->owner))
> -               return ERR_PTR(-ENOENT);
> +       if (anon_inode_flags)
> +               return ERR_PTR(-EINVAL);
>
> +       inode = anon_inode_inode;
> +       if (IS_ERR(inode))
> +               return ERR_PTR(-ENODEV);
>         /*
> -        * We know the anon_inode inode count is always greater than zero,
> -        * so ihold() is safe.
> +        * We know the anon_inode inode count is always
> +        * greater than zero, so ihold() is safe.
>          */

This looks like maybe you started editing the comment, then un-did the
change, but left the modified line wrapping in your patch? Please
avoid that - code changes with no real reason make "git blame" output
more annoying and create trouble when porting patches between kernel
versions.

[...]
>  EXPORT_SYMBOL_GPL(anon_inode_getfile);
> +EXPORT_SYMBOL_GPL(anon_inode_getfile2);
[...]
>  EXPORT_SYMBOL_GPL(anon_inode_getfd);
> +EXPORT_SYMBOL_GPL(anon_inode_getfd2);

Since anon_inode_getfd() is now a static inline function in
include/linux/anon_inodes.h, exporting it doesn't make sense anymore.
Same for anon_inode_getfile().

[...]
> +#define ANON_INODE_SECURE 1

That #define belongs in a later patch, right?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-13  1:14       ` Andy Lutomirski
  2019-10-13  1:38         ` Daniel Colascione
@ 2019-10-14 16:04         ` Jann Horn
  2019-10-23 19:09           ` Andrea Arcangeli
  2019-10-22 21:27         ` Daniel Colascione
  2019-10-23  4:11         ` Andy Lutomirski
  3 siblings, 1 reply; 44+ messages in thread
From: Jann Horn @ 2019-10-14 16:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Colascione, Linus Torvalds, Andrea Arcangeli,
	Pavel Emelyanov, Linux API, LKML, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray

On Sun, Oct 13, 2019 at 3:14 AM Andy Lutomirski <luto@kernel.org> wrote:
> [adding more people because this is going to be an ABI break, sigh]
> On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
> > On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > > file object instead of the default one, letting security modules
> > > > supervise userfaultfd use.
> > > >
> > > > Requiring that users pass a new flag lets us avoid changing the
> > > > semantics for existing callers.
> > >
> > > Is there any good reason not to make this be the default?
> > >
> > >
> > > The only downside I can see is that it would increase the memory usage
> > > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > > lighter-weight alternative would be to have a single inode shared by
> > > all userfaultfd instances, which would require a somewhat different
> > > internal anon_inode API.
> >
> > I'd also prefer to just make SELinux use mandatory, but there's a
> > nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> > which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> > better way to deal with it.
[...]
> Now that you've pointed this mechanism out, it is utterly and
> completely broken and should be removed from the kernel outright or at
> least severely restricted.  A .read implementation MUST NOT ACT ON THE
> CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.
>
> So I think the right solution might be to attempt to *remove*
> UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
> creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> use UFFD_FEATURE_EVENT_FORK) and print a warning (once) when
> UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
> deprecation period, just remove it.  If it's genuinely useful, it
> needs an entirely new API based on ioctl() or a syscall.  Or even
> recvmsg() :)

FWIW, <https://codesearch.debian.net/search?q=UFFD_FEATURE_EVENT_FORK&literal=1>
just shows the kernel, kernel selftests, and strace code for decoding
syscall arguments. CRIU uses it though (probably for postcopy live
migration / lazy migration?), I guess that code isn't in debian for
some reason.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
  2019-10-14 15:38   ` Jann Horn
@ 2019-10-14 18:15     ` Daniel Colascione
  2019-10-14 18:30       ` Jann Horn
  0 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-14 18:15 UTC (permalink / raw)
  To: Jann Horn
  Cc: Linux API, kernel list, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray

Thanks for taking a look

On Mon, Oct 14, 2019 at 8:39 AM Jann Horn <jannh@google.com> wrote:
>
> On Sat, Oct 12, 2019 at 9:16 PM Daniel Colascione <dancol@google.com> wrote:
> > Add functions forwarding from the old names to the new ones so we
> > don't need to change any callers.
>
> This patch does more than the commit message says; it also refactors
> the body of the function. (I would've moved that refactoring over into
> patch 2, but I guess this works, too.)
>
> [...]
> > -struct file *anon_inode_getfile(const char *name,
> > -                               const struct file_operations *fops,
> > -                               void *priv, int flags)
> > +struct file *anon_inode_getfile2(const char *name,
> > +                                const struct file_operations *fops,
> > +                                void *priv, int flags, int anon_inode_flags)
>
> (AFAIK, normal kernel style is to slap a "__" prefix in front of the
> function name instead of appending a digit, but I guess it doesn't
> really matter.)

I thought prefixing "_" was for signaling "this is an implementation
detail and you probably don't want to call it unless you know what
you're doing", not "here's a new version that does a new thing".

> >  {
> > +       struct inode *inode;
> >         struct file *file;
> >
> > -       if (IS_ERR(anon_inode_inode))
> > -               return ERR_PTR(-ENODEV);
> > -
> > -       if (fops->owner && !try_module_get(fops->owner))
> > -               return ERR_PTR(-ENOENT);
> > +       if (anon_inode_flags)
> > +               return ERR_PTR(-EINVAL);
> >
> > +       inode = anon_inode_inode;
> > +       if (IS_ERR(inode))
> > +               return ERR_PTR(-ENODEV);
> >         /*
> > -        * We know the anon_inode inode count is always greater than zero,
> > -        * so ihold() is safe.
> > +        * We know the anon_inode inode count is always
> > +        * greater than zero, so ihold() is safe.
> >          */
>
> This looks like maybe you started editing the comment, then un-did the
> change, but left the modified line wrapping in your patch? Please
> avoid that - code changes with no real reason make "git blame" output
> more annoying and create trouble when porting patches between kernel
> versions.

I'll fix it.

>
> [...]
> >  EXPORT_SYMBOL_GPL(anon_inode_getfile);
> > +EXPORT_SYMBOL_GPL(anon_inode_getfile2);
> [...]
> >  EXPORT_SYMBOL_GPL(anon_inode_getfd);
> > +EXPORT_SYMBOL_GPL(anon_inode_getfd2);
>
> Since anon_inode_getfd() is now a static inline function in
> include/linux/anon_inodes.h, exporting it doesn't make sense anymore.
> Same for anon_inode_getfile().

I didn't want to break modules unnecessarily. Declaring the function
inline and also exporting it gives us an efficiency win while avoiding
an ABI break, right?

> [...]
> > +#define ANON_INODE_SECURE 1
>
> That #define belongs in a later patch, right?

Yep.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
  2019-10-14 18:15     ` Daniel Colascione
@ 2019-10-14 18:30       ` Jann Horn
  0 siblings, 0 replies; 44+ messages in thread
From: Jann Horn @ 2019-10-14 18:30 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Linux API, kernel list, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray

On Mon, Oct 14, 2019 at 8:16 PM Daniel Colascione <dancol@google.com> wrote:
> On Mon, Oct 14, 2019 at 8:39 AM Jann Horn <jannh@google.com> wrote:
> >
> > On Sat, Oct 12, 2019 at 9:16 PM Daniel Colascione <dancol@google.com> wrote:
> > > Add functions forwarding from the old names to the new ones so we
> > > don't need to change any callers.
> >
> > This patch does more than the commit message says; it also refactors
> > the body of the function. (I would've moved that refactoring over into
> > patch 2, but I guess this works, too.)
> >
> > [...]
> > > -struct file *anon_inode_getfile(const char *name,
> > > -                               const struct file_operations *fops,
> > > -                               void *priv, int flags)
> > > +struct file *anon_inode_getfile2(const char *name,
> > > +                                const struct file_operations *fops,
> > > +                                void *priv, int flags, int anon_inode_flags)
> >
> > (AFAIK, normal kernel style is to slap a "__" prefix in front of the
> > function name instead of appending a digit, but I guess it doesn't
> > really matter.)
>
> I thought prefixing "_" was for signaling "this is an implementation
> detail and you probably don't want to call it unless you know what
> you're doing", not "here's a new version that does a new thing".

Ah, I guess that might be true.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes
  2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
  2019-10-14  4:26   ` kbuild test robot
  2019-10-14 15:38   ` Jann Horn
@ 2019-10-15  8:08   ` Christoph Hellwig
  2 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2019-10-15  8:08 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-api, linux-kernel, lokeshgidra, nnk, nosh, timmurray

On Sat, Oct 12, 2019 at 12:15:56PM -0700, Daniel Colascione wrote:
> Add functions forwarding from the old names to the new ones so we
> don't need to change any callers.

Independent of the usefulness of the interface (I'll let other comment,
but you defintively want to talk to Al Viro), adding a second interface
for only 17 callers total is a bad idea.  Just switch the existing
interface to pass flags.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 2/7] Add a concept of a "secure" anonymous file
  2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
  2019-10-14  3:01   ` kbuild test robot
@ 2019-10-15  8:08   ` Christoph Hellwig
  1 sibling, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2019-10-15  8:08 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-api, linux-kernel, lokeshgidra, nnk, nosh, timmurray

On Sat, Oct 12, 2019 at 12:15:57PM -0700, Daniel Colascione wrote:
> A secure anonymous file is one we hooked up to its own inode (as
> opposed to the shared inode we use for non-secure anonymous files). A
> new selinux hook gives security modules a chance to initialize, label,
> and veto the creation of these secure anonymous files. Security
> modules had limit ability to interact with non-secure anonymous files
> due to all of these files sharing a single inode.

Again please add Al.  Also explain what the problem would be to always
use a separate inode.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/7] Harden userfaultfd
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
                   ` (6 preceding siblings ...)
  2019-10-12 19:16 ` [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults Daniel Colascione
@ 2019-10-16  0:02 ` James Morris
  2019-11-15 15:09 ` Stephen Smalley
  8 siblings, 0 replies; 44+ messages in thread
From: James Morris @ 2019-10-16  0:02 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: linux-api, linux-kernel, lokeshgidra, nnk, nosh, timmurray

On Sat, 12 Oct 2019, Daniel Colascione wrote:

>  Documentation/admin-guide/sysctl/vm.rst | 19 +++++-
>  fs/anon_inodes.c                        | 89 +++++++++++++++++--------
>  fs/userfaultfd.c                        | 47 +++++++++++--
>  include/linux/anon_inodes.h             | 27 ++++++--
>  include/linux/lsm_hooks.h               |  8 +++
>  include/linux/security.h                |  2 +
>  include/linux/userfaultfd_k.h           |  3 +
>  include/uapi/linux/userfaultfd.h        | 14 ++++
>  kernel/sysctl.c                         |  9 +++
>  security/security.c                     |  8 +++
>  security/selinux/hooks.c                | 68 +++++++++++++++++++
>  security/selinux/include/classmap.h     |  2 +
>  12 files changed, 256 insertions(+), 40 deletions(-)

For any changes to security/ please include the linux-security-module 
list.

-- 
James Morris
<jmorris@namei.org>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-13  1:14       ` Andy Lutomirski
  2019-10-13  1:38         ` Daniel Colascione
  2019-10-14 16:04         ` Jann Horn
@ 2019-10-22 21:27         ` Daniel Colascione
  2019-10-23  4:11         ` Andy Lutomirski
  3 siblings, 0 replies; 44+ messages in thread
From: Daniel Colascione @ 2019-10-22 21:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Jann Horn, Andrea Arcangeli, Pavel Emelyanov,
	Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray

On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> [adding more people because this is going to be an ABI break, sigh]

Just pinging this thread. I'd like to rev my patch and I'm not sure
what we want to do about problem Andy identified. Are we removing
UFFD_EVENT_FORK?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-13  1:14       ` Andy Lutomirski
                           ` (2 preceding siblings ...)
  2019-10-22 21:27         ` Daniel Colascione
@ 2019-10-23  4:11         ` Andy Lutomirski
  2019-10-23  7:29           ` Cyrill Gorcunov
  3 siblings, 1 reply; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-23  4:11 UTC (permalink / raw)
  To: Andy Lutomirski, Pavel Emelyanov, Cyrill Gorcunov
  Cc: Daniel Colascione, Linus Torvalds, Jann Horn, Andrea Arcangeli,
	Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray

Trying again.  It looks like I used the wrong address for Pavel.

On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> [adding more people because this is going to be an ABI break, sigh]
>
> On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
> >
> > On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > > >
> > > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > > file object instead of the default one, letting security modules
> > > > supervise userfaultfd use.
> > > >
> > > > Requiring that users pass a new flag lets us avoid changing the
> > > > semantics for existing callers.
> > >
> > > Is there any good reason not to make this be the default?
> > >
> > >
> > > The only downside I can see is that it would increase the memory usage
> > > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > > lighter-weight alternative would be to have a single inode shared by
> > > all userfaultfd instances, which would require a somewhat different
> > > internal anon_inode API.
> >
> > I'd also prefer to just make SELinux use mandatory, but there's a
> > nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> > which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> > better way to deal with it.
>
> ...
>
> > But maybe we can go further: let's separate authentication and
> > authorization, as we do in other LSM hooks. Let's split my
> > inode_init_security_anon into two hooks, inode_init_security_anon and
> > inode_create_anon. We'd define the former to just initialize the file
> > object's security information --- in the SELinux case, figuring out
> > its class and SID --- and define the latter to answer the yes/no
> > question of whether a particular anonymous inode creation should be
> > allowed. Normally, anon_inode_getfile2() would just call both hooks.
> > We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> > or something, that would tell anon_inode_getfile2() to skip calling
> > the authorization hook, effectively making the creation always
> > succeed. We can then make the UFFD code pass
> > ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> > fork child while creating UFFD_EVENT_FORK messages.
>
> That sounds like an improvement.  Or maybe just teach SELinux that
> this particular fd creation is actually making an anon_inode that is a
> child of an existing anon inode and that the context should be copied
> or whatever SELinux wants to do.  Like this, maybe:
>
> static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
>                                   struct userfaultfd_ctx *new,
>                                   struct uffd_msg *msg)
> {
>         int fd;
>
> Change this:
>
>         fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
>                               O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
>
> to something like:
>
>       fd = anon_inode_make_child_fd(..., ctx->inode, ...);
>
> where ctx->inode is the one context's inode.
>
> *** HOWEVER *** !!!
>
> Now that you've pointed this mechanism out, it is utterly and
> completely broken and should be removed from the kernel outright or at
> least severely restricted.  A .read implementation MUST NOT ACT ON THE
> CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.
>
> So I think the right solution might be to attempt to *remove*
> UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
> creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> use UFFD_FEATURE_EVENT_FORK) and print a warning (once) when
> UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
> deprecation period, just remove it.  If it's genuinely useful, it
> needs an entirely new API based on ioctl() or a syscall.  Or even
> recvmsg() :)
>
> And UFFD_SECURE should just become automatic, since you don't have a
> problem any more. :-p
>
> --Andy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23  4:11         ` Andy Lutomirski
@ 2019-10-23  7:29           ` Cyrill Gorcunov
  2019-10-23 12:43             ` Mike Rapoport
  0 siblings, 1 reply; 44+ messages in thread
From: Cyrill Gorcunov @ 2019-10-23  7:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Pavel Emelyanov, Daniel Colascione, Linus Torvalds, Jann Horn,
	Andrea Arcangeli, Linux API, LKML, Lokesh Gidra, Nick Kralevich,
	Nosh Minwalla, Tim Murray, Mike Rapoport, Radostin Stoyanov,
	Andrey Vagin

On Tue, Oct 22, 2019 at 09:11:04PM -0700, Andy Lutomirski wrote:
> Trying again.  It looks like I used the wrong address for Pavel.

Thanks for CC Andy! I must confess I didn't dive into userfaultfd engine
personally but let me CC more people involved from criu side. (overquoting
left untouched for their sake).

> 
> On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > [adding more people because this is going to be an ABI break, sigh]
> >
> > On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
> > >
> > > On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > > > >
> > > > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > > > file object instead of the default one, letting security modules
> > > > > supervise userfaultfd use.
> > > > >
> > > > > Requiring that users pass a new flag lets us avoid changing the
> > > > > semantics for existing callers.
> > > >
> > > > Is there any good reason not to make this be the default?
> > > >
> > > >
> > > > The only downside I can see is that it would increase the memory usage
> > > > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > > > lighter-weight alternative would be to have a single inode shared by
> > > > all userfaultfd instances, which would require a somewhat different
> > > > internal anon_inode API.
> > >
> > > I'd also prefer to just make SELinux use mandatory, but there's a
> > > nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> > > which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> > > better way to deal with it.
> >
> > ...
> >
> > > But maybe we can go further: let's separate authentication and
> > > authorization, as we do in other LSM hooks. Let's split my
> > > inode_init_security_anon into two hooks, inode_init_security_anon and
> > > inode_create_anon. We'd define the former to just initialize the file
> > > object's security information --- in the SELinux case, figuring out
> > > its class and SID --- and define the latter to answer the yes/no
> > > question of whether a particular anonymous inode creation should be
> > > allowed. Normally, anon_inode_getfile2() would just call both hooks.
> > > We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> > > or something, that would tell anon_inode_getfile2() to skip calling
> > > the authorization hook, effectively making the creation always
> > > succeed. We can then make the UFFD code pass
> > > ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> > > fork child while creating UFFD_EVENT_FORK messages.
> >
> > That sounds like an improvement.  Or maybe just teach SELinux that
> > this particular fd creation is actually making an anon_inode that is a
> > child of an existing anon inode and that the context should be copied
> > or whatever SELinux wants to do.  Like this, maybe:
> >
> > static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
> >                                   struct userfaultfd_ctx *new,
> >                                   struct uffd_msg *msg)
> > {
> >         int fd;
> >
> > Change this:
> >
> >         fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
> >                               O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
> >
> > to something like:
> >
> >       fd = anon_inode_make_child_fd(..., ctx->inode, ...);
> >
> > where ctx->inode is the one context's inode.
> >
> > *** HOWEVER *** !!!
> >
> > Now that you've pointed this mechanism out, it is utterly and
> > completely broken and should be removed from the kernel outright or at
> > least severely restricted.  A .read implementation MUST NOT ACT ON THE
> > CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> > as stdin to a setuid program.
> >
> > So I think the right solution might be to attempt to *remove*
> > UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
> > creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> > use UFFD_FEATURE_EVENT_FORK) and print a warning (once) when
> > UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
> > deprecation period, just remove it.  If it's genuinely useful, it
> > needs an entirely new API based on ioctl() or a syscall.  Or even
> > recvmsg() :)
> >
> > And UFFD_SECURE should just become automatic, since you don't have a
> > problem any more. :-p
> >
> > --Andy
> 

	Cyrill

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23  7:29           ` Cyrill Gorcunov
@ 2019-10-23 12:43             ` Mike Rapoport
  2019-10-23 17:13               ` Andy Lutomirski
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Rapoport @ 2019-10-23 12:43 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andy Lutomirski, Pavel Emelyanov, Daniel Colascione,
	Linus Torvalds, Jann Horn, Andrea Arcangeli, Linux API, LKML,
	Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray,
	Mike Rapoport, Radostin Stoyanov, Andrey Vagin

On Wed, Oct 23, 2019 at 10:29:20AM +0300, Cyrill Gorcunov wrote:
> On Tue, Oct 22, 2019 at 09:11:04PM -0700, Andy Lutomirski wrote:
> > Trying again.  It looks like I used the wrong address for Pavel.
> 
> Thanks for CC Andy! I must confess I didn't dive into userfaultfd engine
> personally but let me CC more people involved from criu side. (overquoting
> left untouched for their sake).

Thanks for CC Cyrill!

 
> > On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > [adding more people because this is going to be an ABI break, sigh]
> > >
> > > On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
> > > >
> > > > On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > > >
> > > > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > > > > >
> > > > > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > > > > file object instead of the default one, letting security modules
> > > > > > supervise userfaultfd use.
> > > > > >
> > > > > > Requiring that users pass a new flag lets us avoid changing the
> > > > > > semantics for existing callers.
> > > > >
> > > > > Is there any good reason not to make this be the default?
> > > > >
> > > > >
> > > > > The only downside I can see is that it would increase the memory usage
> > > > > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > > > > lighter-weight alternative would be to have a single inode shared by
> > > > > all userfaultfd instances, which would require a somewhat different
> > > > > internal anon_inode API.
> > > >
> > > > I'd also prefer to just make SELinux use mandatory, but there's a
> > > > nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> > > > which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> > > > better way to deal with it.
> > > >
> > > > Right now, when a process with a UFFD-managed VMA using
> > > > UFFD_EVENT_FORK forks, we make a new userfaultfd_ctx out of thin air
> > > > and enqueue it on the message queue for the parent process. When we
> > > > dequeue that context, we get to resolve_userfault_fork, which makes up
> > > > a new UFFD file object out of thin air in the context of the reading
> > > > process. Following normal SELinux rules, the SID attached to that new
> > > > file object would be the task SID of the process *reading* the fork
> > > > event, not the SID of the new fork child. That seems wrong, because
> > > > the label we give to the UFFD should correspond to the label of the
> > > > process that UFFD controls.

I must admit I have no idea about how SELinux works, but what's wrong with
making the new UFFD object to inherit the properties of the "original" one?

The new file object is created in the context of the same task that owns
the initial userfault file descriptor and it is used by the same task. So
if you have a process that registers some of its VMAs with userfaultfd
and enables UFFD_EVENT_FORK, the same process controls UFFD of itself and
its children.

> > >
> > > ...
> > >
> > > > But maybe we can go further: let's separate authentication and
> > > > authorization, as we do in other LSM hooks. Let's split my
> > > > inode_init_security_anon into two hooks, inode_init_security_anon and
> > > > inode_create_anon. We'd define the former to just initialize the file
> > > > object's security information --- in the SELinux case, figuring out
> > > > its class and SID --- and define the latter to answer the yes/no
> > > > question of whether a particular anonymous inode creation should be
> > > > allowed. Normally, anon_inode_getfile2() would just call both hooks.
> > > > We'd add another anon_inode_getfd flag, ANON_INODE_SKIP_AUTHORIZATION
> > > > or something, that would tell anon_inode_getfile2() to skip calling
> > > > the authorization hook, effectively making the creation always
> > > > succeed. We can then make the UFFD code pass
> > > > ANON_INODE_SKIP_AUTHORIZATION when it's creating a file object in the
> > > > fork child while creating UFFD_EVENT_FORK messages.
> > >
> > > That sounds like an improvement.  Or maybe just teach SELinux that
> > > this particular fd creation is actually making an anon_inode that is a
> > > child of an existing anon inode and that the context should be copied
> > > or whatever SELinux wants to do.  Like this, maybe:
> > >
> > > static int resolve_userfault_fork(struct userfaultfd_ctx *ctx,
> > >                                   struct userfaultfd_ctx *new,
> > >                                   struct uffd_msg *msg)
> > > {
> > >         int fd;
> > >
> > > Change this:
> > >
> > >         fd = anon_inode_getfd("[userfaultfd]", &userfaultfd_fops, new,
> > >                               O_RDWR | (new->flags & UFFD_SHARED_FCNTL_FLAGS));
> > >
> > > to something like:
> > >
> > >       fd = anon_inode_make_child_fd(..., ctx->inode, ...);
> > >
> > > where ctx->inode is the one context's inode.
> > >
> > > *** HOWEVER *** !!!
> > >
> > > Now that you've pointed this mechanism out, it is utterly and
> > > completely broken and should be removed from the kernel outright or at
> > > least severely restricted.  A .read implementation MUST NOT ACT ON THE
> > > CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> > > as stdin to a setuid program.
> > >
> > > So I think the right solution might be to attempt to *remove*
> > > UFFD_EVENT_FORK.  Maybe the solution is to say that, unless the
> > > creator of a userfaultfd() has global CAP_SYS_ADMIN, then it cannot
> > > use UFFD_FEATURE_EVENT_FORK) and print a warning (once) when
> > > UFFD_FEATURE_EVENT_FORK is allowed.  And, after some suitable
> > > deprecation period, just remove it.  If it's genuinely useful, it
> > > needs an entirely new API based on ioctl() or a syscall.  Or even
> > > recvmsg() :)
> > >
> > > And UFFD_SECURE should just become automatic, since you don't have a
> > > problem any more. :-p
> > >
> > > --Andy
> > 
> 
> 	Cyrill

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 12:43             ` Mike Rapoport
@ 2019-10-23 17:13               ` Andy Lutomirski
  0 siblings, 0 replies; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-23 17:13 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Cyrill Gorcunov, Andy Lutomirski, Pavel Emelyanov,
	Daniel Colascione, Linus Torvalds, Jann Horn, Andrea Arcangeli,
	Linux API, LKML, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Radostin Stoyanov, Andrey Vagin

On Wed, Oct 23, 2019 at 5:44 AM Mike Rapoport <rppt@linux.ibm.com> wrote:
>
> On Wed, Oct 23, 2019 at 10:29:20AM +0300, Cyrill Gorcunov wrote:
> > On Tue, Oct 22, 2019 at 09:11:04PM -0700, Andy Lutomirski wrote:
> > > Trying again.  It looks like I used the wrong address for Pavel.
> >
> > Thanks for CC Andy! I must confess I didn't dive into userfaultfd engine
> > personally but let me CC more people involved from criu side. (overquoting
> > left untouched for their sake).
>
> Thanks for CC Cyrill!
>
>
> > > On Sat, Oct 12, 2019 at 6:14 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > >
> > > > [adding more people because this is going to be an ABI break, sigh]
> > > >
> > > > On Sat, Oct 12, 2019 at 5:52 PM Daniel Colascione <dancol@google.com> wrote:
> > > > >
> > > > > On Sat, Oct 12, 2019 at 4:10 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > > > >
> > > > > > On Sat, Oct 12, 2019 at 12:16 PM Daniel Colascione <dancol@google.com> wrote:
> > > > > > >
> > > > > > > The new secure flag makes userfaultfd use a new "secure" anonymous
> > > > > > > file object instead of the default one, letting security modules
> > > > > > > supervise userfaultfd use.
> > > > > > >
> > > > > > > Requiring that users pass a new flag lets us avoid changing the
> > > > > > > semantics for existing callers.
> > > > > >
> > > > > > Is there any good reason not to make this be the default?
> > > > > >
> > > > > >
> > > > > > The only downside I can see is that it would increase the memory usage
> > > > > > of userfaultfd(), but that doesn't seem like such a big deal.  A
> > > > > > lighter-weight alternative would be to have a single inode shared by
> > > > > > all userfaultfd instances, which would require a somewhat different
> > > > > > internal anon_inode API.
> > > > >
> > > > > I'd also prefer to just make SELinux use mandatory, but there's a
> > > > > nasty interaction with UFFD_EVENT_FORK. Adding a new UFFD_SECURE mode
> > > > > which blocks UFFD_EVENT_FORK sidesteps this problem. Maybe you know a
> > > > > better way to deal with it.
> > > > >
> > > > > Right now, when a process with a UFFD-managed VMA using
> > > > > UFFD_EVENT_FORK forks, we make a new userfaultfd_ctx out of thin air
> > > > > and enqueue it on the message queue for the parent process. When we
> > > > > dequeue that context, we get to resolve_userfault_fork, which makes up
> > > > > a new UFFD file object out of thin air in the context of the reading
> > > > > process. Following normal SELinux rules, the SID attached to that new
> > > > > file object would be the task SID of the process *reading* the fork
> > > > > event, not the SID of the new fork child. That seems wrong, because
> > > > > the label we give to the UFFD should correspond to the label of the
> > > > > process that UFFD controls.
>
> I must admit I have no idea about how SELinux works, but what's wrong with
> making the new UFFD object to inherit the properties of the "original" one?
>
> The new file object is created in the context of the same task that owns
> the initial userfault file descriptor and it is used by the same task. So
> if you have a process that registers some of its VMAs with userfaultfd
> and enables UFFD_EVENT_FORK, the same process controls UFFD of itself and
> its children.

I'm not actually convinced this is a problem.

What *is* a problem is touching the file descriptor table at all from
read(2).  That's a big no-no.

--Andy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-14 16:04         ` Jann Horn
@ 2019-10-23 19:09           ` Andrea Arcangeli
  2019-10-23 19:21             ` Andy Lutomirski
                               ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2019-10-23 19:09 UTC (permalink / raw)
  To: Andy Lutomirski, Jann Horn
  Cc: Daniel Colascione, Linus Torvalds, Pavel Emelyanov, Lokesh Gidra,
	Nick Kralevich, Nosh Minwalla, Tim Murray, Mike Rapoport,
	Linux API, LKML

Hello,

On Sat, Oct 12, 2019 at 06:14:23PM -0700, Andy Lutomirski wrote:
> [adding more people because this is going to be an ABI break, sigh]

That wouldn't break the ABI, no more than when if you boot a kernel
built with CONFIG_USERFAULTFD=n.

All non-cooperative features can be removed any time in a backwards
compatible way, the only precaution is to mark their feature bits as
reserved so they can't be reused for something else later.

> least severely restricted.  A .read implementation MUST NOT ACT ON THE
> CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> as stdin to a setuid program.

With UFFD_EVENT_FORK, the newly created uffd that controls the child,
is not passed to the parent nor to the child. Instead it's passed to
the CRIU monitor only, which has to be already running as root and is
fully trusted and acts a hypervisor (despite there is no hypervisor).

By the time execve runs and any suid bit in the execve'd inode becomes
relevant, well before the new userland executable code can run, the
kernel throws away the "old_mm" controlled by any uffd and all
attached uffds are released as well.

All I found is your "A .read implementation MUST NOT ACT ON THE
CALLING TASK" as an explanation that something is broken but I need
further clarification.

Of course I can see you can always open a uffd and pass it to any task
you are going to execve on, but that simply means the suid program
will be able to control you, not the other way around. If you don't
want to be controlled by the next task, no matter if suid or not, just
don't that. What I don't see is how you're going to control the suid
binary from the outside, the suid binary at most will block in the
poll, read and write syscalls and get garbage or write some garbage
and get an error, it won't get signals and it cannot block in any page
fault either, it's not immediately clear what's out of ordinary.

On Mon, Oct 14, 2019 at 06:04:22PM +0200, Jann Horn wrote:
> FWIW, <https://codesearch.debian.net/search?q=UFFD_FEATURE_EVENT_FORK&literal=1>
> just shows the kernel, kernel selftests, and strace code for decoding
> syscall arguments. CRIU uses it though (probably for postcopy live
> migration / lazy migration?), I guess that code isn't in debian for
> some reason.

https://criu.org/Userfaultfd#Limitations

The CRIU developers did a truly amazing job by making container post
copy live migration work great for a subset of apps, that alone was an
amazing achievement. Is that achievement enough to use post copy live
migration of bare metal containers in production? Unfortunately
probably not and not just in debian.

If you're wrong and UFFDIO_EVENT_FORK isn't currently buggy and in
turn it isn't causing further maintenance burden, there is no hurry of
removing them, but in the long term, if none of the non-cooperative
features find its way in production (like it was reasonable to expect
initially), they must be removed from the kernel anyway, not just
UFFD_EVEN_FORK but all non-cooperative features associated with it.

In my view the kernel is complex enough that we can't keep in the
kernel anything that isn't actively used in production.

If you're right and UFFDIO_EVENT_FORK is flawed beyond repair well
then we should remove all non cooperative features right now.

Or is someone out there using CRIU --lazy-pages in production?

Virtual machine machine postcopy live migration is shipped in
production for years and it's fully reliable and by design it cannot
suffer from any of the above limitations.

In my view there's simply no justification not to use virtual machines
when the alternative requires so much more code to be written and so
much more complexity to be dealt with.

However the higher complexity happened in lots areas of the kernel
already where things got extremely complex just to avoid using virtual
machines, despite the end result is less secure, for the only sake of
slightly higher consolidation (especially if you ignore the existence
millisecond guest boot time, direct mapped pmem nvdimm, virtfs and
free page hinting).

The non-cooperative features of userfaultfd in principle aren't
fundamentally different from other cgroup vs KVM tradeoffs, so 1) it
wasn't apparent they wouldn't be used in production eventually and 2)
it didn't sound fair enough not to give a chance to bare metal
containers to achieve feature parity with VM (again with a much higher
code complexity, but that was to be expected and it has apparently
been dealt with in other areas with more or less satisfactory
results).

While at it, as far as userfaultfd is concerned I'd rather see people
spend time to write a malloc library that uses userfaultfd with the
UFFD_FEATURE_SIGBUS features and it replaces mmap with UFFDIO_ZEROPAGE
(plus adding the THP accelleration currently missing) and munmap with
MADV_DONTNEED to do allocations and freeing of memory with full
scalability without ever hitting on the map sem for writing. This is
already possible without any further kernel change (the THP
accelleration to UFFDIO_ZEROPAGE will only make it go faster but it
could be done later after the lib already works because it'd be
invisible to userland).

On my side, instead of trying to fix whatever issue in
UFFD_EVENT_FORK, I'd prefer to spend my time reviewing the uffd-wp
feature from Peter and the page fault enhancement patchset that Peter
and Linus were discussing. uffd-wp has the potential to drop fork()
from all apps calling fork() only to do an atomic snapshot of their
memory. Replacing fork() also means the uffd manager thread can decide
how much memory to reserve to the snapshot and it can start throttling
waiting for I/O completion if the threshold is exceeded, while fork
COWs cannot throttle and all apps using fork() risk to hit on x2
memory usage which can become oom-killer material if the memory size
of the process is huge. The side benefit is also that the way
userfaultfd works the fault granularity is entirely in control of
userland (because it's always userland that resolves the fault), it
could decide to use 8k or 16k even if that doesn't match the hardware
page size. That will allow to keep THP on without risking to hit on 2M
cows during the snapshot. Being able to keep THP enabled in nosql db
without hitting on slow 2M COW copies during snapshot, should allow a
further overall performance improvement when the snapshot is not
running than what it is possible today. In a completely different use
case, uffd-wp will also avoid JITs to set a dirty bit every time they
modify any data in memory. It should also be possible to provide the
same soft-dirty information in O(1) instead of O(N).

Thanks,
Andrea


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 19:09           ` Andrea Arcangeli
@ 2019-10-23 19:21             ` Andy Lutomirski
  2019-10-23 21:16               ` Andrea Arcangeli
  2019-10-23 20:05             ` Daniel Colascione
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-23 19:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML

On Wed, Oct 23, 2019 at 12:10 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> Hello,
>
> On Sat, Oct 12, 2019 at 06:14:23PM -0700, Andy Lutomirski wrote:
> > [adding more people because this is going to be an ABI break, sigh]
>
> That wouldn't break the ABI, no more than when if you boot a kernel
> built with CONFIG_USERFAULTFD=n.
>
> All non-cooperative features can be removed any time in a backwards
> compatible way, the only precaution is to mark their feature bits as
> reserved so they can't be reused for something else later.
>
> > least severely restricted.  A .read implementation MUST NOT ACT ON THE
> > CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> > as stdin to a setuid program.
>
> With UFFD_EVENT_FORK, the newly created uffd that controls the child,
> is not passed to the parent nor to the child. Instead it's passed to
> the CRIU monitor only, which has to be already running as root and is
> fully trusted and acts a hypervisor (despite there is no hypervisor).
>
> By the time execve runs and any suid bit in the execve'd inode becomes
> relevant, well before the new userland executable code can run, the
> kernel throws away the "old_mm" controlled by any uffd and all
> attached uffds are released as well.
>
> All I found is your "A .read implementation MUST NOT ACT ON THE
> CALLING TASK" as an explanation that something is broken but I need
> further clarification.

There are two things going on here.

1. Daniel wants to add LSM labels to userfaultfd objects.  This seems
reasonable to me.  The question, as I understand it, is: who is the
subject that creates a uffd referring to a forked child?  I'm sure
this is solvable in any number of straightforward ways, but I think
it's less important than:

2. The existing ABI is busted independently of #1.  Suppose you call
userfaultfd to get a userfaultfd and enable UFFD_FEATURE_EVENT_FORK.
Then you do:

$ sudo <&[userfaultfd number]

Sudo will read it and get a new fd unexpectedly added to its fd table.
It's worse if SCM_RIGHTS is involved.

So I think we either need to declare that UFFD_FEATURE_EVENT_FORK is
only usable by global root or we need to remove it and maybe re-add it
in some other form.


--Andy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 19:09           ` Andrea Arcangeli
  2019-10-23 19:21             ` Andy Lutomirski
@ 2019-10-23 20:05             ` Daniel Colascione
  2019-10-24  0:23               ` Andrea Arcangeli
  2019-10-23 20:15             ` Linus Torvalds
  2019-10-24  9:02             ` Mike Rapoport
  3 siblings, 1 reply; 44+ messages in thread
From: Daniel Colascione @ 2019-10-23 20:05 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Linus Torvalds, Pavel Emelyanov,
	Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray,
	Mike Rapoport, Linux API, LKML

On Wed, Oct 23, 2019 at 12:10 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
> On Sat, Oct 12, 2019 at 06:14:23PM -0700, Andy Lutomirski wrote:
> > [adding more people because this is going to be an ABI break, sigh]
>
> That wouldn't break the ABI, no more than when if you boot a kernel
> built with CONFIG_USERFAULTFD=n.

The change Andy is proposing would convert programs that work with
CONFIG_USERFAULTFD=y today into programs that don't work with
CONFIG_USERFAULTFD. Whether that counts as an ABI break is above my
ability to decide, but IMHO, I think it should count. Such a change at
least merits more-than-usual scrutiny. I'd love to get Linus's
opinion.

> All non-cooperative features can be removed any time in a backwards
> compatible way, the only precaution is to mark their feature bits as
> reserved so they can't be reused for something else later.
>
> > least severely restricted.  A .read implementation MUST NOT ACT ON THE
> > CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> > as stdin to a setuid program.
>
> With UFFD_EVENT_FORK, the newly created uffd that controls the child,
> is not passed to the parent nor to the child. Instead it's passed to
> the CRIU monitor only, which has to be already running as root and is
> fully trusted and acts a hypervisor (despite there is no hypervisor).

The phrase "CRIU monitor" above stands out.  :-) Not every process
that uses userfaultfd will be CRIU-related, and in particular, there's
no requirement right now that limits UFFD_EVENT_FORK to privileged
processes.

The attack Andy is describing involves a random unprivileged process
creating a userfaultfd file object, configuring it to UFFD_EVENT_FORK,
somehow (stdin, SCM_RIGHTS, binder, etc.) passing that FD to a
more-privileged process, and convincing that privileged process to
read(2) that FD and disturb its file descriptor table, which in turn
can cause EoP or all kinds of other havoc. This is a serious bug that
needs some kind of fix.

> On Mon, Oct 14, 2019 at 06:04:22PM +0200, Jann Horn wrote:
> > FWIW, <https://codesearch.debian.net/search?q=UFFD_FEATURE_EVENT_FORK&literal=1>
> > just shows the kernel, kernel selftests, and strace code for decoding
> > syscall arguments. CRIU uses it though (probably for postcopy live
> > migration / lazy migration?), I guess that code isn't in debian for
> > some reason.
>
> https://criu.org/Userfaultfd#Limitations
>
> The CRIU developers did a truly amazing job by making container post
> copy live migration work great for a subset of apps, that alone was an
> amazing achievement. Is that achievement enough to use post copy live
> migration of bare metal containers in production? Unfortunately
> probably not and not just in debian.

Nobody is claiming that there's anything wrong with UFFD. That UFFD is
being used for features that have nothing to do with CRIU or
containerization is a signal that UFFD's creators made a good,
general-purpose tool. (We're considering it for two completely
unrelated purposes in Android in fact.) I don't think we can assume
that the UFFD feature has gone unused on the basis of CRIU's
slower-than-hoped-for adoption. Who's using it for something?
*Probably* nobody, but like I said above, it's worth thinking about
and being careful.

> In my view there's simply no justification not to use virtual machines
> when the alternative requires so much more code to be written and so
> much more complexity to be dealt with.

This is a debate that won't get resolved here. A ton of work has gone
into namespaces, migration, various cgroup things, and so on, and I
don't see that work getting torn out.

> While at it, as far as userfaultfd is concerned I'd rather see people
> spend time to write a malloc library that uses userfaultfd with the
> UFFD_FEATURE_SIGBUS features and it replaces mmap with UFFDIO_ZEROPAGE
> (plus adding the THP accelleration currently missing)

I'd also like to see realloc(3) use mremap(2) in real implementations
and for C++ to grow an allocator interface that can use realloc(3).
But I think that's a separate matter.

> and munmap with
> MADV_DONTNEED to do allocations and freeing of memory with full
> scalability without ever hitting on the map sem for writing.

Some allocators, e.g., jemalloc, already use MADV_DONTNEED.

> fork COWs cannot throttle

Sure they can. Can't we stick processes in a memcg and set a
memory.high threshold beyond which threads in that cgroup will enter
direct reclaim on page allocations? I'd call that throttling.

> and all apps using fork() risk to hit on x2
> memory usage which can become oom-killer material if the memory size
> of the process is huge.

fork is one of the reasons people use overcommit all the time. I'd
like to see a lot less overcommit in the world.

> On my side, instead of trying to fix whatever issue in
> UFFD_EVENT_FORK,

This issue *has* to get fixed one way or another.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 19:09           ` Andrea Arcangeli
  2019-10-23 19:21             ` Andy Lutomirski
  2019-10-23 20:05             ` Daniel Colascione
@ 2019-10-23 20:15             ` Linus Torvalds
  2019-10-24  9:02             ` Mike Rapoport
  3 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2019-10-23 20:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Pavel Emelyanov,
	Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray,
	Mike Rapoport, Linux API, LKML

On Wed, Oct 23, 2019 at 3:10 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> That wouldn't break the ABI, no more than when if you boot a kernel
> built with CONFIG_USERFAULTFD=n.

What? No.

You're entirely incorrect.

If USEFAULTFD no longer works, and if people depend on it, then it's
breaking the ABI. End of story. No weaselwording of "as if built with
CONFIG_USERFAULTFD=n" allowed, no garbage.

Btw, the whole "breaking the ABI" is misleading wording anyway. It's
irrelevant. You can "break" the ABI all you want by changing
semantics, adding or removing features, or making it do anything else
- as long as nobody notices.

Because the only thing that matters is that it doesn't break any user
workflows. That's _all_ that matters, but it's a big deal, and it
means that your fantasy reading of what "ABI" means is irrelevant.
Just because there's a config option to turn something off, doesn't
mean that you can then claim that you can do whatever.

So your statement is nonsensical and pointless.

Please don't spread this kind of bogus claims.

                Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 19:21             ` Andy Lutomirski
@ 2019-10-23 21:16               ` Andrea Arcangeli
  2019-10-23 21:25                 ` Andy Lutomirski
  0 siblings, 1 reply; 44+ messages in thread
From: Andrea Arcangeli @ 2019-10-23 21:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jann Horn, Daniel Colascione, Linus Torvalds, Pavel Emelyanov,
	Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray,
	Mike Rapoport, Linux API, LKML, Dr. David Alan Gilbert

On Wed, Oct 23, 2019 at 12:21:18PM -0700, Andy Lutomirski wrote:
> There are two things going on here.
> 
> 1. Daniel wants to add LSM labels to userfaultfd objects.  This seems
> reasonable to me.  The question, as I understand it, is: who is the
> subject that creates a uffd referring to a forked child?  I'm sure
> this is solvable in any number of straightforward ways, but I think
> it's less important than:

The new uffd created during fork would definitely need to be accounted
on the criu monitor, nor to the parent nor the child, so it'd need to
be accounted to the process/context that has the fd in its file
descriptors array. But since this is less important let's ignore this
for a second.

> 2. The existing ABI is busted independently of #1.  Suppose you call
> userfaultfd to get a userfaultfd and enable UFFD_FEATURE_EVENT_FORK.
> Then you do:
> 
> $ sudo <&[userfaultfd number]
> 
> Sudo will read it and get a new fd unexpectedly added to its fd table.
> It's worse if SCM_RIGHTS is involved.

So the problem is just that a new fd is created. So for this to turn
out to a practical issue, it requires finding a reckless suid that
won't even bother checking the return value of the open/socket
syscalls or some equivalent fd number related side effect. All right
that makes more sense now and of course I agree it needs fixing.

> So I think we either need to declare that UFFD_FEATURE_EVENT_FORK is
> only usable by global root or we need to remove it and maybe re-add it
> in some other form.

If I had a time machine, I'd rather prefer to do the below:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index fe6d804a38dc..574062051678 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1958,7 +1958,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
 		return -ENOMEM;
 
 	refcount_set(&ctx->refcount, 1);
-	ctx->flags = flags;
+	ctx->flags = flags | UFFD_CLOEXEC;
 	ctx->features = 0;
 	ctx->state = UFFD_STATE_WAIT_API;
 	ctx->released = false;

I mean there's no strong requirement to allow any uffd to survive exec
even if UFFD_FEATURE_EVENT_FORK never existed, it's enough if it can
be passed through unix domain sockets.

Until UFFD_FEATURE_EVENT_FORK come around, there was no particular
reason to implicitly enforce O_CLOEXEC on all uffd, it was totally
possible to clone() and exec() to pass the fd to a different
process. So it never rang a bell that this would turn out to be a
problem after UFFD_FEATURE_EVENT_FORK was introduced.

There are various ways to approach this:

1) drop all non cooperative features and mark their feature bitflags
   reserved (no ABI break)

2) enforce UFFD_CLOEXEC with above patch (potential ABI break all
   userfaultfd users)

3) enforce UFFD_CLOEXEC if UFFD_FEATURE_EVENT_FORK is set (ABI break
   only if UFFD_FEATURE_EVENT_FORK is set). Note all forked uffd
   are opened with the same flags inherited from the original uffd.

4) enforce the global root permission check when creating the uffd only if
   UFFD_FEATURE_EVENT_FORK is set.

5) drop all non cooperative features from API 0xaa and introduce API
   0xab with the features back, but with UFFD_CLOEXEC implicitly
   enforced and with UFFD_CLOEXEC forbidden to be set in the flags

6) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
   UFFD_FEATURE_EVENT_FORK2 that requires UFFD_CLOEXEC to be set
   (instead of implicitly enforcing it)

7) stick to API 0xaa and drop only UFFD_FEATURE_EVENT_FORK, but add a
   UFFD_FEATURE_EVENT_FORK2 that does the global root permission check 


5 is the non-ABI-break version of 2.

6 is the non-ABI-break version of 3.

7 is the non-ABI-break version of 4.

My favorite is 1) for the reason explained in the previous email.

However if postcopy live migration of bare metal containers already
runs in production anywhere or is at least very close to reach that
milestone or if the non-cooperative features are used in production in
any other way, we'd like to know where and in such case that will
totally change my mind about it. In such case I'd suggest to pick any
of the other options except 1).

In short there shall be good reason for going through further
maintenance burden.

Thanks,
Andrea


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 21:16               ` Andrea Arcangeli
@ 2019-10-23 21:25                 ` Andy Lutomirski
  2019-10-23 22:41                   ` Andrea Arcangeli
  0 siblings, 1 reply; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-23 21:25 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML,
	Dr. David Alan Gilbert

On Wed, Oct 23, 2019 at 2:16 PM Andrea Arcangeli <aarcange@redhat.com> wrote:
>
> On Wed, Oct 23, 2019 at 12:21:18PM -0700, Andy Lutomirski wrote:
> > There are two things going on here.
> >
> > 1. Daniel wants to add LSM labels to userfaultfd objects.  This seems
> > reasonable to me.  The question, as I understand it, is: who is the
> > subject that creates a uffd referring to a forked child?  I'm sure
> > this is solvable in any number of straightforward ways, but I think
> > it's less important than:
>
> The new uffd created during fork would definitely need to be accounted
> on the criu monitor, nor to the parent nor the child, so it'd need to
> be accounted to the process/context that has the fd in its file
> descriptors array. But since this is less important let's ignore this
> for a second.
>
> > 2. The existing ABI is busted independently of #1.  Suppose you call
> > userfaultfd to get a userfaultfd and enable UFFD_FEATURE_EVENT_FORK.
> > Then you do:
> >
> > $ sudo <&[userfaultfd number]
> >
> > Sudo will read it and get a new fd unexpectedly added to its fd table.
> > It's worse if SCM_RIGHTS is involved.
>
> So the problem is just that a new fd is created. So for this to turn
> out to a practical issue, it requires finding a reckless suid that
> won't even bother checking the return value of the open/socket
> syscalls or some equivalent fd number related side effect. All right
> that makes more sense now and of course I agree it needs fixing.

Or it requires a long-lived daemon that receives fds over SCM_RIGHTS
and reads from them.

>
> > So I think we either need to declare that UFFD_FEATURE_EVENT_FORK is
> > only usable by global root or we need to remove it and maybe re-add it
> > in some other form.
>
> If I had a time machine, I'd rather prefer to do the below:
>
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index fe6d804a38dc..574062051678 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1958,7 +1958,7 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
>                 return -ENOMEM;
>
>         refcount_set(&ctx->refcount, 1);
> -       ctx->flags = flags;
> +       ctx->flags = flags | UFFD_CLOEXEC;

That doesn't solve the problem.  With your time machine, you should
instead use ioctl() or recvmsg().

>
> 4) enforce the global root permission check when creating the uffd only if
>    UFFD_FEATURE_EVENT_FORK is set.

This could work, but we should also add a better way to do
UFFD_FEATURE_EVENT_FORK and get CRIU to start using it.  If CRIU is
the only user, we can probably drop the old ABI after a couple of
releases, since as far as I know, CRIU users need to upgrade their
CRIU more or less in sync with the kernel so that new kernel features
get checkpointed and restored.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 21:25                 ` Andy Lutomirski
@ 2019-10-23 22:41                   ` Andrea Arcangeli
  2019-10-23 23:01                     ` Andy Lutomirski
  0 siblings, 1 reply; 44+ messages in thread
From: Andrea Arcangeli @ 2019-10-23 22:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Jann Horn, Daniel Colascione, Linus Torvalds, Pavel Emelyanov,
	Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray,
	Mike Rapoport, Linux API, LKML, Dr. David Alan Gilbert

On Wed, Oct 23, 2019 at 02:25:35PM -0700, Andy Lutomirski wrote:
> That doesn't solve the problem.  With your time machine, you should

Would you elaborate what problem remains if execve closes all uffd
so that read() cannot run post execve?

> instead use ioctl() or recvmsg().

The event delivery is modeled after eventfd.c per userfaultfd.c header
file, so would then eventfd also need to be converted to ioctl or
recvmsg to deliver its event any better? Initially I evaluated to use
eventfd for it in fact, but it wasn't possible. I didn't look like it
could get any better than eventfd in terms of event delivery.

Or do you refer to single out only the delivery of the UFFD_EVENT_FORK
event not through read()?

> > 4) enforce the global root permission check when creating the uffd only if
> >    UFFD_FEATURE_EVENT_FORK is set.
> 
> This could work, but we should also add a better way to do
> UFFD_FEATURE_EVENT_FORK and get CRIU to start using it.  If CRIU is
> the only user, we can probably drop the old ABI after a couple of
> releases, since as far as I know, CRIU users need to upgrade their
> CRIU more or less in sync with the kernel so that new kernel features
> get checkpointed and restored.

Getting CRIU stat using it shouldn't be a problem at all, but we'll be
back to square one if you just stop there.

I don't see how to lift those limitations in the wiki to make it
usable in production by just providing a better way to do
UFFD_FEATURE_EVENT_FORK.

If you're volunteering to fix the limitations and make CRIU usable in
production that would be awesome, then of course we should do whatever
possible to improve UFFD_FEATURE_EVENT_FORK.

Thanks,
Andrea


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 22:41                   ` Andrea Arcangeli
@ 2019-10-23 23:01                     ` Andy Lutomirski
  2019-10-23 23:27                       ` Andrea Arcangeli
  0 siblings, 1 reply; 44+ messages in thread
From: Andy Lutomirski @ 2019-10-23 23:01 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML,
	Dr. David Alan Gilbert



> On Oct 23, 2019, at 3:41 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
> On Wed, Oct 23, 2019 at 02:25:35PM -0700, Andy Lutomirski wrote:
>> That doesn't solve the problem.  With your time machine, you should
> 
> Would you elaborate what problem remains if execve closes all uffd
> so that read() cannot run post execve?
> 

fcntl() can clear the CLOEXEC flag. And CLOEXEC has no effect on SCM_RIGHTS.

>> instead use ioctl() or recvmsg().
> 
> The event delivery is modeled after eventfd.c per userfaultfd.c header
> file, so would then eventfd also need to be converted to ioctl or
> recvmsg to deliver its event any better? Initially I evaluated to use
> eventfd for it in fact, but it wasn't possible. I didn't look like it
> could get any better than eventfd in terms of event delivery.
> 
> Or do you refer to single out only the delivery of the UFFD_EVENT_FORK
> event not through read()?

Delivering events through read() is just fine. The problem is when delivering an event does more than just returning bytes. As far as I’ve noticed, uffd’s read() just returns bytes as long as FORK is disabled.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 23:01                     ` Andy Lutomirski
@ 2019-10-23 23:27                       ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2019-10-23 23:27 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML,
	Dr. David Alan Gilbert

On Wed, Oct 23, 2019 at 04:01:53PM -0700, Andy Lutomirski wrote:
> Delivering events through read() is just fine. The problem is when
> delivering an event does more than just returning bytes. As far as
> I’ve noticed, uffd’s read() just returns bytes as long as FORK is
> disabled.

Yes, fork is the only case where read doesn't return bytes.

Moving only the fd creation to a separate syscall would then avoid
involuntary creation of the fd.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 20:05             ` Daniel Colascione
@ 2019-10-24  0:23               ` Andrea Arcangeli
  0 siblings, 0 replies; 44+ messages in thread
From: Andrea Arcangeli @ 2019-10-24  0:23 UTC (permalink / raw)
  To: Daniel Colascione
  Cc: Andy Lutomirski, Jann Horn, Linus Torvalds, Pavel Emelyanov,
	Lokesh Gidra, Nick Kralevich, Nosh Minwalla, Tim Murray,
	Mike Rapoport, Linux API, LKML

On Wed, Oct 23, 2019 at 01:05:47PM -0700, Daniel Colascione wrote:
> This is a debate that won't get resolved here. A ton of work has gone
> into namespaces, migration, various cgroup things, and so on, and I
> don't see that work getting torn out.

This is precisely why I thought it was a good idea to support the
non-cooperative use case too even though we had no immediate use for
it.

> Sure they can. Can't we stick processes in a memcg and set a
> memory.high threshold beyond which threads in that cgroup will enter
> direct reclaim on page allocations? I'd call that throttling.

The uffd-wp solution during the throttling can resolve a wrprotect
fault in the parent for every 4k page that has been written to disk
and it'll prioritize writing to disk those userfaults that are
currently blocked. I don't see how you could reach an equivalent
optimal runtime without uffd-wp and just with memcg because the
snapshot process won't have a clue which pages are been duped by the
COWs. The uffd-wp by avoding fork will also avoid more expensive MM
switches during the snapshot.

> This issue *has* to get fixed one way or another.

Sure.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-23 19:09           ` Andrea Arcangeli
                               ` (2 preceding siblings ...)
  2019-10-23 20:15             ` Linus Torvalds
@ 2019-10-24  9:02             ` Mike Rapoport
  2019-10-24 15:10               ` Andrea Arcangeli
  3 siblings, 1 reply; 44+ messages in thread
From: Mike Rapoport @ 2019-10-24  9:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML

On Wed, Oct 23, 2019 at 03:09:59PM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> On Sat, Oct 12, 2019 at 06:14:23PM -0700, Andy Lutomirski wrote:
> > [adding more people because this is going to be an ABI break, sigh]
> 
> That wouldn't break the ABI, no more than when if you boot a kernel
> built with CONFIG_USERFAULTFD=n.
> 
> All non-cooperative features can be removed any time in a backwards
> compatible way, the only precaution is to mark their feature bits as
> reserved so they can't be reused for something else later.
> 
> > least severely restricted.  A .read implementation MUST NOT ACT ON THE
> > CALLING TASK.  Ever.  Just imagine the effect of passing a userfaultfd
> > as stdin to a setuid program.
> 
> With UFFD_EVENT_FORK, the newly created uffd that controls the child,
> is not passed to the parent nor to the child. Instead it's passed to
> the CRIU monitor only, which has to be already running as root and is
> fully trusted and acts a hypervisor (despite there is no hypervisor).
> 
> By the time execve runs and any suid bit in the execve'd inode becomes
> relevant, well before the new userland executable code can run, the
> kernel throws away the "old_mm" controlled by any uffd and all
> attached uffds are released as well.
> 
> All I found is your "A .read implementation MUST NOT ACT ON THE
> CALLING TASK" as an explanation that something is broken but I need
> further clarification.
> 
> Of course I can see you can always open a uffd and pass it to any task
> you are going to execve on, but that simply means the suid program
> will be able to control you, not the other way around. If you don't
> want to be controlled by the next task, no matter if suid or not, just
> don't that. What I don't see is how you're going to control the suid
> binary from the outside, the suid binary at most will block in the
> poll, read and write syscalls and get garbage or write some garbage
> and get an error, it won't get signals and it cannot block in any page
> fault either, it's not immediately clear what's out of ordinary.
> 
> On Mon, Oct 14, 2019 at 06:04:22PM +0200, Jann Horn wrote:
> > FWIW, <https://codesearch.debian.net/search?q=UFFD_FEATURE_EVENT_FORK&literal=1>
> > just shows the kernel, kernel selftests, and strace code for decoding
> > syscall arguments. CRIU uses it though (probably for postcopy live
> > migration / lazy migration?), I guess that code isn't in debian for
> > some reason.
> 
> https://criu.org/Userfaultfd#Limitations

That's no the reason that UFFD_FEATURE_EVENT_FORK does not show up in
Debian code search, CRIU simply is not there. Debian packages CRIU only in
experimental and I believe that's not indexed by the code search.

As for the limitations, the races were fixed, I just forgot to update the
wiki. As for the supported memory types and COW pages, these only affect
efficiency of post-copy, but not the correctness.
 
> The CRIU developers did a truly amazing job by making container post
> copy live migration work great for a subset of apps, that alone was an
> amazing achievement. Is that achievement enough to use post copy live
> migration of bare metal containers in production? Unfortunately
> probably not and not just in debian.
 
I don't know if anybody is using post-copy migration of containers in
production, but I don't think that the reason for that would be technical.
IMHO it's more about prevailing perception that there is no need to migrate
containers at all, not only with post-copy, and, as the result, slow rate
of adoption of container migration in general.

> If you're wrong and UFFDIO_EVENT_FORK isn't currently buggy and in
> turn it isn't causing further maintenance burden, there is no hurry of
> removing them, but in the long term, if none of the non-cooperative
> features find its way in production (like it was reasonable to expect
> initially), they must be removed from the kernel anyway, not just
> UFFD_EVEN_FORK but all non-cooperative features associated with it.

... 
 
> On my side, instead of trying to fix whatever issue in
> UFFD_EVENT_FORK, I'd prefer to spend my time reviewing the uffd-wp
> feature from Peter and the page fault enhancement patchset that Peter
> and Linus were discussing. uffd-wp has the potential to drop fork()
> from all apps calling fork() only to do an atomic snapshot of their
> memory. Replacing fork() also means the uffd manager thread can decide
> how much memory to reserve to the snapshot and it can start throttling
> waiting for I/O completion if the threshold is exceeded, while fork
> COWs cannot throttle and all apps using fork() risk to hit on x2
> memory usage which can become oom-killer material if the memory size
> of the process is huge. The side benefit is also that the way
> userfaultfd works the fault granularity is entirely in control of
> userland (because it's always userland that resolves the fault), it
> could decide to use 8k or 16k even if that doesn't match the hardware
> page size. That will allow to keep THP on without risking to hit on 2M
> cows during the snapshot. Being able to keep THP enabled in nosql db
> without hitting on slow 2M COW copies during snapshot, should allow a
> further overall performance improvement when the snapshot is not
> running than what it is possible today. In a completely different use
> case, uffd-wp will also avoid JITs to set a dirty bit every time they
> modify any data in memory. It should also be possible to provide the
> same soft-dirty information in O(1) instead of O(N).

If I remember correctly, there was an intention to deprecate soft-dirty in
favor of uffd-wp, which brings us back to the necessity to have
non-cooperative uffd because otherwise even pre-copy in CRIU will be broken
and that *is* used in production.
 
> Thanks,
> Andrea
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-24  9:02             ` Mike Rapoport
@ 2019-10-24 15:10               ` Andrea Arcangeli
  2019-10-25 20:12                 ` Mike Rapoport
  0 siblings, 1 reply; 44+ messages in thread
From: Andrea Arcangeli @ 2019-10-24 15:10 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML

Hello,

On Thu, Oct 24, 2019 at 12:02:59PM +0300, Mike Rapoport wrote:
> That's no the reason that UFFD_FEATURE_EVENT_FORK does not show up in
> Debian code search, CRIU simply is not there. Debian packages CRIU only in
> experimental and I believe that's not indexed by the code search.
> 
> As for the limitations, the races were fixed, I just forgot to update the
> wiki. As for the supported memory types and COW pages, these only affect
> efficiency of post-copy, but not the correctness.

That's what I was hoping for. If the wiki information is stale and
there are no races it is totally plausible that it's being actively
used in production so we need to fix the kernel bug. I was just
checking because I wasn't sure anymore of the status after I read the
wiki.

If the CRIU initialization code that issues the uffd syscall runs as
global root the ABI breaking permission check from Andy sounds the
simplest for a short term fix, because it will be unnoticed by any
production usage with CIRU --lazy-pages.

Then later we could add a UFFD_FEATURE_EVENT_FORK2 that will not
require root permission.

Thanks,
Andrea


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API.
  2019-10-24 15:10               ` Andrea Arcangeli
@ 2019-10-25 20:12                 ` Mike Rapoport
  0 siblings, 0 replies; 44+ messages in thread
From: Mike Rapoport @ 2019-10-25 20:12 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andy Lutomirski, Jann Horn, Daniel Colascione, Linus Torvalds,
	Pavel Emelyanov, Lokesh Gidra, Nick Kralevich, Nosh Minwalla,
	Tim Murray, Mike Rapoport, Linux API, LKML

Hi,

On Thu, Oct 24, 2019 at 11:10:54AM -0400, Andrea Arcangeli wrote:
> Hello,
> 
> On Thu, Oct 24, 2019 at 12:02:59PM +0300, Mike Rapoport wrote:
> > That's no the reason that UFFD_FEATURE_EVENT_FORK does not show up in
> > Debian code search, CRIU simply is not there. Debian packages CRIU only in
> > experimental and I believe that's not indexed by the code search.
> > 
> > As for the limitations, the races were fixed, I just forgot to update the
> > wiki. As for the supported memory types and COW pages, these only affect
> > efficiency of post-copy, but not the correctness.
> 
> That's what I was hoping for. If the wiki information is stale and
> there are no races it is totally plausible that it's being actively
> used in production so we need to fix the kernel bug. I was just
> checking because I wasn't sure anymore of the status after I read the
> wiki.
> 
> If the CRIU initialization code that issues the uffd syscall runs as
> global root the ABI breaking permission check from Andy sounds the
> simplest for a short term fix, because it will be unnoticed by any
> production usage with CIRU --lazy-pages.
 
In general, criu can run as non-root, but such use of criu has limitations,
so allowing criu --lazy-pages only for root sounds reasonable as a short
term solution.

> Then later we could add a UFFD_FEATURE_EVENT_FORK2 that will not
> require root permission.

Agree.

> Thanks,
> Andrea
> 

-- 
Sincerely yours,
Mike.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/7] Harden userfaultfd
  2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
                   ` (7 preceding siblings ...)
  2019-10-16  0:02 ` [PATCH 0/7] Harden userfaultfd James Morris
@ 2019-11-15 15:09 ` Stephen Smalley
  8 siblings, 0 replies; 44+ messages in thread
From: Stephen Smalley @ 2019-11-15 15:09 UTC (permalink / raw)
  To: Daniel Colascione, linux-api, linux-kernel, lokeshgidra, nnk
  Cc: nosh, timmurray

On 10/12/19 3:15 PM, Daniel Colascione wrote:
> Userfaultfd in unprivileged contexts could be potentially very
> useful. We'd like to harden userfaultfd to make such unprivileged use
> less risky. This patch series allows SELinux to manage userfaultfd
> file descriptors (via a new flag, for compatibility with existing
> code) and allows administrators to limit userfaultfd to servicing
> user-mode faults, increasing the difficulty of using userfaultfd in
> exploit chains invoking delaying kernel faults.
> 
> A new anon_inodes interface allows callers to opt into SELinux
> management of anonymous file objects. In this mode, anon_inodes
> creates new ephemeral inodes for anonymous file objects instead of
> reusing a singleton dummy inode. A new LSM hook gives security modules
> an opportunity to configure and veto these ephemeral inodes.
> 
> Existing anon_inodes users must opt into the new functionality.
> 
> Daniel Colascione (7):
>    Add a new flags-accepting interface for anonymous inodes
>    Add a concept of a "secure" anonymous file
>    Add a UFFD_SECURE flag to the userfaultfd API.
>    Teach SELinux about a new userfaultfd class
>    Let userfaultfd opt out of handling kernel-mode faults
>    Allow users to require UFFD_SECURE
>    Add a new sysctl for limiting userfaultfd to user mode faults
> 
>   Documentation/admin-guide/sysctl/vm.rst | 19 +++++-
>   fs/anon_inodes.c                        | 89 +++++++++++++++++--------
>   fs/userfaultfd.c                        | 47 +++++++++++--
>   include/linux/anon_inodes.h             | 27 ++++++--
>   include/linux/lsm_hooks.h               |  8 +++
>   include/linux/security.h                |  2 +
>   include/linux/userfaultfd_k.h           |  3 +
>   include/uapi/linux/userfaultfd.h        | 14 ++++
>   kernel/sysctl.c                         |  9 +++
>   security/security.c                     |  8 +++
>   security/selinux/hooks.c                | 68 +++++++++++++++++++
>   security/selinux/include/classmap.h     |  2 +
>   12 files changed, 256 insertions(+), 40 deletions(-)

Please, in the future, cc selinux@vger.kernel.org for patches that 
modify SELinux.



^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, back to index

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-12 19:15 [PATCH 0/7] Harden userfaultfd Daniel Colascione
2019-10-12 19:15 ` [PATCH 1/7] Add a new flags-accepting interface for anonymous inodes Daniel Colascione
2019-10-14  4:26   ` kbuild test robot
2019-10-14 15:38   ` Jann Horn
2019-10-14 18:15     ` Daniel Colascione
2019-10-14 18:30       ` Jann Horn
2019-10-15  8:08   ` Christoph Hellwig
2019-10-12 19:15 ` [PATCH 2/7] Add a concept of a "secure" anonymous file Daniel Colascione
2019-10-14  3:01   ` kbuild test robot
2019-10-15  8:08   ` Christoph Hellwig
2019-10-12 19:15 ` [PATCH 3/7] Add a UFFD_SECURE flag to the userfaultfd API Daniel Colascione
2019-10-12 23:10   ` Andy Lutomirski
2019-10-13  0:51     ` Daniel Colascione
2019-10-13  1:14       ` Andy Lutomirski
2019-10-13  1:38         ` Daniel Colascione
2019-10-14 16:04         ` Jann Horn
2019-10-23 19:09           ` Andrea Arcangeli
2019-10-23 19:21             ` Andy Lutomirski
2019-10-23 21:16               ` Andrea Arcangeli
2019-10-23 21:25                 ` Andy Lutomirski
2019-10-23 22:41                   ` Andrea Arcangeli
2019-10-23 23:01                     ` Andy Lutomirski
2019-10-23 23:27                       ` Andrea Arcangeli
2019-10-23 20:05             ` Daniel Colascione
2019-10-24  0:23               ` Andrea Arcangeli
2019-10-23 20:15             ` Linus Torvalds
2019-10-24  9:02             ` Mike Rapoport
2019-10-24 15:10               ` Andrea Arcangeli
2019-10-25 20:12                 ` Mike Rapoport
2019-10-22 21:27         ` Daniel Colascione
2019-10-23  4:11         ` Andy Lutomirski
2019-10-23  7:29           ` Cyrill Gorcunov
2019-10-23 12:43             ` Mike Rapoport
2019-10-23 17:13               ` Andy Lutomirski
2019-10-12 19:15 ` [PATCH 4/7] Teach SELinux about a new userfaultfd class Daniel Colascione
2019-10-12 23:08   ` Andy Lutomirski
2019-10-13  0:11     ` Daniel Colascione
2019-10-13  0:46       ` Andy Lutomirski
2019-10-12 19:16 ` [PATCH 5/7] Let userfaultfd opt out of handling kernel-mode faults Daniel Colascione
2019-10-12 19:16 ` [PATCH 6/7] Allow users to require UFFD_SECURE Daniel Colascione
2019-10-12 23:12   ` Andy Lutomirski
2019-10-12 19:16 ` [PATCH 7/7] Add a new sysctl for limiting userfaultfd to user mode faults Daniel Colascione
2019-10-16  0:02 ` [PATCH 0/7] Harden userfaultfd James Morris
2019-11-15 15:09 ` Stephen Smalley

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git