[RFC][PATCH 0/2] keys: request_key() interception in containers

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC][PATCH 0/2] keys: request_key() interception in containers
@ 2021-02-04 17:47 David Howells
  2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
  2021-02-04 17:47 ` [PATCH 2/2] keys: Allow request_key upcalls from a container to be intercepted David Howells
  0 siblings, 2 replies; 9+ messages in thread
From: David Howells @ 2021-02-04 17:47 UTC (permalink / raw)
  To: sprabhu
  Cc: dhowells, Jarkko Sakkinen, christian, selinux, keyrings,
	linux-api, linux-fsdevel, linux-security-module, linux-kernel,
	containers


Here's a rough draft of a facility by which keys can be intercepted.

There are two patches:

 (1) Add tags to namespaces that can be used to find out, when we're
     looking for an intercept, if a namespace that an intercept is
     filtering on is the same as namespace of the caller of request_key()
     without the need for the intercept record to pin the namespaces that
     it's using as filters (which would also cause a dependency cycle).

     Tags contain only a refcount and are compared by address.

 (2) Add a new keyctl:

            keyctl(KEYCTL_SERVICE_INTERCEPT,
                   int queue_keyring, int userns_fd,
                   const char *type_name, unsigned int ns_mask);

     that allows a request_key() intercept to be added to the specified
     user namespace.  The authorisation key for an intercepted request is
     placed in the queue_keyring, which can be watched to gain a
     notification of this happening.  The watcher can then examine the auth
     key to determine what key is to be instantiated.

     A simple sample is provided that can be used to try this.

Some things that need to be worked out:

 (*) Intercepts are linked to the lifetime of the user_namespace on which
     they're placed, but not the daemon or the queue keyring.  Probably
     they should be removed when the queue keyring is removed, but they
     currently pin it.

 (*) Setting userns_fd to other than -1 is not yet supported (-1 indicates
     the current user namespace).

 (*) Multiple threads can monitor a queue keyring, but they will all get
     woken.  They can use keyctl_move() to decide who gets to process it.


The patches can be found on the following branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-intercept

David
---
David Howells (2):
      Add namespace tags that can be used for matching without pinning a ns
      keys: Allow request_key upcalls from a container to be intercepted


 include/linux/key-type.h                |   4 +-
 include/linux/user_namespace.h          |   2 +
 include/uapi/linux/keyctl.h             |  13 +
 kernel/user.c                           |   3 +
 kernel/user_namespace.c                 |   2 +
 samples/watch_queue/Makefile            |   2 +
 samples/watch_queue/key_req_intercept.c | 271 +++++++++++++++++++
 security/keys/Makefile                  |   2 +
 security/keys/compat.c                  |   3 +
 security/keys/internal.h                |   5 +
 security/keys/keyctl.c                  |   6 +
 security/keys/keyring.c                 |   1 +
 security/keys/process_keys.c            |   2 +-
 security/keys/request_key.c             |  16 +-
 security/keys/request_key_auth.c        |   3 +
 security/keys/service.c                 | 337 ++++++++++++++++++++++++
 16 files changed, 663 insertions(+), 9 deletions(-)
 create mode 100644 samples/watch_queue/key_req_intercept.c
 create mode 100644 security/keys/service.c



^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns
  2021-02-04 17:47 [RFC][PATCH 0/2] keys: request_key() interception in containers David Howells
@ 2021-02-04 17:47 ` David Howells
  2021-02-04 20:14   ` kernel test robot
                     ` (3 more replies)
  2021-02-04 17:47 ` [PATCH 2/2] keys: Allow request_key upcalls from a container to be intercepted David Howells
  1 sibling, 4 replies; 9+ messages in thread
From: David Howells @ 2021-02-04 17:47 UTC (permalink / raw)
  To: sprabhu
  Cc: dhowells, Jarkko Sakkinen, christian, selinux, keyrings,
	linux-api, linux-fsdevel, linux-security-module, linux-kernel,
	containers

Add a ns tag struct that consists of just a refcount.  It's address can be
used to compare namespaces without the need to pin a namespace.  Just the
tag needs pinning.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 fs/namespace.c            |   18 ++++++++----------
 include/linux/ns_common.h |   23 +++++++++++++++++++++++
 include/linux/proc_ns.h   |   38 +++++++++++++++++++++++++++++++++++---
 init/version.c            |    9 ++++++++-
 ipc/msgutil.c             |    7 ++++++-
 ipc/namespace.c           |    8 +++-----
 kernel/cgroup/cgroup.c    |    5 +++++
 kernel/cgroup/namespace.c |    6 +++---
 kernel/pid.c              |    5 +++++
 kernel/pid_namespace.c    |   18 +++++++++---------
 kernel/time/namespace.c   |   13 +++++--------
 kernel/user.c             |    5 +++++
 kernel/user_namespace.c   |    7 +++----
 kernel/utsname.c          |   24 +++++++++++++-----------
 net/core/net_namespace.c  |   38 +++++++++++++++-----------------------
 15 files changed, 146 insertions(+), 78 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 9d33909d0f9e..f8da9be8c6f7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -3238,10 +3238,9 @@ static void dec_mnt_namespaces(struct ucounts *ucounts)
 
 static void free_mnt_ns(struct mnt_namespace *ns)
 {
-	if (!is_anon_ns(ns))
-		ns_free_inum(&ns->ns);
 	dec_mnt_namespaces(ns->ucounts);
 	put_user_ns(ns->user_ns);
+	destroy_ns_common(&ns->ns);
 	kfree(ns);
 }
 
@@ -3269,18 +3268,17 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a
 		dec_mnt_namespaces(ucounts);
 		return ERR_PTR(-ENOMEM);
 	}
-	if (!anon) {
-		ret = ns_alloc_inum(&new_ns->ns);
-		if (ret) {
-			kfree(new_ns);
-			dec_mnt_namespaces(ucounts);
-			return ERR_PTR(ret);
-		}
+
+	ret = init_ns_common(&new_ns->ns, anon);
+	if (ret) {
+		destroy_ns_common(&new_ns->ns);
+		kfree(new_ns);
+		dec_mnt_namespaces(ucounts);
+		return ERR_PTR(ret);
 	}
 	new_ns->ns.ops = &mntns_operations;
 	if (!anon)
 		new_ns->seq = atomic64_add_return(1, &mnt_ns_seq);
-	refcount_set(&new_ns->ns.count, 1);
 	INIT_LIST_HEAD(&new_ns->list);
 	init_waitqueue_head(&new_ns->poll);
 	spin_lock_init(&new_ns->ns_lock);
diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
index 0f1d024bd958..45174ad8a435 100644
--- a/include/linux/ns_common.h
+++ b/include/linux/ns_common.h
@@ -3,14 +3,37 @@
 #define _LINUX_NS_COMMON_H
 
 #include <linux/refcount.h>
+#include <linux/slab.h>
 
 struct proc_ns_operations;
 
+/*
+ * Comparable tag for namespaces so that namespaces don't have to be pinned by
+ * something that wishes to detect if a namespace matches a criterion.
+ */
+struct ns_tag {
+	refcount_t	usage;
+};
+
 struct ns_common {
 	atomic_long_t stashed;
 	const struct proc_ns_operations *ops;
+	struct ns_tag *tag;
 	unsigned int inum;
 	refcount_t count;
 };
 
+static inline struct ns_tag *get_ns_tag(struct ns_tag *tag)
+{
+	if (tag)
+		refcount_inc(&tag->usage);
+	return tag;
+}
+
+static inline void put_ns_tag(struct ns_tag *tag)
+{
+	if (tag && refcount_dec_and_test(&tag->usage))
+		kfree(tag);
+}
+
 #endif
diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
index 75807ecef880..9fb7eb403923 100644
--- a/include/linux/proc_ns.h
+++ b/include/linux/proc_ns.h
@@ -64,13 +64,45 @@ static inline void proc_free_inum(unsigned int inum) {}
 
 #endif /* CONFIG_PROC_FS */
 
-static inline int ns_alloc_inum(struct ns_common *ns)
+/**
+ * init_ns_common - Initialise the common part of a namespace
+ * @ns: The namespace to initialise
+ * @anon: The namespace will be anonymous
+ *
+ * Set up the common part of a namespace, assigning an inode number and
+ * creating a tag.  Returns 0 on success and a negative error code on failure.
+ * On failure, the caller must call destroy_ns_common().
+ */
+static inline int init_ns_common(struct ns_common *ns, bool anon)
 {
+	struct ns_tag *tag;
+
+	tag = kzalloc(sizeof(*tag), GFP_KERNEL);
+	if (!tag)
+		return -ENOMEM;
+
+	refcount_set(&tag->usage, 1);
+	ns->tag = tag;
+	ns->inum = 0;
 	atomic_long_set(&ns->stashed, 0);
-	return proc_alloc_inum(&ns->inum);
+	refcount_set(&ns->count, 1);
+
+	return anon ? 0 : proc_alloc_inum(&ns->inum);
 }
 
-#define ns_free_inum(ns) proc_free_inum((ns)->inum)
+/**
+ * destroy_ns_common - Clean up the common part of a namespace
+ * @ns: The namespace to clean up
+ */
+static inline void destroy_ns_common(struct ns_common *ns)
+{
+	put_ns_tag(ns->tag);
+	ns->tag = NULL;
+	if (ns->inum) {
+		proc_free_inum(ns->inum);
+		ns->inum = 0;
+	}
+}
 
 extern struct file *proc_ns_fget(int fd);
 #define get_proc_ns(inode) ((struct ns_common *)(inode)->i_private)
diff --git a/init/version.c b/init/version.c
index 80d2b7566b39..3c867b6c4aa4 100644
--- a/init/version.c
+++ b/init/version.c
@@ -24,8 +24,15 @@ extern int version_string(LINUX_VERSION_CODE);
 int version_string(LINUX_VERSION_CODE);
 #endif
 
+static struct ns_tag init_uts_ns_tag = {
+	.usage		= REFCOUNT_INIT(1),
+};
+
 struct uts_namespace init_uts_ns = {
-	.ns.count = REFCOUNT_INIT(2),
+	.ns = {
+		.count		= REFCOUNT_INIT(2),
+		.tag		= &init_uts_ns_tag,
+	},
 	.name = {
 		.sysname	= UTS_SYSNAME,
 		.nodename	= UTS_NODENAME,
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index d0a0e877cadd..62bf194c38c6 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -20,13 +20,18 @@
 
 DEFINE_SPINLOCK(mq_lock);
 
+static struct ns_tag init_ipc_ns_tag = {
+	.usage		= REFCOUNT_INIT(1),
+};
+
 /*
  * The next 2 defines are here bc this is the only file
  * compiled when either CONFIG_SYSVIPC and CONFIG_POSIX_MQUEUE
  * and not CONFIG_IPC_NS.
  */
 struct ipc_namespace init_ipc_ns = {
-	.ns.count = REFCOUNT_INIT(1),
+	.ns.tag	= &init_ipc_ns_tag,
+	.ns.count = REFCOUNT_INIT(2),
 	.user_ns = &init_user_ns,
 	.ns.inum = PROC_IPC_INIT_INO,
 #ifdef CONFIG_IPC_NS
diff --git a/ipc/namespace.c b/ipc/namespace.c
index 7bd0766ddc3b..06c0829ab866 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -46,12 +46,10 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	if (ns == NULL)
 		goto fail_dec;
 
-	err = ns_alloc_inum(&ns->ns);
+	err = init_ns_common(&ns->ns, false);
 	if (err)
 		goto fail_free;
 	ns->ns.ops = &ipcns_operations;
-
-	refcount_set(&ns->ns.count, 1);
 	ns->user_ns = get_user_ns(user_ns);
 	ns->ucounts = ucounts;
 
@@ -67,8 +65,8 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 
 fail_put:
 	put_user_ns(ns->user_ns);
-	ns_free_inum(&ns->ns);
 fail_free:
+	destroy_ns_common(&ns->ns);
 	kfree(ns);
 fail_dec:
 	dec_ipc_namespaces(ucounts);
@@ -127,7 +125,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
 
 	dec_ipc_namespaces(ns->ucounts);
 	put_user_ns(ns->user_ns);
-	ns_free_inum(&ns->ns);
+	destroy_ns_common(&ns->ns);
 	kfree(ns);
 }
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 613845769103..fb397fa2386f 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -197,8 +197,13 @@ static u16 have_exit_callback __read_mostly;
 static u16 have_release_callback __read_mostly;
 static u16 have_canfork_callback __read_mostly;
 
+static struct ns_tag init_cgroup_ns_tag = {
+	.usage		= REFCOUNT_INIT(1),
+};
+
 /* cgroup namespace for init task */
 struct cgroup_namespace init_cgroup_ns = {
+	.ns.tag		= &init_cgroup_ns_tag,
 	.ns.count	= REFCOUNT_INIT(2),
 	.user_ns	= &init_user_ns,
 	.ns.ops		= &cgroupns_operations,
diff --git a/kernel/cgroup/namespace.c b/kernel/cgroup/namespace.c
index f5e8828c109c..7c8c0ccd1feb 100644
--- a/kernel/cgroup/namespace.c
+++ b/kernel/cgroup/namespace.c
@@ -27,12 +27,12 @@ static struct cgroup_namespace *alloc_cgroup_ns(void)
 	new_ns = kzalloc(sizeof(struct cgroup_namespace), GFP_KERNEL);
 	if (!new_ns)
 		return ERR_PTR(-ENOMEM);
-	ret = ns_alloc_inum(&new_ns->ns);
+	ret = init_ns_common(&new_ns->ns, false);
 	if (ret) {
+		destroy_ns_common(&new_ns->ns);
 		kfree(new_ns);
 		return ERR_PTR(ret);
 	}
-	refcount_set(&new_ns->ns.count, 1);
 	new_ns->ns.ops = &cgroupns_operations;
 	return new_ns;
 }
@@ -42,7 +42,7 @@ void free_cgroup_ns(struct cgroup_namespace *ns)
 	put_css_set(ns->root_cset);
 	dec_cgroup_namespaces(ns->ucounts);
 	put_user_ns(ns->user_ns);
-	ns_free_inum(&ns->ns);
+	destroy_ns_common(&ns->ns);
 	kfree(ns);
 }
 EXPORT_SYMBOL(free_cgroup_ns);
diff --git a/kernel/pid.c b/kernel/pid.c
index ebdf9c60cd0b..65015c5b26db 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -66,6 +66,10 @@ int pid_max = PID_MAX_DEFAULT;
 int pid_max_min = RESERVED_PIDS + 1;
 int pid_max_max = PID_MAX_LIMIT;
 
+static struct ns_tag init_pid_ns_tag = {
+	.usage		= REFCOUNT_INIT(1),
+};
+
 /*
  * PID-map pages start out as NULL, they get allocated upon
  * first use and are never deallocated. This way a low pid_max
@@ -73,6 +77,7 @@ int pid_max_max = PID_MAX_LIMIT;
  * the scheme scales to up to 4 million PIDs, runtime.
  */
 struct pid_namespace init_pid_ns = {
+	.ns.tag = &init_pid_ns_tag,
 	.ns.count = REFCOUNT_INIT(2),
 	.idr = IDR_INIT(init_pid_ns.idr),
 	.pid_allocated = PIDNS_ADDING,
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index ca43239a255a..a562071e52e1 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -93,16 +93,15 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 
 	idr_init(&ns->idr);
 
-	ns->pid_cachep = create_pid_cachep(level);
-	if (ns->pid_cachep == NULL)
-		goto out_free_idr;
-
-	err = ns_alloc_inum(&ns->ns);
+	err = init_ns_common(&ns->ns, false);
 	if (err)
-		goto out_free_idr;
+		goto out_free;
 	ns->ns.ops = &pidns_operations;
 
-	refcount_set(&ns->ns.count, 1);
+	ns->pid_cachep = create_pid_cachep(level);
+	if (ns->pid_cachep == NULL)
+		goto out_free;
+
 	ns->level = level;
 	ns->parent = get_pid_ns(parent_pid_ns);
 	ns->user_ns = get_user_ns(user_ns);
@@ -111,8 +110,9 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
 
 	return ns;
 
-out_free_idr:
+out_free:
 	idr_destroy(&ns->idr);
+	destroy_ns_common(&ns->ns);
 	kmem_cache_free(pid_ns_cachep, ns);
 out_dec:
 	dec_pid_namespaces(ucounts);
@@ -132,7 +132,7 @@ static void delayed_free_pidns(struct rcu_head *p)
 
 static void destroy_pid_namespace(struct pid_namespace *ns)
 {
-	ns_free_inum(&ns->ns);
+	destroy_ns_common(&ns->ns);
 
 	idr_destroy(&ns->idr);
 	call_rcu(&ns->rcu, delayed_free_pidns);
diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index 6ca625f5e554..5c5847048900 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -92,16 +92,14 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 	if (!ns)
 		goto fail_dec;
 
-	refcount_set(&ns->ns.count, 1);
+	err = init_ns_common(&ns->ns, false);
+	if (err)
+		goto fail_free;
 
 	ns->vvar_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
 	if (!ns->vvar_page)
 		goto fail_free;
 
-	err = ns_alloc_inum(&ns->ns);
-	if (err)
-		goto fail_free_page;
-
 	ns->ucounts = ucounts;
 	ns->ns.ops = &timens_operations;
 	ns->user_ns = get_user_ns(user_ns);
@@ -109,9 +107,8 @@ static struct time_namespace *clone_time_ns(struct user_namespace *user_ns,
 	ns->frozen_offsets = false;
 	return ns;
 
-fail_free_page:
-	__free_page(ns->vvar_page);
 fail_free:
+	destroy_ns_common(&ns->ns);
 	kfree(ns);
 fail_dec:
 	dec_time_namespaces(ucounts);
@@ -230,7 +227,7 @@ void free_time_ns(struct time_namespace *ns)
 {
 	dec_time_namespaces(ns->ucounts);
 	put_user_ns(ns->user_ns);
-	ns_free_inum(&ns->ns);
+	destroy_ns_common(&ns->ns);
 	__free_page(ns->vvar_page);
 	kfree(ns);
 }
diff --git a/kernel/user.c b/kernel/user.c
index a2478cddf536..78ee75f4cd21 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -20,6 +20,10 @@
 #include <linux/user_namespace.h>
 #include <linux/proc_ns.h>
 
+static struct ns_tag init_user_ns_tag = {
+	.usage	= REFCOUNT_INIT(1),
+};
+
 /*
  * userns count is 1 for root user, 1 for init_uts_ns,
  * and 1 for... ?
@@ -55,6 +59,7 @@ struct user_namespace init_user_ns = {
 			},
 		},
 	},
+	.ns.tag	= &init_user_ns_tag,
 	.ns.count = REFCOUNT_INIT(3),
 	.owner = GLOBAL_ROOT_UID,
 	.group = GLOBAL_ROOT_GID,
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index af612945a4d0..f60cf7b5973c 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -106,12 +106,11 @@ int create_user_ns(struct cred *new)
 	if (!ns)
 		goto fail_dec;
 
-	ret = ns_alloc_inum(&ns->ns);
+	ret = init_ns_common(&ns->ns, false);
 	if (ret)
 		goto fail_free;
 	ns->ns.ops = &userns_operations;
 
-	refcount_set(&ns->ns.count, 1);
 	/* Leave the new->user_ns reference with the new user namespace. */
 	ns->parent = parent_ns;
 	ns->level = parent_ns->level + 1;
@@ -142,8 +141,8 @@ int create_user_ns(struct cred *new)
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	key_put(ns->persistent_keyring_register);
 #endif
-	ns_free_inum(&ns->ns);
 fail_free:
+	destroy_ns_common(&ns->ns);
 	kmem_cache_free(user_ns_cachep, ns);
 fail_dec:
 	dec_user_namespaces(ucounts);
@@ -193,7 +192,7 @@ static void free_user_ns(struct work_struct *work)
 		}
 		retire_userns_sysctls(ns);
 		key_free_user_ns(ns);
-		ns_free_inum(&ns->ns);
+		destroy_ns_common(&ns->ns);
 		kmem_cache_free(user_ns_cachep, ns);
 		dec_user_namespaces(ucounts);
 		ns = parent;
diff --git a/kernel/utsname.c b/kernel/utsname.c
index b1ac3ca870f2..4755f007199f 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -30,10 +30,17 @@ static void dec_uts_namespaces(struct ucounts *ucounts)
 static struct uts_namespace *create_uts_ns(void)
 {
 	struct uts_namespace *uts_ns;
+	int err;
 
 	uts_ns = kmem_cache_alloc(uts_ns_cache, GFP_KERNEL);
-	if (uts_ns)
-		refcount_set(&uts_ns->ns.count, 1);
+	if (uts_ns) {
+		err = init_ns_common(&uts_ns->ns, false);
+		if (err < 0) {
+			destroy_ns_common(&uts_ns->ns);
+			kmem_cache_free(uts_ns_cache, uts_ns);
+			return ERR_PTR(err);
+		}
+	}
 	return uts_ns;
 }
 
@@ -54,14 +61,11 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 	if (!ucounts)
 		goto fail;
 
-	err = -ENOMEM;
 	ns = create_uts_ns();
-	if (!ns)
+	if (IS_ERR(ns)) {
+		err = PTR_ERR(ns);
 		goto fail_dec;
-
-	err = ns_alloc_inum(&ns->ns);
-	if (err)
-		goto fail_free;
+	}
 
 	ns->ucounts = ucounts;
 	ns->ns.ops = &utsns_operations;
@@ -72,8 +76,6 @@ static struct uts_namespace *clone_uts_ns(struct user_namespace *user_ns,
 	up_read(&uts_sem);
 	return ns;
 
-fail_free:
-	kmem_cache_free(uts_ns_cache, ns);
 fail_dec:
 	dec_uts_namespaces(ucounts);
 fail:
@@ -107,7 +109,7 @@ void free_uts_ns(struct uts_namespace *ns)
 {
 	dec_uts_namespaces(ns->ucounts);
 	put_user_ns(ns->user_ns);
-	ns_free_inum(&ns->ns);
+	destroy_ns_common(&ns->ns);
 	kmem_cache_free(uts_ns_cache, ns);
 }
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2ef3b4557f40..f53f7ddec553 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -44,8 +44,14 @@ EXPORT_SYMBOL_GPL(net_rwsem);
 static struct key_tag init_net_key_domain = { .usage = REFCOUNT_INIT(1) };
 #endif
 
+static struct ns_tag init_net_tag = {
+	.usage		= REFCOUNT_INIT(1),
+};
+
 struct net init_net = {
-	.ns.count	= REFCOUNT_INIT(1),
+	.ns.tag		= &init_net_tag,
+	.ns.count	= REFCOUNT_INIT(2),
+	.ns.ops		= &netns_operations,
 	.dev_base_head	= LIST_HEAD_INIT(init_net.dev_base_head),
 #ifdef CONFIG_KEYS
 	.key_domain	= &init_net_key_domain,
@@ -329,7 +335,6 @@ static __net_init int setup_net(struct net *net, struct user_namespace *user_ns)
 	int error = 0;
 	LIST_HEAD(net_exit_list);
 
-	refcount_set(&net->ns.count, 1);
 	refcount_set(&net->passive, 1);
 	get_random_bytes(&net->hash_mix, sizeof(u32));
 	net->dev_base_seq = 1;
@@ -419,6 +424,10 @@ static struct net *net_alloc(void)
 	if (!net)
 		goto out_free;
 
+	if (init_ns_common(&net->ns, false) < 0)
+		goto out_free_2;
+	net->ns.ops = &netns_operations;
+
 #ifdef CONFIG_KEYS
 	net->key_domain = kzalloc(sizeof(struct key_tag), GFP_KERNEL);
 	if (!net->key_domain)
@@ -432,6 +441,7 @@ static struct net *net_alloc(void)
 
 #ifdef CONFIG_KEYS
 out_free_2:
+	destroy_ns_common(&net->ns);
 	kmem_cache_free(net_cachep, net);
 	net = NULL;
 #endif
@@ -443,6 +453,7 @@ static struct net *net_alloc(void)
 static void net_free(struct net *net)
 {
 	kfree(rcu_access_pointer(net->gen));
+	destroy_ns_common(&net->ns);
 	kmem_cache_free(net_cachep, net);
 }
 
@@ -700,24 +711,6 @@ struct net *get_net_ns_by_pid(pid_t pid)
 }
 EXPORT_SYMBOL_GPL(get_net_ns_by_pid);
 
-static __net_init int net_ns_net_init(struct net *net)
-{
-#ifdef CONFIG_NET_NS
-	net->ns.ops = &netns_operations;
-#endif
-	return ns_alloc_inum(&net->ns);
-}
-
-static __net_exit void net_ns_net_exit(struct net *net)
-{
-	ns_free_inum(&net->ns);
-}
-
-static struct pernet_operations __net_initdata net_ns_ops = {
-	.init = net_ns_net_init,
-	.exit = net_ns_net_exit,
-};
-
 static const struct nla_policy rtnl_net_policy[NETNSA_MAX + 1] = {
 	[NETNSA_NONE]		= { .type = NLA_UNSPEC },
 	[NETNSA_NSID]		= { .type = NLA_S32 },
@@ -1097,6 +1090,8 @@ static int __init net_ns_init(void)
 		panic("Could not create netns workq");
 #endif
 
+	proc_alloc_inum(&init_net.ns.inum);
+
 	ng = net_alloc_generic();
 	if (!ng)
 		panic("Could not allocate generic netns");
@@ -1114,9 +1109,6 @@ static int __init net_ns_init(void)
 	init_net_initialized = true;
 	up_write(&pernet_ops_rwsem);
 
-	if (register_pernet_subsys(&net_ns_ops))
-		panic("Could not register network namespace subsystems");
-
 	rtnl_register(PF_UNSPEC, RTM_NEWNSID, rtnl_net_newid, NULL,
 		      RTNL_FLAG_DOIT_UNLOCKED);
 	rtnl_register(PF_UNSPEC, RTM_GETNSID, rtnl_net_getid, rtnl_net_dumpid,



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] keys: Allow request_key upcalls from a container to be intercepted
  2021-02-04 17:47 [RFC][PATCH 0/2] keys: request_key() interception in containers David Howells
  2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
@ 2021-02-04 17:47 ` David Howells
  2021-02-04 19:55   ` kernel test robot
  1 sibling, 1 reply; 9+ messages in thread
From: David Howells @ 2021-02-04 17:47 UTC (permalink / raw)
  To: sprabhu
  Cc: dhowells, Jarkko Sakkinen, christian, selinux, keyrings,
	linux-api, linux-fsdevel, linux-security-module, linux-kernel,
	containers

Provide a mechanism by which daemons can intercept request_key upcalls,
filtered by namespace and key type, and service them.  The list of active
services is per-user_namespace.


====
WHY?
====

Requests to upcall and instantiate a key are directed to /sbin/request_key
run in the init namespaces.  This is the wrong thing to do for
request_key() called inside a container.  Rather, the container manager
should be able to intercept a request and deal with it itself, applying the
appropriate namespaces.

For example, a container with a different network namespace that routes
packets out of a different NIC should probably not use the main system DNS
settings.


==================
SETTING INTERCEPTS
==================

An intercept is set by calling:

	keyctl(KEYCTL_SERVICE_INTERCEPT,
	       int queue_keyring, int userns_fd,
	       const char *type_name, unsigned int ns_mask);

where "queue_keyring" indicates a keyring into which authorisation keys
will be placed as request_key() calls happen; "userns_fd" indicates the
user namespace on which to place the interception (or -1 for the caller's);
"type_name" indicates the type of key to filter for (or NULL); and
"ns_mask" is a bitwise-OR of:

	KEY_SERVICE_NS_UTS
	KEY_SERVICE_NS_IPC
	KEY_SERVICE_NS_MNT
	KEY_SERVICE_NS_PID
	KEY_SERVICE_NS_NET
	KEY_SERVICE_NS_CGROUP

which select those namespaces of the caller that must match for the
interceptionto occur..  So, for example, a daemon that wanted to service
DNS requests from the kernel would do something like:

	queue = add_key("keyring", "queue", NULL, 0,
			KEY_SPEC_THREAD_KEYRING);
	keyctl(KEYCTL_SERVICE_INTERCEPT, queue, -1, "dns_resolver",
	       KEY_SERVICE_NS_NET);

so that it gets all the DNS records issued by processes in the current
network namespace, no matter what other namespaces are in force at the
time.  On the other hand, the following call:

	keyctl(KEYCTL_SERVICE_INTERCEPT, queue, -1, NULL, 0);

will match everything.

If conflicts arise between two filter records (and different daemons can
have filter records in the same list), the most specific record is matched
by preference, otherwise the first added gets the work.  So, in the example
above, the dns_resolver-specific record would match in preference to the
match-everything record as the former in more specific (it specifies a type
and a particular namespace).  EEXIST is given if an intercept exactly
matches one already in place.

An intercept can be removed by setting queue_keyring to 0, e.g.:

	keyctl(KEYCTL_SERVICE_INTERCEPT, 0, -1, "dns_resolver",
	       KEY_SERVICE_NS_NET);

All the other parameters must be given as when the intercept was set.

Notes:

 (1) Note that anyone can create a channel, but only a sysadmin or the root
     user of the current user_namespace may add filters.

     [!] NOTE: I'm really not sure how to get the security right here.  Who
	 is allowed to intercept requests?  Getting it wrong opens a
	 definite security hole.

 (2) It doesn't really handle multiple service threads watching the same
     keyring.

 (3) The intercepts are not tied to the lifetime of the queue keyring,
     though they can be removed later.  This is probably wrong, but it's
     more tricky since they currently pin the queue keyring.  They are,
     however, cleaned up when the user namespace that owns then is
     destroyed.

 (4) I'm not sure it really handles cases where some of the caller's
     namespaces aren't owned by the caller's user_namespace.


==================
SERVICING REQUESTS
==================

The daemon servicing requests should place a watch on the queue keyring.
This will inform it of when an authorisation key is placed in there.

An authorisation key's description indicates the target key and the callout
data can be read from the authorisation key.

The daemon can then gain permission to instantiate the associated key.

	keyctl_assume_authority(key_id);

After which it can do one of:

	keyctl_instantiate(key_id, "foo_bar", 7, 0);
	keyctl_negate(key_id, 0, 0);
	keyctl_reject(key_id, 0, ENOANO, 0);

and the authorisation key will be automatically revoked and unlinked.

If the authorisation key is unlinked from all keyrings, the target key will
be rejected if it hasn't been instantiated yet.


[!] NOTE: Need to provide some way to find out the operation type and other
    parameters from the auth key.  /sbin/request_key supplies this on the
    command line, but I can't do that here.  It's probably something that
    needs storing into the request_key_auth-type key and an additional
    keyctl providing to retrieve it.


===========
SAMPLE CODE
===========

A sample program is provided.  This can be run as:

	./samples/watch_queue/key_req_intercept

it will then watch for requests to be made for user keyrings in the user
namespace in which it resides.  Such requests can be made by:

	keyctl request2 user a @s

This key will be rejected and the command will fail with ENOANO.
Subsequent accesses to the key will also fail with ENOANO.

Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/linux/key-type.h                |    4 
 include/linux/user_namespace.h          |    2 
 include/uapi/linux/keyctl.h             |   13 +
 kernel/user.c                           |    3 
 kernel/user_namespace.c                 |    2 
 samples/watch_queue/Makefile            |    2 
 samples/watch_queue/key_req_intercept.c |  271 +++++++++++++++++++++++++
 security/keys/Makefile                  |    2 
 security/keys/compat.c                  |    3 
 security/keys/internal.h                |    5 
 security/keys/keyctl.c                  |    6 +
 security/keys/keyring.c                 |    1 
 security/keys/process_keys.c            |    2 
 security/keys/request_key.c             |   16 +
 security/keys/request_key_auth.c        |    3 
 security/keys/service.c                 |  337 +++++++++++++++++++++++++++++++
 16 files changed, 663 insertions(+), 9 deletions(-)
 create mode 100644 samples/watch_queue/key_req_intercept.c
 create mode 100644 security/keys/service.c

diff --git a/include/linux/key-type.h b/include/linux/key-type.h
index 7d985a1dfe4a..6218688eb254 100644
--- a/include/linux/key-type.h
+++ b/include/linux/key-type.h
@@ -62,8 +62,10 @@ struct key_match_data {
  * kernel managed key type definition
  */
 struct key_type {
+	struct module *owner;
+
 	/* name of the type */
-	const char *name;
+	const char name[24];
 
 	/* default payload length for quota precalculation (optional)
 	 * - this can be used instead of calling key_payload_reserve(), that
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index 64cf8ebdc4ec..7691778aff3a 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -73,6 +73,8 @@ struct user_namespace {
 	struct list_head	keyring_name_list;
 	struct key		*user_keyring_register;
 	struct rw_semaphore	keyring_sem;
+	struct hlist_head	request_key_services;
+	spinlock_t		request_key_services_lock;
 #endif
 
 	/* Register of per-UID persistent keyrings for this namespace */
diff --git a/include/uapi/linux/keyctl.h b/include/uapi/linux/keyctl.h
index 4c8884eea808..a2e553df139b 100644
--- a/include/uapi/linux/keyctl.h
+++ b/include/uapi/linux/keyctl.h
@@ -36,6 +36,18 @@
 #define KEY_REQKEY_DEFL_GROUP_KEYRING		6
 #define KEY_REQKEY_DEFL_REQUESTOR_KEYRING	7
 
+/* Request_key service daemon namespace selection specifiers. */
+#define KEY_SERVICE_NS_UTS		0x0001
+#define KEY_SERVICE_NS_IPC		0x0002
+#define KEY_SERVICE_NS_MNT		0x0004
+#define KEY_SERVICE_NS_PID		0x0008
+#define KEY_SERVICE_NS_NET		0x0010
+#define KEY_SERVICE_NS_CGROUP		0x0020
+#define KEY_SERVICE___ALL_NS		0x003f
+
+#define KEY_SERVICE_FD_CLOEXEC		0x0001
+#define KEY_SERVICE_FD_NONBLOCK		0x0002
+
 /* keyctl commands */
 #define KEYCTL_GET_KEYRING_ID		0	/* ask for a keyring's ID */
 #define KEYCTL_JOIN_SESSION_KEYRING	1	/* join or start named session keyring */
@@ -70,6 +82,7 @@
 #define KEYCTL_MOVE			30	/* Move keys between keyrings */
 #define KEYCTL_CAPABILITIES		31	/* Find capabilities of keyrings subsystem */
 #define KEYCTL_WATCH_KEY		32	/* Watch a key or ring of keys for changes */
+#define KEYCTL_SERVICE_INTERCEPT	33	/* Intercept request_key services on a user_ns */
 
 /* keyctl structures */
 struct keyctl_dh_params {
diff --git a/kernel/user.c b/kernel/user.c
index 78ee75f4cd21..0362d738286a 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -71,6 +71,9 @@ struct user_namespace init_user_ns = {
 #ifdef CONFIG_KEYS
 	.keyring_name_list = LIST_HEAD_INIT(init_user_ns.keyring_name_list),
 	.keyring_sem = __RWSEM_INITIALIZER(init_user_ns.keyring_sem),
+	.request_key_services = HLIST_HEAD_INIT,
+	.request_key_services_lock =
+	__SPIN_LOCK_UNLOCKED(init_user_ns.request_key_services_lock),
 #endif
 };
 EXPORT_SYMBOL_GPL(init_user_ns);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index f60cf7b5973c..9aebb4be4c00 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -130,6 +130,8 @@ int create_user_ns(struct cred *new)
 #ifdef CONFIG_KEYS
 	INIT_LIST_HEAD(&ns->keyring_name_list);
 	init_rwsem(&ns->keyring_sem);
+	INIT_HLIST_HEAD(&ns->request_key_services);
+	spin_lock_init(&ns->request_key_services_lock);
 #endif
 	ret = -ENOMEM;
 	if (!setup_userns_sysctls(ns))
diff --git a/samples/watch_queue/Makefile b/samples/watch_queue/Makefile
index c0db3a6bc524..4aaa2e2f67a0 100644
--- a/samples/watch_queue/Makefile
+++ b/samples/watch_queue/Makefile
@@ -1,4 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 userprogs-always-y += watch_test
+userprogs-always-y += key_req_intercept
 
 userccflags += -I usr/include
+userldflags += -lkeyutils
diff --git a/samples/watch_queue/key_req_intercept.c b/samples/watch_queue/key_req_intercept.c
new file mode 100644
index 000000000000..c187024a0ce0
--- /dev/null
+++ b/samples/watch_queue/key_req_intercept.c
@@ -0,0 +1,271 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Intercept request_key upcalls
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#define _GNU_SOURCE
+#include <stdbool.h>
+#include <stdarg.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <signal.h>
+#include <unistd.h>
+#include <errno.h>
+#include <sys/ioctl.h>
+#include <limits.h>
+#include <keyutils.h>
+#include <linux/watch_queue.h>
+#include <linux/unistd.h>
+
+#ifndef KEYCTL_WATCH_KEY
+#define KEYCTL_WATCH_KEY 32
+#endif
+#ifndef KEYCTL_SERVICE_INTERCEPT
+#define KEYCTL_SERVICE_INTERCEPT 33
+#endif
+#ifndef __NR_keyctl
+#define __NR_keyctl -1
+#endif
+
+#define BUF_SIZE 256
+
+typedef int key_serial_t;
+key_serial_t queue_keyring;
+
+static long keyctl_watch_key(int key, int watch_fd, int watch_id)
+{
+	return syscall(__NR_keyctl, KEYCTL_WATCH_KEY, key, watch_fd, watch_id);
+}
+
+static long keyctl_service_intercept(int queue_keyring, int userns_fd,
+				     const char *type_name, unsigned int ns_mask)
+{
+	return syscall(__NR_keyctl, KEYCTL_SERVICE_INTERCEPT,
+		       queue_keyring, userns_fd, type_name, ns_mask);
+}
+
+static const char auth_key_type[] = ".request_key_auth;";
+
+/*
+ * Instantiate a key.
+ */
+static void do_instantiation(key_serial_t key, char *desc)
+{
+	printf("INSTANTIATE %u '%s'\n", key, desc);
+
+	if (keyctl_assume_authority(key) == -1) {
+		perror("keyctl_assume_authority");
+		exit(1);
+	}
+
+	if (keyctl_reject(key, 20, ENOANO, 0) == -1) {
+		perror("keyctl_reject");
+		exit(1);
+	}
+}
+
+/*
+ * Process a notification.
+ */
+static void process_request(struct watch_notification *n, size_t len)
+{
+	struct key_notification *k = (struct key_notification *)n;
+	key_serial_t auth_key, key;
+	char desc[1024], *p;
+
+	if (len != sizeof(struct key_notification)) {
+		fprintf(stderr, "Incorrect key message length\n");
+		return;
+	}
+
+	auth_key = k->aux;
+	printf("REQUEST %d aux=%d\n", k->key_id, k->aux);
+
+	if (keyctl_describe(auth_key, desc, sizeof(desc)) == -1) {
+		perror("keyctl_describe(auth_key)");
+		exit(1);
+	}
+
+	printf("AUTH_KEY '%s'\n", desc);
+	if (memcmp(desc, auth_key_type, sizeof(auth_key_type) - 1) != 0) {
+		printf("NOT AUTH_KEY TYPE\n");
+	} else {
+		p = strrchr(desc, ';');
+		if (p) {
+			key = strtoul(p + 1, NULL, 16);
+			printf("KEY '%d'\n", key);
+
+			if (keyctl_describe(key, desc, sizeof(desc)) == -1) {
+				perror("keyctl_describe(key)");
+				exit(1);
+			}
+
+			do_instantiation(key, desc);
+			return;
+		}
+	}
+
+	/* Shouldn't need to do this if we successfully instantiated/rejected
+	 * the target key.
+	 */
+	if (keyctl_unlink(auth_key, queue_keyring) == -1)
+		perror("keyctl_unlink");
+}
+
+/*
+ * Consume and display events.
+ */
+static void consumer(int fd)
+{
+	unsigned char buffer[433], *p, *end;
+	union {
+		struct watch_notification n;
+		unsigned char buf1[128];
+	} n;
+	ssize_t buf_len;
+
+	for (;;) {
+		buf_len = read(fd, buffer, sizeof(buffer));
+		if (buf_len == -1) {
+			perror("read");
+			exit(1);
+		}
+
+		if (buf_len == 0) {
+			printf("-- END --\n");
+			return;
+		}
+
+		if (buf_len > sizeof(buffer)) {
+			fprintf(stderr, "Read buffer overrun: %zd\n", buf_len);
+			return;
+		}
+
+		printf("read() = %zd\n", buf_len);
+
+		p = buffer;
+		end = buffer + buf_len;
+		while (p < end) {
+			size_t largest, len;
+
+			largest = end - p;
+			if (largest > 128)
+				largest = 128;
+			if (largest < sizeof(struct watch_notification)) {
+				fprintf(stderr, "Short message header: %zu\n", largest);
+				return;
+			}
+			memcpy(&n, p, largest);
+
+			printf("NOTIFY[%03zx]: ty=%06x sy=%02x i=%08x\n",
+			       p - buffer, n.n.type, n.n.subtype, n.n.info);
+
+			len = n.n.info & WATCH_INFO_LENGTH;
+			if (len < sizeof(n.n) || len > largest) {
+				fprintf(stderr, "Bad message length: %zu/%zu\n", len, largest);
+				exit(1);
+			}
+
+			switch (n.n.type) {
+			case WATCH_TYPE_META:
+				switch (n.n.subtype) {
+				case WATCH_META_REMOVAL_NOTIFICATION:
+					printf("REMOVAL of watchpoint %08x\n",
+					       (n.n.info & WATCH_INFO_ID) >>
+					       WATCH_INFO_ID__SHIFT);
+					break;
+				case WATCH_META_LOSS_NOTIFICATION:
+					printf("-- LOSS --\n");
+					break;
+				default:
+					printf("other meta record\n");
+					break;
+				}
+				break;
+			case WATCH_TYPE_KEY_NOTIFY:
+				switch (n.n.subtype) {
+				case NOTIFY_KEY_LINKED:
+					process_request(&n.n, len);
+					break;
+				default:
+					printf("other key subtype\n");
+					break;
+				}
+				break;
+			default:
+				printf("other type\n");
+				break;
+			}
+
+			p += len;
+		}
+	}
+}
+
+static struct watch_notification_filter filter = {
+	.nr_filters	= 1,
+	.filters = {
+		[0]	= {
+			.type			= WATCH_TYPE_KEY_NOTIFY,
+			.subtype_filter[0]	= (1 << NOTIFY_KEY_LINKED),
+		},
+	},
+};
+
+static void cleanup(void)
+{
+	printf("--- clean up ---\n");
+	if (keyctl_service_intercept(0, -1, "user", 0) == -1)
+		perror("unintercept");
+	if (keyctl_clear(queue_keyring) == -1)
+		perror("clear");
+	if (keyctl_unlink(queue_keyring, KEY_SPEC_SESSION_KEYRING) == -1)
+		perror("unlink/q");
+}
+
+int main(int argc, char **argv)
+{
+	int pipefd[2], fd;
+
+	queue_keyring = add_key("keyring", "intercept", NULL, 0, KEY_SPEC_SESSION_KEYRING);
+	if (queue_keyring == -1) {
+		perror("add_key");
+		exit(1);
+	}
+
+	printf("QUEUE KEYRING %d\n", queue_keyring);
+
+	if (pipe2(pipefd, O_NOTIFICATION_PIPE) == -1) {
+		perror("pipe2");
+		exit(1);
+	}
+	fd = pipefd[0];
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE) == -1) {
+		perror("watch_queue(size)");
+		exit(1);
+	}
+
+	if (ioctl(fd, IOC_WATCH_QUEUE_SET_FILTER, &filter) == -1) {
+		perror("watch_queue(filter)");
+		exit(1);
+	}
+
+	if (keyctl_watch_key(queue_keyring, fd, 0x01) == -1) {
+		perror("keyctl_watch_key");
+		exit(1);
+	}
+
+	if (keyctl_service_intercept(queue_keyring, -1, "user", 0) == -1) {
+		perror("keyctl_service_intercept");
+		exit(1);
+	}
+
+	atexit(cleanup);
+
+	consumer(fd);
+	exit(0);
+}
diff --git a/security/keys/Makefile b/security/keys/Makefile
index 5f40807f05b3..3626340df84b 100644
--- a/security/keys/Makefile
+++ b/security/keys/Makefile
@@ -15,7 +15,9 @@ obj-y := \
 	process_keys.o \
 	request_key.o \
 	request_key_auth.o \
+	service.o \
 	user_defined.o
+
 compat-obj-$(CONFIG_KEY_DH_OPERATIONS) += compat_dh.o
 obj-$(CONFIG_COMPAT) += compat.o $(compat-obj-y)
 obj-$(CONFIG_PROC_FS) += proc.o
diff --git a/security/keys/compat.c b/security/keys/compat.c
index 1545efdca562..39c7b1b2a7c7 100644
--- a/security/keys/compat.c
+++ b/security/keys/compat.c
@@ -126,6 +126,9 @@ COMPAT_SYSCALL_DEFINE5(keyctl, u32, option,
 	case KEYCTL_WATCH_KEY:
 		return keyctl_watch_key(arg2, arg3, arg4);
 
+	case KEYCTL_SERVICE_INTERCEPT:
+		return keyctl_service_intercept(arg2, arg3, compat_ptr(arg4), arg5);
+
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 9b9cf3b6fcbb..b777ef755626 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -150,6 +150,7 @@ extern struct key *find_keyring_by_name(const char *name, bool uid_keyring);
 
 extern int look_up_user_keyrings(struct key **, struct key **);
 extern struct key *get_user_session_keyring_rcu(const struct cred *);
+extern int install_thread_keyring(void);
 extern int install_thread_keyring_to_cred(struct cred *);
 extern int install_process_keyring_to_cred(struct cred *);
 extern int install_session_keyring_to_cred(struct cred *, struct key *);
@@ -183,6 +184,8 @@ extern void key_gc_keytype(struct key_type *ktype);
 extern int key_task_permission(const key_ref_t key_ref,
 			       const struct cred *cred,
 			       enum key_need_perm need_perm);
+extern int queue_request_key(struct key *key, struct key *auth_key);
+extern void clear_request_key_services(struct user_namespace *ns);
 
 static inline void notify_key(struct key *key,
 			      enum key_notification_subtype subtype, u32 aux)
@@ -265,6 +268,8 @@ extern long keyctl_invalidate_key(key_serial_t);
 extern long keyctl_restrict_keyring(key_serial_t id,
 				    const char __user *_type,
 				    const char __user *_restriction);
+extern long keyctl_service_intercept(int, int, const char __user *, unsigned int);
+
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 extern long keyctl_get_persistent(uid_t, key_serial_t);
 extern unsigned persistent_keyring_expiry;
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index 96a92a645216..5cac7979c5c0 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -2015,6 +2015,12 @@ SYSCALL_DEFINE5(keyctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case KEYCTL_WATCH_KEY:
 		return keyctl_watch_key((key_serial_t)arg2, (int)arg3, (int)arg4);
 
+	case KEYCTL_SERVICE_INTERCEPT:
+		return keyctl_service_intercept((key_serial_t)arg2,
+						(int)arg3,
+						(const char __user *)arg4,
+						(unsigned int)arg5);
+
 	default:
 		return -EOPNOTSUPP;
 	}
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index 5e6a90760753..7ab9e5130f18 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -60,6 +60,7 @@ void key_free_user_ns(struct user_namespace *ns)
 	list_del_init(&ns->keyring_name_list);
 	write_unlock(&keyring_name_lock);
 
+	clear_request_key_services(ns);
 	key_put(ns->user_keyring_register);
 #ifdef CONFIG_PERSISTENT_KEYRINGS
 	key_put(ns->persistent_keyring_register);
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index e3d79a7b6db6..3b515781bc8b 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -241,7 +241,7 @@ int install_thread_keyring_to_cred(struct cred *new)
  *
  * Return: 0 if a thread keyring is now present; -errno on failure.
  */
-static int install_thread_keyring(void)
+int install_thread_keyring(void)
 {
 	struct cred *new;
 	int ret;
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index 2da4404276f0..81fe5aa02dee 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -14,6 +14,7 @@
 #include <linux/keyctl.h>
 #include <linux/slab.h>
 #include <net/net_namespace.h>
+#include <linux/init_task.h>
 #include "internal.h"
 #include <keys/request_key_auth-type.h>
 
@@ -112,7 +113,7 @@ static int call_usermodehelper_keys(const char *path, char **argv, char **envp,
  * Request userspace finish the construction of a key
  * - execute "/sbin/request-key <op> <key> <uid> <gid> <keyring> <keyring> <keyring>"
  */
-static int call_sbin_request_key(struct key *authkey, void *aux)
+static int call_sbin_request_key(struct key *authkey)
 {
 	static char const request_key[] = "/sbin/request-key";
 	struct request_key_auth *rka = get_request_key_auth(authkey);
@@ -224,7 +225,6 @@ static int construct_key(struct key *key, const void *callout_info,
 			 size_t callout_len, void *aux,
 			 struct key *dest_keyring)
 {
-	request_key_actor_t actor;
 	struct key *authkey;
 	int ret;
 
@@ -237,11 +237,13 @@ static int construct_key(struct key *key, const void *callout_info,
 		return PTR_ERR(authkey);
 
 	/* Make the call */
-	actor = call_sbin_request_key;
-	if (key->type->request_key)
-		actor = key->type->request_key;
-
-	ret = actor(authkey, aux);
+	if (key->type->request_key) {
+		ret = key->type->request_key(authkey, aux);
+	} else {
+		ret = queue_request_key(key, authkey);
+		if (ret == -ENOPARAM)
+			ret = call_sbin_request_key(authkey);
+	}
 
 	/* check that the actor called complete_request_key() prior to
 	 * returning an error */
diff --git a/security/keys/request_key_auth.c b/security/keys/request_key_auth.c
index 41e9735006d0..8379dfd15395 100644
--- a/security/keys/request_key_auth.c
+++ b/security/keys/request_key_auth.c
@@ -152,6 +152,9 @@ static void request_key_auth_destroy(struct key *key)
 		rcu_assign_keypointer(key, NULL);
 		call_rcu(&rka->rcu, request_key_auth_rcu_disposal);
 	}
+
+	if (key_read_state(rka->target_key) == KEY_IS_UNINSTANTIATED)
+		key_reject_and_link(rka->target_key, 0, -ENOKEY, NULL, NULL);
 }
 
 /*
diff --git a/security/keys/service.c b/security/keys/service.c
new file mode 100644
index 000000000000..cea8ae993bea
--- /dev/null
+++ b/security/keys/service.c
@@ -0,0 +1,337 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Service daemon interface
+ *
+ * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/key.h>
+#include <linux/key-type.h>
+#include <linux/utsname.h>
+#include <linux/ipc_namespace.h>
+#include <linux/mnt_namespace.h>
+#include <linux/pid_namespace.h>
+#include <linux/cgroup.h>
+#include <net/net_namespace.h>
+#include "internal.h"
+#include "../fs/mount.h"
+
+/*
+ * Request service filter record.
+ */
+struct request_key_service {
+	struct hlist_node	user_ns_link;	/* Link in the user_ns service list */
+	struct key		*queue_keyring;	/* Keyring into which requests are placed */
+
+	/* The following fields define the selection criteria that we select
+	 * this record on.  All these references must be pinned just in case
+	 * the fd gets passed to another process or the owning process changes
+	 * its own namespaces.
+	 *
+	 * Most of the criteria can be NULL if that criterion is irrelevant to
+	 * the filter.
+	 */
+	char			type[24];	/* Key type of interest (or "") */
+	struct ns_tag		*uts_ns;	/* Matching UTS namespace (or NULL) */
+	struct ns_tag		*ipc_ns;	/* Matching IPC namespace (or NULL) */
+	struct ns_tag		*mnt_ns;	/* Matching mount namespace (or NULL) */
+	struct ns_tag		*pid_ns;	/* Matching process namespace (or NULL) */
+	struct ns_tag		*net_ns;	/* Matching network namespace (or NULL) */
+	struct ns_tag		*cgroup_ns;	/* Matching cgroup namespace (or NULL) */
+	u8			selectivity;	/* Number of exact-match fields */
+	bool			dead;
+};
+
+/*
+ * Free a request_key service record
+ */
+static void free_key_service(struct request_key_service *svc)
+{
+	if (svc) {
+		put_ns_tag(svc->uts_ns);
+		put_ns_tag(svc->ipc_ns);
+		put_ns_tag(svc->mnt_ns);
+		put_ns_tag(svc->pid_ns);
+		put_ns_tag(svc->net_ns);
+		put_ns_tag(svc->cgroup_ns);
+		key_put(svc->queue_keyring);
+		kfree(svc);
+	}
+}
+
+/*
+ * Allocate a service record.
+ */
+static struct request_key_service *alloc_key_service(key_serial_t queue_keyring,
+						     const char __user *type_name,
+						     unsigned int ns_mask)
+{
+	struct request_key_service *svc;
+	struct key_type *type;
+	key_ref_t key_ref;
+	int ret;
+	u8 selectivity = 0;
+
+	svc = kzalloc(sizeof(struct request_key_service), GFP_KERNEL);
+	if (!svc)
+		return ERR_PTR(-ENOMEM);
+
+	if (queue_keyring != 0) {
+		key_ref = lookup_user_key(queue_keyring, 0, KEY_NEED_SEARCH);
+		if (IS_ERR(key_ref)) {
+			ret = PTR_ERR(key_ref);
+			goto err_svc;
+		}
+
+		svc->queue_keyring = key_ref_to_ptr(key_ref);
+	}
+
+	/* Save the matching criteria.  Anything the caller doesn't care about
+	 * we leave as NULL.
+	 */
+	if (type_name) {
+		ret = strncpy_from_user(svc->type, type_name, sizeof(svc->type));
+		if (ret < 0)
+			goto err_keyring;
+		if (ret >= sizeof(svc->type)) {
+			ret = -EINVAL;
+			goto err_keyring;
+		}
+
+		type = key_type_lookup(type_name);
+		if (IS_ERR(type)) {
+			ret = -EINVAL;
+			goto err_keyring;
+		}
+		memcpy(svc->type, type->name, sizeof(svc->type));
+		key_type_put(type);
+	}
+
+	if (ns_mask & KEY_SERVICE_NS_UTS) {
+		svc->uts_ns = get_ns_tag(current->nsproxy->uts_ns->ns.tag);
+		selectivity++;
+	}
+	if (ns_mask & KEY_SERVICE_NS_IPC) {
+		svc->ipc_ns = get_ns_tag(current->nsproxy->ipc_ns->ns.tag);
+		selectivity++;
+	}
+	if (ns_mask & KEY_SERVICE_NS_MNT) {
+		svc->mnt_ns = get_ns_tag(current->nsproxy->mnt_ns->ns.tag);
+		selectivity++;
+	}
+	if (ns_mask & KEY_SERVICE_NS_PID) {
+		svc->pid_ns = get_ns_tag(task_active_pid_ns(current)->ns.tag);
+		selectivity++;
+	}
+	if (ns_mask & KEY_SERVICE_NS_NET) {
+		svc->net_ns = get_ns_tag(current->nsproxy->net_ns->ns.tag);
+		selectivity++;
+	}
+	if (ns_mask & KEY_SERVICE_NS_CGROUP) {
+		svc->cgroup_ns = get_ns_tag(current->nsproxy->cgroup_ns->ns.tag);
+		selectivity++;
+	}
+
+	svc->selectivity = selectivity;
+	return svc;
+
+err_keyring:
+	key_put(svc->queue_keyring);
+err_svc:
+	kfree(svc);
+	return ERR_PTR(ret);
+}
+
+/*
+ * Install a request_key service into the user namespace's list
+ */
+static int install_key_service(struct user_namespace *user_ns,
+			       struct request_key_service *svc)
+{
+	struct request_key_service *p;
+	struct hlist_node **pp;
+	int ret = 0;
+
+	spin_lock(&user_ns->request_key_services_lock);
+
+	/* The services list is kept in order of selectivity.  The more exact
+	 * matches a service requires, the earlier it is in the list.
+	 */
+	for (pp = &user_ns->request_key_services.first; *pp; pp = &(*pp)->next) {
+		p = hlist_entry(*pp, struct request_key_service, user_ns_link);
+		if (p->selectivity < svc->selectivity)
+			goto insert_before;
+		if (p->selectivity > svc->selectivity)
+			continue;
+		if (memcmp(p->type, svc->type, sizeof(p->type)) == 0 &&
+		    p->uts_ns == svc->uts_ns &&
+		    p->ipc_ns == svc->ipc_ns &&
+		    p->mnt_ns == svc->mnt_ns &&
+		    p->pid_ns == svc->pid_ns &&
+		    p->net_ns == svc->net_ns &&
+		    p->cgroup_ns == svc->cgroup_ns)
+			goto duplicate;
+	}
+
+	svc->user_ns_link.pprev = pp;
+	rcu_assign_pointer(*pp, &svc->user_ns_link);
+	goto out;
+
+insert_before:
+	hlist_add_before_rcu(&svc->user_ns_link, &p->user_ns_link);
+	goto out;
+
+duplicate:
+	free_key_service(svc);
+	ret = -EEXIST;
+out:
+	spin_unlock(&user_ns->request_key_services_lock);
+	return ret;
+}
+
+/*
+ * Remove a request_key service interception from the user namespace's list
+ */
+static int remove_key_service(struct user_namespace *user_ns,
+			      struct request_key_service *svc)
+{
+	struct request_key_service *p;
+	struct hlist_node **pp;
+	int ret = 0;
+
+	spin_lock(&user_ns->request_key_services_lock);
+
+	/* The services list is kept in order of selectivity.  The more exact
+	 * matches a service requires, the earlier it is in the list.
+	 */
+	for (pp = &user_ns->request_key_services.first; *pp; pp = &(*pp)->next) {
+		p = hlist_entry(*pp, struct request_key_service, user_ns_link);
+		if (p->selectivity < svc->selectivity)
+			break;
+		if (p->selectivity > svc->selectivity)
+			continue;
+		if (memcmp(p->type, svc->type, sizeof(p->type)) == 0 &&
+		    p->uts_ns == svc->uts_ns &&
+		    p->ipc_ns == svc->ipc_ns &&
+		    p->mnt_ns == svc->mnt_ns &&
+		    p->pid_ns == svc->pid_ns &&
+		    p->net_ns == svc->net_ns &&
+		    p->cgroup_ns == svc->cgroup_ns)
+			goto found;
+	}
+
+	p = NULL;
+	ret = -ENOENT;
+	goto out;
+
+found:
+	hlist_del_rcu(&p->user_ns_link);
+out:
+	spin_unlock(&user_ns->request_key_services_lock);
+	free_key_service(p);
+	free_key_service(svc);
+	return ret;
+}
+
+/*
+ * Add a request_key service handler for a subset of the calling process's
+ * particular set of namespaces.
+ */
+long keyctl_service_intercept(key_serial_t queue_keyring,
+			      int userns_fd,
+			      const char __user *type_name,
+			      unsigned int ns_mask)
+{
+	struct request_key_service *svc;
+	struct user_namespace *user_ns = current_user_ns();
+
+	if (ns_mask & ~KEY_SERVICE___ALL_NS)
+		return -EINVAL;
+	if (userns_fd != -1)
+		return -EINVAL; /* Not supported yet */
+
+	/* Require the caller to be the owner of the user namespace in which
+	 * the fd was created if they're not the sysadmin.  Possibly we should
+	 * be more strict about what namespaces one can select, but it's not
+	 * clear how best to do that.
+	 */
+	if (!capable(CAP_SYS_ADMIN) &&
+	    !uid_eq(user_ns->owner, current_cred()->euid))
+		return -EPERM;
+
+	svc = alloc_key_service(queue_keyring, type_name, ns_mask);
+	if (IS_ERR(svc))
+		return PTR_ERR(svc);
+
+	if (queue_keyring == 0)
+		return remove_key_service(user_ns, svc);
+
+	return install_key_service(user_ns, svc);
+}
+
+/*
+ * Queue a construction record if we can find a queue to punt it off to.
+ */
+int queue_request_key(struct key *key, struct key *auth_key)
+{
+	struct request_key_service *svc;
+	struct user_namespace *user_ns = current_user_ns();
+	struct pid_namespace *pid_ns = task_active_pid_ns(current);
+	struct nsproxy *nsproxy = current->nsproxy;
+	struct key *queue_keyring = NULL;
+	int ret;
+
+	if (hlist_empty(&user_ns->request_key_services))
+		return false;
+
+	rcu_read_lock();
+
+	hlist_for_each_entry_rcu(svc, &user_ns->request_key_services, user_ns_link) {
+		if (svc->type[0] &&
+		    memcmp(svc->type, key->type->name, sizeof(svc->type)) != 0)
+			continue;
+		if ((svc->uts_ns && svc->uts_ns != nsproxy->uts_ns->ns.tag) ||
+		    (svc->ipc_ns && svc->ipc_ns != nsproxy->ipc_ns->ns.tag) ||
+		    (svc->mnt_ns && svc->mnt_ns != nsproxy->mnt_ns->ns.tag) ||
+		    (svc->pid_ns && svc->pid_ns != pid_ns->ns.tag) ||
+		    (svc->net_ns && svc->net_ns != nsproxy->net_ns->ns.tag) ||
+		    (svc->cgroup_ns && svc->cgroup_ns != nsproxy->cgroup_ns->ns.tag))
+			continue;
+		goto found_match;
+	}
+
+	rcu_read_unlock();
+	return -ENOPARAM;
+
+found_match:
+	spin_lock(&user_ns->request_key_services_lock);
+	if (!svc->dead)
+		queue_keyring = key_get(svc->queue_keyring);
+	spin_unlock(&user_ns->request_key_services_lock);
+	rcu_read_unlock();
+
+	ret = -ENOPARAM;
+	if (queue_keyring) {
+		ret = key_link(queue_keyring, auth_key);
+		if (ret < 0)
+			key_reject_and_link(key, 0, ret, NULL, auth_key);
+		key_put(queue_keyring);
+	}
+
+	return ret;
+}
+
+/*
+ * Clear all the service intercept records on a user namespace.
+ */
+void clear_request_key_services(struct user_namespace *user_ns)
+{
+	struct request_key_service *svc;
+
+	while (!hlist_empty(&user_ns->request_key_services)) {
+		svc = hlist_entry(user_ns->request_key_services.first,
+				  struct request_key_service, user_ns_link);
+		hlist_del(&svc->user_ns_link);
+		free_key_service(svc);
+	}
+}



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] keys: Allow request_key upcalls from a container to be intercepted
  2021-02-04 17:47 ` [PATCH 2/2] keys: Allow request_key upcalls from a container to be intercepted David Howells
@ 2021-02-04 19:55   ` kernel test robot
  0 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-02-04 19:55 UTC (permalink / raw)
  To: David Howells, sprabhu
  Cc: kbuild-all, dhowells, Jarkko Sakkinen, christian, selinux,
	keyrings, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 6615 bytes --]

Hi David,

I love your patch! Perhaps something to improve:

[auto build test WARNING on cgroup/for-next]
[also build test WARNING on dhowells-fs/fscache-next linus/master v5.11-rc6]
[cannot apply to security/next-testing tip/timers/core next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Howells/keys-request_key-interception-in-containers/20210205-015946
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
config: powerpc64-randconfig-s031-20210204 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-215-g0fb77bb6-dirty
        # https://github.com/0day-ci/linux/commit/6d049eb50238910e375143259391790a8b69ebc6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Howells/keys-request_key-interception-in-containers/20210205-015946
        git checkout 6d049eb50238910e375143259391790a8b69ebc6
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


"sparse warnings: (new ones prefixed by >>)"
>> security/keys/service.c:101:40: sparse: sparse: incorrect type in argument 1 (different address spaces) @@     expected char const *type @@     got char const [noderef] __user *type_name @@
   security/keys/service.c:101:40: sparse:     expected char const *type
   security/keys/service.c:101:40: sparse:     got char const [noderef] __user *type_name
>> security/keys/service.c:177:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> security/keys/service.c:177:9: sparse:    struct hlist_node [noderef] __rcu *
>> security/keys/service.c:177:9: sparse:    struct hlist_node *

vim +101 security/keys/service.c

    61	
    62	/*
    63	 * Allocate a service record.
    64	 */
    65	static struct request_key_service *alloc_key_service(key_serial_t queue_keyring,
    66							     const char __user *type_name,
    67							     unsigned int ns_mask)
    68	{
    69		struct request_key_service *svc;
    70		struct key_type *type;
    71		key_ref_t key_ref;
    72		int ret;
    73		u8 selectivity = 0;
    74	
    75		svc = kzalloc(sizeof(struct request_key_service), GFP_KERNEL);
    76		if (!svc)
    77			return ERR_PTR(-ENOMEM);
    78	
    79		if (queue_keyring != 0) {
    80			key_ref = lookup_user_key(queue_keyring, 0, KEY_NEED_SEARCH);
    81			if (IS_ERR(key_ref)) {
    82				ret = PTR_ERR(key_ref);
    83				goto err_svc;
    84			}
    85	
    86			svc->queue_keyring = key_ref_to_ptr(key_ref);
    87		}
    88	
    89		/* Save the matching criteria.  Anything the caller doesn't care about
    90		 * we leave as NULL.
    91		 */
    92		if (type_name) {
    93			ret = strncpy_from_user(svc->type, type_name, sizeof(svc->type));
    94			if (ret < 0)
    95				goto err_keyring;
    96			if (ret >= sizeof(svc->type)) {
    97				ret = -EINVAL;
    98				goto err_keyring;
    99			}
   100	
 > 101			type = key_type_lookup(type_name);
   102			if (IS_ERR(type)) {
   103				ret = -EINVAL;
   104				goto err_keyring;
   105			}
   106			memcpy(svc->type, type->name, sizeof(svc->type));
   107			key_type_put(type);
   108		}
   109	
   110		if (ns_mask & KEY_SERVICE_NS_UTS) {
   111			svc->uts_ns = get_ns_tag(current->nsproxy->uts_ns->ns.tag);
   112			selectivity++;
   113		}
   114		if (ns_mask & KEY_SERVICE_NS_IPC) {
   115			svc->ipc_ns = get_ns_tag(current->nsproxy->ipc_ns->ns.tag);
   116			selectivity++;
   117		}
   118		if (ns_mask & KEY_SERVICE_NS_MNT) {
   119			svc->mnt_ns = get_ns_tag(current->nsproxy->mnt_ns->ns.tag);
   120			selectivity++;
   121		}
   122		if (ns_mask & KEY_SERVICE_NS_PID) {
   123			svc->pid_ns = get_ns_tag(task_active_pid_ns(current)->ns.tag);
   124			selectivity++;
   125		}
   126		if (ns_mask & KEY_SERVICE_NS_NET) {
   127			svc->net_ns = get_ns_tag(current->nsproxy->net_ns->ns.tag);
   128			selectivity++;
   129		}
   130		if (ns_mask & KEY_SERVICE_NS_CGROUP) {
   131			svc->cgroup_ns = get_ns_tag(current->nsproxy->cgroup_ns->ns.tag);
   132			selectivity++;
   133		}
   134	
   135		svc->selectivity = selectivity;
   136		return svc;
   137	
   138	err_keyring:
   139		key_put(svc->queue_keyring);
   140	err_svc:
   141		kfree(svc);
   142		return ERR_PTR(ret);
   143	}
   144	
   145	/*
   146	 * Install a request_key service into the user namespace's list
   147	 */
   148	static int install_key_service(struct user_namespace *user_ns,
   149				       struct request_key_service *svc)
   150	{
   151		struct request_key_service *p;
   152		struct hlist_node **pp;
   153		int ret = 0;
   154	
   155		spin_lock(&user_ns->request_key_services_lock);
   156	
   157		/* The services list is kept in order of selectivity.  The more exact
   158		 * matches a service requires, the earlier it is in the list.
   159		 */
   160		for (pp = &user_ns->request_key_services.first; *pp; pp = &(*pp)->next) {
   161			p = hlist_entry(*pp, struct request_key_service, user_ns_link);
   162			if (p->selectivity < svc->selectivity)
   163				goto insert_before;
   164			if (p->selectivity > svc->selectivity)
   165				continue;
   166			if (memcmp(p->type, svc->type, sizeof(p->type)) == 0 &&
   167			    p->uts_ns == svc->uts_ns &&
   168			    p->ipc_ns == svc->ipc_ns &&
   169			    p->mnt_ns == svc->mnt_ns &&
   170			    p->pid_ns == svc->pid_ns &&
   171			    p->net_ns == svc->net_ns &&
   172			    p->cgroup_ns == svc->cgroup_ns)
   173				goto duplicate;
   174		}
   175	
   176		svc->user_ns_link.pprev = pp;
 > 177		rcu_assign_pointer(*pp, &svc->user_ns_link);
   178		goto out;
   179	
   180	insert_before:
   181		hlist_add_before_rcu(&svc->user_ns_link, &p->user_ns_link);
   182		goto out;
   183	
   184	duplicate:
   185		free_key_service(svc);
   186		ret = -EEXIST;
   187	out:
   188		spin_unlock(&user_ns->request_key_services_lock);
   189		return ret;
   190	}
   191	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 23971 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns
  2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
@ 2021-02-04 20:14   ` kernel test robot
  2021-02-04 20:58   ` kernel test robot
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-02-04 20:14 UTC (permalink / raw)
  To: David Howells, sprabhu
  Cc: kbuild-all, dhowells, Jarkko Sakkinen, christian, selinux,
	keyrings, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1747 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on cgroup/for-next]
[also build test ERROR on dhowells-fs/fscache-next linus/master v5.11-rc6]
[cannot apply to security/next-testing tip/timers/core next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Howells/keys-request_key-interception-in-containers/20210205-015946
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
config: parisc-randconfig-r001-20210204 (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/9ec38626d1262923d124035eeb6e2fdaa181b6c4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Howells/keys-request_key-interception-in-containers/20210205-015946
        git checkout 9ec38626d1262923d124035eeb6e2fdaa181b6c4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> hppa-linux-ld: net/core/net_namespace.o:(.data+0x108): undefined reference to `netns_operations'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26279 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns
  2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
  2021-02-04 20:14   ` kernel test robot
@ 2021-02-04 20:58   ` kernel test robot
  2021-02-05  2:46   ` Jarkko Sakkinen
  2021-02-05  8:25   ` David Howells
  3 siblings, 0 replies; 9+ messages in thread
From: kernel test robot @ 2021-02-04 20:58 UTC (permalink / raw)
  To: David Howells, sprabhu
  Cc: kbuild-all, dhowells, Jarkko Sakkinen, christian, selinux,
	keyrings, linux-api, linux-fsdevel, linux-security-module,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1727 bytes --]

Hi David,

I love your patch! Yet something to improve:

[auto build test ERROR on cgroup/for-next]
[also build test ERROR on dhowells-fs/fscache-next linus/master v5.11-rc6]
[cannot apply to security/next-testing tip/timers/core next-20210125]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/David-Howells/keys-request_key-interception-in-containers/20210205-015946
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next
config: m68k-defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/9ec38626d1262923d124035eeb6e2fdaa181b6c4
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review David-Howells/keys-request_key-interception-in-containers/20210205-015946
        git checkout 9ec38626d1262923d124035eeb6e2fdaa181b6c4
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> m68k-linux-ld: net/core/net_namespace.o:(.data+0xac): undefined reference to `netns_operations'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 17029 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns
  2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
  2021-02-04 20:14   ` kernel test robot
  2021-02-04 20:58   ` kernel test robot
@ 2021-02-05  2:46   ` Jarkko Sakkinen
  2021-02-05  8:25   ` David Howells
  3 siblings, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-05  2:46 UTC (permalink / raw)
  To: David Howells
  Cc: sprabhu, christian, selinux, keyrings, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel, containers

On Thu, Feb 04, 2021 at 05:47:39PM +0000, David Howells wrote:
> Add a ns tag struct that consists of just a refcount.  It's address can be
> used to compare namespaces without the need to pin a namespace.  Just the
> tag needs pinning.
> 
> Signed-off-by: David Howells <dhowells@redhat.com>
> ---
> 
>  fs/namespace.c            |   18 ++++++++----------
>  include/linux/ns_common.h |   23 +++++++++++++++++++++++
>  include/linux/proc_ns.h   |   38 +++++++++++++++++++++++++++++++++++---
>  init/version.c            |    9 ++++++++-
>  ipc/msgutil.c             |    7 ++++++-
>  ipc/namespace.c           |    8 +++-----
>  kernel/cgroup/cgroup.c    |    5 +++++
>  kernel/cgroup/namespace.c |    6 +++---
>  kernel/pid.c              |    5 +++++
>  kernel/pid_namespace.c    |   18 +++++++++---------
>  kernel/time/namespace.c   |   13 +++++--------
>  kernel/user.c             |    5 +++++
>  kernel/user_namespace.c   |    7 +++----
>  kernel/utsname.c          |   24 +++++++++++++-----------
>  net/core/net_namespace.c  |   38 +++++++++++++++-----------------------
>  15 files changed, 146 insertions(+), 78 deletions(-)
> 
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 9d33909d0f9e..f8da9be8c6f7 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -3238,10 +3238,9 @@ static void dec_mnt_namespaces(struct ucounts *ucounts)
>  
>  static void free_mnt_ns(struct mnt_namespace *ns)
>  {
> -	if (!is_anon_ns(ns))
> -		ns_free_inum(&ns->ns);
>  	dec_mnt_namespaces(ns->ucounts);
>  	put_user_ns(ns->user_ns);
> +	destroy_ns_common(&ns->ns);
>  	kfree(ns);
>  }
>  
> @@ -3269,18 +3268,17 @@ static struct mnt_namespace *alloc_mnt_ns(struct user_namespace *user_ns, bool a
>  		dec_mnt_namespaces(ucounts);
>  		return ERR_PTR(-ENOMEM);
>  	}
> -	if (!anon) {
> -		ret = ns_alloc_inum(&new_ns->ns);
> -		if (ret) {
> -			kfree(new_ns);
> -			dec_mnt_namespaces(ucounts);
> -			return ERR_PTR(ret);
> -		}
> +
> +	ret = init_ns_common(&new_ns->ns, anon);
> +	if (ret) {
> +		destroy_ns_common(&new_ns->ns);
> +		kfree(new_ns);
> +		dec_mnt_namespaces(ucounts);
> +		return ERR_PTR(ret);
>  	}
>  	new_ns->ns.ops = &mntns_operations;
>  	if (!anon)
>  		new_ns->seq = atomic64_add_return(1, &mnt_ns_seq);
> -	refcount_set(&new_ns->ns.count, 1);
>  	INIT_LIST_HEAD(&new_ns->list);
>  	init_waitqueue_head(&new_ns->poll);
>  	spin_lock_init(&new_ns->ns_lock);
> diff --git a/include/linux/ns_common.h b/include/linux/ns_common.h
> index 0f1d024bd958..45174ad8a435 100644
> --- a/include/linux/ns_common.h
> +++ b/include/linux/ns_common.h
> @@ -3,14 +3,37 @@
>  #define _LINUX_NS_COMMON_H
>  
>  #include <linux/refcount.h>
> +#include <linux/slab.h>
>  
>  struct proc_ns_operations;
>  
> +/*
> + * Comparable tag for namespaces so that namespaces don't have to be pinned by
> + * something that wishes to detect if a namespace matches a criterion.
> + */
> +struct ns_tag {
> +	refcount_t	usage;

Is that indentation necessary? I'd put just a space.

> +};
> +
>  struct ns_common {
>  	atomic_long_t stashed;
>  	const struct proc_ns_operations *ops;
> +	struct ns_tag *tag;
>  	unsigned int inum;
>  	refcount_t count;
>  };
>  
> +static inline struct ns_tag *get_ns_tag(struct ns_tag *tag)
> +{
> +	if (tag)
> +		refcount_inc(&tag->usage);
> +	return tag;
> +}
> +
> +static inline void put_ns_tag(struct ns_tag *tag)
> +{
> +	if (tag && refcount_dec_and_test(&tag->usage))
> +		kfree(tag);
> +}
> +
>  #endif
> diff --git a/include/linux/proc_ns.h b/include/linux/proc_ns.h
> index 75807ecef880..9fb7eb403923 100644
> --- a/include/linux/proc_ns.h
> +++ b/include/linux/proc_ns.h
> @@ -64,13 +64,45 @@ static inline void proc_free_inum(unsigned int inum) {}
>  
>  #endif /* CONFIG_PROC_FS */
>  
> -static inline int ns_alloc_inum(struct ns_common *ns)
> +/**
> + * init_ns_common - Initialise the common part of a namespace

Nit: init_ns_common()

> + * @ns: The namespace to initialise
> + * @anon: The namespace will be anonymous
> + *
> + * Set up the common part of a namespace, assigning an inode number and
> + * creating a tag.  Returns 0 on success and a negative error code on failure.
> + * On failure, the caller must call destroy_ns_common().

I've used lately (e.g. arch/x86/kernel/cpu/sgx/ioctl.c) along the lines:

* Return:
* - 0:          Initialization was successful.
* - -ENOMEM:    Out of memory.

Looking at the implementation, I guess this is a complete representation of
what it can return?

The driving point here is that this nicely lines up when rendered with
"make htmldocs".

> + */
> +static inline int init_ns_common(struct ns_common *ns, bool anon)
>  {
> +	struct ns_tag *tag;
> +
> +	tag = kzalloc(sizeof(*tag), GFP_KERNEL);
> +	if (!tag)
> +		return -ENOMEM;
> +
> +	refcount_set(&tag->usage, 1);
> +	ns->tag = tag;
> +	ns->inum = 0;
>  	atomic_long_set(&ns->stashed, 0);
> -	return proc_alloc_inum(&ns->inum);
> +	refcount_set(&ns->count, 1);
> +
> +	return anon ? 0 : proc_alloc_inum(&ns->inum);
>  }
>  
> -#define ns_free_inum(ns) proc_free_inum((ns)->inum)
> +/**
> + * destroy_ns_common - Clean up the common part of a namespace

Nit: destroy_ns_common()

/Jarkko

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns
  2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
                     ` (2 preceding siblings ...)
  2021-02-05  2:46   ` Jarkko Sakkinen
@ 2021-02-05  8:25   ` David Howells
  2021-02-07 23:55     ` Jarkko Sakkinen
  3 siblings, 1 reply; 9+ messages in thread
From: David Howells @ 2021-02-05  8:25 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: dhowells, sprabhu, christian, selinux, keyrings, linux-api,
	linux-fsdevel, linux-security-module, linux-kernel, containers

Jarkko Sakkinen <jarkko@kernel.org> wrote:

> > + * init_ns_common - Initialise the common part of a namespace
> 
> Nit: init_ns_common()

Interesting.  The majority of code doesn't put the brackets in.

> I've used lately (e.g. arch/x86/kernel/cpu/sgx/ioctl.c) along the lines:
> 
> * Return:
> * - 0:          Initialization was successful.
> * - -ENOMEM:    Out of memory.

Actually, looking at kernel-doc.rst, this isn't necessarily the recommended
approach as it will much everything into one line, complete with dashes, and
can't handle splitting over lines.  You probably meant:

      * Return:
      * * 0		- OK to runtime suspend the device
      * * -EBUSY	- Device should not be runtime suspended

> * Return:
> * - 0:          Initialization was successful.
> * - -ENOMEM:    Out of memory.
> 
> Looking at the implementation, I guess this is a complete representation of
> what it can return?

It isn't.  It can return at least -ENOSPC as well, but it's awkward detailing
the errors from functions it calls since they can change and then the
description here is wrong.  I'm not sure there's a perfect answer to that.

David


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns
  2021-02-05  8:25   ` David Howells
@ 2021-02-07 23:55     ` Jarkko Sakkinen
  0 siblings, 0 replies; 9+ messages in thread
From: Jarkko Sakkinen @ 2021-02-07 23:55 UTC (permalink / raw)
  To: David Howells, mchehab+huawei
  Cc: sprabhu, christian, selinux, keyrings, linux-api, linux-fsdevel,
	linux-security-module, linux-kernel, containers

On Fri, Feb 05, 2021 at 08:25:35AM +0000, David Howells wrote:
> Jarkko Sakkinen <jarkko@kernel.org> wrote:
> 
> > > + * init_ns_common - Initialise the common part of a namespace
> > 
> > Nit: init_ns_common()
> 
> Interesting.  The majority of code doesn't put the brackets in.
> 
> > I've used lately (e.g. arch/x86/kernel/cpu/sgx/ioctl.c) along the lines:
> > 
> > * Return:
> > * - 0:          Initialization was successful.
> > * - -ENOMEM:    Out of memory.
> 
> Actually, looking at kernel-doc.rst, this isn't necessarily the recommended
> approach as it will much everything into one line, complete with dashes, and
> can't handle splitting over lines.  You probably meant:
> 
>       * Return:
>       * * 0		- OK to runtime suspend the device
>       * * -EBUSY	- Device should not be runtime suspended

A line beginning with dash, lines up just as well, as one beginning with
an asterisk. I've also tested this with "make htmldocs".

This is Mauro's response to my recent patch:

https://lore.kernel.org/lkml/20210125105353.5c695d42@coco.lan/

So, what I can make up from this is that they are equally good
alternatives.

What I'm not still fully registering is the dash after the return value.

I mean double comma is used after parameter. Why this weird dash syntax
is used after return value I have no idea, and the kernel-doc.rst does
not provide any explanation.

> 
> > * Return:
> > * - 0:          Initialization was successful.
> > * - -ENOMEM:    Out of memory.
> > 
> > Looking at the implementation, I guess this is a complete representation of
> > what it can return?
> 
> It isn't.  It can return at least -ENOSPC as well, but it's awkward detailing
> the errors from functions it calls since they can change and then the
> description here is wrong.  I'm not sure there's a perfect answer to that.
> 
> David

What if you just add this as the last entry:

* * -errno:     Otherwise.

/Jarkko

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-02-07 23:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-04 17:47 [RFC][PATCH 0/2] keys: request_key() interception in containers David Howells
2021-02-04 17:47 ` [PATCH 1/2] Add namespace tags that can be used for matching without pinning a ns David Howells
2021-02-04 20:14   ` kernel test robot
2021-02-04 20:58   ` kernel test robot
2021-02-05  2:46   ` Jarkko Sakkinen
2021-02-05  8:25   ` David Howells
2021-02-07 23:55     ` Jarkko Sakkinen
2021-02-04 17:47 ` [PATCH 2/2] keys: Allow request_key upcalls from a container to be intercepted David Howells
2021-02-04 19:55   ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).