linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: viro@zeniv.linux.org.uk
Cc: linux-fsdevel@vger.kernel.org, dhowells@redhat.com,
	torvalds@linux-foundation.org, ebiederm@xmission.com,
	linux-security-module@vger.kernel.org
Subject: [PATCH 02/43] cgroup: saner refcounting for cgroup_root
Date: Tue, 19 Feb 2019 16:28:13 +0000	[thread overview]
Message-ID: <155059369332.12449.14447911414340183902.stgit@warthog.procyon.org.uk> (raw)
In-Reply-To: <155059366914.12449.4669870128936536848.stgit@warthog.procyon.org.uk>

From: Al Viro <viro@zeniv.linux.org.uk>

* make the reference from superblock to cgroup_root counting -
do cgroup_put() in cgroup_kill_sb() whether we'd done
percpu_ref_kill() or not; matching grab is done when we allocate
a new root.  That gives the same refcounting rules for all callers
of cgroup_do_mount() - a reference to cgroup_root has been grabbed
by caller and it either is transferred to new superblock or dropped.

* have cgroup_kill_sb() treat an already killed refcount as "just
don't bother killing it, then".

* after successful cgroup_do_mount() have cgroup1_mount() recheck
if we'd raced with mount/umount from somebody else and cgroup_root
got killed.  In that case we drop the superblock and bugger off
with -ERESTARTSYS, same as if we'd found it in the list already
dying.

* don't bother with delayed initialization of refcount - it's
unreliable and not needed.  No need to prevent attempts to bump
the refcount if we find cgroup_root of another mount in progress -
sget will reuse an existing superblock just fine and if the
other sb manages to die before we get there, we'll catch
that immediately after cgroup_do_mount().

* don't bother with kernfs_pin_sb() - no need for doing that
either.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

 kernel/cgroup/cgroup-internal.h |    2 +
 kernel/cgroup/cgroup-v1.c       |   58 +++++++++------------------------------
 kernel/cgroup/cgroup.c          |   16 +++++------
 3 files changed, 21 insertions(+), 55 deletions(-)

diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c950864016e2..c9a35f09e4b9 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -198,7 +198,7 @@ int cgroup_path_ns_locked(struct cgroup *cgrp, char *buf, size_t buflen,
 
 void cgroup_free_root(struct cgroup_root *root);
 void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts);
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags);
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask);
 int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask);
 struct dentry *cgroup_do_mount(struct file_system_type *fs_type, int flags,
 			       struct cgroup_root *root, unsigned long magic,
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index 583b969b0c0e..f94a7229974e 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -1116,13 +1116,11 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 			     void *data, unsigned long magic,
 			     struct cgroup_namespace *ns)
 {
-	struct super_block *pinned_sb = NULL;
 	struct cgroup_sb_opts opts;
 	struct cgroup_root *root;
 	struct cgroup_subsys *ss;
 	struct dentry *dentry;
 	int i, ret;
-	bool new_root = false;
 
 	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
 
@@ -1184,29 +1182,6 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		if (root->flags ^ opts.flags)
 			pr_warn("new mount options do not match the existing superblock, will be ignored\n");
 
-		/*
-		 * We want to reuse @root whose lifetime is governed by its
-		 * ->cgrp.  Let's check whether @root is alive and keep it
-		 * that way.  As cgroup_kill_sb() can happen anytime, we
-		 * want to block it by pinning the sb so that @root doesn't
-		 * get killed before mount is complete.
-		 *
-		 * With the sb pinned, tryget_live can reliably indicate
-		 * whether @root can be reused.  If it's being killed,
-		 * drain it.  We can use wait_queue for the wait but this
-		 * path is super cold.  Let's just sleep a bit and retry.
-		 */
-		pinned_sb = kernfs_pin_sb(root->kf_root, NULL);
-		if (IS_ERR(pinned_sb) ||
-		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
-			mutex_unlock(&cgroup_mutex);
-			if (!IS_ERR_OR_NULL(pinned_sb))
-				deactivate_super(pinned_sb);
-			msleep(10);
-			ret = restart_syscall();
-			goto out_free;
-		}
-
 		ret = 0;
 		goto out_unlock;
 	}
@@ -1232,15 +1207,20 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 		ret = -ENOMEM;
 		goto out_unlock;
 	}
-	new_root = true;
 
 	init_cgroup_root(root, &opts);
 
-	ret = cgroup_setup_root(root, opts.subsys_mask, PERCPU_REF_INIT_DEAD);
+	ret = cgroup_setup_root(root, opts.subsys_mask);
 	if (ret)
 		cgroup_free_root(root);
 
 out_unlock:
+	if (!ret && !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
+		mutex_unlock(&cgroup_mutex);
+		msleep(10);
+		ret = restart_syscall();
+		goto out_free;
+	}
 	mutex_unlock(&cgroup_mutex);
 out_free:
 	kfree(opts.release_agent);
@@ -1252,25 +1232,13 @@ struct dentry *cgroup1_mount(struct file_system_type *fs_type, int flags,
 	dentry = cgroup_do_mount(&cgroup_fs_type, flags, root,
 				 CGROUP_SUPER_MAGIC, ns);
 
-	/*
-	 * There's a race window after we release cgroup_mutex and before
-	 * allocating a superblock. Make sure a concurrent process won't
-	 * be able to re-use the root during this window by delaying the
-	 * initialization of root refcnt.
-	 */
-	if (new_root) {
-		mutex_lock(&cgroup_mutex);
-		percpu_ref_reinit(&root->cgrp.self.refcnt);
-		mutex_unlock(&cgroup_mutex);
+	if (!IS_ERR(dentry) && percpu_ref_is_dying(&root->cgrp.self.refcnt)) {
+		struct super_block *sb = dentry->d_sb;
+		dput(dentry);
+		deactivate_locked_super(sb);
+		msleep(10);
+		dentry = ERR_PTR(restart_syscall());
 	}
-
-	/*
-	 * If @pinned_sb, we're reusing an existing root and holding an
-	 * extra ref on its sb.  Mount is complete.  Put the extra ref.
-	 */
-	if (pinned_sb)
-		deactivate_super(pinned_sb);
-
 	return dentry;
 }
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 503bba3c4bae..7fd9f22e406d 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1927,7 +1927,7 @@ void init_cgroup_root(struct cgroup_root *root, struct cgroup_sb_opts *opts)
 		set_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags);
 }
 
-int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
+int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 {
 	LIST_HEAD(tmp_links);
 	struct cgroup *root_cgrp = &root->cgrp;
@@ -1944,7 +1944,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask, int ref_flags)
 	root_cgrp->ancestor_ids[0] = ret;
 
 	ret = percpu_ref_init(&root_cgrp->self.refcnt, css_release,
-			      ref_flags, GFP_KERNEL);
+			      0, GFP_KERNEL);
 	if (ret)
 		goto out;
 
@@ -2121,18 +2121,16 @@ static void cgroup_kill_sb(struct super_block *sb)
 	struct cgroup_root *root = cgroup_root_from_kf(kf_root);
 
 	/*
-	 * If @root doesn't have any mounts or children, start killing it.
+	 * If @root doesn't have any children, start killing it.
 	 * This prevents new mounts by disabling percpu_ref_tryget_live().
 	 * cgroup_mount() may wait for @root's release.
 	 *
 	 * And don't kill the default root.
 	 */
-	if (!list_empty(&root->cgrp.self.children) ||
-	    root == &cgrp_dfl_root)
-		cgroup_put(&root->cgrp);
-	else
+	if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
+	    !percpu_ref_is_dying(&root->cgrp.self.refcnt))
 		percpu_ref_kill(&root->cgrp.self.refcnt);
-
+	cgroup_put(&root->cgrp);
 	kernfs_kill_sb(sb);
 }
 
@@ -5402,7 +5400,7 @@ int __init cgroup_init(void)
 	hash_add(css_set_table, &init_css_set.hlist,
 		 css_set_hash(init_css_set.subsys));
 
-	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0, 0));
+	BUG_ON(cgroup_setup_root(&cgrp_dfl_root, 0));
 
 	mutex_unlock(&cgroup_mutex);
 


  parent reply	other threads:[~2019-02-19 16:28 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-19 16:27 [PATCH 00/43] VFS: Introduce filesystem context David Howells
2019-02-19 16:28 ` [PATCH 01/43] fix cgroup_do_mount() handling of failure exits David Howells
2019-02-19 16:28 ` David Howells [this message]
2019-02-19 16:28 ` [PATCH 03/43] kill kernfs_pin_sb() David Howells
2019-02-19 16:28 ` [PATCH 04/43] separate copying and locking mount tree on cross-userns copies David Howells
2019-02-20 18:55   ` Alan Jenkins
2019-02-26 15:44   ` David Howells
2019-02-26 17:45     ` Alan Jenkins
2019-02-19 16:29 ` [PATCH 05/43] saner handling of temporary namespaces David Howells
2019-02-19 16:29 ` [PATCH 06/43] vfs: Introduce fs_context, switch vfs_kern_mount() to it David Howells
2019-02-19 16:29 ` [PATCH 07/43] new helpers: vfs_create_mount(), fc_mount() David Howells
2019-02-19 16:29 ` [PATCH 08/43] teach vfs_get_tree() to handle subtype, switch do_new_mount() to it David Howells
2019-02-19 16:29 ` [PATCH 09/43] new helper: do_new_mount_fc() David Howells
2019-02-19 16:29 ` [PATCH 10/43] vfs_get_tree(): evict the call of security_sb_kern_mount() David Howells
2019-02-19 16:29 ` [PATCH 11/43] convert do_remount_sb() to fs_context David Howells
2019-03-22 11:19   ` Andreas Schwab
2019-03-22 11:25   ` David Howells
2019-03-22 13:28     ` Andreas Schwab
2019-03-22 14:00       ` Andreas Schwab
2019-02-19 16:30 ` [PATCH 12/43] fs_context flavour for submounts David Howells
2019-02-19 16:30 ` [PATCH 13/43] introduce fs_context methods David Howells
2019-02-19 16:30 ` [PATCH 14/43] vfs: Introduce logging functions David Howells
2019-02-19 16:30 ` [PATCH 15/43] vfs: Add configuration parser helpers David Howells
2019-03-03  2:53   ` Al Viro
2019-02-19 16:30 ` [PATCH 16/43] vfs: Add LSM hooks for the new mount API David Howells
2019-02-19 16:30 ` [PATCH 17/43] selinux: Implement the new mount API LSM hooks David Howells
2019-02-19 16:30 ` [PATCH 18/43] smack: Implement filesystem context security hooks David Howells
2019-02-19 16:30 ` [PATCH 19/43] vfs: Put security flags into the fs_context struct David Howells
2019-02-19 16:31 ` [PATCH 20/43] vfs: Implement a filesystem superblock creation/configuration context David Howells
2019-02-19 16:31 ` [PATCH 21/43] convenience helpers: vfs_get_super() and sget_fc() David Howells
2019-02-19 16:31 ` [PATCH 22/43] introduce cloning of fs_context David Howells
2019-02-19 16:31 ` [PATCH 23/43] procfs: Move proc_fill_super() to fs/proc/root.c David Howells
2019-02-19 16:31 ` [PATCH 24/43] proc: Add fs_context support to procfs David Howells
2019-02-19 16:31 ` [PATCH 25/43] ipc: Convert mqueue fs to fs_context David Howells
2019-02-19 16:31 ` [PATCH 26/43] cgroup: start switching " David Howells
2019-02-19 16:32 ` [PATCH 27/43] cgroup: fold cgroup1_mount() into cgroup1_get_tree() David Howells
2019-02-19 16:32 ` [PATCH 28/43] cgroup: take options parsing into ->parse_monolithic() David Howells
2019-02-19 16:32 ` [PATCH 29/43] cgroup1: switch to option-by-option parsing David Howells
2019-02-19 16:32 ` [PATCH 30/43] cgroup2: " David Howells
2019-02-19 16:32 ` [PATCH 31/43] cgroup: stash cgroup_root reference into cgroup_fs_context David Howells
2019-02-19 16:32 ` [PATCH 32/43] cgroup_do_mount(): massage calling conventions David Howells
2019-02-19 16:32 ` [PATCH 33/43] cgroup1_get_tree(): separate "get cgroup_root to use" into a separate helper David Howells
2019-02-19 16:33 ` [PATCH 34/43] cgroup: store a reference to cgroup_ns into cgroup_fs_context David Howells
2019-02-19 16:33 ` [PATCH 35/43] kernfs, sysfs, cgroup, intel_rdt: Support fs_context David Howells
2019-02-19 16:33 ` [PATCH 36/43] cpuset: Use fs_context David Howells
2019-02-19 16:33 ` [PATCH 37/43] hugetlbfs: Convert to fs_context David Howells
2019-02-19 16:33 ` [PATCH 38/43] vfs: Remove kern_mount_data() David Howells
2019-02-19 16:33 ` [PATCH 39/43] vfs: Provide documentation for new mount API David Howells
2019-02-19 16:34 ` [PATCH 40/43] vfs: Implement logging through fs_context David Howells
2019-02-19 16:34 ` [PATCH 41/43] vfs: Add some logging to the core users of the fs_context log David Howells
2019-02-19 16:34 ` [PATCH 42/43] afs: Add fs_context support David Howells
2019-02-19 16:34 ` [PATCH 43/43] afs: Use fs_context to pass parameters over automount David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=155059369332.12449.14447911414340183902.stgit@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).