linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing
@ 2022-09-05 17:09 Michal Koutný
  2022-09-06 17:13 ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-09-05 17:09 UTC (permalink / raw)
  To: cgroups, linux-kernel; +Cc: Tejun Heo, Zefan Li, Johannes Weiner, Dan Carpenter

The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links. The returned
cgroup object is pinned by the css_set (*).

Because current cannot switch namespace asynchronously, the css_set is
also pinned by ns_proxy->cgroup_ns (regardless of current's cgroup
migration).

Kernfs code that traverses paths with relative root_cgroup not need
css_set_lock.

(*) Except for root cgroups. The default hierarchy root (under which
cgroup id and path resolution only happens) is eternal so it's moot.
cgroup_show_path (VFS callback) is expected to be synchronized (**) wrt
kill_sb (VFS callback) (mnt_namespace.list with namespace_sem).
(**) If not, it's still an independent issue from this and the fixed one.

Fixes: 74e4b956eb1c: ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/cgroup/cgroup.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

I considered adding get_cgroup() into current_cgns_cgroup_from_root to
avoid reliance on the transitive pinning via css_set. 
After reasoning about no asynchronous NS switch and v1 hiearchies kill_sb it
didn't seem to bring that much benefit (it didn't compose well with
BUG_ON(!cgrp) neither).

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index e0b72eb5d283..8c9497f01332 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1391,11 +1391,16 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	cgroup_free_root(root);
 }
 
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
 static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
 	struct cgroup *res_cgroup = NULL;
 
+	lockdep_assert_held(&css_set_lock);
+
 	if (cset == &init_css_set) {
 		res_cgroup = &root->cgrp;
 	} else if (root == &cgrp_dfl_root) {
@@ -1426,8 +1431,6 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
 	struct cgroup *res = NULL;
 	struct css_set *cset;
 
-	lockdep_assert_held(&css_set_lock);
-
 	rcu_read_lock();
 
 	cset = current->nsproxy->cgroup_ns->root_cset;
@@ -1446,7 +1449,6 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 	struct cgroup *res = NULL;
 
 	lockdep_assert_held(&cgroup_mutex);
-	lockdep_assert_held(&css_set_lock);
 
 	res = __cset_cgroup_from_root(cset, root);
 
@@ -1861,8 +1863,8 @@ int cgroup_show_path(struct seq_file *sf, struct kernfs_node *kf_node,
 
 	spin_lock_irq(&css_set_lock);
 	ns_cgroup = current_cgns_cgroup_from_root(kf_cgroot);
-	len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);
 	spin_unlock_irq(&css_set_lock);
+	len = kernfs_path_from_node(kf_node, ns_cgroup->kn, buf, PATH_MAX);
 
 	if (len >= PATH_MAX)
 		len = -ERANGE;
@@ -6649,8 +6651,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
 
 	spin_lock_irq(&css_set_lock);
 	root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
-	kn = kernfs_walk_and_get(root_cgrp->kn, path);
 	spin_unlock_irq(&css_set_lock);
+	kn = kernfs_walk_and_get(root_cgrp->kn, path);
 	if (!kn)
 		goto out;
 

base-commit: a8c52eba880a6e8c07fc2130604f8e386b90b763
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing
  2022-09-05 17:09 [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing Michal Koutný
@ 2022-09-06 17:13 ` Tejun Heo
  2022-09-28 11:33   ` [PATCH v2] " Michal Koutný
  0 siblings, 1 reply; 7+ messages in thread
From: Tejun Heo @ 2022-09-06 17:13 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, linux-kernel, Zefan Li, Johannes Weiner, Dan Carpenter

Hello, Michal.

On Mon, Sep 05, 2022 at 07:09:44PM +0200, Michal Koutný wrote:
> I considered adding get_cgroup() into current_cgns_cgroup_from_root to
> avoid reliance on the transitive pinning via css_set. 
> After reasoning about no asynchronous NS switch and v1 hiearchies kill_sb it
> didn't seem to bring that much benefit (it didn't compose well with
> BUG_ON(!cgrp) neither).

I still think this is too subtle and incidental. If we go this way, we'd
need to add comments explaining why this obviously confusing pattern (lock,
find obj, unlock, use obj) is being used which goes into how the object is
directly pinned through the css_set which happens to be pinned also
because... and so on. Even if the code looks a bit uglier, I'd much prefer
straight-forward pinning.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2] cgroup: Reorganize css_set_lock and kernfs path processing
  2022-09-06 17:13 ` Tejun Heo
@ 2022-09-28 11:33   ` Michal Koutný
  2022-10-05 16:47     ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-09-28 11:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups, linux-kernel, Zefan Li, Johannes Weiner, Dan Carpenter

The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
kernfs_walk_and_get.

We only need to make sure that the returned root_cgrp won't be freed
under us. This is given in the case of global root because it is static
(cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
is pinned by cgroup_ns->root_cset (and `current` task cannot switch
namespace asynchronously so ns_proxy pins cgroup_ns).

(Note this reasoning won't hold for root cgroups in v1 hierarchies but
the path resolution works only with the default hierarchy.)

Fixes: 74e4b956eb1c: ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/cgroup/cgroup.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Hello.

v2: dropped changes around kernfs_path_from_node(), reworded commit
    message

I realized the pinning with reference taking won't really work
generally. The code would get the reference within RCU read section, so
it'd have to be cgroup_get_live() and if that fails there's not much to
do.

So, instead of generalization, I only post special-cased patch that
fixes the introduced bug and doesn't touch the rest.

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c37b8265c0a3..ac71af8ef65c 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1392,11 +1392,16 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	cgroup_free_root(root);
 }
 
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
 static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
 	struct cgroup *res_cgroup = NULL;
 
+	lockdep_assert_held(&css_set_lock);
+
 	if (cset == &init_css_set) {
 		res_cgroup = &root->cgrp;
 	} else if (root == &cgrp_dfl_root) {
@@ -6673,8 +6678,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
 
 	spin_lock_irq(&css_set_lock);
 	root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
-	kn = kernfs_walk_and_get(root_cgrp->kn, path);
 	spin_unlock_irq(&css_set_lock);
+	kn = kernfs_walk_and_get(root_cgrp->kn, path);
 	if (!kn)
 		goto out;
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] cgroup: Reorganize css_set_lock and kernfs path processing
  2022-09-28 11:33   ` [PATCH v2] " Michal Koutný
@ 2022-10-05 16:47     ` Tejun Heo
  2022-10-10  8:22       ` Michal Koutný
  2022-10-10  8:29       ` [PATCH v3] " Michal Koutný
  0 siblings, 2 replies; 7+ messages in thread
From: Tejun Heo @ 2022-10-05 16:47 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, linux-kernel, Zefan Li, Johannes Weiner, Dan Carpenter

Hello,

On Wed, Sep 28, 2022 at 01:33:16PM +0200, Michal Koutný wrote:
...
> I realized the pinning with reference taking won't really work
> generally. The code would get the reference within RCU read section, so
> it'd have to be cgroup_get_live() and if that fails there's not much to
> do.

Hmm... isn't current's root cgrp guaranteed to be alive? How would
cgroup_get_live() fail? Also, shouldn't cgroup_get() enough for path
walking?

> @@ -6673,8 +6678,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
>  
>  	spin_lock_irq(&css_set_lock);
>  	root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
> -	kn = kernfs_walk_and_get(root_cgrp->kn, path);
>  	spin_unlock_irq(&css_set_lock);
> +	kn = kernfs_walk_and_get(root_cgrp->kn, path);

If you really wanna do it this way, can you please add a detailed comment
here why this is safe? But I'd prefer just doing a strightforward ref
inc/dec around it.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] cgroup: Reorganize css_set_lock and kernfs path processing
  2022-10-05 16:47     ` Tejun Heo
@ 2022-10-10  8:22       ` Michal Koutný
  2022-10-10  8:29       ` [PATCH v3] " Michal Koutný
  1 sibling, 0 replies; 7+ messages in thread
From: Michal Koutný @ 2022-10-10  8:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: cgroups, linux-kernel, Zefan Li, Johannes Weiner, Dan Carpenter

On Wed, Oct 05, 2022 at 06:47:31AM -1000, Tejun Heo <tj@kernel.org> wrote:
> Hmm... isn't current's root cgrp guaranteed to be alive?

True on the default hierarchy. v1 hierarchies (singular ones with root
cgroup only) can be unmounted.

> How would cgroup_get_live() fail?

kill_sb is not synchronized via css_set_lock.

> Also, shouldn't cgroup_get() enough for path walking?

If ref count dropped to zero, release callback (css_release_work_fn)
would be queued, cgroup_get would increase the refcount but it won't
cancel this.

Note these were concerns with the first version of the patch that also touched
cgroup_show_path() (that processes v1 hierarchies too). With the
reduction I avoided this.

Strictly speaking, even css_set_lock is unnecessary around
current_cgns_cgroup_from_root() when called with cgrp_dfl_root as the
cset->cgrp_links is not traversed at all.

> If you really wanna do it this way, can you please add a detailed comment
> here why this is safe? But I'd prefer just doing a strightforward ref
> inc/dec around it.

I see the the extraction under css_set_lock without inc/dec turns out
confusing. Let me expand the idea above and avoid css_set_lock
completely (another message).

Michal

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3] cgroup: Reorganize css_set_lock and kernfs path processing
  2022-10-05 16:47     ` Tejun Heo
  2022-10-10  8:22       ` Michal Koutný
@ 2022-10-10  8:29       ` Michal Koutný
  2022-10-10 20:24         ` Tejun Heo
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Koutný @ 2022-10-10  8:29 UTC (permalink / raw)
  To: tj; +Cc: cgroups, dan.carpenter, hannes, linux-kernel, lizefan.x, mkoutny

The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
kernfs_walk_and_get.

We only need to make sure that the returned root_cgrp won't be freed
under us. This is given in the case of global root because it is static
(cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
is pinned by cgroup_ns->root_cset (and `current` task cannot switch
namespace asynchronously so ns_proxy pins cgroup_ns).

Note this reasoning won't hold for root cgroups in v1 hierarchies,
therefore create a special-cased helper function just for the default
hierarchy.

Fixes: 74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 kernel/cgroup/cgroup.c | 41 ++++++++++++++++++++++++++++-------------
 1 file changed, 28 insertions(+), 13 deletions(-)

Tested only with test_core selftests (i.e. the path/id resolution not
checed, only the migration code).

diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c37b8265c0a3..a7ec96f26997 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1392,6 +1392,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	cgroup_free_root(root);
 }
 
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
 static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
@@ -1403,6 +1406,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 		res_cgroup = cset->dfl_cgrp;
 	} else {
 		struct cgrp_cset_link *link;
+		lockdep_assert_held(&css_set_lock);
 
 		list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
 			struct cgroup *c = link->cgrp;
@@ -1414,6 +1418,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 		}
 	}
 
+	BUG_ON(!res_cgroup);
 	return res_cgroup;
 }
 
@@ -1436,23 +1441,37 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
 
 	rcu_read_unlock();
 
-	BUG_ON(!res);
 	return res;
 }
 
+/*
+ * look up cgroup associated with current task's cgroup namespace on the
+ * default hierarchy
+ *
+ * Note this doesn't need locks unlike generic colleagues. Why?
+ * - Internal rcu_read_lock is unnecessary because we don't dereference any rcu
+ *   pointers.
+ * - css_set_lock is not needed because we just read cset->dfl_cgrp.
+ * - As a bonus returned cgrp is pinned with the current because it cannot
+ *   switch cgroup_ns asynchronously.
+ */
+static struct cgroup *
+current_cgns_cgroup_dfl(void)
+{
+	struct css_set *cset;
+
+	cset = current->nsproxy->cgroup_ns->root_cset;
+	return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
+}
+
 /* look up cgroup associated with given css_set on the specified hierarchy */
 static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
-	struct cgroup *res = NULL;
-
 	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&css_set_lock);
 
-	res = __cset_cgroup_from_root(cset, root);
-
-	BUG_ON(!res);
-	return res;
+	return __cset_cgroup_from_root(cset, root);
 }
 
 /*
@@ -6085,9 +6104,7 @@ struct cgroup *cgroup_get_from_id(u64 id)
 	if (!cgrp)
 		return ERR_PTR(-ENOENT);
 
-	spin_lock_irq(&css_set_lock);
-	root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
-	spin_unlock_irq(&css_set_lock);
+	root_cgrp = current_cgns_cgroup_dfl();
 	if (!cgroup_is_descendant(cgrp, root_cgrp)) {
 		cgroup_put(cgrp);
 		return ERR_PTR(-ENOENT);
@@ -6671,10 +6688,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
 	struct cgroup *cgrp = ERR_PTR(-ENOENT);
 	struct cgroup *root_cgrp;
 
-	spin_lock_irq(&css_set_lock);
-	root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
+	root_cgrp = current_cgns_cgroup_dfl();
 	kn = kernfs_walk_and_get(root_cgrp->kn, path);
-	spin_unlock_irq(&css_set_lock);
 	if (!kn)
 		goto out;
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] cgroup: Reorganize css_set_lock and kernfs path processing
  2022-10-10  8:29       ` [PATCH v3] " Michal Koutný
@ 2022-10-10 20:24         ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2022-10-10 20:24 UTC (permalink / raw)
  To: Michal Koutný
  Cc: cgroups, dan.carpenter, hannes, linux-kernel, lizefan.x

On Mon, Oct 10, 2022 at 10:29:18AM +0200, Michal Koutný wrote:
> The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
> (might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
> __cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
> kernfs_walk_and_get.
> 
> We only need to make sure that the returned root_cgrp won't be freed
> under us. This is given in the case of global root because it is static
> (cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
> is pinned by cgroup_ns->root_cset (and `current` task cannot switch
> namespace asynchronously so ns_proxy pins cgroup_ns).
> 
> Note this reasoning won't hold for root cgroups in v1 hierarchies,
> therefore create a special-cased helper function just for the default
> hierarchy.
> 
> Fixes: 74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Michal Koutný <mkoutny@suse.com>

Applied to cgroup/for-6.1-fixes w/ trivial comment / line break adjustments.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-10-10 20:26 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-05 17:09 [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing Michal Koutný
2022-09-06 17:13 ` Tejun Heo
2022-09-28 11:33   ` [PATCH v2] " Michal Koutný
2022-10-05 16:47     ` Tejun Heo
2022-10-10  8:22       ` Michal Koutný
2022-10-10  8:29       ` [PATCH v3] " Michal Koutný
2022-10-10 20:24         ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).