All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yafang Shao <laoar.shao@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com,
	andrii@kernel.org, martin.lau@linux.dev, song@kernel.org,
	yonghong.song@linux.dev, kpsingh@kernel.org, sdf@google.com,
	haoluo@google.com, jolsa@kernel.org, tj@kernel.org,
	lizefan.x@bytedance.com, hannes@cmpxchg.org,
	yosryahmed@google.com, mkoutny@suse.com, sinquersw@gmail.com
Cc: cgroups@vger.kernel.org, bpf@vger.kernel.org,
	Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next v2 1/9] cgroup: Make operations on the cgroup root_list RCU safe
Date: Tue, 17 Oct 2023 12:45:38 +0000	[thread overview]
Message-ID: <20231017124546.24608-2-laoar.shao@gmail.com> (raw)
In-Reply-To: <20231017124546.24608-1-laoar.shao@gmail.com>

At present, when we perform operations on the cgroup root_list, we must
hold the cgroup_mutex, which is a relatively heavyweight lock. In reality,
we can make operations on this list RCU-safe, eliminating the need to hold
the cgroup_mutex during traversal. Modifications to the list only occur in
the cgroup root setup and destroy paths, which should be infrequent in a
production environment. In contrast, traversal may occur frequently.
Therefore, making it RCU-safe would be beneficial.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/linux/cgroup-defs.h     |  1 +
 kernel/cgroup/cgroup-internal.h |  3 ++-
 kernel/cgroup/cgroup.c          | 17 ++++++++++-------
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index f1b3151ac30b..8505eeae6e41 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -558,6 +558,7 @@ struct cgroup_root {
 
 	/* A list running through the active hierarchies */
 	struct list_head root_list;
+	struct rcu_head rcu;
 
 	/* Hierarchy-specific flags */
 	unsigned int flags;
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index c56071f150f2..321af20ea15f 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -170,7 +170,8 @@ extern struct list_head cgroup_roots;
 
 /* iterate across the hierarchies */
 #define for_each_root(root)						\
-	list_for_each_entry((root), &cgroup_roots, root_list)
+	list_for_each_entry_rcu((root), &cgroup_roots, root_list,	\
+				!lockdep_is_held(&cgroup_mutex))
 
 /**
  * for_each_subsys - iterate all enabled cgroup subsystems
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..bae8f9f27792 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1313,7 +1313,7 @@ static void cgroup_exit_root_id(struct cgroup_root *root)
 
 void cgroup_free_root(struct cgroup_root *root)
 {
-	kfree(root);
+	kfree_rcu(root, rcu);
 }
 
 static void cgroup_destroy_root(struct cgroup_root *root)
@@ -1346,7 +1346,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 	spin_unlock_irq(&css_set_lock);
 
 	if (!list_empty(&root->root_list)) {
-		list_del(&root->root_list);
+		list_del_rcu(&root->root_list);
 		cgroup_root_count--;
 	}
 
@@ -1386,13 +1386,15 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
 		}
 	}
 
-	BUG_ON(!res_cgroup);
+	WARN_ON_ONCE(!res_cgroup && lockdep_is_held(&cgroup_mutex));
 	return res_cgroup;
 }
 
 /*
  * look up cgroup associated with current task's cgroup namespace on the
- * specified hierarchy
+ * specified hierarchy. Umount synchronization is ensured via VFS layer,
+ * so we don't have to hold cgroup_mutex to prevent the root from being
+ * destroyed.
  */
 static struct cgroup *
 current_cgns_cgroup_from_root(struct cgroup_root *root)
@@ -1445,7 +1447,6 @@ static struct cgroup *current_cgns_cgroup_dfl(void)
 static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 					    struct cgroup_root *root)
 {
-	lockdep_assert_held(&cgroup_mutex);
 	lockdep_assert_held(&css_set_lock);
 
 	return __cset_cgroup_from_root(cset, root);
@@ -1453,7 +1454,9 @@ static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
 
 /*
  * Return the cgroup for "task" from the given hierarchy. Must be
- * called with cgroup_mutex and css_set_lock held.
+ * called with css_set_lock held to prevent task's groups from being modified.
+ * Must be called with either cgroup_mutex or rcu read lock to prevent the
+ * cgroup root from being destroyed.
  */
 struct cgroup *task_cgroup_from_root(struct task_struct *task,
 				     struct cgroup_root *root)
@@ -2097,7 +2100,7 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	 * care of subsystems' refcounts, which are explicitly dropped in
 	 * the failure exit path.
 	 */
-	list_add(&root->root_list, &cgroup_roots);
+	list_add_rcu(&root->root_list, &cgroup_roots);
 	cgroup_root_count++;
 
 	/*
-- 
2.30.1 (Apple Git-130)


  reply	other threads:[~2023-10-17 12:46 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-17 12:45 [RFC PATCH bpf-next v2 0/9] bpf, cgroup: Add BPF support for cgroup1 hierarchy Yafang Shao
2023-10-17 12:45 ` Yafang Shao [this message]
2023-10-17 13:20   ` [RFC PATCH bpf-next v2 1/9] cgroup: Make operations on the cgroup root_list RCU safe Michal Koutný
2023-10-18  2:51     ` Yafang Shao
2023-10-18  9:35   ` Tejun Heo
2023-10-19  6:38     ` Yafang Shao
2023-10-19 19:08       ` Tejun Heo
2023-10-19 19:43         ` Waiman Long
2023-10-20  9:36           ` Yafang Shao
2023-10-20 17:51             ` Tejun Heo
2023-10-22  9:32               ` Yafang Shao
2023-10-20  6:35   ` kernel test robot
2023-10-20  9:37     ` Yafang Shao
2023-10-25  8:06   ` kernel test robot
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 2/9] cgroup: Eliminate the need for cgroup_mutex in proc_cgroup_show() Yafang Shao
2023-10-17 14:04   ` Michal Koutný
2023-10-18  3:12     ` Yafang Shao
2023-10-18  9:38   ` Tejun Heo
2023-10-19  6:39     ` Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 3/9] cgroup: Add a new helper for cgroup1 hierarchy Yafang Shao
2023-10-17 15:55   ` kernel test robot
2023-10-18  3:34     ` Yafang Shao
2023-10-18  9:44   ` Tejun Heo
2023-10-19  6:41     ` Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 4/9] bpf: Add a new kfunc " Yafang Shao
2023-10-17 16:39   ` kernel test robot
2023-10-18  3:34     ` Yafang Shao
2023-10-18  9:40   ` Tejun Heo
2023-10-19  6:40     ` Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 5/9] selftests/bpf: Fix issues in setup_classid_environment() Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 6/9] selftests/bpf: Add parallel support for classid Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 7/9] selftests/bpf: Add a new cgroup helper get_classid_cgroup_id() Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 8/9] selftests/bpf: Add a new cgroup helper get_cgroup_hierarchy_id() Yafang Shao
2023-10-17 12:45 ` [RFC PATCH bpf-next v2 9/9] selftests/bpf: Add selftests for cgroup1 hierarchy Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231017124546.24608-2-laoar.shao@gmail.com \
    --to=laoar.shao@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=cgroups@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=hannes@cmpxchg.org \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kpsingh@kernel.org \
    --cc=lizefan.x@bytedance.com \
    --cc=martin.lau@linux.dev \
    --cc=mkoutny@suse.com \
    --cc=sdf@google.com \
    --cc=sinquersw@gmail.com \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    --cc=yonghong.song@linux.dev \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.