All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
@ 2022-02-20  5:14 Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20  5:14 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel, Chengming Zhou, Minye Zhu

The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.

But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.

Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/cpuacct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3d06c5e4220d..307800586ac8 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -334,12 +334,13 @@ static struct cftype files[] = {
  */
 void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
+	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
 	rcu_read_lock();
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
-		__this_cpu_add(*ca->cpuusage, cputime);
+		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
 
 	rcu_read_unlock();
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
@ 2022-02-20  5:14 ` Chengming Zhou
  2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
  2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20  5:14 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel, Chengming Zhou

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 4 +---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c151413fda..9a109c6ac0e0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
 
 	cpuacct_charge(task, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime(cgrp, delta_exec);
-	rcu_read_unlock();
 }
 
 static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 307800586ac8..f79f88456d72 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
-	rcu_read_lock();
+	lockdep_assert_rq_held(cpu_rq(cpu));
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
 		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
-	rcu_read_unlock();
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
@ 2022-02-20  5:14 ` Chengming Zhou
  2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
  2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
  2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
  3 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20  5:14 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel, Chengming Zhou

The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9a109c6ac0e0..1e356c222756 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
 
 	cpuacct_account_field(task, index, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime_field(cgrp, index, delta_exec);
-	rcu_read_unlock();
 }
 
 #else	/* CONFIG_CGROUPS */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f79f88456d72..d269ede84e85 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
 {
 	struct cpuacct *ca;
 
-	rcu_read_lock();
 	for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
 		__this_cpu_add(ca->cpustat->cpustat[index], val);
-	rcu_read_unlock();
 }
 
 struct cgroup_subsys cpuacct_cgrp_subsys = {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
@ 2022-02-22 18:01 ` Tejun Heo
  2022-02-23  9:11   ` Peter Zijlstra
  2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
  3 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2022-02-22 18:01 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: mingo, peterz, vincent.guittot, bristot, zhaolei, lizefan.x,
	hannes, linux-kernel, Minye Zhu

On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote:
> The cpuacct_account_field() is always called by the current task
> itself, so it's ok to use __this_cpu_add() to charge the tick time.
> 
> But cpuacct_charge() maybe called by update_curr() in load_balance()
> on a random CPU, different from the CPU on which the task is running.
> So __this_cpu_add() will charge that cputime to a random incorrect CPU.
> 
> Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
> Reported-by: Minye Zhu <zhuminye@bytedance.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

For all three patches,

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
  2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
@ 2022-02-23  9:11   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-23  9:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei,
	lizefan.x, hannes, linux-kernel, Minye Zhu

On Tue, Feb 22, 2022 at 08:01:51AM -1000, Tejun Heo wrote:
> On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote:
> > The cpuacct_account_field() is always called by the current task
> > itself, so it's ok to use __this_cpu_add() to charge the tick time.
> > 
> > But cpuacct_charge() maybe called by update_curr() in load_balance()
> > on a random CPU, different from the CPU on which the task is running.
> > So __this_cpu_add() will charge that cputime to a random incorrect CPU.
> > 
> > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
> > Reported-by: Minye Zhu <zhuminye@bytedance.com>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> For all three patches,
> 
> Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip: sched/core] sched/cpuacct: Remove redundant RCU read lock
  2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
@ 2022-03-01 15:24   ` tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tejun Heo, Chengming Zhou, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3eba0505d03a9c1eb30d40c2330c0880b22d1b3a
Gitweb:        https://git.kernel.org/tip/3eba0505d03a9c1eb30d40c2330c0880b22d1b3a
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Sun, 20 Feb 2022 13:14:26 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00

sched/cpuacct: Remove redundant RCU read lock

The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9a109c6..1e356c2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
 
 	cpuacct_account_field(task, index, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime_field(cgrp, index, delta_exec);
-	rcu_read_unlock();
 }
 
 #else	/* CONFIG_CGROUPS */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f79f884..d269ede 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
 {
 	struct cpuacct *ca;
 
-	rcu_read_lock();
 	for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
 		__this_cpu_add(ca->cpustat->cpustat[index], val);
-	rcu_read_unlock();
 }
 
 struct cgroup_subsys cpuacct_cgrp_subsys = {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: sched/core] sched/cpuacct: Optimize away RCU read lock
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
@ 2022-03-01 15:24   ` tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Chengming Zhou, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Gitweb:        https://git.kernel.org/tip/dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Sun, 20 Feb 2022 13:14:25 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00

sched/cpuacct: Optimize away RCU read lock

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 4 +---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c1514..9a109c6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
 
 	cpuacct_charge(task, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime(cgrp, delta_exec);
-	rcu_read_unlock();
 }
 
 static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3078005..f79f884 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
-	rcu_read_lock();
+	lockdep_assert_rq_held(cpu_rq(cpu));
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
 		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
-	rcu_read_unlock();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: sched/core] sched/cpuacct: Fix charge percpu cpuusage
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
                   ` (2 preceding siblings ...)
  2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
@ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou
  3 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Minye Zhu, Chengming Zhou, Peter Zijlstra (Intel),
	Tejun Heo, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd
Gitweb:        https://git.kernel.org/tip/248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Sun, 20 Feb 2022 13:14:24 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:37 +01:00

sched/cpuacct: Fix charge percpu cpuusage

The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.

But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.

Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com
---
 kernel/sched/cpuacct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3d06c5e..3078005 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -334,12 +334,13 @@ static struct cftype files[] = {
  */
 void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
+	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
 	rcu_read_lock();
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
-		__this_cpu_add(*ca->cpuusage, cputime);
+		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
 
 	rcu_read_unlock();
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
       [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
@ 2022-03-08 23:20     ` Marek Szyprowski
  2022-03-08 23:32       ` Peter Zijlstra
  2022-03-09  3:08       ` [External] " Chengming Zhou
  0 siblings, 2 replies; 18+ messages in thread
From: Marek Szyprowski @ 2022-03-08 23:20 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei,
	tj, lizefan.x, hannes
  Cc: linux-kernel

On 20.02.2022 06:14, Chengming Zhou wrote:
> Since cpuacct_charge() is called from the scheduler update_curr(),
> we must already have rq lock held, then the RCU read lock can
> be optimized away.
>
> And do the same thing in it's wrapper cgroup_account_cputime(),
> but we can't use lockdep_assert_rq_held() there, which defined
> in kernel/sched/sched.h.
>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

This patch landed recently in linux-next as commit dc6e0818bc9a 
("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
found that it triggers a following warning in the early boot stage:

Calibrating delay loop (skipped), value calculated using timer 
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by kthreadd/2:
  #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
  #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
raw_spin_rq_lock_nested+0x24/0x34

stack backtrace:
CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a 
#11458
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from update_curr+0x1bc/0x35c
  update_curr from dequeue_task_fair+0xb0/0x8e8
  dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
  __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
  __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
  __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
  kthreadd from ret_from_fork+0x14/0x2c
Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
9fa0:                                     00000000 00000000 00000000 
00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by migration/0/13:
  #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
raw_spin_rq_lock_nested+0x24/0x34

stack backtrace:
CPU: 0 PID: 13 Comm: migration/0 Not tainted 
5.17.0-rc5-00050-gdc6e0818bc9a #11458
Hardware name: Samsung Exynos (Flattened Device Tree)
Stopper: 0x0 <- 0x0
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
  put_prev_task_stop from __schedule+0x698/0x964
  __schedule from schedule+0x54/0xe0
  schedule from smpboot_thread_fn+0x218/0x288
  smpboot_thread_fn from kthread+0xf0/0x134
  kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc1ccffb0 to 0xc1ccfff8)
ffa0:                                     00000000 00000000 00000000 
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).

The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.

> ---
>   include/linux/cgroup.h | 2 --
>   kernel/sched/cpuacct.c | 4 +---
>   2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 75c151413fda..9a109c6ac0e0 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>   
>   	cpuacct_charge(task, delta_exec);
>   
> -	rcu_read_lock();
>   	cgrp = task_dfl_cgroup(task);
>   	if (cgroup_parent(cgrp))
>   		__cgroup_account_cputime(cgrp, delta_exec);
> -	rcu_read_unlock();
>   }
>   
>   static inline void cgroup_account_cputime_field(struct task_struct *task,
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index 307800586ac8..f79f88456d72 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>   	unsigned int cpu = task_cpu(tsk);
>   	struct cpuacct *ca;
>   
> -	rcu_read_lock();
> +	lockdep_assert_rq_held(cpu_rq(cpu));
>   
>   	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
>   		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
> -
> -	rcu_read_unlock();
>   }
>   
>   /*

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock
       [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
@ 2022-03-08 23:31     ` Marek Szyprowski
  0 siblings, 0 replies; 18+ messages in thread
From: Marek Szyprowski @ 2022-03-08 23:31 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei,
	tj, lizefan.x, hannes
  Cc: linux-kernel

On 20.02.2022 06:14, Chengming Zhou wrote:
> The cpuacct_account_field() and it's cgroup v2 wrapper
> cgroup_account_cputime_field() is only called from cputime
> in task_group_account_field(), which is already in RCU read-side
> critical section. So remove these redundant RCU read lock.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

This patch landed recently in linux-next as commit 3eba0505d03a 
("sched/cpuacct: Remove redundant RCU read lock"). On my test systems I 
found that it triggers a following warning in the early boot stage:

Calibrating delay loop (skipped), value calculated using timer 
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00052-g167ee9b08bed #11459 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/0/1.

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5-00052-g167ee9b08bed 
#11459
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from account_system_index_time+0x13c/0x148
  account_system_index_time from account_system_time+0x78/0xa0
  account_system_time from update_process_times+0x50/0xc4
  update_process_times from tick_handle_periodic+0x1c/0x94
  tick_handle_periodic from exynos4_mct_tick_isr+0x30/0x38
  exynos4_mct_tick_isr from __handle_irq_event_percpu+0x74/0x3ac
  __handle_irq_event_percpu from handle_irq_event_percpu+0xc/0x38
  handle_irq_event_percpu from handle_irq_event+0x38/0x5c
  handle_irq_event from handle_fasteoi_irq+0xc4/0x180
  handle_fasteoi_irq from generic_handle_domain_irq+0x44/0x88
  generic_handle_domain_irq from gic_handle_irq+0x88/0xac
  gic_handle_irq from generic_handle_arch_irq+0x58/0x78
  generic_handle_arch_irq from __irq_svc+0x54/0x88
Exception stack(0xc1cafea0 to 0xc1cafee8)
fea0: 00000000 c0f33ac4 00000001 00000129 60000053 c127b458 00000008 
2e64e000
fec0: ef7bd5fc c127af84 c1209048 c1208f10 00000000 c1cafef0 c0b7e25c 
c0b8976c
fee0: 20000053 ffffffff
  __irq_svc from _raw_spin_unlock_irqrestore+0x2c/0x60
  _raw_spin_unlock_irqrestore from 
cblist_init_generic.constprop.14+0x298/0x328
  cblist_init_generic.constprop.14 from rcu_init_tasks_generic+0x18/0x110
  rcu_init_tasks_generic from kernel_init_freeable+0xa0/0x214
  kernel_init_freeable from kernel_init+0x18/0x12c
  kernel_init from ret_from_fork+0x14/0x2c
Exception stack(0xc1caffb0 to 0xc1cafff8)
ffa0:                                     00000000 00000000 00000000 
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.

The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. 
To get above log I reverted commit dc6e0818bc9a ("sched/cpuacct: 
Optimize away RCU read lock"), because it triggered a similar warning.

> ---
>   include/linux/cgroup.h | 2 --
>   kernel/sched/cpuacct.c | 2 --
>   2 files changed, 4 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 9a109c6ac0e0..1e356c222756 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
>   
>   	cpuacct_account_field(task, index, delta_exec);
>   
> -	rcu_read_lock();
>   	cgrp = task_dfl_cgroup(task);
>   	if (cgroup_parent(cgrp))
>   		__cgroup_account_cputime_field(cgrp, index, delta_exec);
> -	rcu_read_unlock();
>   }
>   
>   #else	/* CONFIG_CGROUPS */
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index f79f88456d72..d269ede84e85 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
>   {
>   	struct cpuacct *ca;
>   
> -	rcu_read_lock();
>   	for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
>   		__this_cpu_add(ca->cpustat->cpustat[index], val);
> -	rcu_read_unlock();
>   }
>   
>   struct cgroup_subsys cpuacct_cgrp_subsys = {

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:20     ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
@ 2022-03-08 23:32       ` Peter Zijlstra
  2022-03-08 23:44         ` Paul E. McKenney
  2022-03-09  3:08       ` [External] " Chengming Zhou
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-08 23:32 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj,
	lizefan.x, hannes, linux-kernel, Paul McKenney

On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
> > Since cpuacct_charge() is called from the scheduler update_curr(),
> > we must already have rq lock held, then the RCU read lock can
> > be optimized away.
> >
> > And do the same thing in it's wrapper cgroup_account_cputime(),
> > but we can't use lockdep_assert_rq_held() there, which defined
> > in kernel/sched/sched.h.
> >
> > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> This patch landed recently in linux-next as commit dc6e0818bc9a 
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> found that it triggers a following warning in the early boot stage:
> 
> Calibrating delay loop (skipped), value calculated using timer 
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
> 
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

Arguably, with the flavours folded again, rcu_dereference_check() ought
to default include rcu_read_lock_sched_held() or its equivalent I
suppose.

Paul?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:32       ` Peter Zijlstra
@ 2022-03-08 23:44         ` Paul E. McKenney
  2022-03-09  0:21           ` Paul E. McKenney
  2022-03-10  8:45           ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-08 23:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > we must already have rq lock held, then the RCU read lock can
> > > be optimized away.
> > >
> > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > but we can't use lockdep_assert_rq_held() there, which defined
> > > in kernel/sched/sched.h.
> > >
> > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > 
> > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > found that it triggers a following warning in the early boot stage:
> > 
> > Calibrating delay loop (skipped), value calculated using timer 
> > frequency.. 48.00 BogoMIPS (lpj=240000)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > CPU: Testing write buffer coherency: ok
> > CPU0: Spectre v2: using BPIALL workaround
> > 
> > =============================
> > WARNING: suspicious RCU usage
> > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > -----------------------------
> > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> 
> Arguably, with the flavours folded again, rcu_dereference_check() ought
> to default include rcu_read_lock_sched_held() or its equivalent I
> suppose.
> 
> Paul?

That would reduce the number of warnings, but it also would hide bugs.

So, are you sure you really want this?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:44         ` Paul E. McKenney
@ 2022-03-09  0:21           ` Paul E. McKenney
  2022-03-10  8:45           ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-09  0:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > 
> > > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > > found that it triggers a following warning in the early boot stage:
> > > 
> > > Calibrating delay loop (skipped), value calculated using timer 
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > > 
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > 
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> > 
> > Paul?
> 
> That would reduce the number of warnings, but it also would hide bugs.
> 
> So, are you sure you really want this?

Of course, if you are designing a use case that really expects multiple
types of readers...

Another approach might be rcu_dereference_brs(), but hopefully with a
better name, that checks for either rcu_read_lock(), disabled BH, or
disabled preemption.  Or if you are looking only for rcu_read_lock()
or disabled preemption, rcu_dereference_rs(), again hopefully with a
better name.

Is that what you had in mind?

							Thanx, Paul



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:20     ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
  2022-03-08 23:32       ` Peter Zijlstra
@ 2022-03-09  3:08       ` Chengming Zhou
  1 sibling, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-03-09  3:08 UTC (permalink / raw)
  To: Marek Szyprowski, mingo, peterz, vincent.guittot, bristot,
	zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel

On 2022/3/9 7:20 上午, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
>> Since cpuacct_charge() is called from the scheduler update_curr(),
>> we must already have rq lock held, then the RCU read lock can
>> be optimized away.
>>
>> And do the same thing in it's wrapper cgroup_account_cputime(),
>> but we can't use lockdep_assert_rq_held() there, which defined
>> in kernel/sched/sched.h.
>>
>> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> This patch landed recently in linux-next as commit dc6e0818bc9a 
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> found that it triggers a following warning in the early boot stage:

Hi, thanks for the report. I've send a fix patch[1] for review.

[1] https://lore.kernel.org/lkml/20220305034103.57123-1-zhouchengming@bytedance.com/

> 
> Calibrating delay loop (skipped), value calculated using timer 
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
> 
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 1
> 2 locks held by kthreadd/2:
>   #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
>   #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
> raw_spin_rq_lock_nested+0x24/0x34
> 
> stack backtrace:
> CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a 
> #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from update_curr+0x1bc/0x35c
>   update_curr from dequeue_task_fair+0xb0/0x8e8
>   dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
>   __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
>   __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
>   __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
>   kthreadd from ret_from_fork+0x14/0x2c
> Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
> 9fa0:                                     00000000 00000000 00000000 
> 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
> cblist_init_generic: Setting adjustable number of callback queues.
> cblist_init_generic: Setting shift to 1 and lim to 1.
> Running RCU-tasks wait API self tests
> Setting up static identity map for 0x40100000 - 0x40100060
> rcu: Hierarchical SRCU implementation.
> 
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by migration/0/13:
>   #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
> raw_spin_rq_lock_nested+0x24/0x34
> 
> stack backtrace:
> CPU: 0 PID: 13 Comm: migration/0 Not tainted 
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Stopper: 0x0 <- 0x0
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
>   put_prev_task_stop from __schedule+0x698/0x964
>   __schedule from schedule+0x54/0xe0
>   schedule from smpboot_thread_fn+0x218/0x288
>   smpboot_thread_fn from kthread+0xf0/0x134
>   kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xc1ccffb0 to 0xc1ccfff8)
> ffa0:                                     00000000 00000000 00000000 
> 00000000
> ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
> ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
> smp: Bringing up secondary CPUs ...
> CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
> CPU1: Spectre v2: using BPIALL workaround
> smp: Brought up 1 node, 2 CPUs
> SMP: Total of 2 processors activated (96.00 BogoMIPS).
> 
> The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.
> 
>> ---
>>   include/linux/cgroup.h | 2 --
>>   kernel/sched/cpuacct.c | 4 +---
>>   2 files changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 75c151413fda..9a109c6ac0e0 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>>   
>>   	cpuacct_charge(task, delta_exec);
>>   
>> -	rcu_read_lock();
>>   	cgrp = task_dfl_cgroup(task);
>>   	if (cgroup_parent(cgrp))
>>   		__cgroup_account_cputime(cgrp, delta_exec);
>> -	rcu_read_unlock();
>>   }
>>   
>>   static inline void cgroup_account_cputime_field(struct task_struct *task,
>> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
>> index 307800586ac8..f79f88456d72 100644
>> --- a/kernel/sched/cpuacct.c
>> +++ b/kernel/sched/cpuacct.c
>> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>>   	unsigned int cpu = task_cpu(tsk);
>>   	struct cpuacct *ca;
>>   
>> -	rcu_read_lock();
>> +	lockdep_assert_rq_held(cpu_rq(cpu));
>>   
>>   	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
>>   		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
>> -
>> -	rcu_read_unlock();
>>   }
>>   
>>   /*
> 
> Best regards

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:44         ` Paul E. McKenney
  2022-03-09  0:21           ` Paul E. McKenney
@ 2022-03-10  8:45           ` Peter Zijlstra
  2022-03-10 15:01             ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-10  8:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > 
> > > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > > found that it triggers a following warning in the early boot stage:
> > > 
> > > Calibrating delay loop (skipped), value calculated using timer 
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > > 
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > 
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> > 
> > Paul?
> 
> That would reduce the number of warnings, but it also would hide bugs.
> 
> So, are you sure you really want this?

I don't understand... Since the flavours got merged regular RCU has it's
quescent state held off by preempt_disable. So how can relying on that
cause bugs?

And if we can rely on that, then surely rcu_dereferenced_check() ought
to play by the same rules, otherwise we get silly warnings like these at
hand.

Specifically, we removed the rcu_read_lock() here because this has
rq->lock held, which is a raw_spinlock_t which very much implies preempt
disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
be disabled.

There is no possible way for RCU to make progress.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-10  8:45           ` Peter Zijlstra
@ 2022-03-10 15:01             ` Paul E. McKenney
  2022-03-12 12:15               ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-10 15:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > > we must already have rq lock held, then the RCU read lock can
> > > > > be optimized away.
> > > > >
> > > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > > in kernel/sched/sched.h.
> > > > >
> > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > > 
> > > > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > > > found that it triggers a following warning in the early boot stage:
> > > > 
> > > > Calibrating delay loop (skipped), value calculated using timer 
> > > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > > pid_max: default: 32768 minimum: 301
> > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > CPU: Testing write buffer coherency: ok
> > > > CPU0: Spectre v2: using BPIALL workaround
> > > > 
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > > -----------------------------
> > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > 
> > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > suppose.
> > > 
> > > Paul?
> > 
> > That would reduce the number of warnings, but it also would hide bugs.
> > 
> > So, are you sure you really want this?
> 
> I don't understand... Since the flavours got merged regular RCU has it's
> quescent state held off by preempt_disable. So how can relying on that
> cause bugs?

Somene forgets an rcu_read_lock() and there happens to be something
like a preempt_disable() that by coincidence covers that particular
rcu_dereference().  The kernel therefore doesn't complain.  That someone
goes on to other things, maybe even posthumously.  Then some time later
the preempt_disable() goes away, for good and sufficient reasons.

Good luck figuring out where to put the needed rcu_read_lock() and
rcu_read_unlock().

> And if we can rely on that, then surely rcu_dereferenced_check() ought
> to play by the same rules, otherwise we get silly warnings like these at
> hand.
> 
> Specifically, we removed the rcu_read_lock() here because this has
> rq->lock held, which is a raw_spinlock_t which very much implies preempt
> disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> be disabled.
> 
> There is no possible way for RCU to make progress.

Then let's have that particular rcu_dereference_check() explicitly state
what it needs, which seems to be either rcu_read_lock() on the one hand.
Right now, that could be just this:

	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());

Or am I missing something here?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-10 15:01             ` Paul E. McKenney
@ 2022-03-12 12:15               ` Peter Zijlstra
  2022-03-12 17:44                 ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-12 12:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:

> > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > > 
> > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > suppose.
> > > > 
> > > > Paul?
> > > 
> > > That would reduce the number of warnings, but it also would hide bugs.
> > > 
> > > So, are you sure you really want this?
> > 
> > I don't understand... Since the flavours got merged regular RCU has it's
> > quescent state held off by preempt_disable. So how can relying on that
> > cause bugs?
> 
> Somene forgets an rcu_read_lock() and there happens to be something
> like a preempt_disable() that by coincidence covers that particular
> rcu_dereference().  The kernel therefore doesn't complain.  That someone
> goes on to other things, maybe even posthumously.  Then some time later
> the preempt_disable() goes away, for good and sufficient reasons.
> 
> Good luck figuring out where to put the needed rcu_read_lock() and
> rcu_read_unlock().

Well, that's software engineering for you. Also in that case the warning
will work as expected. Then figuring out how to fix it is not the
problem of the warning -- that worked as advertised.

(also, I don't think it'll be too hard, you just gotta figure out which
object is rcu protected -- the warning gives you this, where the lookup
happens -- again the warning helps, and how long it's used for, all
relatively well definted things)

I don't see a problem. No bugs hidden.

> > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > to play by the same rules, otherwise we get silly warnings like these at
> > hand.
> > 
> > Specifically, we removed the rcu_read_lock() here because this has
> > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > be disabled.
> > 
> > There is no possible way for RCU to make progress.
> 
> Then let's have that particular rcu_dereference_check() explicitly state
> what it needs, which seems to be either rcu_read_lock() on the one hand.
> Right now, that could be just this:
> 
> 	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
> 
> Or am I missing something here?

That will work; I just don't agree with it. Per the rules of RCU it is
entirely correct to mix rcu_read_lock() and preempt_disable() (or
anything that implies the same). So I strongly feel that
rcu_dereference() should not warn about obviously correct code. Why
would we need to special case this ?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-12 12:15               ` Peter Zijlstra
@ 2022-03-12 17:44                 ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-12 17:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Sat, Mar 12, 2022 at 01:15:33PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:
> 
> > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > > > 
> > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > > suppose.
> > > > > 
> > > > > Paul?
> > > > 
> > > > That would reduce the number of warnings, but it also would hide bugs.
> > > > 
> > > > So, are you sure you really want this?
> > > 
> > > I don't understand... Since the flavours got merged regular RCU has it's
> > > quescent state held off by preempt_disable. So how can relying on that
> > > cause bugs?
> > 
> > Somene forgets an rcu_read_lock() and there happens to be something
> > like a preempt_disable() that by coincidence covers that particular
> > rcu_dereference().  The kernel therefore doesn't complain.  That someone
> > goes on to other things, maybe even posthumously.  Then some time later
> > the preempt_disable() goes away, for good and sufficient reasons.
> > 
> > Good luck figuring out where to put the needed rcu_read_lock() and
> > rcu_read_unlock().
> 
> Well, that's software engineering for you.

My point exactly!!!

>                                            Also in that case the warning
> will work as expected. Then figuring out how to fix it is not the
> problem of the warning -- that worked as advertised.
> 
> (also, I don't think it'll be too hard, you just gotta figure out which
> object is rcu protected -- the warning gives you this, where the lookup
> happens -- again the warning helps, and how long it's used for, all
> relatively well definted things)

Without in any way agreeing with that assessment of difficulty, especially
in the general case...  It is -way- easier just to tell RCU what your
design rules are for the code in question.

> I don't see a problem. No bugs hidden.

C'mon, Peter!

There really was a bug hidden.  That someone intended to add some
calls to rcu_read_lock() and rcu_read_unlock() in the proper places.
Their failure to add them really was a bug.

That bug was hidden by: (1) There being a preempt_disable() or
whatever that by coincidence happened to be covering the part of the
code containing the rcu_dereference() and (2) Your proposed change that
would make rcu_dereference() unable to detect that bug.

And that bug can be quite bad.  Given your proposed change, RCU
cannot detect this bug:


	/* Preemption is enabled. */
	/* There should be an rcu_read_lock() here. */
	preempt_disable();
	p = rcu_dereference(gp);
	do_something_with(p);
	preempt_enable();
	/* Without the rcu_read_lock(), *p is history. */
	do_something_else_with(p);
	/* There should be an rcu_read_unlock() here. */

> > > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > > to play by the same rules, otherwise we get silly warnings like these at
> > > hand.
> > > 
> > > Specifically, we removed the rcu_read_lock() here because this has
> > > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > > be disabled.
> > > 
> > > There is no possible way for RCU to make progress.
> > 
> > Then let's have that particular rcu_dereference_check() explicitly state
> > what it needs, which seems to be either rcu_read_lock() on the one hand.
> > Right now, that could be just this:
> > 
> > 	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
> > 
> > Or am I missing something here?
> 
> That will work; I just don't agree with it. Per the rules of RCU it is
> entirely correct to mix rcu_read_lock() and preempt_disable() (or
> anything that implies the same). So I strongly feel that
> rcu_dereference() should not warn about obviously correct code. Why
> would we need to special case this ?

This use case might well be entirely correct, but it is most certainly
not the common case.

Therefore, my answer to this requested chance in rcu_dereference()
semantics is "no".

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-03-12 17:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou
     [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
2022-03-08 23:20     ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
2022-03-08 23:32       ` Peter Zijlstra
2022-03-08 23:44         ` Paul E. McKenney
2022-03-09  0:21           ` Paul E. McKenney
2022-03-10  8:45           ` Peter Zijlstra
2022-03-10 15:01             ` Paul E. McKenney
2022-03-12 12:15               ` Peter Zijlstra
2022-03-12 17:44                 ` Paul E. McKenney
2022-03-09  3:08       ` [External] " Chengming Zhou
2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou
     [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
2022-03-08 23:31     ` [PATCH v3 3/3] sched/cpuacct: remove " Marek Szyprowski
2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
2022-02-23  9:11   ` Peter Zijlstra
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.