linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
@ 2022-02-20  5:14 Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20  5:14 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel, Chengming Zhou, Minye Zhu

The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.

But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.

Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 kernel/sched/cpuacct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3d06c5e4220d..307800586ac8 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -334,12 +334,13 @@ static struct cftype files[] = {
  */
 void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
+	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
 	rcu_read_lock();
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
-		__this_cpu_add(*ca->cpuusage, cputime);
+		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
 
 	rcu_read_unlock();
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
@ 2022-02-20  5:14 ` Chengming Zhou
  2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
  2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20  5:14 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel, Chengming Zhou

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 4 +---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c151413fda..9a109c6ac0e0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
 
 	cpuacct_charge(task, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime(cgrp, delta_exec);
-	rcu_read_unlock();
 }
 
 static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 307800586ac8..f79f88456d72 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
-	rcu_read_lock();
+	lockdep_assert_rq_held(cpu_rq(cpu));
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
 		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
-	rcu_read_unlock();
 }
 
 /*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
@ 2022-02-20  5:14 ` Chengming Zhou
  2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
  2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
  2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
  3 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20  5:14 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel, Chengming Zhou

The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9a109c6ac0e0..1e356c222756 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
 
 	cpuacct_account_field(task, index, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime_field(cgrp, index, delta_exec);
-	rcu_read_unlock();
 }
 
 #else	/* CONFIG_CGROUPS */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f79f88456d72..d269ede84e85 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
 {
 	struct cpuacct *ca;
 
-	rcu_read_lock();
 	for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
 		__this_cpu_add(ca->cpustat->cpustat[index], val);
-	rcu_read_unlock();
 }
 
 struct cgroup_subsys cpuacct_cgrp_subsys = {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
  2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
@ 2022-02-22 18:01 ` Tejun Heo
  2022-02-23  9:11   ` Peter Zijlstra
  2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
  3 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2022-02-22 18:01 UTC (permalink / raw)
  To: Chengming Zhou
  Cc: mingo, peterz, vincent.guittot, bristot, zhaolei, lizefan.x,
	hannes, linux-kernel, Minye Zhu

On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote:
> The cpuacct_account_field() is always called by the current task
> itself, so it's ok to use __this_cpu_add() to charge the tick time.
> 
> But cpuacct_charge() maybe called by update_curr() in load_balance()
> on a random CPU, different from the CPU on which the task is running.
> So __this_cpu_add() will charge that cputime to a random incorrect CPU.
> 
> Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
> Reported-by: Minye Zhu <zhuminye@bytedance.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

For all three patches,

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
  2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
@ 2022-02-23  9:11   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-23  9:11 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei,
	lizefan.x, hannes, linux-kernel, Minye Zhu

On Tue, Feb 22, 2022 at 08:01:51AM -1000, Tejun Heo wrote:
> On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote:
> > The cpuacct_account_field() is always called by the current task
> > itself, so it's ok to use __this_cpu_add() to charge the tick time.
> > 
> > But cpuacct_charge() maybe called by update_curr() in load_balance()
> > on a random CPU, different from the CPU on which the task is running.
> > So __this_cpu_add() will charge that cputime to a random incorrect CPU.
> > 
> > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
> > Reported-by: Minye Zhu <zhuminye@bytedance.com>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> For all three patches,
> 
> Acked-by: Tejun Heo <tj@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip: sched/core] sched/cpuacct: Remove redundant RCU read lock
  2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
@ 2022-03-01 15:24   ` tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Tejun Heo, Chengming Zhou, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3eba0505d03a9c1eb30d40c2330c0880b22d1b3a
Gitweb:        https://git.kernel.org/tip/3eba0505d03a9c1eb30d40c2330c0880b22d1b3a
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Sun, 20 Feb 2022 13:14:26 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00

sched/cpuacct: Remove redundant RCU read lock

The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9a109c6..1e356c2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
 
 	cpuacct_account_field(task, index, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime_field(cgrp, index, delta_exec);
-	rcu_read_unlock();
 }
 
 #else	/* CONFIG_CGROUPS */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f79f884..d269ede 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
 {
 	struct cpuacct *ca;
 
-	rcu_read_lock();
 	for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
 		__this_cpu_add(ca->cpustat->cpustat[index], val);
-	rcu_read_unlock();
 }
 
 struct cgroup_subsys cpuacct_cgrp_subsys = {

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: sched/core] sched/cpuacct: Optimize away RCU read lock
  2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
@ 2022-03-01 15:24   ` tip-bot2 for Chengming Zhou
       [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Chengming Zhou, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Gitweb:        https://git.kernel.org/tip/dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Sun, 20 Feb 2022 13:14:25 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00

sched/cpuacct: Optimize away RCU read lock

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com
---
 include/linux/cgroup.h | 2 --
 kernel/sched/cpuacct.c | 4 +---
 2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c1514..9a109c6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
 
 	cpuacct_charge(task, delta_exec);
 
-	rcu_read_lock();
 	cgrp = task_dfl_cgroup(task);
 	if (cgroup_parent(cgrp))
 		__cgroup_account_cputime(cgrp, delta_exec);
-	rcu_read_unlock();
 }
 
 static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3078005..f79f884 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
-	rcu_read_lock();
+	lockdep_assert_rq_held(cpu_rq(cpu));
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
 		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
-	rcu_read_unlock();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: sched/core] sched/cpuacct: Fix charge percpu cpuusage
  2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
                   ` (2 preceding siblings ...)
  2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
@ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou
  3 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Minye Zhu, Chengming Zhou, Peter Zijlstra (Intel),
	Tejun Heo, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd
Gitweb:        https://git.kernel.org/tip/248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd
Author:        Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate:    Sun, 20 Feb 2022 13:14:24 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:37 +01:00

sched/cpuacct: Fix charge percpu cpuusage

The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.

But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.

Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com
---
 kernel/sched/cpuacct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3d06c5e..3078005 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -334,12 +334,13 @@ static struct cftype files[] = {
  */
 void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
+	unsigned int cpu = task_cpu(tsk);
 	struct cpuacct *ca;
 
 	rcu_read_lock();
 
 	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
-		__this_cpu_add(*ca->cpuusage, cputime);
+		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
 
 	rcu_read_unlock();
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
       [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
@ 2022-03-08 23:20     ` Marek Szyprowski
  2022-03-08 23:32       ` Peter Zijlstra
  2022-03-09  3:08       ` [External] " Chengming Zhou
  0 siblings, 2 replies; 18+ messages in thread
From: Marek Szyprowski @ 2022-03-08 23:20 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei,
	tj, lizefan.x, hannes
  Cc: linux-kernel

On 20.02.2022 06:14, Chengming Zhou wrote:
> Since cpuacct_charge() is called from the scheduler update_curr(),
> we must already have rq lock held, then the RCU read lock can
> be optimized away.
>
> And do the same thing in it's wrapper cgroup_account_cputime(),
> but we can't use lockdep_assert_rq_held() there, which defined
> in kernel/sched/sched.h.
>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

This patch landed recently in linux-next as commit dc6e0818bc9a 
("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
found that it triggers a following warning in the early boot stage:

Calibrating delay loop (skipped), value calculated using timer 
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by kthreadd/2:
  #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
  #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
raw_spin_rq_lock_nested+0x24/0x34

stack backtrace:
CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a 
#11458
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from update_curr+0x1bc/0x35c
  update_curr from dequeue_task_fair+0xb0/0x8e8
  dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
  __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
  __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
  __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
  kthreadd from ret_from_fork+0x14/0x2c
Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
9fa0:                                     00000000 00000000 00000000 
00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by migration/0/13:
  #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
raw_spin_rq_lock_nested+0x24/0x34

stack backtrace:
CPU: 0 PID: 13 Comm: migration/0 Not tainted 
5.17.0-rc5-00050-gdc6e0818bc9a #11458
Hardware name: Samsung Exynos (Flattened Device Tree)
Stopper: 0x0 <- 0x0
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
  put_prev_task_stop from __schedule+0x698/0x964
  __schedule from schedule+0x54/0xe0
  schedule from smpboot_thread_fn+0x218/0x288
  smpboot_thread_fn from kthread+0xf0/0x134
  kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc1ccffb0 to 0xc1ccfff8)
ffa0:                                     00000000 00000000 00000000 
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).

The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.

> ---
>   include/linux/cgroup.h | 2 --
>   kernel/sched/cpuacct.c | 4 +---
>   2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 75c151413fda..9a109c6ac0e0 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>   
>   	cpuacct_charge(task, delta_exec);
>   
> -	rcu_read_lock();
>   	cgrp = task_dfl_cgroup(task);
>   	if (cgroup_parent(cgrp))
>   		__cgroup_account_cputime(cgrp, delta_exec);
> -	rcu_read_unlock();
>   }
>   
>   static inline void cgroup_account_cputime_field(struct task_struct *task,
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index 307800586ac8..f79f88456d72 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>   	unsigned int cpu = task_cpu(tsk);
>   	struct cpuacct *ca;
>   
> -	rcu_read_lock();
> +	lockdep_assert_rq_held(cpu_rq(cpu));
>   
>   	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
>   		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
> -
> -	rcu_read_unlock();
>   }
>   
>   /*

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock
       [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
@ 2022-03-08 23:31     ` Marek Szyprowski
  0 siblings, 0 replies; 18+ messages in thread
From: Marek Szyprowski @ 2022-03-08 23:31 UTC (permalink / raw)
  To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei,
	tj, lizefan.x, hannes
  Cc: linux-kernel

On 20.02.2022 06:14, Chengming Zhou wrote:
> The cpuacct_account_field() and it's cgroup v2 wrapper
> cgroup_account_cputime_field() is only called from cputime
> in task_group_account_field(), which is already in RCU read-side
> critical section. So remove these redundant RCU read lock.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

This patch landed recently in linux-next as commit 3eba0505d03a 
("sched/cpuacct: Remove redundant RCU read lock"). On my test systems I 
found that it triggers a following warning in the early boot stage:

Calibrating delay loop (skipped), value calculated using timer 
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00052-g167ee9b08bed #11459 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/0/1.

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5-00052-g167ee9b08bed 
#11459
Hardware name: Samsung Exynos (Flattened Device Tree)
  unwind_backtrace from show_stack+0x10/0x14
  show_stack from dump_stack_lvl+0x58/0x70
  dump_stack_lvl from account_system_index_time+0x13c/0x148
  account_system_index_time from account_system_time+0x78/0xa0
  account_system_time from update_process_times+0x50/0xc4
  update_process_times from tick_handle_periodic+0x1c/0x94
  tick_handle_periodic from exynos4_mct_tick_isr+0x30/0x38
  exynos4_mct_tick_isr from __handle_irq_event_percpu+0x74/0x3ac
  __handle_irq_event_percpu from handle_irq_event_percpu+0xc/0x38
  handle_irq_event_percpu from handle_irq_event+0x38/0x5c
  handle_irq_event from handle_fasteoi_irq+0xc4/0x180
  handle_fasteoi_irq from generic_handle_domain_irq+0x44/0x88
  generic_handle_domain_irq from gic_handle_irq+0x88/0xac
  gic_handle_irq from generic_handle_arch_irq+0x58/0x78
  generic_handle_arch_irq from __irq_svc+0x54/0x88
Exception stack(0xc1cafea0 to 0xc1cafee8)
fea0: 00000000 c0f33ac4 00000001 00000129 60000053 c127b458 00000008 
2e64e000
fec0: ef7bd5fc c127af84 c1209048 c1208f10 00000000 c1cafef0 c0b7e25c 
c0b8976c
fee0: 20000053 ffffffff
  __irq_svc from _raw_spin_unlock_irqrestore+0x2c/0x60
  _raw_spin_unlock_irqrestore from 
cblist_init_generic.constprop.14+0x298/0x328
  cblist_init_generic.constprop.14 from rcu_init_tasks_generic+0x18/0x110
  rcu_init_tasks_generic from kernel_init_freeable+0xa0/0x214
  kernel_init_freeable from kernel_init+0x18/0x12c
  kernel_init from ret_from_fork+0x14/0x2c
Exception stack(0xc1caffb0 to 0xc1cafff8)
ffa0:                                     00000000 00000000 00000000 
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.

The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. 
To get above log I reverted commit dc6e0818bc9a ("sched/cpuacct: 
Optimize away RCU read lock"), because it triggered a similar warning.

> ---
>   include/linux/cgroup.h | 2 --
>   kernel/sched/cpuacct.c | 2 --
>   2 files changed, 4 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 9a109c6ac0e0..1e356c222756 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
>   
>   	cpuacct_account_field(task, index, delta_exec);
>   
> -	rcu_read_lock();
>   	cgrp = task_dfl_cgroup(task);
>   	if (cgroup_parent(cgrp))
>   		__cgroup_account_cputime_field(cgrp, index, delta_exec);
> -	rcu_read_unlock();
>   }
>   
>   #else	/* CONFIG_CGROUPS */
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index f79f88456d72..d269ede84e85 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
>   {
>   	struct cpuacct *ca;
>   
> -	rcu_read_lock();
>   	for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
>   		__this_cpu_add(ca->cpustat->cpustat[index], val);
> -	rcu_read_unlock();
>   }
>   
>   struct cgroup_subsys cpuacct_cgrp_subsys = {

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:20     ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
@ 2022-03-08 23:32       ` Peter Zijlstra
  2022-03-08 23:44         ` Paul E. McKenney
  2022-03-09  3:08       ` [External] " Chengming Zhou
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-08 23:32 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj,
	lizefan.x, hannes, linux-kernel, Paul McKenney

On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
> > Since cpuacct_charge() is called from the scheduler update_curr(),
> > we must already have rq lock held, then the RCU read lock can
> > be optimized away.
> >
> > And do the same thing in it's wrapper cgroup_account_cputime(),
> > but we can't use lockdep_assert_rq_held() there, which defined
> > in kernel/sched/sched.h.
> >
> > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> This patch landed recently in linux-next as commit dc6e0818bc9a 
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> found that it triggers a following warning in the early boot stage:
> 
> Calibrating delay loop (skipped), value calculated using timer 
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
> 
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

Arguably, with the flavours folded again, rcu_dereference_check() ought
to default include rcu_read_lock_sched_held() or its equivalent I
suppose.

Paul?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:32       ` Peter Zijlstra
@ 2022-03-08 23:44         ` Paul E. McKenney
  2022-03-09  0:21           ` Paul E. McKenney
  2022-03-10  8:45           ` Peter Zijlstra
  0 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-08 23:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > we must already have rq lock held, then the RCU read lock can
> > > be optimized away.
> > >
> > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > but we can't use lockdep_assert_rq_held() there, which defined
> > > in kernel/sched/sched.h.
> > >
> > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > 
> > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > found that it triggers a following warning in the early boot stage:
> > 
> > Calibrating delay loop (skipped), value calculated using timer 
> > frequency.. 48.00 BogoMIPS (lpj=240000)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > CPU: Testing write buffer coherency: ok
> > CPU0: Spectre v2: using BPIALL workaround
> > 
> > =============================
> > WARNING: suspicious RCU usage
> > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > -----------------------------
> > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> 
> Arguably, with the flavours folded again, rcu_dereference_check() ought
> to default include rcu_read_lock_sched_held() or its equivalent I
> suppose.
> 
> Paul?

That would reduce the number of warnings, but it also would hide bugs.

So, are you sure you really want this?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:44         ` Paul E. McKenney
@ 2022-03-09  0:21           ` Paul E. McKenney
  2022-03-10  8:45           ` Peter Zijlstra
  1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-09  0:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > 
> > > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > > found that it triggers a following warning in the early boot stage:
> > > 
> > > Calibrating delay loop (skipped), value calculated using timer 
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > > 
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > 
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> > 
> > Paul?
> 
> That would reduce the number of warnings, but it also would hide bugs.
> 
> So, are you sure you really want this?

Of course, if you are designing a use case that really expects multiple
types of readers...

Another approach might be rcu_dereference_brs(), but hopefully with a
better name, that checks for either rcu_read_lock(), disabled BH, or
disabled preemption.  Or if you are looking only for rcu_read_lock()
or disabled preemption, rcu_dereference_rs(), again hopefully with a
better name.

Is that what you had in mind?

							Thanx, Paul



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [External] Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:20     ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
  2022-03-08 23:32       ` Peter Zijlstra
@ 2022-03-09  3:08       ` Chengming Zhou
  1 sibling, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-03-09  3:08 UTC (permalink / raw)
  To: Marek Szyprowski, mingo, peterz, vincent.guittot, bristot,
	zhaolei, tj, lizefan.x, hannes
  Cc: linux-kernel

On 2022/3/9 7:20 上午, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
>> Since cpuacct_charge() is called from the scheduler update_curr(),
>> we must already have rq lock held, then the RCU read lock can
>> be optimized away.
>>
>> And do the same thing in it's wrapper cgroup_account_cputime(),
>> but we can't use lockdep_assert_rq_held() there, which defined
>> in kernel/sched/sched.h.
>>
>> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> 
> This patch landed recently in linux-next as commit dc6e0818bc9a 
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> found that it triggers a following warning in the early boot stage:

Hi, thanks for the report. I've send a fix patch[1] for review.

[1] https://lore.kernel.org/lkml/20220305034103.57123-1-zhouchengming@bytedance.com/

> 
> Calibrating delay loop (skipped), value calculated using timer 
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
> 
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 1
> 2 locks held by kthreadd/2:
>   #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
>   #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
> raw_spin_rq_lock_nested+0x24/0x34
> 
> stack backtrace:
> CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a 
> #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from update_curr+0x1bc/0x35c
>   update_curr from dequeue_task_fair+0xb0/0x8e8
>   dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
>   __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
>   __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
>   __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
>   kthreadd from ret_from_fork+0x14/0x2c
> Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
> 9fa0:                                     00000000 00000000 00000000 
> 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
> cblist_init_generic: Setting adjustable number of callback queues.
> cblist_init_generic: Setting shift to 1 and lim to 1.
> Running RCU-tasks wait API self tests
> Setting up static identity map for 0x40100000 - 0x40100060
> rcu: Hierarchical SRCU implementation.
> 
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by migration/0/13:
>   #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: 
> raw_spin_rq_lock_nested+0x24/0x34
> 
> stack backtrace:
> CPU: 0 PID: 13 Comm: migration/0 Not tainted 
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Stopper: 0x0 <- 0x0
>   unwind_backtrace from show_stack+0x10/0x14
>   show_stack from dump_stack_lvl+0x58/0x70
>   dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
>   put_prev_task_stop from __schedule+0x698/0x964
>   __schedule from schedule+0x54/0xe0
>   schedule from smpboot_thread_fn+0x218/0x288
>   smpboot_thread_fn from kthread+0xf0/0x134
>   kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xc1ccffb0 to 0xc1ccfff8)
> ffa0:                                     00000000 00000000 00000000 
> 00000000
> ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
> 00000000
> ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
> smp: Bringing up secondary CPUs ...
> CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
> CPU1: Spectre v2: using BPIALL workaround
> smp: Brought up 1 node, 2 CPUs
> SMP: Total of 2 processors activated (96.00 BogoMIPS).
> 
> The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.
> 
>> ---
>>   include/linux/cgroup.h | 2 --
>>   kernel/sched/cpuacct.c | 4 +---
>>   2 files changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 75c151413fda..9a109c6ac0e0 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>>   
>>   	cpuacct_charge(task, delta_exec);
>>   
>> -	rcu_read_lock();
>>   	cgrp = task_dfl_cgroup(task);
>>   	if (cgroup_parent(cgrp))
>>   		__cgroup_account_cputime(cgrp, delta_exec);
>> -	rcu_read_unlock();
>>   }
>>   
>>   static inline void cgroup_account_cputime_field(struct task_struct *task,
>> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
>> index 307800586ac8..f79f88456d72 100644
>> --- a/kernel/sched/cpuacct.c
>> +++ b/kernel/sched/cpuacct.c
>> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>>   	unsigned int cpu = task_cpu(tsk);
>>   	struct cpuacct *ca;
>>   
>> -	rcu_read_lock();
>> +	lockdep_assert_rq_held(cpu_rq(cpu));
>>   
>>   	for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
>>   		*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
>> -
>> -	rcu_read_unlock();
>>   }
>>   
>>   /*
> 
> Best regards

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-08 23:44         ` Paul E. McKenney
  2022-03-09  0:21           ` Paul E. McKenney
@ 2022-03-10  8:45           ` Peter Zijlstra
  2022-03-10 15:01             ` Paul E. McKenney
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-10  8:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > 
> > > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > > found that it triggers a following warning in the early boot stage:
> > > 
> > > Calibrating delay loop (skipped), value calculated using timer 
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > > 
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > 
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> > 
> > Paul?
> 
> That would reduce the number of warnings, but it also would hide bugs.
> 
> So, are you sure you really want this?

I don't understand... Since the flavours got merged regular RCU has it's
quescent state held off by preempt_disable. So how can relying on that
cause bugs?

And if we can rely on that, then surely rcu_dereferenced_check() ought
to play by the same rules, otherwise we get silly warnings like these at
hand.

Specifically, we removed the rcu_read_lock() here because this has
rq->lock held, which is a raw_spinlock_t which very much implies preempt
disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
be disabled.

There is no possible way for RCU to make progress.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-10  8:45           ` Peter Zijlstra
@ 2022-03-10 15:01             ` Paul E. McKenney
  2022-03-12 12:15               ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-10 15:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > > we must already have rq lock held, then the RCU read lock can
> > > > > be optimized away.
> > > > >
> > > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > > in kernel/sched/sched.h.
> > > > >
> > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > > 
> > > > This patch landed recently in linux-next as commit dc6e0818bc9a 
> > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I 
> > > > found that it triggers a following warning in the early boot stage:
> > > > 
> > > > Calibrating delay loop (skipped), value calculated using timer 
> > > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > > pid_max: default: 32768 minimum: 301
> > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > CPU: Testing write buffer coherency: ok
> > > > CPU0: Spectre v2: using BPIALL workaround
> > > > 
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > > -----------------------------
> > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > 
> > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > suppose.
> > > 
> > > Paul?
> > 
> > That would reduce the number of warnings, but it also would hide bugs.
> > 
> > So, are you sure you really want this?
> 
> I don't understand... Since the flavours got merged regular RCU has it's
> quescent state held off by preempt_disable. So how can relying on that
> cause bugs?

Somene forgets an rcu_read_lock() and there happens to be something
like a preempt_disable() that by coincidence covers that particular
rcu_dereference().  The kernel therefore doesn't complain.  That someone
goes on to other things, maybe even posthumously.  Then some time later
the preempt_disable() goes away, for good and sufficient reasons.

Good luck figuring out where to put the needed rcu_read_lock() and
rcu_read_unlock().

> And if we can rely on that, then surely rcu_dereferenced_check() ought
> to play by the same rules, otherwise we get silly warnings like these at
> hand.
> 
> Specifically, we removed the rcu_read_lock() here because this has
> rq->lock held, which is a raw_spinlock_t which very much implies preempt
> disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> be disabled.
> 
> There is no possible way for RCU to make progress.

Then let's have that particular rcu_dereference_check() explicitly state
what it needs, which seems to be either rcu_read_lock() on the one hand.
Right now, that could be just this:

	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());

Or am I missing something here?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-10 15:01             ` Paul E. McKenney
@ 2022-03-12 12:15               ` Peter Zijlstra
  2022-03-12 17:44                 ` Paul E. McKenney
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-12 12:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:

> > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > > 
> > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > suppose.
> > > > 
> > > > Paul?
> > > 
> > > That would reduce the number of warnings, but it also would hide bugs.
> > > 
> > > So, are you sure you really want this?
> > 
> > I don't understand... Since the flavours got merged regular RCU has it's
> > quescent state held off by preempt_disable. So how can relying on that
> > cause bugs?
> 
> Somene forgets an rcu_read_lock() and there happens to be something
> like a preempt_disable() that by coincidence covers that particular
> rcu_dereference().  The kernel therefore doesn't complain.  That someone
> goes on to other things, maybe even posthumously.  Then some time later
> the preempt_disable() goes away, for good and sufficient reasons.
> 
> Good luck figuring out where to put the needed rcu_read_lock() and
> rcu_read_unlock().

Well, that's software engineering for you. Also in that case the warning
will work as expected. Then figuring out how to fix it is not the
problem of the warning -- that worked as advertised.

(also, I don't think it'll be too hard, you just gotta figure out which
object is rcu protected -- the warning gives you this, where the lookup
happens -- again the warning helps, and how long it's used for, all
relatively well definted things)

I don't see a problem. No bugs hidden.

> > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > to play by the same rules, otherwise we get silly warnings like these at
> > hand.
> > 
> > Specifically, we removed the rcu_read_lock() here because this has
> > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > be disabled.
> > 
> > There is no possible way for RCU to make progress.
> 
> Then let's have that particular rcu_dereference_check() explicitly state
> what it needs, which seems to be either rcu_read_lock() on the one hand.
> Right now, that could be just this:
> 
> 	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
> 
> Or am I missing something here?

That will work; I just don't agree with it. Per the rules of RCU it is
entirely correct to mix rcu_read_lock() and preempt_disable() (or
anything that implies the same). So I strongly feel that
rcu_dereference() should not warn about obviously correct code. Why
would we need to special case this ?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
  2022-03-12 12:15               ` Peter Zijlstra
@ 2022-03-12 17:44                 ` Paul E. McKenney
  0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-12 17:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
	bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel

On Sat, Mar 12, 2022 at 01:15:33PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:
> 
> > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > > > 
> > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > > suppose.
> > > > > 
> > > > > Paul?
> > > > 
> > > > That would reduce the number of warnings, but it also would hide bugs.
> > > > 
> > > > So, are you sure you really want this?
> > > 
> > > I don't understand... Since the flavours got merged regular RCU has it's
> > > quescent state held off by preempt_disable. So how can relying on that
> > > cause bugs?
> > 
> > Somene forgets an rcu_read_lock() and there happens to be something
> > like a preempt_disable() that by coincidence covers that particular
> > rcu_dereference().  The kernel therefore doesn't complain.  That someone
> > goes on to other things, maybe even posthumously.  Then some time later
> > the preempt_disable() goes away, for good and sufficient reasons.
> > 
> > Good luck figuring out where to put the needed rcu_read_lock() and
> > rcu_read_unlock().
> 
> Well, that's software engineering for you.

My point exactly!!!

>                                            Also in that case the warning
> will work as expected. Then figuring out how to fix it is not the
> problem of the warning -- that worked as advertised.
> 
> (also, I don't think it'll be too hard, you just gotta figure out which
> object is rcu protected -- the warning gives you this, where the lookup
> happens -- again the warning helps, and how long it's used for, all
> relatively well definted things)

Without in any way agreeing with that assessment of difficulty, especially
in the general case...  It is -way- easier just to tell RCU what your
design rules are for the code in question.

> I don't see a problem. No bugs hidden.

C'mon, Peter!

There really was a bug hidden.  That someone intended to add some
calls to rcu_read_lock() and rcu_read_unlock() in the proper places.
Their failure to add them really was a bug.

That bug was hidden by: (1) There being a preempt_disable() or
whatever that by coincidence happened to be covering the part of the
code containing the rcu_dereference() and (2) Your proposed change that
would make rcu_dereference() unable to detect that bug.

And that bug can be quite bad.  Given your proposed change, RCU
cannot detect this bug:


	/* Preemption is enabled. */
	/* There should be an rcu_read_lock() here. */
	preempt_disable();
	p = rcu_dereference(gp);
	do_something_with(p);
	preempt_enable();
	/* Without the rcu_read_lock(), *p is history. */
	do_something_else_with(p);
	/* There should be an rcu_read_unlock() here. */

> > > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > > to play by the same rules, otherwise we get silly warnings like these at
> > > hand.
> > > 
> > > Specifically, we removed the rcu_read_lock() here because this has
> > > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > > be disabled.
> > > 
> > > There is no possible way for RCU to make progress.
> > 
> > Then let's have that particular rcu_dereference_check() explicitly state
> > what it needs, which seems to be either rcu_read_lock() on the one hand.
> > Right now, that could be just this:
> > 
> > 	p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
> > 
> > Or am I missing something here?
> 
> That will work; I just don't agree with it. Per the rules of RCU it is
> entirely correct to mix rcu_read_lock() and preempt_disable() (or
> anything that implies the same). So I strongly feel that
> rcu_dereference() should not warn about obviously correct code. Why
> would we need to special case this ?

This use case might well be entirely correct, but it is most certainly
not the common case.

Therefore, my answer to this requested chance in rcu_dereference()
semantics is "no".

							Thanx, Paul

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-03-12 17:45 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-20  5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
2022-02-20  5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou
     [not found]   ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
2022-03-08 23:20     ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
2022-03-08 23:32       ` Peter Zijlstra
2022-03-08 23:44         ` Paul E. McKenney
2022-03-09  0:21           ` Paul E. McKenney
2022-03-10  8:45           ` Peter Zijlstra
2022-03-10 15:01             ` Paul E. McKenney
2022-03-12 12:15               ` Peter Zijlstra
2022-03-12 17:44                 ` Paul E. McKenney
2022-03-09  3:08       ` [External] " Chengming Zhou
2022-02-20  5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
2022-03-01 15:24   ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou
     [not found]   ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
2022-03-08 23:31     ` [PATCH v3 3/3] sched/cpuacct: remove " Marek Szyprowski
2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
2022-02-23  9:11   ` Peter Zijlstra
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).