* [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
@ 2022-02-20 5:14 Chengming Zhou
2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
` (3 more replies)
0 siblings, 4 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
Cc: linux-kernel, Chengming Zhou, Minye Zhu
The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.
But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.
Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
kernel/sched/cpuacct.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3d06c5e4220d..307800586ac8 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -334,12 +334,13 @@ static struct cftype files[] = {
*/
void cpuacct_charge(struct task_struct *tsk, u64 cputime)
{
+ unsigned int cpu = task_cpu(tsk);
struct cpuacct *ca;
rcu_read_lock();
for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
- __this_cpu_add(*ca->cpuusage, cputime);
+ *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
rcu_read_unlock();
}
--
2.20.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
@ 2022-02-20 5:14 ` Chengming Zhou
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou
[not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
` (2 subsequent siblings)
3 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
Cc: linux-kernel, Chengming Zhou
Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.
And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
include/linux/cgroup.h | 2 --
kernel/sched/cpuacct.c | 4 +---
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c151413fda..9a109c6ac0e0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
cpuacct_charge(task, delta_exec);
- rcu_read_lock();
cgrp = task_dfl_cgroup(task);
if (cgroup_parent(cgrp))
__cgroup_account_cputime(cgrp, delta_exec);
- rcu_read_unlock();
}
static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 307800586ac8..f79f88456d72 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
unsigned int cpu = task_cpu(tsk);
struct cpuacct *ca;
- rcu_read_lock();
+ lockdep_assert_rq_held(cpu_rq(cpu));
for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
- rcu_read_unlock();
}
/*
--
2.20.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock
2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
@ 2022-02-20 5:14 ` Chengming Zhou
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou
[not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
3 siblings, 2 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw)
To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes
Cc: linux-kernel, Chengming Zhou
The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
---
include/linux/cgroup.h | 2 --
kernel/sched/cpuacct.c | 2 --
2 files changed, 4 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9a109c6ac0e0..1e356c222756 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
cpuacct_account_field(task, index, delta_exec);
- rcu_read_lock();
cgrp = task_dfl_cgroup(task);
if (cgroup_parent(cgrp))
__cgroup_account_cputime_field(cgrp, index, delta_exec);
- rcu_read_unlock();
}
#else /* CONFIG_CGROUPS */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f79f88456d72..d269ede84e85 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
{
struct cpuacct *ca;
- rcu_read_lock();
for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
__this_cpu_add(ca->cpustat->cpustat[index], val);
- rcu_read_unlock();
}
struct cgroup_subsys cpuacct_cgrp_subsys = {
--
2.20.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
@ 2022-02-22 18:01 ` Tejun Heo
2022-02-23 9:11 ` Peter Zijlstra
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
3 siblings, 1 reply; 18+ messages in thread
From: Tejun Heo @ 2022-02-22 18:01 UTC (permalink / raw)
To: Chengming Zhou
Cc: mingo, peterz, vincent.guittot, bristot, zhaolei, lizefan.x,
hannes, linux-kernel, Minye Zhu
On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote:
> The cpuacct_account_field() is always called by the current task
> itself, so it's ok to use __this_cpu_add() to charge the tick time.
>
> But cpuacct_charge() maybe called by update_curr() in load_balance()
> on a random CPU, different from the CPU on which the task is running.
> So __this_cpu_add() will charge that cputime to a random incorrect CPU.
>
> Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
> Reported-by: Minye Zhu <zhuminye@bytedance.com>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
For all three patches,
Acked-by: Tejun Heo <tj@kernel.org>
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage
2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
@ 2022-02-23 9:11 ` Peter Zijlstra
0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2022-02-23 9:11 UTC (permalink / raw)
To: Tejun Heo
Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei,
lizefan.x, hannes, linux-kernel, Minye Zhu
On Tue, Feb 22, 2022 at 08:01:51AM -1000, Tejun Heo wrote:
> On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote:
> > The cpuacct_account_field() is always called by the current task
> > itself, so it's ok to use __this_cpu_add() to charge the tick time.
> >
> > But cpuacct_charge() maybe called by update_curr() in load_balance()
> > on a random CPU, different from the CPU on which the task is running.
> > So __this_cpu_add() will charge that cputime to a random incorrect CPU.
> >
> > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
> > Reported-by: Minye Zhu <zhuminye@bytedance.com>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> For all three patches,
>
> Acked-by: Tejun Heo <tj@kernel.org>
Thanks!
^ permalink raw reply [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Remove redundant RCU read lock
2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
@ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou
[not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
To: linux-tip-commits
Cc: Tejun Heo, Chengming Zhou, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 3eba0505d03a9c1eb30d40c2330c0880b22d1b3a
Gitweb: https://git.kernel.org/tip/3eba0505d03a9c1eb30d40c2330c0880b22d1b3a
Author: Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate: Sun, 20 Feb 2022 13:14:26 +08:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00
sched/cpuacct: Remove redundant RCU read lock
The cpuacct_account_field() and it's cgroup v2 wrapper
cgroup_account_cputime_field() is only called from cputime
in task_group_account_field(), which is already in RCU read-side
critical section. So remove these redundant RCU read lock.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com
---
include/linux/cgroup.h | 2 --
kernel/sched/cpuacct.c | 2 --
2 files changed, 4 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 9a109c6..1e356c2 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
cpuacct_account_field(task, index, delta_exec);
- rcu_read_lock();
cgrp = task_dfl_cgroup(task);
if (cgroup_parent(cgrp))
__cgroup_account_cputime_field(cgrp, index, delta_exec);
- rcu_read_unlock();
}
#else /* CONFIG_CGROUPS */
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index f79f884..d269ede 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
{
struct cpuacct *ca;
- rcu_read_lock();
for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
__this_cpu_add(ca->cpustat->cpustat[index], val);
- rcu_read_unlock();
}
struct cgroup_subsys cpuacct_cgrp_subsys = {
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Optimize away RCU read lock
2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
@ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou
[not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
To: linux-tip-commits
Cc: Peter Zijlstra (Intel), Chengming Zhou, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Gitweb: https://git.kernel.org/tip/dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Author: Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate: Sun, 20 Feb 2022 13:14:25 +08:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00
sched/cpuacct: Optimize away RCU read lock
Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.
And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com
---
include/linux/cgroup.h | 2 --
kernel/sched/cpuacct.c | 4 +---
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c1514..9a109c6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
cpuacct_charge(task, delta_exec);
- rcu_read_lock();
cgrp = task_dfl_cgroup(task);
if (cgroup_parent(cgrp))
__cgroup_account_cputime(cgrp, delta_exec);
- rcu_read_unlock();
}
static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3078005..f79f884 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
unsigned int cpu = task_cpu(tsk);
struct cpuacct *ca;
- rcu_read_lock();
+ lockdep_assert_rq_held(cpu_rq(cpu));
for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
- rcu_read_unlock();
}
/*
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Fix charge percpu cpuusage
2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
` (2 preceding siblings ...)
2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
@ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou
3 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw)
To: linux-tip-commits
Cc: Minye Zhu, Chengming Zhou, Peter Zijlstra (Intel),
Tejun Heo, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd
Gitweb: https://git.kernel.org/tip/248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd
Author: Chengming Zhou <zhouchengming@bytedance.com>
AuthorDate: Sun, 20 Feb 2022 13:14:24 +08:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 01 Mar 2022 16:18:37 +01:00
sched/cpuacct: Fix charge percpu cpuusage
The cpuacct_account_field() is always called by the current task
itself, so it's ok to use __this_cpu_add() to charge the tick time.
But cpuacct_charge() maybe called by update_curr() in load_balance()
on a random CPU, different from the CPU on which the task is running.
So __this_cpu_add() will charge that cputime to a random incorrect CPU.
Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code")
Reported-by: Minye Zhu <zhuminye@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com
---
kernel/sched/cpuacct.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3d06c5e..3078005 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -334,12 +334,13 @@ static struct cftype files[] = {
*/
void cpuacct_charge(struct task_struct *tsk, u64 cputime)
{
+ unsigned int cpu = task_cpu(tsk);
struct cpuacct *ca;
rcu_read_lock();
for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
- __this_cpu_add(*ca->cpuusage, cputime);
+ *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
rcu_read_unlock();
}
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
[not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
@ 2022-03-08 23:20 ` Marek Szyprowski
2022-03-08 23:32 ` Peter Zijlstra
2022-03-09 3:08 ` [External] " Chengming Zhou
0 siblings, 2 replies; 18+ messages in thread
From: Marek Szyprowski @ 2022-03-08 23:20 UTC (permalink / raw)
To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei,
tj, lizefan.x, hannes
Cc: linux-kernel
On 20.02.2022 06:14, Chengming Zhou wrote:
> Since cpuacct_charge() is called from the scheduler update_curr(),
> we must already have rq lock held, then the RCU read lock can
> be optimized away.
>
> And do the same thing in it's wrapper cgroup_account_cputime(),
> but we can't use lockdep_assert_rq_held() there, which defined
> in kernel/sched/sched.h.
>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
This patch landed recently in linux-next as commit dc6e0818bc9a
("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
found that it triggers a following warning in the early boot stage:
Calibrating delay loop (skipped), value calculated using timer
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround
=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by kthreadd/2:
#0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
#1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
raw_spin_rq_lock_nested+0x24/0x34
stack backtrace:
CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a
#11458
Hardware name: Samsung Exynos (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from update_curr+0x1bc/0x35c
update_curr from dequeue_task_fair+0xb0/0x8e8
dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
__do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
__set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
__set_cpus_allowed_ptr from kthreadd+0x44/0x16c
kthreadd from ret_from_fork+0x14/0x2c
Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
9fa0: 00000000 00000000 00000000
00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.
=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
1 lock held by migration/0/13:
#0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
raw_spin_rq_lock_nested+0x24/0x34
stack backtrace:
CPU: 0 PID: 13 Comm: migration/0 Not tainted
5.17.0-rc5-00050-gdc6e0818bc9a #11458
Hardware name: Samsung Exynos (Flattened Device Tree)
Stopper: 0x0 <- 0x0
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
put_prev_task_stop from __schedule+0x698/0x964
__schedule from schedule+0x54/0xe0
schedule from smpboot_thread_fn+0x218/0x288
smpboot_thread_fn from kthread+0xf0/0x134
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc1ccffb0 to 0xc1ccfff8)
ffa0: 00000000 00000000 00000000
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).
The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.
> ---
> include/linux/cgroup.h | 2 --
> kernel/sched/cpuacct.c | 4 +---
> 2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 75c151413fda..9a109c6ac0e0 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>
> cpuacct_charge(task, delta_exec);
>
> - rcu_read_lock();
> cgrp = task_dfl_cgroup(task);
> if (cgroup_parent(cgrp))
> __cgroup_account_cputime(cgrp, delta_exec);
> - rcu_read_unlock();
> }
>
> static inline void cgroup_account_cputime_field(struct task_struct *task,
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index 307800586ac8..f79f88456d72 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
> unsigned int cpu = task_cpu(tsk);
> struct cpuacct *ca;
>
> - rcu_read_lock();
> + lockdep_assert_rq_held(cpu_rq(cpu));
>
> for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
> *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
> -
> - rcu_read_unlock();
> }
>
> /*
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock
[not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
@ 2022-03-08 23:31 ` Marek Szyprowski
0 siblings, 0 replies; 18+ messages in thread
From: Marek Szyprowski @ 2022-03-08 23:31 UTC (permalink / raw)
To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei,
tj, lizefan.x, hannes
Cc: linux-kernel
On 20.02.2022 06:14, Chengming Zhou wrote:
> The cpuacct_account_field() and it's cgroup v2 wrapper
> cgroup_account_cputime_field() is only called from cputime
> in task_group_account_field(), which is already in RCU read-side
> critical section. So remove these redundant RCU read lock.
>
> Suggested-by: Tejun Heo <tj@kernel.org>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
This patch landed recently in linux-next as commit 3eba0505d03a
("sched/cpuacct: Remove redundant RCU read lock"). On my test systems I
found that it triggers a following warning in the early boot stage:
Calibrating delay loop (skipped), value calculated using timer
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.
=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00052-g167ee9b08bed #11459 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/0/1.
stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5-00052-g167ee9b08bed
#11459
Hardware name: Samsung Exynos (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from account_system_index_time+0x13c/0x148
account_system_index_time from account_system_time+0x78/0xa0
account_system_time from update_process_times+0x50/0xc4
update_process_times from tick_handle_periodic+0x1c/0x94
tick_handle_periodic from exynos4_mct_tick_isr+0x30/0x38
exynos4_mct_tick_isr from __handle_irq_event_percpu+0x74/0x3ac
__handle_irq_event_percpu from handle_irq_event_percpu+0xc/0x38
handle_irq_event_percpu from handle_irq_event+0x38/0x5c
handle_irq_event from handle_fasteoi_irq+0xc4/0x180
handle_fasteoi_irq from generic_handle_domain_irq+0x44/0x88
generic_handle_domain_irq from gic_handle_irq+0x88/0xac
gic_handle_irq from generic_handle_arch_irq+0x58/0x78
generic_handle_arch_irq from __irq_svc+0x54/0x88
Exception stack(0xc1cafea0 to 0xc1cafee8)
fea0: 00000000 c0f33ac4 00000001 00000129 60000053 c127b458 00000008
2e64e000
fec0: ef7bd5fc c127af84 c1209048 c1208f10 00000000 c1cafef0 c0b7e25c
c0b8976c
fee0: 20000053 ffffffff
__irq_svc from _raw_spin_unlock_irqrestore+0x2c/0x60
_raw_spin_unlock_irqrestore from
cblist_init_generic.constprop.14+0x298/0x328
cblist_init_generic.constprop.14 from rcu_init_tasks_generic+0x18/0x110
rcu_init_tasks_generic from kernel_init_freeable+0xa0/0x214
kernel_init_freeable from kernel_init+0x18/0x12c
kernel_init from ret_from_fork+0x14/0x2c
Exception stack(0xc1caffb0 to 0xc1cafff8)
ffa0: 00000000 00000000 00000000
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).
CPU: All CPU(s) started in SVC mode.
The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.
To get above log I reverted commit dc6e0818bc9a ("sched/cpuacct:
Optimize away RCU read lock"), because it triggered a similar warning.
> ---
> include/linux/cgroup.h | 2 --
> kernel/sched/cpuacct.c | 2 --
> 2 files changed, 4 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 9a109c6ac0e0..1e356c222756 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task,
>
> cpuacct_account_field(task, index, delta_exec);
>
> - rcu_read_lock();
> cgrp = task_dfl_cgroup(task);
> if (cgroup_parent(cgrp))
> __cgroup_account_cputime_field(cgrp, index, delta_exec);
> - rcu_read_unlock();
> }
>
> #else /* CONFIG_CGROUPS */
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index f79f88456d72..d269ede84e85 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val)
> {
> struct cpuacct *ca;
>
> - rcu_read_lock();
> for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca))
> __this_cpu_add(ca->cpustat->cpustat[index], val);
> - rcu_read_unlock();
> }
>
> struct cgroup_subsys cpuacct_cgrp_subsys = {
Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
@ 2022-03-08 23:32 ` Peter Zijlstra
2022-03-08 23:44 ` Paul E. McKenney
2022-03-09 3:08 ` [External] " Chengming Zhou
1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-08 23:32 UTC (permalink / raw)
To: Marek Szyprowski
Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj,
lizefan.x, hannes, linux-kernel, Paul McKenney
On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
> > Since cpuacct_charge() is called from the scheduler update_curr(),
> > we must already have rq lock held, then the RCU read lock can
> > be optimized away.
> >
> > And do the same thing in it's wrapper cgroup_account_cputime(),
> > but we can't use lockdep_assert_rq_held() there, which defined
> > in kernel/sched/sched.h.
> >
> > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> This patch landed recently in linux-next as commit dc6e0818bc9a
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> found that it triggers a following warning in the early boot stage:
>
> Calibrating delay loop (skipped), value calculated using timer
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
>
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
Arguably, with the flavours folded again, rcu_dereference_check() ought
to default include rcu_read_lock_sched_held() or its equivalent I
suppose.
Paul?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-08 23:32 ` Peter Zijlstra
@ 2022-03-08 23:44 ` Paul E. McKenney
2022-03-09 0:21 ` Paul E. McKenney
2022-03-10 8:45 ` Peter Zijlstra
0 siblings, 2 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-08 23:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel
On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > we must already have rq lock held, then the RCU read lock can
> > > be optimized away.
> > >
> > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > but we can't use lockdep_assert_rq_held() there, which defined
> > > in kernel/sched/sched.h.
> > >
> > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> >
> > This patch landed recently in linux-next as commit dc6e0818bc9a
> > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > found that it triggers a following warning in the early boot stage:
> >
> > Calibrating delay loop (skipped), value calculated using timer
> > frequency.. 48.00 BogoMIPS (lpj=240000)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > CPU: Testing write buffer coherency: ok
> > CPU0: Spectre v2: using BPIALL workaround
> >
> > =============================
> > WARNING: suspicious RCU usage
> > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > -----------------------------
> > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>
> Arguably, with the flavours folded again, rcu_dereference_check() ought
> to default include rcu_read_lock_sched_held() or its equivalent I
> suppose.
>
> Paul?
That would reduce the number of warnings, but it also would hide bugs.
So, are you sure you really want this?
Thanx, Paul
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-08 23:44 ` Paul E. McKenney
@ 2022-03-09 0:21 ` Paul E. McKenney
2022-03-10 8:45 ` Peter Zijlstra
1 sibling, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-09 0:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel
On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > >
> > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > found that it triggers a following warning in the early boot stage:
> > >
> > > Calibrating delay loop (skipped), value calculated using timer
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> >
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> >
> > Paul?
>
> That would reduce the number of warnings, but it also would hide bugs.
>
> So, are you sure you really want this?
Of course, if you are designing a use case that really expects multiple
types of readers...
Another approach might be rcu_dereference_brs(), but hopefully with a
better name, that checks for either rcu_read_lock(), disabled BH, or
disabled preemption. Or if you are looking only for rcu_read_lock()
or disabled preemption, rcu_dereference_rs(), again hopefully with a
better name.
Is that what you had in mind?
Thanx, Paul
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [External] Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
2022-03-08 23:32 ` Peter Zijlstra
@ 2022-03-09 3:08 ` Chengming Zhou
1 sibling, 0 replies; 18+ messages in thread
From: Chengming Zhou @ 2022-03-09 3:08 UTC (permalink / raw)
To: Marek Szyprowski, mingo, peterz, vincent.guittot, bristot,
zhaolei, tj, lizefan.x, hannes
Cc: linux-kernel
On 2022/3/9 7:20 上午, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
>> Since cpuacct_charge() is called from the scheduler update_curr(),
>> we must already have rq lock held, then the RCU read lock can
>> be optimized away.
>>
>> And do the same thing in it's wrapper cgroup_account_cputime(),
>> but we can't use lockdep_assert_rq_held() there, which defined
>> in kernel/sched/sched.h.
>>
>> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
>
> This patch landed recently in linux-next as commit dc6e0818bc9a
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> found that it triggers a following warning in the early boot stage:
Hi, thanks for the report. I've send a fix patch[1] for review.
[1] https://lore.kernel.org/lkml/20220305034103.57123-1-zhouchengming@bytedance.com/
>
> Calibrating delay loop (skipped), value calculated using timer
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
>
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 1
> 2 locks held by kthreadd/2:
> #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
> #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
> raw_spin_rq_lock_nested+0x24/0x34
>
> stack backtrace:
> CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a
> #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x58/0x70
> dump_stack_lvl from update_curr+0x1bc/0x35c
> update_curr from dequeue_task_fair+0xb0/0x8e8
> dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
> __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
> __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
> __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
> kthreadd from ret_from_fork+0x14/0x2c
> Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
> 9fa0: 00000000 00000000 00000000
> 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
> cblist_init_generic: Setting adjustable number of callback queues.
> cblist_init_generic: Setting shift to 1 and lim to 1.
> Running RCU-tasks wait API self tests
> Setting up static identity map for 0x40100000 - 0x40100060
> rcu: Hierarchical SRCU implementation.
>
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by migration/0/13:
> #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
> raw_spin_rq_lock_nested+0x24/0x34
>
> stack backtrace:
> CPU: 0 PID: 13 Comm: migration/0 Not tainted
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Stopper: 0x0 <- 0x0
> unwind_backtrace from show_stack+0x10/0x14
> show_stack from dump_stack_lvl+0x58/0x70
> dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
> put_prev_task_stop from __schedule+0x698/0x964
> __schedule from schedule+0x54/0xe0
> schedule from smpboot_thread_fn+0x218/0x288
> smpboot_thread_fn from kthread+0xf0/0x134
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xc1ccffb0 to 0xc1ccfff8)
> ffa0: 00000000 00000000 00000000
> 00000000
> ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
> smp: Bringing up secondary CPUs ...
> CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
> CPU1: Spectre v2: using BPIALL workaround
> smp: Brought up 1 node, 2 CPUs
> SMP: Total of 2 processors activated (96.00 BogoMIPS).
>
> The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.
>
>> ---
>> include/linux/cgroup.h | 2 --
>> kernel/sched/cpuacct.c | 4 +---
>> 2 files changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 75c151413fda..9a109c6ac0e0 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>>
>> cpuacct_charge(task, delta_exec);
>>
>> - rcu_read_lock();
>> cgrp = task_dfl_cgroup(task);
>> if (cgroup_parent(cgrp))
>> __cgroup_account_cputime(cgrp, delta_exec);
>> - rcu_read_unlock();
>> }
>>
>> static inline void cgroup_account_cputime_field(struct task_struct *task,
>> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
>> index 307800586ac8..f79f88456d72 100644
>> --- a/kernel/sched/cpuacct.c
>> +++ b/kernel/sched/cpuacct.c
>> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>> unsigned int cpu = task_cpu(tsk);
>> struct cpuacct *ca;
>>
>> - rcu_read_lock();
>> + lockdep_assert_rq_held(cpu_rq(cpu));
>>
>> for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
>> *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
>> -
>> - rcu_read_unlock();
>> }
>>
>> /*
>
> Best regards
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-08 23:44 ` Paul E. McKenney
2022-03-09 0:21 ` Paul E. McKenney
@ 2022-03-10 8:45 ` Peter Zijlstra
2022-03-10 15:01 ` Paul E. McKenney
1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-10 8:45 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel
On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > >
> > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > found that it triggers a following warning in the early boot stage:
> > >
> > > Calibrating delay loop (skipped), value calculated using timer
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> >
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> >
> > Paul?
>
> That would reduce the number of warnings, but it also would hide bugs.
>
> So, are you sure you really want this?
I don't understand... Since the flavours got merged regular RCU has it's
quescent state held off by preempt_disable. So how can relying on that
cause bugs?
And if we can rely on that, then surely rcu_dereferenced_check() ought
to play by the same rules, otherwise we get silly warnings like these at
hand.
Specifically, we removed the rcu_read_lock() here because this has
rq->lock held, which is a raw_spinlock_t which very much implies preempt
disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
be disabled.
There is no possible way for RCU to make progress.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-10 8:45 ` Peter Zijlstra
@ 2022-03-10 15:01 ` Paul E. McKenney
2022-03-12 12:15 ` Peter Zijlstra
0 siblings, 1 reply; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-10 15:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel
On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > > we must already have rq lock held, then the RCU read lock can
> > > > > be optimized away.
> > > > >
> > > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > > in kernel/sched/sched.h.
> > > > >
> > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
> > > >
> > > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > > found that it triggers a following warning in the early boot stage:
> > > >
> > > > Calibrating delay loop (skipped), value calculated using timer
> > > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > > pid_max: default: 32768 minimum: 301
> > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > CPU: Testing write buffer coherency: ok
> > > > CPU0: Spectre v2: using BPIALL workaround
> > > >
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > > -----------------------------
> > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > >
> > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > suppose.
> > >
> > > Paul?
> >
> > That would reduce the number of warnings, but it also would hide bugs.
> >
> > So, are you sure you really want this?
>
> I don't understand... Since the flavours got merged regular RCU has it's
> quescent state held off by preempt_disable. So how can relying on that
> cause bugs?
Somene forgets an rcu_read_lock() and there happens to be something
like a preempt_disable() that by coincidence covers that particular
rcu_dereference(). The kernel therefore doesn't complain. That someone
goes on to other things, maybe even posthumously. Then some time later
the preempt_disable() goes away, for good and sufficient reasons.
Good luck figuring out where to put the needed rcu_read_lock() and
rcu_read_unlock().
> And if we can rely on that, then surely rcu_dereferenced_check() ought
> to play by the same rules, otherwise we get silly warnings like these at
> hand.
>
> Specifically, we removed the rcu_read_lock() here because this has
> rq->lock held, which is a raw_spinlock_t which very much implies preempt
> disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> be disabled.
>
> There is no possible way for RCU to make progress.
Then let's have that particular rcu_dereference_check() explicitly state
what it needs, which seems to be either rcu_read_lock() on the one hand.
Right now, that could be just this:
p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
Or am I missing something here?
Thanx, Paul
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-10 15:01 ` Paul E. McKenney
@ 2022-03-12 12:15 ` Peter Zijlstra
2022-03-12 17:44 ` Paul E. McKenney
0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2022-03-12 12:15 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel
On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:
> > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > >
> > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > suppose.
> > > >
> > > > Paul?
> > >
> > > That would reduce the number of warnings, but it also would hide bugs.
> > >
> > > So, are you sure you really want this?
> >
> > I don't understand... Since the flavours got merged regular RCU has it's
> > quescent state held off by preempt_disable. So how can relying on that
> > cause bugs?
>
> Somene forgets an rcu_read_lock() and there happens to be something
> like a preempt_disable() that by coincidence covers that particular
> rcu_dereference(). The kernel therefore doesn't complain. That someone
> goes on to other things, maybe even posthumously. Then some time later
> the preempt_disable() goes away, for good and sufficient reasons.
>
> Good luck figuring out where to put the needed rcu_read_lock() and
> rcu_read_unlock().
Well, that's software engineering for you. Also in that case the warning
will work as expected. Then figuring out how to fix it is not the
problem of the warning -- that worked as advertised.
(also, I don't think it'll be too hard, you just gotta figure out which
object is rcu protected -- the warning gives you this, where the lookup
happens -- again the warning helps, and how long it's used for, all
relatively well definted things)
I don't see a problem. No bugs hidden.
> > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > to play by the same rules, otherwise we get silly warnings like these at
> > hand.
> >
> > Specifically, we removed the rcu_read_lock() here because this has
> > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > be disabled.
> >
> > There is no possible way for RCU to make progress.
>
> Then let's have that particular rcu_dereference_check() explicitly state
> what it needs, which seems to be either rcu_read_lock() on the one hand.
> Right now, that could be just this:
>
> p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
>
> Or am I missing something here?
That will work; I just don't agree with it. Per the rules of RCU it is
entirely correct to mix rcu_read_lock() and preempt_disable() (or
anything that implies the same). So I strongly feel that
rcu_dereference() should not warn about obviously correct code. Why
would we need to special case this ?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock
2022-03-12 12:15 ` Peter Zijlstra
@ 2022-03-12 17:44 ` Paul E. McKenney
0 siblings, 0 replies; 18+ messages in thread
From: Paul E. McKenney @ 2022-03-12 17:44 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot,
bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel
On Sat, Mar 12, 2022 at 01:15:33PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:
>
> > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > > >
> > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > > suppose.
> > > > >
> > > > > Paul?
> > > >
> > > > That would reduce the number of warnings, but it also would hide bugs.
> > > >
> > > > So, are you sure you really want this?
> > >
> > > I don't understand... Since the flavours got merged regular RCU has it's
> > > quescent state held off by preempt_disable. So how can relying on that
> > > cause bugs?
> >
> > Somene forgets an rcu_read_lock() and there happens to be something
> > like a preempt_disable() that by coincidence covers that particular
> > rcu_dereference(). The kernel therefore doesn't complain. That someone
> > goes on to other things, maybe even posthumously. Then some time later
> > the preempt_disable() goes away, for good and sufficient reasons.
> >
> > Good luck figuring out where to put the needed rcu_read_lock() and
> > rcu_read_unlock().
>
> Well, that's software engineering for you.
My point exactly!!!
> Also in that case the warning
> will work as expected. Then figuring out how to fix it is not the
> problem of the warning -- that worked as advertised.
>
> (also, I don't think it'll be too hard, you just gotta figure out which
> object is rcu protected -- the warning gives you this, where the lookup
> happens -- again the warning helps, and how long it's used for, all
> relatively well definted things)
Without in any way agreeing with that assessment of difficulty, especially
in the general case... It is -way- easier just to tell RCU what your
design rules are for the code in question.
> I don't see a problem. No bugs hidden.
C'mon, Peter!
There really was a bug hidden. That someone intended to add some
calls to rcu_read_lock() and rcu_read_unlock() in the proper places.
Their failure to add them really was a bug.
That bug was hidden by: (1) There being a preempt_disable() or
whatever that by coincidence happened to be covering the part of the
code containing the rcu_dereference() and (2) Your proposed change that
would make rcu_dereference() unable to detect that bug.
And that bug can be quite bad. Given your proposed change, RCU
cannot detect this bug:
/* Preemption is enabled. */
/* There should be an rcu_read_lock() here. */
preempt_disable();
p = rcu_dereference(gp);
do_something_with(p);
preempt_enable();
/* Without the rcu_read_lock(), *p is history. */
do_something_else_with(p);
/* There should be an rcu_read_unlock() here. */
> > > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > > to play by the same rules, otherwise we get silly warnings like these at
> > > hand.
> > >
> > > Specifically, we removed the rcu_read_lock() here because this has
> > > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > > be disabled.
> > >
> > > There is no possible way for RCU to make progress.
> >
> > Then let's have that particular rcu_dereference_check() explicitly state
> > what it needs, which seems to be either rcu_read_lock() on the one hand.
> > Right now, that could be just this:
> >
> > p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
> >
> > Or am I missing something here?
>
> That will work; I just don't agree with it. Per the rules of RCU it is
> entirely correct to mix rcu_read_lock() and preempt_disable() (or
> anything that implies the same). So I strongly feel that
> rcu_dereference() should not warn about obviously correct code. Why
> would we need to special case this ?
This use case might well be entirely correct, but it is most certainly
not the common case.
Therefore, my answer to this requested chance in rcu_dereference()
semantics is "no".
Thanx, Paul
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-03-12 17:45 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou
2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou
[not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>
2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski
2022-03-08 23:32 ` Peter Zijlstra
2022-03-08 23:44 ` Paul E. McKenney
2022-03-09 0:21 ` Paul E. McKenney
2022-03-10 8:45 ` Peter Zijlstra
2022-03-10 15:01 ` Paul E. McKenney
2022-03-12 12:15 ` Peter Zijlstra
2022-03-12 17:44 ` Paul E. McKenney
2022-03-09 3:08 ` [External] " Chengming Zhou
2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou
[not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>
2022-03-08 23:31 ` [PATCH v3 3/3] sched/cpuacct: remove " Marek Szyprowski
2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo
2022-02-23 9:11 ` Peter Zijlstra
2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).