* [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage @ 2022-02-20 5:14 Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw) To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel, Chengming Zhou, Minye Zhu The cpuacct_account_field() is always called by the current task itself, so it's ok to use __this_cpu_add() to charge the tick time. But cpuacct_charge() maybe called by update_curr() in load_balance() on a random CPU, different from the CPU on which the task is running. So __this_cpu_add() will charge that cputime to a random incorrect CPU. Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") Reported-by: Minye Zhu <zhuminye@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- kernel/sched/cpuacct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 3d06c5e4220d..307800586ac8 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -334,12 +334,13 @@ static struct cftype files[] = { */ void cpuacct_charge(struct task_struct *tsk, u64 cputime) { + unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(*ca->cpuusage, cputime); + *per_cpu_ptr(ca->cpuusage, cpu) += cputime; rcu_read_unlock(); } -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou @ 2022-02-20 5:14 ` Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou ` (2 subsequent siblings) 3 siblings, 2 replies; 18+ messages in thread From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw) To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel, Chengming Zhou Since cpuacct_charge() is called from the scheduler update_curr(), we must already have rq lock held, then the RCU read lock can be optimized away. And do the same thing in it's wrapper cgroup_account_cputime(), but we can't use lockdep_assert_rq_held() there, which defined in kernel/sched/sched.h. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 4 +--- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 75c151413fda..9a109c6ac0e0 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, cpuacct_charge(task, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime(cgrp, delta_exec); - rcu_read_unlock(); } static inline void cgroup_account_cputime_field(struct task_struct *task, diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 307800586ac8..f79f88456d72 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; - rcu_read_lock(); + lockdep_assert_rq_held(cpu_rq(cpu)); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) *per_cpu_ptr(ca->cpuusage, cpu) += cputime; - - rcu_read_unlock(); } /* -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Optimize away RCU read lock 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou @ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> 1 sibling, 0 replies; 18+ messages in thread From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw) To: linux-tip-commits Cc: Peter Zijlstra (Intel), Chengming Zhou, x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: dc6e0818bc9a0336d9accf3ea35d146d72aa7a18 Gitweb: https://git.kernel.org/tip/dc6e0818bc9a0336d9accf3ea35d146d72aa7a18 Author: Chengming Zhou <zhouchengming@bytedance.com> AuthorDate: Sun, 20 Feb 2022 13:14:25 +08:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00 sched/cpuacct: Optimize away RCU read lock Since cpuacct_charge() is called from the scheduler update_curr(), we must already have rq lock held, then the RCU read lock can be optimized away. And do the same thing in it's wrapper cgroup_account_cputime(), but we can't use lockdep_assert_rq_held() there, which defined in kernel/sched/sched.h. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 4 +--- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 75c1514..9a109c6 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, cpuacct_charge(task, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime(cgrp, delta_exec); - rcu_read_unlock(); } static inline void cgroup_account_cputime_field(struct task_struct *task, diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 3078005..f79f884 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; - rcu_read_lock(); + lockdep_assert_rq_held(cpu_rq(cpu)); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) *per_cpu_ptr(ca->cpuusage, cpu) += cputime; - - rcu_read_unlock(); } /* ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>]
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> @ 2022-03-08 23:20 ` Marek Szyprowski 2022-03-08 23:32 ` Peter Zijlstra 2022-03-09 3:08 ` [External] " Chengming Zhou 0 siblings, 2 replies; 18+ messages in thread From: Marek Szyprowski @ 2022-03-08 23:20 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel On 20.02.2022 06:14, Chengming Zhou wrote: > Since cpuacct_charge() is called from the scheduler update_curr(), > we must already have rq lock held, then the RCU read lock can > be optimized away. > > And do the same thing in it's wrapper cgroup_account_cputime(), > but we can't use lockdep_assert_rq_held() there, which defined > in kernel/sched/sched.h. > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> This patch landed recently in linux-next as commit dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I found that it triggers a following warning in the early boot stage: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) CPU: Testing write buffer coherency: ok CPU0: Spectre v2: using BPIALL workaround ============================= WARNING: suspicious RCU usage 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted ----------------------------- ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by kthreadd/2: #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118 #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: raw_spin_rq_lock_nested+0x24/0x34 stack backtrace: CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Hardware name: Samsung Exynos (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from update_curr+0x1bc/0x35c update_curr from dequeue_task_fair+0xb0/0x8e8 dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258 __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8 __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64 __set_cpus_allowed_ptr from kthreadd+0x44/0x16c kthreadd from ret_from_fork+0x14/0x2c Exception stack(0xc1cb9fb0 to 0xc1cb9ff8) 9fa0: 00000000 00000000 00000000 00000000 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 CPU0: thread -1, cpu 0, socket 9, mpidr 80000900 cblist_init_generic: Setting adjustable number of callback queues. cblist_init_generic: Setting shift to 1 and lim to 1. Running RCU-tasks wait API self tests Setting up static identity map for 0x40100000 - 0x40100060 rcu: Hierarchical SRCU implementation. ============================= WARNING: suspicious RCU usage 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted ----------------------------- ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by migration/0/13: #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: raw_spin_rq_lock_nested+0x24/0x34 stack backtrace: CPU: 0 PID: 13 Comm: migration/0 Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Hardware name: Samsung Exynos (Flattened Device Tree) Stopper: 0x0 <- 0x0 unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from put_prev_task_stop+0x16c/0x25c put_prev_task_stop from __schedule+0x698/0x964 __schedule from schedule+0x54/0xe0 schedule from smpboot_thread_fn+0x218/0x288 smpboot_thread_fn from kthread+0xf0/0x134 kthread from ret_from_fork+0x14/0x2c Exception stack(0xc1ccffb0 to 0xc1ccfff8) ffa0: 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 smp: Bringing up secondary CPUs ... CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 CPU1: Spectre v2: using BPIALL workaround smp: Brought up 1 node, 2 CPUs SMP: Total of 2 processors activated (96.00 BogoMIPS). The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. > --- > include/linux/cgroup.h | 2 -- > kernel/sched/cpuacct.c | 4 +--- > 2 files changed, 1 insertion(+), 5 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 75c151413fda..9a109c6ac0e0 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, > > cpuacct_charge(task, delta_exec); > > - rcu_read_lock(); > cgrp = task_dfl_cgroup(task); > if (cgroup_parent(cgrp)) > __cgroup_account_cputime(cgrp, delta_exec); > - rcu_read_unlock(); > } > > static inline void cgroup_account_cputime_field(struct task_struct *task, > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c > index 307800586ac8..f79f88456d72 100644 > --- a/kernel/sched/cpuacct.c > +++ b/kernel/sched/cpuacct.c > @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) > unsigned int cpu = task_cpu(tsk); > struct cpuacct *ca; > > - rcu_read_lock(); > + lockdep_assert_rq_held(cpu_rq(cpu)); > > for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) > *per_cpu_ptr(ca->cpuusage, cpu) += cputime; > - > - rcu_read_unlock(); > } > > /* Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski @ 2022-03-08 23:32 ` Peter Zijlstra 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 3:08 ` [External] " Chengming Zhou 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-03-08 23:32 UTC (permalink / raw) To: Marek Szyprowski Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel, Paul McKenney On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > On 20.02.2022 06:14, Chengming Zhou wrote: > > Since cpuacct_charge() is called from the scheduler update_curr(), > > we must already have rq lock held, then the RCU read lock can > > be optimized away. > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > but we can't use lockdep_assert_rq_held() there, which defined > > in kernel/sched/sched.h. > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > This patch landed recently in linux-next as commit dc6e0818bc9a > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > found that it triggers a following warning in the early boot stage: > > Calibrating delay loop (skipped), value calculated using timer > frequency.. 48.00 BogoMIPS (lpj=240000) > pid_max: default: 32768 minimum: 301 > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > CPU: Testing write buffer coherency: ok > CPU0: Spectre v2: using BPIALL workaround > > ============================= > WARNING: suspicious RCU usage > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > ----------------------------- > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! Arguably, with the flavours folded again, rcu_dereference_check() ought to default include rcu_read_lock_sched_held() or its equivalent I suppose. Paul? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:32 ` Peter Zijlstra @ 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 0:21 ` Paul E. McKenney 2022-03-10 8:45 ` Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: Paul E. McKenney @ 2022-03-08 23:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > we must already have rq lock held, then the RCU read lock can > > > be optimized away. > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > but we can't use lockdep_assert_rq_held() there, which defined > > > in kernel/sched/sched.h. > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > found that it triggers a following warning in the early boot stage: > > > > Calibrating delay loop (skipped), value calculated using timer > > frequency.. 48.00 BogoMIPS (lpj=240000) > > pid_max: default: 32768 minimum: 301 > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > CPU: Testing write buffer coherency: ok > > CPU0: Spectre v2: using BPIALL workaround > > > > ============================= > > WARNING: suspicious RCU usage > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > ----------------------------- > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > Arguably, with the flavours folded again, rcu_dereference_check() ought > to default include rcu_read_lock_sched_held() or its equivalent I > suppose. > > Paul? That would reduce the number of warnings, but it also would hide bugs. So, are you sure you really want this? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:44 ` Paul E. McKenney @ 2022-03-09 0:21 ` Paul E. McKenney 2022-03-10 8:45 ` Peter Zijlstra 1 sibling, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2022-03-09 0:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote: > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > > we must already have rq lock held, then the RCU read lock can > > > > be optimized away. > > > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > > but we can't use lockdep_assert_rq_held() there, which defined > > > > in kernel/sched/sched.h. > > > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > > found that it triggers a following warning in the early boot stage: > > > > > > Calibrating delay loop (skipped), value calculated using timer > > > frequency.. 48.00 BogoMIPS (lpj=240000) > > > pid_max: default: 32768 minimum: 301 > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > CPU: Testing write buffer coherency: ok > > > CPU0: Spectre v2: using BPIALL workaround > > > > > > ============================= > > > WARNING: suspicious RCU usage > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > > ----------------------------- > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > to default include rcu_read_lock_sched_held() or its equivalent I > > suppose. > > > > Paul? > > That would reduce the number of warnings, but it also would hide bugs. > > So, are you sure you really want this? Of course, if you are designing a use case that really expects multiple types of readers... Another approach might be rcu_dereference_brs(), but hopefully with a better name, that checks for either rcu_read_lock(), disabled BH, or disabled preemption. Or if you are looking only for rcu_read_lock() or disabled preemption, rcu_dereference_rs(), again hopefully with a better name. Is that what you had in mind? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 0:21 ` Paul E. McKenney @ 2022-03-10 8:45 ` Peter Zijlstra 2022-03-10 15:01 ` Paul E. McKenney 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-03-10 8:45 UTC (permalink / raw) To: Paul E. McKenney Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote: > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > > we must already have rq lock held, then the RCU read lock can > > > > be optimized away. > > > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > > but we can't use lockdep_assert_rq_held() there, which defined > > > > in kernel/sched/sched.h. > > > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > > found that it triggers a following warning in the early boot stage: > > > > > > Calibrating delay loop (skipped), value calculated using timer > > > frequency.. 48.00 BogoMIPS (lpj=240000) > > > pid_max: default: 32768 minimum: 301 > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > CPU: Testing write buffer coherency: ok > > > CPU0: Spectre v2: using BPIALL workaround > > > > > > ============================= > > > WARNING: suspicious RCU usage > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > > ----------------------------- > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > to default include rcu_read_lock_sched_held() or its equivalent I > > suppose. > > > > Paul? > > That would reduce the number of warnings, but it also would hide bugs. > > So, are you sure you really want this? I don't understand... Since the flavours got merged regular RCU has it's quescent state held off by preempt_disable. So how can relying on that cause bugs? And if we can rely on that, then surely rcu_dereferenced_check() ought to play by the same rules, otherwise we get silly warnings like these at hand. Specifically, we removed the rcu_read_lock() here because this has rq->lock held, which is a raw_spinlock_t which very much implies preempt disable, on top of that, it's also an IRQ-safe lock and thus IRQs will be disabled. There is no possible way for RCU to make progress. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-10 8:45 ` Peter Zijlstra @ 2022-03-10 15:01 ` Paul E. McKenney 2022-03-12 12:15 ` Peter Zijlstra 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2022-03-10 15:01 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote: > On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > > > we must already have rq lock held, then the RCU read lock can > > > > > be optimized away. > > > > > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > > > but we can't use lockdep_assert_rq_held() there, which defined > > > > > in kernel/sched/sched.h. > > > > > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > > > found that it triggers a following warning in the early boot stage: > > > > > > > > Calibrating delay loop (skipped), value calculated using timer > > > > frequency.. 48.00 BogoMIPS (lpj=240000) > > > > pid_max: default: 32768 minimum: 301 > > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > > CPU: Testing write buffer coherency: ok > > > > CPU0: Spectre v2: using BPIALL workaround > > > > > > > > ============================= > > > > WARNING: suspicious RCU usage > > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > > > ----------------------------- > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > > to default include rcu_read_lock_sched_held() or its equivalent I > > > suppose. > > > > > > Paul? > > > > That would reduce the number of warnings, but it also would hide bugs. > > > > So, are you sure you really want this? > > I don't understand... Since the flavours got merged regular RCU has it's > quescent state held off by preempt_disable. So how can relying on that > cause bugs? Somene forgets an rcu_read_lock() and there happens to be something like a preempt_disable() that by coincidence covers that particular rcu_dereference(). The kernel therefore doesn't complain. That someone goes on to other things, maybe even posthumously. Then some time later the preempt_disable() goes away, for good and sufficient reasons. Good luck figuring out where to put the needed rcu_read_lock() and rcu_read_unlock(). > And if we can rely on that, then surely rcu_dereferenced_check() ought > to play by the same rules, otherwise we get silly warnings like these at > hand. > > Specifically, we removed the rcu_read_lock() here because this has > rq->lock held, which is a raw_spinlock_t which very much implies preempt > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will > be disabled. > > There is no possible way for RCU to make progress. Then let's have that particular rcu_dereference_check() explicitly state what it needs, which seems to be either rcu_read_lock() on the one hand. Right now, that could be just this: p = rcu_dereference_check(gp, rcu_read_lock_sched_held()); Or am I missing something here? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-10 15:01 ` Paul E. McKenney @ 2022-03-12 12:15 ` Peter Zijlstra 2022-03-12 17:44 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-03-12 12:15 UTC (permalink / raw) To: Paul E. McKenney Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote: > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > > > to default include rcu_read_lock_sched_held() or its equivalent I > > > > suppose. > > > > > > > > Paul? > > > > > > That would reduce the number of warnings, but it also would hide bugs. > > > > > > So, are you sure you really want this? > > > > I don't understand... Since the flavours got merged regular RCU has it's > > quescent state held off by preempt_disable. So how can relying on that > > cause bugs? > > Somene forgets an rcu_read_lock() and there happens to be something > like a preempt_disable() that by coincidence covers that particular > rcu_dereference(). The kernel therefore doesn't complain. That someone > goes on to other things, maybe even posthumously. Then some time later > the preempt_disable() goes away, for good and sufficient reasons. > > Good luck figuring out where to put the needed rcu_read_lock() and > rcu_read_unlock(). Well, that's software engineering for you. Also in that case the warning will work as expected. Then figuring out how to fix it is not the problem of the warning -- that worked as advertised. (also, I don't think it'll be too hard, you just gotta figure out which object is rcu protected -- the warning gives you this, where the lookup happens -- again the warning helps, and how long it's used for, all relatively well definted things) I don't see a problem. No bugs hidden. > > And if we can rely on that, then surely rcu_dereferenced_check() ought > > to play by the same rules, otherwise we get silly warnings like these at > > hand. > > > > Specifically, we removed the rcu_read_lock() here because this has > > rq->lock held, which is a raw_spinlock_t which very much implies preempt > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will > > be disabled. > > > > There is no possible way for RCU to make progress. > > Then let's have that particular rcu_dereference_check() explicitly state > what it needs, which seems to be either rcu_read_lock() on the one hand. > Right now, that could be just this: > > p = rcu_dereference_check(gp, rcu_read_lock_sched_held()); > > Or am I missing something here? That will work; I just don't agree with it. Per the rules of RCU it is entirely correct to mix rcu_read_lock() and preempt_disable() (or anything that implies the same). So I strongly feel that rcu_dereference() should not warn about obviously correct code. Why would we need to special case this ? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-12 12:15 ` Peter Zijlstra @ 2022-03-12 17:44 ` Paul E. McKenney 0 siblings, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2022-03-12 17:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Sat, Mar 12, 2022 at 01:15:33PM +0100, Peter Zijlstra wrote: > On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote: > > > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > > > > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > > > > to default include rcu_read_lock_sched_held() or its equivalent I > > > > > suppose. > > > > > > > > > > Paul? > > > > > > > > That would reduce the number of warnings, but it also would hide bugs. > > > > > > > > So, are you sure you really want this? > > > > > > I don't understand... Since the flavours got merged regular RCU has it's > > > quescent state held off by preempt_disable. So how can relying on that > > > cause bugs? > > > > Somene forgets an rcu_read_lock() and there happens to be something > > like a preempt_disable() that by coincidence covers that particular > > rcu_dereference(). The kernel therefore doesn't complain. That someone > > goes on to other things, maybe even posthumously. Then some time later > > the preempt_disable() goes away, for good and sufficient reasons. > > > > Good luck figuring out where to put the needed rcu_read_lock() and > > rcu_read_unlock(). > > Well, that's software engineering for you. My point exactly!!! > Also in that case the warning > will work as expected. Then figuring out how to fix it is not the > problem of the warning -- that worked as advertised. > > (also, I don't think it'll be too hard, you just gotta figure out which > object is rcu protected -- the warning gives you this, where the lookup > happens -- again the warning helps, and how long it's used for, all > relatively well definted things) Without in any way agreeing with that assessment of difficulty, especially in the general case... It is -way- easier just to tell RCU what your design rules are for the code in question. > I don't see a problem. No bugs hidden. C'mon, Peter! There really was a bug hidden. That someone intended to add some calls to rcu_read_lock() and rcu_read_unlock() in the proper places. Their failure to add them really was a bug. That bug was hidden by: (1) There being a preempt_disable() or whatever that by coincidence happened to be covering the part of the code containing the rcu_dereference() and (2) Your proposed change that would make rcu_dereference() unable to detect that bug. And that bug can be quite bad. Given your proposed change, RCU cannot detect this bug: /* Preemption is enabled. */ /* There should be an rcu_read_lock() here. */ preempt_disable(); p = rcu_dereference(gp); do_something_with(p); preempt_enable(); /* Without the rcu_read_lock(), *p is history. */ do_something_else_with(p); /* There should be an rcu_read_unlock() here. */ > > > And if we can rely on that, then surely rcu_dereferenced_check() ought > > > to play by the same rules, otherwise we get silly warnings like these at > > > hand. > > > > > > Specifically, we removed the rcu_read_lock() here because this has > > > rq->lock held, which is a raw_spinlock_t which very much implies preempt > > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will > > > be disabled. > > > > > > There is no possible way for RCU to make progress. > > > > Then let's have that particular rcu_dereference_check() explicitly state > > what it needs, which seems to be either rcu_read_lock() on the one hand. > > Right now, that could be just this: > > > > p = rcu_dereference_check(gp, rcu_read_lock_sched_held()); > > > > Or am I missing something here? > > That will work; I just don't agree with it. Per the rules of RCU it is > entirely correct to mix rcu_read_lock() and preempt_disable() (or > anything that implies the same). So I strongly feel that > rcu_dereference() should not warn about obviously correct code. Why > would we need to special case this ? This use case might well be entirely correct, but it is most certainly not the common case. Therefore, my answer to this requested chance in rcu_dereference() semantics is "no". Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [External] Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski 2022-03-08 23:32 ` Peter Zijlstra @ 2022-03-09 3:08 ` Chengming Zhou 1 sibling, 0 replies; 18+ messages in thread From: Chengming Zhou @ 2022-03-09 3:08 UTC (permalink / raw) To: Marek Szyprowski, mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel On 2022/3/9 7:20 上午, Marek Szyprowski wrote: > On 20.02.2022 06:14, Chengming Zhou wrote: >> Since cpuacct_charge() is called from the scheduler update_curr(), >> we must already have rq lock held, then the RCU read lock can >> be optimized away. >> >> And do the same thing in it's wrapper cgroup_account_cputime(), >> but we can't use lockdep_assert_rq_held() there, which defined >> in kernel/sched/sched.h. >> >> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > This patch landed recently in linux-next as commit dc6e0818bc9a > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > found that it triggers a following warning in the early boot stage: Hi, thanks for the report. I've send a fix patch[1] for review. [1] https://lore.kernel.org/lkml/20220305034103.57123-1-zhouchengming@bytedance.com/ > > Calibrating delay loop (skipped), value calculated using timer > frequency.. 48.00 BogoMIPS (lpj=240000) > pid_max: default: 32768 minimum: 301 > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > CPU: Testing write buffer coherency: ok > CPU0: Spectre v2: using BPIALL workaround > > ============================= > WARNING: suspicious RCU usage > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > ----------------------------- > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > other info that might help us debug this: > > > rcu_scheduler_active = 1, debug_locks = 1 > 2 locks held by kthreadd/2: > #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118 > #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: > raw_spin_rq_lock_nested+0x24/0x34 > > stack backtrace: > CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a > #11458 > Hardware name: Samsung Exynos (Flattened Device Tree) > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x58/0x70 > dump_stack_lvl from update_curr+0x1bc/0x35c > update_curr from dequeue_task_fair+0xb0/0x8e8 > dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258 > __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8 > __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64 > __set_cpus_allowed_ptr from kthreadd+0x44/0x16c > kthreadd from ret_from_fork+0x14/0x2c > Exception stack(0xc1cb9fb0 to 0xc1cb9ff8) > 9fa0: 00000000 00000000 00000000 > 00000000 > 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 > 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > CPU0: thread -1, cpu 0, socket 9, mpidr 80000900 > cblist_init_generic: Setting adjustable number of callback queues. > cblist_init_generic: Setting shift to 1 and lim to 1. > Running RCU-tasks wait API self tests > Setting up static identity map for 0x40100000 - 0x40100060 > rcu: Hierarchical SRCU implementation. > > ============================= > WARNING: suspicious RCU usage > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > ----------------------------- > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > other info that might help us debug this: > > > rcu_scheduler_active = 1, debug_locks = 1 > 1 lock held by migration/0/13: > #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: > raw_spin_rq_lock_nested+0x24/0x34 > > stack backtrace: > CPU: 0 PID: 13 Comm: migration/0 Not tainted > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 > Hardware name: Samsung Exynos (Flattened Device Tree) > Stopper: 0x0 <- 0x0 > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x58/0x70 > dump_stack_lvl from put_prev_task_stop+0x16c/0x25c > put_prev_task_stop from __schedule+0x698/0x964 > __schedule from schedule+0x54/0xe0 > schedule from smpboot_thread_fn+0x218/0x288 > smpboot_thread_fn from kthread+0xf0/0x134 > kthread from ret_from_fork+0x14/0x2c > Exception stack(0xc1ccffb0 to 0xc1ccfff8) > ffa0: 00000000 00000000 00000000 > 00000000 > ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 > ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 > smp: Bringing up secondary CPUs ... > CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 > CPU1: Spectre v2: using BPIALL workaround > smp: Brought up 1 node, 2 CPUs > SMP: Total of 2 processors activated (96.00 BogoMIPS). > > The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. > >> --- >> include/linux/cgroup.h | 2 -- >> kernel/sched/cpuacct.c | 4 +--- >> 2 files changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h >> index 75c151413fda..9a109c6ac0e0 100644 >> --- a/include/linux/cgroup.h >> +++ b/include/linux/cgroup.h >> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, >> >> cpuacct_charge(task, delta_exec); >> >> - rcu_read_lock(); >> cgrp = task_dfl_cgroup(task); >> if (cgroup_parent(cgrp)) >> __cgroup_account_cputime(cgrp, delta_exec); >> - rcu_read_unlock(); >> } >> >> static inline void cgroup_account_cputime_field(struct task_struct *task, >> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c >> index 307800586ac8..f79f88456d72 100644 >> --- a/kernel/sched/cpuacct.c >> +++ b/kernel/sched/cpuacct.c >> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) >> unsigned int cpu = task_cpu(tsk); >> struct cpuacct *ca; >> >> - rcu_read_lock(); >> + lockdep_assert_rq_held(cpu_rq(cpu)); >> >> for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) >> *per_cpu_ptr(ca->cpuusage, cpu) += cputime; >> - >> - rcu_read_unlock(); >> } >> >> /* > > Best regards ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou @ 2022-02-20 5:14 ` Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou 3 siblings, 2 replies; 18+ messages in thread From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw) To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel, Chengming Zhou The cpuacct_account_field() and it's cgroup v2 wrapper cgroup_account_cputime_field() is only called from cputime in task_group_account_field(), which is already in RCU read-side critical section. So remove these redundant RCU read lock. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 9a109c6ac0e0..1e356c222756 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, cpuacct_account_field(task, index, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime_field(cgrp, index, delta_exec); - rcu_read_unlock(); } #else /* CONFIG_CGROUPS */ diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index f79f88456d72..d269ede84e85 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) { struct cpuacct *ca; - rcu_read_lock(); for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca)) __this_cpu_add(ca->cpustat->cpustat[index], val); - rcu_read_unlock(); } struct cgroup_subsys cpuacct_cgrp_subsys = { -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Remove redundant RCU read lock 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou @ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> 1 sibling, 0 replies; 18+ messages in thread From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw) To: linux-tip-commits Cc: Tejun Heo, Chengming Zhou, Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: 3eba0505d03a9c1eb30d40c2330c0880b22d1b3a Gitweb: https://git.kernel.org/tip/3eba0505d03a9c1eb30d40c2330c0880b22d1b3a Author: Chengming Zhou <zhouchengming@bytedance.com> AuthorDate: Sun, 20 Feb 2022 13:14:26 +08:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00 sched/cpuacct: Remove redundant RCU read lock The cpuacct_account_field() and it's cgroup v2 wrapper cgroup_account_cputime_field() is only called from cputime in task_group_account_field(), which is already in RCU read-side critical section. So remove these redundant RCU read lock. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 9a109c6..1e356c2 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, cpuacct_account_field(task, index, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime_field(cgrp, index, delta_exec); - rcu_read_unlock(); } #else /* CONFIG_CGROUPS */ diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index f79f884..d269ede 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) { struct cpuacct *ca; - rcu_read_lock(); for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca)) __this_cpu_add(ca->cpustat->cpustat[index], val); - rcu_read_unlock(); } struct cgroup_subsys cpuacct_cgrp_subsys = { ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>]
* Re: [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> @ 2022-03-08 23:31 ` Marek Szyprowski 0 siblings, 0 replies; 18+ messages in thread From: Marek Szyprowski @ 2022-03-08 23:31 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel On 20.02.2022 06:14, Chengming Zhou wrote: > The cpuacct_account_field() and it's cgroup v2 wrapper > cgroup_account_cputime_field() is only called from cputime > in task_group_account_field(), which is already in RCU read-side > critical section. So remove these redundant RCU read lock. > > Suggested-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> This patch landed recently in linux-next as commit 3eba0505d03a ("sched/cpuacct: Remove redundant RCU read lock"). On my test systems I found that it triggers a following warning in the early boot stage: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) CPU: Testing write buffer coherency: ok CPU0: Spectre v2: using BPIALL workaround CPU0: thread -1, cpu 0, socket 9, mpidr 80000900 cblist_init_generic: Setting adjustable number of callback queues. ============================= WARNING: suspicious RCU usage 5.17.0-rc5-00052-g167ee9b08bed #11459 Not tainted ----------------------------- ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/0/1. stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5-00052-g167ee9b08bed #11459 Hardware name: Samsung Exynos (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from account_system_index_time+0x13c/0x148 account_system_index_time from account_system_time+0x78/0xa0 account_system_time from update_process_times+0x50/0xc4 update_process_times from tick_handle_periodic+0x1c/0x94 tick_handle_periodic from exynos4_mct_tick_isr+0x30/0x38 exynos4_mct_tick_isr from __handle_irq_event_percpu+0x74/0x3ac __handle_irq_event_percpu from handle_irq_event_percpu+0xc/0x38 handle_irq_event_percpu from handle_irq_event+0x38/0x5c handle_irq_event from handle_fasteoi_irq+0xc4/0x180 handle_fasteoi_irq from generic_handle_domain_irq+0x44/0x88 generic_handle_domain_irq from gic_handle_irq+0x88/0xac gic_handle_irq from generic_handle_arch_irq+0x58/0x78 generic_handle_arch_irq from __irq_svc+0x54/0x88 Exception stack(0xc1cafea0 to 0xc1cafee8) fea0: 00000000 c0f33ac4 00000001 00000129 60000053 c127b458 00000008 2e64e000 fec0: ef7bd5fc c127af84 c1209048 c1208f10 00000000 c1cafef0 c0b7e25c c0b8976c fee0: 20000053 ffffffff __irq_svc from _raw_spin_unlock_irqrestore+0x2c/0x60 _raw_spin_unlock_irqrestore from cblist_init_generic.constprop.14+0x298/0x328 cblist_init_generic.constprop.14 from rcu_init_tasks_generic+0x18/0x110 rcu_init_tasks_generic from kernel_init_freeable+0xa0/0x214 kernel_init_freeable from kernel_init+0x18/0x12c kernel_init from ret_from_fork+0x14/0x2c Exception stack(0xc1caffb0 to 0xc1cafff8) ffa0: 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 cblist_init_generic: Setting shift to 1 and lim to 1. Running RCU-tasks wait API self tests Setting up static identity map for 0x40100000 - 0x40100060 rcu: Hierarchical SRCU implementation. smp: Bringing up secondary CPUs ... CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 CPU1: Spectre v2: using BPIALL workaround smp: Brought up 1 node, 2 CPUs SMP: Total of 2 processors activated (96.00 BogoMIPS). CPU: All CPU(s) started in SVC mode. The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. To get above log I reverted commit dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"), because it triggered a similar warning. > --- > include/linux/cgroup.h | 2 -- > kernel/sched/cpuacct.c | 2 -- > 2 files changed, 4 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 9a109c6ac0e0..1e356c222756 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, > > cpuacct_account_field(task, index, delta_exec); > > - rcu_read_lock(); > cgrp = task_dfl_cgroup(task); > if (cgroup_parent(cgrp)) > __cgroup_account_cputime_field(cgrp, index, delta_exec); > - rcu_read_unlock(); > } > > #else /* CONFIG_CGROUPS */ > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c > index f79f88456d72..d269ede84e85 100644 > --- a/kernel/sched/cpuacct.c > +++ b/kernel/sched/cpuacct.c > @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) > { > struct cpuacct *ca; > > - rcu_read_lock(); > for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca)) > __this_cpu_add(ca->cpustat->cpustat[index], val); > - rcu_read_unlock(); > } > > struct cgroup_subsys cpuacct_cgrp_subsys = { Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou @ 2022-02-22 18:01 ` Tejun Heo 2022-02-23 9:11 ` Peter Zijlstra 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou 3 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2022-02-22 18:01 UTC (permalink / raw) To: Chengming Zhou Cc: mingo, peterz, vincent.guittot, bristot, zhaolei, lizefan.x, hannes, linux-kernel, Minye Zhu On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote: > The cpuacct_account_field() is always called by the current task > itself, so it's ok to use __this_cpu_add() to charge the tick time. > > But cpuacct_charge() maybe called by update_curr() in load_balance() > on a random CPU, different from the CPU on which the task is running. > So __this_cpu_add() will charge that cputime to a random incorrect CPU. > > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") > Reported-by: Minye Zhu <zhuminye@bytedance.com> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> For all three patches, Acked-by: Tejun Heo <tj@kernel.org> Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo @ 2022-02-23 9:11 ` Peter Zijlstra 0 siblings, 0 replies; 18+ messages in thread From: Peter Zijlstra @ 2022-02-23 9:11 UTC (permalink / raw) To: Tejun Heo Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, lizefan.x, hannes, linux-kernel, Minye Zhu On Tue, Feb 22, 2022 at 08:01:51AM -1000, Tejun Heo wrote: > On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote: > > The cpuacct_account_field() is always called by the current task > > itself, so it's ok to use __this_cpu_add() to charge the tick time. > > > > But cpuacct_charge() maybe called by update_curr() in load_balance() > > on a random CPU, different from the CPU on which the task is running. > > So __this_cpu_add() will charge that cputime to a random incorrect CPU. > > > > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") > > Reported-by: Minye Zhu <zhuminye@bytedance.com> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > For all three patches, > > Acked-by: Tejun Heo <tj@kernel.org> Thanks! ^ permalink raw reply [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Fix charge percpu cpuusage 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou ` (2 preceding siblings ...) 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo @ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou 3 siblings, 0 replies; 18+ messages in thread From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw) To: linux-tip-commits Cc: Minye Zhu, Chengming Zhou, Peter Zijlstra (Intel), Tejun Heo, x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: 248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd Gitweb: https://git.kernel.org/tip/248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd Author: Chengming Zhou <zhouchengming@bytedance.com> AuthorDate: Sun, 20 Feb 2022 13:14:24 +08:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 01 Mar 2022 16:18:37 +01:00 sched/cpuacct: Fix charge percpu cpuusage The cpuacct_account_field() is always called by the current task itself, so it's ok to use __this_cpu_add() to charge the tick time. But cpuacct_charge() maybe called by update_curr() in load_balance() on a random CPU, different from the CPU on which the task is running. So __this_cpu_add() will charge that cputime to a random incorrect CPU. Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") Reported-by: Minye Zhu <zhuminye@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com --- kernel/sched/cpuacct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 3d06c5e..3078005 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -334,12 +334,13 @@ static struct cftype files[] = { */ void cpuacct_charge(struct task_struct *tsk, u64 cputime) { + unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(*ca->cpuusage, cputime); + *per_cpu_ptr(ca->cpuusage, cpu) += cputime; rcu_read_unlock(); } ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-03-12 17:45 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> 2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski 2022-03-08 23:32 ` Peter Zijlstra 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 0:21 ` Paul E. McKenney 2022-03-10 8:45 ` Peter Zijlstra 2022-03-10 15:01 ` Paul E. McKenney 2022-03-12 12:15 ` Peter Zijlstra 2022-03-12 17:44 ` Paul E. McKenney 2022-03-09 3:08 ` [External] " Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> 2022-03-08 23:31 ` [PATCH v3 3/3] sched/cpuacct: remove " Marek Szyprowski 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo 2022-02-23 9:11 ` Peter Zijlstra 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).