* [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage @ 2022-02-20 5:14 Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw) To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel, Chengming Zhou, Minye Zhu The cpuacct_account_field() is always called by the current task itself, so it's ok to use __this_cpu_add() to charge the tick time. But cpuacct_charge() maybe called by update_curr() in load_balance() on a random CPU, different from the CPU on which the task is running. So __this_cpu_add() will charge that cputime to a random incorrect CPU. Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") Reported-by: Minye Zhu <zhuminye@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- kernel/sched/cpuacct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 3d06c5e4220d..307800586ac8 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -334,12 +334,13 @@ static struct cftype files[] = { */ void cpuacct_charge(struct task_struct *tsk, u64 cputime) { + unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(*ca->cpuusage, cputime); + *per_cpu_ptr(ca->cpuusage, cpu) += cputime; rcu_read_unlock(); } -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou @ 2022-02-20 5:14 ` Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou ` (2 subsequent siblings) 3 siblings, 2 replies; 18+ messages in thread From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw) To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel, Chengming Zhou Since cpuacct_charge() is called from the scheduler update_curr(), we must already have rq lock held, then the RCU read lock can be optimized away. And do the same thing in it's wrapper cgroup_account_cputime(), but we can't use lockdep_assert_rq_held() there, which defined in kernel/sched/sched.h. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 4 +--- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 75c151413fda..9a109c6ac0e0 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, cpuacct_charge(task, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime(cgrp, delta_exec); - rcu_read_unlock(); } static inline void cgroup_account_cputime_field(struct task_struct *task, diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 307800586ac8..f79f88456d72 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; - rcu_read_lock(); + lockdep_assert_rq_held(cpu_rq(cpu)); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) *per_cpu_ptr(ca->cpuusage, cpu) += cputime; - - rcu_read_unlock(); } /* -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Optimize away RCU read lock 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou @ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> 1 sibling, 0 replies; 18+ messages in thread From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw) To: linux-tip-commits Cc: Peter Zijlstra (Intel), Chengming Zhou, x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: dc6e0818bc9a0336d9accf3ea35d146d72aa7a18 Gitweb: https://git.kernel.org/tip/dc6e0818bc9a0336d9accf3ea35d146d72aa7a18 Author: Chengming Zhou <zhouchengming@bytedance.com> AuthorDate: Sun, 20 Feb 2022 13:14:25 +08:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00 sched/cpuacct: Optimize away RCU read lock Since cpuacct_charge() is called from the scheduler update_curr(), we must already have rq lock held, then the RCU read lock can be optimized away. And do the same thing in it's wrapper cgroup_account_cputime(), but we can't use lockdep_assert_rq_held() there, which defined in kernel/sched/sched.h. Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 4 +--- 2 files changed, 1 insertion(+), 5 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 75c1514..9a109c6 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, cpuacct_charge(task, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime(cgrp, delta_exec); - rcu_read_unlock(); } static inline void cgroup_account_cputime_field(struct task_struct *task, diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 3078005..f79f884 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; - rcu_read_lock(); + lockdep_assert_rq_held(cpu_rq(cpu)); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) *per_cpu_ptr(ca->cpuusage, cpu) += cputime; - - rcu_read_unlock(); } /* ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com>]
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> @ 2022-03-08 23:20 ` Marek Szyprowski 2022-03-08 23:32 ` Peter Zijlstra 2022-03-09 3:08 ` [External] " Chengming Zhou 0 siblings, 2 replies; 18+ messages in thread From: Marek Szyprowski @ 2022-03-08 23:20 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel On 20.02.2022 06:14, Chengming Zhou wrote: > Since cpuacct_charge() is called from the scheduler update_curr(), > we must already have rq lock held, then the RCU read lock can > be optimized away. > > And do the same thing in it's wrapper cgroup_account_cputime(), > but we can't use lockdep_assert_rq_held() there, which defined > in kernel/sched/sched.h. > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> This patch landed recently in linux-next as commit dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I found that it triggers a following warning in the early boot stage: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) CPU: Testing write buffer coherency: ok CPU0: Spectre v2: using BPIALL workaround ============================= WARNING: suspicious RCU usage 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted ----------------------------- ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 2 locks held by kthreadd/2: #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118 #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: raw_spin_rq_lock_nested+0x24/0x34 stack backtrace: CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Hardware name: Samsung Exynos (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from update_curr+0x1bc/0x35c update_curr from dequeue_task_fair+0xb0/0x8e8 dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258 __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8 __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64 __set_cpus_allowed_ptr from kthreadd+0x44/0x16c kthreadd from ret_from_fork+0x14/0x2c Exception stack(0xc1cb9fb0 to 0xc1cb9ff8) 9fa0: 00000000 00000000 00000000 00000000 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 CPU0: thread -1, cpu 0, socket 9, mpidr 80000900 cblist_init_generic: Setting adjustable number of callback queues. cblist_init_generic: Setting shift to 1 and lim to 1. Running RCU-tasks wait API self tests Setting up static identity map for 0x40100000 - 0x40100060 rcu: Hierarchical SRCU implementation. ============================= WARNING: suspicious RCU usage 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted ----------------------------- ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by migration/0/13: #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: raw_spin_rq_lock_nested+0x24/0x34 stack backtrace: CPU: 0 PID: 13 Comm: migration/0 Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Hardware name: Samsung Exynos (Flattened Device Tree) Stopper: 0x0 <- 0x0 unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from put_prev_task_stop+0x16c/0x25c put_prev_task_stop from __schedule+0x698/0x964 __schedule from schedule+0x54/0xe0 schedule from smpboot_thread_fn+0x218/0x288 smpboot_thread_fn from kthread+0xf0/0x134 kthread from ret_from_fork+0x14/0x2c Exception stack(0xc1ccffb0 to 0xc1ccfff8) ffa0: 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 smp: Bringing up secondary CPUs ... CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 CPU1: Spectre v2: using BPIALL workaround smp: Brought up 1 node, 2 CPUs SMP: Total of 2 processors activated (96.00 BogoMIPS). The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. > --- > include/linux/cgroup.h | 2 -- > kernel/sched/cpuacct.c | 4 +--- > 2 files changed, 1 insertion(+), 5 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 75c151413fda..9a109c6ac0e0 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, > > cpuacct_charge(task, delta_exec); > > - rcu_read_lock(); > cgrp = task_dfl_cgroup(task); > if (cgroup_parent(cgrp)) > __cgroup_account_cputime(cgrp, delta_exec); > - rcu_read_unlock(); > } > > static inline void cgroup_account_cputime_field(struct task_struct *task, > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c > index 307800586ac8..f79f88456d72 100644 > --- a/kernel/sched/cpuacct.c > +++ b/kernel/sched/cpuacct.c > @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) > unsigned int cpu = task_cpu(tsk); > struct cpuacct *ca; > > - rcu_read_lock(); > + lockdep_assert_rq_held(cpu_rq(cpu)); > > for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) > *per_cpu_ptr(ca->cpuusage, cpu) += cputime; > - > - rcu_read_unlock(); > } > > /* Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski @ 2022-03-08 23:32 ` Peter Zijlstra 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 3:08 ` [External] " Chengming Zhou 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-03-08 23:32 UTC (permalink / raw) To: Marek Szyprowski Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel, Paul McKenney On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > On 20.02.2022 06:14, Chengming Zhou wrote: > > Since cpuacct_charge() is called from the scheduler update_curr(), > > we must already have rq lock held, then the RCU read lock can > > be optimized away. > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > but we can't use lockdep_assert_rq_held() there, which defined > > in kernel/sched/sched.h. > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > This patch landed recently in linux-next as commit dc6e0818bc9a > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > found that it triggers a following warning in the early boot stage: > > Calibrating delay loop (skipped), value calculated using timer > frequency.. 48.00 BogoMIPS (lpj=240000) > pid_max: default: 32768 minimum: 301 > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > CPU: Testing write buffer coherency: ok > CPU0: Spectre v2: using BPIALL workaround > > ============================= > WARNING: suspicious RCU usage > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > ----------------------------- > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! Arguably, with the flavours folded again, rcu_dereference_check() ought to default include rcu_read_lock_sched_held() or its equivalent I suppose. Paul? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:32 ` Peter Zijlstra @ 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 0:21 ` Paul E. McKenney 2022-03-10 8:45 ` Peter Zijlstra 0 siblings, 2 replies; 18+ messages in thread From: Paul E. McKenney @ 2022-03-08 23:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > we must already have rq lock held, then the RCU read lock can > > > be optimized away. > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > but we can't use lockdep_assert_rq_held() there, which defined > > > in kernel/sched/sched.h. > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > found that it triggers a following warning in the early boot stage: > > > > Calibrating delay loop (skipped), value calculated using timer > > frequency.. 48.00 BogoMIPS (lpj=240000) > > pid_max: default: 32768 minimum: 301 > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > CPU: Testing write buffer coherency: ok > > CPU0: Spectre v2: using BPIALL workaround > > > > ============================= > > WARNING: suspicious RCU usage > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > ----------------------------- > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > Arguably, with the flavours folded again, rcu_dereference_check() ought > to default include rcu_read_lock_sched_held() or its equivalent I > suppose. > > Paul? That would reduce the number of warnings, but it also would hide bugs. So, are you sure you really want this? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:44 ` Paul E. McKenney @ 2022-03-09 0:21 ` Paul E. McKenney 2022-03-10 8:45 ` Peter Zijlstra 1 sibling, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2022-03-09 0:21 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote: > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > > we must already have rq lock held, then the RCU read lock can > > > > be optimized away. > > > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > > but we can't use lockdep_assert_rq_held() there, which defined > > > > in kernel/sched/sched.h. > > > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > > found that it triggers a following warning in the early boot stage: > > > > > > Calibrating delay loop (skipped), value calculated using timer > > > frequency.. 48.00 BogoMIPS (lpj=240000) > > > pid_max: default: 32768 minimum: 301 > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > CPU: Testing write buffer coherency: ok > > > CPU0: Spectre v2: using BPIALL workaround > > > > > > ============================= > > > WARNING: suspicious RCU usage > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > > ----------------------------- > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > to default include rcu_read_lock_sched_held() or its equivalent I > > suppose. > > > > Paul? > > That would reduce the number of warnings, but it also would hide bugs. > > So, are you sure you really want this? Of course, if you are designing a use case that really expects multiple types of readers... Another approach might be rcu_dereference_brs(), but hopefully with a better name, that checks for either rcu_read_lock(), disabled BH, or disabled preemption. Or if you are looking only for rcu_read_lock() or disabled preemption, rcu_dereference_rs(), again hopefully with a better name. Is that what you had in mind? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 0:21 ` Paul E. McKenney @ 2022-03-10 8:45 ` Peter Zijlstra 2022-03-10 15:01 ` Paul E. McKenney 1 sibling, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-03-10 8:45 UTC (permalink / raw) To: Paul E. McKenney Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote: > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > > we must already have rq lock held, then the RCU read lock can > > > > be optimized away. > > > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > > but we can't use lockdep_assert_rq_held() there, which defined > > > > in kernel/sched/sched.h. > > > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > > found that it triggers a following warning in the early boot stage: > > > > > > Calibrating delay loop (skipped), value calculated using timer > > > frequency.. 48.00 BogoMIPS (lpj=240000) > > > pid_max: default: 32768 minimum: 301 > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > CPU: Testing write buffer coherency: ok > > > CPU0: Spectre v2: using BPIALL workaround > > > > > > ============================= > > > WARNING: suspicious RCU usage > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > > ----------------------------- > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > to default include rcu_read_lock_sched_held() or its equivalent I > > suppose. > > > > Paul? > > That would reduce the number of warnings, but it also would hide bugs. > > So, are you sure you really want this? I don't understand... Since the flavours got merged regular RCU has it's quescent state held off by preempt_disable. So how can relying on that cause bugs? And if we can rely on that, then surely rcu_dereferenced_check() ought to play by the same rules, otherwise we get silly warnings like these at hand. Specifically, we removed the rcu_read_lock() here because this has rq->lock held, which is a raw_spinlock_t which very much implies preempt disable, on top of that, it's also an IRQ-safe lock and thus IRQs will be disabled. There is no possible way for RCU to make progress. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-10 8:45 ` Peter Zijlstra @ 2022-03-10 15:01 ` Paul E. McKenney 2022-03-12 12:15 ` Peter Zijlstra 0 siblings, 1 reply; 18+ messages in thread From: Paul E. McKenney @ 2022-03-10 15:01 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote: > On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote: > > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote: > > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote: > > > > On 20.02.2022 06:14, Chengming Zhou wrote: > > > > > Since cpuacct_charge() is called from the scheduler update_curr(), > > > > > we must already have rq lock held, then the RCU read lock can > > > > > be optimized away. > > > > > > > > > > And do the same thing in it's wrapper cgroup_account_cputime(), > > > > > but we can't use lockdep_assert_rq_held() there, which defined > > > > > in kernel/sched/sched.h. > > > > > > > > > > Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > > > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > > > > > > > This patch landed recently in linux-next as commit dc6e0818bc9a > > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > > > > found that it triggers a following warning in the early boot stage: > > > > > > > > Calibrating delay loop (skipped), value calculated using timer > > > > frequency.. 48.00 BogoMIPS (lpj=240000) > > > > pid_max: default: 32768 minimum: 301 > > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > > > > CPU: Testing write buffer coherency: ok > > > > CPU0: Spectre v2: using BPIALL workaround > > > > > > > > ============================= > > > > WARNING: suspicious RCU usage > > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > > > > ----------------------------- > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > > to default include rcu_read_lock_sched_held() or its equivalent I > > > suppose. > > > > > > Paul? > > > > That would reduce the number of warnings, but it also would hide bugs. > > > > So, are you sure you really want this? > > I don't understand... Since the flavours got merged regular RCU has it's > quescent state held off by preempt_disable. So how can relying on that > cause bugs? Somene forgets an rcu_read_lock() and there happens to be something like a preempt_disable() that by coincidence covers that particular rcu_dereference(). The kernel therefore doesn't complain. That someone goes on to other things, maybe even posthumously. Then some time later the preempt_disable() goes away, for good and sufficient reasons. Good luck figuring out where to put the needed rcu_read_lock() and rcu_read_unlock(). > And if we can rely on that, then surely rcu_dereferenced_check() ought > to play by the same rules, otherwise we get silly warnings like these at > hand. > > Specifically, we removed the rcu_read_lock() here because this has > rq->lock held, which is a raw_spinlock_t which very much implies preempt > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will > be disabled. > > There is no possible way for RCU to make progress. Then let's have that particular rcu_dereference_check() explicitly state what it needs, which seems to be either rcu_read_lock() on the one hand. Right now, that could be just this: p = rcu_dereference_check(gp, rcu_read_lock_sched_held()); Or am I missing something here? Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-10 15:01 ` Paul E. McKenney @ 2022-03-12 12:15 ` Peter Zijlstra 2022-03-12 17:44 ` Paul E. McKenney 0 siblings, 1 reply; 18+ messages in thread From: Peter Zijlstra @ 2022-03-12 12:15 UTC (permalink / raw) To: Paul E. McKenney Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote: > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > > > to default include rcu_read_lock_sched_held() or its equivalent I > > > > suppose. > > > > > > > > Paul? > > > > > > That would reduce the number of warnings, but it also would hide bugs. > > > > > > So, are you sure you really want this? > > > > I don't understand... Since the flavours got merged regular RCU has it's > > quescent state held off by preempt_disable. So how can relying on that > > cause bugs? > > Somene forgets an rcu_read_lock() and there happens to be something > like a preempt_disable() that by coincidence covers that particular > rcu_dereference(). The kernel therefore doesn't complain. That someone > goes on to other things, maybe even posthumously. Then some time later > the preempt_disable() goes away, for good and sufficient reasons. > > Good luck figuring out where to put the needed rcu_read_lock() and > rcu_read_unlock(). Well, that's software engineering for you. Also in that case the warning will work as expected. Then figuring out how to fix it is not the problem of the warning -- that worked as advertised. (also, I don't think it'll be too hard, you just gotta figure out which object is rcu protected -- the warning gives you this, where the lookup happens -- again the warning helps, and how long it's used for, all relatively well definted things) I don't see a problem. No bugs hidden. > > And if we can rely on that, then surely rcu_dereferenced_check() ought > > to play by the same rules, otherwise we get silly warnings like these at > > hand. > > > > Specifically, we removed the rcu_read_lock() here because this has > > rq->lock held, which is a raw_spinlock_t which very much implies preempt > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will > > be disabled. > > > > There is no possible way for RCU to make progress. > > Then let's have that particular rcu_dereference_check() explicitly state > what it needs, which seems to be either rcu_read_lock() on the one hand. > Right now, that could be just this: > > p = rcu_dereference_check(gp, rcu_read_lock_sched_held()); > > Or am I missing something here? That will work; I just don't agree with it. Per the rules of RCU it is entirely correct to mix rcu_read_lock() and preempt_disable() (or anything that implies the same). So I strongly feel that rcu_dereference() should not warn about obviously correct code. Why would we need to special case this ? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-12 12:15 ` Peter Zijlstra @ 2022-03-12 17:44 ` Paul E. McKenney 0 siblings, 0 replies; 18+ messages in thread From: Paul E. McKenney @ 2022-03-12 17:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Marek Szyprowski, Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes, linux-kernel On Sat, Mar 12, 2022 at 01:15:33PM +0100, Peter Zijlstra wrote: > On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote: > > > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > > > > > > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought > > > > > to default include rcu_read_lock_sched_held() or its equivalent I > > > > > suppose. > > > > > > > > > > Paul? > > > > > > > > That would reduce the number of warnings, but it also would hide bugs. > > > > > > > > So, are you sure you really want this? > > > > > > I don't understand... Since the flavours got merged regular RCU has it's > > > quescent state held off by preempt_disable. So how can relying on that > > > cause bugs? > > > > Somene forgets an rcu_read_lock() and there happens to be something > > like a preempt_disable() that by coincidence covers that particular > > rcu_dereference(). The kernel therefore doesn't complain. That someone > > goes on to other things, maybe even posthumously. Then some time later > > the preempt_disable() goes away, for good and sufficient reasons. > > > > Good luck figuring out where to put the needed rcu_read_lock() and > > rcu_read_unlock(). > > Well, that's software engineering for you. My point exactly!!! > Also in that case the warning > will work as expected. Then figuring out how to fix it is not the > problem of the warning -- that worked as advertised. > > (also, I don't think it'll be too hard, you just gotta figure out which > object is rcu protected -- the warning gives you this, where the lookup > happens -- again the warning helps, and how long it's used for, all > relatively well definted things) Without in any way agreeing with that assessment of difficulty, especially in the general case... It is -way- easier just to tell RCU what your design rules are for the code in question. > I don't see a problem. No bugs hidden. C'mon, Peter! There really was a bug hidden. That someone intended to add some calls to rcu_read_lock() and rcu_read_unlock() in the proper places. Their failure to add them really was a bug. That bug was hidden by: (1) There being a preempt_disable() or whatever that by coincidence happened to be covering the part of the code containing the rcu_dereference() and (2) Your proposed change that would make rcu_dereference() unable to detect that bug. And that bug can be quite bad. Given your proposed change, RCU cannot detect this bug: /* Preemption is enabled. */ /* There should be an rcu_read_lock() here. */ preempt_disable(); p = rcu_dereference(gp); do_something_with(p); preempt_enable(); /* Without the rcu_read_lock(), *p is history. */ do_something_else_with(p); /* There should be an rcu_read_unlock() here. */ > > > And if we can rely on that, then surely rcu_dereferenced_check() ought > > > to play by the same rules, otherwise we get silly warnings like these at > > > hand. > > > > > > Specifically, we removed the rcu_read_lock() here because this has > > > rq->lock held, which is a raw_spinlock_t which very much implies preempt > > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will > > > be disabled. > > > > > > There is no possible way for RCU to make progress. > > > > Then let's have that particular rcu_dereference_check() explicitly state > > what it needs, which seems to be either rcu_read_lock() on the one hand. > > Right now, that could be just this: > > > > p = rcu_dereference_check(gp, rcu_read_lock_sched_held()); > > > > Or am I missing something here? > > That will work; I just don't agree with it. Per the rules of RCU it is > entirely correct to mix rcu_read_lock() and preempt_disable() (or > anything that implies the same). So I strongly feel that > rcu_dereference() should not warn about obviously correct code. Why > would we need to special case this ? This use case might well be entirely correct, but it is most certainly not the common case. Therefore, my answer to this requested chance in rcu_dereference() semantics is "no". Thanx, Paul ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [External] Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock 2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski 2022-03-08 23:32 ` Peter Zijlstra @ 2022-03-09 3:08 ` Chengming Zhou 1 sibling, 0 replies; 18+ messages in thread From: Chengming Zhou @ 2022-03-09 3:08 UTC (permalink / raw) To: Marek Szyprowski, mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel On 2022/3/9 7:20 上午, Marek Szyprowski wrote: > On 20.02.2022 06:14, Chengming Zhou wrote: >> Since cpuacct_charge() is called from the scheduler update_curr(), >> we must already have rq lock held, then the RCU read lock can >> be optimized away. >> >> And do the same thing in it's wrapper cgroup_account_cputime(), >> but we can't use lockdep_assert_rq_held() there, which defined >> in kernel/sched/sched.h. >> >> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> >> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > This patch landed recently in linux-next as commit dc6e0818bc9a > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I > found that it triggers a following warning in the early boot stage: Hi, thanks for the report. I've send a fix patch[1] for review. [1] https://lore.kernel.org/lkml/20220305034103.57123-1-zhouchengming@bytedance.com/ > > Calibrating delay loop (skipped), value calculated using timer > frequency.. 48.00 BogoMIPS (lpj=240000) > pid_max: default: 32768 minimum: 301 > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) > CPU: Testing write buffer coherency: ok > CPU0: Spectre v2: using BPIALL workaround > > ============================= > WARNING: suspicious RCU usage > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > ----------------------------- > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > other info that might help us debug this: > > > rcu_scheduler_active = 1, debug_locks = 1 > 2 locks held by kthreadd/2: > #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118 > #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: > raw_spin_rq_lock_nested+0x24/0x34 > > stack backtrace: > CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a > #11458 > Hardware name: Samsung Exynos (Flattened Device Tree) > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x58/0x70 > dump_stack_lvl from update_curr+0x1bc/0x35c > update_curr from dequeue_task_fair+0xb0/0x8e8 > dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258 > __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8 > __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64 > __set_cpus_allowed_ptr from kthreadd+0x44/0x16c > kthreadd from ret_from_fork+0x14/0x2c > Exception stack(0xc1cb9fb0 to 0xc1cb9ff8) > 9fa0: 00000000 00000000 00000000 > 00000000 > 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 > 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > CPU0: thread -1, cpu 0, socket 9, mpidr 80000900 > cblist_init_generic: Setting adjustable number of callback queues. > cblist_init_generic: Setting shift to 1 and lim to 1. > Running RCU-tasks wait API self tests > Setting up static identity map for 0x40100000 - 0x40100060 > rcu: Hierarchical SRCU implementation. > > ============================= > WARNING: suspicious RCU usage > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted > ----------------------------- > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! > > other info that might help us debug this: > > > rcu_scheduler_active = 1, debug_locks = 1 > 1 lock held by migration/0/13: > #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at: > raw_spin_rq_lock_nested+0x24/0x34 > > stack backtrace: > CPU: 0 PID: 13 Comm: migration/0 Not tainted > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 > Hardware name: Samsung Exynos (Flattened Device Tree) > Stopper: 0x0 <- 0x0 > unwind_backtrace from show_stack+0x10/0x14 > show_stack from dump_stack_lvl+0x58/0x70 > dump_stack_lvl from put_prev_task_stop+0x16c/0x25c > put_prev_task_stop from __schedule+0x698/0x964 > __schedule from schedule+0x54/0xe0 > schedule from smpboot_thread_fn+0x218/0x288 > smpboot_thread_fn from kthread+0xf0/0x134 > kthread from ret_from_fork+0x14/0x2c > Exception stack(0xc1ccffb0 to 0xc1ccfff8) > ffa0: 00000000 00000000 00000000 > 00000000 > ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 00000000 > ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 > smp: Bringing up secondary CPUs ... > CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 > CPU1: Spectre v2: using BPIALL workaround > smp: Brought up 1 node, 2 CPUs > SMP: Total of 2 processors activated (96.00 BogoMIPS). > > The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. > >> --- >> include/linux/cgroup.h | 2 -- >> kernel/sched/cpuacct.c | 4 +--- >> 2 files changed, 1 insertion(+), 5 deletions(-) >> >> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h >> index 75c151413fda..9a109c6ac0e0 100644 >> --- a/include/linux/cgroup.h >> +++ b/include/linux/cgroup.h >> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, >> >> cpuacct_charge(task, delta_exec); >> >> - rcu_read_lock(); >> cgrp = task_dfl_cgroup(task); >> if (cgroup_parent(cgrp)) >> __cgroup_account_cputime(cgrp, delta_exec); >> - rcu_read_unlock(); >> } >> >> static inline void cgroup_account_cputime_field(struct task_struct *task, >> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c >> index 307800586ac8..f79f88456d72 100644 >> --- a/kernel/sched/cpuacct.c >> +++ b/kernel/sched/cpuacct.c >> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime) >> unsigned int cpu = task_cpu(tsk); >> struct cpuacct *ca; >> >> - rcu_read_lock(); >> + lockdep_assert_rq_held(cpu_rq(cpu)); >> >> for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) >> *per_cpu_ptr(ca->cpuusage, cpu) += cputime; >> - >> - rcu_read_unlock(); >> } >> >> /* > > Best regards ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou @ 2022-02-20 5:14 ` Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou 3 siblings, 2 replies; 18+ messages in thread From: Chengming Zhou @ 2022-02-20 5:14 UTC (permalink / raw) To: mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel, Chengming Zhou The cpuacct_account_field() and it's cgroup v2 wrapper cgroup_account_cputime_field() is only called from cputime in task_group_account_field(), which is already in RCU read-side critical section. So remove these redundant RCU read lock. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 9a109c6ac0e0..1e356c222756 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, cpuacct_account_field(task, index, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime_field(cgrp, index, delta_exec); - rcu_read_unlock(); } #else /* CONFIG_CGROUPS */ diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index f79f88456d72..d269ede84e85 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) { struct cpuacct *ca; - rcu_read_lock(); for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca)) __this_cpu_add(ca->cpustat->cpustat[index], val); - rcu_read_unlock(); } struct cgroup_subsys cpuacct_cgrp_subsys = { -- 2.20.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Remove redundant RCU read lock 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou @ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> 1 sibling, 0 replies; 18+ messages in thread From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw) To: linux-tip-commits Cc: Tejun Heo, Chengming Zhou, Peter Zijlstra (Intel), x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: 3eba0505d03a9c1eb30d40c2330c0880b22d1b3a Gitweb: https://git.kernel.org/tip/3eba0505d03a9c1eb30d40c2330c0880b22d1b3a Author: Chengming Zhou <zhouchengming@bytedance.com> AuthorDate: Sun, 20 Feb 2022 13:14:26 +08:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00 sched/cpuacct: Remove redundant RCU read lock The cpuacct_account_field() and it's cgroup v2 wrapper cgroup_account_cputime_field() is only called from cputime in task_group_account_field(), which is already in RCU read-side critical section. So remove these redundant RCU read lock. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com --- include/linux/cgroup.h | 2 -- kernel/sched/cpuacct.c | 2 -- 2 files changed, 4 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 9a109c6..1e356c2 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, cpuacct_account_field(task, index, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime_field(cgrp, index, delta_exec); - rcu_read_unlock(); } #else /* CONFIG_CGROUPS */ diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index f79f884..d269ede 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) { struct cpuacct *ca; - rcu_read_lock(); for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca)) __this_cpu_add(ca->cpustat->cpustat[index], val); - rcu_read_unlock(); } struct cgroup_subsys cpuacct_cgrp_subsys = { ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com>]
* Re: [PATCH v3 3/3] sched/cpuacct: remove redundant RCU read lock [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> @ 2022-03-08 23:31 ` Marek Szyprowski 0 siblings, 0 replies; 18+ messages in thread From: Marek Szyprowski @ 2022-03-08 23:31 UTC (permalink / raw) To: Chengming Zhou, mingo, peterz, vincent.guittot, bristot, zhaolei, tj, lizefan.x, hannes Cc: linux-kernel On 20.02.2022 06:14, Chengming Zhou wrote: > The cpuacct_account_field() and it's cgroup v2 wrapper > cgroup_account_cputime_field() is only called from cputime > in task_group_account_field(), which is already in RCU read-side > critical section. So remove these redundant RCU read lock. > > Suggested-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> This patch landed recently in linux-next as commit 3eba0505d03a ("sched/cpuacct: Remove redundant RCU read lock"). On my test systems I found that it triggers a following warning in the early boot stage: Calibrating delay loop (skipped), value calculated using timer frequency.. 48.00 BogoMIPS (lpj=240000) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) CPU: Testing write buffer coherency: ok CPU0: Spectre v2: using BPIALL workaround CPU0: thread -1, cpu 0, socket 9, mpidr 80000900 cblist_init_generic: Setting adjustable number of callback queues. ============================= WARNING: suspicious RCU usage 5.17.0-rc5-00052-g167ee9b08bed #11459 Not tainted ----------------------------- ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 no locks held by swapper/0/1. stack backtrace: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc5-00052-g167ee9b08bed #11459 Hardware name: Samsung Exynos (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from account_system_index_time+0x13c/0x148 account_system_index_time from account_system_time+0x78/0xa0 account_system_time from update_process_times+0x50/0xc4 update_process_times from tick_handle_periodic+0x1c/0x94 tick_handle_periodic from exynos4_mct_tick_isr+0x30/0x38 exynos4_mct_tick_isr from __handle_irq_event_percpu+0x74/0x3ac __handle_irq_event_percpu from handle_irq_event_percpu+0xc/0x38 handle_irq_event_percpu from handle_irq_event+0x38/0x5c handle_irq_event from handle_fasteoi_irq+0xc4/0x180 handle_fasteoi_irq from generic_handle_domain_irq+0x44/0x88 generic_handle_domain_irq from gic_handle_irq+0x88/0xac gic_handle_irq from generic_handle_arch_irq+0x58/0x78 generic_handle_arch_irq from __irq_svc+0x54/0x88 Exception stack(0xc1cafea0 to 0xc1cafee8) fea0: 00000000 c0f33ac4 00000001 00000129 60000053 c127b458 00000008 2e64e000 fec0: ef7bd5fc c127af84 c1209048 c1208f10 00000000 c1cafef0 c0b7e25c c0b8976c fee0: 20000053 ffffffff __irq_svc from _raw_spin_unlock_irqrestore+0x2c/0x60 _raw_spin_unlock_irqrestore from cblist_init_generic.constprop.14+0x298/0x328 cblist_init_generic.constprop.14 from rcu_init_tasks_generic+0x18/0x110 rcu_init_tasks_generic from kernel_init_freeable+0xa0/0x214 kernel_init_freeable from kernel_init+0x18/0x12c kernel_init from ret_from_fork+0x14/0x2c Exception stack(0xc1caffb0 to 0xc1cafff8) ffa0: 00000000 00000000 00000000 00000000 ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 cblist_init_generic: Setting shift to 1 and lim to 1. Running RCU-tasks wait API self tests Setting up static identity map for 0x40100000 - 0x40100060 rcu: Hierarchical SRCU implementation. smp: Bringing up secondary CPUs ... CPU1: thread -1, cpu 1, socket 9, mpidr 80000901 CPU1: Spectre v2: using BPIALL workaround smp: Brought up 1 node, 2 CPUs SMP: Total of 2 processors activated (96.00 BogoMIPS). CPU: All CPU(s) started in SVC mode. The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board. To get above log I reverted commit dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"), because it triggered a similar warning. > --- > include/linux/cgroup.h | 2 -- > kernel/sched/cpuacct.c | 2 -- > 2 files changed, 4 deletions(-) > > diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h > index 9a109c6ac0e0..1e356c222756 100644 > --- a/include/linux/cgroup.h > +++ b/include/linux/cgroup.h > @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, > > cpuacct_account_field(task, index, delta_exec); > > - rcu_read_lock(); > cgrp = task_dfl_cgroup(task); > if (cgroup_parent(cgrp)) > __cgroup_account_cputime_field(cgrp, index, delta_exec); > - rcu_read_unlock(); > } > > #else /* CONFIG_CGROUPS */ > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c > index f79f88456d72..d269ede84e85 100644 > --- a/kernel/sched/cpuacct.c > +++ b/kernel/sched/cpuacct.c > @@ -352,10 +352,8 @@ void cpuacct_account_field(struct task_struct *tsk, int index, u64 val) > { > struct cpuacct *ca; > > - rcu_read_lock(); > for (ca = task_ca(tsk); ca != &root_cpuacct; ca = parent_ca(ca)) > __this_cpu_add(ca->cpustat->cpustat[index], val); > - rcu_read_unlock(); > } > > struct cgroup_subsys cpuacct_cgrp_subsys = { Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou @ 2022-02-22 18:01 ` Tejun Heo 2022-02-23 9:11 ` Peter Zijlstra 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou 3 siblings, 1 reply; 18+ messages in thread From: Tejun Heo @ 2022-02-22 18:01 UTC (permalink / raw) To: Chengming Zhou Cc: mingo, peterz, vincent.guittot, bristot, zhaolei, lizefan.x, hannes, linux-kernel, Minye Zhu On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote: > The cpuacct_account_field() is always called by the current task > itself, so it's ok to use __this_cpu_add() to charge the tick time. > > But cpuacct_charge() maybe called by update_curr() in load_balance() > on a random CPU, different from the CPU on which the task is running. > So __this_cpu_add() will charge that cputime to a random incorrect CPU. > > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") > Reported-by: Minye Zhu <zhuminye@bytedance.com> > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> For all three patches, Acked-by: Tejun Heo <tj@kernel.org> Thanks. -- tejun ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo @ 2022-02-23 9:11 ` Peter Zijlstra 0 siblings, 0 replies; 18+ messages in thread From: Peter Zijlstra @ 2022-02-23 9:11 UTC (permalink / raw) To: Tejun Heo Cc: Chengming Zhou, mingo, vincent.guittot, bristot, zhaolei, lizefan.x, hannes, linux-kernel, Minye Zhu On Tue, Feb 22, 2022 at 08:01:51AM -1000, Tejun Heo wrote: > On Sun, Feb 20, 2022 at 01:14:24PM +0800, Chengming Zhou wrote: > > The cpuacct_account_field() is always called by the current task > > itself, so it's ok to use __this_cpu_add() to charge the tick time. > > > > But cpuacct_charge() maybe called by update_curr() in load_balance() > > on a random CPU, different from the CPU on which the task is running. > > So __this_cpu_add() will charge that cputime to a random incorrect CPU. > > > > Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") > > Reported-by: Minye Zhu <zhuminye@bytedance.com> > > Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> > > For all three patches, > > Acked-by: Tejun Heo <tj@kernel.org> Thanks! ^ permalink raw reply [flat|nested] 18+ messages in thread
* [tip: sched/core] sched/cpuacct: Fix charge percpu cpuusage 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou ` (2 preceding siblings ...) 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo @ 2022-03-01 15:24 ` tip-bot2 for Chengming Zhou 3 siblings, 0 replies; 18+ messages in thread From: tip-bot2 for Chengming Zhou @ 2022-03-01 15:24 UTC (permalink / raw) To: linux-tip-commits Cc: Minye Zhu, Chengming Zhou, Peter Zijlstra (Intel), Tejun Heo, x86, linux-kernel The following commit has been merged into the sched/core branch of tip: Commit-ID: 248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd Gitweb: https://git.kernel.org/tip/248cc9993d1cc12b8e9ed716cc3fc09f6c3517dd Author: Chengming Zhou <zhouchengming@bytedance.com> AuthorDate: Sun, 20 Feb 2022 13:14:24 +08:00 Committer: Peter Zijlstra <peterz@infradead.org> CommitterDate: Tue, 01 Mar 2022 16:18:37 +01:00 sched/cpuacct: Fix charge percpu cpuusage The cpuacct_account_field() is always called by the current task itself, so it's ok to use __this_cpu_add() to charge the tick time. But cpuacct_charge() maybe called by update_curr() in load_balance() on a random CPU, different from the CPU on which the task is running. So __this_cpu_add() will charge that cputime to a random incorrect CPU. Fixes: 73e6aafd9ea8 ("sched/cpuacct: Simplify the cpuacct code") Reported-by: Minye Zhu <zhuminye@bytedance.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20220220051426.5274-1-zhouchengming@bytedance.com --- kernel/sched/cpuacct.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 3d06c5e..3078005 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -334,12 +334,13 @@ static struct cftype files[] = { */ void cpuacct_charge(struct task_struct *tsk, u64 cputime) { + unsigned int cpu = task_cpu(tsk); struct cpuacct *ca; rcu_read_lock(); for (ca = task_ca(tsk); ca; ca = parent_ca(ca)) - __this_cpu_add(*ca->cpuusage, cputime); + *per_cpu_ptr(ca->cpuusage, cpu) += cputime; rcu_read_unlock(); } ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2022-03-12 17:45 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-02-20 5:14 [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Optimize " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308232034eucas1p2b0f39cee0f462af6004ebdfbe5bacb9f@eucas1p2.samsung.com> 2022-03-08 23:20 ` [PATCH v3 2/3] sched/cpuacct: optimize " Marek Szyprowski 2022-03-08 23:32 ` Peter Zijlstra 2022-03-08 23:44 ` Paul E. McKenney 2022-03-09 0:21 ` Paul E. McKenney 2022-03-10 8:45 ` Peter Zijlstra 2022-03-10 15:01 ` Paul E. McKenney 2022-03-12 12:15 ` Peter Zijlstra 2022-03-12 17:44 ` Paul E. McKenney 2022-03-09 3:08 ` [External] " Chengming Zhou 2022-02-20 5:14 ` [PATCH v3 3/3] sched/cpuacct: remove redundant " Chengming Zhou 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Remove " tip-bot2 for Chengming Zhou [not found] ` <CGME20220308233107eucas1p119a2f5a8d4f5b5eec38ea8dde92b3368@eucas1p1.samsung.com> 2022-03-08 23:31 ` [PATCH v3 3/3] sched/cpuacct: remove " Marek Szyprowski 2022-02-22 18:01 ` [PATCH v3 1/3] sched/cpuacct: fix charge percpu cpuusage Tejun Heo 2022-02-23 9:11 ` Peter Zijlstra 2022-03-01 15:24 ` [tip: sched/core] sched/cpuacct: Fix " tip-bot2 for Chengming Zhou
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.