* [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION @ 2017-11-30 20:20 Paul E. McKenney 2017-12-02 13:59 ` Frederic Weisbecker 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-11-30 20:20 UTC (permalink / raw) To: linux-kernel Cc: xiaolong.ye, frederic, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz Commit 5c4991e24c69 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y and CONFIG_CPU_ISOLATION=n. These warnings are caused by RCU's grace-period kthreads sleeping for a few jiffies, but never being awakened: [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208 +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 [ 116.355517] rcu_preempt I 7464 8 2 0x80000000 [ 116.356543] Call Trace: [ 116.357008] __schedule+0x493/0x620 [ 116.357682] schedule+0x24/0x40 [ 116.358291] schedule_timeout+0x330/0x3b0 [ 116.359024] ? preempt_count_sub+0xea/0x140 [ 116.359806] ? collect_expired_timers+0xb0/0xb0 [ 116.360660] rcu_gp_kthread+0x6bf/0xef0 This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which prevents this behavior and seems like it was the original intention in any case. Fixes: 5c4991e24c69 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") Link: https://lkml.kernel.org/r/20171126084203.GE21779@yexl-desktop Reported-by: kernel test robot <xiaolong.ye@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Wanpeng Li <kernellwp@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: John Stultz <john.stultz@linaro.org> diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig index e776fc8cc1df..f6b5f19223d6 100644 --- a/kernel/time/Kconfig +++ b/kernel/time/Kconfig @@ -95,6 +95,7 @@ config NO_HZ_FULL select RCU_NOCB_CPU select VIRT_CPU_ACCOUNTING_GEN select IRQ_WORK + select CPU_ISOLATION help Adaptively try to shutdown the tick whenever possible, even when the CPU is running tasks. Typically this requires running a single ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-11-30 20:20 [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION Paul E. McKenney @ 2017-12-02 13:59 ` Frederic Weisbecker 2017-12-02 19:24 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Frederic Weisbecker @ 2017-12-02 13:59 UTC (permalink / raw) To: paulmck Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz 2017-11-30 21:20 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > Commit 5c4991e24c69 ("sched/isolation: Split out new > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y > and CONFIG_CPU_ISOLATION=n. These warnings are caused by RCU's > grace-period kthreads sleeping for a few jiffies, but never being > awakened: > > [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208 > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 > [ 116.355517] rcu_preempt I 7464 8 2 0x80000000 > [ 116.356543] Call Trace: > [ 116.357008] __schedule+0x493/0x620 > [ 116.357682] schedule+0x24/0x40 > [ 116.358291] schedule_timeout+0x330/0x3b0 > [ 116.359024] ? preempt_count_sub+0xea/0x140 > [ 116.359806] ? collect_expired_timers+0xb0/0xb0 > [ 116.360660] rcu_gp_kthread+0x6bf/0xef0 > > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which > prevents this behavior and seems like it was the original intention in > any case. Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm surprised about this stall. I'm even more surprised that setting CONFIG_CPU_ISOLATION=y is enough to fix the issue because CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which is not good, but work in progress...). Did you have any nohz_full= or isolcpus= boot options? Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-02 13:59 ` Frederic Weisbecker @ 2017-12-02 19:24 ` Paul E. McKenney 2017-12-04 15:53 ` Frederic Weisbecker 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-12-02 19:24 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote: > 2017-11-30 21:20 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > Commit 5c4991e24c69 ("sched/isolation: Split out new > > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU > > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y > > and CONFIG_CPU_ISOLATION=n. These warnings are caused by RCU's > > grace-period kthreads sleeping for a few jiffies, but never being > > awakened: > > > > [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208 > > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 > > [ 116.355517] rcu_preempt I 7464 8 2 0x80000000 > > [ 116.356543] Call Trace: > > [ 116.357008] __schedule+0x493/0x620 > > [ 116.357682] schedule+0x24/0x40 > > [ 116.358291] schedule_timeout+0x330/0x3b0 > > [ 116.359024] ? preempt_count_sub+0xea/0x140 > > [ 116.359806] ? collect_expired_timers+0xb0/0xb0 > > [ 116.360660] rcu_gp_kthread+0x6bf/0xef0 > > > > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which > > prevents this behavior and seems like it was the original intention in > > any case. > > Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm > surprised about this stall. I'm even more surprised that setting > CONFIG_CPU_ISOLATION=y is enough to fix the issue because > CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which > is not good, but work in progress...). Yes, and after applying this patch, I get failures a few commits later, which appears to be due to other changes that break CONFIG_NO_HZ_FULL_ALL=y. So I have another patch staged that removes CONFIG_NO_HZ_FULL_ALL, on the grounds that no one else has complained, so rcutorture is likely to be the only user, and I don't see the point of having a Kconfig option for only one user. > Did you have any nohz_full= or isolcpus= boot options? Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 on an eight-CPU test. But it is relatively easy to test. Running the rcutorture TREE04 scenario on a four-socket x86 gets me RCU CPU stall warnings within a few minutes more than half the time. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-02 19:24 ` Paul E. McKenney @ 2017-12-04 15:53 ` Frederic Weisbecker 2017-12-04 17:16 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Frederic Weisbecker @ 2017-12-04 15:53 UTC (permalink / raw) To: paulmck Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz 2017-12-02 20:24 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote: >> 2017-11-30 21:20 UTC+01:00, Paul E. McKenney >> <paulmck@linux.vnet.ibm.com>: >> > Commit 5c4991e24c69 ("sched/isolation: Split out new >> > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in >> > RCU >> > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y >> > and CONFIG_CPU_ISOLATION=n. These warnings are caused by RCU's >> > grace-period kthreads sleeping for a few jiffies, but never being >> > awakened: >> > >> > [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! >> > g4294967208 >> > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 >> > [ 116.355517] rcu_preempt I 7464 8 2 0x80000000 >> > [ 116.356543] Call Trace: >> > [ 116.357008] __schedule+0x493/0x620 >> > [ 116.357682] schedule+0x24/0x40 >> > [ 116.358291] schedule_timeout+0x330/0x3b0 >> > [ 116.359024] ? preempt_count_sub+0xea/0x140 >> > [ 116.359806] ? collect_expired_timers+0xb0/0xb0 >> > [ 116.360660] rcu_gp_kthread+0x6bf/0xef0 >> > >> > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which >> > prevents this behavior and seems like it was the original intention in >> > any case. >> >> Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm >> surprised about this stall. I'm even more surprised that setting >> CONFIG_CPU_ISOLATION=y is enough to fix the issue because >> CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which >> is not good, but work in progress...). > > Yes, and after applying this patch, I get failures a few commits > later, which appears to be due to other changes that break > CONFIG_NO_HZ_FULL_ALL=y. So I have another patch staged that removes > CONFIG_NO_HZ_FULL_ALL, on the grounds that no one else has complained, > so rcutorture is likely to be the only user, and I don't see the point > of having a Kconfig option for only one user. I would prefer to keep it. It's useful for automated boot testing based on configs such as 0-day or -tip test machines. But I'm likely to migrate it to isolcpus implementation. Maybe something along the lines of CONFIG_CPU_ISOLATION_ALL. > >> Did you have any nohz_full= or isolcpus= boot options? > > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 > on an eight-CPU test. > > But it is relatively easy to test. Running the rcutorture TREE04 > scenario on a four-socket x86 gets me RCU CPU stall warnings within > a few minutes more than half the time. ;-) Indeed I managed to trigger something. If it's the same thing I should be able to track down the root cause. [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 ->state 0x1 cpu 2 [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000 [ 123.920673] Call Trace: [ 123.923096] ? __schedule+0x2bf/0xbb0 [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70 [ 123.931657] schedule+0x3c/0x90 [ 123.934777] schedule_timeout+0x1e1/0x560 [ 123.938785] ? __next_timer_interrupt+0xd0/0xd0 [ 123.943276] stutter_wait+0xc5/0xe0 [ 123.946738] rcu_torture_writer+0x1ae/0x730 [ 123.950912] ? rcu_torture_pipe_update+0xf0/0xf0 [ 123.955491] kthread+0x15f/0x1a0 [ 123.958702] ? kthread_unpark+0x60/0x60 [ 123.962523] ret_from_fork+0x24/0x30 [ 123.966091] rcu_preempt: wait state: 1 ->state: 0x402 [ 123.971112] rcu_sched: wait state: 1 ->state: 0x402 [ 123.975953] rcu_bh: wait state: 1 ->state: 0x402 Thanks! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-04 15:53 ` Frederic Weisbecker @ 2017-12-04 17:16 ` Paul E. McKenney 2017-12-07 16:14 ` Frederic Weisbecker 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-12-04 17:16 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote: > 2017-12-02 20:24 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote: > >> 2017-11-30 21:20 UTC+01:00, Paul E. McKenney > >> <paulmck@linux.vnet.ibm.com>: > >> > Commit 5c4991e24c69 ("sched/isolation: Split out new > >> > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in > >> > RCU > >> > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y > >> > and CONFIG_CPU_ISOLATION=n. These warnings are caused by RCU's > >> > grace-period kthreads sleeping for a few jiffies, but never being > >> > awakened: > >> > > >> > [ 116.353432] rcu_preempt kthread starved for 9974 jiffies! > >> > g4294967208 > >> > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 > >> > [ 116.355517] rcu_preempt I 7464 8 2 0x80000000 > >> > [ 116.356543] Call Trace: > >> > [ 116.357008] __schedule+0x493/0x620 > >> > [ 116.357682] schedule+0x24/0x40 > >> > [ 116.358291] schedule_timeout+0x330/0x3b0 > >> > [ 116.359024] ? preempt_count_sub+0xea/0x140 > >> > [ 116.359806] ? collect_expired_timers+0xb0/0xb0 > >> > [ 116.360660] rcu_gp_kthread+0x6bf/0xef0 > >> > > >> > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which > >> > prevents this behavior and seems like it was the original intention in > >> > any case. > >> > >> Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm > >> surprised about this stall. I'm even more surprised that setting > >> CONFIG_CPU_ISOLATION=y is enough to fix the issue because > >> CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which > >> is not good, but work in progress...). > > > > Yes, and after applying this patch, I get failures a few commits > > later, which appears to be due to other changes that break > > CONFIG_NO_HZ_FULL_ALL=y. So I have another patch staged that removes > > CONFIG_NO_HZ_FULL_ALL, on the grounds that no one else has complained, > > so rcutorture is likely to be the only user, and I don't see the point > > of having a Kconfig option for only one user. > > I would prefer to keep it. It's useful for automated boot testing > based on configs such as 0-day or -tip test machines. But I'm likely > to migrate it to isolcpus implementation. Maybe something along the > lines of CONFIG_CPU_ISOLATION_ALL. How about instead allowing something like "nohz_full=1-" specify that all CPUs other than CPU 0 should be nohz_full CPUs? That would shrink the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing easy automation of that particular scenario. (Right now, the boot code complains about "nohz_full=1-", which means that whatever is generating the boot parameters needs to know how many CPUs there really are, which as you say can be a pain.) > >> Did you have any nohz_full= or isolcpus= boot options? > > > > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that > > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 > > on an eight-CPU test. > > > > But it is relatively easy to test. Running the rcutorture TREE04 > > scenario on a four-socket x86 gets me RCU CPU stall warnings within > > a few minutes more than half the time. ;-) > > Indeed I managed to trigger something. If it's the same thing I should > be able to track down the root cause. > > [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 > ->state 0x1 cpu 2 > [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000 > [ 123.920673] Call Trace: > [ 123.923096] ? __schedule+0x2bf/0xbb0 > [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70 > [ 123.931657] schedule+0x3c/0x90 > [ 123.934777] schedule_timeout+0x1e1/0x560 It might well be the same thing, as this schedule_timeout() does look familiar. I have some diagnostic patches in -rcu, please see below for the overall effect. Thanx, Paul > [ 123.938785] ? __next_timer_interrupt+0xd0/0xd0 > [ 123.943276] stutter_wait+0xc5/0xe0 > [ 123.946738] rcu_torture_writer+0x1ae/0x730 > [ 123.950912] ? rcu_torture_pipe_update+0xf0/0xf0 > [ 123.955491] kthread+0x15f/0x1a0 > [ 123.958702] ? kthread_unpark+0x60/0x60 > [ 123.962523] ret_from_fork+0x24/0x30 > [ 123.966091] rcu_preempt: wait state: 1 ->state: 0x402 > [ 123.971112] rcu_sched: wait state: 1 ->state: 0x402 > [ 123.975953] rcu_bh: wait state: 1 ->state: 0x402 ------------------------------------------------------------------------ diff --git a/kernel/time/timer.c b/kernel/time/timer.c index ffebcf878fba..23af27461d8c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t) */ signed long __sched schedule_timeout(signed long timeout) { + struct timer_base *base; struct process_timer timer; unsigned long expire; + unsigned long flags; + unsigned long i; + unsigned int idx, idx_now; + unsigned long j; switch (timeout) { @@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long timeout) timer_setup_on_stack(&timer.timer, process_timeout, 0); __mod_timer(&timer.timer, expire, 0); schedule(); + j = jiffies; + if (timeout < 5 && time_after(j, expire + 8 * HZ) && timer_pending(&timer.timer)) { + base = lock_timer_base(&timer.timer, &flags); + idx = timer_get_idx(&timer.timer); + idx_now = calc_wheel_index(j, base->clk); + raw_spin_unlock_irqrestore(&base->lock, flags); + pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry, timer.timer.flags, timer.timer.expires, idx, idx_now); + for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++) + pr_cont("%016lx", base->pending_map[i]); + pr_cont("\n"); + } del_singleshot_timer_sync(&timer.timer); /* Remove the timer from the object tracker */ ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-04 17:16 ` Paul E. McKenney @ 2017-12-07 16:14 ` Frederic Weisbecker 2017-12-07 17:29 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Frederic Weisbecker @ 2017-12-07 16:14 UTC (permalink / raw) To: paulmck Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz 2017-12-04 18:16 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote: >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney >> I would prefer to keep it. It's useful for automated boot testing >> based on configs such as 0-day or -tip test machines. But I'm likely >> to migrate it to isolcpus implementation. Maybe something along the >> lines of CONFIG_CPU_ISOLATION_ALL. > > How about instead allowing something like "nohz_full=1-" specify that > all CPUs other than CPU 0 should be nohz_full CPUs? That would shrink > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing > easy automation of that particular scenario. > > (Right now, the boot code complains about "nohz_full=1-", which means > that whatever is generating the boot parameters needs to know how many > CPUs there really are, which as you say can be a pain.) Yes but automated boot testing is rather based on configs than boot options. It's much easier. I think that's how Wu Fengguang lab works, and -tip automated tests as well. > >> >> Did you have any nohz_full= or isolcpus= boot options? >> > >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 >> > on an eight-CPU test. >> > >> > But it is relatively easy to test. Running the rcutorture TREE04 >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within >> > a few minutes more than half the time. ;-) >> >> Indeed I managed to trigger something. If it's the same thing I should >> be able to track down the root cause. >> >> [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 >> ->state 0x1 cpu 2 >> [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000 >> [ 123.920673] Call Trace: >> [ 123.923096] ? __schedule+0x2bf/0xbb0 >> [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70 >> [ 123.931657] schedule+0x3c/0x90 >> [ 123.934777] schedule_timeout+0x1e1/0x560 > > It might well be the same thing, as this schedule_timeout() does look > familiar. I have some diagnostic patches in -rcu, please see below > for the overall effect. I fear I can hit that even without any nohz_full CPU as well. > > Thanx, Paul > >> [ 123.938785] ? __next_timer_interrupt+0xd0/0xd0 >> [ 123.943276] stutter_wait+0xc5/0xe0 >> [ 123.946738] rcu_torture_writer+0x1ae/0x730 >> [ 123.950912] ? rcu_torture_pipe_update+0xf0/0xf0 >> [ 123.955491] kthread+0x15f/0x1a0 >> [ 123.958702] ? kthread_unpark+0x60/0x60 >> [ 123.962523] ret_from_fork+0x24/0x30 >> [ 123.966091] rcu_preempt: wait state: 1 ->state: 0x402 >> [ 123.971112] rcu_sched: wait state: 1 ->state: 0x402 >> [ 123.975953] rcu_bh: wait state: 1 ->state: 0x402 > > ------------------------------------------------------------------------ > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > index ffebcf878fba..23af27461d8c 100644 > --- a/kernel/time/timer.c > +++ b/kernel/time/timer.c > @@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t) > */ > signed long __sched schedule_timeout(signed long timeout) > { > + struct timer_base *base; > struct process_timer timer; > unsigned long expire; > + unsigned long flags; > + unsigned long i; > + unsigned int idx, idx_now; > + unsigned long j; > > switch (timeout) > { > @@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long > timeout) > timer_setup_on_stack(&timer.timer, process_timeout, 0); > __mod_timer(&timer.timer, expire, 0); > schedule(); > + j = jiffies; > + if (timeout < 5 && time_after(j, expire + 8 * HZ) && > timer_pending(&timer.timer)) { > + base = lock_timer_base(&timer.timer, &flags); > + idx = timer_get_idx(&timer.timer); > + idx_now = calc_wheel_index(j, base->clk); > + raw_spin_unlock_irqrestore(&base->lock, flags); > + pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx > base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x > idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry, > timer.timer.flags, timer.timer.expires, idx, idx_now); > + for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++) > + pr_cont("%016lx", base->pending_map[i]); > + pr_cont("\n"); > + } > del_singleshot_timer_sync(&timer.timer); > > /* Remove the timer from the object tracker */ > > Hmm, that message doesn't seem to trigger :-s Thanks. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-07 16:14 ` Frederic Weisbecker @ 2017-12-07 17:29 ` Paul E. McKenney 2017-12-09 13:09 ` Frederic Weisbecker 0 siblings, 1 reply; 9+ messages in thread From: Paul E. McKenney @ 2017-12-07 17:29 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote: > 2017-12-04 18:16 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote: > >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney > >> I would prefer to keep it. It's useful for automated boot testing > >> based on configs such as 0-day or -tip test machines. But I'm likely > >> to migrate it to isolcpus implementation. Maybe something along the > >> lines of CONFIG_CPU_ISOLATION_ALL. > > > > How about instead allowing something like "nohz_full=1-" specify that > > all CPUs other than CPU 0 should be nohz_full CPUs? That would shrink > > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing > > easy automation of that particular scenario. > > > > (Right now, the boot code complains about "nohz_full=1-", which means > > that whatever is generating the boot parameters needs to know how many > > CPUs there really are, which as you say can be a pain.) > > Yes but automated boot testing is rather based on configs than boot > options. It's much easier. I think that's how Wu Fengguang lab works, > and -tip automated tests as well. So you have gotten bug reports from them? Because I see splats from rcutorture testing rather frequently. This thing is in no way a subtle low-probability bug. ;-) > >> >> Did you have any nohz_full= or isolcpus= boot options? > >> > > >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that > >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 > >> > on an eight-CPU test. > >> > > >> > But it is relatively easy to test. Running the rcutorture TREE04 > >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within > >> > a few minutes more than half the time. ;-) > >> > >> Indeed I managed to trigger something. If it's the same thing I should > >> be able to track down the root cause. > >> > >> [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 > >> ->state 0x1 cpu 2 > >> [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000 > >> [ 123.920673] Call Trace: > >> [ 123.923096] ? __schedule+0x2bf/0xbb0 > >> [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70 > >> [ 123.931657] schedule+0x3c/0x90 > >> [ 123.934777] schedule_timeout+0x1e1/0x560 > > > > It might well be the same thing, as this schedule_timeout() does look > > familiar. I have some diagnostic patches in -rcu, please see below > > for the overall effect. > > I fear I can hit that even without any nohz_full CPU as well. Indeed, I do hit that with my TREE01 scenario, which does not set CONFIG_NO_HZ_FULL. But it is much less frequent. The good news is that I have finally figured out a way to extract information from this thing without suppressing it. At the moment it appears to be a rather strange deadlock involving CPU hotplug, timers, and RCU. But that is a completely different bug from the ones for which I have the two patches in my tree. Anyway, I will keep those two patches because I cannot have the corresponding bugs possibly hiding RCU bugs in my testing. If you put some other fix in place, I will drop those two patches in favor of your fix. Thanx, Paul > >> [ 123.938785] ? __next_timer_interrupt+0xd0/0xd0 > >> [ 123.943276] stutter_wait+0xc5/0xe0 > >> [ 123.946738] rcu_torture_writer+0x1ae/0x730 > >> [ 123.950912] ? rcu_torture_pipe_update+0xf0/0xf0 > >> [ 123.955491] kthread+0x15f/0x1a0 > >> [ 123.958702] ? kthread_unpark+0x60/0x60 > >> [ 123.962523] ret_from_fork+0x24/0x30 > >> [ 123.966091] rcu_preempt: wait state: 1 ->state: 0x402 > >> [ 123.971112] rcu_sched: wait state: 1 ->state: 0x402 > >> [ 123.975953] rcu_bh: wait state: 1 ->state: 0x402 > > > > ------------------------------------------------------------------------ > > > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c > > index ffebcf878fba..23af27461d8c 100644 > > --- a/kernel/time/timer.c > > +++ b/kernel/time/timer.c > > @@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t) > > */ > > signed long __sched schedule_timeout(signed long timeout) > > { > > + struct timer_base *base; > > struct process_timer timer; > > unsigned long expire; > > + unsigned long flags; > > + unsigned long i; > > + unsigned int idx, idx_now; > > + unsigned long j; > > > > switch (timeout) > > { > > @@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long > > timeout) > > timer_setup_on_stack(&timer.timer, process_timeout, 0); > > __mod_timer(&timer.timer, expire, 0); > > schedule(); > > + j = jiffies; > > + if (timeout < 5 && time_after(j, expire + 8 * HZ) && > > timer_pending(&timer.timer)) { > > + base = lock_timer_base(&timer.timer, &flags); > > + idx = timer_get_idx(&timer.timer); > > + idx_now = calc_wheel_index(j, base->clk); > > + raw_spin_unlock_irqrestore(&base->lock, flags); > > + pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx > > base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x > > idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry, > > timer.timer.flags, timer.timer.expires, idx, idx_now); > > + for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++) > > + pr_cont("%016lx", base->pending_map[i]); > > + pr_cont("\n"); > > + } > > del_singleshot_timer_sync(&timer.timer); > > > > /* Remove the timer from the object tracker */ > > > > > > Hmm, that message doesn't seem to trigger :-s > > Thanks. > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-07 17:29 ` Paul E. McKenney @ 2017-12-09 13:09 ` Frederic Weisbecker 2017-12-09 18:09 ` Paul E. McKenney 0 siblings, 1 reply; 9+ messages in thread From: Frederic Weisbecker @ 2017-12-09 13:09 UTC (permalink / raw) To: paulmck Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz 2017-12-07 18:29 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote: >> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney >> <paulmck@linux.vnet.ibm.com>: >> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote: >> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney >> >> I would prefer to keep it. It's useful for automated boot testing >> >> based on configs such as 0-day or -tip test machines. But I'm likely >> >> to migrate it to isolcpus implementation. Maybe something along the >> >> lines of CONFIG_CPU_ISOLATION_ALL. >> > >> > How about instead allowing something like "nohz_full=1-" specify that >> > all CPUs other than CPU 0 should be nohz_full CPUs? That would shrink >> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing >> > easy automation of that particular scenario. >> > >> > (Right now, the boot code complains about "nohz_full=1-", which means >> > that whatever is generating the boot parameters needs to know how many >> > CPUs there really are, which as you say can be a pain.) >> >> Yes but automated boot testing is rather based on configs than boot >> options. It's much easier. I think that's how Wu Fengguang lab works, >> and -tip automated tests as well. > > So you have gotten bug reports from them? Because I see splats from > rcutorture testing rather frequently. This thing is in no way a subtle > low-probability bug. ;-) Nope I haven't got anything from them. So far you're the only reproducer I know :) >> >> >> Did you have any nohz_full= or isolcpus= boot options? >> >> > >> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that >> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 >> >> > on an eight-CPU test. >> >> > >> >> > But it is relatively easy to test. Running the rcutorture TREE04 >> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within >> >> > a few minutes more than half the time. ;-) >> >> >> >> Indeed I managed to trigger something. If it's the same thing I should >> >> be able to track down the root cause. >> >> >> >> [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 >> >> ->state 0x1 cpu 2 >> >> [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000 >> >> [ 123.920673] Call Trace: >> >> [ 123.923096] ? __schedule+0x2bf/0xbb0 >> >> [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70 >> >> [ 123.931657] schedule+0x3c/0x90 >> >> [ 123.934777] schedule_timeout+0x1e1/0x560 >> > >> > It might well be the same thing, as this schedule_timeout() does look >> > familiar. I have some diagnostic patches in -rcu, please see below >> > for the overall effect. >> >> I fear I can hit that even without any nohz_full CPU as well. > > Indeed, I do hit that with my TREE01 scenario, which does not set > CONFIG_NO_HZ_FULL. But it is much less frequent. The good news is that > I have finally figured out a way to extract information from this thing > without suppressing it. At the moment it appears to be a rather strange > deadlock involving CPU hotplug, timers, and RCU. > > But that is a completely different bug from the ones for which I have > the two patches in my tree. > > Anyway, I will keep those two patches because I cannot have the > corresponding bugs possibly hiding RCU bugs in my testing. If you > put some other fix in place, I will drop those two patches in favor of > your fix. Ok. I'm a bit troubled by this bug, I hate to push a fix for a bug I don't understand nor can reproduce. But having CONFIG_NO_HZ_FULL select CONFIG_CPU_ISOLATION is already a fix for sanity that I need to push. So I think I'm going to take your patch anyway and rewrite the changelog to take all that into account. Thanks Paul! ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION 2017-12-09 13:09 ` Frederic Weisbecker @ 2017-12-09 18:09 ` Paul E. McKenney 0 siblings, 0 replies; 9+ messages in thread From: Paul E. McKenney @ 2017-12-09 18:09 UTC (permalink / raw) To: Frederic Weisbecker Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds, lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz On Sat, Dec 09, 2017 at 02:09:07PM +0100, Frederic Weisbecker wrote: > 2017-12-07 18:29 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>: > > On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote: > >> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney > >> <paulmck@linux.vnet.ibm.com>: > >> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote: > >> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney > >> >> I would prefer to keep it. It's useful for automated boot testing > >> >> based on configs such as 0-day or -tip test machines. But I'm likely > >> >> to migrate it to isolcpus implementation. Maybe something along the > >> >> lines of CONFIG_CPU_ISOLATION_ALL. > >> > > >> > How about instead allowing something like "nohz_full=1-" specify that > >> > all CPUs other than CPU 0 should be nohz_full CPUs? That would shrink > >> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing > >> > easy automation of that particular scenario. > >> > > >> > (Right now, the boot code complains about "nohz_full=1-", which means > >> > that whatever is generating the boot parameters needs to know how many > >> > CPUs there really are, which as you say can be a pain.) > >> > >> Yes but automated boot testing is rather based on configs than boot > >> options. It's much easier. I think that's how Wu Fengguang lab works, > >> and -tip automated tests as well. > > > > So you have gotten bug reports from them? Because I see splats from > > rcutorture testing rather frequently. This thing is in no way a subtle > > low-probability bug. ;-) > > Nope I haven't got anything from them. So far you're the only > reproducer I know :) > > >> >> >> Did you have any nohz_full= or isolcpus= boot options? > >> >> > > >> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that > >> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7 > >> >> > on an eight-CPU test. > >> >> > > >> >> > But it is relatively easy to test. Running the rcutorture TREE04 > >> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within > >> >> > a few minutes more than half the time. ;-) > >> >> > >> >> Indeed I managed to trigger something. If it's the same thing I should > >> >> be able to track down the root cause. > >> >> > >> >> [ 123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0 > >> >> ->state 0x1 cpu 2 > >> >> [ 123.915184] rcu_torture_wri S 0 111 2 0x80080000 > >> >> [ 123.920673] Call Trace: > >> >> [ 123.923096] ? __schedule+0x2bf/0xbb0 > >> >> [ 123.926715] ? _raw_spin_unlock_irqrestore+0x59/0x70 > >> >> [ 123.931657] schedule+0x3c/0x90 > >> >> [ 123.934777] schedule_timeout+0x1e1/0x560 > >> > > >> > It might well be the same thing, as this schedule_timeout() does look > >> > familiar. I have some diagnostic patches in -rcu, please see below > >> > for the overall effect. > >> > >> I fear I can hit that even without any nohz_full CPU as well. > > > > Indeed, I do hit that with my TREE01 scenario, which does not set > > CONFIG_NO_HZ_FULL. But it is much less frequent. The good news is that > > I have finally figured out a way to extract information from this thing > > without suppressing it. At the moment it appears to be a rather strange > > deadlock involving CPU hotplug, timers, and RCU. > > > > But that is a completely different bug from the ones for which I have > > the two patches in my tree. > > > > Anyway, I will keep those two patches because I cannot have the > > corresponding bugs possibly hiding RCU bugs in my testing. If you > > put some other fix in place, I will drop those two patches in favor of > > your fix. > > Ok. I'm a bit troubled by this bug, I hate to push a fix for a bug I > don't understand nor can reproduce. I would be happy to talk you through running the TREE04 rcutorture scenario, if you would like. I just verified that I can reproduce this on a single-socket four-core (8 hardware threads) x86 system running v4.15-rc1, so I would guess that you have access to hardware that can reproduce it. In the meantime, please feel free to take a look at the file tools/testing/selftests/rcutorture/doc/initrd.txt in the kernel source. This file tells how to create the initrd that rcutorture uses to avoid the need to maintain root partitions. Once you have that in place in tools/testing/selftests/rcutorture/initrd (as a expanded file tree, not any sort of archive) the reproducer is as follows, run from the top level of the kernel source tree: bash tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs TREE04 This will output a bunch of rcutorture status/progress text, ending in something like the following: --- Sat Dec 9 09:49:26 PST 2017 Test summary: Results directory: /home/git/linux-2.6/tools/testing/selftests/rcutorture/res/2017.12.09-09:49:26 tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 7 --duration 5 --configs TREE04 TREE04 ------- 350 grace periods (1.16667 per second) WARNING: Assertion failure in /home/git/linux-2.6/tools/testing/selftests/rcutorture/res/2017.12.09-09:49:26/TREE04/console.log WARNING: Summary: Call Traces: 7 Stalls: 1 Starves: 1 The big long pathname ending in "console.log" contains the console output. A successful run would end without the WARNING lines. > But having CONFIG_NO_HZ_FULL > select CONFIG_CPU_ISOLATION is already a fix for sanity that I need to > push. So I think I'm going to take your patch anyway and rewrite the > changelog to take all that into account. > > Thanks Paul! Works for me! I don't know of anyone else encountering this, so I don't see it as an emergency. Left to myself, I would therefore push the fixes into the v4.17 merge window (that is, the one after the next one). But please let me know when you have pushed fixes, and I will adjust my tree accordingly. I am of course happy to test your fixes. Thanx, Paul ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-12-09 18:10 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-11-30 20:20 [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION Paul E. McKenney 2017-12-02 13:59 ` Frederic Weisbecker 2017-12-02 19:24 ` Paul E. McKenney 2017-12-04 15:53 ` Frederic Weisbecker 2017-12-04 17:16 ` Paul E. McKenney 2017-12-07 16:14 ` Frederic Weisbecker 2017-12-07 17:29 ` Paul E. McKenney 2017-12-09 13:09 ` Frederic Weisbecker 2017-12-09 18:09 ` Paul E. McKenney
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).