linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
@ 2017-11-30 20:20 Paul E. McKenney
  2017-12-02 13:59 ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-11-30 20:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: xiaolong.ye, frederic, tglx, cmetcalf, cl, torvalds, lcapitulino,
	efault, peterz, riel, kernellwp, mingo, john.stultz

Commit 5c4991e24c69 ("sched/isolation: Split out new
CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU
CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y
and CONFIG_CPU_ISOLATION=n.  These warnings are caused by RCU's
grace-period kthreads sleeping for a few jiffies, but never being
awakened:

[  116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208
+c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[  116.355517] rcu_preempt     I 7464     8      2 0x80000000
[  116.356543] Call Trace:
[  116.357008]  __schedule+0x493/0x620
[  116.357682]  schedule+0x24/0x40
[  116.358291]  schedule_timeout+0x330/0x3b0
[  116.359024]  ? preempt_count_sub+0xea/0x140
[  116.359806]  ? collect_expired_timers+0xb0/0xb0
[  116.360660]  rcu_gp_kthread+0x6bf/0xef0

This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
prevents this behavior and seems like it was the original intention in
any case.

Fixes: 5c4991e24c69 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL")

Link: https://lkml.kernel.org/r/20171126084203.GE21779@yexl-desktop
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>

diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index e776fc8cc1df..f6b5f19223d6 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -95,6 +95,7 @@ config NO_HZ_FULL
 	select RCU_NOCB_CPU
 	select VIRT_CPU_ACCOUNTING_GEN
 	select IRQ_WORK
+	select CPU_ISOLATION
 	help
 	 Adaptively try to shutdown the tick whenever possible, even when
 	 the CPU is running tasks. Typically this requires running a single

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-11-30 20:20 [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION Paul E. McKenney
@ 2017-12-02 13:59 ` Frederic Weisbecker
  2017-12-02 19:24   ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2017-12-02 13:59 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

2017-11-30 21:20 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> Commit 5c4991e24c69 ("sched/isolation: Split out new
> CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU
> CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y
> and CONFIG_CPU_ISOLATION=n.  These warnings are caused by RCU's
> grace-period kthreads sleeping for a few jiffies, but never being
> awakened:
>
> [  116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208
> +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> [  116.355517] rcu_preempt     I 7464     8      2 0x80000000
> [  116.356543] Call Trace:
> [  116.357008]  __schedule+0x493/0x620
> [  116.357682]  schedule+0x24/0x40
> [  116.358291]  schedule_timeout+0x330/0x3b0
> [  116.359024]  ? preempt_count_sub+0xea/0x140
> [  116.359806]  ? collect_expired_timers+0xb0/0xb0
> [  116.360660]  rcu_gp_kthread+0x6bf/0xef0
>
> This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
> prevents this behavior and seems like it was the original intention in
> any case.

Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm
surprised about this stall. I'm even  more surprised that setting
CONFIG_CPU_ISOLATION=y is enough to fix the issue because
CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which
is not good, but work in progress...).

Did you have any nohz_full= or isolcpus= boot options?

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-02 13:59 ` Frederic Weisbecker
@ 2017-12-02 19:24   ` Paul E. McKenney
  2017-12-04 15:53     ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-12-02 19:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote:
> 2017-11-30 21:20 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > Commit 5c4991e24c69 ("sched/isolation: Split out new
> > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in RCU
> > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y
> > and CONFIG_CPU_ISOLATION=n.  These warnings are caused by RCU's
> > grace-period kthreads sleeping for a few jiffies, but never being
> > awakened:
> >
> > [  116.353432] rcu_preempt kthread starved for 9974 jiffies! g4294967208
> > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> > [  116.355517] rcu_preempt     I 7464     8      2 0x80000000
> > [  116.356543] Call Trace:
> > [  116.357008]  __schedule+0x493/0x620
> > [  116.357682]  schedule+0x24/0x40
> > [  116.358291]  schedule_timeout+0x330/0x3b0
> > [  116.359024]  ? preempt_count_sub+0xea/0x140
> > [  116.359806]  ? collect_expired_timers+0xb0/0xb0
> > [  116.360660]  rcu_gp_kthread+0x6bf/0xef0
> >
> > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
> > prevents this behavior and seems like it was the original intention in
> > any case.
> 
> Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm
> surprised about this stall. I'm even  more surprised that setting
> CONFIG_CPU_ISOLATION=y is enough to fix the issue because
> CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which
> is not good, but work in progress...).

Yes, and after applying this patch, I get failures a few commits
later, which appears to be due to other changes that break
CONFIG_NO_HZ_FULL_ALL=y.  So I have another patch staged that removes
CONFIG_NO_HZ_FULL_ALL, on the grounds that no one else has complained,
so rcutorture is likely to be the only user, and I don't see the point
of having a Kconfig option for only one user.

> Did you have any nohz_full= or isolcpus= boot options?

Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
on an eight-CPU test.

But it is relatively easy to test.  Running the rcutorture TREE04
scenario on a four-socket x86 gets me RCU CPU stall warnings within
a few minutes more than half the time.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-02 19:24   ` Paul E. McKenney
@ 2017-12-04 15:53     ` Frederic Weisbecker
  2017-12-04 17:16       ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2017-12-04 15:53 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

2017-12-02 20:24 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote:
>> 2017-11-30 21:20 UTC+01:00, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com>:
>> > Commit 5c4991e24c69 ("sched/isolation: Split out new
>> > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in
>> > RCU
>> > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y
>> > and CONFIG_CPU_ISOLATION=n.  These warnings are caused by RCU's
>> > grace-period kthreads sleeping for a few jiffies, but never being
>> > awakened:
>> >
>> > [  116.353432] rcu_preempt kthread starved for 9974 jiffies!
>> > g4294967208
>> > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
>> > [  116.355517] rcu_preempt     I 7464     8      2 0x80000000
>> > [  116.356543] Call Trace:
>> > [  116.357008]  __schedule+0x493/0x620
>> > [  116.357682]  schedule+0x24/0x40
>> > [  116.358291]  schedule_timeout+0x330/0x3b0
>> > [  116.359024]  ? preempt_count_sub+0xea/0x140
>> > [  116.359806]  ? collect_expired_timers+0xb0/0xb0
>> > [  116.360660]  rcu_gp_kthread+0x6bf/0xef0
>> >
>> > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
>> > prevents this behavior and seems like it was the original intention in
>> > any case.
>>
>> Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm
>> surprised about this stall. I'm even  more surprised that setting
>> CONFIG_CPU_ISOLATION=y is enough to fix the issue because
>> CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which
>> is not good, but work in progress...).
>
> Yes, and after applying this patch, I get failures a few commits
> later, which appears to be due to other changes that break
> CONFIG_NO_HZ_FULL_ALL=y.  So I have another patch staged that removes
> CONFIG_NO_HZ_FULL_ALL, on the grounds that no one else has complained,
> so rcutorture is likely to be the only user, and I don't see the point
> of having a Kconfig option for only one user.

I would prefer to keep it. It's useful for automated boot testing
based on configs such as 0-day or -tip test machines. But I'm likely
to migrate it to isolcpus implementation. Maybe something along the
lines of CONFIG_CPU_ISOLATION_ALL.

>
>> Did you have any nohz_full= or isolcpus= boot options?
>
> Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
> is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
> on an eight-CPU test.
>
> But it is relatively easy to test.  Running the rcutorture TREE04
> scenario on a four-socket x86 gets me RCU CPU stall warnings within
> a few minutes more than half the time.  ;-)

Indeed I managed to trigger something. If it's the same thing I should
be able to track down the root cause.

[  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
->state 0x1 cpu 2
[  123.915184] rcu_torture_wri S    0   111      2 0x80080000
[  123.920673] Call Trace:
[  123.923096]  ? __schedule+0x2bf/0xbb0
[  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
[  123.931657]  schedule+0x3c/0x90
[  123.934777]  schedule_timeout+0x1e1/0x560
[  123.938785]  ? __next_timer_interrupt+0xd0/0xd0
[  123.943276]  stutter_wait+0xc5/0xe0
[  123.946738]  rcu_torture_writer+0x1ae/0x730
[  123.950912]  ? rcu_torture_pipe_update+0xf0/0xf0
[  123.955491]  kthread+0x15f/0x1a0
[  123.958702]  ? kthread_unpark+0x60/0x60
[  123.962523]  ret_from_fork+0x24/0x30
[  123.966091] rcu_preempt: wait state: 1 ->state: 0x402
[  123.971112] rcu_sched: wait state: 1 ->state: 0x402
[  123.975953] rcu_bh: wait state: 1 ->state: 0x402

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-04 15:53     ` Frederic Weisbecker
@ 2017-12-04 17:16       ` Paul E. McKenney
  2017-12-07 16:14         ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-12-04 17:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Sat, Dec 02, 2017 at 02:59:12PM +0100, Frederic Weisbecker wrote:
> >> 2017-11-30 21:20 UTC+01:00, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com>:
> >> > Commit 5c4991e24c69 ("sched/isolation: Split out new
> >> > CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL") can result in
> >> > RCU
> >> > CPU stall warnings when running rcutorture with CONFIG_NO_HZ_FULL_ALL=y
> >> > and CONFIG_CPU_ISOLATION=n.  These warnings are caused by RCU's
> >> > grace-period kthreads sleeping for a few jiffies, but never being
> >> > awakened:
> >> >
> >> > [  116.353432] rcu_preempt kthread starved for 9974 jiffies!
> >> > g4294967208
> >> > +c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
> >> > [  116.355517] rcu_preempt     I 7464     8      2 0x80000000
> >> > [  116.356543] Call Trace:
> >> > [  116.357008]  __schedule+0x493/0x620
> >> > [  116.357682]  schedule+0x24/0x40
> >> > [  116.358291]  schedule_timeout+0x330/0x3b0
> >> > [  116.359024]  ? preempt_count_sub+0xea/0x140
> >> > [  116.359806]  ? collect_expired_timers+0xb0/0xb0
> >> > [  116.360660]  rcu_gp_kthread+0x6bf/0xef0
> >> >
> >> > This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
> >> > prevents this behavior and seems like it was the original intention in
> >> > any case.
> >>
> >> Although CONFIG_NO_HZ should indeed select CONFIG_CPU_ISOLATION, I'm
> >> surprised about this stall. I'm even  more surprised that setting
> >> CONFIG_CPU_ISOLATION=y is enough to fix the issue because
> >> CONFIG_NO_HZ_FULL_ALL shortcuts CONFIG_CPU_ISOLATION entirely (which
> >> is not good, but work in progress...).
> >
> > Yes, and after applying this patch, I get failures a few commits
> > later, which appears to be due to other changes that break
> > CONFIG_NO_HZ_FULL_ALL=y.  So I have another patch staged that removes
> > CONFIG_NO_HZ_FULL_ALL, on the grounds that no one else has complained,
> > so rcutorture is likely to be the only user, and I don't see the point
> > of having a Kconfig option for only one user.
> 
> I would prefer to keep it. It's useful for automated boot testing
> based on configs such as 0-day or -tip test machines. But I'm likely
> to migrate it to isolcpus implementation. Maybe something along the
> lines of CONFIG_CPU_ISOLATION_ALL.

How about instead allowing something like "nohz_full=1-" specify that
all CPUs other than CPU 0 should be nohz_full CPUs?  That would shrink
the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
easy automation of that particular scenario.

(Right now, the boot code complains about "nohz_full=1-", which means
that whatever is generating the boot parameters needs to know how many
CPUs there really are, which as you say can be a pain.)

> >> Did you have any nohz_full= or isolcpus= boot options?
> >
> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
> > on an eight-CPU test.
> >
> > But it is relatively easy to test.  Running the rcutorture TREE04
> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
> > a few minutes more than half the time.  ;-)
> 
> Indeed I managed to trigger something. If it's the same thing I should
> be able to track down the root cause.
> 
> [  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
> ->state 0x1 cpu 2
> [  123.915184] rcu_torture_wri S    0   111      2 0x80080000
> [  123.920673] Call Trace:
> [  123.923096]  ? __schedule+0x2bf/0xbb0
> [  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
> [  123.931657]  schedule+0x3c/0x90
> [  123.934777]  schedule_timeout+0x1e1/0x560

It might well be the same thing, as this schedule_timeout() does look
familiar.  I have some diagnostic patches in -rcu, please see below
for the overall effect.

							Thanx, Paul

> [  123.938785]  ? __next_timer_interrupt+0xd0/0xd0
> [  123.943276]  stutter_wait+0xc5/0xe0
> [  123.946738]  rcu_torture_writer+0x1ae/0x730
> [  123.950912]  ? rcu_torture_pipe_update+0xf0/0xf0
> [  123.955491]  kthread+0x15f/0x1a0
> [  123.958702]  ? kthread_unpark+0x60/0x60
> [  123.962523]  ret_from_fork+0x24/0x30
> [  123.966091] rcu_preempt: wait state: 1 ->state: 0x402
> [  123.971112] rcu_sched: wait state: 1 ->state: 0x402
> [  123.975953] rcu_bh: wait state: 1 ->state: 0x402

------------------------------------------------------------------------

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf878fba..23af27461d8c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t)
  */
 signed long __sched schedule_timeout(signed long timeout)
 {
+	struct timer_base *base;
 	struct process_timer timer;
 	unsigned long expire;
+	unsigned long flags;
+	unsigned long i;
+	unsigned int idx, idx_now;
+	unsigned long j;
 
 	switch (timeout)
 	{
@@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long timeout)
 	timer_setup_on_stack(&timer.timer, process_timeout, 0);
 	__mod_timer(&timer.timer, expire, 0);
 	schedule();
+	j = jiffies;
+	if (timeout < 5 && time_after(j, expire + 8 * HZ) && timer_pending(&timer.timer)) {
+		base = lock_timer_base(&timer.timer, &flags);
+		idx = timer_get_idx(&timer.timer);
+		idx_now = calc_wheel_index(j, base->clk);
+		raw_spin_unlock_irqrestore(&base->lock, flags);
+		pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry, timer.timer.flags, timer.timer.expires, idx, idx_now);
+		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
+			pr_cont("%016lx", base->pending_map[i]);
+		pr_cont("\n");
+	}
 	del_singleshot_timer_sync(&timer.timer);
 
 	/* Remove the timer from the object tracker */

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-04 17:16       ` Paul E. McKenney
@ 2017-12-07 16:14         ` Frederic Weisbecker
  2017-12-07 17:29           ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2017-12-07 16:14 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

2017-12-04 18:16 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
>> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney
>> I would prefer to keep it. It's useful for automated boot testing
>> based on configs such as 0-day or -tip test machines. But I'm likely
>> to migrate it to isolcpus implementation. Maybe something along the
>> lines of CONFIG_CPU_ISOLATION_ALL.
>
> How about instead allowing something like "nohz_full=1-" specify that
> all CPUs other than CPU 0 should be nohz_full CPUs?  That would shrink
> the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
> easy automation of that particular scenario.
>
> (Right now, the boot code complains about "nohz_full=1-", which means
> that whatever is generating the boot parameters needs to know how many
> CPUs there really are, which as you say can be a pain.)

Yes but automated boot testing is rather based on configs than boot
options. It's much easier. I think that's how Wu Fengguang lab works,
and -tip automated tests as well.

>
>> >> Did you have any nohz_full= or isolcpus= boot options?
>> >
>> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
>> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
>> > on an eight-CPU test.
>> >
>> > But it is relatively easy to test.  Running the rcutorture TREE04
>> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
>> > a few minutes more than half the time.  ;-)
>>
>> Indeed I managed to trigger something. If it's the same thing I should
>> be able to track down the root cause.
>>
>> [  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
>> ->state 0x1 cpu 2
>> [  123.915184] rcu_torture_wri S    0   111      2 0x80080000
>> [  123.920673] Call Trace:
>> [  123.923096]  ? __schedule+0x2bf/0xbb0
>> [  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
>> [  123.931657]  schedule+0x3c/0x90
>> [  123.934777]  schedule_timeout+0x1e1/0x560
>
> It might well be the same thing, as this schedule_timeout() does look
> familiar.  I have some diagnostic patches in -rcu, please see below
> for the overall effect.

I fear I can hit that even without any nohz_full CPU as well.

>
> 							Thanx, Paul
>
>> [  123.938785]  ? __next_timer_interrupt+0xd0/0xd0
>> [  123.943276]  stutter_wait+0xc5/0xe0
>> [  123.946738]  rcu_torture_writer+0x1ae/0x730
>> [  123.950912]  ? rcu_torture_pipe_update+0xf0/0xf0
>> [  123.955491]  kthread+0x15f/0x1a0
>> [  123.958702]  ? kthread_unpark+0x60/0x60
>> [  123.962523]  ret_from_fork+0x24/0x30
>> [  123.966091] rcu_preempt: wait state: 1 ->state: 0x402
>> [  123.971112] rcu_sched: wait state: 1 ->state: 0x402
>> [  123.975953] rcu_bh: wait state: 1 ->state: 0x402
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ffebcf878fba..23af27461d8c 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t)
>   */
>  signed long __sched schedule_timeout(signed long timeout)
>  {
> +	struct timer_base *base;
>  	struct process_timer timer;
>  	unsigned long expire;
> +	unsigned long flags;
> +	unsigned long i;
> +	unsigned int idx, idx_now;
> +	unsigned long j;
>
>  	switch (timeout)
>  	{
> @@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long
> timeout)
>  	timer_setup_on_stack(&timer.timer, process_timeout, 0);
>  	__mod_timer(&timer.timer, expire, 0);
>  	schedule();
> +	j = jiffies;
> +	if (timeout < 5 && time_after(j, expire + 8 * HZ) &&
> timer_pending(&timer.timer)) {
> +		base = lock_timer_base(&timer.timer, &flags);
> +		idx = timer_get_idx(&timer.timer);
> +		idx_now = calc_wheel_index(j, base->clk);
> +		raw_spin_unlock_irqrestore(&base->lock, flags);
> +		pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx
> base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x
> idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry,
> timer.timer.flags, timer.timer.expires, idx, idx_now);
> +		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
> +			pr_cont("%016lx", base->pending_map[i]);
> +		pr_cont("\n");
> +	}
>  	del_singleshot_timer_sync(&timer.timer);
>
>  	/* Remove the timer from the object tracker */
>
>

Hmm, that message doesn't seem to trigger :-s

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-07 16:14         ` Frederic Weisbecker
@ 2017-12-07 17:29           ` Paul E. McKenney
  2017-12-09 13:09             ` Frederic Weisbecker
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2017-12-07 17:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote:
> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney
> >> I would prefer to keep it. It's useful for automated boot testing
> >> based on configs such as 0-day or -tip test machines. But I'm likely
> >> to migrate it to isolcpus implementation. Maybe something along the
> >> lines of CONFIG_CPU_ISOLATION_ALL.
> >
> > How about instead allowing something like "nohz_full=1-" specify that
> > all CPUs other than CPU 0 should be nohz_full CPUs?  That would shrink
> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
> > easy automation of that particular scenario.
> >
> > (Right now, the boot code complains about "nohz_full=1-", which means
> > that whatever is generating the boot parameters needs to know how many
> > CPUs there really are, which as you say can be a pain.)
> 
> Yes but automated boot testing is rather based on configs than boot
> options. It's much easier. I think that's how Wu Fengguang lab works,
> and -tip automated tests as well.

So you have gotten bug reports from them?  Because I see splats from
rcutorture testing rather frequently.  This thing is in no way a subtle
low-probability bug.  ;-)

> >> >> Did you have any nohz_full= or isolcpus= boot options?
> >> >
> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
> >> > on an eight-CPU test.
> >> >
> >> > But it is relatively easy to test.  Running the rcutorture TREE04
> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
> >> > a few minutes more than half the time.  ;-)
> >>
> >> Indeed I managed to trigger something. If it's the same thing I should
> >> be able to track down the root cause.
> >>
> >> [  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
> >> ->state 0x1 cpu 2
> >> [  123.915184] rcu_torture_wri S    0   111      2 0x80080000
> >> [  123.920673] Call Trace:
> >> [  123.923096]  ? __schedule+0x2bf/0xbb0
> >> [  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
> >> [  123.931657]  schedule+0x3c/0x90
> >> [  123.934777]  schedule_timeout+0x1e1/0x560
> >
> > It might well be the same thing, as this schedule_timeout() does look
> > familiar.  I have some diagnostic patches in -rcu, please see below
> > for the overall effect.
> 
> I fear I can hit that even without any nohz_full CPU as well.

Indeed, I do hit that with my TREE01 scenario, which does not set
CONFIG_NO_HZ_FULL.  But it is much less frequent.  The good news is that
I have finally figured out a way to extract information from this thing
without suppressing it.  At the moment it appears to be a rather strange
deadlock involving CPU hotplug, timers, and RCU.

But that is a completely different bug from the ones for which I have
the two patches in my tree.

Anyway, I will keep those two patches because I cannot have the
corresponding bugs possibly hiding RCU bugs in my testing.  If you
put some other fix in place, I will drop those two patches in favor of
your fix.

							Thanx, Paul

> >> [  123.938785]  ? __next_timer_interrupt+0xd0/0xd0
> >> [  123.943276]  stutter_wait+0xc5/0xe0
> >> [  123.946738]  rcu_torture_writer+0x1ae/0x730
> >> [  123.950912]  ? rcu_torture_pipe_update+0xf0/0xf0
> >> [  123.955491]  kthread+0x15f/0x1a0
> >> [  123.958702]  ? kthread_unpark+0x60/0x60
> >> [  123.962523]  ret_from_fork+0x24/0x30
> >> [  123.966091] rcu_preempt: wait state: 1 ->state: 0x402
> >> [  123.971112] rcu_sched: wait state: 1 ->state: 0x402
> >> [  123.975953] rcu_bh: wait state: 1 ->state: 0x402
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index ffebcf878fba..23af27461d8c 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t)
> >   */
> >  signed long __sched schedule_timeout(signed long timeout)
> >  {
> > +	struct timer_base *base;
> >  	struct process_timer timer;
> >  	unsigned long expire;
> > +	unsigned long flags;
> > +	unsigned long i;
> > +	unsigned int idx, idx_now;
> > +	unsigned long j;
> >
> >  	switch (timeout)
> >  	{
> > @@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long
> > timeout)
> >  	timer_setup_on_stack(&timer.timer, process_timeout, 0);
> >  	__mod_timer(&timer.timer, expire, 0);
> >  	schedule();
> > +	j = jiffies;
> > +	if (timeout < 5 && time_after(j, expire + 8 * HZ) &&
> > timer_pending(&timer.timer)) {
> > +		base = lock_timer_base(&timer.timer, &flags);
> > +		idx = timer_get_idx(&timer.timer);
> > +		idx_now = calc_wheel_index(j, base->clk);
> > +		raw_spin_unlock_irqrestore(&base->lock, flags);
> > +		pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx
> > base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x
> > idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry,
> > timer.timer.flags, timer.timer.expires, idx, idx_now);
> > +		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
> > +			pr_cont("%016lx", base->pending_map[i]);
> > +		pr_cont("\n");
> > +	}
> >  	del_singleshot_timer_sync(&timer.timer);
> >
> >  	/* Remove the timer from the object tracker */
> >
> >
> 
> Hmm, that message doesn't seem to trigger :-s
> 
> Thanks.
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-07 17:29           ` Paul E. McKenney
@ 2017-12-09 13:09             ` Frederic Weisbecker
  2017-12-09 18:09               ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Frederic Weisbecker @ 2017-12-09 13:09 UTC (permalink / raw)
  To: paulmck
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

2017-12-07 18:29 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote:
>> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney
>> <paulmck@linux.vnet.ibm.com>:
>> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
>> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney
>> >> I would prefer to keep it. It's useful for automated boot testing
>> >> based on configs such as 0-day or -tip test machines. But I'm likely
>> >> to migrate it to isolcpus implementation. Maybe something along the
>> >> lines of CONFIG_CPU_ISOLATION_ALL.
>> >
>> > How about instead allowing something like "nohz_full=1-" specify that
>> > all CPUs other than CPU 0 should be nohz_full CPUs?  That would shrink
>> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
>> > easy automation of that particular scenario.
>> >
>> > (Right now, the boot code complains about "nohz_full=1-", which means
>> > that whatever is generating the boot parameters needs to know how many
>> > CPUs there really are, which as you say can be a pain.)
>>
>> Yes but automated boot testing is rather based on configs than boot
>> options. It's much easier. I think that's how Wu Fengguang lab works,
>> and -tip automated tests as well.
>
> So you have gotten bug reports from them?  Because I see splats from
> rcutorture testing rather frequently.  This thing is in no way a subtle
> low-probability bug.  ;-)

Nope I haven't got anything from them. So far you're the only
reproducer I know :)

>> >> >> Did you have any nohz_full= or isolcpus= boot options?
>> >> >
>> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
>> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
>> >> > on an eight-CPU test.
>> >> >
>> >> > But it is relatively easy to test.  Running the rcutorture TREE04
>> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
>> >> > a few minutes more than half the time.  ;-)
>> >>
>> >> Indeed I managed to trigger something. If it's the same thing I should
>> >> be able to track down the root cause.
>> >>
>> >> [  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
>> >> ->state 0x1 cpu 2
>> >> [  123.915184] rcu_torture_wri S    0   111      2 0x80080000
>> >> [  123.920673] Call Trace:
>> >> [  123.923096]  ? __schedule+0x2bf/0xbb0
>> >> [  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
>> >> [  123.931657]  schedule+0x3c/0x90
>> >> [  123.934777]  schedule_timeout+0x1e1/0x560
>> >
>> > It might well be the same thing, as this schedule_timeout() does look
>> > familiar.  I have some diagnostic patches in -rcu, please see below
>> > for the overall effect.
>>
>> I fear I can hit that even without any nohz_full CPU as well.
>
> Indeed, I do hit that with my TREE01 scenario, which does not set
> CONFIG_NO_HZ_FULL.  But it is much less frequent.  The good news is that
> I have finally figured out a way to extract information from this thing
> without suppressing it.  At the moment it appears to be a rather strange
> deadlock involving CPU hotplug, timers, and RCU.
>
> But that is a completely different bug from the ones for which I have
> the two patches in my tree.
>
> Anyway, I will keep those two patches because I cannot have the
> corresponding bugs possibly hiding RCU bugs in my testing.  If you
> put some other fix in place, I will drop those two patches in favor of
> your fix.

Ok. I'm a bit troubled by this bug, I hate to push a fix for a bug I
don't understand nor can reproduce. But having CONFIG_NO_HZ_FULL
select CONFIG_CPU_ISOLATION is already a fix for sanity that I need to
push. So I think I'm going to take your patch anyway and rewrite the
changelog to take all that into account.

Thanks Paul!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION
  2017-12-09 13:09             ` Frederic Weisbecker
@ 2017-12-09 18:09               ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2017-12-09 18:09 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, xiaolong.ye, tglx, cmetcalf, cl, torvalds,
	lcapitulino, efault, peterz, riel, kernellwp, mingo, john.stultz

On Sat, Dec 09, 2017 at 02:09:07PM +0100, Frederic Weisbecker wrote:
> 2017-12-07 18:29 UTC+01:00, Paul E. McKenney <paulmck@linux.vnet.ibm.com>:
> > On Thu, Dec 07, 2017 at 05:14:54PM +0100, Frederic Weisbecker wrote:
> >> 2017-12-04 18:16 UTC+01:00, Paul E. McKenney
> >> <paulmck@linux.vnet.ibm.com>:
> >> > On Mon, Dec 04, 2017 at 04:53:15PM +0100, Frederic Weisbecker wrote:
> >> >> 2017-12-02 20:24 UTC+01:00, Paul E. McKenney
> >> >> I would prefer to keep it. It's useful for automated boot testing
> >> >> based on configs such as 0-day or -tip test machines. But I'm likely
> >> >> to migrate it to isolcpus implementation. Maybe something along the
> >> >> lines of CONFIG_CPU_ISOLATION_ALL.
> >> >
> >> > How about instead allowing something like "nohz_full=1-" specify that
> >> > all CPUs other than CPU 0 should be nohz_full CPUs?  That would shrink
> >> > the code by eliminating CONFIG_NO_HZ_FULL_ALL while still allowing
> >> > easy automation of that particular scenario.
> >> >
> >> > (Right now, the boot code complains about "nohz_full=1-", which means
> >> > that whatever is generating the boot parameters needs to know how many
> >> > CPUs there really are, which as you say can be a pain.)
> >>
> >> Yes but automated boot testing is rather based on configs than boot
> >> options. It's much easier. I think that's how Wu Fengguang lab works,
> >> and -tip automated tests as well.
> >
> > So you have gotten bug reports from them?  Because I see splats from
> > rcutorture testing rather frequently.  This thing is in no way a subtle
> > low-probability bug.  ;-)
> 
> Nope I haven't got anything from them. So far you're the only
> reproducer I know :)
> 
> >> >> >> Did you have any nohz_full= or isolcpus= boot options?
> >> >> >
> >> >> > Replacing CONFIG_NO_HZ_FULL_ALL=y with nohz_full=1-7 works, that
> >> >> > is CONFIG_NO_HZ_FULL=y, CONFIG_NO_HZ_FULL_ALL=n, and nohz_full=1-7
> >> >> > on an eight-CPU test.
> >> >> >
> >> >> > But it is relatively easy to test.  Running the rcutorture TREE04
> >> >> > scenario on a four-socket x86 gets me RCU CPU stall warnings within
> >> >> > a few minutes more than half the time.  ;-)
> >> >>
> >> >> Indeed I managed to trigger something. If it's the same thing I should
> >> >> be able to track down the root cause.
> >> >>
> >> >> [  123.907557] ??? Writer stall state RTWS_STUTTER(8) g160 c160 f0x0
> >> >> ->state 0x1 cpu 2
> >> >> [  123.915184] rcu_torture_wri S    0   111      2 0x80080000
> >> >> [  123.920673] Call Trace:
> >> >> [  123.923096]  ? __schedule+0x2bf/0xbb0
> >> >> [  123.926715]  ? _raw_spin_unlock_irqrestore+0x59/0x70
> >> >> [  123.931657]  schedule+0x3c/0x90
> >> >> [  123.934777]  schedule_timeout+0x1e1/0x560
> >> >
> >> > It might well be the same thing, as this schedule_timeout() does look
> >> > familiar.  I have some diagnostic patches in -rcu, please see below
> >> > for the overall effect.
> >>
> >> I fear I can hit that even without any nohz_full CPU as well.
> >
> > Indeed, I do hit that with my TREE01 scenario, which does not set
> > CONFIG_NO_HZ_FULL.  But it is much less frequent.  The good news is that
> > I have finally figured out a way to extract information from this thing
> > without suppressing it.  At the moment it appears to be a rather strange
> > deadlock involving CPU hotplug, timers, and RCU.
> >
> > But that is a completely different bug from the ones for which I have
> > the two patches in my tree.
> >
> > Anyway, I will keep those two patches because I cannot have the
> > corresponding bugs possibly hiding RCU bugs in my testing.  If you
> > put some other fix in place, I will drop those two patches in favor of
> > your fix.
> 
> Ok. I'm a bit troubled by this bug, I hate to push a fix for a bug I
> don't understand nor can reproduce.

I would be happy to talk you through running the TREE04 rcutorture
scenario, if you would like.  I just verified that I can reproduce this
on a single-socket four-core (8 hardware threads) x86 system running
v4.15-rc1, so I would guess that you have access to hardware that can
reproduce it.

In the meantime, please feel free to take a look at the file
tools/testing/selftests/rcutorture/doc/initrd.txt in the kernel source.
This file tells how to create the initrd that rcutorture uses to avoid
the need to maintain root partitions.  Once you have that in place in
tools/testing/selftests/rcutorture/initrd (as a expanded file tree,
not any sort of archive) the reproducer is as follows, run from the
top level of the kernel source tree:

bash tools/testing/selftests/rcutorture/bin/kvm.sh --duration 5 --configs TREE04

This will output a bunch of rcutorture status/progress text, ending
in something like the following:

 --- Sat Dec  9 09:49:26 PST 2017 Test summary:
Results directory: /home/git/linux-2.6/tools/testing/selftests/rcutorture/res/2017.12.09-09:49:26
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 7 --duration 5 --configs TREE04
TREE04 ------- 350 grace periods (1.16667 per second)
WARNING: Assertion failure in /home/git/linux-2.6/tools/testing/selftests/rcutorture/res/2017.12.09-09:49:26/TREE04/console.log
WARNING: Summary: Call Traces: 7 Stalls: 1 Starves: 1

The big long pathname ending in "console.log" contains the console output.

A successful run would end without the WARNING lines.

>                                     But having CONFIG_NO_HZ_FULL
> select CONFIG_CPU_ISOLATION is already a fix for sanity that I need to
> push. So I think I'm going to take your patch anyway and rewrite the
> changelog to take all that into account.
> 
> Thanks Paul!

Works for me!  I don't know of anyone else encountering this, so I don't
see it as an emergency.  Left to myself, I would therefore push the
fixes into the v4.17 merge window (that is, the one after the next one).
But please let me know when you have pushed fixes, and I will adjust my
tree accordingly.

I am of course happy to test your fixes.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-12-09 18:10 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-30 20:20 [PATCH] sched/isolation: Make NO_HZ_FULL select CPU_ISOLATION Paul E. McKenney
2017-12-02 13:59 ` Frederic Weisbecker
2017-12-02 19:24   ` Paul E. McKenney
2017-12-04 15:53     ` Frederic Weisbecker
2017-12-04 17:16       ` Paul E. McKenney
2017-12-07 16:14         ` Frederic Weisbecker
2017-12-07 17:29           ` Paul E. McKenney
2017-12-09 13:09             ` Frederic Weisbecker
2017-12-09 18:09               ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).