linux-kernel.vger.kernel.org archive mirror
* Timer refuses to expire
@ 2017-12-01 18:25 Paul E. McKenney
  2017-12-04 17:34 ` Paul E. McKenney
  2017-12-04 17:42 ` Paul E. McKenney
  0 siblings, 2 replies; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-01 18:25 UTC (permalink / raw)
  To: anna-maria; +Cc: tglx, linux-kernel

Hello, Anna-Maria,

It turned out that one set of problems was due to NO_HZ_FULL issues,
which are addressed with a pair of patches.  However, there is still one
case of a timer refusing to expire in rcutorture's TREE01 test scenario.
This takes from two to four hours to trigger, and the most recent one
gives the following diagnostic (see patch at the very end of this email):

[13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000

The message appears any time a timer for less than five jiffies takes
more than eight seconds to expire.  In all cases, this expiry did not
happen naturally, but rather via an unsolicited wakeup from the RCU CPU
stall code.  If I am interpreting this message correctly, base->clk
has advanced past this timer, and the timer is correctly flagged in
base->pending_map.  This seems like part of the problem because the
timer's timeout was only three jiffies.  However, base->clk has not
advanced for more than 20 seconds, which seems like another problem.

What additional diagnostics would be helpful?  I can capture data
at the beginning of the schedule_timeout in the timer_list structure,
and of course can print more information at the time of the wakeup.

------------------------------------------------------------------------

[13144.096640] rcu_bh-torture:torture_onoff task: onlining 3
[13144.105877] smpboot: Booting Node 0 Processor 3 APIC 0x3
[13144.266716] rcu_bh-torture:torture_onoff task: onlined 3
[13147.296610] rcu_bh-torture:torture_onoff task: offlining 6
[13147.616632] rcu_bh-torture: rtc: ffffffffb6dd9ee0 ver: 221994 tfle: 0 rta: 221995 rtaf: 0 rtf: 221981 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 16382958 onoff: 1824/1824:1826/1827 49,354:25,605 264279:246163 (HZ=1000) barrier: 87633/87633:0 cbflood: 12833
[13147.623449] rcu_bh-torture: Reader Pipe:  163134304753 676327 0 0 0 0 0 0 0 0 0
[13147.624931] rcu_bh-torture: Reader Batch:  163051334134 83646946 0 0 0 0 0 0 0 0 0
[13147.627426] rcu_bh-torture: Free-Block Circulation:  221994 221993 221992 221988 221987 221986 221985 221984 221983 221981 0
[13162.976784] rcu_bh-torture: rtc: ffffffffb6dda690 ver: 222249 tfle: 0 rta: 222249 rtaf: 0 rtf: 222240 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 16402175 onoff: 1824/1824:1826/1827 49,354:25,605 264279:246163 (HZ=1000) barrier: 87737/87737:0 cbflood: 12849
[13162.981280] rcu_bh-torture: Reader Pipe:  163340804815 677169 0 0 0 0 0 0 0 0 0
[13162.983225] rcu_bh-torture: Reader Batch:  163257732604 83749380 0 0 0 0 0 0 0 0 0
[13162.985163] rcu_bh-torture: Free-Block Circulation:  222248 222248 222247 222246 222245 222244 222243 222242 222241 222240 0
[13166.058496] INFO: rcu_sched detected stalls on CPUs/tasks:
[13166.060682] 	(detected by 34, t=21004 jiffies, g=20232, c=20231, q=2)
[13166.061863] All QSes seen, last rcu_sched kthread activity 21004 (4307833356-4307812352), jiffies_till_next_fqs=3, root ->qsmask 0x0
[13166.065346] rcu_torture_rea R  running task    15016   835      2 0x80000000
[13166.066840] Call Trace:
[13166.068532]  <IRQ>
[13166.069337]  sched_show_task+0x105/0x130
[13166.070351]  rcu_check_callbacks+0x9dd/0x9f0
[13166.071418]  ? tick_sched_do_timer+0x40/0x40
[13166.072423]  update_process_times+0x23/0x50
[13166.073491]  tick_sched_handle+0x30/0x50
[13166.074418]  tick_sched_timer+0x2f/0x70
[13166.075307]  __hrtimer_run_queues+0x8d/0x250
[13166.076302]  hrtimer_interrupt+0xad/0x200
[13166.077390]  smp_apic_timer_interrupt+0x58/0x150
[13166.078533]  apic_timer_interrupt+0x89/0x90
[13166.079498]  </IRQ>
[13166.080024] RIP: 0010:delay_tsc+0x2e/0xb0
[13166.080958] RSP: 0018:ffffbb5281733e30 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff11
[13166.084359] RAX: 0000000080000200 RBX: 000017e66223c0c9 RCX: 0000000000000201
[13166.085994] RDX: 0000000080000201 RSI: ffffffffb51f1e29 RDI: 00000000ffffffff
[13166.087579] RBP: 000017e662222311 R08: 000000000000000b R09: 0000000000000101
[13166.089183] R10: ffffe543c043f340 R11: ffffbb5281733e58 R12: 000000000006169f
[13166.092235] R13: 0000000000000022 R14: 0000000000069fa8 R15: ffffffffb6dd9b20
[13166.095399]  rcu_read_delay+0x104/0x110
[13166.097195]  rcu_torture_reader+0x1a4/0x320
[13166.099012]  ? rcu_torture_reader+0x320/0x320
[13166.100939]  ? kthread+0xf0/0x130
[13166.102446]  kthread+0xf0/0x130
[13166.103841]  ? rcu_torture_timer_cb+0x10/0x10
[13166.105892]  ? kthread_destroy_worker+0x40/0x40
[13166.107804]  ret_from_fork+0x1f/0x30
[13166.109304] rcu_sched kthread starved for 21052 jiffies! g20232 c20231 f0x0 RCU_GP_DONE_GPS(2) ->state=0x2 ->cpu=3
[13166.113636] rcu_sched       D14992     9      2 0x80000000
[13166.116050] Call Trace:
[13166.117201]  ? __schedule+0x33c/0x6f0
[13166.118823]  ? preempt_count_add+0x51/0x90
[13166.120168]  schedule+0x37/0x90
[13166.121002]  schedule_timeout+0x159/0x450
[13166.121969]  ? __next_timer_interrupt+0xc0/0xc0
[13166.123007]  rcu_gp_kthread+0x226/0x1110
[13166.123909]  kthread+0xf0/0x130
[13166.124702]  ? rcu_barrier+0x10/0x10
[13166.125548]  ? kthread_destroy_worker+0x40/0x40
[13166.126575]  ret_from_fork+0x1f/0x30
[13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000

------------------------------------------------------------------------

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf878fba..23af27461d8c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t)
  */
 signed long __sched schedule_timeout(signed long timeout)
 {
+	struct timer_base *base;
 	struct process_timer timer;
 	unsigned long expire;
+	unsigned long flags;
+	unsigned long i;
+	unsigned int idx, idx_now;
+	unsigned long j;
 
 	switch (timeout)
 	{
@@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long timeout)
 	timer_setup_on_stack(&timer.timer, process_timeout, 0);
 	__mod_timer(&timer.timer, expire, 0);
 	schedule();
+	j = jiffies;
+	if (timeout < 5 && time_after(j, expire + 8 * HZ) && timer_pending(&timer.timer)) {
+		base = lock_timer_base(&timer.timer, &flags);
+		idx = timer_get_idx(&timer.timer);
+		idx_now = calc_wheel_index(j, base->clk);
+		raw_spin_unlock_irqrestore(&base->lock, flags);
+		pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry, timer.timer.flags, timer.timer.expires, idx, idx_now);
+		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
+			pr_cont("%016lx", base->pending_map[i]);
+		pr_cont("\n");
+	}
 	del_singleshot_timer_sync(&timer.timer);
 
 	/* Remove the timer from the object tracker */


* Re: Timer refuses to expire
  2017-12-01 18:25 Timer refuses to expire Paul E. McKenney
@ 2017-12-04 17:34 ` Paul E. McKenney
  2017-12-04 17:42 ` Paul E. McKenney
  1 sibling, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-04 17:34 UTC (permalink / raw)
  To: anna-maria; +Cc: tglx, linux-kernel

On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> Hello, Anna-Maria,
> 
> It turned out that one set of problems was due to NO_HZ_FULL issues,
> which are addressed with a pair of patches.  However, there is still one
> case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> This takes from two to four hours to trigger, and the most recent one
> gives the following diagnostic (see patch at the very end of this email):
> 
> [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> 
> The message appears any time a timer for less than five jiffies takes
> more than eight seconds to expire.  In all cases, this expiry did not
> happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> stall code.  If I am interpreting this message correctly, base->clk
> has advanced past this timer, and the timer is correctly flagged in
> base->pending_map.  This seems like part of the problem because the
> timer's timeout was only three jiffies.  However, base->clk has not
> advanced for more than 20 seconds, which seems like another problem.
> 
> What additional diagnostics would be helpful?  I can capture data
> at the beginning of the schedule_timeout in the timer_list structure,
> and of course can print more information at the time of the wakeup.

And here are a few more from weekend testing, on the off-chance that any
of them are at all helpful.  One interesting thing about the several messages
starting at time 4731 is that CPU 5 doesn't seem to be expiring timers,
even when new ones are posted.  But at time 4891, base->clk has in fact
advanced.

							Thanx, Paul

[ 4710.845607] schedule_timeout: Waylayed timer base->clk: 0x10042fb85 jiffies: 0x100434ddc base->next_expiry: 0x10042fb85 timer->flags: 0x1400005 timer->expires 0x10042fb84 idx: 5 idx_now: e7 base->pending_map 000000000000002000000000000000000000000000000000000000100000000000000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4731.872311] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10043a000 base->next_expiry: 0x100434800 timer->flags: 0x2e000005 timer->expires 0x100434de0 idx: b8 idx_now: d1 base->pending_map 000000000000000000000000000000000100000000000000020000100000140000000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4750.334637] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10043e81e base->next_expiry: 0x100434800 timer->flags: 0x33000005 timer->expires 0x10043960b idx: cc idx_now: 13f base->pending_map 000000000000000000000000000000000000000000000000020000100014140000000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4771.443542] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100443a93 base->next_expiry: 0x100434800 timer->flags: 0x4fc00005 timer->expires 0x10043e821 idx: 13f idx_now: 104 base->pending_map 000000000000000000000000000000000000000000000000020000100014040080000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4813.336678] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10044de38 base->next_expiry: 0x100434800 timer->flags: 0x4fc00005 timer->expires 0x10043e821 idx: 13f idx_now: 10e base->pending_map 000000000000000000000000000000000000000000000000020000100014040080000000000000100000000000010000000000000000000000000000000000000000000000000000
[ 4834.345313] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100453049 base->next_expiry: 0x100434800 timer->flags: 0x41000005 timer->expires 0x100443a96 idx: 104 idx_now: 114 base->pending_map 000000000000000000000000000000000000000000000000020000100014040000000000000040100000000000010000000000000000000000000000000000000000000000000000
[ 4834.561159] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100453121 base->next_expiry: 0x100434800 timer->flags: 0x43800005 timer->expires 0x10044de3b idx: 10e idx_now: 114 base->pending_map 000000000000000000000000000000000000000000000000020000100014040000000000000040000000000000000000000000000000000000000000000000000000000000000000
[ 4891.721190] schedule_timeout: Waylayed timer base->clk: 0x10045be44 jiffies: 0x100461069 base->next_expiry: 0x10045be44 timer->flags: 0x1000005 timer->expires 0x10045be43 idx: 4 idx_now: c9 base->pending_map 000000000000001000000000000000000000000000002000008010000000000100000000000000000000000000200000000000000000000000000000000000000000000000000000
[ 4891.929320] schedule_timeout: Waylayed timer base->clk: 0x10045be44 jiffies: 0x100461139 base->next_expiry: 0x10045be44 timer->flags: 0x1000005 timer->expires 0x10045be43 idx: 4 idx_now: c9 base->pending_map 000000000000001000000000000000000000000000002000008010000000000100000000000000000000000000200000000000000000000000000000000000000000000000000000
[ 4929.131690] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10046a28b base->next_expiry: 0x10045c340 timer->flags: 0x49800005 timer->expires 0x100465039 idx: 126 idx_now: 12b base->pending_map 0000000000000000000000000000000000000000000020000080100010008401000000e0000000000000000000200000000000000000000000000000000000000000000000000000
[ 4934.785783] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10046b8a1 base->next_expiry: 0x10045c340 timer->flags: 0x49c00005 timer->expires 0x100466634 idx: 127 idx_now: 12c base->pending_map 0000000000000000000000000000000000000000000020000080100010008401000000e0000000000000000000200000000000000000000000000000000000000000000000000000
[ 4997.703037] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10047ae66 base->next_expiry: 0x10045c340 timer->flags: 0x4b000005 timer->expires 0x10046b8a4 idx: 12c idx_now: 13b base->pending_map 000000000000000000000000000000000000000000002000008010001000840100181060000000000000000000200000000000000000000000000000000000000000000000000000
[ 5018.820354] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004800e4 base->next_expiry: 0x10045c340 timer->flags: 0x4ec00005 timer->expires 0x10047ae69 idx: 13b idx_now: 101 base->pending_map 000000000000000000000000000000000000000000002000008010001000840108180060000000010000000000200000000000000000000000000000000000000000000000000000
[ 5079.106104] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10048ecde base->next_expiry: 0x10045c340 timer->flags: 0x42800005 timer->expires 0x100489a5b idx: 10a idx_now: 10f base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000084030000000000200000000000000000000000000000000000000000000000000000
[ 5142.123589] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10049e28b base->next_expiry: 0x10045c340 timer->flags: 0x43c00005 timer->expires 0x10048ece1 idx: 10f idx_now: 154 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000200000000000000000000000000000000000000000000000000000
[ 5163.235561] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004a3503 base->next_expiry: 0x10045c340 timer->flags: 0x55000005 timer->expires 0x10049e28e idx: 154 idx_now: 155 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000300000000000000000000000000000000000000000000000000000
[ 5226.136310] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004b2ab8 base->next_expiry: 0x10045c340 timer->flags: 0x55400005 timer->expires 0x1004a3506 idx: 155 idx_now: 157 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000200000000000000000000000000000000000000000000000000000
[18001.753494] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b79 base->next_expiry: 0x10045c340 timer->flags: 0x55400005 timer->expires 0x1004a5e11 idx: 155 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180020000180030000000000200000000000000000000000000000000000000000000000000000
[18001.763893] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b83 base->next_expiry: 0x10045c340 timer->flags: 0x49400005 timer->expires 0x100464e01 idx: 125 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180020000180030000000000200000000000000000000000000000000000000000000000000000
[18001.772807] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b8c base->next_expiry: 0x10045c340 timer->flags: 0x40400005 timer->expires 0x100480601 idx: 101 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180000000180030000000000200000000000000000000000000000000000000000000000000000
[18001.782538] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b96 base->next_expiry: 0x10045c340 timer->flags: 0x30000005 timer->expires 0x10045fe01 idx: c0 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180000000180000000000000200000000000000000000000000000000000000000000000000000
[18001.791954] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b9f base->next_expiry: 0x10045c340 timer->flags: 0x44000005 timer->expires 0x10048f601 idx: 110 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000180000000180000000000000200000000000000000000000000000000000000000000000000000
[18001.801725] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1ba9 base->next_expiry: 0x10045c340 timer->flags: 0x4d000005 timer->expires 0x100473e12 idx: 134 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000180000000000000000000000200000000000000000000000000000000000000000000000000000
[18001.811052] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1bb2 base->next_expiry: 0x10045c340 timer->flags: 0x3b000005 timer->expires 0x10045d601 idx: ec idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000000000000000000000000000200000000000000000000000000000000000000000000000000000
[18001.820594] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1bbc base->next_expiry: 0x10045c340 timer->flags: 0x37000005 timer->expires 0x1004636e1 idx: dc idx_now: 184 base->pending_map 000000000000000000000000000000000000000000000000000000001000800000000000000000000000000000200000000000000000000000000000000000000000000000000000


* Re: Timer refuses to expire
  2017-12-01 18:25 Timer refuses to expire Paul E. McKenney
  2017-12-04 17:34 ` Paul E. McKenney
@ 2017-12-04 17:42 ` Paul E. McKenney
  2017-12-05 23:37   ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-04 17:42 UTC (permalink / raw)
  To: anna-maria; +Cc: tglx, linux-kernel

On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> Hello, Anna-Maria,
> 
> It turned out that one set of problems was due to NO_HZ_FULL issues,
> which are addressed with a pair of patches.  However, there is still one
> case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> This takes from two to four hours to trigger, and the most recent one
> gives the following diagnostic (see patch at the very end of this email):
> 
> [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> 
> The message appears any time a timer for less than five jiffies takes
> more than eight seconds to expire.  In all cases, this expiry did not
> happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> stall code.  If I am interpreting this message correctly, base->clk
> has advanced past this timer, and the timer is correctly flagged in
> base->pending_map.  This seems like part of the problem because the
> timer's timeout was only three jiffies.  However, base->clk has not
> advanced for more than 20 seconds, which seems like another problem.
> 
> What additional diagnostics would be helpful?  I can capture data
> at the beginning of the schedule_timeout in the timer_list structure,
> and of course can print more information at the time of the wakeup.

And here are a few more messages from this weekend's runs, on the off-chance
that any of them are at all helpful.  One interesting point is that starting at
time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
but CPU 5's ->clk does not advance.  Things seem to advance at
time 4891.721190.

							Thanx, Paul

------------------------------------------------------------------------

[ 4710.845607] schedule_timeout: Waylayed timer base->clk: 0x10042fb85 jiffies: 0x100434ddc base->next_expiry: 0x10042fb85 timer->flags: 0x1400005 timer->expires 0x10042fb84 idx: 5 idx_now: e7 base->pending_map 000000000000002000000000000000000000000000000000000000100000000000000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4731.872311] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10043a000 base->next_expiry: 0x100434800 timer->flags: 0x2e000005 timer->expires 0x100434de0 idx: b8 idx_now: d1 base->pending_map 000000000000000000000000000000000100000000000000020000100000140000000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4750.334637] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10043e81e base->next_expiry: 0x100434800 timer->flags: 0x33000005 timer->expires 0x10043960b idx: cc idx_now: 13f base->pending_map 000000000000000000000000000000000000000000000000020000100014140000000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4771.443542] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100443a93 base->next_expiry: 0x100434800 timer->flags: 0x4fc00005 timer->expires 0x10043e821 idx: 13f idx_now: 104 base->pending_map 000000000000000000000000000000000000000000000000020000100014040080000000000000000000000000010000000000000000000000000000000000000000000000000000
[ 4813.336678] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x10044de38 base->next_expiry: 0x100434800 timer->flags: 0x4fc00005 timer->expires 0x10043e821 idx: 13f idx_now: 10e base->pending_map 000000000000000000000000000000000000000000000000020000100014040080000000000000100000000000010000000000000000000000000000000000000000000000000000
[ 4834.345313] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100453049 base->next_expiry: 0x100434800 timer->flags: 0x41000005 timer->expires 0x100443a96 idx: 104 idx_now: 114 base->pending_map 000000000000000000000000000000000000000000000000020000100014040000000000000040100000000000010000000000000000000000000000000000000000000000000000
[ 4834.561159] schedule_timeout: Waylayed timer base->clk: 0x100434800 jiffies: 0x100453121 base->next_expiry: 0x100434800 timer->flags: 0x43800005 timer->expires 0x10044de3b idx: 10e idx_now: 114 base->pending_map 000000000000000000000000000000000000000000000000020000100014040000000000000040000000000000000000000000000000000000000000000000000000000000000000
[ 4891.721190] schedule_timeout: Waylayed timer base->clk: 0x10045be44 jiffies: 0x100461069 base->next_expiry: 0x10045be44 timer->flags: 0x1000005 timer->expires 0x10045be43 idx: 4 idx_now: c9 base->pending_map 000000000000001000000000000000000000000000002000008010000000000100000000000000000000000000200000000000000000000000000000000000000000000000000000
[ 4891.929320] schedule_timeout: Waylayed timer base->clk: 0x10045be44 jiffies: 0x100461139 base->next_expiry: 0x10045be44 timer->flags: 0x1000005 timer->expires 0x10045be43 idx: 4 idx_now: c9 base->pending_map 000000000000001000000000000000000000000000002000008010000000000100000000000000000000000000200000000000000000000000000000000000000000000000000000
[ 4929.131690] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10046a28b base->next_expiry: 0x10045c340 timer->flags: 0x49800005 timer->expires 0x100465039 idx: 126 idx_now: 12b base->pending_map 0000000000000000000000000000000000000000000020000080100010008401000000e0000000000000000000200000000000000000000000000000000000000000000000000000
[ 4934.785783] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10046b8a1 base->next_expiry: 0x10045c340 timer->flags: 0x49c00005 timer->expires 0x100466634 idx: 127 idx_now: 12c base->pending_map 0000000000000000000000000000000000000000000020000080100010008401000000e0000000000000000000200000000000000000000000000000000000000000000000000000
[ 4997.703037] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10047ae66 base->next_expiry: 0x10045c340 timer->flags: 0x4b000005 timer->expires 0x10046b8a4 idx: 12c idx_now: 13b base->pending_map 000000000000000000000000000000000000000000002000008010001000840100181060000000000000000000200000000000000000000000000000000000000000000000000000
[ 5018.820354] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004800e4 base->next_expiry: 0x10045c340 timer->flags: 0x4ec00005 timer->expires 0x10047ae69 idx: 13b idx_now: 101 base->pending_map 000000000000000000000000000000000000000000002000008010001000840108180060000000010000000000200000000000000000000000000000000000000000000000000000
[ 5079.106104] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10048ecde base->next_expiry: 0x10045c340 timer->flags: 0x42800005 timer->expires 0x100489a5b idx: 10a idx_now: 10f base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000084030000000000200000000000000000000000000000000000000000000000000000
[ 5142.123589] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x10049e28b base->next_expiry: 0x10045c340 timer->flags: 0x43c00005 timer->expires 0x10048ece1 idx: 10f idx_now: 154 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000200000000000000000000000000000000000000000000000000000
[ 5163.235561] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004a3503 base->next_expiry: 0x10045c340 timer->flags: 0x55000005 timer->expires 0x10049e28e idx: 154 idx_now: 155 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000300000000000000000000000000000000000000000000000000000
[ 5226.136310] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1004b2ab8 base->next_expiry: 0x10045c340 timer->flags: 0x55400005 timer->expires 0x1004a3506 idx: 155 idx_now: 157 base->pending_map 000000000000000000000000000000000000000000002000008010001000840100180060000180030000000000200000000000000000000000000000000000000000000000000000
[18001.753494] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b79 base->next_expiry: 0x10045c340 timer->flags: 0x55400005 timer->expires 0x1004a5e11 idx: 155 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180020000180030000000000200000000000000000000000000000000000000000000000000000
[18001.763893] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b83 base->next_expiry: 0x10045c340 timer->flags: 0x49400005 timer->expires 0x100464e01 idx: 125 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180020000180030000000000200000000000000000000000000000000000000000000000000000
[18001.772807] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b8c base->next_expiry: 0x10045c340 timer->flags: 0x40400005 timer->expires 0x100480601 idx: 101 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180000000180030000000000200000000000000000000000000000000000000000000000000000
[18001.782538] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b96 base->next_expiry: 0x10045c340 timer->flags: 0x30000005 timer->expires 0x10045fe01 idx: c0 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000008010001000800100180000000180000000000000200000000000000000000000000000000000000000000000000000
[18001.791954] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1b9f base->next_expiry: 0x10045c340 timer->flags: 0x44000005 timer->expires 0x10048f601 idx: 110 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000180000000180000000000000200000000000000000000000000000000000000000000000000000
[18001.801725] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1ba9 base->next_expiry: 0x10045c340 timer->flags: 0x4d000005 timer->expires 0x100473e12 idx: 134 idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000180000000000000000000000200000000000000000000000000000000000000000000000000000
[18001.811052] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1bb2 base->next_expiry: 0x10045c340 timer->flags: 0x3b000005 timer->expires 0x10045d601 idx: ec idx_now: 184 base->pending_map 000000000000000000000000000000000000000000002000000010001000800000000000000000000000000000200000000000000000000000000000000000000000000000000000
[18001.820594] schedule_timeout: Waylayed timer base->clk: 0x10045c340 jiffies: 0x1010e1bbc base->next_expiry: 0x10045c340 timer->flags: 0x37000005 timer->expires 0x1004636e1 idx: dc idx_now: 184 base->pending_map 000000000000000000000000000000000000000000000000000000001000800000000000000000000000000000200000000000000000000000000000000000000000000000000000


* Re: Timer refuses to expire
  2017-12-04 17:42 ` Paul E. McKenney
@ 2017-12-05 23:37   ` Paul E. McKenney
  2017-12-06 22:04     ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-05 23:37 UTC (permalink / raw)
  To: anna-maria; +Cc: tglx, linux-kernel

On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > Hello, Anna-Maria,
> > 
> > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > which are addressed with a pair of patches.  However, there is still one
> > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > This takes from two to four hours to trigger, and the most recent one
> > gives the following diagnostic (see patch at the very end of this email):
> > 
> > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > 
> > The message appears any time a timer for less than five jiffies takes
> > more than eight seconds to expire.  In all cases, this expiry did not
> > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > stall code.  If I am interpreting this message correctly, base->clk
> > has advanced past this timer, and the timer is correctly flagged in
> > base->pending_map.  This seems like part of the problem because the
> > timer's timeout was only three jiffies.  However, base->clk has not
> > advanced for more than 20 seconds, which seems like another problem.
> > 
> > What additional diagnostics would be helpful?  I can capture data
> > at the beginning of the schedule_timeout in the timer_list structure,
> > and of course can print more information at the time of the wakeup.
> 
> And on the off-chance that they are helpful, here are the messages from
> this weekend's runs.  One interesting point is that starting at
> time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> but CPU 5's ->clk does not advance.  Things seem to advance at
> time 4891.721190.

Another layer on the onion...  For at least some of the failures, there
is a stalled CPU-hotplug operation.  This of course can mean that the
timers are stuck on the partially offlined CPU.  So I dumped the stack
of the task that is taking the CPU offline.  Please see below for console
output and patch.

I am considering doing an unsolicited wakeup of the task doing the
hotplug operation, but I am not convinced that the entirety of the CPU
hotplug code is willing to put up with that sort of thing.

							Thanx, Paul

------------------------------------------------------------------------

[ 1035.692679] schedule_timeout: Waylayed timer base->clk: 0x1000ae76c jiffies: 0x1000b39cd base->next_expiry: 0x1000ae76c timer->flags: 0xb000007 timer->expires 0x1000ae76b idx: 2c idx_now: dd base->pending_map 000010000000000002000800000000000002000000000000000000000020092000000000000000000000000080000000000000000000000000000000000000000000000000000000
[ 1035.692689] Torture onoff task state:
[ 1035.692691] torture_onoff   D14584   837      2 0x80000000
[ 1035.692734] Call Trace:
[ 1035.692742]  ? __schedule+0x33c/0x6f0
[ 1035.692745]  ? _raw_spin_lock_irq+0xf/0x30
[ 1035.692748]  schedule+0x37/0x90
[ 1035.692750]  schedule_timeout+0x20d/0x490
[ 1035.692755]  ? preempt_count_add+0x51/0x90
[ 1035.692758]  wait_for_completion+0x95/0x100
[ 1035.692763]  ? wake_up_q+0x60/0x60
[ 1035.692783]  cpuhp_kick_ap+0x29/0x70
[ 1035.692799]  cpuhp_kick_ap_work+0x34/0x130
[ 1035.692807]  _cpu_down+0x17e/0x1f0
[ 1035.692811]  do_cpu_down+0x30/0x50
[ 1035.692834]  torture_offline+0x76/0x140
[ 1035.692851]  torture_onoff+0xef/0x1c0
[ 1035.692854]  kthread+0xf0/0x130
[ 1035.692857]  ? torture_kthread_stopping+0x70/0x70

------------------------------------------------------------------------

commit 8069a4a1a0634055c43b50a37cb08237eea97dd5
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Tue Dec 5 13:59:06 2017 -0800

    EXP: timer: Dump torture-test online/offline task state
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/include/linux/torture.h b/include/linux/torture.h
index 66272862070b..ac62d21c9688 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -102,4 +102,6 @@ void _torture_stop_kthread(char *m, struct task_struct **tp);
 #define torture_preempt_schedule()
 #endif
 
+void schedule_timeout_set_task2dump(struct task_struct *t);
+
 #endif /* __LINUX_TORTURE_H */
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 23af27461d8c..0a67df4cdeb6 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1724,6 +1724,16 @@ static void process_timeout(struct timer_list *t)
 	wake_up_process(timeout->task);
 }
 
+static struct task_struct *schedule_timeout_task2dump;
+static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
+void schedule_timeout_set_task2dump(struct task_struct *t)
+{
+	mutex_lock(&schedule_timeout_task2dump_mutex);
+	WRITE_ONCE(schedule_timeout_task2dump, t);
+	mutex_unlock(&schedule_timeout_task2dump_mutex);
+}
+EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
+
 /**
  * schedule_timeout - sleep until timeout
  * @timeout: timeout value in jiffies
@@ -1808,6 +1818,14 @@ signed long __sched schedule_timeout(signed long timeout)
 		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
 			pr_cont("%016lx", base->pending_map[i]);
 		pr_cont("\n");
+		if (READ_ONCE(schedule_timeout_task2dump)) {
+			mutex_lock(&schedule_timeout_task2dump_mutex);
+			if (schedule_timeout_task2dump) {
+				pr_info("Torture onoff task state:\n");
+				sched_show_task(schedule_timeout_task2dump);
+			}
+			mutex_unlock(&schedule_timeout_task2dump_mutex);
+		}
 	}
 	del_singleshot_timer_sync(&timer.timer);
 
diff --git a/kernel/torture.c b/kernel/torture.c
index 37b94012a3f8..48d5f03c644a 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -238,6 +238,7 @@ int torture_onoff_init(long ooholdoff, long oointerval)
 	if (onoff_interval <= 0)
 		return 0;
 	ret = torture_create_kthread(torture_onoff, NULL, onoff_task);
+	schedule_timeout_set_task2dump(onoff_task);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 	return ret;
 }
@@ -251,6 +252,7 @@ static void torture_onoff_cleanup(void)
 #ifdef CONFIG_HOTPLUG_CPU
 	if (onoff_task == NULL)
 		return;
+	schedule_timeout_set_task2dump(NULL);
 	VERBOSE_TOROUT_STRING("Stopping torture_onoff task");
 	kthread_stop(onoff_task);
 	onoff_task = NULL;

* Re: Timer refuses to expire
  2017-12-05 23:37   ` Paul E. McKenney
@ 2017-12-06 22:04     ` Paul E. McKenney
  2017-12-07  7:03       ` Boqun Feng
  0 siblings, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-06 22:04 UTC (permalink / raw)
  To: anna-maria; +Cc: tglx, linux-kernel

On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote:
> On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > > Hello, Anna-Maria,
> > > 
> > > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > > which are addressed with a pair of patches.  However, there is still one
> > > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > > This takes from two to four hours to trigger, and the most recent one
> > > gives the following diagnostic (see patch at the very end of this email):
> > > 
> > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > > 
> > > The message appears any time a timer for less than five jiffies takes
> > > more than eight seconds to expire.  In all cases, this expiry did not
> > > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > > stall code.  If I am interpreting this message correctly, base->clk
> > > has advanced past this timer, and the timer is correctly flagged in
> > > base->pending_map.  This seems like part of the problem because the
> > > timer's timeout was only three jiffies.  However, base->clk has not
> > > advanced for more than 20 seconds, which seems like another problem.
> > > 
> > > What additional diagnostics would be helpful?  I can capture data
> > > at the beginning of the schedule_timeout in the timer_list structure,
> > > and of course can print more information at the time of the wakeup.
> > 
> > And on the off-chance that they are helpful, here are the messages from
> > this weekend's runs.  One interesting point is that starting at
> > time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> > but CPU 5's ->clk does not advance.  Things seem to advance at
> > time 4891.721190.
> 
> Another layer on the onion...  For at least some of the failures, there
> is a stalled CPU-hotplug operation.  This of course can mean that the
> timers are stuck on the partially offlined CPU.  So I dumped the stack
> of the task that is taking the CPU offline.  Please see below for console
> output and patch.
> 
> I am considering doing an unsolicited wakeup of the task doing the
> hotplug operation, but I am not convinced that the entirety of the CPU
> hotplug code is willing to put up with that sort of thing.

What I did instead was to dump out the state of the task that
__cpuhp_kick_ap() waits on; please see the patch at the very end of this
email.  This triggered as shown below, and you guessed it, that task is
waiting on a grace period.  Which I am guessing won't happen until the
outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
Which will prevent RCU's grace-period kthread from ever awakening, which
will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.

Deadlock.

Does that make sense, or am I missing a trick here?

I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but
that triggered the BUG_ON(cpu_online(cpu)).  I removed this BUG_ON(),
and appear to have deadlocked on the timer ->lock.

Any thoughts on what else to try?

							Thanx, Paul

------------------------------------------------------------------------

[ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 000000000000000800000000000000000002000000000000020080400002000868000000000000010000000001000000000000000000000000000000000000000000000000000000
[ 2939.381219] Torture onoff task state:
[ 2939.381221] torture_onoff   D14584   842      2 0x80000000
[ 2939.381239] Call Trace:
[ 2939.381245]  ? __schedule+0x33c/0x6f0
[ 2939.381248]  ? preempt_count_add+0x51/0x90
[ 2939.381250]  schedule+0x37/0x90
[ 2939.381270]  schedule_timeout+0x20d/0x4c0
[ 2939.381276]  wait_for_completion+0x95/0x100
[ 2939.381282]  ? wake_up_q+0x60/0x60
[ 2939.381299]  __cpuhp_kick_ap+0x49/0x60
[ 2939.381313]  cpuhp_kick_ap+0x29/0x70
[ 2939.381316]  cpuhp_kick_ap_work+0x34/0x130
[ 2939.381322]  _cpu_down+0x17e/0x1f0
[ 2939.381326]  do_cpu_down+0x30/0x50
[ 2939.381354]  torture_offline+0x76/0x140
[ 2939.381368]  torture_onoff+0xef/0x1c0
[ 2939.381371]  kthread+0xf0/0x130
[ 2939.381374]  ? torture_kthread_stopping+0x70/0x70
[ 2939.381376]  ? kthread_destroy_worker+0x40/0x40
[ 2939.381378]  ret_from_fork+0x1f/0x30
[ 2939.381381] __cpuhp_kick_ap task state:
[ 2939.381383] cpuhp/7         D14504    46      2 0x80000000
[ 2939.381399] Call Trace:
[ 2939.381403]  ? __schedule+0x33c/0x6f0
[ 2939.381406]  schedule+0x37/0x90
[ 2939.381408]  schedule_timeout+0x20d/0x4c0
[ 2939.381414]  ? enqueue_task_fair+0x556/0x12b0
[ 2939.381417]  wait_for_completion+0x95/0x100
[ 2939.381419]  ? wake_up_q+0x60/0x60
[ 2939.381431]  __wait_rcu_gp+0x10d/0x140
[ 2939.381441]  ? sched_cpu_activate+0xc0/0xc0
[ 2939.381443]  sched_cpu_deactivate+0x36/0xa0
[ 2939.381453]  ? kfree_call_rcu+0x20/0x20
[ 2939.381454]  ? __call_rcu+0x530/0x530
[ 2939.381457]  ? __rcu_read_unlock+0x50/0x50
[ 2939.381459]  ? __rcu_read_unlock+0x50/0x50
[ 2939.381462]  cpuhp_invoke_callback+0xa8/0x610
[ 2939.381466]  cpuhp_thread_fun+0xc5/0x150
[ 2939.381469]  smpboot_thread_fn+0x15c/0x220
[ 2939.381476]  kthread+0xf0/0x130
[ 2939.381478]  ? sort_range+0x20/0x20
[ 2939.381480]  ? kthread_destroy_worker+0x40/0x40
[ 2939.381482]  ret_from_fork+0x1f/0x30

------------------------------------------------------------------------

commit aa2b802cb432a725589884d2ba47833de21e0871
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Wed Dec 6 09:46:30 2017 -0800

    EXP: timer: Dump __cpuhp_kick_ap task state
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 04892a82f6ac..dc9098946b3a 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
 	st->bringup = !st->bringup;
 }
 
+void schedule_timeout_set_task3dump(struct task_struct *t);
+
 /* Regular hotplug invocation of the AP hotplug thread */
 static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
 {
@@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
 	smp_mb();
 	st->should_run = true;
 	wake_up_process(st->thread);
+	schedule_timeout_set_task3dump(st->thread);
 	wait_for_ap_thread(st, st->bringup);
+	schedule_timeout_set_task3dump(NULL);
 }
 
 static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0a67df4cdeb6..532b43d05219 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *t)
 }
 
 static struct task_struct *schedule_timeout_task2dump;
+static struct task_struct *schedule_timeout_task3dump;
 static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
 void schedule_timeout_set_task2dump(struct task_struct *t)
 {
@@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct task_struct *t)
 	WRITE_ONCE(schedule_timeout_task2dump, t);
 	mutex_unlock(&schedule_timeout_task2dump_mutex);
 }
-EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
+void schedule_timeout_set_task3dump(struct task_struct *t)
+{
+	mutex_lock(&schedule_timeout_task2dump_mutex);
+	WRITE_ONCE(schedule_timeout_task3dump, t);
+	mutex_unlock(&schedule_timeout_task2dump_mutex);
+}
 
 /**
  * schedule_timeout - sleep until timeout
@@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed long timeout)
 				pr_info("Torture onoff task state:\n");
 				sched_show_task(schedule_timeout_task2dump);
 			}
+			if (schedule_timeout_task3dump) {
+				pr_info("__cpuhp_kick_ap task state:\n");
+				sched_show_task(schedule_timeout_task3dump);
+			}
 			mutex_unlock(&schedule_timeout_task2dump_mutex);
 		}
 	}

* Re: Timer refuses to expire
  2017-12-06 22:04     ` Paul E. McKenney
@ 2017-12-07  7:03       ` Boqun Feng
  2017-12-07 14:56         ` Paul E. McKenney
  0 siblings, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2017-12-07  7:03 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: anna-maria, tglx, linux-kernel

Hi Paul,

On Wed, Dec 06, 2017 at 02:04:21PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> > > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > > > Hello, Anna-Maria,
> > > > 
> > > > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > > > which are addressed with a pair of patches.  However, there is still one
> > > > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > > > This takes from two to four hours to trigger, and the most recent one
> > > > gives the following diagnostic (see patch at the very end of this email):
> > > > 
> > > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > > > 
> > > > The message appears any time a timer for less than five jiffies takes
> > > > more than eight seconds to expire.  In all cases, this expiry did not
> > > > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > > > stall code.  If I am interpreting this message correctly, base->clk
> > > > has advanced past this timer, and the timer is correctly flagged in
> > > > base->pending_map.  This seems like part of the problem because the
> > > > timer's timeout was only three jiffies.  However, base->clk has not
> > > > advanced for more than 20 seconds, which seems like another problem.
> > > > 
> > > > What additional diagnostics would be helpful?  I can capture data
> > > > at the beginning of the schedule_timeout in the timer_list structure,
> > > > and of course can print more information at the time of the wakeup.
> > > 
> > > And on the off-chance that they are helpful, here are the messages from
> > > this weekend's runs.  One interesting point is that starting at
> > > time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> > > but CPU 5's ->clk does not advance.  Things seem to advance at
> > > time 4891.721190.
> > 
> > Another layer on the onion...  For at least some of the failures, there
> > is a stalled CPU-hotplug operation.  This of course can mean that the
> > timers are stuck on the partially offlined CPU.  So I dumped the stack
> > of the task that is taking the CPU offline.  Please see below for console
> > output and patch.
> > 
> > I am considering doing an unsolicited wakeup of the task doing the
> > hotplug operation, but I am not convinced that the entirety of the CPU
> > hotplug code is willing to put up with that sort of thing.
> 
> What I did instead was to dump out the state of the task that
> __cpuhp_kick_ap() waits on; please see the patch at the very end of this
> email.  This triggered as shown below, and you guessed it, that task is
> waiting on a grace period.  Which I am guessing won't happen until the
> outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> Which will prevent RCU's grace-period kthread from ever awakening, which
> will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> 
> Deadlock.
> 

There is one thing I'm confused about here. Sure, this is a deadlock, but the
timer should still work in such a deadlock, right? I mean, the timer of
schedule_timeout() should be able to wake up rcu_gp_kthread() even in
this case? And yes, the gp kthread will continue to wait due to the
deadlock, but the deadlock cannot explain the "Waylayed timer", right?

Regards,
Boqun

> Does that make sense, or am I missing a trick here?
> 
> I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but
> that triggered the BUG_ON(cpu_online(cpu)).  I removed this BUG_ON(),
> and appear to have deadlocked on the timer ->lock.
> 
> Any thoughts on what else to try?
> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> [ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 000000000000000800000000000000000002000000000000020080400002000868000000000000010000000001000000000000000000000000000000000000000000000000000000
> [ 2939.381219] Torture onoff task state:
> [ 2939.381221] torture_onoff   D14584   842      2 0x80000000
> [ 2939.381239] Call Trace:
> [ 2939.381245]  ? __schedule+0x33c/0x6f0
> [ 2939.381248]  ? preempt_count_add+0x51/0x90
> [ 2939.381250]  schedule+0x37/0x90
> [ 2939.381270]  schedule_timeout+0x20d/0x4c0
> [ 2939.381276]  wait_for_completion+0x95/0x100
> [ 2939.381282]  ? wake_up_q+0x60/0x60
> [ 2939.381299]  __cpuhp_kick_ap+0x49/0x60
> [ 2939.381313]  cpuhp_kick_ap+0x29/0x70
> [ 2939.381316]  cpuhp_kick_ap_work+0x34/0x130
> [ 2939.381322]  _cpu_down+0x17e/0x1f0
> [ 2939.381326]  do_cpu_down+0x30/0x50
> [ 2939.381354]  torture_offline+0x76/0x140
> [ 2939.381368]  torture_onoff+0xef/0x1c0
> [ 2939.381371]  kthread+0xf0/0x130
> [ 2939.381374]  ? torture_kthread_stopping+0x70/0x70
> [ 2939.381376]  ? kthread_destroy_worker+0x40/0x40
> [ 2939.381378]  ret_from_fork+0x1f/0x30
> [ 2939.381381] __cpuhp_kick_ap task state:
> [ 2939.381383] cpuhp/7         D14504    46      2 0x80000000
> [ 2939.381399] Call Trace:
> [ 2939.381403]  ? __schedule+0x33c/0x6f0
> [ 2939.381406]  schedule+0x37/0x90
> [ 2939.381408]  schedule_timeout+0x20d/0x4c0
> [ 2939.381414]  ? enqueue_task_fair+0x556/0x12b0
> [ 2939.381417]  wait_for_completion+0x95/0x100
> [ 2939.381419]  ? wake_up_q+0x60/0x60
> [ 2939.381431]  __wait_rcu_gp+0x10d/0x140
> [ 2939.381441]  ? sched_cpu_activate+0xc0/0xc0
> [ 2939.381443]  sched_cpu_deactivate+0x36/0xa0
> [ 2939.381453]  ? kfree_call_rcu+0x20/0x20
> [ 2939.381454]  ? __call_rcu+0x530/0x530
> [ 2939.381457]  ? __rcu_read_unlock+0x50/0x50
> [ 2939.381459]  ? __rcu_read_unlock+0x50/0x50
> [ 2939.381462]  cpuhp_invoke_callback+0xa8/0x610
> [ 2939.381466]  cpuhp_thread_fun+0xc5/0x150
> [ 2939.381469]  smpboot_thread_fn+0x15c/0x220
> [ 2939.381476]  kthread+0xf0/0x130
> [ 2939.381478]  ? sort_range+0x20/0x20
> [ 2939.381480]  ? kthread_destroy_worker+0x40/0x40
> [ 2939.381482]  ret_from_fork+0x1f/0x30
> 
> ------------------------------------------------------------------------
> 
> commit aa2b802cb432a725589884d2ba47833de21e0871
> Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Date:   Wed Dec 6 09:46:30 2017 -0800
> 
>     EXP: timer: Dump __cpuhp_kick_ap task state
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 04892a82f6ac..dc9098946b3a 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
>  	st->bringup = !st->bringup;
>  }
>  
> +void schedule_timeout_set_task3dump(struct task_struct *t);
> +
>  /* Regular hotplug invocation of the AP hotplug thread */
>  static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
>  {
> @@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
>  	smp_mb();
>  	st->should_run = true;
>  	wake_up_process(st->thread);
> +	schedule_timeout_set_task3dump(st->thread);
>  	wait_for_ap_thread(st, st->bringup);
> +	schedule_timeout_set_task3dump(NULL);
>  }
>  
>  static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 0a67df4cdeb6..532b43d05219 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *t)
>  }
>  
>  static struct task_struct *schedule_timeout_task2dump;
> +static struct task_struct *schedule_timeout_task3dump;
>  static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
>  void schedule_timeout_set_task2dump(struct task_struct *t)
>  {
> @@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct task_struct *t)
>  	WRITE_ONCE(schedule_timeout_task2dump, t);
>  	mutex_unlock(&schedule_timeout_task2dump_mutex);
>  }
> -EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
> +void schedule_timeout_set_task3dump(struct task_struct *t)
> +{
> +	mutex_lock(&schedule_timeout_task2dump_mutex);
> +	WRITE_ONCE(schedule_timeout_task3dump, t);
> +	mutex_unlock(&schedule_timeout_task2dump_mutex);
> +}
>  
>  /**
>   * schedule_timeout - sleep until timeout
> @@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed long timeout)
>  				pr_info("Torture onoff task state:\n");
>  				sched_show_task(schedule_timeout_task2dump);
>  			}
> +			if (schedule_timeout_task3dump) {
> +				pr_info("__cpuhp_kick_ap task state:\n");
> +				sched_show_task(schedule_timeout_task3dump);
> +			}
>  			mutex_unlock(&schedule_timeout_task2dump_mutex);
>  		}
>  	}
> 


* Re: Timer refuses to expire
  2017-12-07  7:03       ` Boqun Feng
@ 2017-12-07 14:56         ` Paul E. McKenney
  2017-12-07 21:45           ` Paul E. McKenney
  2017-12-08  1:06           ` Boqun Feng
  0 siblings, 2 replies; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-07 14:56 UTC (permalink / raw)
  To: Boqun Feng; +Cc: anna-maria, tglx, linux-kernel

On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Wed, Dec 06, 2017 at 02:04:21PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote:
> > > On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > > > > Hello, Anna-Maria,
> > > > > 
> > > > > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > > > > which are addressed with a pair of patches.  However, there is still one
> > > > > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > > > > This takes from two to four hours to trigger, and the most recent one
> > > > > gives the following diagnostic (see patch at the very end of this email):
> > > > > 
> > > > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > > > > 
> > > > > The message appears any time a timer for less than five jiffies takes
> > > > > more than eight seconds to expire.  In all cases, this expiry did not
> > > > > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > > > > stall code.  If I am interpreting this message correctly, base->clk
> > > > > has advanced past this timer, and the timer is correctly flagged in
> > > > > base->pending_map.  This seems like part of the problem because the
> > > > > timer's timeout was only three jiffies.  However, base->clk has not
> > > > > advanced for more than 20 seconds, which seems like another problem.
> > > > > 
> > > > > What additional diagnostics would be helpful?  I can capture data
> > > > > at the beginning of the schedule_timeout in the timer_list structure,
> > > > > and of course can print more information at the time of the wakeup.
> > > > 
> > > > And on the off-chance that they are helpful, here are the messages from
> > > > this weekend's runs.  One interesting point is that starting at
> > > > time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> > > > but CPU 5's ->clk does not advance.  Things seem to advance at
> > > > time 4891.721190.
> > > 
> > > Another layer on the onion...  For at least some of the failures, there
> > > is a stalled CPU-hotplug operation.  This of course can mean that the
> > > timers are stuck on the partially offlined CPU.  So I dumped the stack
> > > of the task that is taking the CPU offline.  Please see below for console
> > > output and patch.
> > > 
> > > I am considering doing an unsolicited wakeup of the task doing the
> > > hotplug operation, but I am not convinced that the entirety of the CPU
> > > hotplug code is willing to put up with that sort of thing.
> > 
> > What I did instead was to dump out the state of the task that
> > __cpuhp_kick_ap() waits on; please see the patch at the very end of this
> > email.  This triggered as shown below, and you guessed it, that task is
> > waiting on a grace period.  Which I am guessing won't happen until the
> > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > Which will prevent RCU's grace-period kthread from ever awakening, which
> > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > 
> > Deadlock.
> 
> There is one thing I'm confused about here. Sure, this is a deadlock, but the
> timer should still work in such a deadlock, right? I mean, the timer of
> schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> this case? And yes, the gp kthread will continue to wait due to the
> deadlock, but the deadlock cannot explain the "Waylayed timer", right?

My belief is that the timer cannot fire because it is still on the
offlined CPU, and that CPU has not yet reached timers_dead_cpu().
But I might be missing something subtle in either the timers code or the
CPU-hotplug code, so please do check my reasoning here.  (I am relying on
the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
means that the timer is on CPU 7 and that it is CPU 7 that is in the
process of going offline.)

The "Waylayed timer" happens because the RCU CPU stall warning code
wakes up the grace-period kthread.  This is driven out of the
scheduling-clock tick, so is unaffected by timers, though it does
rely on the jiffies counter continuing to be incremented.

So what am I missing here?

							Thanx, Paul

> Regards,
> Boqun
> 
> > Does that make sense, or am I missing a trick here?
> > 
> > I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but
> > that triggered the BUG_ON(cpu_online(cpu)).  I removed this BUG_ON(),
> > and appear to have deadlocked on the timer ->lock.
> > 
> > Any thoughts on what else to try?
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > [ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 000000000000000800000000000000000002000000000000020080400002000868000000000000010000000001000000000000000000000000000000000000000000000000000000
> > [ 2939.381219] Torture onoff task state:
> > [ 2939.381221] torture_onoff   D14584   842      2 0x80000000
> > [ 2939.381239] Call Trace:
> > [ 2939.381245]  ? __schedule+0x33c/0x6f0
> > [ 2939.381248]  ? preempt_count_add+0x51/0x90
> > [ 2939.381250]  schedule+0x37/0x90
> > [ 2939.381270]  schedule_timeout+0x20d/0x4c0
> > [ 2939.381276]  wait_for_completion+0x95/0x100
> > [ 2939.381282]  ? wake_up_q+0x60/0x60
> > [ 2939.381299]  __cpuhp_kick_ap+0x49/0x60
> > [ 2939.381313]  cpuhp_kick_ap+0x29/0x70
> > [ 2939.381316]  cpuhp_kick_ap_work+0x34/0x130
> > [ 2939.381322]  _cpu_down+0x17e/0x1f0
> > [ 2939.381326]  do_cpu_down+0x30/0x50
> > [ 2939.381354]  torture_offline+0x76/0x140
> > [ 2939.381368]  torture_onoff+0xef/0x1c0
> > [ 2939.381371]  kthread+0xf0/0x130
> > [ 2939.381374]  ? torture_kthread_stopping+0x70/0x70
> > [ 2939.381376]  ? kthread_destroy_worker+0x40/0x40
> > [ 2939.381378]  ret_from_fork+0x1f/0x30
> > [ 2939.381381] __cpuhp_kick_ap task state:
> > [ 2939.381383] cpuhp/7         D14504    46      2 0x80000000
> > [ 2939.381399] Call Trace:
> > [ 2939.381403]  ? __schedule+0x33c/0x6f0
> > [ 2939.381406]  schedule+0x37/0x90
> > [ 2939.381408]  schedule_timeout+0x20d/0x4c0
> > [ 2939.381414]  ? enqueue_task_fair+0x556/0x12b0
> > [ 2939.381417]  wait_for_completion+0x95/0x100
> > [ 2939.381419]  ? wake_up_q+0x60/0x60
> > [ 2939.381431]  __wait_rcu_gp+0x10d/0x140
> > [ 2939.381441]  ? sched_cpu_activate+0xc0/0xc0
> > [ 2939.381443]  sched_cpu_deactivate+0x36/0xa0
> > [ 2939.381453]  ? kfree_call_rcu+0x20/0x20
> > [ 2939.381454]  ? __call_rcu+0x530/0x530
> > [ 2939.381457]  ? __rcu_read_unlock+0x50/0x50
> > [ 2939.381459]  ? __rcu_read_unlock+0x50/0x50
> > [ 2939.381462]  cpuhp_invoke_callback+0xa8/0x610
> > [ 2939.381466]  cpuhp_thread_fun+0xc5/0x150
> > [ 2939.381469]  smpboot_thread_fn+0x15c/0x220
> > [ 2939.381476]  kthread+0xf0/0x130
> > [ 2939.381478]  ? sort_range+0x20/0x20
> > [ 2939.381480]  ? kthread_destroy_worker+0x40/0x40
> > [ 2939.381482]  ret_from_fork+0x1f/0x30
> > 
> > ------------------------------------------------------------------------
> > 
> > commit aa2b802cb432a725589884d2ba47833de21e0871
> > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Date:   Wed Dec 6 09:46:30 2017 -0800
> > 
> >     EXP: timer: Dump __cpuhp_kick_ap task state
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > index 04892a82f6ac..dc9098946b3a 100644
> > --- a/kernel/cpu.c
> > +++ b/kernel/cpu.c
> > @@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> >  	st->bringup = !st->bringup;
> >  }
> >  
> > +void schedule_timeout_set_task3dump(struct task_struct *t);
> > +
> >  /* Regular hotplug invocation of the AP hotplug thread */
> >  static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
> >  {
> > @@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
> >  	smp_mb();
> >  	st->should_run = true;
> >  	wake_up_process(st->thread);
> > +	schedule_timeout_set_task3dump(st->thread);
> >  	wait_for_ap_thread(st, st->bringup);
> > +	schedule_timeout_set_task3dump(NULL);
> >  }
> >  
> >  static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > index 0a67df4cdeb6..532b43d05219 100644
> > --- a/kernel/time/timer.c
> > +++ b/kernel/time/timer.c
> > @@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *t)
> >  }
> >  
> >  static struct task_struct *schedule_timeout_task2dump;
> > +static struct task_struct *schedule_timeout_task3dump;
> >  static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
> >  void schedule_timeout_set_task2dump(struct task_struct *t)
> >  {
> > @@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct task_struct *t)
> >  	WRITE_ONCE(schedule_timeout_task2dump, t);
> >  	mutex_unlock(&schedule_timeout_task2dump_mutex);
> >  }
> > -EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
> > +void schedule_timeout_set_task3dump(struct task_struct *t)
> > +{
> > +	mutex_lock(&schedule_timeout_task2dump_mutex);
> > +	WRITE_ONCE(schedule_timeout_task3dump, t);
> > +	mutex_unlock(&schedule_timeout_task2dump_mutex);
> > +}
> >  
> >  /**
> >   * schedule_timeout - sleep until timeout
> > @@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed long timeout)
> >  				pr_info("Torture onoff task state:\n");
> >  				sched_show_task(schedule_timeout_task2dump);
> >  			}
> > +			if (schedule_timeout_task3dump) {
> > +				pr_info("__cpuhp_kick_ap task state:\n");
> > +				sched_show_task(schedule_timeout_task3dump);
> > +			}
> >  			mutex_unlock(&schedule_timeout_task2dump_mutex);
> >  		}
> >  	}
> > 


* Re: Timer refuses to expire
  2017-12-07 14:56         ` Paul E. McKenney
@ 2017-12-07 21:45           ` Paul E. McKenney
  2017-12-08  0:31             ` Paul E. McKenney
  2017-12-08  1:06           ` Boqun Feng
  1 sibling, 1 reply; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-07 21:45 UTC (permalink / raw)
  To: Boqun Feng; +Cc: anna-maria, tglx, linux-kernel

On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:

[ . . . ]

> > > What I did instead was to dump out the state of the task that
> > > __cpuhp_kick_ap() waits on, please see the patch at the very end of this
> > > email.  This triggered as shown below, and you guessed it, that task is
> > > waiting on a grace period.  Which I am guessing won't happen until the
> > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > > Which will prevent RCU's grace-period kthread from ever awakening, which
> > > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > > 
> > > Deadlock.
> > 
> > There is one thing I'm confused about here. Sure, this is a deadlock, but
> > the timer should still work in such a deadlock, right? I mean, the timer of
> > schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> > this case? And yes, the gp kthread will continue to wait due to the
> > deadlock, but the deadlock cannot explain the "Waylayed timer", right?
> 
> My belief is that the timer cannot fire because it is still on the
> offlined CPU, and that CPU has not yet reached timers_dead_cpu().
> But I might be missing something subtle in either the timers code or the
> CPU-hotplug code, so please do check my reasoning here.  (I am relying on
> the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
> means that the timer is on CPU 7 and that it is CPU 7 that is in the
> process of going offline.)
> 
> The "Waylayed timer" happens because the RCU CPU stall warning code
> wakes up the grace-period kthread.  This is driven out of the
> scheduling-clock tick, so is unaffected by timers, though it does
> rely on the jiffies counter continuing to be incremented.
> 
> So what am I missing here?

Well, last night's runs had situations where the ->flags CPU didn't
match the CPU going offline, so I am clearly missing something or another.

One thing I might have been missing was the CPU-online processing.
What happens if a CPU goes offline, comes back online, but before ->clk
gets adjusted there is a schedule_timeout()?  Now, schedule_timeout()
does compute the absolute ->expires time using jiffies, so the wakeup time
should not be too far off of the desired time.  Except that the timers
now have something like 8% slop, and that slop will be calculated on the
difference between the desired expiration time and the (way outdated)
->clk value.  So the 8% might be a rather large number.  For example,
if the CPU was offline for 12 minutes (unlikely but entirely possible
with rcutorture testing's random onlining and offlining), the slop on
a 3-millisecond timer might be a full minute.

To my timer-naive eyes, it looks like a simple fix is to set
old_base->must_forward_clk to true in timers_dead_cpu() for each
timer_base, as shown below.  The other possibility that I considered
was to instead set ->is_idle, but that looked like an engraved
invitation to send IPIs to offline CPUs.

I am giving it a spin.  I still believe that the offline deadlock
scenario can happen, but one thing at a time...

Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

commit e7e66e48125cba05b52242deb5b3fcb787aafe0e
Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Date:   Thu Dec 7 13:18:44 2017 -0800

    timers: Ensure that timer_base ->clk accounts for time offline
    
    The timer_base ->must_forward_clk is set to indicate that the next timer
    operation on that timer_base must check for passage of time.  One instance
    of time passage is when the timer wheel goes idle, and another is when
    the corresponding CPU is offline.  Note that it is not appropriate to set
    ->is_idle because that could result in IPIing an offline CPU.  Therefore,
    this commit instead sets ->must_forward_clk at CPU-offline time.
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index fdc086129682..0ab767c2bff2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1921,6 +1921,7 @@ int timers_dead_cpu(unsigned int cpu)
 
 		BUG_ON(old_base->running_timer);
 
+		old_base->must_forward_clk = true;
 		for (i = 0; i < WHEEL_SIZE; i++)
 			migrate_timer_list(new_base, old_base->vectors + i);
 


* Re: Timer refuses to expire
  2017-12-07 21:45           ` Paul E. McKenney
@ 2017-12-08  0:31             ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-08  0:31 UTC (permalink / raw)
  To: Boqun Feng; +Cc: anna-maria, tglx, linux-kernel

On Thu, Dec 07, 2017 at 01:45:14PM -0800, Paul E. McKenney wrote:
> On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:
> 
> [ . . . ]
> 
> > > > What I did instead was to dump out the state of the task that
> > > > __cpuhp_kick_ap() waits on, please see the patch at the very end of this
> > > > email.  This triggered as shown below, and you guessed it, that task is
> > > > waiting on a grace period.  Which I am guessing won't happen until the
> > > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > > > Which will prevent RCU's grace-period kthread from ever awakening, which
> > > > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > > > 
> > > > Deadlock.
> > > 
> > > There is one thing I'm confused about here. Sure, this is a deadlock, but
> > > the timer should still work in such a deadlock, right? I mean, the timer of
> > > schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> > > this case? And yes, the gp kthread will continue to wait due to the
> > > deadlock, but the deadlock cannot explain the "Waylayed timer", right?
> > 
> > My belief is that the timer cannot fire because it is still on the
> > offlined CPU, and that CPU has not yet reached timers_dead_cpu().
> > But I might be missing something subtle in either the timers code or the
> > CPU-hotplug code, so please do check my reasoning here.  (I am relying on
> > the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
> > means that the timer is on CPU 7 and that it is CPU 7 that is in the
> > process of going offline.)
> > 
> > The "Waylayed timer" happens because the RCU CPU stall warning code
> > wakes up the grace-period kthread.  This is driven out of the
> > scheduling-clock tick, so is unaffected by timers, though it does
> > rely on the jiffies counter continuing to be incremented.
> > 
> > So what am I missing here?
> 
> Well, last night's runs had situations where the ->flags CPU didn't
> match the CPU going offline, so I am clearly missing something or another.
> 
> One thing I might have been missing was the CPU-online processing.
> What happens if a CPU goes offline, comes back online, but before ->clk
> gets adjusted there is a schedule_timeout()?  Now, schedule_timeout()
> does compute the absolute ->expires time using jiffies, so the wakeup time
> should not be too far off of the desired time.  Except that the timers
> now have something like 8% slop, and that slop will be calculated on the
> difference between the desired expiration time and the (way outdated)
> ->clk value.  So the 8% might be a rather large number.  For example,
> if the CPU was offline for 12 minutes (unlikely but entirely possible
> with rcutorture testing's random onlining and offlining), the slop on
> a 3-millisecond timer might be a full minute.
> 
> To my timer-naive eyes, it looks like a simple fix is to set
> old_base->must_forward_clk to true in timers_dead_cpu() for each
> timer_base, as shown below.  The other possibility that I considered
> was to instead set ->is_idle, but that looked like an engraved
> invitation to send IPIs to offline CPUs.
> 
> I am giving it a spin.  I still believe that the offline deadlock
> scenario can happen, but one thing at a time...
> 
> Thoughts?

And it is hard to tell whether or not this is helping.  Not too
surprising, given that most of the splats seem to be the deadlock
case instead.

							Thanx, Paul


* Re: Timer refuses to expire
  2017-12-07 14:56         ` Paul E. McKenney
  2017-12-07 21:45           ` Paul E. McKenney
@ 2017-12-08  1:06           ` Boqun Feng
  2017-12-08  1:26             ` Paul E. McKenney
  1 sibling, 1 reply; 11+ messages in thread
From: Boqun Feng @ 2017-12-08  1:06 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: anna-maria, tglx, linux-kernel


On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:
> > Hi Paul,
> > 
> > On Wed, Dec 06, 2017 at 02:04:21PM -0800, Paul E. McKenney wrote:
> > > On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote:
> > > > On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > > > > > Hello, Anna-Maria,
> > > > > > 
> > > > > > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > > > > > which are addressed with a pair of patches.  However, there is still one
> > > > > > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > > > > > This takes from two to four hours to trigger, and the most recent one
> > > > > > gives the following diagnostic (see patch at the very end of this email):
> > > > > > 
> > > > > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > > > > > 
> > > > > > The message appears any time a timer for less than five jiffies takes
> > > > > > more than eight seconds to expire.  In all cases, this expiry did not
> > > > > > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > > > > > stall code.  If I am interpreting this message correctly, base->clk
> > > > > > has advanced past this timer, and the timer is correctly flagged in
> > > > > > base->pending_map.  This seems like part of the problem because the
> > > > > > timer's timeout was only three jiffies.  However, base->clk has not
> > > > > > advanced for more than 20 seconds, which seems like another problem.
> > > > > > 
> > > > > > What additional diagnostics would be helpful?  I can capture data
> > > > > > at the beginning of the schedule_timeout in the timer_list structure,
> > > > > > and of course can print more information at the time of the wakeup.
> > > > > 
> > > > > And on the off-chance that the following messages from this weekend's
> > > > > runs are at all helpful.  One interesting point is that starting at
> > > > > time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> > > > > but CPU 5's ->clk does not advance.  Things seem to advance at
> > > > > time 4891.721190.
> > > > 
> > > > Another layer on the onion...  For at least some of the failures, there
> > > > is a stalled CPU-hotplug operation.  This of course can mean that the
> > > > timers are stuck on the partially offlined CPU.  So I dumped the stack
> > > > of the task that is taking the CPU offline.  Please see below for console
> > > > output and patch.
> > > > 
> > > > I am considering doing an unsolicited wakeup of the task doing the
> > > > hotplug operation, but I am not convinced that the entirety of the CPU
> > > > hotplug code is willing to put up with that sort of thing.
> > > 
> > > What I did instead was to dump out the state of the task that
> > > __cpuhp_kick_ap() waits on, please see the patch at the very end of this
> > > email.  This triggered as shown below, and you guessed it, that task is
> > > waiting on a grace period.  Which I am guessing won't happen until the
> > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > > Which will prevent RCU's grace-period kthread from ever awakening, which
> > > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > > 
> > > Deadlock.
> > 
> > There is one thing I'm confused about here. Sure, this is a deadlock, but
> > the timer should still work in such a deadlock, right? I mean, the timer of
> > schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> > this case? And yes, the gp kthread will continue to wait due to the
> > deadlock, but the deadlock cannot explain the "Waylayed timer", right?
> 
> My belief is that the timer cannot fire because it is still on the
> offlined CPU, and that CPU has not yet reached timers_dead_cpu().
> But I might be missing something subtle in either the timers code or the
> CPU-hotplug code, so please do check my reasoning here.  (I am relying on
> the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
> means that the timer is on CPU 7 and that it is CPU 7 that is in the
> process of going offline.)
> 

So if CPU 7 is going offline and has not yet reached timers_dead_cpu(),
the timer on CPU 7 cannot fire? I'm not sure this is true: because
timers_dead_cpu() has not been reached, hrtimers and softirqs on
CPU 7 should still be working, so the timer base should still be working
well on CPU 7, no? I didn't find a place where we "disable" the timer base
on the hotplug path. But maybe I'm missing something...

> The "Waylayed timer" happens because the RCU CPU stall warning code
> wakes up the grace-period kthread.  This is driven out of the
> scheduling-clock tick, so is unaffected by timers, though it does
> rely on the jiffies counter continuing to be incremented.
> 

I mean that if the timer base works well, there should be no "Waylayed
timer": the timer of schedule_timeout() would wake up the gp kthread
itself, instead of the RCU CPU stall code doing so.

Regards,
Boqun

> So what am I missing here?
> 
> 							Thanx, Paul
> 
> > Regards,
> > Boqun
> > 
> > > Does that make sense, or am I missing a trick here?
> > > 
> > > I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but
> > > that triggered the BUG_ON(cpu_online(cpu)).  I removed this BUG_ON(),
> > > and appear to have deadlocked on the timer ->lock.
> > > 
> > > Any thoughts on what else to try?
> > > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > [ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 000000000000000800000000000000000002000000000000020080400002000868000000000000010000000001000000000000000000000000000000000000000000000000000000
> > > [ 2939.381219] Torture onoff task state:
> > > [ 2939.381221] torture_onoff   D14584   842      2 0x80000000
> > > [ 2939.381239] Call Trace:
> > > [ 2939.381245]  ? __schedule+0x33c/0x6f0
> > > [ 2939.381248]  ? preempt_count_add+0x51/0x90
> > > [ 2939.381250]  schedule+0x37/0x90
> > > [ 2939.381270]  schedule_timeout+0x20d/0x4c0
> > > [ 2939.381276]  wait_for_completion+0x95/0x100
> > > [ 2939.381282]  ? wake_up_q+0x60/0x60
> > > [ 2939.381299]  __cpuhp_kick_ap+0x49/0x60
> > > [ 2939.381313]  cpuhp_kick_ap+0x29/0x70
> > > [ 2939.381316]  cpuhp_kick_ap_work+0x34/0x130
> > > [ 2939.381322]  _cpu_down+0x17e/0x1f0
> > > [ 2939.381326]  do_cpu_down+0x30/0x50
> > > [ 2939.381354]  torture_offline+0x76/0x140
> > > [ 2939.381368]  torture_onoff+0xef/0x1c0
> > > [ 2939.381371]  kthread+0xf0/0x130
> > > [ 2939.381374]  ? torture_kthread_stopping+0x70/0x70
> > > [ 2939.381376]  ? kthread_destroy_worker+0x40/0x40
> > > [ 2939.381378]  ret_from_fork+0x1f/0x30
> > > [ 2939.381381] __cpuhp_kick_ap task state:
> > > [ 2939.381383] cpuhp/7         D14504    46      2 0x80000000
> > > [ 2939.381399] Call Trace:
> > > [ 2939.381403]  ? __schedule+0x33c/0x6f0
> > > [ 2939.381406]  schedule+0x37/0x90
> > > [ 2939.381408]  schedule_timeout+0x20d/0x4c0
> > > [ 2939.381414]  ? enqueue_task_fair+0x556/0x12b0
> > > [ 2939.381417]  wait_for_completion+0x95/0x100
> > > [ 2939.381419]  ? wake_up_q+0x60/0x60
> > > [ 2939.381431]  __wait_rcu_gp+0x10d/0x140
> > > [ 2939.381441]  ? sched_cpu_activate+0xc0/0xc0
> > > [ 2939.381443]  sched_cpu_deactivate+0x36/0xa0
> > > [ 2939.381453]  ? kfree_call_rcu+0x20/0x20
> > > [ 2939.381454]  ? __call_rcu+0x530/0x530
> > > [ 2939.381457]  ? __rcu_read_unlock+0x50/0x50
> > > [ 2939.381459]  ? __rcu_read_unlock+0x50/0x50
> > > [ 2939.381462]  cpuhp_invoke_callback+0xa8/0x610
> > > [ 2939.381466]  cpuhp_thread_fun+0xc5/0x150
> > > [ 2939.381469]  smpboot_thread_fn+0x15c/0x220
> > > [ 2939.381476]  kthread+0xf0/0x130
> > > [ 2939.381478]  ? sort_range+0x20/0x20
> > > [ 2939.381480]  ? kthread_destroy_worker+0x40/0x40
> > > [ 2939.381482]  ret_from_fork+0x1f/0x30
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > > commit aa2b802cb432a725589884d2ba47833de21e0871
> > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > Date:   Wed Dec 6 09:46:30 2017 -0800
> > > 
> > >     EXP: timer: Dump __cpuhp_kick_ap task state
> > >     
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > > index 04892a82f6ac..dc9098946b3a 100644
> > > --- a/kernel/cpu.c
> > > +++ b/kernel/cpu.c
> > > @@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > >  	st->bringup = !st->bringup;
> > >  }
> > >  
> > > +void schedule_timeout_set_task3dump(struct task_struct *t);
> > > +
> > >  /* Regular hotplug invocation of the AP hotplug thread */
> > >  static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
> > >  {
> > > @@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
> > >  	smp_mb();
> > >  	st->should_run = true;
> > >  	wake_up_process(st->thread);
> > > +	schedule_timeout_set_task3dump(st->thread);
> > >  	wait_for_ap_thread(st, st->bringup);
> > > +	schedule_timeout_set_task3dump(NULL);
> > >  }
> > >  
> > >  static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > > index 0a67df4cdeb6..532b43d05219 100644
> > > --- a/kernel/time/timer.c
> > > +++ b/kernel/time/timer.c
> > > @@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *t)
> > >  }
> > >  
> > >  static struct task_struct *schedule_timeout_task2dump;
> > > +static struct task_struct *schedule_timeout_task3dump;
> > >  static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
> > >  void schedule_timeout_set_task2dump(struct task_struct *t)
> > >  {
> > > @@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct task_struct *t)
> > >  	WRITE_ONCE(schedule_timeout_task2dump, t);
> > >  	mutex_unlock(&schedule_timeout_task2dump_mutex);
> > >  }
> > > -EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
> > > +void schedule_timeout_set_task3dump(struct task_struct *t)
> > > +{
> > > +	mutex_lock(&schedule_timeout_task2dump_mutex);
> > > +	WRITE_ONCE(schedule_timeout_task3dump, t);
> > > +	mutex_unlock(&schedule_timeout_task2dump_mutex);
> > > +}
> > >  
> > >  /**
> > >   * schedule_timeout - sleep until timeout
> > > @@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed long timeout)
> > >  				pr_info("Torture onoff task state:\n");
> > >  				sched_show_task(schedule_timeout_task2dump);
> > >  			}
> > > +			if (schedule_timeout_task3dump) {
> > > +				pr_info("__cpuhp_kick_ap task state:\n");
> > > +				sched_show_task(schedule_timeout_task3dump);
> > > +			}
> > >  			mutex_unlock(&schedule_timeout_task2dump_mutex);
> > >  		}
> > >  	}
> > > 
> 
> 



* Re: Timer refuses to expire
  2017-12-08  1:06           ` Boqun Feng
@ 2017-12-08  1:26             ` Paul E. McKenney
  0 siblings, 0 replies; 11+ messages in thread
From: Paul E. McKenney @ 2017-12-08  1:26 UTC (permalink / raw)
  To: Boqun Feng; +Cc: anna-maria, tglx, linux-kernel

On Fri, Dec 08, 2017 at 09:06:40AM +0800, Boqun Feng wrote:
> On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:
> > > Hi Paul,
> > > 
> > > On Wed, Dec 06, 2017 at 02:04:21PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Dec 05, 2017 at 03:37:44PM -0800, Paul E. McKenney wrote:
> > > > > On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > > > > > > Hello, Anna-Maria,
> > > > > > > 
> > > > > > > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > > > > > > which are addressed with a pair of patches.  However, there is still one
> > > > > > > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > > > > > > This takes from two to four hours to trigger, and the most recent one
> > > > > > > gives the following diagnostic (see patch at the very end of this email):
> > > > > > > 
> > > > > > > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > > > > > > 
> > > > > > > The message appears any time a timer for less than five jiffies takes
> > > > > > > more than eight seconds to expire.  In all cases, this expiry did not
> > > > > > > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > > > > > > stall code.  If I am interpreting this message correctly, base->clk
> > > > > > > has advanced past this timer, and the timer is correctly flagged in
> > > > > > > base->pending_map.  This seems like part of the problem because the
> > > > > > > timer's timeout was only three jiffies.  However, base->clk has not
> > > > > > > advanced for more than 20 seconds, which seems like another problem.
> > > > > > > 
> > > > > > > What additional diagnostics would be helpful?  I can capture data
> > > > > > > at the beginning of the schedule_timeout in the timer_list structure,
> > > > > > > and of course can print more information at the time of the wakeup.
> > > > > > 
> > > > > > And on the off-chance that the following messages from this weekend's
> > > > > > runs are at all helpful.  One interesting point is that starting at
> > > > > > time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> > > > > > but CPU 5's ->clk does not advance.  Things seem to advance at
> > > > > > time 4891.721190.
> > > > > 
> > > > > Another layer on the onion...  For at least some of the failures, there
> > > > > is a stalled CPU-hotplug operation.  This of course can mean that the
> > > > > timers are stuck on the partially offlined CPU.  So I dumped the stack
> > > > > of the task that is taking the CPU offline.  Please see below for console
> > > > > output and patch.
> > > > > 
> > > > > I am considering doing an unsolicited wakeup of the task doing the
> > > > > hotplug operation, but I am not convinced that the entirety of the CPU
> > > > > hotplug code is willing to put up with that sort of thing.
> > > > 
> > > > What I did instead was to dump out the state of the task that
> > > > __cpuhp_kick_ap() waits on, please see the patch at the very end of this
> > > > email.  This triggered as shown below, and you guessed it, that task is
> > > > waiting on a grace period.  Which I am guessing won't happen until the
> > > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > > > Which will prevent RCU's grace-period kthread from ever awakening, which
> > > > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > > > 
> > > > Deadlock.
> > > 
> > > There is one thing I'm confused about here. Sure, this is a deadlock, but
> > > the timer should still work in such a deadlock, right? I mean, the timer of
> > > schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> > > this case? And yes, the gp kthread will continue to wait due to the
> > > deadlock, but the deadlock cannot explain the "Waylayed timer", right?
> > 
> > My belief is that the timer cannot fire because it is still on the
> > offlined CPU, and that CPU has not yet reached timers_dead_cpu().
> > But I might be missing something subtle in either the timers code or the
> > CPU-hotplug code, so please do check my reasoning here.  (I am relying on
> > the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
> > means that the timer is on CPU 7 and that it is CPU 7 that is in the
> > process of going offline.)
> 
> So if CPU 7 is going offline and has not yet reached timers_dead_cpu(),
> the timer on CPU 7 cannot fire? I'm not sure this is true: because
> timers_dead_cpu() has not been reached, hrtimers and softirqs on
> CPU 7 should still be working, so the timer base should still be working
> well on CPU 7, no? I didn't find a place where we "disable" the timer base
> on the hotplug path. But maybe I'm missing something...

It clearly isn't waking up the grace-period kthread, whose timer is
on a CPU 7 timer wheel.  But your point about interrupts and softirqs
still seeming like they would be working at that time is a good one.

> > The "Waylayed timer" happens because the RCU CPU stall warning code
> > wakes up the grace-period kthread.  This is driven out of the
> > scheduling-clock tick, so is unaffected by timers, though it does
> > rely on the jiffies counter continuing to be incremented.
> 
> I mean that if the timer base works well, there should be no "Waylayed
> timer": the timer of schedule_timeout() would wake up the gp kthread
> itself, instead of the RCU CPU stall code doing so.

I will agree that something is broken somewhere.  ;-)

							Thanx, Paul

> Regards,
> Boqun
> 
> > So what am I missing here?
> > 
> > 							Thanx, Paul
> > 
> > > Regards,
> > > Boqun
> > > 
> > > > Does that make sense, or am I missing a trick here?
> > > > 
> > > > I tried invoking timers_dead_cpu() from sched_cpu_deactivate(), but
> > > > that triggered the BUG_ON(cpu_online(cpu)).  I removed this BUG_ON(),
> > > > and appear to have deadlocked on the timer ->lock.
> > > > 
> > > > Any thoughts on what else to try?
> > > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > [ 2939.381210] schedule_timeout: Waylayed timer base->clk: 0x1002709c3 jiffies: 0x100284614 base->next_expiry: 0x1002709c3 timer->flags: 0x40000007 timer->expires 0x10027f3a2 idx: 100 idx_now: 105 base->pending_map 000000000000000800000000000000000002000000000000020080400002000868000000000000010000000001000000000000000000000000000000000000000000000000000000
> > > > [ 2939.381219] Torture onoff task state:
> > > > [ 2939.381221] torture_onoff   D14584   842      2 0x80000000
> > > > [ 2939.381239] Call Trace:
> > > > [ 2939.381245]  ? __schedule+0x33c/0x6f0
> > > > [ 2939.381248]  ? preempt_count_add+0x51/0x90
> > > > [ 2939.381250]  schedule+0x37/0x90
> > > > [ 2939.381270]  schedule_timeout+0x20d/0x4c0
> > > > [ 2939.381276]  wait_for_completion+0x95/0x100
> > > > [ 2939.381282]  ? wake_up_q+0x60/0x60
> > > > [ 2939.381299]  __cpuhp_kick_ap+0x49/0x60
> > > > [ 2939.381313]  cpuhp_kick_ap+0x29/0x70
> > > > [ 2939.381316]  cpuhp_kick_ap_work+0x34/0x130
> > > > [ 2939.381322]  _cpu_down+0x17e/0x1f0
> > > > [ 2939.381326]  do_cpu_down+0x30/0x50
> > > > [ 2939.381354]  torture_offline+0x76/0x140
> > > > [ 2939.381368]  torture_onoff+0xef/0x1c0
> > > > [ 2939.381371]  kthread+0xf0/0x130
> > > > [ 2939.381374]  ? torture_kthread_stopping+0x70/0x70
> > > > [ 2939.381376]  ? kthread_destroy_worker+0x40/0x40
> > > > [ 2939.381378]  ret_from_fork+0x1f/0x30
> > > > [ 2939.381381] __cpuhp_kick_ap task state:
> > > > [ 2939.381383] cpuhp/7         D14504    46      2 0x80000000
> > > > [ 2939.381399] Call Trace:
> > > > [ 2939.381403]  ? __schedule+0x33c/0x6f0
> > > > [ 2939.381406]  schedule+0x37/0x90
> > > > [ 2939.381408]  schedule_timeout+0x20d/0x4c0
> > > > [ 2939.381414]  ? enqueue_task_fair+0x556/0x12b0
> > > > [ 2939.381417]  wait_for_completion+0x95/0x100
> > > > [ 2939.381419]  ? wake_up_q+0x60/0x60
> > > > [ 2939.381431]  __wait_rcu_gp+0x10d/0x140
> > > > [ 2939.381441]  ? sched_cpu_activate+0xc0/0xc0
> > > > [ 2939.381443]  sched_cpu_deactivate+0x36/0xa0
> > > > [ 2939.381453]  ? kfree_call_rcu+0x20/0x20
> > > > [ 2939.381454]  ? __call_rcu+0x530/0x530
> > > > [ 2939.381457]  ? __rcu_read_unlock+0x50/0x50
> > > > [ 2939.381459]  ? __rcu_read_unlock+0x50/0x50
> > > > [ 2939.381462]  cpuhp_invoke_callback+0xa8/0x610
> > > > [ 2939.381466]  cpuhp_thread_fun+0xc5/0x150
> > > > [ 2939.381469]  smpboot_thread_fn+0x15c/0x220
> > > > [ 2939.381476]  kthread+0xf0/0x130
> > > > [ 2939.381478]  ? sort_range+0x20/0x20
> > > > [ 2939.381480]  ? kthread_destroy_worker+0x40/0x40
> > > > [ 2939.381482]  ret_from_fork+0x1f/0x30
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > > commit aa2b802cb432a725589884d2ba47833de21e0871
> > > > Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > Date:   Wed Dec 6 09:46:30 2017 -0800
> > > > 
> > > >     EXP: timer: Dump __cpuhp_kick_ap task state
> > > >     
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > diff --git a/kernel/cpu.c b/kernel/cpu.c
> > > > index 04892a82f6ac..dc9098946b3a 100644
> > > > --- a/kernel/cpu.c
> > > > +++ b/kernel/cpu.c
> > > > @@ -381,6 +381,8 @@ cpuhp_reset_state(struct cpuhp_cpu_state *st, enum cpuhp_state prev_state)
> > > >  	st->bringup = !st->bringup;
> > > >  }
> > > >  
> > > > +void schedule_timeout_set_task3dump(struct task_struct *t);
> > > > +
> > > >  /* Regular hotplug invocation of the AP hotplug thread */
> > > >  static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
> > > >  {
> > > > @@ -395,7 +397,9 @@ static void __cpuhp_kick_ap(struct cpuhp_cpu_state *st)
> > > >  	smp_mb();
> > > >  	st->should_run = true;
> > > >  	wake_up_process(st->thread);
> > > > +	schedule_timeout_set_task3dump(st->thread);
> > > >  	wait_for_ap_thread(st, st->bringup);
> > > > +	schedule_timeout_set_task3dump(NULL);
> > > >  }
> > > >  
> > > >  static int cpuhp_kick_ap(struct cpuhp_cpu_state *st, enum cpuhp_state target)
> > > > diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> > > > index 0a67df4cdeb6..532b43d05219 100644
> > > > --- a/kernel/time/timer.c
> > > > +++ b/kernel/time/timer.c
> > > > @@ -1725,6 +1725,7 @@ static void process_timeout(struct timer_list *t)
> > > >  }
> > > >  
> > > >  static struct task_struct *schedule_timeout_task2dump;
> > > > +static struct task_struct *schedule_timeout_task3dump;
> > > >  static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
> > > >  void schedule_timeout_set_task2dump(struct task_struct *t)
> > > >  {
> > > > @@ -1732,7 +1733,12 @@ void schedule_timeout_set_task2dump(struct task_struct *t)
> > > >  	WRITE_ONCE(schedule_timeout_task2dump, t);
> > > >  	mutex_unlock(&schedule_timeout_task2dump_mutex);
> > > >  }
> > > > -EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
> > > > +void schedule_timeout_set_task3dump(struct task_struct *t)
> > > > +{
> > > > +	mutex_lock(&schedule_timeout_task2dump_mutex);
> > > > +	WRITE_ONCE(schedule_timeout_task3dump, t);
> > > > +	mutex_unlock(&schedule_timeout_task2dump_mutex);
> > > > +}
> > > >  
> > > >  /**
> > > >   * schedule_timeout - sleep until timeout
> > > > @@ -1824,6 +1830,10 @@ signed long __sched schedule_timeout(signed long timeout)
> > > >  				pr_info("Torture onoff task state:\n");
> > > >  				sched_show_task(schedule_timeout_task2dump);
> > > >  			}
> > > > +			if (schedule_timeout_task3dump) {
> > > > +				pr_info("__cpuhp_kick_ap task state:\n");
> > > > +				sched_show_task(schedule_timeout_task3dump);
> > > > +			}
> > > >  			mutex_unlock(&schedule_timeout_task2dump_mutex);
> > > >  		}
> > > >  	}
> > > > 
> > 
> > 



Thread overview: 11+ messages
2017-12-01 18:25 Timer refuses to expire Paul E. McKenney
2017-12-04 17:34 ` Paul E. McKenney
2017-12-04 17:42 ` Paul E. McKenney
2017-12-05 23:37   ` Paul E. McKenney
2017-12-06 22:04     ` Paul E. McKenney
2017-12-07  7:03       ` Boqun Feng
2017-12-07 14:56         ` Paul E. McKenney
2017-12-07 21:45           ` Paul E. McKenney
2017-12-08  0:31             ` Paul E. McKenney
2017-12-08  1:06           ` Boqun Feng
2017-12-08  1:26             ` Paul E. McKenney
