* deadlock in scheduler enabling HRTICK feature
@ 2013-06-25 21:05 David Ahern
  2013-06-25 21:17 ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2013-06-25 21:05 UTC
  To: Peter Zijlstra, Ingo Molnar, LKML

Peter/Ingo:

I can reliably cause a deadlock in the scheduler by enabling the HRTICK
feature. I first hit the problem with 2.6.27 but have been able to
reproduce it with newer kernels. I have not tried the top of Linus' tree,
so perhaps this has been fixed in 3.10. The exact backtrace differs by
release, but the root cause is the same: the run queue lock is taken
early in the schedule path and is then wanted again on the softirq path
while it is still held.

Using Fedora 18 and the 3.9.6-200.fc18.x86_64 kernel as an example,

[root@f18 ~]# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY 
CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK 
LB_BIAS OWNER_SPIN NONTASK_POWER TTWU_QUEUE NO_FORCE_SD_OVERLAP 
RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE

[root@f18 ~]# echo HRTICK > /sys/kernel/debug/sched_features

[root@f18 ~]# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY 
CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER HRTICK NO_DOUBLE_TICK 
LB_BIAS OWNER_SPIN NONTASK_POWER TTWU_QUEUE NO_FORCE_SD_OVERLAP 
RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE
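
To check the state of a feature from a program rather than by eye, something like the following might do. It is only a sketch: sched_feature_state() is a hypothetical helper, not a kernel or libc function, and it assumes the file contents have already been read into a string. It compares whole tokens, since a plain substring search for "HRTICK" would also match "NO_HRTICK".

```c
#include <string.h>

/*
 * Hypothetical helper: inspect the text of sched_features.
 * Returns 1 if `name` appears as a whole token (feature enabled),
 * 0 if "NO_<name>" appears (feature disabled), and -1 if neither
 * token is present.
 */
int sched_feature_state(const char *features, const char *name)
{
    const char *delim = " \t\r\n";
    size_t name_len = strlen(name);
    const char *p = features;

    while (*p != '\0') {
        p += strspn(p, delim);              /* skip separators */
        size_t len = strcspn(p, delim);     /* length of this token */
        if (len == 0)
            break;                          /* only separators were left */
        if (len == name_len && strncmp(p, name, len) == 0)
            return 1;
        if (len == name_len + 3 && strncmp(p, "NO_", 3) == 0 &&
            strncmp(p + 3, name, name_len) == 0)
            return 0;
        p += len;                           /* move on to the next token */
    }
    return -1;
}
```

Given the two listings above, the first would report HRTICK as disabled and the second as enabled.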

For a workload, a simple kernel build suffices: 'make O=/tmp/kbuild -j 8'
on a 4-vCPU VM. The lockup occurs pretty quickly.

The relevant stack trace from the nmi watchdog:
...
[  219.467698]  <<EOE>>  [<ffffffff81093c61>] try_to_wake_up+0x1d1/0x2d0
[  219.467698]  [<ffffffff81043daf>] ? kvm_clock_read+0x1f/0x30
[  219.467698]  [<ffffffff81093dc7>] wake_up_process+0x27/0x50
[  219.467698]  [<ffffffff81066fc9>] wakeup_softirqd+0x29/0x30
[  219.467698]  [<ffffffff81067b95>] raise_softirq_irqoff+0x25/0x30
[  219.467698]  [<ffffffff810867c5>] __hrtimer_start_range_ns+0x3a5/0x400
[  219.467698]  [<ffffffff8109a089>] ? update_curr+0x99/0x170
[  219.467698]  [<ffffffff81086854>] hrtimer_start_range_ns+0x14/0x20
[  219.467698]  [<ffffffff81090bf0>] hrtick_start+0x90/0xa0
[  219.467698]  [<ffffffff810985f8>] hrtick_start_fair+0x88/0xd0
[  219.467698]  [<ffffffff81098f33>] hrtick_update+0x73/0x80
[  219.467698]  [<ffffffff8109c876>] enqueue_task_fair+0x346/0x550
[  219.467698]  [<ffffffff81090ab6>] enqueue_task+0x66/0x80
[  219.467698]  [<ffffffff81091443>] activate_task+0x23/0x30
[  219.467698]  [<ffffffff810917ac>] ttwu_do_activate.constprop.83+0x3c/0x70
[  219.467698]  [<ffffffff81093c6c>] try_to_wake_up+0x1dc/0x2d0
[  219.467698]  [<ffffffff81198898>] ? mem_cgroup_charge_common+0xa8/0x120
[  219.467698]  [<ffffffff81093d72>] default_wake_function+0x12/0x20
[  219.467698]  [<ffffffff810833fd>] autoremove_wake_function+0x1d/0x50
[  219.467698]  [<ffffffff8108b0e5>] __wake_up_common+0x55/0x90
[  219.467698]  [<ffffffff8108e973>] __wake_up_sync_key+0x53/0x80
...


You can see the nested calls to try_to_wake_up(), each of which has
called ttwu_queue(). The trouble spot is here in ttwu_queue():
     ...
     raw_spin_lock(&rq->lock);     <---- deadlock here on second call
     ttwu_do_activate(rq, p, 0);
     raw_spin_unlock(&rq->lock);
     ...
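
The pattern can be modelled in user space. What follows is a minimal sketch, not kernel code: struct toy_rq and the toy_* functions are made-up names, the lock is a C11 atomic_flag standing in for rq->lock, and it assumes both wakeups target the same run queue. The lock is tried once instead of being spun on, so the nested call reports failure where a real raw_spin_lock() would spin forever.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a run queue; atomic_flag models a non-recursive lock. */
struct toy_rq {
    atomic_flag lock;
    int nr_running;
};

/* Try the lock once so the sketch can report the problem instead of hanging. */
static bool toy_trylock(struct toy_rq *rq)
{
    return !atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire);
}

static void toy_unlock(struct toy_rq *rq)
{
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/* Models the nested wakeup (waking softirqd) reached while arming the timer. */
static bool toy_wake_softirqd(struct toy_rq *rq)
{
    if (!toy_trylock(rq))
        return false;           /* lock is held by our own caller */
    rq->nr_running++;
    toy_unlock(rq);
    return true;
}

/*
 * Models the outer wakeup: take the lock, enqueue, optionally arm the
 * timer (which triggers the nested wakeup), then drop the lock.
 * Returns false if the nested wakeup would have deadlocked.
 */
bool toy_wake_task(struct toy_rq *rq, bool arm_timer)
{
    bool ok = true;

    if (!toy_trylock(rq))
        return false;
    rq->nr_running++;
    if (arm_timer)
        ok = toy_wake_softirqd(rq);     /* re-enters with the lock held */
    toy_unlock(rq);
    return ok;
}
```

With arm_timer false (the NO_HRTICK case in this model) the wakeup completes; with it true, the nested call finds the lock already held by its own caller.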

David


end of thread, other threads:[~2013-07-12 13:30 UTC | newest]

Thread overview: 17+ messages
2013-06-25 21:05 deadlock in scheduler enabling HRTICK feature David Ahern
2013-06-25 21:17 ` Peter Zijlstra
2013-06-25 21:20   ` David Ahern
2013-06-26  7:05     ` Peter Zijlstra
2013-06-26 16:46       ` David Ahern
2013-06-27 10:43         ` Peter Zijlstra
2013-06-27 10:53           ` Peter Zijlstra
2013-06-27 12:28             ` Mike Galbraith
2013-06-27 13:06             ` Ingo Molnar
2013-06-27 19:18             ` Andy Lutomirski
2013-06-27 20:37               ` Peter Zijlstra
2013-06-27 22:28           ` David Ahern
2013-06-28  9:00             ` Ingo Molnar
2013-06-28  9:18               ` Peter Zijlstra
2013-07-12 13:29                 ` [tip:sched/core] sched: Fix HRTICK tip-bot for Peter Zijlstra
2013-06-28  9:09             ` deadlock in scheduler enabling HRTICK feature Peter Zijlstra
2013-06-28 17:28               ` David Ahern
