* Lockdep splat involving all_q_mutex
@ 2017-05-10 22:34 Paul E. McKenney
  2017-05-11  2:55 ` Jens Axboe
From: Paul E. McKenney @ 2017-05-10 22:34 UTC
  To: peterz, axboe; +Cc: linux-kernel, rostedt, tglx

Hello!

I got the lockdep splat shown below during some rcutorture testing (which
does CPU hotplug operations) on mainline at commit dc9edaab90de ("Merge
tag 'acpi-extra-4.12-rc1' of git://git.kernel.org/.../rafael/linux-pm").
My kneejerk reaction was just to reverse the "mutex_lock(&all_q_mutex);"
and "get_online_cpus();" in blk_mq_init_allocated_queue(), but then
I noticed that commit eabe06595d62 ("block/mq: Cure cpu hotplug lock
inversion") just got done moving these two statements in the other
direction.
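
(For reference, the two lock orders in the splat below boil down to
roughly the following; this is a heavily simplified, illustrative
sketch with everything except the locking elided, not the actual
blk-mq source.)

/* Queue setup path (simplified sketch): */
struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
						  struct request_queue *q)
{
	mutex_lock(&all_q_mutex);
	get_online_cpus();	/* records all_q_mutex -> cpu_hotplug.lock */
	/* ... map software to hardware queues, add q to all_q_list ... */
	put_online_cpus();
	mutex_unlock(&all_q_mutex);
	return q;
}

/* CPU-hotplug DEAD callback path (simplified sketch): the hotplug core
 * already holds cpu_hotplug.lock via cpu_hotplug_begin() here. */
static void blk_mq_queue_reinit_work(void)
{
	mutex_lock(&all_q_mutex);	/* records cpu_hotplug.lock -> all_q_mutex */
	/* ... remap queues for the departed CPU ... */
	mutex_unlock(&all_q_mutex);
}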

Acquiring the update-side CPU-hotplug lock across sched_feat_write()
seems like it might be an alternative to eabe06595d62, but I figured
I should check first.  Another approach would be to do the work in
blk_mq_queue_reinit_dead() asynchronously, for example, from a workqueue,
but I would have to know much more about blk_mq to know what effects
that would have.
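
(Purely for illustration, and untested: the workqueue idea might look
something like the sketch below, in which the blk_mq_queue_reinit_dead()
callback only kicks a work item and the work handler takes all_q_mutex
outside the hotplug path. The helper names are made up, and whether the
remap can safely run asynchronously is exactly what I don't know enough
about blk-mq to judge.)

static void blk_mq_reinit_workfn(struct work_struct *work)
{
	/* Runs outside the hotplug callback, so taking all_q_mutex here
	 * no longer nests inside cpu_hotplug.lock. */
	mutex_lock(&all_q_mutex);
	/* ... the remap work currently done from the DEAD callback ... */
	mutex_unlock(&all_q_mutex);
}

static DECLARE_WORK(blk_mq_reinit_work, blk_mq_reinit_workfn);

static int blk_mq_queue_reinit_dead(unsigned int cpu)
{
	/* Do not touch all_q_mutex while the hotplug machinery still
	 * holds cpu_hotplug.lock; just defer to the work item. */
	schedule_work(&blk_mq_reinit_work);
	return 0;
}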

Thoughts?

							Thanx, Paul

[   32.808758] ======================================================
[   32.810110] [ INFO: possible circular locking dependency detected ]
[   32.811468] 4.11.0+ #1 Not tainted
[   32.812190] -------------------------------------------------------
[   32.813626] torture_onoff/769 is trying to acquire lock:
[   32.814769]  (all_q_mutex){+.+...}, at: [<ffffffff93b884c3>] blk_mq_queue_reinit_work+0x13/0x110
[   32.816655] 
[   32.816655] but task is already holding lock:
[   32.817926]  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff9386014b>] cpu_hotplug_begin+0x6b/0xb0
[   32.819754] 
[   32.819754] which lock already depends on the new lock.
[   32.819754] 
[   32.821898] 
[   32.821898] the existing dependency chain (in reverse order) is:
[   32.823518] 
[   32.823518] -> #1 (cpu_hotplug.lock){+.+.+.}:
[   32.824788]        lock_acquire+0xd1/0x1b0
[   32.825728]        __mutex_lock+0x54/0x8a0
[   32.826629]        mutex_lock_nested+0x16/0x20
[   32.827585]        get_online_cpus+0x47/0x60
[   32.828519]        blk_mq_init_allocated_queue+0x398/0x4d0
[   32.829754]        blk_mq_init_queue+0x35/0x60
[   32.830766]        loop_add+0xe0/0x270
[   32.831595]        loop_init+0x10d/0x14b
[   32.832454]        do_one_initcall+0xef/0x160
[   32.833449]        kernel_init_freeable+0x1b6/0x23e
[   32.834517]        kernel_init+0x9/0x100
[   32.835363]        ret_from_fork+0x2e/0x40
[   32.836254] 
[   32.836254] -> #0 (all_q_mutex){+.+...}:
[   32.837635]        __lock_acquire+0x10bf/0x1350
[   32.838714]        lock_acquire+0xd1/0x1b0
[   32.839702]        __mutex_lock+0x54/0x8a0
[   32.840667]        mutex_lock_nested+0x16/0x20
[   32.841767]        blk_mq_queue_reinit_work+0x13/0x110
[   32.842987]        blk_mq_queue_reinit_dead+0x17/0x20
[   32.844164]        cpuhp_invoke_callback+0x1d1/0x770
[   32.845379]        cpuhp_down_callbacks+0x3d/0x80
[   32.846484]        _cpu_down+0xad/0xe0
[   32.847388]        do_cpu_down+0x39/0x50
[   32.848316]        cpu_down+0xb/0x10
[   32.849236]        torture_offline+0x75/0x140
[   32.850258]        torture_onoff+0x102/0x1e0
[   32.851278]        kthread+0x104/0x140
[   32.852158]        ret_from_fork+0x2e/0x40
[   32.853167] 
[   32.853167] other info that might help us debug this:
[   32.853167] 
[   32.855052]  Possible unsafe locking scenario:
[   32.855052] 
[   32.856442]        CPU0                    CPU1
[   32.857366]        ----                    ----
[   32.858429]   lock(cpu_hotplug.lock);
[   32.859289]                                lock(all_q_mutex);
[   32.860649]                                lock(cpu_hotplug.lock);
[   32.862148]   lock(all_q_mutex);
[   32.862910] 
[   32.862910]  *** DEADLOCK ***
[   32.862910] 
[   32.864289] 3 locks held by torture_onoff/769:
[   32.865386]  #0:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff938601f2>] do_cpu_down+0x22/0x50
[   32.867429]  #1:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff938600e0>] cpu_hotplug_begin+0x0/0xb0
[   32.869612]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff9386014b>] cpu_hotplug_begin+0x6b/0xb0
[   32.871700] 
[   32.871700] stack backtrace:
[   32.872727] CPU: 1 PID: 769 Comm: torture_onoff Not tainted 4.11.0+ #1
[   32.874299] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   32.876447] Call Trace:
[   32.877053]  dump_stack+0x67/0x97
[   32.877828]  print_circular_bug+0x1e3/0x250
[   32.878789]  __lock_acquire+0x10bf/0x1350
[   32.879701]  ? retint_kernel+0x10/0x10
[   32.880567]  lock_acquire+0xd1/0x1b0
[   32.881453]  ? lock_acquire+0xd1/0x1b0
[   32.882305]  ? blk_mq_queue_reinit_work+0x13/0x110
[   32.883400]  __mutex_lock+0x54/0x8a0
[   32.884215]  ? blk_mq_queue_reinit_work+0x13/0x110
[   32.885390]  ? kernfs_put+0x103/0x1a0
[   32.886227]  ? kernfs_put+0x103/0x1a0
[   32.887063]  ? blk_mq_queue_reinit_work+0x13/0x110
[   32.888158]  ? rcu_read_lock_sched_held+0x58/0x60
[   32.889287]  ? kmem_cache_free+0x1f7/0x260
[   32.890224]  ? anon_transport_class_unregister+0x20/0x20
[   32.891443]  ? kernfs_put+0x103/0x1a0
[   32.892274]  ? blk_mq_queue_reinit_work+0x110/0x110
[   32.893436]  mutex_lock_nested+0x16/0x20
[   32.894328]  ? mutex_lock_nested+0x16/0x20
[   32.895260]  blk_mq_queue_reinit_work+0x13/0x110
[   32.896307]  blk_mq_queue_reinit_dead+0x17/0x20
[   32.897425]  cpuhp_invoke_callback+0x1d1/0x770
[   32.898443]  ? __flow_cache_shrink+0x130/0x130
[   32.899453]  cpuhp_down_callbacks+0x3d/0x80
[   32.900402]  _cpu_down+0xad/0xe0
[   32.901213]  do_cpu_down+0x39/0x50
[   32.902002]  cpu_down+0xb/0x10
[   32.902716]  torture_offline+0x75/0x140
[   32.903603]  torture_onoff+0x102/0x1e0
[   32.904459]  kthread+0x104/0x140
[   32.905243]  ? torture_kthread_stopping+0x70/0x70
[   32.906316]  ? kthread_create_on_node+0x40/0x40
[   32.907351]  ret_from_fork+0x2e/0x40


* Re: Lockdep splat involving all_q_mutex
  2017-05-10 22:34 Lockdep splat involving all_q_mutex Paul E. McKenney
@ 2017-05-11  2:55 ` Jens Axboe
  2017-05-11  3:13   ` Paul E. McKenney
From: Jens Axboe @ 2017-05-11  2:55 UTC
  To: paulmck, peterz; +Cc: linux-kernel, rostedt, tglx

On 05/10/2017 04:34 PM, Paul E. McKenney wrote:
> Hello!
> 
> I got the lockdep splat shown below during some rcutorture testing (which
> does CPU hotplug operations) on mainline at commit dc9edaab90de ("Merge
> tag 'acpi-extra-4.12-rc1' of git://git.kernel.org/.../rafael/linux-pm").
> My kneejerk reaction was just to reverse the "mutex_lock(&all_q_mutex);"
> and "get_online_cpus();" in blk_mq_init_allocated_queue(), but then
> I noticed that commit eabe06595d62 ("block/mq: Cure cpu hotplug lock
> inversion") just got done moving these two statements in the other
> direction.

The problem is that that patch got merged too early, as it only
fixes a lockdep splat with the cpu hotplug rework. Fix is coming Linus'
way, it's in my for-linus tree.

-- 
Jens Axboe


* Re: Lockdep splat involving all_q_mutex
  2017-05-11  2:55 ` Jens Axboe
@ 2017-05-11  3:13   ` Paul E. McKenney
  2017-05-11 20:12     ` Jens Axboe
From: Paul E. McKenney @ 2017-05-11  3:13 UTC
  To: Jens Axboe; +Cc: peterz, linux-kernel, rostedt, tglx

On Wed, May 10, 2017 at 08:55:54PM -0600, Jens Axboe wrote:
> On 05/10/2017 04:34 PM, Paul E. McKenney wrote:
> > Hello!
> > 
> > I got the lockdep splat shown below during some rcutorture testing (which
> > does CPU hotplug operations) on mainline at commit dc9edaab90de ("Merge
> > tag 'acpi-extra-4.12-rc1' of git://git.kernel.org/.../rafael/linux-pm").
> > My kneejerk reaction was just to reverse the "mutex_lock(&all_q_mutex);"
> > and "get_online_cpus();" in blk_mq_init_allocated_queue(), but then
> > I noticed that commit eabe06595d62 ("block/mq: Cure cpu hotplug lock
> > inversion") just got done moving these two statements in the other
> > direction.
> 
> The problem is that that patch got merged too early, as it only
> fixes a lockdep splat with the cpu hotplug rework. Fix is coming Linus'
> way, it's in my for-linus tree.

Thank you for the update, looking forward to the fix.

							Thanx, Paul


* Re: Lockdep splat involving all_q_mutex
  2017-05-11  3:13   ` Paul E. McKenney
@ 2017-05-11 20:12     ` Jens Axboe
  2017-05-11 20:23       ` Paul E. McKenney
From: Jens Axboe @ 2017-05-11 20:12 UTC
  To: paulmck; +Cc: peterz, linux-kernel, rostedt, tglx

On 05/10/2017 09:13 PM, Paul E. McKenney wrote:
> On Wed, May 10, 2017 at 08:55:54PM -0600, Jens Axboe wrote:
>> On 05/10/2017 04:34 PM, Paul E. McKenney wrote:
>>> Hello!
>>>
>>> I got the lockdep splat shown below during some rcutorture testing (which
>>> does CPU hotplug operations) on mainline at commit dc9edaab90de ("Merge
>>> tag 'acpi-extra-4.12-rc1' of git://git.kernel.org/.../rafael/linux-pm").
>>> My kneejerk reaction was just to reverse the "mutex_lock(&all_q_mutex);"
>>> and "get_online_cpus();" in blk_mq_init_allocated_queue(), but then
>>> I noticed that commit eabe06595d62 ("block/mq: Cure cpu hotplug lock
>>> inversion") just got done moving these two statements in the other
>>> direction.
>>
>> The problem is that that patch got merged too early, as it only
>> fixes a lockdep splat with the cpu hotplug rework. Fix is coming Linus'
>> way, it's in my for-linus tree.
> 
> Thank you for the update, looking forward to the fix.

It's upstream now.

-- 
Jens Axboe


* Re: Lockdep splat involving all_q_mutex
  2017-05-11 20:12     ` Jens Axboe
@ 2017-05-11 20:23       ` Paul E. McKenney
  2017-05-12  5:02         ` Paul E. McKenney
From: Paul E. McKenney @ 2017-05-11 20:23 UTC
  To: Jens Axboe; +Cc: peterz, linux-kernel, rostedt, tglx

On Thu, May 11, 2017 at 02:12:39PM -0600, Jens Axboe wrote:
> On 05/10/2017 09:13 PM, Paul E. McKenney wrote:
> > On Wed, May 10, 2017 at 08:55:54PM -0600, Jens Axboe wrote:
> >> On 05/10/2017 04:34 PM, Paul E. McKenney wrote:
> >>> Hello!
> >>>
> >>> I got the lockdep splat shown below during some rcutorture testing (which
> >>> does CPU hotplug operations) on mainline at commit dc9edaab90de ("Merge
> >>> tag 'acpi-extra-4.12-rc1' of git://git.kernel.org/.../rafael/linux-pm").
> >>> My kneejerk reaction was just to reverse the "mutex_lock(&all_q_mutex);"
> >>> and "get_online_cpus();" in blk_mq_init_allocated_queue(), but then
> >>> I noticed that commit eabe06595d62 ("block/mq: Cure cpu hotplug lock
> >>> inversion") just got done moving these two statements in the other
> >>> direction.
> >>
> >> The problem is that that patch got merged too early, as it only
> >> fixes a lockdep splat with the cpu hotplug rework. Fix is coming Linus'
> >> way, it's in my for-linus tree.
> > 
> > Thank you for the update, looking forward to the fix.
> 
> It's upstream now.

Thank you, Jens!  I will test it this evening, Pacific Time.

							Thanx, Paul


* Re: Lockdep splat involving all_q_mutex
  2017-05-11 20:23       ` Paul E. McKenney
@ 2017-05-12  5:02         ` Paul E. McKenney
From: Paul E. McKenney @ 2017-05-12  5:02 UTC
  To: Jens Axboe; +Cc: peterz, linux-kernel, rostedt, tglx

On Thu, May 11, 2017 at 01:23:44PM -0700, Paul E. McKenney wrote:
> On Thu, May 11, 2017 at 02:12:39PM -0600, Jens Axboe wrote:
> > On 05/10/2017 09:13 PM, Paul E. McKenney wrote:
> > > On Wed, May 10, 2017 at 08:55:54PM -0600, Jens Axboe wrote:
> > >> On 05/10/2017 04:34 PM, Paul E. McKenney wrote:
> > >>> Hello!
> > >>>
> > >>> I got the lockdep splat shown below during some rcutorture testing (which
> > >>> does CPU hotplug operations) on mainline at commit dc9edaab90de ("Merge
> > >>> tag 'acpi-extra-4.12-rc1' of git://git.kernel.org/.../rafael/linux-pm").
> > >>> My kneejerk reaction was just to reverse the "mutex_lock(&all_q_mutex);"
> > >>> and "get_online_cpus();" in blk_mq_init_allocated_queue(), but then
> > >>> I noticed that commit eabe06595d62 ("block/mq: Cure cpu hotplug lock
> > >>> inversion") just got done moving these two statements in the other
> > >>> direction.
> > >>
> > >> The problem is that that patch got merged too early, as it only
> > >> fixes a lockdep splat with the cpu hotplug rework. Fix is coming Linus'
> > >> way, it's in my for-linus tree.
> > > 
> > > Thank you for the update, looking forward to the fix.
> > 
> > It's upstream now.
> 
> Thank you, Jens!  I will test it this evening, Pacific Time.

And no more lockdep splats, thank you!

							Thanx, Paul

