All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: xen-devel@lists.xenproject.org
Cc: linux-block@vger.kernel.org,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Juergen Gross" <jgross@suse.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>,
	"Christoph Hellwig" <hch@lst.de>
Subject: [BUG report] Deadlock in xen-blkfront upon device hot-unplug
Date: Thu, 15 Jul 2021 11:16:30 +0200	[thread overview]
Message-ID: <87pmvk0wep.fsf@vitty.brq.redhat.com> (raw)

I'm observing a deadlock every time I try to unplug a xen-blkfront
device. With 5.14-rc1+ the deadlock looks like:

[  513.682405] ============================================
[  513.686617] WARNING: possible recursive locking detected
[  513.691020] 5.14.0-rc1+ #370 Not tainted
[  513.694272] --------------------------------------------
[  513.698528] xenwatch/144 is trying to acquire lock:
[  513.702424] ffff96dc4a4c1d28 (&disk->open_mutex){+.+.}-{3:3}, at: del_gendisk+0x53/0x210
[  513.708768] 
               but task is already holding lock:
[  513.713320] ffff96dc4a4c1d28 (&disk->open_mutex){+.+.}-{3:3}, at: blkback_changed+0x118/0xeb9 [xen_blkfront]
[  513.720369] 
               other info that might help us debug this:
[  513.724901]  Possible unsafe locking scenario:

[  513.729241]        CPU0
[  513.731326]        ----
[  513.733404]   lock(&disk->open_mutex);
[  513.736679]   lock(&disk->open_mutex);
[  513.739988] 
                *** DEADLOCK ***

[  513.745524]  May be due to missing lock nesting notation

[  513.751438] 2 locks held by xenwatch/144:
[  513.755344]  #0: ffffffff8c9f3c70 (xenwatch_mutex){+.+.}-{3:3}, at: xenwatch_thread+0xe6/0x190
[  513.762137]  #1: ffff96dc4a4c1d28 (&disk->open_mutex){+.+.}-{3:3}, at: blkback_changed+0x118/0xeb9 [xen_blkfront]
[  513.770381] 
               stack backtrace:
[  513.774785] CPU: 1 PID: 144 Comm: xenwatch Not tainted 5.14.0-rc1+ #370
[  513.780131] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[  513.785097] Call Trace:
[  513.787920]  dump_stack_lvl+0x6a/0x9a
[  513.791223]  __lock_acquire.cold+0x14a/0x2ba
[  513.794918]  ? mark_held_locks+0x50/0x80
[  513.798453]  lock_acquire+0xd3/0x2f0
[  513.801819]  ? del_gendisk+0x53/0x210
[  513.805334]  ? kernfs_put.part.0+0xe8/0x1b0
[  513.808905]  ? del_gendisk+0x53/0x210
[  513.812230]  __mutex_lock+0x8d/0x8c0
[  513.815415]  ? del_gendisk+0x53/0x210
[  513.818931]  ? kernfs_put.part.0+0xe8/0x1b0
[  513.822594]  del_gendisk+0x53/0x210
[  513.825782]  xlvbd_release_gendisk+0x6f/0xb0 [xen_blkfront]
[  513.830186]  blkback_changed+0x20e/0xeb9 [xen_blkfront]
[  513.834458]  ? xenbus_read_driver_state+0x39/0x60
[  513.838540]  xenwatch_thread+0x94/0x190
[  513.841936]  ? do_wait_intr_irq+0xb0/0xb0
[  513.845451]  ? xenbus_dev_request_and_reply+0x90/0x90
[  513.849885]  kthread+0x149/0x170
[  513.853039]  ? set_kthread_struct+0x40/0x40
[  513.857027]  ret_from_fork+0x22/0x30

My suspicion is that the problem was introduced by:

commit c76f48eb5c084b1e15c931ae8cc1826cd771d70d
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Apr 6 08:22:56 2021 +0200

    block: take bd_mutex around delete_partitions in del_gendisk

blkfront_closing() takes '&bdev->bd_disk->open_mutex' around
xlvbd_release_gendisk() call which in its turn calls del_gendisk() which
after the above mentioned commit tries to take the same mutex. I may be
completely wrong though.

If I try to avoid taking the mutex from blkfront_closing(): 

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 8d49f8fa98bb..9af6831492d4 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2145,8 +2145,6 @@ static void blkfront_closing(struct blkfront_info *info)
                return;
        }
 
-       mutex_lock(&bdev->bd_disk->open_mutex);
-
        if (bdev->bd_openers) {
                xenbus_dev_error(xbdev, -EBUSY,
                                 "Device in use; refusing to close");
@@ -2156,7 +2154,6 @@ static void blkfront_closing(struct blkfront_info *info)
                xenbus_frontend_closed(xbdev);
        }
 
-       mutex_unlock(&bdev->bd_disk->open_mutex);
        bdput(bdev);
 }
 
the situation becomes even worse:

[   74.371465] general protection fault, probably for non-canonical address 0xb0fa8ce8ee8a2234: 0000 [#1] SMP PTI
[   74.381294] CPU: 3 PID: 144 Comm: xenwatch Not tainted 5.14.0-rc1+ #370
[   74.386172] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[   74.390918] RIP: 0010:del_timer+0x1f/0x80
[   74.394282] Code: 71 af a3 00 eb c1 31 c0 c3 66 90 0f 1f 44 00 00 41 55 41 54 45 31 e4 55 48 83 ec 10 65 48 8b 04 25 28 00 00 00 48 89 44 24 08 <48> 8b 47 08 48 85 c0 74 2d 48 89 e6 48 89 fd e8 dd e8 ff ff 48 89
[   74.407591] RSP: 0018:ffffbab68423bcc8 EFLAGS: 00010082
[   74.411691] RAX: dd931e09aefb8f00 RBX: b0fa8ce8ee8a21dc RCX: 0000000000005e7f
[   74.417041] RDX: 0000000000005e80 RSI: 0000000000000001 RDI: b0fa8ce8ee8a222c
[   74.422425] RBP: ffffbab68423bd20 R08: 0000000000000001 R09: 0000000000000001
[   74.427595] R10: 0000000000000001 R11: 0000000000000002 R12: 0000000000000000
[   74.432886] R13: ffffa0484f3e4000 R14: 0000000000000000 R15: ffffa0484691c000
[   74.438784] FS:  0000000000000000(0000) GS:ffffa083c8e00000(0000) knlGS:0000000000000000
[   74.444592] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   74.448917] CR2: 00007ff618903ff8 CR3: 0000000111e16001 CR4: 00000000001706e0
[   74.454309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   74.460128] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   74.465872] Call Trace:
[   74.468467]  try_to_grab_pending+0x13f/0x2e0
[   74.472202]  cancel_delayed_work+0x2e/0xd0
[   74.475980]  blk_mq_stop_hw_queues+0x2d/0x50
[   74.479732]  blkfront_remove+0x40/0x210 [xen_blkfront]
[   74.484154]  xenbus_dev_remove+0x6d/0xf0
[   74.487872]  __device_release_driver+0x180/0x240
[   74.491561]  device_release_driver+0x26/0x40
[   74.497134]  bus_remove_device+0xef/0x160
[   74.500180]  device_del+0x18c/0x3e0
[   74.503451]  ? xenbus_probe_devices+0x120/0x120
[   74.506975]  ? klist_iter_exit+0x14/0x20
[   74.511649]  device_unregister+0x13/0x60
[   74.515237]  xenbus_dev_changed+0x174/0x1e0
[   74.518923]  xenwatch_thread+0x94/0x190
[   74.522208]  ? do_wait_intr_irq+0xb0/0xb0
[   74.525690]  ? xenbus_dev_request_and_reply+0x90/0x90
[   74.529973]  kthread+0x149/0x170
[   74.533007]  ? set_kthread_struct+0x40/0x40
[   74.537023]  ret_from_fork+0x22/0x30
[   74.540411] Modules linked in: vfat fat i2c_piix4 xfs libcrc32c crct10dif_pclmul crc32_pclmul crc32c_intel xen_blkfront ghash_clmulni_intel ena
[   74.549144] ---[ end trace 296bd6f709c05e9e ]---

At this point I can only say that something is certainly
wrong. Apologies if this is an already known problem.

-- 
Vitaly


             reply	other threads:[~2021-07-15  9:16 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-15  9:16 Vitaly Kuznetsov [this message]
2021-07-15 12:46 ` [BUG report] Deadlock in xen-blkfront upon device hot-unplug Christoph Hellwig
2021-07-15 13:17   ` Vitaly Kuznetsov
2021-07-15 13:46     ` Christoph Hellwig
2021-07-15 14:09       ` Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pmvk0wep.fsf@vitty.brq.redhat.com \
    --to=vkuznets@redhat.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=hch@lst.de \
    --cc=jgross@suse.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-block@vger.kernel.org \
    --cc=roger.pau@citrix.com \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.