From: Hannes Reinecke <hare@suse.de>
To: Sagi Grimberg <sagi@grimberg.me>, Yi Zhang <yi.zhang@redhat.com>,
	linux-block <linux-block@vger.kernel.org>,
	linux-nvme@lists.infradead.org
Subject: Re: [bug report][regression] device node still exists after blktests nvme/011 finished
Date: Sat, 29 May 2021 13:58:59 +0200	[thread overview]
Message-ID: <1e3e80b1-1314-6cd1-c696-774b94f639e5@suse.de>
In-Reply-To: <43d1b5d8-2be0-9219-e3dd-dcff0f57ea5d@grimberg.me>

[-- Attachment #1: Type: text/plain, Size: 8351 bytes --]

On 5/28/21 7:00 PM, Sagi Grimberg wrote:
>> Hi,
>> I would like to report a regression we found recently: after
>> blktests nvme/011 finishes, the device node still exists, and the
>> log suggests a locking issue. Let me know if you need any testing
>> for it, thanks.
> 
> Hannes, this is a result of your patch most likely.
> 
>>
>> # ./check nvme/011
>> nvme/011 (run data verification fio job on NVMeOF file-backed ns) [passed]
>>      runtime  71.350s  ...  77.131s
>> # lsblk
>> NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> sda       8:0    0 465.8G  0 disk
>> ├─sda1    8:1    0     1G  0 part /boot
>> ├─sda2    8:2    0  31.5G  0 part [SWAP]
>> ├─sda3    8:3    0    15G  0 part
>> ├─sda4    8:4    0     1K  0 part
>> ├─sda5    8:5    0    15G  0 part
>> ├─sda6    8:6    0     5G  0 part
>> └─sda7    8:7    0 398.3G  0 part /
>> zram0   253:0    0     4G  0 disk [SWAP]
>> nvme0n1 259:1    0     1G  0 disk
>>
>> # dmesg
>> [  793.719149] run blktests nvme/011 at 2021-05-28 05:45:49
>> [  793.899062] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
>> [  793.950783] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:d39240f497c64fc8bf7ca767d256a394.
>> [  793.964271] nvme nvme0: creating 48 I/O queues.
>> [  793.973187] nvme nvme0: new ctrl: "blktests-subsystem-1"
>> [  863.401172] nvme nvme0: Removing ctrl: NQN "blktests-subsystem-1"
>> [  863.656534] block nvme0n1: no available path - failing I/O
>> [  863.662039] block nvme0n1: no available path - failing I/O
>> [  863.667546] block nvme0n1: no available path - failing I/O
>> [  863.673029] block nvme0n1: no available path - failing I/O
>> [  863.678530] block nvme0n1: no available path - failing I/O
>> [  863.684032] block nvme0n1: no available path - failing I/O
>> [  863.689523] block nvme0n1: no available path - failing I/O
>> [  863.695014] block nvme0n1: no available path - failing I/O
>> [  863.700502] block nvme0n1: no available path - failing I/O
>> [  863.705994] Buffer I/O error on dev nvme0n1, logical block 262016, async page read
>> [ 1108.488647] INFO: task systemd-udevd:2400 blocked for more than 122 seconds.
>> [ 1108.495699]       Not tainted 5.13.0-rc3+ #8
>> [ 1108.499972] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1108.507797] task:systemd-udevd   state:D stack:    0 pid: 2400 ppid:  1134 flags:0x00004004
>> [ 1108.516140] Call Trace:
>> [ 1108.518589]  __schedule+0x247/0x6f0
>> [ 1108.522088]  schedule+0x46/0xb0
>> [ 1108.525233]  schedule_preempt_disabled+0xa/0x10
>> [ 1108.529767]  __mutex_lock.constprop.0+0x2a4/0x470
>> [ 1108.534472]  ? __kernfs_remove.part.0+0x174/0x1f0
>> [ 1108.539178]  ? kernfs_remove_by_name_ns+0x5c/0x90
>> [ 1108.543885]  del_gendisk+0x99/0x230
>> [ 1108.547378]  nvme_mpath_remove_disk+0x97/0xb0 [nvme_core]
>> [ 1108.552787]  nvme_put_ns_head+0x2a/0xb0 [nvme_core]
>> [ 1108.557664]  __blkdev_put+0x115/0x160
>> [ 1108.561339]  blkdev_put+0x4c/0x130
>> [ 1108.564745]  blkdev_close+0x22/0x30
>> [ 1108.568238]  __fput+0x94/0x240
>> [ 1108.571307]  task_work_run+0x5f/0x90
>> [ 1108.574885]  exit_to_user_mode_loop+0x119/0x120
>> [ 1108.579428]  exit_to_user_mode_prepare+0x97/0xa0
>> [ 1108.584047]  syscall_exit_to_user_mode+0x12/0x40
>> [ 1108.588664]  do_syscall_64+0x4d/0x80
>> [ 1108.592254]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [ 1108.597314] RIP: 0033:0x7f222e799627
>> [ 1108.600903] RSP: 002b:00007ffd07c2b518 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
>> [ 1108.608477] RAX: 0000000000000000 RBX: 00007f222d801240 RCX: 00007f222e799627
>> [ 1108.615620] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
>> [ 1108.622762] RBP: 0000000000000006 R08: 000055badf77ba70 R09: 0000000000000000
>> [ 1108.629901] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd07c2b5c0
>> [ 1108.637026] R13: 0000000000000000 R14: 000055bae0be5820 R15: 000055bae0b8bec0
>> [ 1231.370531] INFO: task systemd-udevd:2400 blocked for more than 245 seconds.
>> [ 1231.377579]       Not tainted 5.13.0-rc3+ #8
>> [ 1231.381853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1231.389679] task:systemd-udevd   state:D stack:    0 pid: 2400 ppid:  1134 flags:0x00004004
>> [ 1231.398024] Call Trace:
>> [ 1231.400478]  __schedule+0x247/0x6f0
>> [ 1231.403970]  schedule+0x46/0xb0
>> [ 1231.407112]  schedule_preempt_disabled+0xa/0x10
>> [ 1231.411638]  __mutex_lock.constprop.0+0x2a4/0x470
>> [ 1231.416345]  ? __kernfs_remove.part.0+0x174/0x1f0
>> [ 1231.421051]  ? kernfs_remove_by_name_ns+0x5c/0x90
>> [ 1231.425765]  del_gendisk+0x99/0x230
>> [ 1231.429268]  nvme_mpath_remove_disk+0x97/0xb0 [nvme_core]
>> [ 1231.434692]  nvme_put_ns_head+0x2a/0xb0 [nvme_core]
>> [ 1231.439581]  __blkdev_put+0x115/0x160
>> [ 1231.443255]  blkdev_put+0x4c/0x130
>> [ 1231.446668]  blkdev_close+0x22/0x30
>> [ 1231.450171]  __fput+0x94/0x240
>> [ 1231.453232]  task_work_run+0x5f/0x90
>> [ 1231.456809]  exit_to_user_mode_loop+0x119/0x120
>> [ 1231.461351]  exit_to_user_mode_prepare+0x97/0xa0
>> [ 1231.465980]  syscall_exit_to_user_mode+0x12/0x40
>> [ 1231.470608]  do_syscall_64+0x4d/0x80
>> [ 1231.474187]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [ 1231.479247] RIP: 0033:0x7f222e799627
>> [ 1231.482835] RSP: 002b:00007ffd07c2b518 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
>> [ 1231.490392] RAX: 0000000000000000 RBX: 00007f222d801240 RCX: 00007f222e799627
>> [ 1231.497537] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
>> [ 1231.504670] RBP: 0000000000000006 R08: 000055badf77ba70 R09: 0000000000000000
>> [ 1231.511800] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd07c2b5c0
>> [ 1231.518925] R13: 0000000000000000 R14: 000055bae0be5820 R15: 000055bae0b8bec0
>> [ 1354.252532] INFO: task systemd-udevd:2400 blocked for more than 368 seconds.
>> [ 1354.259581]       Not tainted 5.13.0-rc3+ #8
>> [ 1354.263853] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 1354.271678] task:systemd-udevd   state:D stack:    0 pid: 2400 ppid:  1134 flags:0x00004004
>> [ 1354.280025] Call Trace:
>> [ 1354.282469]  __schedule+0x247/0x6f0
>> [ 1354.285962]  schedule+0x46/0xb0
>> [ 1354.289105]  schedule_preempt_disabled+0xa/0x10
>> [ 1354.293630]  __mutex_lock.constprop.0+0x2a4/0x470
>> [ 1354.298327]  ? __kernfs_remove.part.0+0x174/0x1f0
>> [ 1354.303027]  ? kernfs_remove_by_name_ns+0x5c/0x90
>> [ 1354.307732]  del_gendisk+0x99/0x230
>> [ 1354.311226]  nvme_mpath_remove_disk+0x97/0xb0 [nvme_core]
>> [ 1354.316628]  nvme_put_ns_head+0x2a/0xb0 [nvme_core]
>> [ 1354.321506]  __blkdev_put+0x115/0x160
>> [ 1354.325171]  blkdev_put+0x4c/0x130
>> [ 1354.328576]  blkdev_close+0x22/0x30
>> [ 1354.332070]  __fput+0x94/0x240
>> [ 1354.335127]  task_work_run+0x5f/0x90
>> [ 1354.338717]  exit_to_user_mode_loop+0x119/0x120
>> [ 1354.343249]  exit_to_user_mode_prepare+0x97/0xa0
>> [ 1354.347860]  syscall_exit_to_user_mode+0x12/0x40
>> [ 1354.352480]  do_syscall_64+0x4d/0x80
>> [ 1354.356059]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [ 1354.361111] RIP: 0033:0x7f222e799627
>> [ 1354.364684] RSP: 002b:00007ffd07c2b518 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
>> [ 1354.372247] RAX: 0000000000000000 RBX: 00007f222d801240 RCX: 00007f222e799627
>> [ 1354.379371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
>> [ 1354.386501] RBP: 0000000000000006 R08: 000055badf77ba70 R09: 0000000000000000
>> [ 1354.393629] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd07c2b5c0
>> [ 1354.400754] R13: 0000000000000000 R14: 000055bae0be5820 R15: 000055bae0b8bec0
>>
>>
Ah. Deadlock on bd_mutex.
Can you check if the attached patch fixes this issue?
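
The failure mode is a recursive acquisition of a non-recursive mutex:
blkdev_put() takes bd_mutex, dropping the final reference ends up in
del_gendisk(), and that wants bd_mutex again. As a minimal userspace
sketch of the same pattern (hypothetical names; a pthread mutex stands
in for bd_mutex -- this is an analogy, not the kernel code):

/*
 * deadlock-demo.c: userspace analogy of the bd_mutex self-deadlock.
 * All names are made up for illustration.
 * Build: cc -pthread -o deadlock-demo deadlock-demo.c
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for del_gendisk(): needs bd_mutex for its teardown work. */
static void teardown_disk(void)
{
	/* Default mutex type is non-recursive; on glibc this blocks forever. */
	pthread_mutex_lock(&bd_mutex);
	printf("disk torn down\n");
	pthread_mutex_unlock(&bd_mutex);
}

/* Stands in for blkdev_put() dropping the final reference. */
static void put_last_ref(void)
{
	pthread_mutex_lock(&bd_mutex);
	teardown_disk();		/* hangs here, like systemd-udevd above */
	pthread_mutex_unlock(&bd_mutex);
}

int main(void)
{
	put_last_ref();			/* never returns */
	return 0;
}

The patch breaks the cycle by moving the del_gendisk() call into
nvme_mpath_check_last_path(), which is invoked from the namespace
removal path where bd_mutex is not held.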

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

[-- Attachment #2: 0001-nvme-multipath-call-del_gendisk-in-nvme_mpath_check_.patch --]
[-- Type: text/x-patch, Size: 2787 bytes --]

From 433ef6dea38de57289ce6ae26147cabe03c9b114 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@suse.de>
Date: Sat, 29 May 2021 13:54:22 +0200
Subject: [PATCH] nvme-multipath: call del_gendisk() in
 nvme_mpath_check_last_path()

We cannot call del_gendisk() during the final nvme_put_ns_head(), as
nvme_put_ns_head() might itself be invoked from blkdev_put(), in which
case we would deadlock on bd_mutex:

[ 1108.529767]  __mutex_lock.constprop.0+0x2a4/0x470
[ 1108.534472]  ? __kernfs_remove.part.0+0x174/0x1f0
[ 1108.539178]  ? kernfs_remove_by_name_ns+0x5c/0x90
[ 1108.543885]  del_gendisk+0x99/0x230
[ 1108.547378]  nvme_mpath_remove_disk+0x97/0xb0 [nvme_core]
[ 1108.552787]  nvme_put_ns_head+0x2a/0xb0 [nvme_core]
[ 1108.557664]  __blkdev_put+0x115/0x160
[ 1108.561339]  blkdev_put+0x4c/0x130
[ 1108.564745]  blkdev_close+0x22/0x30
[ 1108.568238]  __fput+0x94/0x240

To avoid this deadlock, call del_gendisk() when checking whether the
last path has been removed, i.e. from nvme_mpath_check_last_path(),
which runs without bd_mutex held.
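
For illustration, a userspace sketch of the resulting ordering
(hypothetical names; a pthread mutex stands in for bd_mutex -- an
analogy, not the driver code). Dropping the last reference only
records state under the mutex; the teardown runs later from a context
that does not hold it:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool last_path_gone;

/* del_gendisk() analogue: free to take bd_mutex, nobody holds it here. */
static void teardown_disk(void)
{
	pthread_mutex_lock(&bd_mutex);
	printf("disk torn down\n");
	pthread_mutex_unlock(&bd_mutex);
}

/* blkdev_put() analogue: only records state while holding the mutex. */
static void put_last_ref(void)
{
	pthread_mutex_lock(&bd_mutex);
	last_path_gone = true;		/* no teardown under the lock */
	pthread_mutex_unlock(&bd_mutex);
}

/* nvme_mpath_check_last_path() analogue: called with bd_mutex not held. */
static void check_last_path(void)
{
	if (last_path_gone)
		teardown_disk();	/* safe: the lock is free here */
}

int main(void)
{
	put_last_ref();
	check_last_path();
	return 0;
}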

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/nvme/host/multipath.c | 13 +++++++++++--
 drivers/nvme/host/nvme.h      |  9 +--------
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 127a17b4c13d..22e2febf86bc 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -776,14 +776,23 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id)
 #endif
 }
 
-void nvme_mpath_remove_disk(struct nvme_ns_head *head)
+void nvme_mpath_check_last_path(struct nvme_ns *ns)
 {
+	struct nvme_ns_head *head = ns->head;
+
 	if (!head->disk)
 		return;
-	if (head->disk->flags & GENHD_FL_UP) {
+
+	if (list_empty(&head->list) && head->disk->flags & GENHD_FL_UP) {
 		nvme_cdev_del(&head->cdev, &head->cdev_device);
 		del_gendisk(head->disk);
 	}
+}
+
+void nvme_mpath_remove_disk(struct nvme_ns_head *head)
+{
+	if (!head->disk)
+		return;
 	blk_set_queue_dying(head->disk->queue);
 	/* make sure all pending bios are cleaned up */
 	kblockd_schedule_work(&head->requeue_work);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1f397ecba16c..8a497a420ea2 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -716,14 +716,7 @@ void nvme_mpath_uninit(struct nvme_ctrl *ctrl);
 void nvme_mpath_stop(struct nvme_ctrl *ctrl);
 bool nvme_mpath_clear_current_path(struct nvme_ns *ns);
 void nvme_mpath_clear_ctrl_paths(struct nvme_ctrl *ctrl);
-
-static inline void nvme_mpath_check_last_path(struct nvme_ns *ns)
-{
-	struct nvme_ns_head *head = ns->head;
-
-	if (head->disk && list_empty(&head->list))
-		kblockd_schedule_work(&head->requeue_work);
-}
+void nvme_mpath_check_last_path(struct nvme_ns *ns);
 
 static inline void nvme_trace_bio_complete(struct request *req)
 {
-- 
2.26.2

