From: sagi@grimberg.me (Sagi Grimberg)
Subject: [PATCH 2/2] nvme: flush scan_work when resetting controller
Date: Tue, 25 Jun 2019 14:50:56 -0700
Message-ID: <960997d6-9ce3-5730-00e6-5b2639502eaa@grimberg.me>
In-Reply-To: <ff13e243-da8f-f04c-d31b-f7c46d3a4375@suse.de>


>>>>>> That's what I thought initially, too, but it turned out to be not
>>>>>> sufficient.
>>>>>
>>>>> Not sufficient because it hangs? or panics?
>>>>>
>>> It hangs, and we're seeing a warning:
>>>
>>> kernel: [67088.344034] WARNING: CPU: 4 PID: 25020 at
>>> ../lib/percpu-refcount.c:334 percpu_ref_kill_and_confirm+0x7a/0xa0
>>> [ .. ]
>>> kernel: [67088.344106] Call Trace:
>>> kernel: [67088.344112] blk_freeze_queue_start+0x2a/0x40
>>> kernel: [67088.344114] blk_freeze_queue+0xe/0x40
>>> kernel: [67088.344118] nvme_update_disk_info+0x36/0x260 [nvme_core]
>>> kernel: [67088.344122] __nvme_revalidate_disk+0xca/0xf0 [nvme_core]
>>> kernel: [67088.344125] nvme_revalidate_disk+0xa6/0x120 [nvme_core]
>>> kernel: [67088.344127] ? blk_mq_get_tag+0xa3/0x220
>>> kernel: [67088.344130] revalidate_disk+0x23/0xc0
>>> kernel: [67088.344133] nvme_validate_ns+0x43/0x830 [nvme_core]
>>> kernel: [67088.344137] ? wake_up_q+0x70/0x70
>>> kernel: [67088.344139] ? blk_mq_free_request+0x12a/0x160
>>> kernel: [67088.344142] ? __nvme_submit_sync_cmd+0x73/0xe0 [nvme_core]
>>> kernel: [67088.344145] nvme_scan_work+0x2b3/0x350 [nvme_core]
>>> kernel: [67088.344149] process_one_work+0x1da/0x400
>>>
>>> From which I've inferred that we're still running a scan in parallel to
>>> reset, and that the scan thread is calling 'blk_freeze_queue()' on a
>>> queue which is already torn down.
>>
>>
>> Where is the scan triggered from? there is no scan call from the reset
>> path.
>>
> It's triggered from AEN, being received around the same time when reset
> triggers.
> There's actually a chance that the AEN handling itself triggered the
> reset, but I haven't been able to decipher that from the crash dump.
> 
>> Is there a namespace removal or something else that triggers AEN
>> to make this happen?
>>
>> What exactly is the scenario?
> 
> The scenario is multiple storage failover on NetApp OnTAP while I/O is
> running.

Hannes,

I'm still not convinced that the transports need to flush the scan work
on resets.

Does the below help as an alternative:
--
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 024fb219de17..074bcb1e797a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1665,6 +1665,10 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
  {
         struct nvme_ns *ns = disk->private_data;

+       /* if ns is removing we cannot mangle with the request queue */
+       if (test_bit(NVME_NS_REMOVING, &ns->flags))
+               return;
+
         /*
          * If identify namespace failed, use default 512 byte block size so
          * block layer can use before failing read/write for 0 capacity.
--
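
For illustration only, the early-return guard proposed in the diff above
can be modelled in userspace C. This is a minimal sketch, not kernel code:
struct model_ns, MODEL_NS_REMOVING, model_test_bit(), model_set_bit() and
model_revalidate() are hypothetical stand-ins for struct nvme_ns,
NVME_NS_REMOVING, test_bit(), set_bit() and __nvme_revalidate_disk(), and
the block_size field merely stands in for the request-queue state that
revalidation would touch.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
enum { MODEL_NS_REMOVING = 0 };

struct model_ns {
	atomic_ulong flags;      /* MODEL_NS_REMOVING is set once teardown starts */
	unsigned int block_size; /* stands in for queue state updated on revalidate */
};

/* Simplified model of test_bit(): report whether bit nr is set. */
static bool model_test_bit(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	return (atomic_load(addr) >> nr) & 1UL;
}

/* Simplified model of set_bit(): atomically set bit nr. */
static void model_set_bit(int nr, atomic_ulong *addr)
{
	assert(nr >= 0 && nr < (int)(sizeof(unsigned long) * CHAR_BIT));
	atomic_fetch_or(addr, 1UL << nr);
}

/*
 * Model of the guarded revalidate: leave the queue state alone when the
 * namespace is already being removed. Returns true if the update was
 * applied, false if it was skipped.
 */
static bool model_revalidate(struct model_ns *ns, unsigned int new_block_size)
{
	if (model_test_bit(MODEL_NS_REMOVING, &ns->flags))
		return false;

	ns->block_size = new_block_size;
	return true;
}
```

Calling model_revalidate() before model_set_bit(MODEL_NS_REMOVING, ...)
applies the update; calling it afterwards leaves block_size untouched.
The sketch shows only the shape of the check. The flag test and the update
are not one atomic step, so it does not show whether the check alone closes
the window between scan and reset.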


Thread overview: 28+ messages
2019-06-18 10:10 [PATCH 0/2] nvme: flush rescan worker before resetting Hannes Reinecke
2019-06-18 10:10 ` [PATCH 1/2] nvme: Do not remove namespaces during reset Hannes Reinecke
2019-06-18 17:30   ` Sagi Grimberg
2019-06-20  1:22   ` Ming Lei
2019-06-18 10:10 ` [PATCH 2/2] nvme: flush scan_work when resetting controller Hannes Reinecke
2019-06-18 17:41   ` Sagi Grimberg
2019-06-19  6:22     ` Hannes Reinecke
2019-06-19 16:56       ` Sagi Grimberg
2019-06-19 18:45         ` Hannes Reinecke
2019-06-19 20:04           ` Sagi Grimberg
2019-06-21 16:26             ` Sagi Grimberg
2019-06-24  5:48               ` Hannes Reinecke
2019-06-24  6:13               ` Hannes Reinecke
2019-06-24 18:08                 ` Sagi Grimberg
2019-06-24 18:51                   ` James Smart
2019-06-25  6:07                   ` Hannes Reinecke
2019-06-25 21:50                     ` Sagi Grimberg [this message]
2019-06-26  5:34                       ` Hannes Reinecke
2019-06-26 20:22                         ` Sagi Grimberg
2019-07-02  5:38                           ` Sagi Grimberg
2019-07-02 13:29                             ` Hannes Reinecke
2019-06-20  1:36   ` Ming Lei
2019-06-21  6:14     ` Hannes Reinecke
2019-06-21  6:58       ` Ming Lei
2019-06-21  7:59         ` Hannes Reinecke
2019-06-21 17:23           ` James Smart
2019-06-21 17:23           ` James Smart
2019-06-24  3:29           ` Ming Lei
