From: jsmart2021@gmail.com (James Smart)
Subject: [PATCH v2] nvmet_fc: support target port removal with nvmet layer
Date: Fri, 10 Aug 2018 16:04:19 -0700
Message-ID: <40e676e2-4dc8-a419-52eb-ab2c4b30e62d@gmail.com>
In-Reply-To: <1533934229.7802.149.camel@localhost.localdomain>

On 8/10/2018 1:50 PM, Ewan D. Milne wrote:
> OK, so with this patch applied on the target, I'm seeing undesirable
> behavior on the NVMe/FC initiator side when the target is not configured.
> 
> Without this patch, if the NVMe/FC initiator and the NVMe/FC soft target
> are booted, but the target is not configured, an attempt to connect on
> the initiator via "nvme connect" will return from the CLI immediately,
> and the connection attempts will commence, e.g.:
> 
> [  191.233854] nvme nvme1: Connect Invalid Data Parameter, subsysnqn "testnqn"
> [  191.241650] nvme nvme1: NVME-FC{0}: reset: Reconnect attempt failed (16770)
> [  191.249421] nvme nvme1: NVME-FC{0}: Reconnect attempt in 10 seconds
> 
> then if I configure the target, it will connect.  Great.
> 
> [  241.612730] nvme nvme1: NVME-FC{0}: controller connect complete
> 
> --
> 
> However, with this patch applied on the target, the nvme-cli connect
> command on the initiator (with an unconfigured target) hangs:

ok - but I'd like to be very clear: this has to be a case where the
nvmet target wasn't configured since boot (so you see the above, and
will see the same with and without the patch), then was configured,
then was cleared (via "nvmetcli clear"). In other words, I believe you
only see this if nvmetcli clears the config after there's been a prior
binding with an FC port.
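
For reference, here's a toy standalone C model (not kernel code - the
type names and states are invented for illustration) of the three
target-side states we're distinguishing, and the initiator behavior
reported for each:

#include <stdio.h>

enum tgt_state {
	NEVER_CONFIGURED,	/* nvmet never bound to the FC port since boot */
	CONFIGURED,		/* nvmet port currently bound to the FC port */
	CLEARED_AFTER_BIND,	/* "nvmetcli clear" after a prior FC binding */
};

static const char *initiator_behavior(enum tgt_state s)
{
	switch (s) {
	case NEVER_CONFIGURED:
		/* connect CLI returns immediately; reconnects every 10s */
		return "fast connect failure + periodic reconnect";
	case CONFIGURED:
		return "controller connect complete";
	case CLEARED_AFTER_BIND:
		/* the case under discussion */
		return "nvme connect hangs (observed bug)";
	}
	return "?";
}

int main(void)
{
	for (int s = NEVER_CONFIGURED; s <= CLEARED_AFTER_BIND; s++)
		printf("%d -> %s\n", s, initiator_behavior((enum tgt_state)s));
	return 0;
}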

> 
> [ 1516.041039] nvme            D ffff942449a1e970     0  1850   1849 0x00000080
> [ 1516.048924] Call Trace:
> [ 1516.051650]  [<ffffffffa46e1a88>] ? enqueue_task_fair+0x208/0x6c0
> [ 1516.058451]  [<ffffffffa4d5fea9>] schedule+0x29/0x70
> [ 1516.063989]  [<ffffffffa4d5d7f1>] schedule_timeout+0x221/0x2d0
> [ 1516.070498]  [<ffffffffa46d1e2f>] ? ttwu_do_activate+0x6f/0x80
> [ 1516.077017]  [<ffffffffa46d55b0>] ? try_to_wake_up+0x190/0x390
> [ 1516.083525]  [<ffffffffa4d6025d>] wait_for_completion+0xfd/0x140
> [ 1516.090228]  [<ffffffffa46d5870>] ? wake_up_state+0x20/0x20
> [ 1516.096446]  [<ffffffffa46b902d>] flush_work+0xfd/0x190
> [ 1516.102276]  [<ffffffffa46b5e20>] ? move_linked_works+0x90/0x90
> [ 1516.108882]  [<ffffffffa46b92ef>] flush_delayed_work+0x3f/0x50
> [ 1516.115429]  [<ffffffffc00c0cbd>] nvme_fc_create_ctrl+0x72d/0x7a0 [nvme_fc]
> [ 1516.123201]  [<ffffffffc011c5b6>] nvmf_dev_write+0xa26/0xbef [nvme_fabrics]
> [ 1516.130981]  [<ffffffffa48f6307>] ? security_file_permission+0x27/0xa0
> [ 1516.138265]  [<ffffffffa483eba0>] vfs_write+0xc0/0x1f0
> [ 1516.143997]  [<ffffffffa483f9bf>] SyS_write+0x7f/0xf0
> [ 1516.149633]  [<ffffffffa4d6cdef>] system_call_fastpath+0x1c/0x21
> 
> [ 1508.006831] kworker/u384:3  D ffff94244dcad7e0     0   578      2 0x00000000
> [ 1508.014736] Workqueue: nvme-wq nvme_fc_connect_ctrl_work [nvme_fc]
> [ 1508.021642] Call Trace:
> [ 1508.024368]  [<ffffffffa4d5fea9>] schedule+0x29/0x70
> [ 1508.029907]  [<ffffffffa4d5d738>] schedule_timeout+0x168/0x2d0
> [ 1508.036422]  [<ffffffffa46a83f0>] ? __internal_add_timer+0x130/0x130
> [ 1508.043515]  [<ffffffffa46ffc02>] ? ktime_get_ts64+0x52/0xf0
> [ 1508.049847]  [<ffffffffa4d5f3bd>] io_schedule_timeout+0xad/0x130
> [ 1508.056551]  [<ffffffffa4d603a5>] wait_for_completion_io_timeout+0x105/0x140
> [ 1508.064421]  [<ffffffffa46d5870>] ? wake_up_state+0x20/0x20
> [ 1508.070674]  [<ffffffffa494869b>] blk_execute_rq+0xab/0x150
> [ 1508.076897]  [<ffffffffc00cd8cf>] __nvme_submit_sync_cmd+0x6f/0xf0 [nvme_core]
> [ 1508.084958]  [<ffffffffc011b908>] nvmf_connect_admin_queue+0x128/0x1a0 [nvme_fabrics]
> [ 1508.093718]  [<ffffffffc00bfac0>] nvme_fc_create_association+0x3a0/0x9c0 [nvme_fc]
> [ 1508.102167]  [<ffffffffc00c00fe>] nvme_fc_connect_ctrl_work+0x1e/0x60 [nvme_fc]
> [ 1508.110323]  [<ffffffffa46b88af>] process_one_work+0x17f/0x440
> [ 1508.116831]  [<ffffffffa46b9a98>] worker_thread+0x278/0x3c0
> [ 1508.123050]  [<ffffffffa46b9820>] ? manage_workers.isra.24+0x2a0/0x2a0
> [ 1508.130333]  [<ffffffffa46c0a31>] kthread+0xd1/0xe0
> [ 1508.135774]  [<ffffffffa46c0960>] ? insert_kthread_work+0x40/0x40
> [ 1508.142574]  [<ffffffffa4d6cc37>] ret_from_fork_nospec_begin+0x21/0x21
> [ 1508.149859]  [<ffffffffa46c0960>] ? insert_kthread_work+0x40/0x40

I believe this to be the case added in v2, which has the transport
abort the newly received command, as the abort should be the
notification back to the host. And I'm guessing there's a bug in the
lldd's handling of the abort (lpfc, I assume?).

What doesn't make sense: this shouldn't be much different from the
without-patch case, where the port pointer in the fc port would be
stale at best and could be doing any number of things.
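
As a minimal sketch of the receive-path decision as I understand it
(a standalone userspace model, not the actual nvmet_fc code - the
struct, enum, and function names are all invented for illustration):

#include <stdbool.h>
#include <stdio.h>

struct fc_tgtport_model {
	bool bound_to_nvmet;	/* true while a nvmet port binds this FC port */
};

enum rcv_action {
	RCV_QUEUE_TO_NVMET,	/* normal path: hand the command to nvmet */
	RCV_ABORT_VIA_LLDD,	/* v2 behavior when no binding exists */
};

/* Decide what to do with a newly received FCP command IU. */
static enum rcv_action handle_rcv_fcp_cmd(const struct fc_tgtport_model *tp)
{
	if (!tp->bound_to_nvmet) {
		/*
		 * No nvmet binding (e.g. after "nvmetcli clear"): abort
		 * the exchange so the host sees the failure and its own
		 * error handling can recover. A broken LLDD abort here
		 * would leave the host waiting forever - the hang above.
		 */
		return RCV_ABORT_VIA_LLDD;
	}
	return RCV_QUEUE_TO_NVMET;
}

int main(void)
{
	struct fc_tgtport_model tp = { .bound_to_nvmet = false };

	printf("unbound: %d\n", handle_rcv_fcp_cmd(&tp));
	tp.bound_to_nvmet = true;
	printf("bound:   %d\n", handle_rcv_fcp_cmd(&tp));
	return 0;
}

If the LLDD never completes that abort, the exchange just dangles,
which would match the stuck wait_for_completion in the traces above.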

> 
> configuring the target does not help at this point.
> 
> I haven't figured out exactly what is wrong yet, but thought I'd
> bring this up...
> 
> Clearly, separate from the target side issue, having the initiator hang
> regardless of what the target code is doing is a bad thing.  There's
> supposed to be an admin queue timeout, but it didn't work here.
> 
> -Ewan

I agree - the admin queue timeout should be what works around this, as
it would then have the host send its own abort to recover. We do need
to see why it didn't occur.
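
For illustration, here's a userspace sketch of the bounded wait the
host side is supposed to perform (pthreads stand in for the kernel's
blk_execute_rq()/wait_for_completion_io_timeout() seen in the trace;
the timeout value and all names here are assumptions for the demo):

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

struct sync_cmd {
	pthread_mutex_t lock;
	pthread_cond_t  done;
	bool            completed;
};

/* Wait for command completion, but only up to 'secs' seconds. */
static bool wait_cmd_timeout(struct sync_cmd *cmd, unsigned int secs)
{
	struct timespec ts;
	bool ok;

	clock_gettime(CLOCK_REALTIME, &ts);
	ts.tv_sec += secs;

	pthread_mutex_lock(&cmd->lock);
	while (!cmd->completed) {
		/* non-zero return (ETIMEDOUT) means we gave up waiting */
		if (pthread_cond_timedwait(&cmd->done, &cmd->lock, &ts))
			break;
	}
	ok = cmd->completed;
	pthread_mutex_unlock(&cmd->lock);
	return ok;
}

int main(void)			/* build with: cc -pthread demo.c */
{
	struct sync_cmd connect_cmd = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.done = PTHREAD_COND_INITIALIZER,
		.completed = false,
	};

	/* Nobody ever completes the command (the target aborted it and
	 * the completion got lost): the wait must still expire. */
	if (!wait_cmd_timeout(&connect_cmd, 2 /* stand-in for ~60s */)) {
		/* here the real host would abort the exchange and fail
		 * or retry the connect instead of hanging the CLI */
		printf("connect timed out -> host aborts and recovers\n");
	}
	return 0;
}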

I'll put it through some additional testing and will post any findings.

-- james
