Linux-NVME Archive on lore.kernel.org
 help / color / Atom feed
From: jmoyer@redhat.com (Jeff Moyer)
Subject: [PATCH 04/47] block: provide a new BLK_EH_QUIESCED timeout return value
Date: Tue, 24 Nov 2015 10:51:04 -0500
Message-ID: <x491tbf2y6v.fsf@segfault.boston.devel.redhat.com> (raw)
In-Reply-To: <20151124154019.GB5939@lst.de> (Christoph Hellwig's message of "Tue, 24 Nov 2015 16:40:19 +0100")

Christoph Hellwig <hch at lst.de> writes:

> On Tue, Nov 24, 2015@10:16:51AM -0500, Jeff Moyer wrote:
>> Hi Christoph,
>> 
>> Christoph Hellwig <hch at lst.de> writes:
>> 
>> > This marks the request as one that's not actually completed yet, but
>> > should be reaped next time blk_mq_complete_request comes in.  This is
>> > useful it the abort handler kicked of a reset that will complete all
>> > pending requests.
>> 
>> What's the purpose, though?  Is this an optimization?
>
> It allows us to to correctly implement controller reset (like SCSI target
> resets) from the timeout handler.  The current HANDLED/NOT_HANDLED returns
> are not very useful if you want to eventually kick of a reset that will
> abort all requests, but needs to ensure the the requests don't get reused
> before that.  Only SCSI handles that for now, and needs it's own per-LUN
> command list and a lot of complex code for that - something we'd like
> to avoid for NVMe or other new drivers.

Thanks for the explanation.  One more question below.

>> We've had "fun" problems with races between completion and timeout
>> before.  I can't say I'm too keen on adding more complexity to this code
>> path.  Have you considered what happens in your new code when this race
>> occurs?  I don't expect it to cause any issues in the mq case, since the
>> timeout handler should run on the same cpu as the completion code for a
>> given request (right?).  However, for the old code path, they could run
>> in parallel.
>> 
>> blk_complete_request:
>> A  if (!blk_mark_rq_complete(rq) ||
>> B      test_and_cleart_bit(REQ_ATOM_QUIESCED, &req->atomic_flags)) {
>> C        __blk_mq_complete_request(rq);
>> 
>> could run alongside of:
>> 
>> blk_rq_check_expired:
>> 1 if (!blk_mark_rq_complete(rq))
>> 2   blk_rq_timed_out(rq);
>> 
>> So, if 1 comes before A, we have two cases to consider:
>> 
>> i.  the expiration path does not yet set REQ_ATOM_QUIESCED before the
>>     completion code runs, and so the completion code does nothing.
>
> The command has timed out and sets REQ_ATOM_COMPLETED first,
> the the actual completion comes in and does indeed nothing.  We now
> set REQ_ATOM_QUIESCED and kick off a controller reset, which will
> ultimatively complete all commands using blk_mq_complete_request.
> Now REQ_ATOM_QUIESCED is set on the command that caused the timeout,
> so it will be completed as well.
>
>> ii. the expiration path *does* SET REQ_ATOM_QUIESCED.  In this instance,
>>     will we get yet another completion for the request when the command
>>     is ultimately retired by the adapter reset?
>
> The command has timed out and sets REQ_ATOM_COMPLETED first, then
> REQ_ATOM_QUIESCED as well.  Now the actual completion comes in and does
> nothing because REQ_ATOM_COMPLETED was set.  We will then kick off the

See B above.  REQ_ATOM_COMPLETE is set, so the first half of that
statement is false, but then test_and_clear_bit(REQ_ATOM_QUIESCED...)
returns true, so we call __blk_complete_request.  So the question is,
will we get a double completion for that request after the reset is
performed?

-Jeff

p.s. That should be __blk_complete_request up there in 'C'.

> controller reset, which will ultimatively complete all commands using
> blk_mq_complete_request. Now REQ_ATOM_QUIESCED is set on the command that
> caused the timeout, so it will be completed as well.


> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply index

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-20 16:34 NVMe mega patchbomb for Linux 4.5-rc Christoph Hellwig
2015-11-20 16:34 ` [PATCH 01/47] nvme: add missing unmaps in nvme_queue_rq Christoph Hellwig
2015-11-20 16:34 ` [PATCH 02/47] block: fix blk_abort_request for blk-mq drivers Christoph Hellwig
2015-11-20 21:43   ` Jeff Moyer
2015-11-20 21:47     ` Jens Axboe
2015-11-20 21:54       ` Jeff Moyer
2015-11-20 22:20   ` Jeff Moyer
2015-11-21  7:34     ` Christoph Hellwig
2015-11-20 16:34 ` [PATCH 03/47] block: defer timeouts to a workqueue Christoph Hellwig
2015-11-23 20:31   ` Jeff Moyer
2015-11-23 20:48     ` Christoph Hellwig
2015-11-23 20:59       ` Jeff Moyer
2015-11-20 16:34 ` [PATCH 04/47] block: provide a new BLK_EH_QUIESCED timeout return value Christoph Hellwig
2015-11-24 15:16   ` Jeff Moyer
2015-11-24 15:40     ` Christoph Hellwig
2015-11-24 15:51       ` Jeff Moyer [this message]
2015-11-24 15:56         ` Christoph Hellwig
2015-11-24 16:34           ` Jeff Moyer
2015-11-24 16:47             ` Keith Busch
2015-11-24 17:56             ` Christoph Hellwig
2015-11-24 18:12               ` Jeff Moyer
2015-11-24 19:40                 ` Christoph Hellwig
2015-11-20 16:35 ` [PATCH 05/47] block: remoe REQ_ATOM_COMPLETE wrappers Christoph Hellwig
2015-11-23 21:23   ` Jeff Moyer
2015-11-24 21:22   ` Jens Axboe
2015-11-20 16:35 ` [PATCH 06/47] blk-mq: add a flags parameter to blk_mq_alloc_request Christoph Hellwig
2015-11-24 15:19   ` Jeff Moyer
2015-11-24 15:32     ` Christoph Hellwig
2015-11-24 21:21   ` Jens Axboe
2015-11-24 22:22     ` Christoph Hellwig
2015-11-24 22:25       ` Jens Axboe
2015-11-20 16:35 ` [PATCH 07/47] nvme: move struct nvme_iod to pci.c Christoph Hellwig
2015-11-20 16:35 ` [PATCH 08/47] nvme: split command submission helpers out of pci.c Christoph Hellwig
2015-11-20 16:35 ` [PATCH 09/47] nvme: add a vendor field to struct nvme_dev Christoph Hellwig
2015-11-20 16:35 ` [PATCH 10/47] nvme: use offset instead of a struct for registers Christoph Hellwig
2015-11-20 16:35 ` [PATCH 11/47] nvme: split a new struct nvme_ctrl out of struct nvme_dev Christoph Hellwig
2015-11-20 16:35 ` [PATCH 12/47] nvme: simplify nvme_setup_prps calling convention Christoph Hellwig
2015-11-20 16:35 ` [PATCH 13/47] nvme: refactor nvme_queue_rq Christoph Hellwig
2015-11-20 16:35 ` [PATCH 14/47] nvme: move nvme_error_status to common code Christoph Hellwig
2015-11-20 16:35 ` [PATCH 15/47] nvme: move nvme_setup_flush and nvme_setup_rw " Christoph Hellwig
2015-11-20 16:35 ` [PATCH 16/47] nvme: split __nvme_submit_sync_cmd Christoph Hellwig
2015-11-20 16:35 ` [PATCH 17/47] nvme: use the block layer for userspace passthrough metadata Christoph Hellwig
2015-11-20 16:35 ` [PATCH 18/47] nvme: move block_device_operations and ns/ctrl freeing to common code Christoph Hellwig
2015-11-20 16:35 ` [PATCH 19/47] nvme: add explicit quirk handling Christoph Hellwig
2015-11-20 16:35 ` [PATCH 20/47] nvme: add a common helper to read Identify Controller data Christoph Hellwig
2015-11-20 16:35 ` [PATCH 21/47] nvme: move the call to nvme_init_identify earlier Christoph Hellwig
2015-11-20 16:35 ` [PATCH 22/47] nvme: move namespace scanning to common code Christoph Hellwig
2015-11-20 16:35 ` [PATCH 23/47] nvme: move chardev and sysfs interface " Christoph Hellwig
2015-11-20 16:35 ` [PATCH 24/47] nvme: only add a controller to dev_list after it's been fully initialized Christoph Hellwig
2015-11-20 16:35 ` [PATCH 25/47] nvme: don't take the I/O queue q_lock in nvme_timeout Christoph Hellwig
2017-03-10 12:51   ` David Woodhouse
2017-03-10 14:24     ` Christoph Hellwig
2015-11-20 16:35 ` [PATCH 26/47] nvme: merge nvme_abort_req and nvme_timeout Christoph Hellwig
2015-11-20 16:35 ` [PATCH 27/47] nvme: do not restart the request timeout if we're resetting the controller Christoph Hellwig
2015-11-20 16:35 ` [PATCH 28/47] nvme: simplify resets Christoph Hellwig
2015-11-20 16:35 ` [PATCH 29/47] nvme: merge probe_work and reset_work Christoph Hellwig
2015-11-20 16:35 ` [PATCH 30/47] nvme: remove dead controllers from a work item Christoph Hellwig
2015-11-20 16:35 ` [PATCH 31/47] nvme: switch abort_limit to an atomic_t Christoph Hellwig
2015-11-20 16:35 ` [PATCH 32/47] NVMe: Implement namespace list scanning Christoph Hellwig
2015-11-20 16:35 ` [PATCH 33/47] NVMe: Use unbounded work queue for all work Christoph Hellwig
2015-11-20 16:35 ` [PATCH 34/47] NVMe: Remove device management handles on remove Christoph Hellwig
2015-11-20 16:50 ` NVMe mega patchbomb for Linux 4.5-rc Christoph Hellwig
2015-11-21  7:19 NVMe mega patchbomb for Linux 4.5-rc (resend) Christoph Hellwig
2015-11-21  7:19 ` [PATCH 04/47] block: provide a new BLK_EH_QUIESCED timeout return value Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=x491tbf2y6v.fsf@segfault.boston.devel.redhat.com \
    --to=jmoyer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git