archive mirror
 help / color / mirror / Atom feed
From: Ming Lei <>
To: Sagi Grimberg <>
Cc: Jens Axboe <>,
	James Smart <>,,,
	Keith Busch <>, Christoph Hellwig <>
Subject: Re: [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request
Date: Sat, 16 Nov 2019 15:17:54 +0800	[thread overview]
Message-ID: <20191116071754.GB18194@ming.t460p> (raw)
In-Reply-To: <>

Hi Sagi,

On Fri, Nov 15, 2019 at 02:38:44PM -0800, Sagi Grimberg wrote:
> > Hi,
> Hey Ming,
> > Use blk_mq_alloc_request() for allocating NVMe loop, fc, rdma and tcp's
> > connect request, and selecting transport queue runtime for connect
> > request.
> > 
> > Then kill blk_mq_alloc_request_hctx().
> Is it really so wrong to have an API to allocate a tag that belongs to
> a specific queue? Why must the tags allocation always correlate to the
> running cpu? Its true that NVMe is the only consumer of this at the
> moment, but does this mean that the interface should be removed because
> it has one (or rather four) consumer(s)?

Now blk-mq takes a static queue mapping between CPU and hw queues, given
CPU hotplug may happen any time, so the specified hw queue may become
inactive any time.

Queue mapping from CPU to hw queue is one core model of blk-mq which
relies a lot on the fact that hw queue active or not depends on
request's submission CPU. And we always can retrieve one active hw
queue if there is at least one online CPU.

IO request is always mapped to the proper hw queue via the submission
CPU and queue type.

So blk_mq_alloc_request_hctx() is really weird from the above blk-mq's

Actually the 4 consumer is just one single type of usage for submitting
connect command, seems no one explain this special usage before. And the
driver can handle well enough without this interface just like this
patch, can't it?

> I would instead suggest to simply remove the constraint that
> blk_mq_alloc_request_hctx() will fail if the first cpu in the mask
> is not on the cpu_online_mask.. The caller of this would know and
> be able to handle it.

Of course, this usage is very special, which is different with normal
IO or passthrough request.

The caller of this still needs to rely on blk-mq for dispatching this

1) blk-mq needs to run hw queue in round-robin style, and different
CPU is selected from active CPU masks for running the hw queue.

2) Most of blk-mq drivers have switched to managed IRQ, which may be
shutdown even though there is still in-flight requests not completed
on the hw queue. We need to fix this issue. One solution is to free
the request and remap the bios into proper active hw queue according to
the new submission CPU.

3) warning will be caused when dispatching one request on inactive hw queue

If we are going to support this special usage, lots of blk-mq needs to
be changed for covering the so special case.

> To me it feels like this approach is fundamentally wrong. IMO, having
> the driver select a different queue than the tag naturally belongs to
> feels like a backwards design.

The root cause is that the connection command needs to be submitted via
one specified queue, which is fundamentally not compatible with blk-mq's
queue mapping model.

Given the connect command is so special for nvme, I'd suggest to let driver
handle it completely, since blk-mq is supposed to serve generic IO function.


Linux-nvme mailing list

  reply	other threads:[~2019-11-16  7:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-15 10:42 [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request Ming Lei
2019-11-15 10:42 ` [PATCH RFC 1/3] block: reuse one scheduler/flush field for private request's data Ming Lei
2019-11-15 10:42 ` [PATCH RFC 2/3] nvme: don't use blk_mq_alloc_request_hctx() for allocating connect request Ming Lei
2019-11-15 10:42 ` [PATCH RFC 3/3] blk-mq: kill blk_mq_alloc_request_hctx() Ming Lei
2019-11-15 22:38 ` [PATCH RFC 0/3] blk-mq/nvme: use blk_mq_alloc_request() for NVMe's connect request Sagi Grimberg
2019-11-16  7:17   ` Ming Lei [this message]
2019-11-17  1:24     ` Bart Van Assche
2019-11-17  4:12       ` Ming Lei
2019-11-18 23:27         ` Bart Van Assche
2019-11-19  0:05     ` Sagi Grimberg
2019-11-19  0:34       ` Keith Busch
2019-11-19  1:43         ` Sagi Grimberg
2019-11-19  2:38         ` Ming Lei
2019-11-19  2:33       ` Ming Lei
2019-11-19 17:56       ` James Smart
2019-11-20  6:35         ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191116071754.GB18194@ming.t460p \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).