* [RFC PATCH] nvme: request remote is usually not involved for nvme devices
@ 2022-09-17 16:40 Liu Song
2022-09-17 16:50 ` Jens Axboe
0 siblings, 1 reply; 11+ messages in thread
From: Liu Song @ 2022-09-17 16:40 UTC (permalink / raw)
To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel
From: Liu Song <liusong@linux.alibaba.com>
NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
so when "nr_ctx" is equal to 1 there is no possibility of a remote
request, and the corresponding handling can be simplified.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
drivers/nvme/host/nvme.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 216acbe..cc21896 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -672,6 +672,10 @@ static inline bool nvme_try_complete_req(struct request *req, __le16 status,
nvme_should_fail(req);
if (unlikely(blk_should_fake_timeout(req->q)))
return true;
+ if (likely(req->mq_hctx->nr_ctx == 1)) {
+ WRITE_ONCE(req->state, MQ_RQ_COMPLETE);
+ return false;
+ }
return blk_mq_complete_request_remote(req);
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
2022-09-17 16:40 [RFC PATCH] nvme: request remote is usually not involved for nvme devices Liu Song
@ 2022-09-17 16:50 ` Jens Axboe
2022-09-18 16:10 ` Liu Song
0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2022-09-17 16:50 UTC (permalink / raw)
To: Liu Song, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel
On 9/17/22 10:40 AM, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
> so when "nr_ctx" is equal to 1 there is no possibility of a remote
> request, and the corresponding handling can be simplified.
If the worry is the call overhead of blk_mq_complete_request_remote(),
why don't we just make that available as an inline instead? That seems
vastly superior to providing a random shortcut in a driver to avoid
calling it.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
2022-09-17 16:50 ` Jens Axboe
@ 2022-09-18 16:10 ` Liu Song
2022-09-19 14:10 ` Jens Axboe
0 siblings, 1 reply; 11+ messages in thread
From: Liu Song @ 2022-09-18 16:10 UTC (permalink / raw)
To: Jens Axboe, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel
On 2022/9/18 00:50, Jens Axboe wrote:
> On 9/17/22 10:40 AM, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>> so when "nr_ctx" is equal to 1 there is no possibility of a remote
>> request, and the corresponding handling can be simplified.
> If the worry is the call overhead of blk_mq_complete_request_remote(),
> why don't we just make that available as an inline instead? That seems
> vastly superior to providing a random shortcut in a driver to avoid
> calling it.
Hi

This is what I think about it. If it is an SSD with only one hw queue,
remote requests will appear occasionally. As a real multi-queue device,
nvme usually does not have remote requests, so we don't need to care
about them. So even if "blk_mq_complete_request_remote" is called, there
is a high probability that it will return false, and what checking
whether "req->mq_hctx->nr_ctx" is 1 saves over calling
"blk_mq_complete_request_remote" is not only the function call, but
also the extra checks inside "blk_mq_complete_request_remote". If
"blk_mq_complete_request_remote" were marked inline, the result would
also depend on the compiler, so there is uncertainty.

Thanks
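The decision being argued over can be sketched as a small userspace model. This is illustrative only: the struct and flag names below are simplified stand-ins for the real blk-mq types, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for blk-mq's hctx/request; not the real kernel types. */
struct hctx_model { int nr_ctx; };
struct rq_model {
	struct hctx_model *mq_hctx;
	unsigned int cmd_flags;
};
#define REQ_POLLED_MODEL (1u << 0)

/*
 * Models the proposed fast path: with a 1:1 ctx:hctx mapping
 * (nr_ctx == 1), the request cannot have been submitted from a CPU
 * that maps to a different hctx, so it can always complete locally
 * and the remote-completion machinery is skipped; polled requests
 * likewise always complete locally.
 */
static bool may_need_remote_completion(const struct rq_model *rq)
{
	if (rq->mq_hctx->nr_ctx == 1 || (rq->cmd_flags & REQ_POLLED_MODEL))
		return false;	/* complete locally */
	return true;		/* fall through to the IPI checks */
}
```

The model shows why the check only helps on devices with enough hw queues: with nr_ctx > 1 the early return never fires and the full check runs anyway.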
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
2022-09-18 16:10 ` Liu Song
@ 2022-09-19 14:10 ` Jens Axboe
2022-09-19 14:35 ` Christoph Hellwig
2022-09-21 3:32 ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2022-09-19 14:10 UTC (permalink / raw)
To: Liu Song, kbusch, hch, sagi; +Cc: linux-nvme, linux-kernel
On 9/18/22 10:10 AM, Liu Song wrote:
>
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <liusong@linux.alibaba.com>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1 there is no possibility of a remote
>>> request, and the corresponding handling can be simplified.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
>
> Hi
>
> This is what I think about it. If it is an SSD with only one hw queue,
> remote requests will appear occasionally. As a real multi-queue
> device, nvme usually does not have remote requests, so we don't need
> to care about them. So even if "blk_mq_complete_request_remote" is
> called, there is a high probability that it will return false, and
> what checking whether "req->mq_hctx->nr_ctx" is 1 saves over calling
> "blk_mq_complete_request_remote" is not only the function call, but
> also the extra checks inside "blk_mq_complete_request_remote". If
> "blk_mq_complete_request_remote" were marked inline, the result would
> also depend on the compiler, so there is uncertainty.
I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.
This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
2022-09-19 14:10 ` Jens Axboe
@ 2022-09-19 14:35 ` Christoph Hellwig
2022-09-21 3:40 ` Liu Song
2022-09-21 3:32 ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2022-09-19 14:35 UTC (permalink / raw)
To: Jens Axboe; +Cc: Liu Song, kbusch, hch, sagi, linux-nvme, linux-kernel
On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
> I'm not disagreeing with any of that, my point is just that you're
> hacking around this in the nvme driver. This is problematic whenever
> core changes happen, because now we have to touch individual drivers.
> While the expectation is that there are no remote IPI completions for
> NVMe, queue starved devices do exist and those do see remote
> completions.
>
> This optimization belongs in the blk-mq core, not in nvme. I do think it
> makes sense, you just need to solve it in blk-mq rather than in the nvme
> driver.
I'd also really like to see solid numbers to justify it.
And btw, having more than one core per queue is quite common in
nvme. Even many enterprise SSDs only have 64 queues, and some of
the low-end consumer ones have a very low number of queues that is
not enough for the number of cores in even mid-range desktop CPUs.
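The point can be made concrete with rough arithmetic (illustrative only; the real blk-mq mapping also accounts for CPU topology): when there are fewer hw queues than CPUs, blk-mq spreads several software ctxs over each hctx, so nr_ctx > 1 and the proposed fast path never fires.

```c
#include <assert.h>

/*
 * Worst-case software ctxs per hardware queue when nr_cpus submission
 * contexts are spread over nr_hw_queues (ceiling division).
 * Illustrative sketch only; the real mapping also considers topology.
 */
static int ctxs_per_hctx(int nr_cpus, int nr_hw_queues)
{
	return (nr_cpus + nr_hw_queues - 1) / nr_hw_queues;
}
```

For example, a 128-core server on a 64-queue enterprise SSD ends up with at least two ctxs per hctx, so the nr_ctx == 1 shortcut never applies there.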
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
2022-09-19 14:10 ` Jens Axboe
2022-09-19 14:35 ` Christoph Hellwig
@ 2022-09-21 3:32 ` Liu Song
2022-09-22 6:20 ` Christoph Hellwig
2022-09-24 15:03 ` [RFC PATCH] " Jens Axboe
1 sibling, 2 replies; 11+ messages in thread
From: Liu Song @ 2022-09-21 3:32 UTC (permalink / raw)
To: axboe; +Cc: hch, kbusch, linux-kernel, linux-nvme, liusong, sagi
From: Liu Song <liusong@linux.alibaba.com>
High-performance NVMe devices usually support a large hw queue, which
ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
block/blk-mq.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..9626ea1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,12 @@ bool blk_mq_complete_request_remote(struct request *rq)
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
/*
- * For a polled request, always complete locally, it's pointless
- * to redirect the completion.
+ * For request which hctx has only one ctx mapping,
+ * or a polled request, always complete locally,
+ * it's pointless to redirect the completion.
*/
- if (rq->cmd_flags & REQ_POLLED)
+ if (rq->mq_hctx->nr_ctx == 1 ||
+ rq->cmd_flags & REQ_POLLED)
return false;
if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
2022-09-19 14:35 ` Christoph Hellwig
@ 2022-09-21 3:40 ` Liu Song
0 siblings, 0 replies; 11+ messages in thread
From: Liu Song @ 2022-09-21 3:40 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe; +Cc: kbusch, sagi, linux-nvme, linux-kernel
On 2022/9/19 22:35, Christoph Hellwig wrote:
> On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
>> I'm not disagreeing with any of that, my point is just that you're
>> hacking around this in the nvme driver. This is problematic whenever
>> core changes happen, because now we have to touch individual drivers.
>> While the expectation is that there are no remote IPI completions for
>> NVMe, queue starved devices do exist and those do see remote
>> completions.
>>
>> This optimization belongs in the blk-mq core, not in nvme. I do think it
>> makes sense, you just need to solve it in blk-mq rather than in the nvme
>> driver.
> I'd also really like to see solid numbers to justify it.
>
> And btw, having more than one core per queue is quite common in
> nvme. Even many enterprise SSDs only have 64 queues, and some of
> the low-end consumer ones have a very low number of queues that is
> not enough for the number of cores in even mid-range desktop CPUs.
Hi

Thank you for your suggestion. Here is what I think about it. NVMe
devices that can support one core per queue are usually
high-performance, so I prefer to make more optimizations for
high-performance devices, while for ordinary consumer-class NVMe
devices, such modifications will not bring special additional overhead.

Thanks
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
2022-09-21 3:32 ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
@ 2022-09-22 6:20 ` Christoph Hellwig
2022-09-22 7:17 ` Liu Song
2022-09-22 8:27 ` [RFC PATCH v2] " Liu Song
2022-09-24 15:03 ` [RFC PATCH] " Jens Axboe
1 sibling, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2022-09-22 6:20 UTC (permalink / raw)
To: Liu Song; +Cc: axboe, hch, kbusch, linux-kernel, linux-nvme, sagi
On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> High-performance NVMe devices usually support a large hw queue, which
a larger number of?
> /*
> - * For a polled request, always complete locally, it's pointless
> - * to redirect the completion.
> + * For request which hctx has only one ctx mapping,
> + * or a polled request, always complete locally,
> + * it's pointless to redirect the completion.
> */
> - if (rq->cmd_flags & REQ_POLLED)
> + if (rq->mq_hctx->nr_ctx == 1 ||
> + rq->cmd_flags & REQ_POLLED)
Some very odd comment formatting and indentation here.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
2022-09-22 6:20 ` Christoph Hellwig
@ 2022-09-22 7:17 ` Liu Song
2022-09-22 8:27 ` [RFC PATCH v2] " Liu Song
1 sibling, 0 replies; 11+ messages in thread
From: Liu Song @ 2022-09-22 7:17 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: axboe, kbusch, linux-kernel, linux-nvme, sagi
On 2022/9/22 14:20, Christoph Hellwig wrote:
> On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> High-performance NVMe devices usually support a large hw queue, which
> a larger number of?
>
>> /*
>> - * For a polled request, always complete locally, it's pointless
>> - * to redirect the completion.
>> + * For request which hctx has only one ctx mapping,
>> + * or a polled request, always complete locally,
>> + * it's pointless to redirect the completion.
>> */
>> - if (rq->cmd_flags & REQ_POLLED)
>> + if (rq->mq_hctx->nr_ctx == 1 ||
>> + rq->cmd_flags & REQ_POLLED)
> Some very odd comment formatting and indentation here.
Thanks, I will issue a V2 patch as suggested.
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH v2] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
2022-09-22 6:20 ` Christoph Hellwig
2022-09-22 7:17 ` Liu Song
@ 2022-09-22 8:27 ` Liu Song
1 sibling, 0 replies; 11+ messages in thread
From: Liu Song @ 2022-09-22 8:27 UTC (permalink / raw)
To: hch; +Cc: axboe, kbusch, linux-kernel, linux-nvme, liusong, sagi
From: Liu Song <liusong@linux.alibaba.com>
High-performance NVMe devices usually support a larger number of hw
queues, which ensures a 1:1 mapping of hctx and ctx. In this case there
will be no remote requests, so we don't need to care about them.
Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
block/blk-mq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..7f9be13 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,11 @@ bool blk_mq_complete_request_remote(struct request *rq)
WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
/*
- * For a polled request, always complete locally, it's pointless
- * to redirect the completion.
+ * If the request's hctx has only one ctx mapping,
+ * or it is a polled request, always complete locally;
+ * it's pointless to redirect the completion.
*/
- if (rq->cmd_flags & REQ_POLLED)
+ if (rq->mq_hctx->nr_ctx == 1 || rq->cmd_flags & REQ_POLLED)
return false;
if (blk_mq_complete_need_ipi(rq)) {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
2022-09-21 3:32 ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
2022-09-22 6:20 ` Christoph Hellwig
@ 2022-09-24 15:03 ` Jens Axboe
1 sibling, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2022-09-24 15:03 UTC (permalink / raw)
To: Liu Song; +Cc: hch, sagi, linux-kernel, linux-nvme, kbusch
On Wed, 21 Sep 2022 11:32:03 +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
>
> High-performance NVMe devices usually support a large hw queue, which
> ensures a 1:1 mapping of hctx and ctx. In this case there will be no
> remote request, so we don't need to care about it.
>
>
> [...]
Applied, thanks!
[1/1] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
commit: f168420c62e7a106961f2489a89f6ade84fe3f27
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-09-24 15:03 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-17 16:40 [RFC PATCH] nvme: request remote is usually not involved for nvme devices Liu Song
2022-09-17 16:50 ` Jens Axboe
2022-09-18 16:10 ` Liu Song
2022-09-19 14:10 ` Jens Axboe
2022-09-19 14:35 ` Christoph Hellwig
2022-09-21 3:40 ` Liu Song
2022-09-21 3:32 ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
2022-09-22 6:20 ` Christoph Hellwig
2022-09-22 7:17 ` Liu Song
2022-09-22 8:27 ` [RFC PATCH v2] " Liu Song
2022-09-24 15:03 ` [RFC PATCH] " Jens Axboe