[RFC PATCH] nvme: request remote is usually not involved for nvme devices

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH] nvme: request remote is usually not involved for nvme devices
@ 2022-09-17 16:40 Liu Song
  2022-09-17 16:50 ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Liu Song @ 2022-09-17 16:40 UTC (permalink / raw)
  To: kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

From: Liu Song <liusong@linux.alibaba.com>

NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
so when "nr_ctx" is equal to 1, there is no possibility of remote
request, so the corresponding process can be simplified.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
 drivers/nvme/host/nvme.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 216acbe..cc21896 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -672,6 +672,10 @@ static inline bool nvme_try_complete_req(struct request *req, __le16 status,
 	nvme_should_fail(req);
 	if (unlikely(blk_should_fake_timeout(req->q)))
 		return true;
+	if (likely(req->mq_hctx->nr_ctx == 1)) {
+		WRITE_ONCE(req->state, MQ_RQ_COMPLETE);
+		return false;
+	}
 	return blk_mq_complete_request_remote(req);
 }
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
  2022-09-17 16:40 [RFC PATCH] nvme: request remote is usually not involved for nvme devices Liu Song
@ 2022-09-17 16:50 ` Jens Axboe
  2022-09-18 16:10   ` Liu Song
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2022-09-17 16:50 UTC (permalink / raw)
  To: Liu Song, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

On 9/17/22 10:40 AM, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
> 
> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
> so when "nr_ctx" is equal to 1, there is no possibility of remote
> request, so the corresponding process can be simplified.

If the worry is the call overhead of blk_mq_complete_request_remote(),
why don't we just make that available as an inline instead? That seems
vastly superior to providing a random shortcut in a driver to avoid
calling it.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
  2022-09-17 16:50 ` Jens Axboe
@ 2022-09-18 16:10   ` Liu Song
  2022-09-19 14:10     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Liu Song @ 2022-09-18 16:10 UTC (permalink / raw)
  To: Jens Axboe, kbusch, axboe, hch, sagi; +Cc: linux-nvme, linux-kernel

On 2022/9/18 00:50, Jens Axboe wrote:
> On 9/17/22 10:40 AM, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>> request, so the corresponding process can be simplified.
> If the worry is the call overhead of blk_mq_complete_request_remote(),
> why don't we just make that available as an inline instead? That seems
> vastly superior to providing a random shortcut in a driver to avoid
> calling it.

Hi

This is what I think about it. If it is an SSD with only one hw queue, 
remote requests will
appear occasionally. As a real multi-queue device, nvme usually does not 
have remote
requests, so we don't need to care about it. So even if 
"blk_mq_complete_request_remote"
is called, there is a high probability that it will return false, and 
the cost of changing the call
to "blk_mq_complete_request_remote" to determine whether 
"req->mq_hctx->nr_ctx" is 1 is
not only a function call, but more judgments in 
"blk_mq_complete_request_remote".
If "blk_mq_complete_request_remote" is decorated as inline, it also 
depends on the compiler,
there is uncertainty.

Thanks

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
  2022-09-18 16:10   ` Liu Song
@ 2022-09-19 14:10     ` Jens Axboe
  2022-09-19 14:35       ` Christoph Hellwig
  2022-09-21  3:32       ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
  0 siblings, 2 replies; 11+ messages in thread
From: Jens Axboe @ 2022-09-19 14:10 UTC (permalink / raw)
  To: Liu Song, kbusch, hch, sagi; +Cc: linux-nvme, linux-kernel

On 9/18/22 10:10 AM, Liu Song wrote:
> 
> On 2022/9/18 00:50, Jens Axboe wrote:
>> On 9/17/22 10:40 AM, Liu Song wrote:
>>> From: Liu Song <liusong@linux.alibaba.com>
>>>
>>> NVMe devices usually have a 1:1 mapping between "ctx" and "hctx",
>>> so when "nr_ctx" is equal to 1, there is no possibility of remote
>>> request, so the corresponding process can be simplified.
>> If the worry is the call overhead of blk_mq_complete_request_remote(),
>> why don't we just make that available as an inline instead? That seems
>> vastly superior to providing a random shortcut in a driver to avoid
>> calling it.
> 
> Hi
> 
> This is what I think about it. If it is an SSD with only one hw queue,
> remote requests will appear occasionally. As a real multi-queue
> device, nvme usually does not have remote requests, so we don't need
> to care about it. So even if "blk_mq_complete_request_remote" is
> called, there is a high probability that it will return false, and the
> cost of changing the call to "blk_mq_complete_request_remote" to
> determine whether "req->mq_hctx->nr_ctx" is 1 is not only a function
> call, but more judgments in "blk_mq_complete_request_remote". If
> "blk_mq_complete_request_remote" is decorated as inline, it also
> depends on the compiler, there is uncertainty.

I'm not disagreeing with any of that, my point is just that you're
hacking around this in the nvme driver. This is problematic whenever
core changes happen, because now we have to touch individual drivers.
While the expectation is that there are no remote IPI completions for
NVMe, queue starved devices do exist and those do see remote
completions.

This optimization belongs in the blk-mq core, not in nvme. I do think it
makes sense, you just need to solve it in blk-mq rather than in the nvme
driver.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
  2022-09-19 14:10     ` Jens Axboe
@ 2022-09-19 14:35       ` Christoph Hellwig
  2022-09-21  3:40         ` Liu Song
  2022-09-21  3:32       ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2022-09-19 14:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Liu Song, kbusch, hch, sagi, linux-nvme, linux-kernel

On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
> I'm not disagreeing with any of that, my point is just that you're
> hacking around this in the nvme driver. This is problematic whenever
> core changes happen, because now we have to touch individual drivers.
> While the expectation is that there are no remote IPI completions for
> NVMe, queue starved devices do exist and those do see remote
> completions.
> 
> This optimization belongs in the blk-mq core, not in nvme. I do think it
> makes sense, you just need to solve it in blk-mq rather than in the nvme
> driver.

I'd also really see solid numbers to justify it.

And btw, having more than one core per queue is quite common in
nvme.  Even many enterprise SSDs only have 64 queues, and some of
the low-end consumers ones have very low number of queues that are
not enough for the number of cores in even mid-end desktop CPUs.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
  2022-09-19 14:10     ` Jens Axboe
  2022-09-19 14:35       ` Christoph Hellwig
@ 2022-09-21  3:32       ` Liu Song
  2022-09-22  6:20         ` Christoph Hellwig
  2022-09-24 15:03         ` [RFC PATCH] " Jens Axboe
  1 sibling, 2 replies; 11+ messages in thread
From: Liu Song @ 2022-09-21  3:32 UTC (permalink / raw)
  To: axboe; +Cc: hch, kbusch, linux-kernel, linux-nvme, liusong, sagi

From: Liu Song <liusong@linux.alibaba.com>

High-performance NVMe devices usually support a large hw queue, which
ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
 block/blk-mq.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..9626ea1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,12 @@ bool blk_mq_complete_request_remote(struct request *rq)
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 
 	/*
-	 * For a polled request, always complete locally, it's pointless
-	 * to redirect the completion.
+	 * For request which hctx has only one ctx mapping,
+	 * or a polled request, always complete locally,
+	 * it's pointless to redirect the completion.
 	 */
-	if (rq->cmd_flags & REQ_POLLED)
+	if (rq->mq_hctx->nr_ctx == 1 ||
+		rq->cmd_flags & REQ_POLLED)
 		return false;
 
 	if (blk_mq_complete_need_ipi(rq)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] nvme: request remote is usually not involved for nvme devices
  2022-09-19 14:35       ` Christoph Hellwig
@ 2022-09-21  3:40         ` Liu Song
  0 siblings, 0 replies; 11+ messages in thread
From: Liu Song @ 2022-09-21  3:40 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe; +Cc: kbusch, sagi, linux-nvme, linux-kernel


On 2022/9/19 22:35, Christoph Hellwig wrote:
> On Mon, Sep 19, 2022 at 08:10:31AM -0600, Jens Axboe wrote:
>> I'm not disagreeing with any of that, my point is just that you're
>> hacking around this in the nvme driver. This is problematic whenever
>> core changes happen, because now we have to touch individual drivers.
>> While the expectation is that there are no remote IPI completions for
>> NVMe, queue starved devices do exist and those do see remote
>> completions.
>>
>> This optimization belongs in the blk-mq core, not in nvme. I do think it
>> makes sense, you just need to solve it in blk-mq rather than in the nvme
>> driver.
> I'd also really see solid numbers to justify it.
>
> And btw, having more than one core per queue is quite common in
> nvme.  Even many enterprise SSDs only have 64 queues, and some of
> the low-end consumers ones have very low number of queues that are
> not enough for the number of cores in even mid-end desktop CPUs.

Hi

Thank you for your suggestion. Here is what I think about it. NVMe 
devices that can support
one core per queue are usually high-performance, so I prefer to make 
more optimizations
for high-performance devices, while for ordinary consumption class NVMe 
devices, such
modifications will not bring special additional overhead.

Thanks


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
  2022-09-21  3:32       ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
@ 2022-09-22  6:20         ` Christoph Hellwig
  2022-09-22  7:17           ` Liu Song
  2022-09-22  8:27           ` [RFC PATCH v2] " Liu Song
  2022-09-24 15:03         ` [RFC PATCH] " Jens Axboe
  1 sibling, 2 replies; 11+ messages in thread
From: Christoph Hellwig @ 2022-09-22  6:20 UTC (permalink / raw)
  To: Liu Song; +Cc: axboe, hch, kbusch, linux-kernel, linux-nvme, sagi

On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
> 
> High-performance NVMe devices usually support a large hw queue, which

a larger number of?

>  	/*
> -	 * For a polled request, always complete locally, it's pointless
> -	 * to redirect the completion.
> +	 * For request which hctx has only one ctx mapping,
> +	 * or a polled request, always complete locally,
> +	 * it's pointless to redirect the completion.
>  	 */
> -	if (rq->cmd_flags & REQ_POLLED)
> +	if (rq->mq_hctx->nr_ctx == 1 ||
> +		rq->cmd_flags & REQ_POLLED)

Some very odd comment formatting and and indentation here.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
  2022-09-22  6:20         ` Christoph Hellwig
@ 2022-09-22  7:17           ` Liu Song
  2022-09-22  8:27           ` [RFC PATCH v2] " Liu Song
  1 sibling, 0 replies; 11+ messages in thread
From: Liu Song @ 2022-09-22  7:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, kbusch, linux-kernel, linux-nvme, sagi


On 2022/9/22 14:20, Christoph Hellwig wrote:

> On Wed, Sep 21, 2022 at 11:32:03AM +0800, Liu Song wrote:
>> From: Liu Song <liusong@linux.alibaba.com>
>>
>> High-performance NVMe devices usually support a large hw queue, which
> a larger number of?
>
>>   	/*
>> -	 * For a polled request, always complete locally, it's pointless
>> -	 * to redirect the completion.
>> +	 * For request which hctx has only one ctx mapping,
>> +	 * or a polled request, always complete locally,
>> +	 * it's pointless to redirect the completion.
>>   	 */
>> -	if (rq->cmd_flags & REQ_POLLED)
>> +	if (rq->mq_hctx->nr_ctx == 1 ||
>> +		rq->cmd_flags & REQ_POLLED)
> Some very odd comment formatting and and indentation here.

Thanks, I will issue a V2 patch as suggested.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH v2] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
  2022-09-22  6:20         ` Christoph Hellwig
  2022-09-22  7:17           ` Liu Song
@ 2022-09-22  8:27           ` Liu Song
  1 sibling, 0 replies; 11+ messages in thread
From: Liu Song @ 2022-09-22  8:27 UTC (permalink / raw)
  To: hch; +Cc: axboe, kbusch, linux-kernel, linux-nvme, liusong, sagi

From: Liu Song <liusong@linux.alibaba.com>

High-performance NVMe devices usually support a larger number of hw queue,
which ensures a 1:1 mapping of hctx and ctx. In this case there will be no
remote request, so we don't need to care about it.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
---
 block/blk-mq.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c11949d..7f9be13 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1093,10 +1093,11 @@ bool blk_mq_complete_request_remote(struct request *rq)
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
 
 	/*
-	 * For a polled request, always complete locally, it's pointless
-	 * to redirect the completion.
+	 * If the request's hctx has only one ctx mapping,
+	 * or is a polled request, both always complete locally,
+	 * it's pointless to redirect the completion.
 	 */
-	if (rq->cmd_flags & REQ_POLLED)
+	if (rq->mq_hctx->nr_ctx == 1 || rq->cmd_flags & REQ_POLLED)
 		return false;
 
 	if (blk_mq_complete_need_ipi(rq)) {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
  2022-09-21  3:32       ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
  2022-09-22  6:20         ` Christoph Hellwig
@ 2022-09-24 15:03         ` Jens Axboe
  1 sibling, 0 replies; 11+ messages in thread
From: Jens Axboe @ 2022-09-24 15:03 UTC (permalink / raw)
  To: Liu Song; +Cc: hch, sagi, linux-kernel, linux-nvme, kbusch

On Wed, 21 Sep 2022 11:32:03 +0800, Liu Song wrote:
> From: Liu Song <liusong@linux.alibaba.com>
> 
> High-performance NVMe devices usually support a large hw queue, which
> ensures a 1:1 mapping of hctx and ctx. In this case there will be no
> remote request, so we don't need to care about it.
> 
> 
> [...]

Applied, thanks!

[1/1] blk-mq: hctx has only one ctx mapping is no need to redirect the completion
      commit: f168420c62e7a106961f2489a89f6ade84fe3f27

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-09-24 15:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-17 16:40 [RFC PATCH] nvme: request remote is usually not involved for nvme devices Liu Song
2022-09-17 16:50 ` Jens Axboe
2022-09-18 16:10   ` Liu Song
2022-09-19 14:10     ` Jens Axboe
2022-09-19 14:35       ` Christoph Hellwig
2022-09-21  3:40         ` Liu Song
2022-09-21  3:32       ` [RFC PATCH] blk-mq: hctx has only one ctx mapping is no need to redirect the completion Liu Song
2022-09-22  6:20         ` Christoph Hellwig
2022-09-22  7:17           ` Liu Song
2022-09-22  8:27           ` [RFC PATCH v2] " Liu Song
2022-09-24 15:03         ` [RFC PATCH] " Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.