All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: don't memset() the normal read/write command
@ 2021-10-12 18:13 Jens Axboe
  2021-10-12 18:31 ` Keith Busch
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jens Axboe @ 2021-10-12 18:13 UTC (permalink / raw)
  To: linux-block, Christoph Hellwig, Keith Busch

This memset in the fast path costs a lot of cycles on my setup. Here's a
top-of-profile of doing ~6.7M IOPS:

+    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
+    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
+    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
+    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO

and a perf diff with this patch:

     0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd

reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
cycle consumer to something that's mostly irrelevant.

Retain the full clear for the other commands to avoid doing any audits
there, and just clear the fields in the rw command manually that we
don't already fill.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ec7fa6f31e68..c1b19fd69503 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -854,9 +854,16 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
 
 	cmnd->rw.opcode = op;
+	cmnd->rw.flags = 0;
+	cmnd->rw.command_id = 0;
 	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
+	cmnd->rw.rsvd2 = 0;
+	cmnd->rw.metadata = 0;
 	cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+	cmnd->rw.reftag = 0;
+	cmnd->rw.apptag = 0;
+	cmnd->rw.appmask = 0;
 
 	if (req_op(req) == REQ_OP_WRITE && ctrl->nr_streams)
 		nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);
@@ -907,51 +914,64 @@ void nvme_cleanup_cmd(struct request *req)
 }
 EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
 
+static void nvme_clear_cmd(struct request *req)
+{
+	if (!(req->rq_flags & RQF_DONTPREP)) {
+		nvme_clear_nvme_request(req);
+		memset(nvme_req(req)->cmd, 0, sizeof(struct nvme_command));
+	}
+}
+
 blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 {
 	struct nvme_command *cmd = nvme_req(req)->cmd;
 	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
 	blk_status_t ret = BLK_STS_OK;
 
-	if (!(req->rq_flags & RQF_DONTPREP)) {
-		nvme_clear_nvme_request(req);
-		memset(cmd, 0, sizeof(*cmd));
-	}
-
 	switch (req_op(req)) {
 	case REQ_OP_DRV_IN:
 	case REQ_OP_DRV_OUT:
 		/* these are setup prior to execution in nvme_init_request() */
 		break;
 	case REQ_OP_FLUSH:
+		nvme_clear_cmd(req);
 		nvme_setup_flush(ns, cmd);
 		break;
 	case REQ_OP_ZONE_RESET_ALL:
 	case REQ_OP_ZONE_RESET:
+		nvme_clear_cmd(req);
 		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_RESET);
 		break;
 	case REQ_OP_ZONE_OPEN:
+		nvme_clear_cmd(req);
 		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_OPEN);
 		break;
 	case REQ_OP_ZONE_CLOSE:
+		nvme_clear_cmd(req);
 		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_CLOSE);
 		break;
 	case REQ_OP_ZONE_FINISH:
+		nvme_clear_cmd(req);
 		ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_FINISH);
 		break;
 	case REQ_OP_WRITE_ZEROES:
+		nvme_clear_cmd(req);
 		ret = nvme_setup_write_zeroes(ns, req, cmd);
 		break;
 	case REQ_OP_DISCARD:
+		nvme_clear_cmd(req);
 		ret = nvme_setup_discard(ns, req, cmd);
 		break;
 	case REQ_OP_READ:
+		nvme_clear_nvme_request(req);
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
 		break;
 	case REQ_OP_WRITE:
+		nvme_clear_nvme_request(req);
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
 		break;
 	case REQ_OP_ZONE_APPEND:
+		nvme_clear_nvme_request(req);
 		ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
 		break;
 	default:

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
@ 2021-10-12 18:31 ` Keith Busch
  2021-10-12 18:56   ` Jens Axboe
  2021-10-13  4:14 ` Chaitanya Kulkarni
  2021-10-13  4:58 ` Christoph Hellwig
  2 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2021-10-12 18:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig

On Tue, Oct 12, 2021 at 12:13:52PM -0600, Jens Axboe wrote:
> This memset in the fast path costs a lot of cycles on my setup. Here's a
> top-of-profile of doing ~6.7M IOPS:
> 
> +    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
> +    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
> +    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
> +    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
> 
> and a perf diff with this patch:
> 
>      0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd
> 
> reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
> cycle consumer to something that's mostly irrelevant.
> 
> Retain the full clear for the other commands to avoid doing any audits
> there, and just clear the fields in the rw command manually that we
> don't already fill.

Oo, we knew about this optimization *years* ago, yet didn't do anything
about it! Better late than never.

  http://lists.infradead.org/pipermail/linux-nvme/2014-May/000837.html

Acked-by: Keith Busch <kbusch@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-12 18:31 ` Keith Busch
@ 2021-10-12 18:56   ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2021-10-12 18:56 UTC (permalink / raw)
  To: Keith Busch; +Cc: linux-block, Christoph Hellwig

On 10/12/21 12:31 PM, Keith Busch wrote:
> On Tue, Oct 12, 2021 at 12:13:52PM -0600, Jens Axboe wrote:
>> This memset in the fast path costs a lot of cycles on my setup. Here's a
>> top-of-profile of doing ~6.7M IOPS:
>>
>> +    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
>> +    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
>> +    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
>> +    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
>>
>> and a perf diff with this patch:
>>
>>      0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd
>>
>> reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
>> cycle consumer to something that's mostly irrelevant.
>>
>> Retain the full clear for the other commands to avoid doing any audits
>> there, and just clear the fields in the rw command manually that we
>> don't already fill.
> 
> Oo, we knew about this optimization *years* ago, yet didn't do anything
> about it! Better late than never.
> 
>   http://lists.infradead.org/pipermail/linux-nvme/2014-May/000837.html

Dang, 2014! Better late than never...

> Acked-by: Keith Busch <kbusch@kernel.org>

Added, thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
  2021-10-12 18:31 ` Keith Busch
@ 2021-10-13  4:14 ` Chaitanya Kulkarni
  2021-10-13  4:58 ` Christoph Hellwig
  2 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-13  4:14 UTC (permalink / raw)
  To: Jens Axboe, linux-block, Christoph Hellwig, Keith Busch

On 10/12/2021 11:13 AM, Jens Axboe wrote:
> External email: Use caution opening links or attachments
> 
> 
> This memset in the fast path costs a lot of cycles on my setup. Here's a
> top-of-profile of doing ~6.7M IOPS:
> 
> +    5.90%  io_uring  [nvme]            [k] nvme_queue_rq
> +    5.32%  io_uring  [nvme_core]       [k] nvme_setup_cmd
> +    5.17%  io_uring  [kernel.vmlinux]  [k] io_submit_sqes
> +    4.97%  io_uring  [kernel.vmlinux]  [k] blkdev_direct_IO
> 
> and a perf diff with this patch:
> 
>       0.92%     +4.40%  [nvme_core]       [k] nvme_setup_cmd
> 
> reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
> cycle consumer to something that's mostly irrelevant.
> 
> Retain the full clear for the other commands to avoid doing any audits
> there, and just clear the fields in the rw command manually that we
> don't already fill.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> ---
>

Looks good to me.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
  2021-10-12 18:31 ` Keith Busch
  2021-10-13  4:14 ` Chaitanya Kulkarni
@ 2021-10-13  4:58 ` Christoph Hellwig
  2021-10-18 10:22   ` Christoph Hellwig
  2021-10-18 12:33   ` Jens Axboe
  2 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-10-13  4:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Keith Busch

I like this in general, but some comments below:

>  	cmnd->rw.opcode = op;
> +	cmnd->rw.flags = 0;
> +	cmnd->rw.command_id = 0;

The command_id is always set in nvme_setup_cmd, so no need to clear it
here.

>  	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
> +	cmnd->rw.rsvd2 = 0;

There should be no need to clear this reserved space.

> +	cmnd->rw.metadata = 0;
>  	cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
>  	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
> +	cmnd->rw.reftag = 0;
> +	cmnd->rw.apptag = 0;
> +	cmnd->rw.appmask = 0;

All these PI fields are reserved when PI isn't enabled using the control
field.  So I think we can stop clearing reftag here entirely, and move
clearing the apptag and appmask down next to the reftag assignment.

>  
> +static void nvme_clear_cmd(struct request *req)
> +{
> +	if (!(req->rq_flags & RQF_DONTPREP)) {
> +		nvme_clear_nvme_request(req);
> +		memset(nvme_req(req)->cmd, 0, sizeof(struct nvme_command));
> +	}
> +}
> +
>  blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
>  {
>  	struct nvme_command *cmd = nvme_req(req)->cmd;
>  	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
>  	blk_status_t ret = BLK_STS_OK;
>  
> -	if (!(req->rq_flags & RQF_DONTPREP)) {
> -		nvme_clear_nvme_request(req);
> -		memset(cmd, 0, sizeof(*cmd));
> -	}

The nvme_clear_nvme_request is not done unconditionally for the read
and write commands below, which does the wrong thing.  So I think we want
to keep it here and just move the memset.

I think the best way is to split this patch into two:

  1) remove the memset here and do it unconditionally in the individual
     nvme_setup_* handlers, and document there why the extra memsets don't
     hurt (no partial completions unlikey SCSI)
  2) optimize the read/write case

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-13  4:58 ` Christoph Hellwig
@ 2021-10-18 10:22   ` Christoph Hellwig
  2021-10-18 12:33   ` Jens Axboe
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-10-18 10:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Keith Busch

Do you plan to respin and resend this one?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-13  4:58 ` Christoph Hellwig
  2021-10-18 10:22   ` Christoph Hellwig
@ 2021-10-18 12:33   ` Jens Axboe
  2021-10-18 12:37     ` Christoph Hellwig
  1 sibling, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2021-10-18 12:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Keith Busch

On 10/12/21 10:58 PM, Christoph Hellwig wrote:
> I like this in general, but some comments below:
> 
>>  	cmnd->rw.opcode = op;
>> +	cmnd->rw.flags = 0;
>> +	cmnd->rw.command_id = 0;
> 
> The command_id is always set in nvme_setup_cmd, so no need to clear it
> here.

Perfect, I'll drop it.

>>  	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
>> +	cmnd->rw.rsvd2 = 0;
> 
> There should be no need to clear this reserved space.

Actually wasn't sure, sometimes hardware specs required reserved fields
to get cleared. nvme does not? I'd be happy to drop it if not the case.

>> +	cmnd->rw.metadata = 0;
>>  	cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
>>  	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
>> +	cmnd->rw.reftag = 0;
>> +	cmnd->rw.apptag = 0;
>> +	cmnd->rw.appmask = 0;
> 
> All these PI fields are reserved when PI isn't enabled using the control
> field.  So I think we can stop clearing reftag here entirely, and move
> clearing the apptag and appmask down next to the reftag assignment.

OK, will move.

>>  
>> +static void nvme_clear_cmd(struct request *req)
>> +{
>> +	if (!(req->rq_flags & RQF_DONTPREP)) {
>> +		nvme_clear_nvme_request(req);
>> +		memset(nvme_req(req)->cmd, 0, sizeof(struct nvme_command));
>> +	}
>> +}
>> +
>>  blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
>>  {
>>  	struct nvme_command *cmd = nvme_req(req)->cmd;
>>  	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
>>  	blk_status_t ret = BLK_STS_OK;
>>  
>> -	if (!(req->rq_flags & RQF_DONTPREP)) {
>> -		nvme_clear_nvme_request(req);
>> -		memset(cmd, 0, sizeof(*cmd));
>> -	}
> 
> The nvme_clear_nvme_request is not done unconditionally for the read
> and write commands below, which does the wrong thing.  So I think we want
> to keep it here and just move the memset.
> 
> I think the best way is to split this patch into two:
> 
>   1) remove the memset here and do it unconditionally in the individual
>      nvme_setup_* handlers, and document there why the extra memsets don't
>      hurt (no partial completions unlikey SCSI)
>   2) optimize the read/write case

OK I'll do that and resend.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-18 12:33   ` Jens Axboe
@ 2021-10-18 12:37     ` Christoph Hellwig
  2021-10-18 12:38       ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-10-18 12:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Keith Busch

On Mon, Oct 18, 2021 at 06:33:51AM -0600, Jens Axboe wrote:
> >>  	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
> >> +	cmnd->rw.rsvd2 = 0;
> > 
> > There should be no need to clear this reserved space.
> 
> Actually wasn't sure, sometimes hardware specs required reserved fields
> to get cleared. nvme does not? I'd be happy to drop it if not the case.

It only requires bits in otherwise unused fields to be the zeroed.

That being said, yes we're probably on the safe side by clearing
everything.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] nvme: don't memset() the normal read/write command
  2021-10-18 12:37     ` Christoph Hellwig
@ 2021-10-18 12:38       ` Jens Axboe
  0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2021-10-18 12:38 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-block, Keith Busch

On 10/18/21 6:37 AM, Christoph Hellwig wrote:
> On Mon, Oct 18, 2021 at 06:33:51AM -0600, Jens Axboe wrote:
>>>>  	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
>>>> +	cmnd->rw.rsvd2 = 0;
>>>
>>> There should be no need to clear this reserved space.
>>
>> Actually wasn't sure, sometimes hardware specs required reserved fields
>> to get cleared. nvme does not? I'd be happy to drop it if not the case.
> 
> It only requires bits in otherwise unused fields to be the zeroed.
> 
> That being said, yes we're probably on the safe side by clearing
> everything.

I'll keep the reserved clearing.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-10-18 12:39 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
2021-10-12 18:31 ` Keith Busch
2021-10-12 18:56   ` Jens Axboe
2021-10-13  4:14 ` Chaitanya Kulkarni
2021-10-13  4:58 ` Christoph Hellwig
2021-10-18 10:22   ` Christoph Hellwig
2021-10-18 12:33   ` Jens Axboe
2021-10-18 12:37     ` Christoph Hellwig
2021-10-18 12:38       ` Jens Axboe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.