* [PATCH] nvme: don't memset() the normal read/write command
@ 2021-10-12 18:13 Jens Axboe
2021-10-12 18:31 ` Keith Busch
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Jens Axboe @ 2021-10-12 18:13 UTC (permalink / raw)
To: linux-block, Christoph Hellwig, Keith Busch
This memset in the fast path costs a lot of cycles on my setup. Here's a
top-of-profile of doing ~6.7M IOPS:
+ 5.90% io_uring [nvme] [k] nvme_queue_rq
+ 5.32% io_uring [nvme_core] [k] nvme_setup_cmd
+ 5.17% io_uring [kernel.vmlinux] [k] io_submit_sqes
+ 4.97% io_uring [kernel.vmlinux] [k] blkdev_direct_IO
and a perf diff with this patch:
0.92% +4.40% [nvme_core] [k] nvme_setup_cmd
reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
cycle consumer to something that's mostly irrelevant.
Retain the full clear for the other commands to avoid doing any audits
there, and just clear the fields in the rw command manually that we
don't already fill.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ec7fa6f31e68..c1b19fd69503 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -854,9 +854,16 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
cmnd->rw.opcode = op;
+ cmnd->rw.flags = 0;
+ cmnd->rw.command_id = 0;
cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
+ cmnd->rw.rsvd2 = 0;
+ cmnd->rw.metadata = 0;
cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
+ cmnd->rw.reftag = 0;
+ cmnd->rw.apptag = 0;
+ cmnd->rw.appmask = 0;
if (req_op(req) == REQ_OP_WRITE && ctrl->nr_streams)
nvme_assign_write_stream(ctrl, req, &control, &dsmgmt);
@@ -907,51 +914,64 @@ void nvme_cleanup_cmd(struct request *req)
}
EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
+static void nvme_clear_cmd(struct request *req)
+{
+ if (!(req->rq_flags & RQF_DONTPREP)) {
+ nvme_clear_nvme_request(req);
+ memset(nvme_req(req)->cmd, 0, sizeof(struct nvme_command));
+ }
+}
+
blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
{
struct nvme_command *cmd = nvme_req(req)->cmd;
struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
blk_status_t ret = BLK_STS_OK;
- if (!(req->rq_flags & RQF_DONTPREP)) {
- nvme_clear_nvme_request(req);
- memset(cmd, 0, sizeof(*cmd));
- }
-
switch (req_op(req)) {
case REQ_OP_DRV_IN:
case REQ_OP_DRV_OUT:
/* these are setup prior to execution in nvme_init_request() */
break;
case REQ_OP_FLUSH:
+ nvme_clear_cmd(req);
nvme_setup_flush(ns, cmd);
break;
case REQ_OP_ZONE_RESET_ALL:
case REQ_OP_ZONE_RESET:
+ nvme_clear_cmd(req);
ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_RESET);
break;
case REQ_OP_ZONE_OPEN:
+ nvme_clear_cmd(req);
ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_OPEN);
break;
case REQ_OP_ZONE_CLOSE:
+ nvme_clear_cmd(req);
ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_CLOSE);
break;
case REQ_OP_ZONE_FINISH:
+ nvme_clear_cmd(req);
ret = nvme_setup_zone_mgmt_send(ns, req, cmd, NVME_ZONE_FINISH);
break;
case REQ_OP_WRITE_ZEROES:
+ nvme_clear_cmd(req);
ret = nvme_setup_write_zeroes(ns, req, cmd);
break;
case REQ_OP_DISCARD:
+ nvme_clear_cmd(req);
ret = nvme_setup_discard(ns, req, cmd);
break;
case REQ_OP_READ:
+ nvme_clear_nvme_request(req);
ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_read);
break;
case REQ_OP_WRITE:
+ nvme_clear_nvme_request(req);
ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_write);
break;
case REQ_OP_ZONE_APPEND:
+ nvme_clear_nvme_request(req);
ret = nvme_setup_rw(ns, req, cmd, nvme_cmd_zone_append);
break;
default:
--
Jens Axboe
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
@ 2021-10-12 18:31 ` Keith Busch
2021-10-12 18:56 ` Jens Axboe
2021-10-13 4:14 ` Chaitanya Kulkarni
2021-10-13 4:58 ` Christoph Hellwig
2 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2021-10-12 18:31 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig
On Tue, Oct 12, 2021 at 12:13:52PM -0600, Jens Axboe wrote:
> This memset in the fast path costs a lot of cycles on my setup. Here's a
> top-of-profile of doing ~6.7M IOPS:
>
> + 5.90% io_uring [nvme] [k] nvme_queue_rq
> + 5.32% io_uring [nvme_core] [k] nvme_setup_cmd
> + 5.17% io_uring [kernel.vmlinux] [k] io_submit_sqes
> + 4.97% io_uring [kernel.vmlinux] [k] blkdev_direct_IO
>
> and a perf diff with this patch:
>
> 0.92% +4.40% [nvme_core] [k] nvme_setup_cmd
>
> reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
> cycle consumer to something that's mostly irrelevant.
>
> Retain the full clear for the other commands to avoid doing any audits
> there, and just clear the fields in the rw command manually that we
> don't already fill.
Oo, we knew about this optimization *years* ago, yet didn't do anything
about it! Better late than never.
http://lists.infradead.org/pipermail/linux-nvme/2014-May/000837.html
Acked-by: Keith Busch <kbusch@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-12 18:31 ` Keith Busch
@ 2021-10-12 18:56 ` Jens Axboe
0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2021-10-12 18:56 UTC (permalink / raw)
To: Keith Busch; +Cc: linux-block, Christoph Hellwig
On 10/12/21 12:31 PM, Keith Busch wrote:
> On Tue, Oct 12, 2021 at 12:13:52PM -0600, Jens Axboe wrote:
>> This memset in the fast path costs a lot of cycles on my setup. Here's a
>> top-of-profile of doing ~6.7M IOPS:
>>
>> + 5.90% io_uring [nvme] [k] nvme_queue_rq
>> + 5.32% io_uring [nvme_core] [k] nvme_setup_cmd
>> + 5.17% io_uring [kernel.vmlinux] [k] io_submit_sqes
>> + 4.97% io_uring [kernel.vmlinux] [k] blkdev_direct_IO
>>
>> and a perf diff with this patch:
>>
>> 0.92% +4.40% [nvme_core] [k] nvme_setup_cmd
>>
>> reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
>> cycle consumer to something that's mostly irrelevant.
>>
>> Retain the full clear for the other commands to avoid doing any audits
>> there, and just clear the fields in the rw command manually that we
>> don't already fill.
>
> Oo, we knew about this optimization *years* ago, yet didn't do anything
> about it! Better late than never.
>
> http://lists.infradead.org/pipermail/linux-nvme/2014-May/000837.html
Dang, 2014! Better late than never...
> Acked-by: Keith Busch <kbusch@kernel.org>
Added, thanks.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
2021-10-12 18:31 ` Keith Busch
@ 2021-10-13 4:14 ` Chaitanya Kulkarni
2021-10-13 4:58 ` Christoph Hellwig
2 siblings, 0 replies; 9+ messages in thread
From: Chaitanya Kulkarni @ 2021-10-13 4:14 UTC (permalink / raw)
To: Jens Axboe, linux-block, Christoph Hellwig, Keith Busch
On 10/12/2021 11:13 AM, Jens Axboe wrote:
> External email: Use caution opening links or attachments
>
>
> This memset in the fast path costs a lot of cycles on my setup. Here's a
> top-of-profile of doing ~6.7M IOPS:
>
> + 5.90% io_uring [nvme] [k] nvme_queue_rq
> + 5.32% io_uring [nvme_core] [k] nvme_setup_cmd
> + 5.17% io_uring [kernel.vmlinux] [k] io_submit_sqes
> + 4.97% io_uring [kernel.vmlinux] [k] blkdev_direct_IO
>
> and a perf diff with this patch:
>
> 0.92% +4.40% [nvme_core] [k] nvme_setup_cmd
>
> reducing it from 5.3% to only 0.9%. This takes it from the 2nd most
> cycle consumer to something that's mostly irrelevant.
>
> Retain the full clear for the other commands to avoid doing any audits
> there, and just clear the fields in the rw command manually that we
> don't already fill.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
Looks good to me.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
2021-10-12 18:31 ` Keith Busch
2021-10-13 4:14 ` Chaitanya Kulkarni
@ 2021-10-13 4:58 ` Christoph Hellwig
2021-10-18 10:22 ` Christoph Hellwig
2021-10-18 12:33 ` Jens Axboe
2 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-10-13 4:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Keith Busch
I like this in general, but some comments below:
> cmnd->rw.opcode = op;
> + cmnd->rw.flags = 0;
> + cmnd->rw.command_id = 0;
The command_id is always set in nvme_setup_cmd, so no need to clear it
here.
> cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
> + cmnd->rw.rsvd2 = 0;
There should be no need to clear this reserved space.
> + cmnd->rw.metadata = 0;
> cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
> cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
> + cmnd->rw.reftag = 0;
> + cmnd->rw.apptag = 0;
> + cmnd->rw.appmask = 0;
All these PI fields are reserved when PI isn't enabled using the control
field. So I think we can stop clearing reftag here entirely, and move
clearing the apptag and appmask down next to the reftag assignment.
>
> +static void nvme_clear_cmd(struct request *req)
> +{
> + if (!(req->rq_flags & RQF_DONTPREP)) {
> + nvme_clear_nvme_request(req);
> + memset(nvme_req(req)->cmd, 0, sizeof(struct nvme_command));
> + }
> +}
> +
> blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
> {
> struct nvme_command *cmd = nvme_req(req)->cmd;
> struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
> blk_status_t ret = BLK_STS_OK;
>
> - if (!(req->rq_flags & RQF_DONTPREP)) {
> - nvme_clear_nvme_request(req);
> - memset(cmd, 0, sizeof(*cmd));
> - }
The nvme_clear_nvme_request is not done unconditionally for the read
and write commands below, which does the wrong thing. So I think we want
to keep it here and just move the memset.
I think the best way is to split this patch into two:
1) remove the memset here and do it unconditionally in the individual
nvme_setup_* handlers, and document there why the extra memsets don't
hurt (no partial completions unlikey SCSI)
2) optimize the read/write case
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-13 4:58 ` Christoph Hellwig
@ 2021-10-18 10:22 ` Christoph Hellwig
2021-10-18 12:33 ` Jens Axboe
1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2021-10-18 10:22 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Keith Busch
Do you plan to respin and resend this one?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-13 4:58 ` Christoph Hellwig
2021-10-18 10:22 ` Christoph Hellwig
@ 2021-10-18 12:33 ` Jens Axboe
2021-10-18 12:37 ` Christoph Hellwig
1 sibling, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2021-10-18 12:33 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block, Keith Busch
On 10/12/21 10:58 PM, Christoph Hellwig wrote:
> I like this in general, but some comments below:
>
>> cmnd->rw.opcode = op;
>> + cmnd->rw.flags = 0;
>> + cmnd->rw.command_id = 0;
>
> The command_id is always set in nvme_setup_cmd, so no need to clear it
> here.
Perfect, I'll drop it.
>> cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
>> + cmnd->rw.rsvd2 = 0;
>
> There should be no need to clear this reserved space.
Actually wasn't sure, sometimes hardware specs required reserved fields
to get cleared. nvme does not? I'd be happy to drop it if not the case.
>> + cmnd->rw.metadata = 0;
>> cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
>> cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
>> + cmnd->rw.reftag = 0;
>> + cmnd->rw.apptag = 0;
>> + cmnd->rw.appmask = 0;
>
> All these PI fields are reserved when PI isn't enabled using the control
> field. So I think we can stop clearing reftag here entirely, and move
> clearing the apptag and appmask down next to the reftag assignment.
OK, will move.
>>
>> +static void nvme_clear_cmd(struct request *req)
>> +{
>> + if (!(req->rq_flags & RQF_DONTPREP)) {
>> + nvme_clear_nvme_request(req);
>> + memset(nvme_req(req)->cmd, 0, sizeof(struct nvme_command));
>> + }
>> +}
>> +
>> blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
>> {
>> struct nvme_command *cmd = nvme_req(req)->cmd;
>> struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
>> blk_status_t ret = BLK_STS_OK;
>>
>> - if (!(req->rq_flags & RQF_DONTPREP)) {
>> - nvme_clear_nvme_request(req);
>> - memset(cmd, 0, sizeof(*cmd));
>> - }
>
> The nvme_clear_nvme_request is not done unconditionally for the read
> and write commands below, which does the wrong thing. So I think we want
> to keep it here and just move the memset.
>
> I think the best way is to split this patch into two:
>
> 1) remove the memset here and do it unconditionally in the individual
> nvme_setup_* handlers, and document there why the extra memsets don't
> hurt (no partial completions unlikey SCSI)
> 2) optimize the read/write case
OK I'll do that and resend.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-18 12:33 ` Jens Axboe
@ 2021-10-18 12:37 ` Christoph Hellwig
2021-10-18 12:38 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2021-10-18 12:37 UTC (permalink / raw)
To: Jens Axboe; +Cc: Christoph Hellwig, linux-block, Keith Busch
On Mon, Oct 18, 2021 at 06:33:51AM -0600, Jens Axboe wrote:
> >> cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
> >> + cmnd->rw.rsvd2 = 0;
> >
> > There should be no need to clear this reserved space.
>
> Actually wasn't sure, sometimes hardware specs required reserved fields
> to get cleared. nvme does not? I'd be happy to drop it if not the case.
It only requires bits in otherwise unused fields to be the zeroed.
That being said, yes we're probably on the safe side by clearing
everything.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] nvme: don't memset() the normal read/write command
2021-10-18 12:37 ` Christoph Hellwig
@ 2021-10-18 12:38 ` Jens Axboe
0 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2021-10-18 12:38 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-block, Keith Busch
On 10/18/21 6:37 AM, Christoph Hellwig wrote:
> On Mon, Oct 18, 2021 at 06:33:51AM -0600, Jens Axboe wrote:
>>>> cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
>>>> + cmnd->rw.rsvd2 = 0;
>>>
>>> There should be no need to clear this reserved space.
>>
>> Actually wasn't sure, sometimes hardware specs required reserved fields
>> to get cleared. nvme does not? I'd be happy to drop it if not the case.
>
> It only requires bits in otherwise unused fields to be the zeroed.
>
> That being said, yes we're probably on the safe side by clearing
> everything.
I'll keep the reserved clearing.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-10-18 12:39 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-12 18:13 [PATCH] nvme: don't memset() the normal read/write command Jens Axboe
2021-10-12 18:31 ` Keith Busch
2021-10-12 18:56 ` Jens Axboe
2021-10-13 4:14 ` Chaitanya Kulkarni
2021-10-13 4:58 ` Christoph Hellwig
2021-10-18 10:22 ` Christoph Hellwig
2021-10-18 12:33 ` Jens Axboe
2021-10-18 12:37 ` Christoph Hellwig
2021-10-18 12:38 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.