All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
@ 2018-01-11 21:38 Roland Dreier
  2018-01-14  9:31 ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Roland Dreier @ 2018-01-11 21:38 UTC (permalink / raw)


From: Roland Dreier <roland@purestorage.com>

In nvme_keep_alive() we pass a request with a pointer to an NVMe command on
the stack into blk_execute_rq_nowait().  However, the block layer doesn't
guarantee that the request is fully queued before blk_execute_rq_nowait()
returns.  If not, and the request is queued after nvme_keep_alive() returns,
then we'll end up using stack memory that might have been overwritten to
form the NVMe command we pass to hardware.

Fix this by keeping a special command struct in the nvme_ctrl struct right
next to the delayed work struct used for keep-alives.

Signed-off-by: Roland Dreier <roland at purestorage.com>
---
 drivers/nvme/host/core.c | 8 +++-----
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 839650e0926a..329f1dd5c421 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -748,13 +748,9 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
 
 static int nvme_keep_alive(struct nvme_ctrl *ctrl)
 {
-	struct nvme_command c;
 	struct request *rq;
 
-	memset(&c, 0, sizeof(c));
-	c.common.opcode = nvme_admin_keep_alive;
-
-	rq = nvme_alloc_request(ctrl->admin_q, &c, BLK_MQ_REQ_RESERVED,
+	rq = nvme_alloc_request(ctrl->admin_q, &ctrl->ka_cmd, BLK_MQ_REQ_RESERVED,
 			NVME_QID_ANY);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
@@ -786,6 +782,8 @@ void nvme_start_keep_alive(struct nvme_ctrl *ctrl)
 		return;
 
 	INIT_DELAYED_WORK(&ctrl->ka_work, nvme_keep_alive_work);
+	memset(&ctrl->ka_cmd, 0, sizeof(ctrl->ka_cmd));
+	ctrl->ka_cmd.common.opcode = nvme_admin_keep_alive;
 	schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
 }
 EXPORT_SYMBOL_GPL(nvme_start_keep_alive);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a00eabd06427..f524c69339b5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -180,6 +180,7 @@ struct nvme_ctrl {
 	struct work_struct scan_work;
 	struct work_struct async_event_work;
 	struct delayed_work ka_work;
+	struct nvme_command ka_cmd;
 	struct work_struct fw_act_work;
 
 	/* Power saving configuration */
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-01-11 21:38 [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command Roland Dreier
@ 2018-01-14  9:31 ` Sagi Grimberg
  2018-01-15  8:42   ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2018-01-14  9:31 UTC (permalink / raw)



> From: Roland Dreier <roland at purestorage.com>
> 
> In nvme_keep_alive() we pass a request with a pointer to an NVMe command on
> the stack into blk_execute_rq_nowait().  However, the block layer doesn't
> guarantee that the request is fully queued before blk_execute_rq_nowait()
> returns.  If not, and the request is queued after nvme_keep_alive() returns,
> then we'll end up using stack memory that might have been overwritten to
> form the NVMe command we pass to hardware.
> 
> Fix this by keeping a special command struct in the nvme_ctrl struct right
> next to the delayed work struct used for keep-alives.

Thanks Roland,

Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-01-14  9:31 ` Sagi Grimberg
@ 2018-01-15  8:42   ` Christoph Hellwig
       [not found]     ` <CAG4TOxOtFqM-RAdS_r1hsPVuru_=abDtTUmB=XqGFaanBrqbEQ@mail.gmail.com>
  2018-02-08 15:59     ` Keith Busch
  0 siblings, 2 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-01-15  8:42 UTC (permalink / raw)


On Sun, Jan 14, 2018@11:31:40AM +0200, Sagi Grimberg wrote:
> Thanks Roland,
>
> Reviewed-by: Sagi Grimberg <sagi at grimberg.me>

I think we'll need to fix this properly and embedd the struct nvme_command
into struct nvme_request.  In the end any command could get an error
without DNR, and then we'd have a stale SQE on the stack.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
       [not found]     ` <CAG4TOxOtFqM-RAdS_r1hsPVuru_=abDtTUmB=XqGFaanBrqbEQ@mail.gmail.com>
@ 2018-01-19 19:12       ` Christoph Hellwig
  0 siblings, 0 replies; 10+ messages in thread
From: Christoph Hellwig @ 2018-01-19 19:12 UTC (permalink / raw)


On Tue, Jan 16, 2018@02:46:43PM -0800, Roland Dreier wrote:
> > I think we'll need to fix this properly and embedd the struct nvme_command
> > into struct nvme_request.  In the end any command could get an error
> > without DNR, and then we'd have a stale SQE on the stack.
> 
> I don't understand.  Are there other places that submit requests with
> a pointer to stack memory?  I haven't audited everything but I don't
> know of any places that submit a command and then free it before
> getting status back.

Every caller of nvme_alloc_request (except for lightnvm) uses stack
memory, but at least the __nvme_submit_sync_cmd and
nvme_submit_user_cmd synchronously wait for the completion, so it
doesn't matter.  That leaves nvme_keep_alive, nvme_timeout and
nvme_delete_queue as problematic.

I suspect the right answer is to embedd a struct nvme_command into
struct nvme_request instead of just pointing to it.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-01-15  8:42   ` Christoph Hellwig
       [not found]     ` <CAG4TOxOtFqM-RAdS_r1hsPVuru_=abDtTUmB=XqGFaanBrqbEQ@mail.gmail.com>
@ 2018-02-08 15:59     ` Keith Busch
  2018-02-08 16:02       ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2018-02-08 15:59 UTC (permalink / raw)


On Mon, Jan 15, 2018@09:42:40AM +0100, Christoph Hellwig wrote:
> On Sun, Jan 14, 2018@11:31:40AM +0200, Sagi Grimberg wrote:
> > Thanks Roland,
> >
> > Reviewed-by: Sagi Grimberg <sagi at grimberg.me>
> 
> I think we'll need to fix this properly and embedd the struct nvme_command
> into struct nvme_request.  In the end any command could get an error
> without DNR, and then we'd have a stale SQE on the stack.

We needn't worry about the DNR case since driver allocated commands are
flagged "REQ_FAILFAST_DRIVER" and are never retried.

Allocating the full 64-byte NVMe command for each request seems a bit
excessive. Since it only really applies to async driver commands, and
there are so few of those as it is, I'm okay with having special cases
for these as Roland suggests. Sound okay?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-02-08 15:59     ` Keith Busch
@ 2018-02-08 16:02       ` Sagi Grimberg
  2018-02-08 16:16         ` Keith Busch
  2018-02-08 16:26         ` Keith Busch
  0 siblings, 2 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-02-08 16:02 UTC (permalink / raw)



> We needn't worry about the DNR case since driver allocated commands are
> flagged "REQ_FAILFAST_DRIVER" and are never retried.

Yea.

> Allocating the full 64-byte NVMe command for each request seems a bit
> excessive.

Why?

> Since it only really applies to async driver commands, and
> there are so few of those as it is, I'm okay with having special cases
> for these as Roland suggests. Sound okay?

I'm fine with fixing this locally, btw isn't this exact same stale sqe
happen in pci's nvme_delete_queue?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-02-08 16:02       ` Sagi Grimberg
@ 2018-02-08 16:16         ` Keith Busch
  2018-02-08 16:26         ` Keith Busch
  1 sibling, 0 replies; 10+ messages in thread
From: Keith Busch @ 2018-02-08 16:16 UTC (permalink / raw)


On Thu, Feb 08, 2018@06:02:04PM +0200, Sagi Grimberg wrote:
> > Allocating the full 64-byte NVMe command for each request seems a bit
> > excessive.
> 
> Why?

That's 64B per request that would rarely ever get used. If you've enough
drives with high queue counts, this can add up to quite a lot of memory.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-02-08 16:02       ` Sagi Grimberg
  2018-02-08 16:16         ` Keith Busch
@ 2018-02-08 16:26         ` Keith Busch
  2018-02-12 19:39           ` Sagi Grimberg
  1 sibling, 1 reply; 10+ messages in thread
From: Keith Busch @ 2018-02-08 16:26 UTC (permalink / raw)


On Thu, Feb 08, 2018@06:02:04PM +0200, Sagi Grimberg wrote:
> I'm fine with fixing this locally, btw isn't this exact same stale sqe
> happen in pci's nvme_delete_queue?

Yeah, this is the other case. Fortunately that never gets called on a
quiesced admin queue, so we should be 'ok', but moving the command from
the stack into struct nvme_queue would be the right thing to do.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-02-08 16:26         ` Keith Busch
@ 2018-02-12 19:39           ` Sagi Grimberg
  2018-02-12 20:07             ` Keith Busch
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2018-02-12 19:39 UTC (permalink / raw)



>> I'm fine with fixing this locally, btw isn't this exact same stale sqe
>> happen in pci's nvme_delete_queue?
> 
> Yeah, this is the other case. Fortunately that never gets called on a
> quiesced admin queue, so we should be 'ok', but moving the command from
> the stack into struct nvme_queue would be the right thing to do.

So are you fine with picking Roland's patch to 4.16-rc?

Christoph?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command
  2018-02-12 19:39           ` Sagi Grimberg
@ 2018-02-12 20:07             ` Keith Busch
  0 siblings, 0 replies; 10+ messages in thread
From: Keith Busch @ 2018-02-12 20:07 UTC (permalink / raw)


On Mon, Feb 12, 2018@09:39:52PM +0200, Sagi Grimberg wrote:
> > > I'm fine with fixing this locally, btw isn't this exact same stale sqe
> > > happen in pci's nvme_delete_queue?
> > 
> > Yeah, this is the other case. Fortunately that never gets called on a
> > quiesced admin queue, so we should be 'ok', but moving the command from
> > the stack into struct nvme_queue would be the right thing to do.
> 
> So are you fine with picking Roland's patch to 4.16-rc?

I am okay with it. If we need to change this interface in the long term,
this is looks like an okay stop-gap until then.
 
> Christoph?

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-02-12 20:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-11 21:38 [PATCH 2/2] nvme: Don't use a stack buffer for keep-alive command Roland Dreier
2018-01-14  9:31 ` Sagi Grimberg
2018-01-15  8:42   ` Christoph Hellwig
     [not found]     ` <CAG4TOxOtFqM-RAdS_r1hsPVuru_=abDtTUmB=XqGFaanBrqbEQ@mail.gmail.com>
2018-01-19 19:12       ` Christoph Hellwig
2018-02-08 15:59     ` Keith Busch
2018-02-08 16:02       ` Sagi Grimberg
2018-02-08 16:16         ` Keith Busch
2018-02-08 16:26         ` Keith Busch
2018-02-12 19:39           ` Sagi Grimberg
2018-02-12 20:07             ` Keith Busch

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.