linux-block.vger.kernel.org archive mirror
* [PATCH 0/2] block/scsi/dm-rq: fix leak of request private data in dm-mpath
@ 2019-07-18  3:25 Ming Lei
  2019-07-18  3:25 ` [PATCH 1/2] blk-mq: add callback of .cleanup_rq Ming Lei
  2019-07-18  3:25 ` [PATCH 2/2] scsi: implement .cleanup_rq callback Ming Lei
  0 siblings, 2 replies; 6+ messages in thread
From: Ming Lei @ 2019-07-18  3:25 UTC
  To: Jens Axboe
  Cc: linux-block, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, Ming Lei, Ewan D . Milne, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Mike Snitzer, dm-devel,
	stable

Hi,

When a request is dispatched to the LLD via dm-rq and the result is
BLK_STS_*RESOURCE, dm-rq frees the request. However, the LLD may have
allocated private data for this request, so freeing it this way leaks memory.

Add a .cleanup_rq() callback and implement it in SCSI to fix the issue.
SCSI is the only driver that allocates private data in the .queue_rq()
path.
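
As a rough illustration of the failure mode and the fix described above,
here is a userspace sketch (not kernel code). Every mock_* name is a
hypothetical stand-in invented for this example; only the "call the hook
if it is set" shape is taken from blk_mq_cleanup_rq() in patch 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct mock_request;

struct mock_ops {
	/* Dispatch; returns 0 on success, -1 for a "resource busy" result. */
	int (*queue_rq)(struct mock_request *rq);
	/* Optional; releases whatever queue_rq() attached to the request. */
	void (*cleanup_rq)(struct mock_request *rq);
};

struct mock_request {
	const struct mock_ops *ops;
	int private_allocated;	/* 1 while driver private data is attached */
};

/* A driver that attaches private data and then reports "busy". */
static int mock_alloc_then_busy(struct mock_request *rq)
{
	rq->private_allocated = 1;
	return -1;
}

/* A driver that allocates nothing and simply reports "busy". */
static int mock_plain_busy(struct mock_request *rq)
{
	(void)rq;
	return -1;
}

static void mock_release_private(struct mock_request *rq)
{
	rq->private_allocated = 0;
}

/* Same shape as blk_mq_cleanup_rq() in patch 1: call the hook only if set. */
static void mock_cleanup_rq(struct mock_request *rq)
{
	assert(rq != NULL && rq->ops != NULL);
	if (rq->ops->cleanup_rq)
		rq->ops->cleanup_rq(rq);
}

const struct mock_ops mock_fixed_ops = {
	.queue_rq = mock_alloc_then_busy,
	.cleanup_rq = mock_release_private,
};
const struct mock_ops mock_leaky_ops = { .queue_rq = mock_alloc_then_busy };
const struct mock_ops mock_plain_ops = { .queue_rq = mock_plain_busy };

/*
 * Models the upper layer's requeue path: dispatch, and on a busy result
 * run the cleanup hook before giving the request up.
 * Returns 1 if private data is still attached (a leak), else 0.
 */
int mock_requeue_path(const struct mock_ops *ops)
{
	struct mock_request rq = { .ops = ops, .private_allocated = 0 };

	if (ops->queue_rq(&rq) != 0)
		mock_cleanup_rq(&rq);
	return rq.private_allocated;
}
```

In this sketch mock_leaky_ops models a driver that allocates at dispatch
time but provides no hook, so the private data stays attached;
mock_fixed_ops releases it, and mock_plain_ops has nothing to release.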

Another use case of this callback is to free the request and re-submit
bios during cpu hotplug when the hctx is dead, see the following link:

https://lore.kernel.org/linux-block/f122e8f2-5ede-2d83-9ca0-bc713ce66d01@huawei.com/T/#t

Ming Lei (2):
  blk-mq: add callback of .cleanup_rq
  scsi: implement .cleanup_rq callback

 drivers/md/dm-rq.c      |  1 +
 drivers/scsi/scsi_lib.c | 15 +++++++++++++++
 include/linux/blk-mq.h  | 13 +++++++++++++
 3 files changed, 29 insertions(+)

Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
-- 
2.20.1



* [PATCH 1/2] blk-mq: add callback of .cleanup_rq
  2019-07-18  3:25 [PATCH 0/2] block/scsi/dm-rq: fix leak of request private data in dm-mpath Ming Lei
@ 2019-07-18  3:25 ` Ming Lei
  2019-07-18 14:52   ` Mike Snitzer
  2019-07-18  3:25 ` [PATCH 2/2] scsi: implement .cleanup_rq callback Ming Lei
  1 sibling, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-07-18  3:25 UTC
  To: Jens Axboe
  Cc: linux-block, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, Ming Lei, Ewan D . Milne, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Mike Snitzer, dm-devel,
	stable

dm-rq needs to free a request which has been dispatched to, but not
completed by, the underlying queue. However, the underlying queue may have
allocated private data for this request in .queue_rq(), so dm-rq will leak
the request's private part.

Add a new .cleanup_rq() callback to fix the memory leak.

Another use case is to free a request when the hctx is dead during
CPU hotplug.

Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/md/dm-rq.c     |  1 +
 include/linux/blk-mq.h | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index c9e44ac1f9a6..21d5c1784d0c 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio)
 		ret = dm_dispatch_clone_request(clone, rq);
 		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
 			blk_rq_unprep_clone(clone);
+			blk_mq_cleanup_rq(clone);
 			tio->ti->type->release_clone_rq(clone, &tio->info);
 			tio->clone = NULL;
 			return DM_MAPIO_REQUEUE;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 3fa1fa59f9b2..8a7808be5d0b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -140,6 +140,7 @@ typedef int (poll_fn)(struct blk_mq_hw_ctx *);
 typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
 typedef bool (busy_fn)(struct request_queue *);
 typedef void (complete_fn)(struct request *);
+typedef void (cleanup_rq_fn)(struct request *);
 
 
 struct blk_mq_ops {
@@ -200,6 +201,12 @@ struct blk_mq_ops {
 	/* Called from inside blk_get_request() */
 	void (*initialize_rq_fn)(struct request *rq);
 
+	/*
+	 * Called before freeing one request which isn't completed yet,
+	 * and usually for freeing the driver private part
+	 */
+	cleanup_rq_fn		*cleanup_rq;
+
 	/*
 	 * If set, returns whether or not this queue currently is busy
 	 */
@@ -366,4 +373,10 @@ static inline blk_qc_t request_to_qc_t(struct blk_mq_hw_ctx *hctx,
 			BLK_QC_T_INTERNAL;
 }
 
+static inline void blk_mq_cleanup_rq(struct request *rq)
+{
+	if (rq->q->mq_ops->cleanup_rq)
+		rq->q->mq_ops->cleanup_rq(rq);
+}
+
 #endif
-- 
2.20.1



* [PATCH 2/2] scsi: implement .cleanup_rq callback
  2019-07-18  3:25 [PATCH 0/2] block/scsi/dm-rq: fix leak of request private data in dm-mpath Ming Lei
  2019-07-18  3:25 ` [PATCH 1/2] blk-mq: add callback of .cleanup_rq Ming Lei
@ 2019-07-18  3:25 ` Ming Lei
  1 sibling, 0 replies; 6+ messages in thread
From: Ming Lei @ 2019-07-18  3:25 UTC
  To: Jens Axboe
  Cc: linux-block, James E . J . Bottomley, Martin K . Petersen,
	linux-scsi, Ming Lei, Ewan D . Milne, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, Mike Snitzer, dm-devel,
	stable

Implement the .cleanup_rq() callback to free the driver private part
of the request. This avoids leaking that part when the request isn't
completed by SCSI and is eventually freed by blk-mq or an upper layer
(such as dm-rq).
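
The flag-guarded teardown used by scsi_cleanup_rq() in this patch can be
sketched in userspace as follows. This is an illustration only:
MOCK_RQF_DONTPREP, struct mock_cmd_request and the mock_* functions are
hypothetical stand-ins, and the counter merely records how often the
private part would have been torn down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RQF_DONTPREP: "private part has been set up". */
#define MOCK_RQF_DONTPREP (1u << 0)

struct mock_cmd_request {
	unsigned int rq_flags;
	int uninit_calls;	/* how many times the private part was torn down */
};

/* Models the preparation step that attaches the private part. */
void mock_prep(struct mock_cmd_request *rq)
{
	assert(rq != NULL);
	rq->rq_flags |= MOCK_RQF_DONTPREP;
}

/*
 * Same structure as scsi_cleanup_rq(): tear down only if the flag says
 * the private part exists, then clear the flag so a repeated call is a
 * no-op.
 */
void mock_scsi_cleanup_rq(struct mock_cmd_request *rq)
{
	assert(rq != NULL);
	if (rq->rq_flags & MOCK_RQF_DONTPREP) {
		rq->uninit_calls++;
		rq->rq_flags &= ~MOCK_RQF_DONTPREP;
	}
}
```

Clearing the flag after the teardown is what makes the hook safe to call
on a request that was never prepared, or to call more than once.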

Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: <stable@vger.kernel.org>
Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/scsi_lib.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e1da8c70a266..59eee4605cda 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1089,6 +1089,20 @@ static void scsi_initialize_rq(struct request *rq)
 	cmd->retries = 0;
 }
 
+/*
+ * Only called when the request isn't completed by SCSI, and not freed by
+ * SCSI
+ */
+static void scsi_cleanup_rq(struct request *rq)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(rq);
+
+	if (rq->rq_flags & RQF_DONTPREP) {
+		scsi_mq_uninit_cmd(cmd);
+		rq->rq_flags &= ~RQF_DONTPREP;
+	}
+}
+
 /* Add a command to the list used by the aacraid and dpt_i2o drivers */
 void scsi_add_cmd_to_list(struct scsi_cmnd *cmd)
 {
@@ -1816,6 +1830,7 @@ static const struct blk_mq_ops scsi_mq_ops = {
 	.init_request	= scsi_mq_init_request,
 	.exit_request	= scsi_mq_exit_request,
 	.initialize_rq_fn = scsi_initialize_rq,
+	.cleanup_rq	= scsi_cleanup_rq,
 	.busy		= scsi_mq_lld_busy,
 	.map_queues	= scsi_map_queues,
 };
-- 
2.20.1



* Re: [PATCH 1/2] blk-mq: add callback of .cleanup_rq
  2019-07-18  3:25 ` [PATCH 1/2] blk-mq: add callback of .cleanup_rq Ming Lei
@ 2019-07-18 14:52   ` Mike Snitzer
  2019-07-19  1:35     ` Ming Lei
  0 siblings, 1 reply; 6+ messages in thread
From: Mike Snitzer @ 2019-07-18 14:52 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-block, James E . J . Bottomley,
	Martin K . Petersen, linux-scsi, Ewan D . Milne, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, dm-devel, stable

On Wed, Jul 17 2019 at 11:25pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> dm-rq needs to free a request which has been dispatched to, but not
> completed by, the underlying queue. However, the underlying queue may have
> allocated private data for this request in .queue_rq(), so dm-rq will leak
> the request's private part.

No, SCSI (and blk-mq) will leak.  DM doesn't know anything about the
internal memory SCSI uses.  That memory is a SCSI implementation detail.

Please fix header to properly reflect which layer is doing the leaking.

> Add a new .cleanup_rq() callback to fix the memory leak.
> 
> Another use case is to free a request when the hctx is dead during
> CPU hotplug.
> 
> Cc: Ewan D. Milne <emilne@redhat.com>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: Hannes Reinecke <hare@suse.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Mike Snitzer <snitzer@redhat.com>
> Cc: dm-devel@redhat.com
> Cc: <stable@vger.kernel.org>
> Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/md/dm-rq.c     |  1 +
>  include/linux/blk-mq.h | 13 +++++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> index c9e44ac1f9a6..21d5c1784d0c 100644
> --- a/drivers/md/dm-rq.c
> +++ b/drivers/md/dm-rq.c
> @@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio)
>  		ret = dm_dispatch_clone_request(clone, rq);
>  		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
>  			blk_rq_unprep_clone(clone);
> +			blk_mq_cleanup_rq(clone);
>  			tio->ti->type->release_clone_rq(clone, &tio->info);
>  			tio->clone = NULL;
>  			return DM_MAPIO_REQUEUE;

Requiring upper layer driver (dm-rq) to explicitly call blk_mq_cleanup_rq() 
seems wrong.  In this instance tio->ti->type->release_clone_rq()
(dm-mpath's multipath_release_clone) calls blk_put_request().  Why can't
blk_put_request(), or blk_mq_free_request(), call blk_mq_cleanup_rq()?

Not looked at the cpu hotplug case you mention, but my naive thought is
it'd be pretty weird to also sprinkle a call to blk_mq_cleanup_rq() from
that specific "dead hctx" code path.

Mike


* Re: [PATCH 1/2] blk-mq: add callback of .cleanup_rq
  2019-07-18 14:52   ` Mike Snitzer
@ 2019-07-19  1:35     ` Ming Lei
  2019-07-19 12:26       ` Mike Snitzer
  0 siblings, 1 reply; 6+ messages in thread
From: Ming Lei @ 2019-07-19  1:35 UTC
  To: Mike Snitzer
  Cc: Jens Axboe, linux-block, James E . J . Bottomley,
	Martin K . Petersen, linux-scsi, Ewan D . Milne, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, dm-devel, stable

On Thu, Jul 18, 2019 at 10:52:01AM -0400, Mike Snitzer wrote:
> On Wed, Jul 17 2019 at 11:25pm -0400,
> Ming Lei <ming.lei@redhat.com> wrote:
> 
> > dm-rq needs to free a request which has been dispatched to, but not
> > completed by, the underlying queue. However, the underlying queue may have
> > allocated private data for this request in .queue_rq(), so dm-rq will leak
> > the request's private part.
> 
> No, SCSI (and blk-mq) will leak.  DM doesn't know anything about the
> internal memory SCSI uses.  That memory is a SCSI implementation detail.

This does have something to do with dm-rq, which frees a request after
BLK_STS_*RESOURCE is returned from blk_insert_cloned_request(); in this case
it is up to the user to release the request private data.

> 
> Please fix header to properly reflect which layer is doing the leaking.

Fine.

> 
> > Add a new .cleanup_rq() callback to fix the memory leak.
> > 
> > Another use case is to free a request when the hctx is dead during
> > CPU hotplug.
> > 
> > Cc: Ewan D. Milne <emilne@redhat.com>
> > Cc: Bart Van Assche <bvanassche@acm.org>
> > Cc: Hannes Reinecke <hare@suse.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Mike Snitzer <snitzer@redhat.com>
> > Cc: dm-devel@redhat.com
> > Cc: <stable@vger.kernel.org>
> > Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/md/dm-rq.c     |  1 +
> >  include/linux/blk-mq.h | 13 +++++++++++++
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> > index c9e44ac1f9a6..21d5c1784d0c 100644
> > --- a/drivers/md/dm-rq.c
> > +++ b/drivers/md/dm-rq.c
> > @@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio)
> >  		ret = dm_dispatch_clone_request(clone, rq);
> >  		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
> >  			blk_rq_unprep_clone(clone);
> > +			blk_mq_cleanup_rq(clone);
> >  			tio->ti->type->release_clone_rq(clone, &tio->info);
> >  			tio->clone = NULL;
> >  			return DM_MAPIO_REQUEUE;
> 
> Requiring upper layer driver (dm-rq) to explicitly call blk_mq_cleanup_rq() 
> seems wrong.  In this instance tio->ti->type->release_clone_rq()
> (dm-mpath's multipath_release_clone) calls blk_put_request().  Why can't
> blk_put_request(), or blk_mq_free_request(), call blk_mq_cleanup_rq()?

I did think about doing it in blk_put_request(), but I wanted to
avoid the small cost in the generic fast path, given that freeing a request
after dispatch is very unusual; so far only nvme multipath and dm-rq do it
that way.

However, if no one objects to moving blk_mq_cleanup_rq() into blk_put_request()
or blk_mq_free_request(), I am fine with doing that in V2.

> 
> Not looked at the cpu hotplug case you mention, but my naive thought is
> it'd be pretty weird to also sprinkle a call to blk_mq_cleanup_rq() from
> that specific "dead hctx" code path.

It isn't weird, and it is exactly what NVMe multipath is doing, please see
nvme_failover_req(). It is just that nvme doesn't allocate request
private data.

Wrt. blk-mq cpu hotplug handling: after one hctx is dead, we can't dispatch
requests to this hctx any more. However, a request has been bound to its
hctx since its allocation, and the association can't (or is quite hard to) be
changed any more. Do you have any better idea to deal with this issue?


Thanks,
Ming


* Re: [PATCH 1/2] blk-mq: add callback of .cleanup_rq
  2019-07-19  1:35     ` Ming Lei
@ 2019-07-19 12:26       ` Mike Snitzer
  0 siblings, 0 replies; 6+ messages in thread
From: Mike Snitzer @ 2019-07-19 12:26 UTC
  To: Ming Lei
  Cc: Jens Axboe, linux-block, James E . J . Bottomley,
	Martin K . Petersen, linux-scsi, Ewan D . Milne, Bart Van Assche,
	Hannes Reinecke, Christoph Hellwig, dm-devel, stable

On Thu, Jul 18 2019 at  9:35pm -0400,
Ming Lei <ming.lei@redhat.com> wrote:

> On Thu, Jul 18, 2019 at 10:52:01AM -0400, Mike Snitzer wrote:
> > On Wed, Jul 17 2019 at 11:25pm -0400,
> > Ming Lei <ming.lei@redhat.com> wrote:
> > 
> > > dm-rq needs to free a request which has been dispatched to, but not
> > > completed by, the underlying queue. However, the underlying queue may have
> > > allocated private data for this request in .queue_rq(), so dm-rq will leak
> > > the request's private part.
> > 
> > No, SCSI (and blk-mq) will leak.  DM doesn't know anything about the
> > internal memory SCSI uses.  That memory is a SCSI implementation detail.
> 
> This does have something to do with dm-rq, which frees a request after
> BLK_STS_*RESOURCE is returned from blk_insert_cloned_request(); in this case
> it is up to the user to release the request private data.
> 
> > 
> > Please fix header to properly reflect which layer is doing the leaking.
> 
> Fine.
> 
> > 
> > > Add a new .cleanup_rq() callback to fix the memory leak.
> > > 
> > > Another use case is to free a request when the hctx is dead during
> > > CPU hotplug.
> > > 
> > > Cc: Ewan D. Milne <emilne@redhat.com>
> > > Cc: Bart Van Assche <bvanassche@acm.org>
> > > Cc: Hannes Reinecke <hare@suse.com>
> > > Cc: Christoph Hellwig <hch@lst.de>
> > > Cc: Mike Snitzer <snitzer@redhat.com>
> > > Cc: dm-devel@redhat.com
> > > Cc: <stable@vger.kernel.org>
> > > Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
> > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > ---
> > >  drivers/md/dm-rq.c     |  1 +
> > >  include/linux/blk-mq.h | 13 +++++++++++++
> > >  2 files changed, 14 insertions(+)
> > > 
> > > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> > > index c9e44ac1f9a6..21d5c1784d0c 100644
> > > --- a/drivers/md/dm-rq.c
> > > +++ b/drivers/md/dm-rq.c
> > > @@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio)
> > >  		ret = dm_dispatch_clone_request(clone, rq);
> > >  		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
> > >  			blk_rq_unprep_clone(clone);
> > > +			blk_mq_cleanup_rq(clone);
> > >  			tio->ti->type->release_clone_rq(clone, &tio->info);
> > >  			tio->clone = NULL;
> > >  			return DM_MAPIO_REQUEUE;
> > 
> > Requiring upper layer driver (dm-rq) to explicitly call blk_mq_cleanup_rq() 
> > seems wrong.  In this instance tio->ti->type->release_clone_rq()
> > (dm-mpath's multipath_release_clone) calls blk_put_request().  Why can't
> > blk_put_request(), or blk_mq_free_request(), call blk_mq_cleanup_rq()?
> 
> I did think about doing it in blk_put_request(), but I wanted to
> avoid the small cost in the generic fast path, given that freeing a request
> after dispatch is very unusual; so far only nvme multipath and dm-rq do it
> that way.
> 
> However, if no one objects to moving blk_mq_cleanup_rq() into blk_put_request()
> or blk_mq_free_request(), I am fine with doing that in V2.

I think it'd be a less fragile/nuanced way to extend the blk-mq
interface.  Otherwise other future drivers could end up with the same
kind of leak.
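
A userspace sketch of that alternative, where the free routine itself
runs the optional hook so that no caller can forget it. All mock_* names
are hypothetical and this is not the actual blk_put_request() or
blk_mq_free_request() code; it only shows the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not kernel types. */
struct mock_rq;

struct mock_rq_ops {
	void (*cleanup_rq)(struct mock_rq *rq);	/* optional */
};

struct mock_rq {
	const struct mock_rq_ops *ops;
	int private_allocated;	/* 1 while driver private data is attached */
	int freed;		/* 1 once the request has been released */
};

void mock_drop_private(struct mock_rq *rq)
{
	rq->private_allocated = 0;
}

const struct mock_rq_ops mock_ops_with_cleanup = {
	.cleanup_rq = mock_drop_private,
};
const struct mock_rq_ops mock_ops_without_cleanup = { .cleanup_rq = NULL };

/*
 * The free path owns the cleanup: every caller gets it for free, at the
 * price of one extra pointer check on each free.
 */
void mock_free_request(struct mock_rq *rq)
{
	assert(rq != NULL && rq->ops != NULL);
	if (rq->ops->cleanup_rq)
		rq->ops->cleanup_rq(rq);
	rq->freed = 1;
}
```

The extra check on every free is the fast-path cost raised earlier in the
thread; the sketch trades that for not depending on each upper layer to
call the hook explicitly.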

> > Not looked at the cpu hotplug case you mention, but my naive thought is
> > it'd be pretty weird to also sprinkle a call to blk_mq_cleanup_rq() from
> > that specific "dead hctx" code path.
> 
> It isn't weird, and it is exactly what NVMe multipath is doing, please see
> nvme_failover_req(). It is just that nvme doesn't allocate request
> private data.
> 
> Wrt. blk-mq cpu hotplug handling: after one hctx is dead, we can't dispatch
> requests to this hctx any more. However, a request has been bound to its
> hctx since its allocation, and the association can't (or is quite hard to) be
> changed any more. Do you have any better idea to deal with this issue?

No, as I prefaced before "Not looked at the cpu hotplug case you
mention".  As such I should've stayed silent ;)

But my point was we should hook off current interfaces rather than rely
on a new primary function call.

Mike

