linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] xen/blkback: several fixes of resource management
@ 2017-05-16  6:23 Juergen Gross
  2017-05-16  6:23 ` [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight Juergen Gross
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Juergen Gross @ 2017-05-16  6:23 UTC (permalink / raw)
  To: linux-kernel, xen-devel; +Cc: konrad.wilk, roger.pau, netwiz, Juergen Gross

Destroying a Xen guest domain while it was doing I/Os via xen-blkback
leaked several resources, including references of the guest's memory
pages.

This patch series addresses those leaks by correcting usage of
reference counts and the sequence when to free which resource.

The series applies on top of commit 2d4456c73a487abe ("block:
xen-blkback: add null check to avoid null pointer dereference") in
Jens Axboe's tree kernel/git/axboe/linux-block.git

Juergen Gross (3):
  xen/blkback: fix disconnect while I/Os in flight
  xen/blkback: don't free be structure too early
  xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

 drivers/block/xen-blkback/blkback.c |  3 ---
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  | 15 ++++++++-------
 3 files changed, 9 insertions(+), 10 deletions(-)

-- 
2.12.0

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight
  2017-05-16  6:23 [PATCH 0/3] xen/blkback: several fixes of resource management Juergen Gross
@ 2017-05-16  6:23 ` Juergen Gross
  2017-05-16  9:28   ` [Xen-devel] " Dietmar Hahn
  2017-05-16  6:23 ` [PATCH 2/3] xen/blkback: don't free be structure too early Juergen Gross
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 8+ messages in thread
From: Juergen Gross @ 2017-05-16  6:23 UTC (permalink / raw)
  To: linux-kernel, xen-devel
  Cc: konrad.wilk, roger.pau, netwiz, Juergen Gross, stable

Today disconnecting xen-blkback is broken in case there are still
I/Os in flight: xen_blkif_disconnect() will bail out early without
releasing all resources in the hope it will be called again when
the last request has terminated. This, however, won't happen as
xen_blkif_free() won't be called on termination of the last running
request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
calls in xen_blkif_disconnect() didn't happen.

To solve this deadlock xen_blkif_disconnect() and
xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
xen_blkif_get() but use some other way to do their accounting of
resources.

This at once fixes another error in xen_blkif_disconnect(): when it
returned early with -EBUSY for another ring than 0 it would call
xen_blkif_put() again for already handled rings on a subsequent call.
This will lead to inconsistencies in the refcnt handling.

Cc: stable@vger.kernel.org
Reported-by: Glenn Enright <glenn@rimuhosting.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
---
 drivers/block/xen-blkback/common.h | 1 +
 drivers/block/xen-blkback/xenbus.c | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index dea61f6ab8cb..953f38802333 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -281,6 +281,7 @@ struct xen_blkif_ring {
 
 	wait_queue_head_t	wq;
 	atomic_t		inflight;
+	int			active;
 	/* One thread per blkif ring. */
 	struct task_struct	*xenblkd;
 	unsigned int		waiting_reqs;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 1f3dfaa54d87..e68df9de8858 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -159,7 +159,7 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
 		init_waitqueue_head(&ring->shutdown_wq);
 		ring->blkif = blkif;
 		ring->st_print = jiffies;
-		xen_blkif_get(blkif);
+		ring->active = 1;
 	}
 
 	return 0;
@@ -249,6 +249,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		struct xen_blkif_ring *ring = &blkif->rings[r];
 		unsigned int i = 0;
 
+		if (!ring->active)
+			continue;
+
 		if (ring->xenblkd) {
 			kthread_stop(ring->xenblkd);
 			wake_up(&ring->shutdown_wq);
@@ -296,7 +299,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		BUG_ON(ring->free_pages_num != 0);
 		BUG_ON(ring->persistent_gnt_c != 0);
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
-		xen_blkif_put(blkif);
+		ring->active = 0;
 	}
 	blkif->nr_ring_pages = 0;
 	/*
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] xen/blkback: don't free be structure too early
  2017-05-16  6:23 [PATCH 0/3] xen/blkback: several fixes of resource management Juergen Gross
  2017-05-16  6:23 ` [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight Juergen Gross
@ 2017-05-16  6:23 ` Juergen Gross
  2017-05-16  6:23 ` [PATCH 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback kthread Juergen Gross
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Juergen Gross @ 2017-05-16  6:23 UTC (permalink / raw)
  To: linux-kernel, xen-devel
  Cc: konrad.wilk, roger.pau, netwiz, Juergen Gross, stable

The be structure must nor be freed when freeing the blkif structure
isn't done. Otherwise a use-after-free of be when unmapping the ring
used for communicating with the frontend will occur in case of a
late call of xenblk_disconnect() (e.g. due to an I/O still active
when trying to disconnect).

Cc: stable@vger.kernel.org
Reported-by: Glenn Enright <glenn@rimuhosting.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkback/xenbus.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index e68df9de8858..4d2f57fa35da 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -315,9 +315,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
-
-	xen_blkif_disconnect(blkif);
+	WARN_ON(xen_blkif_disconnect(blkif));
 	xen_vbd_free(&blkif->vbd);
+	kfree(blkif->be->mode);
+	kfree(blkif->be);
 
 	/* Make sure everything is drained before shutting down */
 	kmem_cache_free(xen_blkif_cachep, blkif);
@@ -514,8 +515,6 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
 		xen_blkif_put(be->blkif);
 	}
 
-	kfree(be->mode);
-	kfree(be);
 	return 0;
 }
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
  2017-05-16  6:23 [PATCH 0/3] xen/blkback: several fixes of resource management Juergen Gross
  2017-05-16  6:23 ` [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight Juergen Gross
  2017-05-16  6:23 ` [PATCH 2/3] xen/blkback: don't free be structure too early Juergen Gross
@ 2017-05-16  6:23 ` Juergen Gross
  2017-05-18  1:55 ` [PATCH 0/3] xen/blkback: several fixes of resource management Steven Haigh
  2017-05-18 14:38 ` Roger Pau Monné
  4 siblings, 0 replies; 8+ messages in thread
From: Juergen Gross @ 2017-05-16  6:23 UTC (permalink / raw)
  To: linux-kernel, xen-devel
  Cc: konrad.wilk, roger.pau, netwiz, Juergen Gross, stable

There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous and using the blkif
reference counting in the kthread will avoid to ever let the reference
count drop to zero at the end of an I/O running concurrent to
disconnecting and multiple rings.

Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.

Cc: stable@vger.kernel.org
Reported-by: Glenn Enright <glenn@rimuhosting.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/block/xen-blkback/blkback.c | 3 ---
 drivers/block/xen-blkback/xenbus.c  | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 726c32e35db9..6b14c509f3c7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -609,8 +609,6 @@ int xen_blkif_schedule(void *arg)
 	unsigned long timeout;
 	int ret;
 
-	xen_blkif_get(blkif);
-
 	set_freezable();
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
@@ -665,7 +663,6 @@ int xen_blkif_schedule(void *arg)
 		print_stats(ring);
 
 	ring->xenblkd = NULL;
-	xen_blkif_put(blkif);
 
 	return 0;
 }
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 4d2f57fa35da..1dc0ff5ed912 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -255,7 +255,6 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		if (ring->xenblkd) {
 			kthread_stop(ring->xenblkd);
 			wake_up(&ring->shutdown_wq);
-			ring->xenblkd = NULL;
 		}
 
 		/* The above kthread_stop() guarantees that at this point we
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Xen-devel] [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight
  2017-05-16  6:23 ` [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight Juergen Gross
@ 2017-05-16  9:28   ` Dietmar Hahn
  0 siblings, 0 replies; 8+ messages in thread
From: Dietmar Hahn @ 2017-05-16  9:28 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, linux-kernel, xen-devel, roger.pau, netwiz, stable

Am Dienstag, 16. Mai 2017, 08:23:18 schrieb Juergen Gross:
> Today disconnecting xen-blkback is broken in case there are still
> I/Os in flight: xen_blkif_disconnect() will bail out early without
> releasing all resources in the hope it will be called again when
> the last request has terminated. This, however, won't happen as
> xen_blkif_free() won't be called on termination of the last running
> request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
> xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
> calls in xen_blkif_disconnect() didn't happen.
> 
> To solve this deadlock xen_blkif_disconnect() and
> xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
> xen_blkif_get() but use some other way to do their accounting of
> resources.
> 
> This at once fixes another error in xen_blkif_disconnect(): when it
> returned early with -EBUSY for another ring than 0 it would call
> xen_blkif_put() again for already handled rings on a subsequent call.
> This will lead to inconsistencies in the refcnt handling.
> 
> Cc: stable@vger.kernel.org
> Reported-by: Glenn Enright <glenn@rimuhosting.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>
> Tested-by: Steven Haigh <netwiz@crc.id.au>
> ---
>  drivers/block/xen-blkback/common.h | 1 +
>  drivers/block/xen-blkback/xenbus.c | 7 +++++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
> index dea61f6ab8cb..953f38802333 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -281,6 +281,7 @@ struct xen_blkif_ring {
>  
>  	wait_queue_head_t	wq;
>  	atomic_t		inflight;
> +	int			active;

Maybe better using bool?

Dietmar.

>  	/* One thread per blkif ring. */
>  	struct task_struct	*xenblkd;
>  	unsigned int		waiting_reqs;
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index 1f3dfaa54d87..e68df9de8858 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -159,7 +159,7 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
>  		init_waitqueue_head(&ring->shutdown_wq);
>  		ring->blkif = blkif;
>  		ring->st_print = jiffies;
> -		xen_blkif_get(blkif);
> +		ring->active = 1;
>  	}
>  
>  	return 0;
> @@ -249,6 +249,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
>  		struct xen_blkif_ring *ring = &blkif->rings[r];
>  		unsigned int i = 0;
>  
> +		if (!ring->active)
> +			continue;
> +
>  		if (ring->xenblkd) {
>  			kthread_stop(ring->xenblkd);
>  			wake_up(&ring->shutdown_wq);
> @@ -296,7 +299,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
>  		BUG_ON(ring->free_pages_num != 0);
>  		BUG_ON(ring->persistent_gnt_c != 0);
>  		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
> -		xen_blkif_put(blkif);
> +		ring->active = 0;
>  	}
>  	blkif->nr_ring_pages = 0;
>  	/*
> 

-- 
Company details: http://ts.fujitsu.com/imprint.html

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] xen/blkback: several fixes of resource management
  2017-05-16  6:23 [PATCH 0/3] xen/blkback: several fixes of resource management Juergen Gross
                   ` (2 preceding siblings ...)
  2017-05-16  6:23 ` [PATCH 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback kthread Juergen Gross
@ 2017-05-18  1:55 ` Steven Haigh
  2017-05-18 14:38 ` Roger Pau Monné
  4 siblings, 0 replies; 8+ messages in thread
From: Steven Haigh @ 2017-05-18  1:55 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, xen-devel, roger.pau

On 2017-05-16 16:23, Juergen Gross wrote:
> Destroying a Xen guest domain while it was doing I/Os via xen-blkback
> leaked several resources, including references of the guest's memory
> pages.
> 
> This patch series addresses those leaks by correcting usage of
> reference counts and the sequence when to free which resource.
> 
> The series applies on top of commit 2d4456c73a487abe ("block:
> xen-blkback: add null check to avoid null pointer dereference") in
> Jens Axboe's tree kernel/git/axboe/linux-block.git
> 
> Juergen Gross (3):
>   xen/blkback: fix disconnect while I/Os in flight
>   xen/blkback: don't free be structure too early
>   xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
> 
>  drivers/block/xen-blkback/blkback.c |  3 ---
>  drivers/block/xen-blkback/common.h  |  1 +
>  drivers/block/xen-blkback/xenbus.c  | 15 ++++++++-------
>  3 files changed, 9 insertions(+), 10 deletions(-)

Tested-by: Steven Haigh <netwiz@crc.id.au>

I've had a report that a new message is logged on destroy sometimes:
vif vif-1-0 vif1.0: Guest Rx stalled

This may be a different issue - however the main fix of this patch set 
is fully functional.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] xen/blkback: several fixes of resource management
  2017-05-16  6:23 [PATCH 0/3] xen/blkback: several fixes of resource management Juergen Gross
                   ` (3 preceding siblings ...)
  2017-05-18  1:55 ` [PATCH 0/3] xen/blkback: several fixes of resource management Steven Haigh
@ 2017-05-18 14:38 ` Roger Pau Monné
  2017-05-18 15:04   ` Juergen Gross
  4 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2017-05-18 14:38 UTC (permalink / raw)
  To: Juergen Gross; +Cc: linux-kernel, xen-devel, konrad.wilk, netwiz

On Tue, May 16, 2017 at 08:23:17AM +0200, Juergen Gross wrote:
> Destroying a Xen guest domain while it was doing I/Os via xen-blkback
> leaked several resources, including references of the guest's memory
> pages.
> 
> This patch series addresses those leaks by correcting usage of
> reference counts and the sequence when to free which resource.
> 
> The series applies on top of commit 2d4456c73a487abe ("block:
> xen-blkback: add null check to avoid null pointer dereference") in
> Jens Axboe's tree kernel/git/axboe/linux-block.git
> 
> Juergen Gross (3):
>   xen/blkback: fix disconnect while I/Os in flight
>   xen/blkback: don't free be structure too early
>   xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

All 3:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

The comment reported by Dietmar in patch #1 would be nice to fix.

Roger.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] xen/blkback: several fixes of resource management
  2017-05-18 14:38 ` Roger Pau Monné
@ 2017-05-18 15:04   ` Juergen Gross
  0 siblings, 0 replies; 8+ messages in thread
From: Juergen Gross @ 2017-05-18 15:04 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: linux-kernel, xen-devel, konrad.wilk, netwiz

On 18/05/17 16:38, Roger Pau Monné wrote:
> On Tue, May 16, 2017 at 08:23:17AM +0200, Juergen Gross wrote:
>> Destroying a Xen guest domain while it was doing I/Os via xen-blkback
>> leaked several resources, including references of the guest's memory
>> pages.
>>
>> This patch series addresses those leaks by correcting usage of
>> reference counts and the sequence when to free which resource.
>>
>> The series applies on top of commit 2d4456c73a487abe ("block:
>> xen-blkback: add null check to avoid null pointer dereference") in
>> Jens Axboe's tree kernel/git/axboe/linux-block.git
>>
>> Juergen Gross (3):
>>   xen/blkback: fix disconnect while I/Os in flight
>>   xen/blkback: don't free be structure too early
>>   xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
> 
> All 3:
> 
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> The comment reported by Dietmar in patch #1 would be nice to fix.

Okay, I'll send V2 soon.


Juergen

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-05-18 15:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-16  6:23 [PATCH 0/3] xen/blkback: several fixes of resource management Juergen Gross
2017-05-16  6:23 ` [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight Juergen Gross
2017-05-16  9:28   ` [Xen-devel] " Dietmar Hahn
2017-05-16  6:23 ` [PATCH 2/3] xen/blkback: don't free be structure too early Juergen Gross
2017-05-16  6:23 ` [PATCH 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback kthread Juergen Gross
2017-05-18  1:55 ` [PATCH 0/3] xen/blkback: several fixes of resource management Steven Haigh
2017-05-18 14:38 ` Roger Pau Monné
2017-05-18 15:04   ` Juergen Gross

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).