Re: null domains after xl destroy

From: Juergen Gross <jgross@suse.com>
To: glenn@rimuhosting.com
Cc: "Andrew Cooper" <andrew.cooper3@citrix.com>,
	xen-devel@lists.xen.org,
	"Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com>,
	"Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: null domains after xl destroy
Date: Wed, 19 Apr 2017 12:09:18 +0200	[thread overview]
Message-ID: <0b981374-700b-f26a-9504-583bad046f7d@suse.com> (raw)
In-Reply-To: <20170419071624.6enfeemielfqhqw2@dhcp-3-128.uk.xensource.com>

[-- Attachment #1: Type: text/plain, Size: 4099 bytes --]

On 19/04/17 09:16, Roger Pau Monné wrote:
> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>> On 19/04/17 03:02, Glenn Enright wrote:
>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>> shows the issue. When replicating the leak I now see this trace (via
>>> dmesg). Hopefully that is useful.
>>>
>>> Please note, I'm going to be offline next week, but am keen to keep on
>>> with this, it may just be a while before I followup is all.
>>>
>>> Regards, Glenn
>>> http://rimuhosting.com
>>>
>>>
>>> ------------[ cut here ]------------
>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>> xen_blkbk_remove+0x138/0x140
>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>> dm_mirror dm_region_hash dm_log dm_mod
>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>  ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>  0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>  ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>> Call Trace:
>>>  [<ffffffff8136b61f>] dump_stack+0x67/0x98
>>>  [<ffffffff8108007d>] __warn+0xfd/0x120
>>>  [<ffffffff810800bd>] warn_slowpath_null+0x1d/0x20
>>>  [<ffffffff814ebde8>] xen_blkbk_remove+0x138/0x140
>>>  [<ffffffff814497f7>] xenbus_dev_remove+0x47/0xa0
>>>  [<ffffffff814bcfd4>] __device_release_driver+0xb4/0x160
>>>  [<ffffffff814bd0ad>] device_release_driver+0x2d/0x40
>>>  [<ffffffff814bbfd4>] bus_remove_device+0x124/0x190
>>>  [<ffffffff814b93a2>] device_del+0x112/0x210
>>>  [<ffffffff81448113>] ? xenbus_read+0x53/0x70
>>>  [<ffffffff814b94c2>] device_unregister+0x22/0x60
>>>  [<ffffffff814ed7cd>] frontend_changed+0xad/0x4c0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449b57>] xenbus_otherend_changed+0xc7/0x140
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff81449fe0>] frontend_changed+0x10/0x20
>>>  [<ffffffff814477fc>] xenwatch_thread+0x9c/0x140
>>>  [<ffffffff810bffa0>] ? woken_wake_function+0x20/0x20
>>>  [<ffffffff816ed93a>] ? schedule+0x3a/0xa0
>>>  [<ffffffff816f1436>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>  [<ffffffff810c0c5d>] ? complete+0x4d/0x60
>>>  [<ffffffff81447760>] ? split+0xf0/0xf0
>>>  [<ffffffff810a051d>] kthread+0xcd/0xf0
>>>  [<ffffffff810a974e>] ? schedule_tail+0x1e/0xc0
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff810a0450>] ? __kthread_init_worker+0x40/0x40
>>>  [<ffffffff816f1b45>] ret_from_fork+0x25/0x30
>>> ---[ end trace ee097287c9865a62 ]---
>>
>> Konrad, Roger,
>>
>> this was triggered by a debug patch in xen_blkbk_remove():
>>
>> 	if (be->blkif)
>> -		xen_blkif_disconnect(be->blkif);
>> +		WARN_ON(xen_blkif_disconnect(be->blkif));
>>
>> So I guess we need something like xen_blk_drain_io() in case of calls to
>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>> sites of xen_blkif_disconnect() or in this function depending on a new
>> boolean parameter indicating it should wait for outstanding I/Os).
>>
>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>> add further problems...
> 
> Hello,
> 
> Thanks for debugging this, the easiest solution seems to be to replace the
> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
> prevent further issues like this one).

Glenn,

can you please try the attached patch (in dom0)?


Juergen


[-- Attachment #2: blkback.patch --]
[-- Type: text/x-patch, Size: 2698 bytes --]

commit fd4252549f5f3e4de6d887d9ae4c4d7f35d7de52
Author: Juergen Gross <jgross@suse.com>
Date:   Wed Apr 19 11:19:51 2017 +0200

    xen/blkback: fix disconnect while I/Os in flight
    
    Today disconnecting xen-blkback is broken in case there are still
    I/Os in flight: xen_blkif_disconnect() will bail out early without
    releasing all resources in the hope it will be called again when
    the last request has terminated. This, however, won't happen as
    xen_blkif_free() won't be called on termination of the last running
    request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
    xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
    calls in xen_blkif_disconnect() didn't happen.
    
    To solve this deadlock xen_blkif_disconnect() and
    xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
    xen_blkif_get() but use some other way to do their accounting of
    resources.
    
    This at once fixes another error in xen_blkif_disconnect(): when it
    returned early with -EBUSY for another ring than 0 it would call
    xen_blkif_put() again for already handled rings on a subsequent call.
    This will lead to inconsistencies in the refcnt handling.
    
    Cc: stable@vger.kernel.org

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index dea61f6ab8cb..953f38802333 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -281,6 +281,7 @@ struct xen_blkif_ring {
 
 	wait_queue_head_t	wq;
 	atomic_t		inflight;
+	int			active;
 	/* One thread per blkif ring. */
 	struct task_struct	*xenblkd;
 	unsigned int		waiting_reqs;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 8fe61b5dc5a6..411d2ded2456 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -159,7 +159,7 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
 		init_waitqueue_head(&ring->shutdown_wq);
 		ring->blkif = blkif;
 		ring->st_print = jiffies;
-		xen_blkif_get(blkif);
+		ring->active = 1;
 	}
 
 	return 0;
@@ -249,6 +249,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		struct xen_blkif_ring *ring = &blkif->rings[r];
 		unsigned int i = 0;
 
+		if (!ring->active)
+			continue;
+
 		if (ring->xenblkd) {
 			kthread_stop(ring->xenblkd);
 			wake_up(&ring->shutdown_wq);
@@ -296,7 +299,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 		BUG_ON(ring->free_pages_num != 0);
 		BUG_ON(ring->persistent_gnt_c != 0);
 		WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
-		xen_blkif_put(blkif);
+		ring->active = 0;
 	}
 	blkif->nr_ring_pages = 0;
 	/*

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel