From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Haigh
To: Juergen Gross, glenn@rimuhosting.com
Cc: Andrew Cooper, Roger Pau Monné, Dietmar Hahn, xen-devel@lists.xen.org
Subject: Re: null domains after xl destroy
Date: Thu, 20 Apr 2017 02:22:41 +1000
In-Reply-To: <0b981374-700b-f26a-9504-583bad046f7d@suse.com>
References: <78571a7b-62ec-b046-02e3-3d6739b779a6@rimuhosting.com>
 <95efee87-6925-5376-e347-55e438c90212@suse.com>
 <70eae378-2392-bd82-670a-5dafff58c259@rimuhosting.com>
 <3385656.IoOB642KYU@amur>
 <6e150a33-576b-5cf8-7abc-2cba584602ff@citrix.com>
 <05cd7b43-153a-8b51-8fd9-e8ae4a8b5287@rimuhosting.com>
 <06829f8f-def6-4822-c18a-877d8633556c@suse.com>
 <034c9f96-1bfe-6793-68a7-9b070676971a@suse.com>
 <20170419071624.6enfeemielfqhqw2@dhcp-3-128.uk.xensource.com>
 <0b981374-700b-f26a-9504-583bad046f7d@suse.com>
List-Id: xen-devel@lists.xenproject.org

On 19/04/17 20:09, Juergen Gross wrote:
> On 19/04/17 09:16, Roger Pau Monné wrote:
>> On Wed, Apr 19, 2017 at 06:39:41AM +0200, Juergen Gross wrote:
>>> On 19/04/17 03:02, Glenn Enright wrote:
>>>> Thanks Juergen. I applied that, to our 4.9.23 dom0 kernel, which still
>>>> shows the issue. When replicating the leak I now see this trace (via
>>>> dmesg). Hopefully that is useful.
>>>>
>>>> Please note, I'm going to be offline next week, but am keen to keep on
>>>> with this, it may just be a while before I followup is all.
>>>>
>>>> Regards, Glenn
>>>> http://rimuhosting.com
>>>>
>>>>
>>>> ------------[ cut here ]------------
>>>> WARNING: CPU: 0 PID: 19 at drivers/block/xen-blkback/xenbus.c:508
>>>> xen_blkbk_remove+0x138/0x140
>>>> Modules linked in: xen_pciback xen_netback xen_gntalloc xen_gntdev
>>>> xen_evtchn xenfs xen_privcmd xt_CT ipt_REJECT nf_reject_ipv4
>>>> ebtable_filter ebtables xt_hashlimit xt_recent xt_state iptable_security
>>>> iptable_raw igle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
>>>> nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables bridge stp llc
>>>> ipv6 crc_ccitt ppdev parport_pc parport serio_raw sg i2c_i801 i2c_smbus
>>>> i2c_core e1000e ptp p000_edac edac_core raid1 sd_mod ahci libahci floppy
>>>> dm_mirror dm_region_hash dm_log dm_mod
>>>> CPU: 0 PID: 19 Comm: xenwatch Not tainted 4.9.23-1.el6xen.x86_64 #1
>>>> Hardware name: Supermicro PDSML/PDSML+, BIOS 6.00 08/27/2007
>>>> ffffc90040cfbba8 ffffffff8136b61f 0000000000000013 0000000000000000
>>>> 0000000000000000 0000000000000000 ffffc90040cfbbf8 ffffffff8108007d
>>>> ffffea0001373fe0 000001fc33394434 ffff880000000001 ffff88004d93fac0
>>>> Call Trace:
>>>> [] dump_stack+0x67/0x98
>>>> [] __warn+0xfd/0x120
>>>> [] warn_slowpath_null+0x1d/0x20
>>>> [] xen_blkbk_remove+0x138/0x140
>>>> [] xenbus_dev_remove+0x47/0xa0
>>>> [] __device_release_driver+0xb4/0x160
>>>> [] device_release_driver+0x2d/0x40
>>>> [] bus_remove_device+0x124/0x190
>>>> [] device_del+0x112/0x210
>>>> [] ? xenbus_read+0x53/0x70
>>>> [] device_unregister+0x22/0x60
>>>> [] frontend_changed+0xad/0x4c0
>>>> [] ? schedule_tail+0x1e/0xc0
>>>> [] xenbus_otherend_changed+0xc7/0x140
>>>> [] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>> [] ? schedule_tail+0x1e/0xc0
>>>> [] frontend_changed+0x10/0x20
>>>> [] xenwatch_thread+0x9c/0x140
>>>> [] ? woken_wake_function+0x20/0x20
>>>> [] ? schedule+0x3a/0xa0
>>>> [] ? _raw_spin_unlock_irqrestore+0x16/0x20
>>>> [] ? complete+0x4d/0x60
>>>> [] ? split+0xf0/0xf0
>>>> [] kthread+0xcd/0xf0
>>>> [] ? schedule_tail+0x1e/0xc0
>>>> [] ? __kthread_init_worker+0x40/0x40
>>>> [] ? __kthread_init_worker+0x40/0x40
>>>> [] ret_from_fork+0x25/0x30
>>>> ---[ end trace ee097287c9865a62 ]---
>>>
>>> Konrad, Roger,
>>>
>>> this was triggered by a debug patch in xen_blkbk_remove():
>>>
>>>     if (be->blkif)
>>> -       xen_blkif_disconnect(be->blkif);
>>> +       WARN_ON(xen_blkif_disconnect(be->blkif));
>>>
>>> So I guess we need something like xen_blk_drain_io() in case of calls to
>>> xen_blkif_disconnect() which are not allowed to fail (either at the call
>>> sites of xen_blkif_disconnect() or in this function depending on a new
>>> boolean parameter indicating it should wait for outstanding I/Os).
>>>
>>> I can try a patch, but I'd appreciate if you could confirm this wouldn't
>>> add further problems...
>>
>> Hello,
>>
>> Thanks for debugging this, the easiest solution seems to be to replace the
>> ring->inflight atomic_read check in xen_blkif_disconnect with a call to
>> xen_blk_drain_io instead, and making xen_blkif_disconnect return void (to
>> prevent further issues like this one).
>
> Glenn,
>
> can you please try the attached patch (in dom0)?

For what it's worth, I have applied this in kernel package 4.9.23-2 as follows:

* Wed Apr 19 2017 Steven Haigh - 4.9.23-2
- xen/blkback: fix disconnect while I/Os in flight

It's available from any 'in sync' mirror:
    https://xen.crc.id.au/downloads/

Feedback welcome for both mine and Juergen's sake.

-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897