On 12/04/17 00:45, Glenn Enright wrote:
> On 12/04/17 10:23, Andrew Cooper wrote:
>> On 11/04/2017 23:13, Glenn Enright wrote:
>>> On 11/04/17 21:49, Dietmar Hahn wrote:
>>>> On Tuesday, 11 April 2017, 20:03:14, Glenn Enright wrote:
>>>>> On 11/04/17 17:59, Juergen Gross wrote:
>>>>>> On 11/04/17 07:25, Glenn Enright wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> We are seeing an odd issue with domu domains from xl destroy, under
>>>>>>> recent 4.9 kernels a (null) domain is left behind.
>>>>>>
>>>>>> I guess this is the dom0 kernel version?
>>>>>>
>>>>>>> This has occurred on a variety of hardware, with no obvious
>>>>>>> commonality.
>>>>>>>
>>>>>>> 4.4.55 does not show this behavior.
>>>>>>>
>>>>>>> On my test machine I have the following packages installed under
>>>>>>> centos6, from https://xen.crc.id.au/
>>>>>>>
>>>>>>> ~]# rpm -qa | grep xen
>>>>>>> xen47-licenses-4.7.2-4.el6.x86_64
>>>>>>> xen47-4.7.2-4.el6.x86_64
>>>>>>> kernel-xen-4.9.21-1.el6xen.x86_64
>>>>>>> xen47-ocaml-4.7.2-4.el6.x86_64
>>>>>>> xen47-libs-4.7.2-4.el6.x86_64
>>>>>>> xen47-libcacard-4.7.2-4.el6.x86_64
>>>>>>> xen47-hypervisor-4.7.2-4.el6.x86_64
>>>>>>> xen47-runtime-4.7.2-4.el6.x86_64
>>>>>>> kernel-xen-firmware-4.9.21-1.el6xen.x86_64
>>>>>>>
>>>>>>> I've also replicated the issue with 4.9.17 and 4.9.20
>>>>>>>
>>>>>>> To replicate, on a cleanly booted dom0 with one pv VM, I run the
>>>>>>> following on the VM
>>>>>>>
>>>>>>> {
>>>>>>>     while true; do
>>>>>>>         dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
>>>>>>>     done
>>>>>>> }
>>>>>>>
>>>>>>> Then on the dom0 I do this sequence to reliably get a null domain.
>>>>>>> This occurs with oxenstored and xenstored both.
>>>>>>>
>>>>>>> {
>>>>>>>     xl sync 1
>>>>>>>     xl destroy 1
>>>>>>> }
>>>>>>>
>>>>>>> xl list then renders something like ...
>>>>>>>
>>>>>>> (null)    1    4    4    --p--d    9.8    0
>>>>>>
>>>>>> Something is referencing the domain, e.g. some of its memory pages are
>>>>>> still mapped by dom0.
>>>>
>>>> You can try
>>>> # xl debug-keys q
>>>> and further
>>>> # xl dmesg
>>>> to see the output of the previous command. The 'q' dumps domain
>>>> (and guest debug) info.
>>>> # xl debug-keys h
>>>> prints all possible parameters for more information.
>>>>
>>>> Dietmar.
>>>>
>>>
>>> I've done this as requested, below is the output.
>>>
>>>
>>> (XEN) Memory pages belonging to domain 1:
>>> (XEN) DomPage 0000000000071c00: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c01: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c02: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c03: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c04: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c05: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c06: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c07: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c08: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c09: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0a: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0b: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0c: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0d: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0e: caf=00000001, taf=7400000000000001
>>> (XEN) DomPage 0000000000071c0f: caf=00000001, taf=7400000000000001
>>
>> There are 16 pages still referenced from somewhere.

Just a wild guess: could you please try the attached kernel patch? This
might give us some more diagnostic data...

Juergen
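
To tally the leftover pages without counting the dump by eye, the DomPage lines in the 'q' output can be run through a small filter. This is only a sketch: the function name count_dompages is hypothetical, and it assumes the section header and the DomPage line format look exactly like the output quoted in this thread.

```shell
#!/bin/sh
# count_dompages (hypothetical helper): read a Xen console log on stdin and
# print how many "DomPage" lines directly follow the
# "Memory pages belonging to domain <id>:" header for the given domain ID.
# If the log holds several dumps, the count from the last one is printed.
count_dompages() {
    awk -v dom="$1" '
        # Header for the requested domain: start a fresh count.
        index($0, "Memory pages belonging to domain " dom ":") { in_dom = 1; n = 0; next }
        # Page lines inside that section are counted.
        in_dom && /DomPage/ { n++; next }
        # Any other line ends the section.
        in_dom { in_dom = 0 }
        END { print n + 0 }
    '
}
```

Run on dom0 as `xl dmesg | count_dompages 1`; a result that stays above zero after `xl destroy` would be consistent with pages of the domain still being referenced.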